Posts by Tags

Algorithms

Convex Optimization

Efficient Architecture

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs

11 minute read

Published:

Modern sequence modeling has evolved from recurrent architectures to attention-based models and, more recently, state-space approaches. Traditional RNNs introduced an efficient way to process sequential data but struggled with long-term dependencies. Transformers later revolutionized the field with attention mechanisms, though their quadratic cost limits scalability to long contexts. This has driven research into more efficient alternatives, such as linear attention, state-space models like S4 and Mamba, and newer architectures like DeltaNet, which aim to combine scalability, stability, and strong modeling capacity for long-range sequence tasks.
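As a rough illustration of the linear-attention idea the excerpt mentions (a sketch, not the post's own code): replacing the softmax with a feature map `phi` lets causal attention be computed with a running state of fixed size, so the cost grows linearly rather than quadratically with sequence length. The feature map below (a shifted ReLU) is one common, assumed choice.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard causal attention: O(n^2) in sequence length n."""
    n = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linear attention: O(n) via a running (d_k x d_v) state.

    S accumulates phi(k_t) v_t^T and z accumulates phi(k_t), so each
    output is a ratio of two running sums instead of a full softmax.
    """
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # running sum of phi(k_t) v_t^T
    z = np.zeros(d_k)                 # running normalizer sum of phi(k_t)
    out = np.zeros((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)
    return out
```

At the first position both variants can only attend to themselves, so both return `V[0]`; for later positions the two differ because the kernel only approximates the softmax weighting.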

Empirical Risk Minimization

Fused Kernel

Triton Notes

less than 1 minute read

Published:

TBC

GPU programming

Triton Notes


Greedy Algorithm

Kernel Methods

Linear Attention

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Long Context

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Machine Learning

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Nesterov Accelerated Gradient

Neural Tangent Kernel

Online Optimization

Reproducing Kernel Hilbert Space

Submodular