Blog posts

2026

Triton Notes

less than 1 minute read

Published: TBC

2025

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs

11 minute read

Published:

Modern sequence modeling has evolved from recurrent architectures to attention-based models and, more recently, to state-space approaches. Traditional RNNs offered an efficient way to process sequential data but struggled with long-term dependencies. Transformers later revolutionized the field with attention mechanisms, though their quadratic cost limits scalability to long contexts. This has driven research into more efficient alternatives, such as linear attention, state-space models like S4 and Mamba, and newer architectures like DeltaNet, which aim to combine scalability, stability, and strong modeling capacity on long-range sequence tasks.