Posts by Tags

Algorithms

Convex Optimization

Efficient Architecture

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs

11 minute read

Published:

Modern sequence modeling has evolved from recurrent architectures to attention-based models and, more recently, state-space approaches. Traditional RNNs introduced an efficient way to process sequential data but struggled with long-term dependencies. Transformers later revolutionized the field with attention mechanisms, though their quadratic cost limits scalability to long contexts. This has driven research into more efficient alternatives, such as linear attention, state-space models like S4 and Mamba, and newer architectures like DeltaNet, which aim to combine scalability, stability, and strong modeling capacity for long-range sequence tasks.
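As a rough illustration of the linear-attention idea the excerpt mentions (a sketch, not the post's own code): replacing the softmax with a feature map `phi` lets causal attention be computed with a running state of fixed size, so the cost grows linearly rather than quadratically with sequence length. The feature map below (a shifted ReLU) is one common, assumed choice.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard causal attention: O(n^2) in sequence length n."""
    n = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linear attention: O(n) via a running (d_k x d_v) state.

    S accumulates phi(k_t) v_t^T and z accumulates phi(k_t), so each
    output is a ratio of two running sums instead of a full softmax.
    """
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # running sum of phi(k_t) v_t^T
    z = np.zeros(d_k)                 # running normalizer sum of phi(k_t)
    out = np.zeros((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)
    return out
```

At the first position both variants can only attend to themselves, so both return `V[0]`; for later positions the two differ because the kernel only approximates the softmax weighting.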

Empirical Risk Minimization

Fused Kernel

Triton Notes

less than 1 minute read

Published:

TBC

GPU programming

Triton Notes


Greedy Algorithm

Kernel Methods

Linear Attention

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Long Context

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Machine Learning

Efficient Methods for Generative Models 1: Linear Attention, State-Space Models, and Linear RNNs


Nesterov Accelerated Gradient

Neural Tangent Kernel

Online Optimization

Reproducing Kernel Hilbert Space

Submodular