Today, we're going to dive deeper into the Transformer. However, before discussing its architecture, there's one important concept we need to cover: Multi-Head Attention. If you're not familiar with it, the sketch below offers a concrete starting point before we work through the details.
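For readers who want a concrete picture up front, here is a minimal sketch of multi-head self-attention in PyTorch. The class name, dimensions, and head count are illustrative choices for this sketch, not a definitive implementation of what we'll build later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention (illustrative, not optimized)."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One linear projection each for queries, keys, and values,
        # plus an output projection to mix the heads back together.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape

        # Project, then split the model dimension into (num_heads, d_head).
        def split(t: torch.Tensor) -> torch.Tensor:
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        weights = F.softmax(scores, dim=-1)
        out = weights @ v
        # Merge the heads back and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.w_o(out)

# Example: 2 sequences, 5 tokens each, model width 64, 8 heads.
mha = MultiHeadAttention(d_model=64, num_heads=8)
x = torch.randn(2, 5, 64)
print(mha(x).shape)  # torch.Size([2, 5, 64])
```

The key idea the sketch captures is that each head attends over the same sequence in its own lower-dimensional subspace, and the results are concatenated and projected back to the model width.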
Recently, one of my friends used an LSTM with PPO to train a robot in a simulation aimed at solving a collection task. A basic understanding of RNNs and of the LSTM, an optimized form of the RNN, is all the background you'll need to follow along.