Transformer

Transformer: Multi-Head Attention

Today, we’re going to dive deeper into the Transformer. However, before discussing its architecture, there’s one important concept we need to cover: Multi-Head Attention. If you’re not familiar…
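
Since Multi-Head Attention is the concept the post builds on, here is a minimal NumPy sketch of the idea: the input is projected into queries, keys, and values, split across several heads that attend independently, and the heads are then concatenated and mixed back together. The function name, shapes, single-sequence setup, and random weights below are illustrative assumptions, not code from the post.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, Wq, Wk, Wv, Wo):
    """Scaled dot-product self-attention split across several heads.

    x: (seq_len, d_model) input sequence
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input, then split into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def split_heads(m):
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

    # Each head attends over the whole sequence independently.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    per_head = weights @ v                               # (heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    concat = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
Ws = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)]
out = multi_head_attention(rng.standard_normal((seq_len, d_model)), num_heads, *Ws)
print(out.shape)  # (5, 16)
```

A real implementation would batch sequences, add masking and dropout, and learn the projection matrices, but the head-splitting logic is the same.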

Transformer: Self-Attention

Recently, one of my friends used an LSTM with PPO to train a robot in simulation to solve a collection task. With a basic understanding of RNNs and LSTMs, an optimized form…