Deep Learning

Proximal Policy Optimization (PPO)

In previous sections on Policy Gradient and REINFORCE, we introduced the core concepts of reinforcement learning (RL) algorithms that rely on gradients applied directly to the policy. These are known as policy-based approaches

Deep Q Network (DQN)

DQN extends Q-Learning by replacing the Q-table with a neural network, enabling it to handle high-dimensional and continuous state spaces, such as images in video games [Playing Atari with Deep Reinforcement Learning]. In

Transformer: Multi-Head Attention

Today, we’re going to dive deeper into the Transformer. However, before discussing its architecture, there's one important concept we need to cover: Multi-Head Attention. If you're not familiar

Transformer: Self-Attention

Recently, one of my friends used LSTM with PPO to train a robot in a simulation aimed at solving a collection task. With a basic understanding of RNNs and LSTMs—an optimized form