Markov Decision Process

Value/Policy-Based Control

Before diving into combining reinforcement learning with deep neural networks, it's important to understand two fundamental concepts: the difference between policy-based and value-based approaches, and the distinction between on-policy and off-policy

Temporal Difference

We previously covered Monte Carlo (MC) methods in RL and their usage. However, the MC method is not well-suited for online learning, especially in tasks such as autonomous driving, where decisions need to be made continuously

Monte Carlo

In previous posts, we discussed the Bellman Equation in the context of Bellman Equation - Policy Iteration and Bellman Equation - Value Iteration, both of which assume access to background knowledge about the

Bellman Equation - Value Iteration

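One commonly used algorithm for solving MDP problems, alongside the Bellman Equation and Policy Iteration, is Value Iteration. It is similar to Bellman Equation - Policy Iteration but differs in how it updates: instead of alternating full policy evaluation with policy improvement, it applies the Bellman optimality update directly to the value function.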