Policy Gradient Theorem Explained - Reinforcement Learning

Published 2020-11-22
In this video, I explain the policy gradient theorem used in reinforcement learning (RL). Instead of showing the typical mathematical derivation of the proof, I explain the resulting formula by walking through an example of playing a game and working out how we can estimate the policy gradient of the expected return by sampling episodes from the environment. I show some graph visualizations that give an intuition for how the partial derivatives with respect to the action probabilities are backpropagated to get the correct policy gradient within the constrained action space (where all probabilities have to sum to 1). I then explain how we can use the log probabilities instead of the probabilities themselves (the log-derivative trick) for improved computational efficiency, and I walk through some pseudocode (Python / PyTorch inspired) of the derived policy gradient algorithm, which is a variant of the REINFORCE algorithm. Finally, I show how we can reduce the variance by normalizing the future returns and dividing by the number of steps instead of the number of episodes.
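The steps described above can be sketched in plain Python. This is not the pseudocode from the video: it is a minimal illustration that assumes a tabular softmax policy (one list of logits per state) and computes the gradient of each log probability by hand instead of relying on PyTorch autograd. The function names (`softmax`, `future_returns`, `normalize`, `policy_gradient`) and the episode format of `(state, action, reward)` tuples are hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Turn a list of logits into action probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def future_returns(rewards, gamma=1.0):
    """Return-to-go at every step: G_t = r_t + gamma * G_{t+1}."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def policy_gradient(logits, episodes, gamma=1.0):
    """Estimate the gradient of the expected return w.r.t. the logits.

    logits:   dict mapping state -> list of per-action logits (assumed tabular policy)
    episodes: list of episodes, each a list of (state, action, reward) steps
    """
    steps, returns = [], []
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        returns.extend(future_returns(rewards, gamma))
        steps.extend((s, a) for s, a, _ in episode)

    # Normalize the future returns across all sampled steps to reduce variance.
    weights = normalize(returns)

    grad = {s: [0.0] * len(ls) for s, ls in logits.items()}
    for (s, a), w in zip(steps, weights):
        probs = softmax(logits[s])
        for i, p in enumerate(probs):
            # For a softmax policy: d log pi(a|s) / d logit_i = 1[i == a] - pi(i|s)
            grad[s][i] += w * ((1.0 if i == a else 0.0) - p)

    # Divide by the number of steps rather than the number of episodes.
    n = len(steps)
    return {s: [g / n for g in gs] for s, gs in grad.items()}


# Two one-step episodes from a single state: action 0 was rewarded, action 1 was not.
logits = {"start": [0.0, 0.0]}
episodes = [[("start", 0, 1.0)], [("start", 1, 0.0)]]
grad = policy_gradient(logits, episodes)
```

In this toy case the estimate raises the logit of the rewarded action and lowers the other by the same amount, so the gradient components for a state sum to zero. That is the effect of the probabilities being constrained to sum to 1. A gradient ascent step would add a small multiple of `grad` to `logits`.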

Policy gradient methods are used in many of the current state-of-the-art reinforcement learning algorithms, and I think it is likely that policy gradient methods will play an important role in advancing the field of RL. I'm excited to continue exploring this field and sharing what I learn along the way.

Join our Discord community:
💬 discord.gg/cdQhRgw

Connect with me:
🐦 Twitter - twitter.com/elliotwaite
πŸ“· Instagram - www.instagram.com/elliotwaite
πŸ‘± Facebook - www.facebook.com/elliotwaite
πŸ’Ό LinkedIn - www.linkedin.com/in/elliotwaite

🎡 Kazukii - Return
β†’ soundcloud.com/ohthatkazuki
β†’ open.spotify.com/artist/5d07MpiIaNmmEMTq79KAga
β†’ youtube.com/user/OfficialKazuki

All Comments (21)
  • @electric_sand
    I wasn't expecting to see such a nice, graphic explanation. It's timely, thank you.
  • @haedike
    This is the most intuitive derivation of Policy Gradient Theorem on the internet. Thank you for being my teacher! Every RL intro class should begin with this video.
  • This is a unique and awesome channel. Seeing these cool animations really helps to build an intuition. Thank you for uploading.
  • Really great video! I enjoyed the graphical representation of the gradient. Can't wait to see more in-depth policy gradient related videos.
  • One of the best! I'll have to watch the video a few times, but it's already helped me figure things out more intuitively. The example and animations are excellent. Great work.
  • @hugeride
    This is the only video you need to learn the intuition and basics of Reinforcement Learning. Amazingly done! Thanks!
  • @franky0226
    Hey Elliot, Loved this explanation so much, man! Keep up the awesome work. I'm a beginner and I feel RL is often looked upon as a difficult paradigm due to the heavy math in there, but people like you are a blessing, for putting the ideas in such an outstanding fashion :)
  • This was a very helpful video for me, thank you Elliot! The toy problem with the robot was so helpful to visualize everything, especially showing the state-action sequences with the probabilities.
  • @matheusmslima
    This is awesome! Thank you for this detailed explanation!
  • @jasperangl8933
    Great Video! Using graphics and combining the logical explanation with pseudocode was really helpful to me. Most of the time you only see one or the other.
  • @cgraider
    Love it, it has never been this clear to me as a visual person, thank you. We need a lot, a lot more. Keep it up, subscribed 100%.
  • Best video on the subject. Thank you, Elliot, for creating such thorough content.
  • One of the best Videos I have ever watched for RL AI! Great work and thanks a lot.
  • @Behroozifyable
    This is simply the best video I have seen on this topic this year!
  • @teleprint-me
    This is pure gold! Thank you so much for the time, hard work, and energy you put into this video. It's highly appreciated.
  • @zV1pi
    A perfect video, I have never seen anything so good. Thanks from Brazil!
  • Thank you! It is extremely helpful to see a detailed worked example. Your video is very much appreciated.
  • @neeeajkumar2477
    Superb, bro, this is something that is missing in most of the other videos... This really helps build the intuition for beginners. Keep making such videos, thanks a ton.