Policy Gradient Theorem Explained - Reinforcement Learning

Published 2020-11-22

In this video, I explain the policy gradient theorem used in reinforcement learning (RL). Instead of walking through the typical mathematical proof, I explain the resulting formula through an example of playing a game, and show how we can estimate the policy gradient of the expected return by sampling episodes from the environment. Graph visualizations give an intuition for how the partial derivatives with respect to the action probabilities are backpropagated to get the correct policy gradient within the constrained action space (where all probabilities have to sum to 1). I then explain how we can use log probabilities instead of the probabilities themselves (the log-derivative trick) for improved computational efficiency, and walk through pseudocode (Python / PyTorch inspired) for the derived policy gradient algorithm, which is a variant of the REINFORCE algorithm. Finally, I show how we can reduce the variance by normalizing the future returns and dividing by the number of steps instead of the number of episodes.
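The steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pseudocode from the video: the toy game (action 1 pays a reward of 1, action 0 pays nothing, three steps per episode), the state-free softmax policy, and every function name and hyperparameter here are hypothetical. It samples episodes, computes future returns, normalizes them, weights the gradient of each action's log probability by its normalized return, and averages over steps rather than episodes. Note that normalizing the returns trades an exactly unbiased estimate for lower variance.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def future_returns(rewards):
    """Undiscounted return-to-go: G_t = r_t + r_{t+1} + ... + r_T."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g += r
        returns.append(g)
    return returns[::-1]


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def sample_episode(logits, rng, num_steps=3):
    """Hypothetical toy game: action 1 pays reward 1, action 0 pays 0."""
    probs = softmax(logits)
    actions = [rng.choices(range(len(probs)), weights=probs)[0]
               for _ in range(num_steps)]
    rewards = [float(a == 1) for a in actions]
    return actions, rewards


def policy_gradient(logits, episodes):
    """Estimate the gradient of expected return w.r.t. the logits."""
    probs = softmax(logits)
    actions = [a for ep_actions, _ in episodes for a in ep_actions]
    weights = normalize(
        [g for _, rewards in episodes for g in future_returns(rewards)])
    grad = [0.0] * len(logits)
    for a, w in zip(actions, weights):
        for i in range(len(logits)):
            # For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
            grad[i] += w * ((1.0 if i == a else 0.0) - probs[i])
    # Average over the number of steps, not the number of episodes.
    return [g / len(actions) for g in grad]


def train(seed=0, iterations=200, batch_size=16, lr=0.5):
    """Run gradient ascent on the logits and return the final policy."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(iterations):
        episodes = [sample_episode(logits, rng) for _ in range(batch_size)]
        grad = policy_gradient(logits, episodes)
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)


print(train())
```

In a PyTorch version, the hand-written log-probability gradient would typically be replaced by automatic differentiation of a loss built from the sampled actions' log probabilities; the analytic softmax gradient is used here only to keep the sketch dependency-free.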

Policy gradient methods are used in many of the current state-of-the-art reinforcement learning algorithms, and I think it is likely that they will play an important role in advancing the field of RL. I'm excited to continue exploring this field and sharing what I learn along the way.

Join our Discord community:
💬 discord.gg/cdQhRgw

Connect with me:
🐦 Twitter - twitter.com/elliotwaite
📷 Instagram - www.instagram.com/elliotwaite
👱 Facebook - www.facebook.com/elliotwaite
💼 LinkedIn - www.linkedin.com/in/elliotwaite

🎵 Kazukii - Return
→ soundcloud.com/ohthatkazuki
→ open.spotify.com/artist/5d07MpiIaNmmEMTq79KAga
→ youtube.com/user/OfficialKazuki

All Comments (21)

@electric_sand 3 years ago

I wasn't expecting to see such a nice, graphic explanation. It's timely, thank you.
@jollage 1 year ago

Thank you, John Cena.
@haedike 9 months ago

This is the most intuitive derivation of Policy Gradient Theorem on the internet. Thank you for being my teacher! Every RL intro class should begin with this video.
@saminbinkarim6962 3 years ago

This is a unique and awesome channel. Seeing these cool animations really helps to build an intuition. Thank you for uploading.
@marseltukhvatshin4880 3 years ago

it's so entertaining and intriguing, thanks for your work, Elliot!
@mohsenasgari1407 3 years ago

Really great video! I enjoyed the graphical representation of the gradient. Can't wait to see more in-depth policy gradient related videos.
@douglasfinnigan1602 3 years ago

One of the best! I'll have to watch the video a few times, but it's already helped me figure things out more intuitively. The example and animations are excellent. Great work.
@hugeride 2 years ago

This is the only video you need to learn the intuition and basics of Reinforcement Learning. Amazingly done! Thanks!
@franky0226 3 years ago

Hey Elliot, Loved this explanation so much, man! Keep up the awesome work. I'm a beginner and I feel RL is often looked upon as a difficult paradigm due to the heavy math in there, but people like you are a blessing, for putting the ideas in such an outstanding fashion :)
@nutternumberone 1 year ago

This was a very helpful video for me, thank you Elliot! The toy problem with the robot was so helpful to visualize everything, especially showing the state-action sequences with the probabilities.
@matheusmslima 2 years ago

This is awesome! Thank you for this detailed explanation!
@jasperangl8933 3 years ago

Great Video! Using graphics and combining the logical explanation with pseudocode was really helpful to me. Most of the time you only see one or the other.
@cgraider 2 years ago

Love it, it has never been this clear to me as a visual person. Thank you, and we need a lot, a lot more. Keep it up, subscribed 100%.
@kanaipathak4426 1 year ago

Best video on the subject. Thank you, Elliot, for creating such thorough content.
@feifeizhang7757 1 year ago

One of the best Videos I have ever watched for RL AI! Great work and thanks a lot.
@Behroozifyable 2 years ago

This is simply the best video I have seen on this topic this year!
@teleprint-me 9 months ago

This is pure gold! Thank you so much for the time, hard work, and energy you put into this video. It's highly appreciated.
@zV1pi 3 years ago

A perfect video, I have never seen anything so good 👏👏 Thanks from Brazil 😁
@redcloudysky7127 1 year ago

Thank you! It is extremely helpful to see a detailed worked example. Your video is very much appreciated.
@neeeajkumar2477 3 years ago

Superb bro, this is something which is missing in most of the other videos... This really helps build the intuition for beginners. Keep building such videos, thanks a ton.