Reinforcement Learning: on-policy vs off-policy algorithms

Published 2023-11-13

Let's talk about on-policy vs off-policy algorithms in reinforcement learning.
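As a rough illustration of the distinction, here is a minimal tabular sketch of the two textbook update rules: SARSA (on-policy) and Q-learning (off-policy). This is not code from the video. The function names, the nested-list Q-table, and the `alpha`/`gamma` defaults are all assumptions chosen for the example.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy (assumed): explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action, whatever is actually taken next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action Q-table; the values are made up.
    q_sarsa = [[0.0, 0.0], [1.0, 5.0]]
    q_qlearn = [[0.0, 0.0], [1.0, 5.0]]

    # Same transition for both; the next action (0) is exploratory, not greedy.
    s, a, r, s_next, a_next = 0, 0, 1.0, 1, 0

    sarsa_update(q_sarsa, s, a, r, s_next, a_next)
    q_learning_update(q_qlearn, s, a, r, s_next)

    print("SARSA      Q[0][0]:", q_sarsa[0][0])
    print("Q-learning Q[0][0]:", q_qlearn[0][0])
```

On the same transition, SARSA backs up the value of the exploratory next action, while Q-learning backs up the best available value in the next state, so the two estimates differ.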

ABOUT ME
⭕ Subscribe: youtube.com/c/CodeEmporium?sub_confirmation=1
📚 Medium Blog: medium.com/@dataemporium
💻 Github: github.com/ajhalthor
👔 LinkedIn: www.linkedin.com/in/ajay-halthor-477974bb/

RESOURCES
[1] Reinforcement Learning book: incompleteideas.net/book/RLbook2020.pdf
[2] Paradigms of ML: idapgroup.com/blog/types-of-machine-learning-out-t…

PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: Reinforcement Learning 101
⭕ Natural Language Processing: Natural Language Processing 101
⭕ Transformers from Scratch: Natural Language Processing 101
⭕ ChatGPT Playlist: ChatGPT
⭕ Convolutional Neural Networks: Convolution Neural Networks
⭕ The Math You Should Know: The Math You Should Know
⭕ Probability Theory for Machine Learning: Probability Theory for Machine Learning
⭕ Coding Machine Learning: Code Machine Learning

All Comments (15)

@MrFalk358 8 months ago

OK, I will indulge your quiz time questions since your videos are really great!
Question 1: A is correct. It would not learn at all, since the target policy is the policy we are trying to learn. Keeping it fixed means it never changes, so it stays random and we are not learning.
Question 2: I'm not completely sure, but I would say B is correct, since SARSA uses its target policy both to choose the action and to "look" at its follow-up state (by taking the action according to the target policy).
Hope more people comment so the algorithm boosts your channel!
@mumbo2526 7 months ago

Amazing Video, thank you!
@moaaathkhalil 7 months ago

Well explained!
@zhezhe3351 3 months ago

Good video! There is a small typo on the summary page about on-policy.
@marcdelabarreraibardalet4754 1 month ago

Nice video, well explained. Question: why would I use one or the other? Are there advantages or disadvantages?
@alonsovalderramahickmann940 7 months ago

Very nice video man
@aamirbadershah887 8 months ago

Great video. I would like to point out a mistake at 13:59, where you talk about on-policy but the heading says "Off Policy". I think that needs correction. I would also love to see content on multi-agent reinforcement learning and Decision Transformers.
@kiranbade9481 3 months ago

well explained brother
@aitorgonzalezgonzalez9395 2 months ago

I think I found an error in the summary: you wrote "Off Policy RL Algorithms" twice. Apart from that, thanks so much for the video; it helped me a lot.
@hugeturnip3520 4 months ago

Thank you so much dude
@muralidhar40 14 days ago

QT-1: "Target policies" are supposed to learn from the experimental actions taken by "behavior policies" to set their Q-values right. If the "target policy" were set to "random" instead of "greedy learning", there would be no learning at all. Hence the answer should be the first option: the agent does not learn at all.
@broccoli322 8 months ago

Thanks for the video! ☺
@Enerdzizer 5 days ago

Do we really update the Q-value function at the exploration step in the SARSA method? It seems that we have to skip this update, since we take a random step while exploring.
@user-xv9qk3iz7b 5 months ago

:face-red-heart-shape: