Overview of Deep Reinforcement Learning Methods

Published 2022-01-21
This video gives an overview of methods for deep reinforcement learning, including deep Q-learning, actor-critic methods, deep policy networks, and policy gradient optimization algorithms.
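
A minimal sketch of the deep Q-learning update mentioned above, assuming a small PyTorch network and a single made-up transition (none of this is code from the lecture or the book; the network sizes, toy values, and hyperparameters are all assumptions): a network approximating Q(s, a) is regressed toward the temporal-difference target r + gamma * max_a' Q(s', a').

```python
# A minimal sketch of a single deep Q-learning update, assuming a small
# PyTorch network and one made-up transition (s, a, r, s'); in practice the
# transitions would come from a replay buffer and a target network would be
# used for stability.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One illustrative transition (all values are assumptions).
s  = torch.randn(1, state_dim)   # current state
a  = torch.tensor([0])           # action taken
r  = torch.tensor([1.0])         # reward received
s2 = torch.randn(1, state_dim)   # next state

# Temporal-difference target: r + gamma * max_a' Q(s', a').
with torch.no_grad():
    target = r + gamma * q_net(s2).max(dim=1).values

# Regress Q(s, a) toward the TD target.
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```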

Citable link for this video: doi.org/10.52843/cassyni.kfnzpy

This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz.

Book Website: databookuw.com/
Book PDF: databookuw.com/databook.pdf

Amazon: www.amazon.com/Data-Driven-Science-Engineering-Lea…

Brunton Website: eigensteve.com

This video was produced at the University of Washington

All Comments (21)
  • @BoltzmannVoid
    This is literally the best series for understanding RL ever. Thank you so much, professor, for sharing this.
  • @matiascova
    At 10:05, my understanding is that the fact that we do not differentiate that probability comes from a local approximation assumption. So that formula is only approximately true for changes that are not too big. This simplification is one of the most important parts of the policy gradient theorem, and it informs the design of "soft" policy-gradient algorithms, in which we do not allow the policy to change too much, since our update logic only works for small steps (see the sketch after the comments for how the sampled estimator avoids differentiating the state distribution).
  • @metluplast
    Thanks professor Steve. Once I hear "welcome back" I just know it's our original professor Steve 😀👍
  • @MaximYudayev
    At 10:20, I think in this example the state probability density function is assumed stationary for an ergodic environment, even in the case of a dynamic policy. So perhaps this assumption implies a static reward function from the given environment, which would not be the case in a dynamic environment like a medical patient, whose bodily response to a drug would vary throughout their lifetime/treatment. I checked: Sutton and Barto indeed mention ergodicity of the environment as the reason for a policy-independent mu in their book, on pp. 326 and 333.
  • @gbbhkk
    Excellent video, basically saved my day in trying to wrap my head around all the terms and algo :D The concepts have been presented with unmatched clarity and conciseness. Have been waiting for this since your last video on "Q-Learning". Thank you so much!
  • @dmochow
    This is a fantastic tutorial. Thanks for putting in the time and effort to make it so digestible
  • @mawkuri5496
    I hope you'll create a series where all of the equations in this series are applied in PyTorch to build simple projects; that would be awesome.
  • @BlueOwnzU96
    Thank you professor. This has been great to dust off some RL concepts I had forgotten
  • @kefre10
    great series! thank you so much!
  • Steve, I follow all of your lectures. As a mechanical engineer, I was really amazed by your turbulence lectures. I have personally worked with CFD, doing visualization and computation in scientific Python, and have published a couple of research articles. I'm very eager to work under your guidance in CFD and fluid dynamics using machine learning, specifically simulating and modeling turbulent flow fields, and to explore the mysterious world of turbulence. How should I reach you for further communication?
  • @OmerBoehm
    Thank you so much for another outstanding video
  • @tarik23boss
    Thanks for this video, it was very helpful! Do you have any material on adaptive critic designs? That is a very well-cited paper, and I am wondering how it all plays into actor-critic models.
  • @joel.r.h
    Excuse me, professor, I am not sure about this specific case: if we have a DRL architecture that interacts with an ad-hoc model we have built (which presents a given structure as a Markov decision process), but the DRL agent does not have any prior information on the mechanics of that model (it can just measure outputs and generate inputs), would this be considered model-free? Thank you for your amazing work!
  • At 10:20, I think it's because we usually use PG in models with infinite state-action pairs; in other words, mu(s) is untrackable. It's something like the latent space of an autoencoder, where we can't really track it to generate data.
  • @wkafa87
    @Eigensteve Amazing video lectures. I have watched several of your series. Please, if possible, make a series about Deep MPC; it would be of great value.
  • @ryanmckenna2047
    At 10:33: Steve, maybe mu sub theta is just a vector of constants for the means associated with the asymptotic distribution of each state s, used to scale the sum of weighted probabilities across all actions for that state in relation to each state's asymptotic distribution?
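
Several of the comments above (at 10:05, 10:20, and 10:33) ask why the state distribution mu(s) is not differentiated in the policy gradient theorem. The minimal REINFORCE-style sketch below, assuming a made-up two-state toy MDP in PyTorch (not code from the lecture), shows the practical reason: states are visited by simply following the current policy, so the visitation distribution enters only through sampling and only log pi is differentiated.

```python
# A minimal REINFORCE-style sketch, assuming a made-up 2-state, 2-action toy
# MDP; it is meant only to show why the state distribution mu(s) never needs
# to be known or differentiated when estimating the policy gradient.
import torch

n_states, n_actions = 2, 2
logits = torch.zeros(n_states, n_actions, requires_grad=True)  # policy parameters theta

def sample_episode(T=20):
    """Roll out the current policy in the toy MDP; return states, actions, rewards."""
    s, states, actions, rewards = 0, [], [], []
    for _ in range(T):
        dist = torch.distributions.Categorical(logits=logits[s])
        a = dist.sample()
        r = 1.0 if (s == 1 and a.item() == 0) else 0.0   # toy reward
        states.append(s); actions.append(a); rewards.append(r)
        s = a.item()                                      # toy dynamics: action picks next state
    return states, actions, rewards

states, actions, rewards = sample_episode()
returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))])  # rewards-to-go G_t

# Policy-gradient estimate: grad_theta J ~ sum_t G_t * grad_theta log pi(a_t | s_t).
# mu(s) never appears explicitly: visiting states by following the policy already
# samples from it, so only log pi is differentiated, not the state distribution.
log_probs = torch.stack([
    torch.distributions.Categorical(logits=logits[s]).log_prob(a)
    for s, a in zip(states, actions)
])
loss = -(log_probs * returns).sum()
loss.backward()
print(logits.grad)   # one stochastic estimate of -grad_theta J
```

Because this estimate is only valid locally in policy space, practical algorithms keep each update small (for example by clipping or constraining the policy change), which is the point raised in the 10:05 comment about "soft" policy-gradient methods.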