Overview of Deep Reinforcement Learning Methods

Published 2022-01-21
This video gives an overview of methods for deep reinforcement learning, including deep Q-learning, actor-critic methods, deep policy networks, and policy gradient optimization algorithms.
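
A minimal sketch of the deep Q-learning update mentioned above, assuming a small PyTorch network and a single made-up transition (none of this is code from the lecture or the book; the network sizes, toy values, and hyperparameters are all assumptions): a network approximating Q(s, a) is regressed toward the temporal-difference target r + gamma * max_a' Q(s', a').

```python
# A minimal sketch of a single deep Q-learning update, assuming a small
# PyTorch network and one made-up transition (s, a, r, s'); in practice the
# transitions would come from a replay buffer and a target network would be
# used for stability.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One illustrative transition (all values are assumptions).
s  = torch.randn(1, state_dim)   # current state
a  = torch.tensor([0])           # action taken
r  = torch.tensor([1.0])         # reward received
s2 = torch.randn(1, state_dim)   # next state

# Temporal-difference target: r + gamma * max_a' Q(s', a').
with torch.no_grad():
    target = r + gamma * q_net(s2).max(dim=1).values

# Regress Q(s, a) toward the TD target.
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```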

Citable link for this video: doi.org/10.52843/cassyni.kfnzpy

This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz.

Book Website: databookuw.com/
Book PDF: databookuw.com/databook.pdf

Amazon: www.amazon.com/Data-Driven-Science-Engineering-Lea…

Brunton Website: eigensteve.com

This video was produced at the University of Washington

All Comments (21)
  • @BoltzmannVoid
    This is literally the best series for understanding RL ever. Thank you so much, professor, for sharing this.
  • @matiascova
    At 10:05, my understanding is that the fact that we do not differentiate that probability comes from a local approximation assumption. So that formula is only approximately true for changes that are not too big. This simplification is one of the most important parts of the policy gradient theorem, and it informs the design of "soft" policy-gradient algorithms, in which we do not allow the policy to change too much, since our update logic only works for small steps (see the sketch after the comments for how the sampled estimator avoids differentiating the state distribution).
  • @metluplast
    Thanks professor Steve. Once I hear "welcome back" I just know it's our original professor Steve 😀👍
  • @MaximYudayev
    At 10:20, I think in this example the state probability density function is assumed stationary for an ergodic environment, even in the case of a dynamic policy. So perhaps this assumption implies a static reward function from the given environment, which would not be the case in a dynamic environment like a medical patient, whose bodily response to a drug would vary throughout their lifetime/treatment. I checked: Sutton and Barto indeed mention ergodicity of the environment as the reason for a policy-independent mu in their book, on pp. 326 and 333.
  • @gbbhkk
    Excellent video, basically saved my day in trying to wrap my head around all the terms and algo :D The concepts have been presented with unmatched clarity and conciseness. Have been waiting for this since your last video on "Q-Learning". Thank you so much!
  • @dmochow
    This is a fantastic tutorial. Thanks for putting in the time and effort to make it so digestible
  • @mawkuri5496
    I hope you'll create a series where all of the equations in this series are applied in PyTorch to build simple projects; that would be awesome.
  • @BlueOwnzU96
    Thank you professor. This has been great to dust off some RL concepts I had forgotten
  • @kefre10
    great series! thank you so much!
  • Steve, I follow all of your lectures. As a mechanical engineer, I was really amazed by your turbulence lectures. I have personally worked with CFD, doing visualization and computation in scientific Python, and have published a couple of research articles. I'm very eager to work under your guidance in CFD and fluid dynamics using machine learning, specifically simulating and modeling turbulent flow fields, and to explore the mysterious world of turbulence. How should I reach you for further communication?
  • @OmerBoehm
    Thank you so much for another outstanding video
  • @tarik23boss
    Thanks for this video, it was very helpful! Do you have any material on adaptive critic designs? That is a very well-cited paper, and I am wondering how it all plays into actor-critic models.
  • @joel.r.h
    Excuse me, professor, I am not sure about this specific case: if we have a DRL architecture that interacts with an ad-hoc model we have built (which presents a given structure as a Markov decision process), but the DRL agent does not have any prior information on the mechanics of that model (it can just measure outputs and generate inputs), would this be considered model-free? Thank you for your amazing work!
  • At 10:20, I think it's because we usually use PG in models with infinite state-action pairs; in other words, mu(s) is untrackable. It's something like the latent space of an autoencoder, where we can't really track it to generate data.
  • @wkafa87
    @Eigensteve Amazing video lectures. I have watched several of your series. Please, if possible, make a series about Deep MPC; it would be of great value.
  • @ryanmckenna2047
    At 10:33: Steve, maybe mu sub theta is just a vector of constants for the means associated with the asymptotic distribution of each state s, used to scale the sum of weighted probabilities across all actions for that state in relation to each state's asymptotic distribution?
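
Several of the comments above (at 10:05, 10:20, and 10:33) ask why the state distribution mu(s) is not differentiated in the policy gradient theorem. The minimal REINFORCE-style sketch below, assuming a made-up two-state toy MDP in PyTorch (not code from the lecture), shows the practical reason: states are visited by simply following the current policy, so the visitation distribution enters only through sampling and only log pi is differentiated.

```python
# A minimal REINFORCE-style sketch, assuming a made-up 2-state, 2-action toy
# MDP; it is meant only to show why the state distribution mu(s) never needs
# to be known or differentiated when estimating the policy gradient.
import torch

n_states, n_actions = 2, 2
logits = torch.zeros(n_states, n_actions, requires_grad=True)  # policy parameters theta

def sample_episode(T=20):
    """Roll out the current policy in the toy MDP; return states, actions, rewards."""
    s, states, actions, rewards = 0, [], [], []
    for _ in range(T):
        dist = torch.distributions.Categorical(logits=logits[s])
        a = dist.sample()
        r = 1.0 if (s == 1 and a.item() == 0) else 0.0   # toy reward
        states.append(s); actions.append(a); rewards.append(r)
        s = a.item()                                      # toy dynamics: action picks next state
    return states, actions, rewards

states, actions, rewards = sample_episode()
returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))])  # rewards-to-go G_t

# Policy-gradient estimate: grad_theta J ~ sum_t G_t * grad_theta log pi(a_t | s_t).
# mu(s) never appears explicitly: visiting states by following the policy already
# samples from it, so only log pi is differentiated, not the state distribution.
log_probs = torch.stack([
    torch.distributions.Categorical(logits=logits[s]).log_prob(a)
    for s, a in zip(states, actions)
])
loss = -(log_probs * returns).sum()
loss.backward()
print(logits.grad)   # one stochastic estimate of -grad_theta J
```

Because this estimate is only valid locally in policy space, practical algorithms keep each update small (for example by clipping or constraining the policy change), which is the point raised in the 10:05 comment about "soft" policy-gradient methods.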