How AI Learned to Feel | 75 Years of Reinforcement Learning

Published 2024-07-31
I follow the history of model-free RL, from learning tic-tac-toe, checkers, and backgammon to physical problems such as cart and pole, walking, and grasping (OpenAI's dexterous robotic hand). I explain value functions, Q-functions, and policy functions and how they work together, including how TD learning was used.

Thanks to Jane Street for sponsoring this video. They are hiring people interested in ML! To learn more about their work and open roles (and support me), visit their website: www.janestreet.com/machine-learning/?utm_source=yt…

Along the way, we'll encounter the challenges of transferring simulated skills to the real world (domain randomization) and witness the emergence of eerily human-like behaviors in AI agents. It leaves us with a provocative question: where is the line between actions and words? What is the role of a GPT for actions?
Featuring insights from:
Claude Shannon
Arthur Samuel
Gerald Tesauro
Richard Sutton
David Silver
DeepMind/OpenAI etc.

00:00 - Introduction
00:32 - Learning Tic Tac Toe
02:00 - Learning Cart and pole
04:20 - Shannon & Chess
06:50 - Samuel's Checkers
09:25 - TD-Gammon (Gerald Tesauro)
11:00 - TD Learning
14:30 - Learning Atari (DQN)
17:28 - Direct Policy Gradient
19:40 - Domain Randomization

All Comments (21)
  • @ncolmt
    the way you introduce the REAL AI to the world, Nice job
  • @belibem
    Seems like reinforcement learning's been on a wild trip since forever, but the way Brit breaks it down? It's like he's got a secret map of the RL universe. He makes the crazy journey from old-school ideas to today's stuff actually make sense. It's like watching history unfold, but you know, without falling asleep!
  • I reinforced my positive behavior in watching this video with plenty of ice cream.
  • Another great video. It's super interesting to see that DeepMind is attempting to figure out how much real-world learning vs. simulated learning is optimal while LLM researchers are simultaneously asking questions about the use of "synthetic data". Naively (if the "synthetic data" approach proves successful at scale), it seems to vaguely point towards a further generalization in the machine learning field. I think a great follow-up video to this one would be about multimodal models, and maybe at the end discuss the idea of synthesizing this robotic action model with something like ChatGPT, or maybe not, just spitballing. EDIT: just read your pinned comment, seems like you're already a few steps ahead of me on this, not surprised
  • @timl2k11
    The music that starts @ 25:40 provides a nice transition and nicely conveys the future potential of the technology.
  • @TheLoneCone
    Seeing this video at 466 views currently and shocked it doesn’t have hundreds of thousands if not millions. Awesome video
  • @Ayel-wl4ix
    Omg it took so much to make machines to this level. Their patience and big brain 😮
  • I really liked the historical perspective on how RL started. It helps stair-step my way up to modern day concepts :)
  • @AdamJeffries-r4f
    Thank you for the credit at the end. You compressed the data well and thus, the info regarding the value function was more easily understood, in my opinion.
  • Another amazingly lucid video. Thank you! By the end it feels like we're just getting started.
  • @jimlbeaver
    Really great video! Awesome summary of the history of RL.… Very clear. Nice job.
  • @Dr.Menendez
    All your videos are excellent. Congratulations.
  • @77batering
    I love this channel so much I only wish you made videos faster but it's always such engaging content I can see why it takes a while
  • Awesome stuff!! I just love the way you explain things 🙏💕 I feel like I'm closer than ever to actually understanding AI 😅😅
  • @kingdodongo4126
    You have a magical ability to explain with such eloquence and clarity that you make me feel intelligent. All that lead up to the moment (and also from your previous videos) when you explain Domain randomization 19:45 “you actually need less precise simulation” that realization felt like an explosion in my mind. Thanks for your channel man