How AI Learned to Feel | 75 Years of Reinforcement Learning
Published 2024-07-31
Thanks to Jane Street for sponsoring this video. They are hiring people interested in ML! To learn more about their work and open roles (and support me), visit their website: www.janestreet.com/machine-learning/?utm_source=yt…
Along the way, we'll encounter the challenges of transferring simulated skills to the real world (domain randomization) and witness the emergence of eerily human-like behaviors in AI agents. It leaves us with a provocative question: where is the line between actions and words? What is the role of a GPT for actions?
Featuring insights from:
Claude Shannon
Arthur Samuel
Gerald Tesauro
Richard Sutton
David Silver
DeepMind/OpenAI, etc.
00:00 - Introduction
00:32 - Learning Tic Tac Toe
02:00 - Learning Cart and pole
04:20 - Shannon & Chess
06:50 - Samuel's Checkers
09:25 - TD Gammon (Gerald Tesauro)
11:00 - TD Learning
14:30 - Learning Atari (DQN)
17:28 - Direct Policy Gradient
19:40 - Domain Randomization
All Comments (21)
-
the way you introduce the REAL AI to the world, Nice job
-
Seems like reinforcement learning's been on a wild trip since forever, but the way Brit breaks it down? It's like he's got a secret map of the RL universe. He makes the crazy journey from old-school ideas to today's stuff actually make sense. It's like watching history unfold, but you know, without falling asleep!
-
I hope you enjoy this video, please let me know what you think below. 👇 STAY TUNED & SUBSCRIBE: Next video on Road to AGI + new topics. Please LIKE/SHARE in your network to help AOP grow. FULL AI series: youtube.com/playlist?list=PLbg3ZX2pWlgKV8K6bFJr5dh… Thanks Jane Street for sponsoring. They are hiring people interested in ML: www.janestreet.com/machine-learning/?utm_source=yt… SUPPORT AOP: www.patreon.com/artoftheproblem
-
Hands down the best AI history channel in the world
-
I reinforced my positive behavior in watching this video with plenty of ice cream.
-
Another great video. It's super interesting to see that DeepMind is attempting to figure out how much real-world learning vs. simulated learning is optimal while LLM researchers are simultaneously asking questions about the use of "synthetic data". Naively (if the "synthetic data" approach proves successful at scale), it seems to vaguely point towards a further generalization in the machine learning field. I think a great follow-up video to this one would be about multimodal models, and maybe at the end discuss the idea of synthesizing this robotic action model with something like ChatGPT. Or maybe not, just spitballing. EDIT: just read your pinned comment, seems like you're already a few steps ahead of me on this, not surprised
-
The music that starts @ 25:40 provides a nice transition and nicely conveys the future potential of the technology.
-
Seeing this video at 466 views currently and shocked it doesn’t have hundreds of thousands if not millions. Awesome video
-
Omg it took so much to make machines to this level. Their patience and big brain 😮
-
I really liked the historical perspective on how RL started. It helps stair-step my way up to modern day concepts :)
-
Thank you for the credit at the end. You compressed the data well and thus, the info regarding the value function was more easily understood, in my opinion.
-
Another amazingly lucid video. Thank you! By the end it feels like we're just getting started.
-
Really great video! Awesome summary of the history of RL. Very clear. Nice job.
-
All your videos are excellent. Congratulations.
-
great video as always!
-
lucky me for this video today!
-
I love this channel so much I only wish you made videos faster but it's always such engaging content I can see why it takes a while
-
Awesome stuff!! I just love the way you explain things 🙏💕 I feel like I'm closer than ever to actually understanding AI 😅😅
-
Brilliant work!
-
You have a magical ability to explain with such eloquence and clarity that you make me feel intelligent. All that lead up to the moment (and also from your previous videos) when you explain Domain randomization 19:45 “you actually need less precise simulation” that realization felt like an explosion in my mind. Thanks for your channel man