This is why Deep Learning is really weird.

Published 2023-12-26
In this comprehensive exploration of deep learning with Professor Simon Prince, who has just authored a textbook on the subject, we investigate the technical underpinnings of the field's unexpected success and confront the enduring conundrums that still perplex AI researchers.

Understanding Deep Learning - Prof. SIMON PRINCE [STAFF FAVOURITE]

Watch behind the scenes, get early access and join private Discord by supporting us on Patreon:
patreon.com/mlst
discord.gg/aNPkGUQtc5
twitter.com/MLStreetTalk

Key points discussed include the surprising efficiency of deep learning models, whose high-dimensional loss functions are optimized in ways that defy traditional statistical expectations. Professor Prince gives an exposition of the choice of activation functions, architecture design considerations, and overparameterization. We scrutinize the generalization capabilities of neural networks, addressing the seeming paradox of overparameterized models that nonetheless perform well. Professor Prince challenges popular misconceptions, shedding light on the manifold hypothesis and the role of data geometry in informing the training process.

He also describes how layers within neural networks collaborate, recursively reconfiguring instance representations in ways that contribute both to the stability of learning and to the emergence of hierarchical feature representations. Alongside the primary discussion of technical elements and learning dynamics, the conversation turns briefly to the ethical implications of AI advancements.
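As a concrete illustration of two of these themes, overparameterization and the piecewise-linear ("spline") view of ReLU networks discussed in the interview, here is a minimal NumPy sketch. The toy data, network width, and hyperparameters are invented for illustration and are not taken from the book or the episode: a one-hidden-layer ReLU network with far more parameters than training points is fit by plain gradient descent, and the learned function is continuous piecewise-linear rather than wildly oscillating.

```python
# Minimal sketch (assumptions: toy data, width, and learning rate are made up).
# A one-hidden-layer ReLU network is a continuous piecewise-linear function,
# so extra width adds spline "knots" rather than high-frequency wiggle.
import numpy as np

rng = np.random.default_rng(0)

# 8 training points; the network below has ~3*64 parameters: overparameterized.
x_train = np.linspace(-1, 1, 8)
y_train = np.sin(3 * x_train)

width = 64
W1 = rng.normal(size=(width, 1))        # input -> hidden weights
b1 = rng.normal(size=width)             # hidden biases
W2 = rng.normal(size=(1, width)) * 0.1  # hidden -> output weights
b2 = np.zeros(1)

def forward(x):
    h = np.maximum(0.0, x[:, None] * W1.T + b1)  # ReLU: piecewise linear in x
    return (h @ W2.T + b2).ravel()

# Plain full-batch gradient descent on mean squared error, backprop by hand.
lr = 1e-2
for step in range(20000):
    h = np.maximum(0.0, x_train[:, None] * W1.T + b1)
    pred = (h @ W2.T + b2).ravel()
    dpred = 2 * (pred - y_train) / len(x_train)      # dLoss/dpred
    dW2 = dpred @ h                                  # gradient w.r.t. W2
    db2 = dpred.sum(keepdims=True)
    dh = np.outer(dpred, W2.ravel()) * (h > 0)       # ReLU gate in backward pass
    dW1 = (dh * x_train[:, None]).sum(axis=0)[:, None]
    db1 = dh.sum(axis=0)
    W2 -= lr * dW2[None, :]; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("train MSE:", np.mean((forward(x_train) - y_train) ** 2))
# Despite many more parameters than data points, the fitted function between
# training points stays piecewise linear and typically unremarkable.
```

The bookkeeping of linear regions (at most one "knot" per hidden unit within the input range) is what Balestriero's spline-theory paper, linked in the references below, makes precise.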

Pod version (with no music or sound effects): podcasters.spotify.com/pod/show/machinelearningstr…

Follow Prof. Prince:
twitter.com/SimonPrinceAI
www.linkedin.com/in/simon-prince-615bb9165/

Get the book now!
mitpress.mit.edu/9780262048644/understanding-deep-…
udlbook.github.io/udlbook/

Panel: Dr. Tim Scarfe -
www.linkedin.com/in/ecsquizor/
twitter.com/ecsquendor

TOC:
[00:00:00] Introduction
[00:11:03] General Book Discussion
[00:15:30] The Neural Metaphor
[00:17:56] Back to Book Discussion
[00:18:33] Emergence and the Mind
[00:29:10] Computation in Transformers
[00:31:12] Studio Interview with Prof. Simon Prince
[00:31:46] Why Deep Neural Networks Work: Spline Theory
[00:40:29] Overparameterization in Deep Learning
[00:43:42] Inductive Priors and the Manifold Hypothesis
[00:49:31] Universal Function Approximation and Deep Networks
[00:59:25] Training vs Inference: Model Bias
[01:03:43] Model Generalization Challenges
[01:11:47] Purple Segment: Unknown Topic
[01:12:45] Visualizations in Deep Learning
[01:18:03] Deep Learning Theories Overview
[01:24:29] Tricks in Neural Networks
[01:30:37] Critiques of ChatGPT
[01:42:45] Ethical Considerations in AI

References:

#61: Prof. YANN LECUN: Interpolation, Extrapolation and Linearisation (w/ Dr. Randall Balestriero)

Scaling down Deep Learning [Sam Greydanus]
arxiv.org/abs/2011.14439

"Broken Code" a book about Facebook's internal engineering and algorithmic governance [Jeff Horwitz]
www.penguinrandomhouse.com/books/712678/broken-cod…

Literature on neural tangent kernels as a lens into the training dynamics of neural networks.
en.wikipedia.org/wiki/Neural_tangent_kernel

Zhang, C. et al. "Understanding deep learning requires rethinking generalization." ICLR, 2017.
arxiv.org/abs/1611.03530

Computer Vision: Models, Learning, and Inference, by Simon J.D. Prince
www.amazon.co.uk/Computer-Vision-Models-Learning-I…

Deep Learning Book, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
www.deeplearningbook.org/

Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
arxiv.org/abs/2210.00881

Computer Vision: Algorithms and Applications, 2nd ed. [Szeliski]
szeliski.org/Book/

A Spline Theory of Deep Networks [Randall Balestriero]
proceedings.mlr.press/v80/balestriero18b/balestrie…

Deep Neural Networks as Gaussian Processes [Jaehoon Lee]
arxiv.org/abs/1711.00165

Do Transformer Modifications Transfer Across Implementations and Applications? [Narang]
arxiv.org/abs/2102.11972

ConvNets Match Vision Transformers at Scale [Smith]
arxiv.org/abs/2310.16764

Dr Travis LaCroix (Wrote Ethics chapter with Simon)
travislacroix.github.io/

All Comments (21)
  • @samgreydanus6148
    I'm the author of the MNIST-1D dataset (discussed at 1h15). Thanks for the positive words! You do an excellent job of explaining what the dataset is and why it's useful. Running exercises in Colab while working through the textbook is an amazing feature.
  • @Friemelkubus
    Currently going through it and it's one of the best textbooks I've read. Period. Not just DL books, books. I love it.
  • @oncedidactic
    Brilliant, clear, direct conversation. Thank you!!
  • @chazzman4553
    The best channel I've seen for AI. Cutting edge, no amateur overhyped BS. Down to earth.
  • @aprohith1
    What a gripping conversation. Thank you!
  • @amesoeurs
    I've read most of this book already and it's fantastic. It feels like a spiritual sequel to Goodfellow's original DL book.
  • @dariopotenza3962
    Simon taught the first semester of my second-year "Machine Learning" module at university! Really nice man; we used this book as the module notes. He was very missed when he left in the second semester, and the rest of the module was never able to live up to his teaching.
  • @makhalid1999
    Love the studio, would love to see more face-to-face podcasts here
  • @chodnejabko3553
    The overparametrization conundrum may be related to the fact that we look at what NNs are in the wrong way. To me, an NN is not a "processor" type of object; it's a novel type of memory object, one which stores and retrieves data by overlaying them on top of one another, while also recording the hidden relations that exist in the data set. This is what gets stored in the "in between" places even when the input resolution is low: the logic of coexistence of different images (arrays), which is something not visible on the surface.

    I'm a philologist by training, and in twentieth-century literature there was a big buzz around the concept of the "palimpsest". Originally, palimpsests were texts written on reused parchment from which previous texts had been scraped off with a razor. Despite the scraping, the old text still remained under the new one, leaving two texts in the same space on the page. In literature this became a conceptual fashion of merging two different narratives into one, usually to very surreal effect; one author who comes to mind is William S. Burroughs. In the same way that merged narratives evoke a novel situation through novel logical interactions between the inputs, the empty space in an overparametrized NN gets filled with the logic of the world from which the input data comes, and this logic exists between the inputs even when the resolution is low. Maybe an NN is a Platonic space. Many images of trees somehow hold in them the "logic of the tree", something deeper and non-obvious to the eye, since in their form converge the principles of molecular biology, atmospheric fluid dynamics, and ecosystemic interactions, up to the astronomical effects of the sun, the moon, and the earth's rotation. All of it contributes to this form in one way or another, so the form reflects those contributions and therefore holds partial logic of those interactions within it.

    Information is a relation between an object and its context (in linguistics we say: its dictionary). A dataset not only introduces objects; as a whole, it also becomes a context (dictionary) through which each object is read. In that sense, maybe upscaling input data sets prior to learning is detrimental to the "truth" of those relations. I would be inclined to assume we'd be better off letting the NN fill in those spaces based on the logic of the dataset, unless we specifically want the logic of the transformations to influence the output data (say, when we are designing an upscaling engine).
  • Cool thoughts on digging into the nitty-gritty of deep learning frameworks. The connection between language models and our brains, especially in Transformers, really makes you think. Checking out how things stay consistent inside and finding ways to boost brainpower raises some interesting questions. Looking forward to diving deeper into these fancy concepts!
  • @beagle989
    great conversation, appreciate the skepticism
  • @stevengill1736
    Sounds like there are as many questions as answers at this point - looks like a great book with plenty of instructive graphics - look forward to reading it... cheers & happy Gnu year!
  • @timhaldane7588
    I really appreciate being used as an example near the end of the discussion. Version 2.0 is coming along slowly, but I am confident I'll get there.
  • @mattsigl1426
    It’s interesting that in Integrated Information Theory consciousness literally is a super-high dimensional polytope (with every dimension corresponding to a whole system state in an integrated network) in an abstract space called Qualia space.
  • @AliMoeeny
    Tim, this is incredibly insightful. Thank you
  • @jasonabc
    Best source and community for ml on the internet by far. Love the work you guys do mlst
  • @truehighs7845
    The way I understand deep learning is that it is statistical power modulated by randomness to emulate reasoned speech: really the top 3-5-7 reasonable continuations, selected randomly at every word. So in theory, whatever the AI says, it should not be able to say it twice unless you tweak its parameters; with the randomness (temperature) taken away, it will always repeat the same thing. It's good at emulating speech that gives the resemblance of intelligent articulation, but it is really the syntax and vocabulary (data) placed in a statistically congruent manner that creates that illusion. It's like a super sales guy: he talks very well, but there is no substance behind his apparent passion. [See the sampling sketch after the comments.]
  • @DailyFrankPeter
    The sombre Chopin tones in the background emphasize how deep the learning truly is but leave me with little hope of ever fully understanding it... :D
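Prompted by @truehighs7845's comment above, here is a minimal sketch of top-k sampling with temperature; the vocabulary and logits are invented for illustration and are not taken from any real model. It shows why greedy decoding (temperature 0) repeats itself exactly, while sampling at a higher temperature varies from run to run.

```python
# Minimal sketch of top-k sampling with temperature (illustrative only:
# the vocabulary and logits below are made up, not from a real model).
import numpy as np

rng = np.random.default_rng()

vocab = ["the", "cat", "sat", "on", "a", "mat"]
logits = np.array([2.0, 1.5, 1.2, 0.3, 0.1, -0.5])  # pretend model outputs

def sample_next(logits, temperature=1.0, k=3):
    if temperature == 0.0:
        return int(np.argmax(logits))      # greedy: always the same token
    top = np.argsort(logits)[-k:]          # keep only the top-k candidates
    scaled = logits[top] / temperature     # temperature reshapes the odds
    probs = np.exp(scaled - scaled.max())  # softmax over the top-k only
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

print([vocab[sample_next(logits, temperature=0.0)] for _ in range(5)])  # deterministic
print([vocab[sample_next(logits, temperature=1.0)] for _ in range(5)])  # varies per run
```

The first line prints the same token five times on every run; the second changes between runs because the top-k candidates are drawn in proportion to their temperature-scaled probabilities.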