Yann LeCun, Chief AI Scientist at Meta AI: From Machine Learning to Autonomous Intelligence

Published 2023-05-25
One of the 'Godfathers of AI' and Chief AI Scientist at Meta AI (FAIR), Yann LeCun joined us for a special live talk and fireside chat with our Executive Director, Usama Fayyad, at Northeastern University's gorgeous ISEC Auditorium. Watch Yann's complete talk, "From Machine Learning to Autonomous Intelligence," and the fireside chat, where the two AI pioneers cut through the #chatGPT and #generativeAI chaos and answered audience questions.

#AIevent #AIexpert #YannLecun #AI #MetaAI #artificialintelligence #machinelearning #ML #LLM #AIBoston #Boston

All Comments (21)
  • @mbrochh82
    here's a ChatGPT summary:
    - Welcome to the last Distinguished Lecture of the academic year for the Institute for Experiential AI
    - Introducing Yann LeCun: VP and Chief AI Scientist at Meta, Silver Professor at NYU, and recipient of the 2018 ACM Turing Award
    - Overview of current AI systems: specialized and brittle; they don't reason and plan, don't learn new tasks quickly, don't understand how the world works, and don't have common sense
    - Self-supervised learning: train a system to model its input, chop off the last few layers of the neural net, and use the internal representation as input to a downstream task
    - Generative AI systems: autoregressive prediction, trained on 1-2 trillion tokens, produces amazing performance but makes factual errors, logical errors, and inconsistencies
    - LLMs are not good at reasoning, planning, or arithmetic, and we are easily fooled into thinking they are intelligent
    - Autoregressive LLMs have a short shelf life and will be replaced by better systems within the next 5 years
    - Humans and animals learn quickly because they accumulate an enormous amount of background knowledge about how the world works through observation
    - AI research needs to focus on learning representations of the world, predictive models of the world, and self-supervised learning
    - AI systems need to be able to perceive, reason, predict, and plan complex action sequences
    - Hierarchical planning is needed for complex actions, as the representations at every level are not known in advance
    - Predetermined vision systems are unable to learn hierarchical representations for action plans
    - AI systems are difficult to control and can be toxic, but a system designed to act by minimizing a set of objectives, including guardrails, can be made safe
    - To predict video, a joint embedding architecture is needed, which replaces the generative model
    - Energy-based models capture the dependency between two sets of variables; two classes of methods are used to train them: contrastive and regularized (a regularized objective is sketched after this summary)
    - Regularized methods attempt to maximize the information content of the representations and minimize the prediction error
    - I-JEPA is a new method for learning features for images without having to do data augmentation
    - It works by running an image through two encoders, one given the full image and one given a partially masked image (also sketched after this summary)
    - A predictor is then trained to predict the feature representation of the full image from the representation obtained from the partial image
    - Such joint embedding architectures are used to build world models, which can predict what will happen next in the world given an observation of the state of the world
    - Self-supervised learning is the key to this, and uncertainty can be handled with an energy-based-model method
    - LLMs cannot currently say "I don't know the answer to this question" as opposed to attempting to guess the right answer
    - Data curation and human intervention through relevance feedback are critical aspects of LLMs that are not talked about often
    - The trend has been "bigger is better", but in the last few months smaller systems have been performing as well as larger ones
    - The model proposed is an architecture where the task is specified by the objective function, which may include a representation of the prompt
    - The inference procedure that produces the output is separated from the world model and the task itself
    - Smaller networks can be used for the same performance
    - The AI and ML community should pivot to open-source models to create a vibrant ecosystem
    - The biggest gaps in education for AI graduates are in mathematics and physics
    - Open-source models should be used to prevent control of knowledge and data by a few companies
    - LLMs are doomed, and understanding them is likely to be hopeless
    - Self-supervised learning is still supervised learning, but with particular architectures
    - Reinforcement learning is needed in certain situations
    - Yann discussed amortized inference: training a system to approximate the solution to an optimization problem from the specification of the problem
    - Yann believes that most good ideas still come from academia, and that universities should focus on coming up with new ideas rather than beating records on translation
    - Yann believes that AI will have a positive impact on humanity, and that it is important to have countermeasures in place to prevent its misuse
    - Yann believes that AI should be open and widely accessible to everyone
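The contrastive-versus-regularized distinction in the summary can be made concrete. Below is a minimal Python sketch of a regularized joint-embedding objective in the style of VICReg, one of the regularized methods associated with LeCun's group: it maximizes the information content of the representations (variance and covariance terms) while minimizing the prediction error (invariance term). The loss weights and dimensions are assumptions, not values from the talk.

    import torch

    def vicreg_style_loss(z1: torch.Tensor, z2: torch.Tensor,
                          inv_w=25.0, var_w=25.0, cov_w=1.0) -> torch.Tensor:
        """Regularized joint-embedding loss over two embedding batches of shape (N, D)."""
        n, d = z1.shape
        invariance = ((z1 - z2) ** 2).mean()            # minimize prediction error

        def var_term(z):                                # keep each dimension's std near 1
            std = z.var(dim=0).add(1e-4).sqrt()
            return torch.relu(1.0 - std).mean()

        def cov_term(z):                                # decorrelate the dimensions
            zc = z - z.mean(dim=0)
            cov = (zc.T @ zc) / (n - 1)
            off_diag = cov - torch.diag(torch.diag(cov))
            return (off_diag ** 2).sum() / d

        return (inv_w * invariance
                + var_w * (var_term(z1) + var_term(z2))
                + cov_w * (cov_term(z1) + cov_term(z2)))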
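The two-encoder recipe described in the summary is I-JEPA. Here is a minimal sketch under heavy simplifying assumptions: toy linear encoders and a random pixel mask, whereas the real model uses Vision Transformers, block-wise masking, and an exponential-moving-average target encoder.

    import torch
    import torch.nn as nn

    D = 128                                             # feature dimension (assumed)
    context_encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, D))
    target_encoder  = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, D))
    predictor       = nn.Linear(D, D)

    def ijepa_style_loss(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        """Predict the full image's features from a partially masked view."""
        with torch.no_grad():                           # no gradient through the target branch
            target = target_encoder(image)              # features of the full image
        context = context_encoder(image * mask)         # features of the masked view
        pred = predictor(context)                       # predict full-image features
        return ((pred - target) ** 2).mean()            # error in representation space, not pixels

    img = torch.randn(8, 1, 32, 32)                     # a batch of toy images
    msk = (torch.rand_like(img) > 0.5).float()          # random 50% mask
    ijepa_style_loss(img, msk).backward()

The design point the summary makes is that the prediction happens in representation space rather than pixel space, which is why no pixel-level generation or hand-crafted data augmentation is needed.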
  • Best talk on the realities of AI: it addresses modern architectures and their fundamental flaws and, more importantly, alternative architectures that quite clearly address those flaws. This is the MUST WATCH video for AI practitioners and the AI-curious.
  • @Achrononmaster
    @56:00 It is because the CNN approach is not the same as statistical curve fitting; it is more akin to Fourier decomposition (to take an overly simplistic analogy). The big difference from an FFT is that a specialized CNN is basically an encoding of a whole giant bundle of generalized Fourier or Daubechies decompositions. An AI task space is vast, so you cannot find a complete set of orthogonal states, but the CNN allows enough of a basis in some cases; those are the cases where NNs work. The statistical-algorithm aspect is the search to find a reasonable decomposition in the task space for a given request/prompt/whatever.
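To make the commenter's analogy tangible (it is only an analogy, not a claim about how CNNs are trained), here is a toy Python comparison of projecting a signal onto the fixed, orthogonal Fourier basis versus onto a random stand-in for a learned, non-orthogonal filter bank; all sizes are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)                # a toy 1-D signal

    fourier_coeffs = np.fft.rfft(x)            # projection onto the fixed Fourier basis
    filters = rng.standard_normal((16, 64))    # stand-in for 16 learned conv filters
    learned_coeffs = filters @ x               # projection onto the learned "basis"

    # The Fourier basis is complete and orthogonal; the filter bank is neither,
    # but training can adapt it to the task's structure, which is the analogy's point.
    print(fourier_coeffs.shape, learned_coeffs.shape)   # (33,) (16,)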
  • @emmanuelcassin1225
    The answer lies in the model of living systems; an important structure is still missing. I think ChatGPT is good because tokens have the right properties for modeling ideas or thoughts. What is missing is the right structure for modeling interaction with the environment, to evolve toward AGI. That is the big question in modeling: you need a mathematical object that has the right properties. In complex systems, it is the generating monoid of the topos that lets you untangle everything and then connect domains that seem very distant a priori.
  • It would be very helpful if there were a list of open problems in the machine learning space
  • @BH-BH
    I think that this guy makes the most sense about LLMs and their inability to plan, reason, or think.
  • @kinngrimm
    1:17:25 "at some point it will be too big for us to comprehend" Before that point is reached we should have figured out alignment, not having a blackbox system so we can actually see whats going on in there and a ton of societal changes that will have to be made for societies to be/stay stable.
  • @bipuldas2060
    They left the CNN ship and are now travelling in the transformer ship. We are now asking them to leave the transformer ship and board the SSL ship. Not gonna happen, imo: LLMs, and transformers in general (which were conspicuously absent from the talk), are finding their sweet spots in a very large space of business applications, where they tend to perform very well at solving some real (non-virtual) problems. It's like asking users to leave a search engine to be on a social network.
  • Open Source foundation models are the future of democracy and small business development
  • @rim3899
    With respect to LLMs, the evidence suggests that they do know a great deal about how the world works. For example, GPT-like models' weights can be/are trained on data that actually subsume all that is currently known about physical laws, chemistry, biology, etc., through countless papers, review articles, and textbooks at various levels of sophistication, from 1st-grade level through cutting-edge research. The fact that these are given as text (language) is not especially problematic, since it appears that the relevant written record is sufficient to explain and convey the current and past knowledge in these subjects. That multi-layer transformers learn context and correlations between (meaningful!) words and their associated concepts and relationships should not be underestimated. That the models "just produce" the next probable token isn't conceptually trivial either, if one considers that, for example, most of physics can be described through (partial) differential equations that can be integrated step by step, where the context/state-dependent coefficients of the equations (the trained weights of the network) ultimately come from the underlying theories those equations are solving. Processing the current state, with these coefficients in context, to predict and specify what happens next, one step at a time, is how these equations are in practice numerically integrated. So what we may have with the current LLMs are models that learn from language and words that actually do describe in excruciating detail what is known to man, and proceed to "auto-complete" in ways analogous to the best methods used to solve the currently known equations of science.
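The integration analogy in this comment is easy to make concrete. A minimal sketch of forward-Euler integration: each new state is produced from the current one, a step at a time, much as an autoregressive model produces each token from its context. The equation, coefficient, and step size are arbitrary assumptions.

    # Forward-Euler integration of dx/dt = -k * x: the "next state" is computed
    # from the current state plus a fixed rule, one step at a time.
    k, dt = 0.5, 0.1
    x = 1.0
    trajectory = [x]
    for _ in range(100):
        x = x + dt * (-k * x)      # predict the next state from the current one
        trajectory.append(x)
    print(trajectory[-1])          # ~0.006, close to the exact exp(-k * 10) ≈ 0.0067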
  • About the singularity: as Ray Kurzweil says, when the whole universe becomes a computer, what does it compute, even though the purpose for computing has already disappeared?
  • @Rizhiy13
    22:30 I believe that the human ability to interact with the world is fundamental to our learning speed. We automatically mine hard examples around us. People have an internal model of the world that constantly evaluates what happens around us: if something is expected we find it boring, and if it is unexpected we find it interesting (roughly speaking; there are other rewards in play, e.g. hunger, thirst, sex drive, etc.). By seeking out new, interesting experiences we constantly improve our model of the world. Machines cannot do that; they have to learn from the data given to them, which becomes repetitive very quickly. So much data is required in part because the amount of new information drops off exponentially if you just collect everything.
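The "automatically mine hard examples" idea has a standard machine-learning counterpart: rank candidate data by the model's own prediction error and train preferentially on the surprising items. A minimal sketch; `model.predict` and the `(x, y)` candidate format are hypothetical stand-ins, not any particular library's API.

    import numpy as np

    def select_interesting(model, candidates, top_k=32):
        """Keep the samples the model currently predicts worst (the 'unexpected' ones)."""
        errors = [float(np.mean((model.predict(x) - y) ** 2)) for x, y in candidates]
        ranked = sorted(range(len(candidates)), key=lambda i: -errors[i])
        return [candidates[i] for i in ranked[:top_k]]   # most surprising first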
  • @babyfox205
    The configurator is the soul, the spirit! :D In the case of AIs, the configurator is a human operator 😇
  • @nicktasios1862
    Yann mentions at 1:09:16 that a lot of the mathematics of neural networks comes from statistical physics, but I wonder what mathematics he's referring to, since most of the mathematics I saw when I learned statistical physics was much more basic than some of the mathematics I've seen from the likes of Yi Ma and LeCun.
  • When LLMs are put into an ensemble with databases, they can be made factual, actually; the reason is that the LLM is good at fusing query results. When LLMs are put into an ensemble with strategy-specialist models, they can be made into planners; the Alpha family of models is a planner. When LLMs are augmented with persistent storage, they can be made to remember their learnings. The LLM alone is not the way forward, but the LLM with various augmentations seems very promising.
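A minimal sketch of the first of those ensembles, an LLM fused with a database in the retrieval-augmented style the comment describes; `search_db` and `llm_complete` are hypothetical stand-ins, not a real library's API.

    def answer_with_retrieval(question, search_db, llm_complete, k=5):
        """RAG-style ensemble: the database supplies facts, the LLM fuses the results.

        search_db(query, k) -> list of record strings, and llm_complete(prompt) -> str,
        are assumed interfaces for illustration only.
        """
        records = search_db(question, k)                 # factual grounding
        context = "\n".join(f"- {r}" for r in records)
        prompt = ("Using only the records below, answer the question. "
                  "If the records are insufficient, say you don't know.\n\n"
                  f"Records:\n{context}\n\nQuestion: {question}\nAnswer:")
        return llm_complete(prompt)                      # the LLM fuses the query results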