#93 Prof. MURRAY SHANAHAN - Consciousness, Embodiment, Language Models

Published 2022-12-24
Support us! www.patreon.com/mlst

Professor Murray Shanahan is a renowned researcher on sophisticated cognition and its implications for artificial intelligence. His 2016 essay 'Conscious Exotica' explores the space of possible minds, a concept first proposed by philosopher Aaron Sloman in 1984, which encompasses every form a mind might take, from those of other animals to those of artificial intelligences. Shanahan rejects the idea of an impenetrable realm of subjective experience and argues that much of the space of possible minds may be occupied by non-natural variants, the 'conscious exotica' of his title.

In his paper 'Talking About Large Language Models', Shanahan examines the capabilities and limitations of large language models (LLMs). He argues that prompt engineering is a key element of advanced AI systems built on LLMs, since prompt prefixes can adapt a single model to a wide range of tasks. At the same time, he cautions against ascribing human-like characteristics to these systems: they are fundamentally different from us and do not share our comprehension of the world. Even when LLMs are integrated into embodied systems, it does not follow that they possess human-like language abilities. Ultimately, Shanahan concludes that although LLMs are formidable and versatile, we should be wary of over-simplified accounts of both their capacities and their limitations.
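
To illustrate the prompt-prefix idea above, here is a minimal sketch (not from the episode): a handful of worked examples are prepended to the input, and the model simply continues the pattern. The complete() function is a stand-in for whatever text-completion API you use, not a specific library call.

# Minimal sketch of task adaptation via a prompt prefix.
# complete(prompt) is a placeholder for any text-completion API.

PREFIX = (
    "English: cheese\nFrench: fromage\n\n"
    "English: good morning\nFrench: bonjour\n\n"
)

def translate(word, complete):
    # The model continues the pattern established by the prefix,
    # so the same underlying LLM behaves like a translator for this call.
    prompt = PREFIX + "English: " + word + "\nFrench:"
    return complete(prompt)

Swapping the prefix for, say, question-answer pairs repurposes the same model with no retraining, which is what makes prompt engineering so central.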

Pod version (music removed): anchor.fm/machinelearningstreettalk/episodes/93-Pr…

[00:00:00] Introduction
[00:08:51] Consciousness and Conscious Exotica
[00:34:59] Slightly Conscious LLMs
[00:38:05] Embodiment
[00:51:32] Symbol Grounding
[00:54:13] Emergence
[00:57:09] Reasoning
[01:03:16] Intentional Stance
[01:07:06] Digression on Chomsky show and Andrew Lampinen
[01:10:31] Prompt Engineering

Find Murray online:
www.doc.ic.ac.uk/~mpsha/
twitter.com/mpshanahan?lang=en
scholar.google.co.uk/citations?user=00bnGpAAAAAJ&h…

MLST Discord: discord.gg/aNPkGUQtc5

References:

Conscious Exotica [Aeon / Murray Shanahan]
aeon.co/essays/beyond-humans-what-other-kinds-of-m…

Embodiment and the Inner Life [Murray Shanahan]
www.amazon.co.uk/Embodiment-inner-life-Cognition-C…

The Technological Singularity [Murray Shanahan]
mitpress.mit.edu/9780262527804/

Talking About Large Language Models [Murray Shanahan]
arxiv.org/abs/2212.03551

Global workspace theory [Bernard Baars]
en.wikipedia.org/wiki/Global_workspace_theory
In the Theater of Consciousness: The Workspace of the Mind [Bernard Baars]
www.amazon.co.uk/Theater-Consciousness-Workspace-M…

Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts [Stanislas Dehaene]
www.amazon.co.uk/Consciousness-Brain-Deciphering-C…

Roger Penrose On Why Consciousness Does Not Compute [Steve Paulson / Nautilus]
nautil.us/roger-penrose-on-why-consciousness-does-…
en.wikipedia.org/wiki/Orchestrated_objective_reduc…

What Is It Like to Be a Bat? [Thomas Nagel]
warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/h…

Private Language [Ludwig Wittgenstein]
plato.stanford.edu/entries/private-language/

Philosophical Investigations [Ludwig Wittgenstein] (see §243 for the private language argument)
static1.squarespace.com/static/54889e73e4b0a2c1f98…

Integrated information theory [Giulio Tononi]
en.wikipedia.org/wiki/Integrated_information_theor…

Being You: A New Science of Consciousness [Anil Seth]
www.amazon.co.uk/Being-You-Inside-Story-Universe/d…

Attention schema theory [Michael Graziano]
en.wikipedia.org/wiki/Attention_schema_theory

Rethinking Consciousness: A Scientific Theory of Subjective Experience [Michael Graziano]
www.amazon.co.uk/Rethinking-Consciousness-Scientif…

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan) [Google]
say-can.github.io/

The Symbol Grounding Problem [Stevan Harnad]
www.cs.ox.ac.uk/activities/ieg/elibrary/sources/ha…

Lewis Carroll Puzzles / Syllogisms
math.hawaii.edu/~hile/math100/logice.htm

In-context Learning and Induction Heads [Catherine Olsson et al / Anthropic]
transformer-circuits.pub/2022/in-context-learning-…

All Comments (21)
  • @PaulTopping1
    One of the best MLST interviews I've ever listened to. It covered so many things that interest me.
  • @sepgorut2492
    There's nothing like an interesting discussion to start off Christmas day.
  • @mikenashtech
    Fascinating discussion Tim and Prof Shanahan. Thank you for sharing. 👏M
  • @luke2642
    Good job Tim! I bet even your Christmas tree has some tessellating spline decorations, they pop up everywhere ;-)

    Shanahan is so likable! He exudes precision, and yet an accommodating reasonableness. He balances a holistic approach with specifics and edge cases.

    "I'm not making a metaphysical claim, I'm just describing how we use the word." Witty too!

    He seems quite genuine, even if he half-castrates reductionism and functionalism. It feels a bit Derridean, différance, but it's hard not to respect when he immediately calls out that 'definitions' aren't 'usage', and the importance of grounding. Clearly loves language too!

    I think my top takeaway is: "Things can be empirically hidden about consciousness, not metaphysically hidden." Such a clear distinction!
  • @smkh2890
    I remember a very simple sci-fi story in which some spacecraft land on Earth
    and we wait for some aliens to emerge, but they never do.
    Finally they fly away again, complaining among themselves
    that the humans never bothered to engage with them.
  • @_ARCATEC_
    Creating a novel reference frame that takes into account different perspectives or points of view is interesting to me. Reference frames are an important aspect of how we understand and describe the world around us.
    For example
    In physics and mathematics, a reference frame is a set of coordinates that are used to describe the position and orientation of objects in space....
    💓
  • @isajoha9962
    Kind of fascinating to think of "where the LLM is" (from a consciousness perspective) after it has made the output from the latest prompt 🤔, considering it is not actually an app with a clear state and is more like a file with a "filter mechanism" called from some kind of multi-user app.
  • @dr.mikeybee
    I like the notion that embodiment is something like "having skin in the game." That is to say, there are goals and, therefore, the encapsulating agent makes value judgements. And this is the dangerous part of creating synthetic intelligence. This is where alignment is critical. And I believe that agents need to be kept as simple and as free of self-direction as possible.
  • @Artula55
    Just finished your fascinating Sara Hooker video and you've already posted another one to watch 😅 Thanks!
  • @dr.mikeybee
    How does grounding of symbols differ from maintaining a complete history? If this history can be used to add context to a connectionist model, doesn't that do the same thing as what we humans do? At least doesn't it go a long way towards giving words grounding? When I hear the word dog, I have access to all my memories. A large model has access to all its memories, but unless training was updated to include history or relevant history is added to the prompt, it lacks that.
  • @robin-jewsbury
    Prompts carry context, including conversation history, so if you don't send a user ID you'll always need the prompt (or something like it) to supply context and history.
  • @ozkrhills9624
    Not sure why I like this kind of information so much...
    I hope this video will be the start of many more.
  • @dr.mikeybee
    Wonderful! Yes, talk about these things separately. Throw the word consciousness away. It just creates misunderstanding and contention. Suitcase words have no place in scientific discussion.
  • @dr.mikeybee
    Is Global workspace theory (GWT), at least in part, analogous to hidden prompt additions and modifications done by agents for input into LLMs?
  • @dr.mikeybee
    "Emergent" suggests more than our surprise that something works as it should. What I don't understand, and perhaps I never will in anyway except by some vague intuition, is how layers of abstraction are "correlated" to subject matter detail. And as you can see, I'm not even really sure about the right way to word my confusion. Here's what ChatGPT has to say on the subject:

    "Adding layers of abstraction in a language model can be correlated with greater subject matter detail in several ways:

    Abstraction allows a language model to represent complex concepts and relationships in a more concise and organized way. For example, a language model with an abstract representation of "animals" could include information about different types of animals, their characteristics, and their behaviors, without having to specify each type of animal individually.

    Abstraction enables a language model to generalize and make predictions about new examples based on patterns and structures it has learned from past examples. For example, a language model with an abstract representation of "animals" could be used to classify a new type of animal as "mammal" or "reptile" based on its characteristics, even if it has never seen that particular type of animal before.

    Abstraction allows a language model to handle a larger and more diverse range of inputs and outputs. For example, a language model with an abstract representation of "animals" could be used to answer questions about different types of animals, such as "What do polar bears eat?", or "How do snakes move?", or "What is the lifespan of a dolphin?", without having to specify each type of animal individually.

    Overall, the use of abstraction in a language model can enable it to represent and handle a greater amount of subject matter detail in a more efficient and flexible way."

    To me, this suggests that information is somehow categorized and compressed. It makes sense that this be done, but it is difficult to grasp how the self-organizing mechanism does this using back-prop. That is to say, how is it that guessing the next word results in a compressed hierarchy of abstractions rather than long individual paths to a next token? Yet by optimizing to minimize the loss function, this self-organization happens. One might first of all say that this self-organization is emergent. But rather it is the case that I haven't grasped the full intention or scheme of the programmer. Perhaps it is the case that even the programmer hasn't grasped it fully? I suspect this is true. In other words, in building a function approximator, we may not understand the functions that are built. And therefore we call the characteristics of these functions "emergent" rather than unexpected.  Surprisingly, ChatGPT says the abstractions and organization exist independently in the training data and are not the result of any intention on the part of the programmer:

    "The backward pass is an important part of training a machine learning model, as it allows the model to learn from its errors and improve its performance over time. However, the backward pass itself does not directly cause the model to self-organize or create abstractions.

    A model's self-organization and the formation of abstractions can occur as a result of the training process, but it is typically driven by the structure and patterns in the model's training data, as well as the architecture and learning algorithms used to train the model. For example, a language model trained on a large dataset of text might learn to form abstractions based on the relationships between different words and concepts in the text."

    And as for emergent vs. unexpected, once again, ChatGPT comes to our aid:

    "Emergent behavior refers to the emergence of complex or unexpected behavior from the interactions of simpler components. In the context of a language model, emergent behavior might refer to the formation of new patterns or structures in the model's output that are not explicitly represented in the model's training data or architecture.


    Unexpected behavior, on the other hand, refers to behavior that is not anticipated or expected based on the model's training data or architecture. This might include errors or anomalies in the model's output, or behavior that is significantly different from what the model was designed to do.

    It is important to note that emergent and unexpected behavior are not mutually exclusive categories, and a language model's behavior may exhibit both emergent and unexpected characteristics."  

    To me this sounds as though "unexpected" takes in the idea of error; otherwise the terms are interchangeable. Unexpected seems to me a humbler term. And if the model's structure is determined primarily by the training set, the results are not emergent from the model but from the data. Of course, a model can be expanded or compressed by using more or fewer nodes in layers, but it is the training data that contains the ability for a model to perform one task or another.
  • @dr.mikeybee
    I wonder what can be done with next-word prediction if we create a kind of RNN architecture within the agent, not the model? In other words, we get the next word or words and add them back in with the original prompt as context. This is an experiment we can try ourselves with prompt engineering (see the sketch after the comments). I may spend some time with it. Another experiment would be to scrape all the nouns or named entities from a model's output and add them back as context along with the original prompt.
  • @sgramstrup
    LLMs are still static intelligent automatons, with a catch. Biological language models encapsulate and mimic all we know and all our social and cognitive abilities. LLMs encapsulate and mimic that language, and therefore encapsulate an intelligence similar to ours. New studies have shown that very large LLMs also pick up other cognitive skills from our language, and are closing in on IQ. They emulate intelligence, and even the underlying consciousness that created the language (the catch), very well, but are otherwise an 'inactive' intelligence.

    Regarding consciousness, we need an active system, and that means internal feedback loops. If an intelligence can't interact with its internal processes, then it can't 'feel' or sense how it feels/thinks, and it is only an intelligence. For an LLM to acquire full consciousness, its internal states absolutely need to be fed back into the model to produce an 'active' intelligence that can 'think', instead of an 'inactive' intelligence, an automaton. Chaotic systems are active systems and very dynamic. Biological intelligences are active and have evolved to control, more or less successfully, the chaos in there and to maintain a homeostatic internal environment. Many mental issues are chaotic mental states that end up in a new strange attractor that can be hard to jump away from without external help.

    Hm, hope that made sense in a short space.
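
Below is a rough sketch of the agent-level feedback loop floated in the comments above: the model's output, or named entities scraped from it, is appended to the original prompt and sent back in, with the loop living in the agent rather than in the model. generate() is a placeholder for any text-completion API, and the entity "scraping" is a deliberately crude stand-in, not a real NER step.

# Sketch of an agent-side feedback loop around a next-token predictor.
# generate(prompt) is a placeholder; plug in any text-completion API.

def feedback_loop(generate, task, steps=3):
    context = task
    for _ in range(steps):
        continuation = generate(context)
        # Feed the model's own output back in with the original prompt,
        # like an RNN unrolled at the agent level rather than inside the model.
        context = task + "\n\nNotes so far:\n" + continuation
    return context

def with_named_entities(generate, task):
    draft = generate(task)
    # Crude stand-in for named-entity scraping: keep capitalised words.
    entities = sorted({w.strip(".,!?") for w in draft.split() if w[:1].isupper()})
    return generate(task + "\n\nKeep these in mind: " + ", ".join(entities))

The same pattern handles conversation history: the agent keeps appending previous turns to the prompt, which is the only "memory" a stateless LLM gets.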