How to make AI self-aware

Whether it is Sora, the text-to-video model currently in the spotlight, or ChatGPT (GPT-3.5), which emerged about a year ago, both have astonished nearly everyone. While using ChatGPT, I became curious about how mind-like it really is, and a question came up: does ChatGPT have self-awareness? And if we wanted an AI to have self-awareness, what would we need to do? An article by Zhang Jiang inspired me greatly, and this post is my summary of what I learned on the topic.

My earliest encounter with the question of whether AI can be self-aware was the 2022 news about Google firing an engineer, Blake Lemoine (this was before ChatGPT had appeared). Lemoine, a senior software engineer, claimed that Google's AI chatbot LaMDA had self-awareness and published his chat transcripts with LaMDA as evidence. Looking back now, the level shown in those transcripts seems close to that of GPT-3.5, and people have since grown used to it.

What is Consciousness

We must first clarify the meaning of consciousness. In general contexts, we can understand it as a biological entity's subjective perception and experience of itself and its environment. In neuroscience, consciousness is typically divided into two main components:

Wakefulness: This refers to the alertness of the organism or its ability to respond to external stimuli. For example, when you are awake and alert, you are in a state of wakefulness.
Awareness: This refers to the organism's subjective experience of specific thoughts, feelings, perceptions, and memories. For instance, when you see a cat and know that you are looking at a cat, that is your awareness of your visual perception.

This still does not cover all concepts of consciousness. For example, some philosophical views, such as panpsychism, argue that all matter has some degree of consciousness, even if that consciousness may be very weak or different from that of humans and other higher animals. "Consciousness" is a complex topic that involves many different disciplines, including neuroscience, psychology, philosophy, cognitive science, biology, physics, and even artificial intelligence research. Different disciplines may have different definitions and theories, but the common goal is to understand how we experience and comprehend ourselves and the world around us.

Neuroscience is the field that studies consciousness most intensively. One influential paper, "What is consciousness, and could machines have it?", distinguishes three levels:

C0, unconscious processing. This level asks which kinds of work require conscious participation, and under what conditions. The authors cite many experiments that demonstrate unconscious processing, and research in this area is relatively mature.
C1, global availability. That is, information integration and identity. This aspect is related to Integrated Information Theory, which mainly studies how information integration occurs and how to measure it quantitatively.
C2, self-monitoring. This is the core of self-awareness.

The following will discuss AI's consciousness and self-awareness from these three levels.

First, we can use an example to illustrate what C0 level consciousness is, namely what "unconscious processing" is.

[Figure: checkerboard with squares A and B]

For example, the squares labeled A and B on the checkerboard are actually the same shade of gray, yet our first reaction is always that they are different colors. This is because the brain has already done some inference without our being conscious of it: it assumes that B lies in shadow and adjusts the perceived color accordingly. Later experiments found a vast amount of unconscious processing in our brains, including simple logical calculations and even decision-making. For instance, people who have memorized the multiplication table, as most Chinese students have, perform the corresponding calculations through unconscious processing, which makes them appear particularly fast.

Theories and Models of Consciousness

To understand C1 level consciousness, we need to model consciousness. Currently, there are two mainstream models: the GNW (Global Neuronal Workspace) and IIT (Integrated Information Theory) models.

The Global Neuronal Workspace builds on the Global Workspace Theory proposed by the psychologist Bernard J. Baars, and was developed into a neuronal model by the neuroscientists Stanislas Dehaene and Jean-Pierre Changeux. It is one of the dominant scientific theories of consciousness. It posits that consciousness arises from certain architectural features of the brain. The theory centers on a "workspace" in the brain, where new information competes with and replaces old information. When the activity of one or more regions exceeds a certain threshold, it triggers a wave of neural excitation that spreads throughout the neuronal workspace, making the signal available to a range of auxiliary processes. The act of globally broadcasting this information is what makes it conscious.

One way to picture this is to think of the brain's specialized processes as small programs. These small programs do not require conscious participation and complete their tasks automatically. What, then, is consciousness? GNW suggests that consciousness is like a stage: under particular circumstances or stimuli, one of these small programs is loaded into the global workspace and brought to the center of the stage.

In this way, many complex information processing tasks can be performed in this space, such as logical reasoning and decision planning. These require the participation of the entire brain. Moreover, consciousness can also send signals back to these small programs, allowing for quick actions.
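The stage metaphor can be sketched in a few lines of Python. This is only a toy illustration, not an implementation of GNW: the `Processor` class, the `workspace_step` function, the activation values, and the 0.5 threshold are all hypothetical names and numbers chosen for the example. Each small program has an activation level, and if the strongest one crosses the threshold, its content is broadcast to every other program.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """A hypothetical unconscious 'small program' with an activation level."""
    name: str
    activation: float = 0.0
    inbox: list = field(default_factory=list)

    def receive(self, message: str) -> None:
        # Every processor hears whatever the workspace broadcasts.
        self.inbox.append(message)


def workspace_step(processors, threshold=0.5):
    """Run one competition round; return the winner's name, or None if nothing ignites."""
    winner = max(processors, key=lambda p: p.activation)
    if winner.activation < threshold:
        return None  # below threshold: processing stays local and unconscious
    for p in processors:
        if p is not winner:
            p.receive(f"{winner.name} is on stage")
    return winner.name


procs = [Processor("vision", 0.9), Processor("hearing", 0.3), Processor("memory", 0.4)]
print(workspace_step(procs))  # vision
print(procs[1].inbox)         # ['vision is on stage']
```

If every activation stays below the threshold, `workspace_step` returns `None` and nothing is broadcast, which loosely mirrors processing that never reaches the stage.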

[Figure: giant neurons, one color per neuron]

Scientists have found considerable evidence of long-range connections across brain regions, which can be regarded as a physical basis for this kind of global activation. In 2017, a report in "Neuroscience" described the discovery of "giant neurons." As shown in the figure, each color represents one neuron, and its long projections extend across essentially the whole brain. This has become a piece of physical evidence for the Global Workspace Theory.

The second model is called Integrated Information Theory (IIT). According to IIT, the degree of consciousness of a system (such as a network or brain) can be judged by measuring the interconnectedness and integration of its components. This theory emphasizes the following two key concepts:

Integration: IIT posits that consciousness is composed of many different elements (such as neurons) that, while independent, need to be integrated into an indivisible whole to form a conscious experience. This integrative quality is a fundamental characteristic of consciousness. For example, our conscious experience at any given moment is a whole: it cannot contain information from only one part (such as color or shape) while ignoring the others.
Causal Power: Another important concept in IIT is "causal power." It refers to the influence of the system's current state on its future state and the influence of its past state on its current state. Systems with high causal power can generate complex interactions among their components, which is key to forming consciousness. For instance, interactions between neurons create our thoughts and feelings.

These two concepts together form the core of IIT, which states that a system can potentially possess consciousness only when its components exhibit high integration and causal power. The theory even offers a quantitative measure of the "degree of consciousness," usually written Φ (phi).
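To give a feel for what "measuring integration" could mean, here is a minimal sketch that computes the mutual information between two units of a tiny system. This is an assumption-laden stand-in, not IIT's actual measure: the real quantity is defined over cause-effect structure and a search over partitions, which this example does not attempt. The function name `mutual_information` and the two toy distributions are hypothetical.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits between two units, given joint[(a, b)] = probability."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p  # marginal distribution of the first unit
        pb[b] = pb.get(b, 0.0) + p  # marginal distribution of the second unit
    return sum(
        p * log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Two units that always agree: knowing one tells you everything about the other.
coupled = {(0, 0): 0.5, (1, 1): 0.5}
# Two units that ignore each other: cutting the system in two loses nothing.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(mutual_information(coupled))      # 1.0
print(mutual_information(independent))  # 0.0
```

The coupled pair behaves as one whole, so splitting it destroys one bit of information, while the independent pair can be split at no cost. That contrast is the intuition behind integration, in a far simpler form than the theory itself uses.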

We can illustrate IIT with a concrete example. Suppose you slap your own arm. We can observe this action at two levels. At the microscopic level, the slap undoubtedly kills a large number of cells in the arm and body. At the level of "I" as a whole person, those cells are constrained by "me." Yet "I" am a system composed of those very cells. Under the conventional, reductionist view of causation, the properties of "I" as a human body are determined by the cells: "I" fear fire because the cells fear fire, since cells that are burned will die. However, people can sacrifice themselves for their ideals and beliefs, even at the cost of letting their cells, or their entire being, be consumed by fire. This clearly goes beyond conventional micro-to-macro causation and instead represents a causal inversion from macro to micro. "I" have free will, so I can slap myself. All of this is possible because a higher-level whole can act as an independent subject exerting "causal power," pointing the causal arrow from the whole to its parts: I want to slap my arm, so the action occurs, and the cells die along with it. The second key concept of Integrated Information Theory, causal power, emphasizes exactly this.

There are certainly more models of consciousness than the two described above. In 2022, Nature Reviews Neuroscience published a review that surveys most of the theories of consciousness in the academic community. For details, see the table below:

[Table: theories of consciousness summarized in the 2022 review]

Conscious Turing Machine

The purpose of modeling consciousness as above is to reproduce it at the software level. Existing computers are, in essence, Turing machines. Lenore Blum and Manuel Blum, a married couple of computer scientists (Manuel Blum is a Turing Award winner), published an article in PNAS proposing the "Conscious Turing Machine" model: a computable architecture for consciousness.

[Figure]

The Conscious Turing Machine can be seen as a concrete architecture for the Global Workspace Theory: a collection of small programs performs various distributed tasks, and there is a global space shared among them. The key point is that the Conscious Turing Machine implements explicit mechanisms for passing information upward into the global space and downward back to the small programs.

[Figure: Conscious Turing Machine model]

As shown in the figure, the Conscious Turing Machine (CTM) model presents a composite architecture that attempts to simulate the processes of human consciousness. In this model, conscious processing is envisioned as a multi-layered, highly interactive information processing system that spans multiple steps, from basic sensory input to complex decision output.

External inputs are first captured through sensory modalities, and this sensory information enters the system in a read-only form, reflecting stimuli from the real world such as visual images, sounds, and touch. Once this raw data is received, it moves into the short-term memory module, which is the core of consciousness processing. Short-term memory plays a crucial filtering and integrating role here; it not only limits the amount of information that can be processed simultaneously (reflecting human attention limits) but also enhances processing efficiency by integrating information into chunks. These chunks represent the units of information formed through conscious processing and cognitive restructuring, which can be seen as the basic "currency" of conscious activity.

Meanwhile, long-term memory serves as a vast backend database, storing personal experiences, knowledge, and skills. This part of memory is usually in an unconscious state but can be elevated to the conscious level through internal mechanisms, such as UP-Tree competition. This competitive mechanism reflects how our attention shifts from one topic to another and how relevant information is extracted from a vast knowledge base to meet current situational demands.

In short-term memory, selected information chunks are sent to various dedicated processors via a rapid broadcasting system. These processors each have their own roles, processing specific types of information or executing specific tasks. For example, some processors focus on parsing visual-spatial information, some handle internal speech and language processing, while others may connect to external databases and algorithms, such as Google search or AlphaGo. This distributed processing mechanism simulates how the brain processes various information in parallel and allows consciousness to consider multiple aspects and possibilities simultaneously.

After information is further analyzed and integrated in these processors, the final output is realized through an external output module, which can be speech, writing, or other forms of physical behavior. This step completes the entire cycle from perception to action, reflecting how consciousness drives our interaction with the environment and formulates responses based on external feedback and internal goals.

Throughout the model, the design of information flow and processing is intended to reflect the flexibility, dynamism, and creativity of human consciousness. It demonstrates how to adapt and influence the environment through different levels of processing and integration, from simple sensory input to complex thinking and behavioral output. This simulation of the Conscious Turing Machine, while abstract, attempts to provide a framework for understanding and reproducing the complexity and diversity of human consciousness.
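The cycle described above (processors submit chunks, a competition selects one for short-term memory, and the winner is broadcast back) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' formal definition: the `Chunk` fields, the weight-proportional coin flip, and the function names `up_tree_competition` and `broadcast` are hypothetical choices for the example, and weights are assumed to be non-negative with at least one positive weight in each pair.

```python
import random
from dataclasses import dataclass


@dataclass
class Chunk:
    """A unit of information submitted by a long-term-memory processor."""
    source: str
    gist: str
    weight: float  # how strongly the processor bids for attention


def up_tree_competition(chunks, rng):
    """Binary tournament: pairs compete, winners move up until one chunk remains."""
    layer = list(chunks)
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer) - 1, 2):
            a, b = layer[i], layer[i + 1]
            # Weight-proportional choice between the pair (an assumption of this sketch).
            a_wins = rng.random() < a.weight / (a.weight + b.weight)
            next_layer.append(a if a_wins else b)
        if len(layer) % 2:
            next_layer.append(layer[-1])  # the odd one out advances unopposed
        layer = next_layer
    return layer[0]


def broadcast(winner, processor_names):
    """Send the winning chunk from short-term memory back to every processor."""
    return {name: winner.gist for name in processor_names}


rng = random.Random(0)
chunks = [
    Chunk("vision", "a cat is ahead", 3.0),
    Chunk("language", "inner speech", 1.0),
    Chunk("hunger", "time for lunch", 0.5),
]
winner = up_tree_competition(chunks, rng)
print(winner.source, broadcast(winner, [c.source for c in chunks]))
```

Only one chunk occupies short-term memory at a time in this sketch, which is one simple way to express the attention limit mentioned above.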

Enhancing Consciousness: Planning and Imagination

To explore the planning and imagination capabilities of consciousness, Jürgen Schmidhuber, often called the father of LSTM (Long Short-Term Memory networks), together with David Ha proposed a reinforcement learning framework called World Models in 2018. The idea is that a reinforcement learning agent should embed a virtual world, i.e., a world model. Their experiments showed that an agent with an embedded virtual world can learn more thoroughly from a relatively small amount of sample data, because the agent can dream.

The World Models reinforcement learning framework is an advanced method within the field of reinforcement learning, aiming to improve the learning efficiency and adaptability of agents (such as robots or software agents) by simulating environments. This approach originates from how humans and animals use internal models to predict and explain the surrounding world, attempting to replicate this mechanism in artificial intelligence systems.

The main components of the World Models framework are:

Visual Module (V): This part's task is to extract useful features and representations from raw inputs (such as pixels). In humans, this corresponds to perceiving the environment visually and understanding surrounding objects and scenes. In machine learning, this is typically achieved through convolutional neural networks (CNNs) or other image processing techniques.
Memory Module (M): This part processes time-series data, helping the agent understand the temporal dependencies and dynamic changes in the environment. This is akin to human working memory, used to store and process information about recent events. In computer models, this can be implemented through recurrent neural networks (RNNs) or long short-term memory networks (LSTMs).
Controller (C): This part makes decisions based on the outputs of the visual and memory modules and executes actions. In humans, this is similar to deciding how to act based on the current understanding of the environment and goals. In reinforcement learning, this is typically achieved through a policy network that determines which action to take in a given state to maximize future rewards.

[Figure: world model]

Specifically, the world model is an RNN whose input consists mainly of two parts: the encoded world state and the agent's action at time t-1. The purpose of this RNN is to predict the next state, reward, and action. With such a world model, the reinforcement learning agent gains several benefits during learning. On one hand, the world model itself can be trained deliberately, using supervised learning. On the other hand, the agent can dream, which is why it can learn more thoroughly from a relatively small amount of sample data.
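The data flow between the three modules can be sketched with toy stand-ins. Everything here is illustrative: the function `encode`, the class `ToyWorldModel`, the function `controller`, and all the fixed weights are hypothetical, and a real system would use trained neural networks rather than hand-picked numbers. The point is only to show the encoded state and the previous action going into the recurrent model, and the controller reading both the encoded state and the model's hidden state.

```python
import math


def encode(observation):
    """V: compress a raw observation (a list of numbers) into a tiny latent vector z."""
    half = len(observation) // 2
    left = sum(observation[:half]) / half
    right = sum(observation[half:]) / (len(observation) - half)
    return [left, right]


class ToyWorldModel:
    """M: a one-unit recurrent cell that predicts the next latent state."""

    def __init__(self):
        self.h = 0.0  # hidden state: the model's memory of the past

    def step(self, z, prev_action):
        # New memory mixes old memory, the current latent state, and the last action.
        self.h = math.tanh(0.5 * self.h + 0.3 * sum(z) + 0.2 * prev_action)
        return [zi + 0.1 * self.h for zi in z]  # predicted next latent state


def controller(z, h):
    """C: a small linear policy squashed into an action in [-1, 1]."""
    return math.tanh(0.8 * z[0] - 0.8 * z[1] + 0.5 * h)


model = ToyWorldModel()
action = 0.0
for observation in ([0.1, 0.2, 0.9, 0.8], [0.2, 0.3, 0.7, 0.6]):
    z = encode(observation)
    predicted = model.step(z, action)  # uses the action taken at t-1
    action = controller(z, model.h)
    print([round(v, 3) for v in z], round(action, 3))
```

Keeping the controller tiny and putting most of the capacity into the encoder and the recurrent model is a design choice that makes the controller easy to optimize separately.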

Dreaming means taking the still imperfect world model on its own and rolling it forward on hypothetical actions to generate virtual experience. At a given moment t, the world model, acting as a simulator of the real world, generates the next moment's state and reward, and this dreamed data is then used to train the action-taking part of the reinforcement learning agent. In this way, the objective function can be optimized, and the reward maximized, entirely inside the dream. This greatly increases the number of training samples and reduces training time. In the World Models paper, CMA-ES is the evolution strategy used to optimize the controller's parameters. With a world model serving as a simulator, the agent can also set a future goal and search for a path that reaches this goal inside the simulated world, generating actions step by step.
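Training inside the dream can be sketched with a one-dimensional toy problem. This is a minimal illustration under several assumptions: `dream_model` is a hand-written stand-in for a learned world model, the controller has a single hypothetical parameter `gain`, and the optimizer is a simple hill-climbing evolution strategy that is far cruder than CMA-ES. No real environment is ever consulted; every rollout happens in the simulated model.

```python
import random

GOAL = 1.0  # hypothetical target position the agent should reach


def dream_model(position, action):
    """Stand-in for a learned world model: predicts the next state and reward."""
    next_position = position + 0.5 * action
    reward = -abs(GOAL - next_position)
    return next_position, reward


def dream_rollout(gain, steps=10):
    """Total imagined reward for the controller action = gain * (GOAL - position)."""
    position, total = 0.0, 0.0
    for _ in range(steps):
        action = max(-1.0, min(1.0, gain * (GOAL - position)))  # clip to [-1, 1]
        position, reward = dream_model(position, action)
        total += reward
    return total


def train_in_dream(generations=30, population=8, seed=0):
    """Hill-climbing evolution strategy: keep any mutated gain that scores better."""
    rng = random.Random(seed)
    best_gain = 0.0
    best_score = dream_rollout(best_gain)
    for _ in range(generations):
        for _ in range(population):
            candidate = best_gain + rng.gauss(0.0, 0.3)
            score = dream_rollout(candidate)
            if score > best_score:
                best_gain, best_score = candidate, score
    return best_gain, best_score


print(dream_rollout(0.0))  # -10.0: an agent that never moves
gain, score = train_in_dream()
print(round(gain, 3), round(score, 3))
```

Because imagined rollouts are cheap, the agent can evaluate hundreds of candidate controllers without touching the real world, which is the source of the sample efficiency described above. The quality of the result is bounded by how faithful the dream model is.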

Some readers may not be familiar with RNNs. Recurrent Neural Networks (RNNs) are a type of neural network designed to handle sequential data and temporal dependencies. Unlike traditional feedforward neural networks, RNNs can take earlier inputs into account when processing the current one, which makes them well suited to language, time series, and other sequential data. The core of an RNN is its recurrent structure, which lets information flow from one time step to the next. This structure enables an RNN to retain and use information from previous time steps when processing new inputs, capturing temporal relationships and dependencies in the data. In practice, this means an RNN can remember past information and make more accurate predictions or decisions based on it.

Although RNNs are powerful in handling sequential data, they also face issues of vanishing or exploding gradients, which can affect the network's ability to learn long-term dependencies. To overcome these issues, researchers have developed more advanced RNN variants, such as Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs). These improved models introduce gating mechanisms that help the network learn and retain long-term information more effectively, thus performing better on complex sequential tasks.
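Both ideas, memory through recurrence and the vanishing gradient, can be seen in a one-unit RNN. This is a deliberately tiny sketch: the function names `rnn_forward` and `gradient_wrt_initial_state` and the weights 1.0 and 0.9 are hypothetical choices for illustration, and practical networks use vectors and learned weight matrices.

```python
import math


def rnn_forward(inputs, w_in=1.0, w_rec=0.9, h0=0.0):
    """Run a one-unit vanilla RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def gradient_wrt_initial_state(inputs, w_in=1.0, w_rec=0.9):
    """d h_T / d h_0 by the chain rule: the product of w_rec * (1 - h_t**2) over time."""
    grad = 1.0
    for h in rnn_forward(inputs, w_in, w_rec):
        grad *= w_rec * (1.0 - h * h)
    return grad


# Same final input, different histories: the hidden state remembers the difference.
print(rnn_forward([1.0, 0.0, 0.0])[-1] > rnn_forward([0.0, 0.0, 0.0])[-1])  # True

# The influence of the initial state shrinks geometrically with sequence length.
print(gradient_wrt_initial_state([0.0] * 5) > gradient_wrt_initial_state([0.0] * 50))  # True
```

With a recurrent weight below 1, each extra time step multiplies the gradient by a factor smaller than 1, so long-range dependencies become hard to learn. Gated variants such as LSTMs and GRUs are designed to ease exactly this problem.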

However, while the work on world models is excellent, one significant regret is that the world model still lacks a self. Although it can feed its actions back to itself, this is merely an action and not truly "reflection"; in the context of consciousness, "reflection" usually refers to a mental state. The world model that humans build includes the self, whereas existing world model research does not. Another regret is that dreaming in the world model line of work is not autonomous. The agent deliberately separates "playing the game" from dreaming, whereas humans can switch between the two at any moment, or even do both at once.

Self-Awareness and Self-Referential Consciousness

"Self-reference" (自指 in Chinese) refers to a situation where a statement, expression, thought, or other piece of information directly or indirectly refers to or involves itself. Self-reference is a common concept in mathematics, logic, philosophy, art, and other fields.

For example, in language, a classic self-referential statement is "This sentence is false." The statement creates a paradox because it refers to itself: if it is true, then it is false, but if it is false, then it is true. In computer science, examples of self-reference include a program that refers to or modifies parts of its own code, or a data structure (such as a recursive data structure) that refers to itself. This capacity for self-reference is sometimes considered a hallmark of consciousness or self-awareness, as it involves a system's ability to reflect on and understand its own processes or states.

What is the difference between self-awareness and self-referential consciousness? Self-awareness refers to a conscious system capable of cognitive activities such as reflection, reasoning, and imagination, for example the human brain. Self-referential consciousness refers to a conscious system in which self-reflection, reasoning, and imagination are achieved through self-referential principles. The latter clearly encompasses the former, as the self-reflection achieved through self-referential techniques is a perfect mapping, an ideal self-mapping in both the spatial and temporal dimensions realized through special techniques, while the former is likely an imperfect self-mapping. Furthermore, the latter can be seen as a normative theory of self-awareness, a theoretical prototype that provides a goal for pursuing perfect self-mapping and self-understanding, while an actual self-aware system may be an imperfect realization, constrained and distorted by various factors.

So how can we achieve self-reference in the field of computing? We consider two levels: the hardware level and the software level, so the problem is decomposed into two:

How can a machine produce a copy of itself?
How can a piece of code generate a copy of itself?

Regarding hardware, as early as the late 1940s, von Neumann designed a machine capable of self-replication; the design is described in "Theory of Self-Reproducing Automata," published posthumously in 1966. However, when exploring self-replication in software, a problem arose that at first seemed unsolvable: the "infinite recursion" problem.

print('Hello World')

print('print(\'Hello World\')')

print('print(\'print(\\\'Hello World\\\')\')')

A classic example is the attempt to write a "self-printing" program, i.e., a program that outputs its own source code. As the attempts above show, this seems at first glance to fall into infinite recursion: each program that prints the previous one is itself a new, longer program that still needs to be printed. However, inspired by the philosopher Quine, mathematicians and programmers found a clever way to achieve the goal while avoiding the recursion trap.

The core of this solution lies in the dynamic interaction between the program and its runtime environment (such as the operating system). In this way, the program unfolds during execution, generating output identical to its source code. This self-printing program typically consists of two main parts: one part is the "template" or "framework" (referred to here as the "virtual part" or "virtual aspect"), while the other part is responsible for generating the actual code of this framework (referred to as the "real aspect"). The key lies in how the content of these two parts maps to each other and how the program's structure ensures that the output code accurately reflects itself.

Through this approach, the self-printing program acts as if it is "looking in a mirror," where the "virtual part" provides a pattern, and the "real part" fills in this pattern to produce a complete self-description. This is not only a clever trick in programming and mathematics but also offers a perspective on how to distinguish between consciousness and non-consciousness through software and functional structure. In this framework, self-awareness can be viewed as the system's ability to functionally distinguish and integrate its virtual and actual states. In other words, if a system can functionally differentiate and manage its internal representation (virtual aspect) and external manifestation (real aspect), then this system can be considered to possess some form of self-awareness.
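The two-part structure described above can be seen in a classic two-line Python quine. In the sketch below, the quine is held in a string named `QUINE` so that it can be executed and checked; the names `QUINE` and `run_and_capture` are hypothetical helpers for this illustration. Inside the quine, the string `s` plays the role of the template (the "virtual part"), and `print(s % s)` is the code that fills the template with a copy of itself (the "real part").

```python
import contextlib
import io

# The quine itself is exactly these two lines of Python:
#     s = 's = %r\nprint(s %% s)'
#     print(s % s)
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source: str) -> str:
    """Execute Python source in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


print(QUINE, end="")                      # the program
print(run_and_capture(QUINE) == QUINE)    # True: its output is its own source
```

The trick is that `%r` inserts the quoted form of `s` into `s` itself, so the template is used twice: once as data and once as code. No infinite nesting is needed, because the description is expanded only once, at run time.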

In fact, the self-replicating automaton constructed by von Neumann follows similar principles: the key to achieving self-replication is unfolding over time. Although many scholars believe that perfect self-reference is out of reach because it seems to lead to infinite recursion, machines can achieve complete self-reference by using Quine's technique. This is akin to a machine engaging in dialogue with itself, gradually constructing its complete image as time unfolds. The process breaks through boundaries traditionally thought to make self-reflection impossible, and it demonstrates that self-cognition, i.e., the ability to reflect on oneself, can be achieved through ingenious design.

When analyzed separately, both the machine and its description exhibit incompleteness. However, when we combine the two and operate through the intervention of an operating system or nature, following the logic of natural time flow (from t to t+1), we can achieve a complete self-reference process. The core of this process lies in aligning the virtual world with the real world as closely as possible and ensuring they can operate in harmony. Although each has its shortcomings, the natural process can bridge these gaps, enabling the system to self-replicate and self-reference, thus achieving completeness.

Moreover, when the machine and its description form a mirrored relationship, we touch upon fractal theory and its applications in nature and technology, such as von Neumann's self-replicating structures. This phenomenon of mutual mirroring not only serves as a wonderful example of self-replication but also showcases the complexity of self-reference and its similarities in nature and technology.

Extending these thoughts to humanity's pursuit of universal truths, we find that although humans may never perfectly understand a vast universe that includes themselves, self-referential techniques mean that we do not need to reach a perfect level of cognition. Humans can integrate their partial understanding, such as the cognition achieved through AI, into a whole, and leave the unknown parts for nature to answer. This whole, composed of humans and machines, can then simulate the workings of the universe more accurately and deepen our understanding of how it operates, bringing humanity's exploration of universal truths closer to completeness.


References

https://mp.weixin.qq.com/s/bZlhzIuscWyQEB_2nLr1Ag

https://www.science.org/doi/10.1126/science.aan8871

https://www.nature.com/articles/s41583-022-00587-4

https://www.pnas.org/doi/10.1073/pnas.2115934119

https://arxiv.org/pdf/1803.10122.pdf