Understanding Effective Learning Methods: A Neuroscience Perspective

The market is flooded with learning theories, and teachers in schools each have their own methods. So which are truly effective? I used to think this was an unsolvable question, until I stumbled upon the fascinating field of neuroscience. If we start from the mechanisms of the brain itself, surely we can't go far wrong? As my understanding deepened, I learned that effective learning has been studied countless times, and the academic community has converged on several relatively clear approaches. How do these scientific methods compare with everyday teaching practice? Many people mistakenly believe that "learning" is simply stuffing information into the brain: read more, do more problems, stay up more nights, and you should understand better, be more proficient, and remember more. Reality is often the opposite: the harder you push, the more confused you become; the more you repeat, the faster you forget; you invest a lot of time yet see no steady progress. Experience has essentially shown that these methods are flawed, and we will further deduce why from the theories of neuroscience.

You will see, in turn: how neural connections are strengthened and pruned; how information moves from short-term working memory into long-term explicit memory and automated implicit memory; and why the brain is essentially a "prediction machine," in which "prediction error" becomes the trigger for actual learning. After reading this article you will understand learning better, and the next time you encounter a learning theory you will be able to quickly identify its effective components.

Note: Memory cards are included at the end of each chapter.

1. Neuroplasticity: The Physical Basis of Learning#

How are memories formed? Scientists in the past believed that brain neurons cease development after birth, shaping memories solely by strengthening existing synapses. But is this truly the case? With advancements in observational methods, we have finally glimpsed the truth about neuronal development—neurons not only strengthen existing connections but also construct new physical links through a series of mechanisms. What we call “learning,” at the microscopic level, is the alteration of the physical structure of neuronal networks.

This is what a neuron looks like. Our brain is made up of roughly 85 to 120 billion neurons, each connected on average to thousands of other neurons. We will return to this repeatedly throughout Chapter 1.

[Image: a neuron]

1.1 Neurons and Signal Transmission#

Note: If you find this section difficult, feel free to skip it. Section 1.1 only explains the most basic neural mechanisms; you can go straight to Section 1.2.

If you zoom in on that image of neurons, you’ll find that it looks very much like a tree:

  • Dendrites (canopy): Responsible for receiving information from other neurons.
  • Axon (trunk): Responsible for transmitting information within a neuron; it is a long output cable.
  • Axon terminal (root): The structure responsible for sending information to the next neuron.

When a neuron is activated, it generates a weak electrical signal that travels rapidly along the long axon (the trunk) to its terminal. The terminal connects to the next neuron, and the process repeats until the last neuron—which then produces an action or secretes a hormone.

However, the connection between two neurons is not seamless. If we magnify the connection (as shown in the image below), we can see a tiny gap in the middle, which we call a “synapse”.

[Image: signal transmission across a synapse]

The electrical signal doesn’t travel directly to the next neuron; instead, it’s converted into a chemical signal and transmitted to the next neuron. The next neuron then converts the chemical signal back into an electrical signal. These chemical signals are called neurotransmitters, and they determine whether the electrical potential transmitted to the next neuron is positive or negative, and the strength of the signal. The entire process occurring in the synapse can be roughly divided into the following four steps:

  1. Vesicle storage: In the axon terminals of upstream neurons, neurotransmitters are neatly encased in vesicles, awaiting commands.
  2. Signal release: When an electrical signal is received, these vesicles move to the edge and release the encapsulated neurotransmitters into the synaptic cleft.
  3. Precise matching (receptors): These chemical messengers swim across the interneuronal space, but that’s not the end. Specific receptors grow on the dendritic surfaces of downstream neurons. Neurotransmitters must attach precisely to these receptors, like a key into a lock, opening a channel for free positive or negative ions to enter the next neuron. Only when a match is successful and ions successfully enter will the next neuron be activated and generate a new electrical signal.
  4. Cleaning up the battlefield (enzymes): After the signal transmission is completed, the excess neurotransmitters must be cleared or recycled. Specialized enzymes will work together to process these neurotransmitters that have completed their tasks.
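The four steps above can be sketched as a toy walk-through in code. This is purely illustrative: the function, quantities, and threshold are invented for the example, not a biophysical model.

```python
# Toy walk-through of the four synaptic steps. The names, numbers,
# and threshold are invented for illustration; this is not a
# biophysical model.

def transmit(vesicle_count: int, receptor_match: bool, threshold: float = 1.0) -> bool:
    """Return True if the downstream neuron fires."""
    # 1. Vesicle storage: transmitter sits packaged in vesicles.
    stored = vesicle_count
    # 2. Signal release: an arriving spike dumps it into the cleft.
    released = stored
    # 3. Precise matching: only transmitter docking on a matching
    #    receptor opens ion channels in the downstream neuron.
    potential = released * 0.5 if receptor_match else 0.0
    # 4. Clean-up: enzymes clear the cleft, keeping the signal brief.
    released = 0
    return potential >= threshold

print(transmit(vesicle_count=4, receptor_match=True))   # True: enough transmitter binds
print(transmit(vesicle_count=4, receptor_match=False))  # False: no receptor match
```

Note how step 3 is the gatekeeper: without a receptor match, no amount of released transmitter activates the next neuron.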

By combining conduction within neurons with transmission across synapses, we can roughly assemble a "brain," but so far this brain only relays signals mechanically and shows no sign of plasticity. Such a brain could never develop intelligence. What makes intelligence possible is the reshaping of these connections: the physical ability to change their strength according to how often they are used. This is neuroplasticity.

1.2 Neuroplasticity#

It's time to understand neuroplasticity. It occurs at every stage of life, and it is not one-way (nor always beneficial): sometimes you wire up the wrong circuit, and sometimes connections you never use get cut away.

When you first try to learn a new concept (like memorizing an unfamiliar word), a specific neuron A sends a signal to neuron B. Initially this pathway is unfamiliar and inefficient; transmission meets resistance, and there may be very few connections. But as you rehearse the word again and again, neuron A repeatedly and frequently activates neuron B. This continuous repetition changes the activated neurons involved. Specifically:

  • Myelination: Strengthening existing connections. Neuronal axons are wrapped layer by layer with a lipid-rich substance called myelin. This substance can significantly increase the speed of electrical signal conduction in neurons.
  • New synapses: establishing new connections. The dendrites of neurons branch and grow, producing new dendritic spines. These newly formed tentacles explore their surroundings, seeking other axons and establishing entirely new connections that have never existed before.
  • Neurogenesis: the formation of new neurons. The germinal layer of the brain generates entirely new neurons through stem cell differentiation.

Ultimately, the brain optimizes into an extremely efficient network, constantly reinforcing the skills you use daily, making you increasingly proficient in them. Examples include social skills and motor skills. Of course, the brain also forgets. The other side of neuroplasticity is synaptic pruning—meaning unused skills are gradually forgotten. Infancy and adolescence are marked by continuous and efficient synaptic pruning, a process that slows down in adulthood. However, the brain continues to optimize this network formed during adolescence, an optimization that continues until death.
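The two opposing forces described above, strengthening with use and pruning with disuse, can be sketched in a few lines. The update rule, learning rate, decay rate, and pruning threshold are all invented for illustration:

```python
# Hebbian-flavored toy: connections strengthen with use, decay with
# disuse, and are pruned once they fall below a threshold. All rates
# are invented for illustration.

def update_weights(weights, active_pairs, lr=0.2, decay=0.05, prune_at=0.01):
    updated = {}
    for pair, w in weights.items():
        if pair in active_pairs:
            w += lr * (1.0 - w)   # rehearsed: strengthen toward a ceiling
        else:
            w -= decay * w        # ignored: gradually weaken
        if w >= prune_at:         # synaptic pruning below the threshold
            updated[pair] = w
    return updated

weights = {("A", "B"): 0.1, ("A", "C"): 0.1}
for _ in range(50):                      # rehearse only the A->B pathway
    weights = update_weights(weights, {("A", "B")})
print(weights)  # A->B approaches 1.0; A->C has been pruned away
```

After fifty rehearsals, the practiced pathway saturates near its ceiling while the neglected one drops below the pruning threshold and disappears from the network, mirroring "use it or lose it."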

Regarding neuroplasticity, there is a frequently cited example: London taxi drivers, whose hippocampi enlarge as they memorize the city map:

Taxi drivers rely on cognitive spatial maps for their livelihood, and a prominent study showed that London taxi drivers had larger hippocampi. Furthermore, a follow-up study compared hippocampal imaging before and after the years of grueling work and preparation for the London taxi driver's licence exam (described by the New York Times as the world's toughest exam). After this process the hippocampus had enlarged, but only in those who passed the exam.

1.3 Sensitive Periods - The Best Window for Learning#

We've all observed that children learn languages incredibly quickly, while adults struggle to absorb new information. This isn't limited to language; children show remarkable learning ability in almost every area, and many adults regret not having learned more at the right time. The intuition is correct. While our neurons change with our daily actions, these adjustments are ultimately subtle. In many brain regions, plasticity peaks only within a limited window, known as a "sensitive period." This window opens and peaks in early childhood, then gradually closes with age. For example, sensory areas reach peak plasticity around ages 1-2, followed by a gradual decline. The longest and latest sensitive period belongs to the frontal cortex, which begins developing during puberty and continues until around age 20.

One of the best illustrations of sensitive periods is the mastery of native-language sounds: every child is born able to quickly distinguish the phonemes of any language. Regardless of birthplace or genetic background, a few months of immersion in a language environment (monolingual, bilingual, or even trilingual) is enough for their hearing to adapt to the surrounding phonological systems. Adults can hardly do this. Most Chinese speakers, for example, end up speaking Chinglish their entire lives; Japanese speakers who move to English-speaking countries as adults may never learn to distinguish R from L; and speakers of some languages cannot produce the retroflex consonants of Indian languages. Compared to a child, an adult must expend tremendous effort to regain the ability to distinguish foreign speech sounds.

The examples above show the importance of sensitive periods. However, although the corresponding learning ability diminishes after a sensitive period closes, it does not disappear, and the degree of decline varies from person to person. We also see many people who missed the critical period but still learned English through considerable effort.

For example, experiments have shown that later-stage training can still rebuild language mastery (though not as easily as during the sensitive period). In one study, Japanese listeners were trained to recognize the R and L sounds in English. Three months after completing perceptual training, the Japanese trainees still performed well on the perceptual recognition task. Furthermore, native English-speaking American listeners rated the trainees' speech output before and after training, and again at a three-month follow-up: the trainees maintained long-term improvement in the overall quality, recognizability, and intelligibility of their English /r/-/l/ pronunciations.

1.4 Nutrition, Exercise and Sleep#

[Image: nutrition, exercise, and sleep]

It is certain that significant neural remodeling occurs during learning. But any remodeling takes time and is inevitably accompanied by high energy consumption. Such consumption demands ample nutrition and high-quality rest: in other words, the basics of diet, sleep, and exercise. Yet modern people generally neglect these factors, sowing the seeds of long-term trouble.

Let's first discuss the often-overlooked nutrients essential to a developing brain. One severely neglected substance is Omega-3 (found in fatty fish and other seafood, flaxseed, and so on), which a 2013 paper studied.

  • Omega-3 is an essential unsaturated fatty acid. Unfortunately, mammals cannot synthesize it themselves and must obtain it from food or supplements.
  • They are involved in a variety of physiological processes and have been reported to potentially help protect neurons against aging, damage, and neurological diseases such as Alzheimer's disease. Their effects on cognitive and behavioral function, as well as on a variety of neurological and psychiatric disorders, have also been confirmed.

Deficiencies are relatively easy to correct, but modern people face not only nutritional deficiency but also excess. The modern food industry drives us to overconsume sugar and fat. A 2002 research paper indicated that a high-fat, refined-sugar diet reduces hippocampal brain-derived neurotrophic factor (BDNF), neuronal plasticity, and learning ability.

Not only is our nutrition severely imbalanced; we also suffer from a serious lack of daily physical activity, itself a significant factor in neuroplasticity. A 2014 paper investigated this point specifically.

  • Exercise has a positive impact on cognitive function and can increase the level of brain-derived neurotrophic factor (BDNF), an important neurotrophic factor.
  • Physical exercise is closely associated with a reduction in various physical and mental illnesses. Extensive evidence suggests that physical exercise not only reduces the incidence of cardiovascular disease, colon cancer, breast cancer, and obesity, but also lowers the incidence of diseases such as Alzheimer’s disease, depression, and anxiety.
  • Multiple cross-sectional and longitudinal studies have confirmed the association between being overweight and poor academic performance. Aerobic capacity is also associated with cognitive ability and academic achievement.

Finally, sleep. Even if diet and exercise are badly out of balance, sleep at least is unavoidable; yet for many people its quality is dismal, which is itself a huge problem. Here are findings from papers published in 2014 and 2022:

  • Sleep is considered to play an essential role in brain plasticity.
  • Sufficient blood flow to the brain provides oxygen to active neurons and removes metabolic waste, thereby promoting neuroplasticity.
  • Sleep duration also affects the process of voluntary neural plasticity, as evidenced by the effects of sleep deprivation on task-related cerebral blood flow and cognitive function.

1.5 Summary#

Learning alters neural networks: a pathway is repeatedly activated, increasing synaptic efficiency, accelerating myelination and transmission, and even leading to new dendritic spines and connections; conversely, pathways that are not used for a long time are weakened or even disappear during “synaptic pruning.” This remodeling continues throughout life, but there are “sensitive periods” in different brain regions. However, adults do not lose their learning ability; they simply usually require higher-quality training, more repetition, and longer consolidation periods. Finally, neural remodeling is an energy-intensive and time-consuming biological process that requires us to provide a favorable environment for it.

Q: From a microscopic perspective of neuroscience, what is the essence of “learning”?


Changes in the physical structure of neuronal networks (i.e., neurons not only strengthen existing connections but also build new physical links).


Q: What are the differences in the transmission of neural signals “within” neurons and “between” neurons?

💡 Hint: Signal Conversion Process
  1. Within a neuron: electrical signals (conducted along the axon).
  2. Between neurons (at synapses): chemical signals (neurotransmitters).

Q: What are the four key steps involved in the transmission of nerve signals within the synaptic cleft?

💡 Hint: Four Steps of Synaptic Transmission
  1. Vesicle storage (encapsulating neurotransmitters).
  2. Signal release (neurotransmitters released into the synaptic cleft).
  3. Precise matching (neurotransmitters bind to receptors, like a key unlocking a lock).
  4. Clean up the battlefield (enzymes remove or recycle excess neurotransmitters).

Q: When we repeatedly and frequently activate a certain neural pathway (such as memorizing vocabulary), what three specific physical changes occur in the brain to optimize efficiency?

💡 Hint: Three Physical Changes from Repetitive Practice
  1. Myelination: Encapsulates axons, significantly increasing the speed of electrical signal conduction.
  2. New synapses: Dendrites grow and branch, establishing entirely new connections.
  3. Neurogenesis: Stem cells differentiate to produce entirely new neurons.

Q: What is “synaptic pruning”?


The opposite mechanism of neuroplasticity: the brain gradually weakens and removes neural connections that haven’t been used for a long time to optimize overall network efficiency.


Q: What core argument did the famous “London Taxi Drivers” experiment (Maguire’s study) prove?

💡 Hint: A Study of London Taxi Drivers

The adult brain still possesses structural plasticity (only in drivers who passed the test did the hippocampus, responsible for spatial memory, significantly enlarge).


Q: What are the “sensitive periods” in brain development?


These are limited time windows during which the plasticity of specific brain regions reaches its peak (such as language learning in early childhood). Learning efficiency is highest during this stage, and it gradually weakens with age but does not disappear completely.


Q: After adults miss the “sensitive period,” although their learning ability weakens, how can they still achieve neural remodeling?


Through higher-quality training, more repetition, and longer consolidation periods (as Bradlow’s research shows, adults can still improve their speech recognition through training).


Q: Neural remodeling is a high-energy-consuming process. What are the three major external physiological factors that support it?

💡 Hint: Three Key Elements
  1. Nutrition (raw materials).
  2. Exercise (catalyst).
  3. Sleep (cleanup and maintenance).

Q: How do physical exercise and a high-sugar, high-fat diet affect brain-derived neurotrophic factor (BDNF)?

💡 Hint: About BDNF
  • Physical exercise: increases BDNF levels and promotes cognitive function.
  • A high-sugar, high-fat diet: lowers BDNF levels, impairing neural plasticity and learning ability.

Q: What are the two key roles that sleep plays in neuroplasticity?

  1. Waste removal: Metabolic waste is cleared through cerebral blood flow.
  2. Consolidation and remodeling: This involves supplying oxygen to active neurons and consolidating the neural connections changed during wakefulness.

2. Classification of Memory: From Instantaneous Signals to Automatic Instincts#

If the previous chapter answered the question of “how memories leave traces in the brain,” then this chapter, related to memory, will answer the question—what types of memory exist? The vast majority of the information we encounter daily disappears. Only a very small fraction transforms from short-term memory into knowledge that can be retrieved days, years, or even a lifetime later. In this chapter, we will introduce, in turn, working memory, which can only last for a few seconds, and knowledge and skills that are stored long-term and even automatically executed.

2.1 Working Memory - Where Thinking Happens#

Now, look around at your surroundings and try to remember what they look like, then come back to this page. You'll find you've forgotten some of the text you were just reading. This isn't a lack of ability: working memory has limited capacity, and when attentional resources are reallocated, the information being maintained gets squeezed out. That is working memory.

Working memory holds a single, active thought for a few seconds. It relies primarily on the vigorous firing of many neurons in the parietal and prefrontal cortex, supported by neurons in other areas. It generally lasts no more than a few seconds, and the activity dissipates the moment attention shifts, so whatever was being held is immediately forgotten. Even as you read this sentence, your memory of the neuroplasticity chapter may already have faded, and you may have forgotten what a "synapse" is.

Working memory is generally considered to have an upper limit of 3-5 items, meaning most people can't hold more than 5 pieces of information, where each piece is an independent, unordered unit of knowledge. Yet most people can remember an 8-to-10-digit phone number at a glance, and network engineers can hold a 12-digit IPv4 address; these don't actually exceed 5 items, because several digits have been grouped together. This shows that for specific tasks, working memory can be effectively expanded through continued chunking. We won't elaborate here, but will return to this in later chapters.
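The chunking trick is easy to make concrete. A minimal sketch, assuming a 4-slot budget standing in for the 3-5 item limit (the slot count, phone number, and grouping are all invented for the example):

```python
# Chunking demo: the same 10 digits either as 10 separate items
# (over the working-memory budget) or as 3 chunks (well within it).
# The 4-slot budget stands in for the article's 3-5 item limit.

WORKING_MEMORY_SLOTS = 4

def fits_in_working_memory(items) -> bool:
    return len(items) <= WORKING_MEMORY_SLOTS

digits = list("5551234567")        # hypothetical phone number, digit by digit
chunks = ["555", "123", "4567"]    # the same digits grouped into 3 chunks

print(fits_in_working_memory(digits))  # False: 10 items overflow the slots
print(fits_in_working_memory(chunks))  # True: 3 chunks fit easily
```

The information content is identical; only the unit of storage changes, which is exactly why grouping lets a 10-digit number fit inside a 3-5 item budget.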

Working memory can be retained temporarily, but that does not amount to being "successfully written" into long-term memory. We will later use the famous case of HM to illustrate this: a person can retain working memory and learned skills, yet be almost unable to form new, lasting memories of experiences.

2.2 Explicit Memory - How Recallable Knowledge is Consolidated#

While you’re sound asleep, your brain is rapidly reviewing the information you’ve encountered during the day. This might include this article, which happened to leave a strong impression on you, marking it in your brain and subtly forming some neural connections. You’ll find yourself remembering some of the article’s key points when you wake up the next day. This is explicit memory; it’s stored in your brain as text, and you can review it whenever needed.

Unlike working memory, explicit memory does not rely on short, continuous neuronal firing for maintenance; its key is consolidation. During the day, the neuronal circuits that are repeatedly triggered become linked into memories. Then, at night while we rest, the brain repeatedly "replays" these patterns and simultaneously strengthens connections in the relevant cortical areas. Over time and with repeated replay, the memory's reliance on the hippocampus typically decreases while its reliance on cortical connections increases, which is why explicit long-term memories gradually become more stable and easier to retrieve.
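That gradual hand-off from hippocampus to cortex can be sketched as a toy model. The transfer rate and the simple two-store framing are illustrative assumptions, not a claim about actual consolidation dynamics:

```python
# Toy consolidation: each nightly "replay" shifts a fixed fraction of
# a memory's support from the hippocampus to cortical connections.
# The transfer rate and two-store framing are illustrative only.

def nightly_replay(memory, rate=0.15):
    transferred = memory["hippocampus"] * rate
    memory["hippocampus"] -= transferred
    memory["cortex"] += transferred
    return memory

memory = {"hippocampus": 1.0, "cortex": 0.0}
for night in range(30):
    memory = nightly_replay(memory)

# After a month of replays the memory barely depends on the
# hippocampus and rests mostly on cortical connections.
print(memory["cortex"] > memory["hippocampus"])  # True
```

The total support is conserved; what changes is where it lives, which is the sense in which a consolidated memory no longer needs the hippocampus to be retrieved.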

Explicit memory is therefore not a perfect replay of the past. Days later we may still remember a formula, a fact, or an important experience, but often we can no longer recall the specific details. Memory, even as it is preserved, is constantly being reconstructed, becoming part of our understanding of the world. [10]

2.3 Implicit Memory - How Skills Automate Unconsciously#

Finally, and hardest to grasp, is implicit memory. It can grow out of explicit memory as well as out of direct action or thought. It is our memory for skills and behaviors, but it doesn't exist as specific facts or events; it generally shows up as "whether or not one can do it." Consider riding a bicycle, typing, tying shoelaces, or using a tool: as we repeat these activities again and again, neurons in the cortex and in subcortical circuits gradually tune themselves, making the routine more fluent each time it is used. Neural firing becomes more efficient and reproducible, redundant activity is pruned away, and the circuit runs like clockwork, accurate and precise. Once mastered, these abilities can often be performed with almost no thought.

Unlike explicit memory, implicit memory is usually hard to put into words. We can complete an activity, but we can't pinpoint exactly how each step was done. This type of memory is formed not by recalling experiences but by repeated practice and feedback, gradually solidifying through long-term use. As a form of storage, implicit memory is extremely stable: slow to acquire, but once established not easily erased, and it can be quickly reactivated even after years of disuse. This explains why some skills, once learned, remain "remembered" after long periods of inactivity.

In the formation of implicit memory, other memory systems often play a supporting role in the early stages. For complex skills, the initial learning phase typically relies on working memory to maintain operational steps and explicit memory to understand rules and goals. However, with repeated practice, this explicit control gradually weakens, and the execution of the behavior begins to be dominated by more automatic neural circuits. Of course, implicit memory does not necessarily have to go through an explicit learning stage. Some simple skills can be formed directly without conscious involvement.

Note: Implicit memory is not only reflected in physical actions; it also includes highly automated thought processes that require almost no conscious involvement. Furthermore, not all memories become implicit, but some can be "programmed" through repeated use. For most adults, for example, the result 3 × 4 = 12 is retrieved directly, without recalling or reciting the multiplication rule. While this process may originate from explicit learning, after repeated use its retrieval exhibits implicit characteristics. [11]

2.4 The Famous Patient HM - "Being Able to Remember ≠ Learning"#

Finally, let’s look at a classic case study of patient HM, which has been studied countless times in the field of memory research. This will hopefully enhance our understanding of these different types of memory.

Henry Molaison underwent surgery for severe epilepsy, which removed or severely damaged both medial temporal lobes (including structures such as the hippocampus). Postoperatively his epilepsy was brought under some control, but he developed typical and severe anterograde amnesia: from then on he was almost unable to form new memories that could be retained long-term and consciously recalled later. [12]

Importantly, HM’s memory function was not impaired in all aspects. His overall intelligence test scores were close to normal, and his language and existing knowledge were relatively preserved. In the laboratory, his working memory could be maintained normally within short time windows—as long as the information remained “online” with attentional support, he could repeat or manipulate it; however, once attention was interrupted, the information quickly dissipated, and he was unable to recall what had just happened. This indicates that the brief maintenance of working memory does not equate to the memory being fully written.

More crucial evidence comes from skills learning. Corkin’s summary of the research shows that in procedural tasks such as “mirror tracing,” although HM claimed he had never done the task before and could not recall the training process, his performance steadily improved with practice, and his error rate decreased by 12% each day. This indicates that he was still able to acquire new procedural (implicit) memories, meaning that “being able to do” skills can be gradually solidified in the absence of explicit memories.

In summary, HM clearly supports the conclusion that memory is not a single system: procedural learning can continue to occur through relatively independent circuits. [13]

2.5 Summary#

| Memory stage | Duration | Stability | Main function |
| --- | --- | --- | --- |
| Working memory | Seconds | Extremely unstable | Temporarily maintain the current thought content |
| Explicit memory | Days to a lifetime | Highly stable | Store facts, concepts, and rules |
| Implicit memory | Weeks to a lifetime | Extremely stable | Automated skills and patterns of thought |

Q: Into which three categories does this article divide the human memory system?

  1. Working memory: the momentary retention of thoughts.
  2. Explicit memory: Facts and experiences that can be consciously recalled.
  3. Implicit memory: Automated skills and habits acquired unconsciously.

Q: What are the physiological mechanisms and capacity limitations of “working memory”?

💡 Hint: About Capacity and Mechanism
  • Mechanism: It depends on the continuous active firing of neurons in the parietal and prefrontal cortex (which dissipates once attention is diverted).
  • Capacity: Typically can only hold 3-5 individual units.
  • Optimization: Scattered information can be merged through chunking, thereby improving memory efficiency without increasing the number of units.

Q: How does “explicit memory” transform from a short-term signal into long-term storage (physiological process)?

💡 Hint: Difference from Working Memory

It relies on a consolidation process. The brain (especially during sleep) replays neuronal patterns, gradually shifting memory storage from dependence on the hippocampus and solidifying it in connections in the cerebral cortex, thus eliminating the need for continuous neural firing to maintain it.


Q: Why is explicit memory described as “reconstructive” rather than “replayable”?


Explicit memory is not a perfect recording of the past. When retrieving a memory, the brain reconstructs details based on current understanding; and as time passes, memories are constantly modified and reintegrated, causing details to become blurred or distorted.


Q: What are the core characteristics of “implicit memory” and how is it formed?

💡 Hint: Ineffable, the body knows
  • Characteristics: non-declarative (difficult to describe clearly in language), highly automated (requires almost no conscious involvement), and highly stable (not easily faded).
  • Formation: Through repeated practice and feedback, the efficiency of neural circuits is adjusted, redundant activities are pruned, until the behavior becomes “programmed.”

Q: How does the formation of implicit memory typically reflect the transformation “from explicit to implicit”?


In the early stages of learning complex skills, it usually relies on working memory (maintaining steps) and explicit memory (understanding rules); with repeated practice, explicit control weakens, automatic neural circuits become dominant, and eventually transform into implicit memory that requires no conscious thought (e.g., an adult can directly arrive at 3 x 4 = 12 without memorizing a multiplication table).


Q: In the case of patient HM (hippocampus removed), which two memory abilities were retained? What core conclusion does this prove?

💡 提示:Famous Case HM
  • Retained abilities:
    1. Short-term working memory (as long as attention is not interrupted).
    2. New implicit/procedural memories (such as “mirror tracing” skills improve with practice, even though he has no recollection of practicing them).
  • Conclusion: Memory is not a single system. Explicit memory (dependent on the hippocampus) and implicit memory (dependent on other circuits) are independent physiological mechanisms.

3. The Brain Is a Prediction Machine: Learning Must Begin with Making Mistakes#

The academic community proposed as early as 30 years ago that the brain is a prediction machine, and the theory received rigorous scientific validation around 2010. Yet this revolutionary understanding has still not been universally accepted, and our view of memory remains flawed. This cognitive bias not only leads us to misunderstand the brain but also obscures the true function of memory, and thus distorts our definition of learning.

3.2 The Brain That Predicts All the Time#

Let me first assume that you, as a reader, read titles carefully. When you saw the title of this section, your brainwaves likely spiked, because it contradicts your expectation: section numbering should start at 1, so this section should be 3.1, not 3.2. Of course, the brain quickly interprets this as "just a minor oversight by the author" and forgets it within seconds.

In fact, prediction does not happen only when you read a heading; it happens at every moment of daily life. A classic paper from the 1980s demonstrates this: two cognitive scientists presented sentences such as "My coffee has cream and a dog" to participants word by word while recording their brain activity. Normally the brain automatically predicts that the final word will be something like "sugar," but here it was swapped for "dog." When subjects reached the unexpected word, their brain waves surged, peaking about 400 milliseconds after the stimulus, a specific pattern known as the N400 effect. This strong signal does not arise because the sentence is funny or semantically absurd; it arises because the word breaks the brain's prediction. 14

In addition, we can see what the brain predicts in non-verbal ways. Here’s a classic example—visual illusion:

(Image: visual illusions)

Visual illusions occur because the brain’s predictions have a higher interpretive weight than the body’s sensory input. In other words, when there is a discrepancy between sensation and prediction, the brain is more likely to believe the prediction.
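The claim that prediction can outweigh sensation is often expressed as precision-weighted fusion, a standard idea in Bayesian models of perception. The sketch below is illustrative only; the numbers and variances are made up, not taken from any study.

```python
# Precision-weighted fusion of prediction and sensation (illustrative numbers).
# The posterior mean is a precision-weighted average: the more confident
# (lower-variance) signal dominates the final percept.

def fuse(prior_mean, prior_var, sense_mean, sense_var):
    """Combine a predicted value and a sensed value, each with its own uncertainty."""
    w_prior = 1.0 / prior_var   # precision of the brain's prediction
    w_sense = 1.0 / sense_var   # precision of the raw sensory input
    return (w_prior * prior_mean + w_sense * sense_mean) / (w_prior + w_sense)

# Brightness of square A: suppose the retina reports 120, but the shadow model
# confidently predicts 180 (low variance). The percept is pulled toward 180.
percept = fuse(prior_mean=180, prior_var=4, sense_mean=120, sense_var=64)
print(round(percept, 1))  # → 176.5, far closer to the prediction than to the raw input
```

With the variances reversed (a confident sensor, a vague prediction), the same formula would let sensation win, which is why the balance is described as a weighting rather than a veto.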

In the image on the left, although your eyes tell you that block A is darker than block B, their physical colors are actually exactly the same. This is because your brain not only perceives color but also predicts light and shadow. It “anticipates” that the cylinder will cast a shadow, and automatically compensates for the brightness so that you can see the object clearly even in the shadow. From this optical illusion, we can conclude that your brain doesn’t care about the raw light wave data; it only cares about what it perceives as the “real” world.

The image on the right illustrates the uncertainty of the brain’s predictions. When you predict it’s a rabbit, the long strip becomes the ears; when you predict it’s a duck, it becomes the beak. Interestingly, it’s difficult to see both simultaneously. This shows that the brain can only maintain one primary predictive model at a time and filters the visual information you receive based on that model.

Therefore, the brain is clearly not a device that strives to "faithfully reflect the world." The world we perceive is largely a construction: the brain's predictions, fitted to and filtered through the incoming sensory data.

3.2 Memory as a Predictive Model for the Future#

Before proceeding, ask yourself a question—what do you expect to see at the beginning of this section?

After a section numbered 3.2 that should have been 3.1, you may already be predicting that I will open this section, too, with examples that reinforce your understanding of how the brain predicts. Or perhaps you saw the section title and guessed that I am about to argue that "we make a series of predictions based on our memories." Congratulations, you are right. This confirms that we constantly use past memories to predict the future: "summaries generally open with a certain format," "other articles are written this way, so this one surely will be too," "the title largely summarizes what the author wants to say in this chapter."

In the past, people believed that memory served the past: a faithful record of what has happened. However, a growing body of research points out that the core value of memory lies in future-oriented prediction. Bar et al., in a 2013 response paper, proposed that the primary function of the memory system is to help organisms prepare for future action. The brain recombines accumulated fragments of the past and simulates future scenarios to guide current decisions. The hippocampus, a key structure responsible for memory formation and retrieval, is also strongly activated when we look to the future, indicating a close neural link between imagining the future and recalling the past. Our brains, in other words, store past events in order to better predict the future. Whether it is working memory maintaining information about the current situation, long-term memory providing vast background knowledge, or implicit learning unconsciously accumulating patterns, these memory processes all serve prediction, helping us cope more effectively with environmental change.

Even stronger evidence comes from another neuroscience study in 2007, which found that when the hippocampus, a key region responsible for memory formation and retrieval, is damaged, people are not only unable to recall the past, but also unable to construct coherent future scenarios. This finding further suggests that the memory system functions as the basis for providing contextualized predictions of future actions. 16

Ultimately, memory is the brain’s survival strategy for coping with the future by simulating and reorganizing historical parameters. It is not like the storage medium we are familiar with.

3.3 There is no unknown information, only incorrect predictions.#

Imagine you see an extremely strange-shaped, never-before-seen object on the street. Would your brain immediately display a “404 Not Found” blank box? Or when you see an extremely complex physics formula, would you really feel like you know nothing about it?

Absolutely not. Your brain will immediately start working like crazy: “It looks a bit like a giant durian, or a spiky alien spaceship, or a modern art sculpture.” Or your brain will start analyzing the symbols in physics formulas, assuming V represents velocity and T represents time. See, even when faced with something completely unknown to you, your brain is still making predictions.

Neuroscience tells us that the brain has an extreme aversion to chaos and uncertainty. Facing a completely unknown concept is extremely dangerous and energy-consuming for the brain. Therefore, the brain’s strategy is to never leave the prediction blank. No matter what novel stimulus it encounters, the brain will quickly retrieve the most similar model from its memory bank and forcefully apply it to the current situation—even if the prediction is very far-fetched.

This is why we see human faces in clouds and Bigfoot in photos of Martian rocks. This phenomenon is called pareidolia.

A 2014 study revealed the mechanism: when we see a blurry image, our brains do not passively wait for it to become clear; instead, the frontal lobe forcibly "fires" a top-down predictive signal: "It's a face!" The brain prefers to construct an illusion through a false prediction rather than face pure uncertainty. 17
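That top-down "It's a face!" signal can be caricatured as a Bayesian decision in which a strong prior overrides weak evidence. The probabilities below are invented purely for illustration.

```python
# Pareidolia as a MAP (maximum a posteriori) decision: even when the evidence
# only weakly resembles a face, a strong prior toward faces wins the comparison.

def map_percept(likelihoods, priors):
    """Pick the hypothesis with the highest unnormalized posterior."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

likelihoods = {"face": 0.2, "random blob": 0.8}   # the image barely looks like a face
priors = {"face": 0.9, "random blob": 0.1}        # but the brain expects faces everywhere

print(map_percept(likelihoods, priors))  # face (0.2 * 0.9 = 0.18 beats 0.8 * 0.1 = 0.08)
```

The point of the toy is only that the percept is decided by prior times evidence, so a sufficiently strong prior "constructs the illusion" even from poor data.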

This principle offers tremendous insights for our learning.

We often think of learning as a process of “going from nothing to something”. But from the perspective of predictive processing, learning is actually a process of “going from wrong to right”.

When you first learn about “electrons” in quantum mechanics, you can’t understand them out of thin air. Your brain will recall old predictive models: you’ll imagine an electron as a tiny ball (planet) orbiting the sun (atomic nucleus). This is clearly a physics error, because the nature of electron clouds is far more complex. But it is precisely this erroneous prediction that becomes the scaffold for understanding new knowledge. The subsequent learning process is essentially a process of constantly discovering that the “ball model” cannot explain diffraction phenomena (prediction error), thus gradually repairing and even overturning the old model.

This also explains why adults often find it harder than children to learn new concepts. Children's predictive models are weak; their priors are relatively "flat." Adults, by contrast, possess large and robust predictive models. Faced with new information, the brain's first reaction is often to "assimilate" it into old models, ignoring the subtle differences. Psychology calls this functional fixedness: because you are so certain the box is for holding things (the predictive model is too strong), you cannot see that it could also be pinned to the wall as a candlestick.

Therefore, the real obstacle to learning is often not that the information is too new, but that our old predictions are too strong: old knowledge can hinder the learning of new knowledge.

In this sense, there is indeed no absolutely “unknown information” in the world; all “unknowns” are essentially “erroneous predictions” waiting to be corrected.

3.4 Conditions for Model Update - Prediction Error#

What happens when predictions deviate from reality? — Hey! It’s time to update our predictive model! What? You don’t want to update it? Okay, maybe your current model is good enough for you. But there’s always a time when models need updating, which is when predictions go wrong and cause serious consequences. After all, you don’t want to cause serious consequences again.

—What? You’re saying you updated the model on how to kneel down and apologize, instead of how to avoid repeating the mistake?! Oh my god…

The joke above illustrates how, when our predictions fail and have significant consequences, we transmit error signals to higher brain regions, triggering updated models. A 2018 paper proposed the theory of predictive coding, which states that different levels of brain regions build internal models based on past experience, continuously predicting perceived input and comparing these predictions with actual input. When predictions do not match reality, a “prediction error” signal is generated and transmitted upwards, prompting the brain to adjust its internal models to reduce future errors. Furthermore, neural circuits at various levels of the brain can suppress explainable discrepancies, treating only unexplainable errors as valuable new information. In other words, we actively use our memories to measure reality. For example, when reading, we continuously predict the next word based on language knowledge in long-term memory, focusing our attention on unexpected information.

From this, we can conclude that learning is triggered by prediction failure (Philippe, 2006). If everything goes as expected, the brain doesn’t learn anything new. When predictions match, the brain feels neither surprised nor pleased, and neural activity remains normal; we only learn when predictions fail. The classic conditioned reflex experiment (Miami Symposium on the Prediction of Behavior) also reveals this: if a dog has learned that a bell means food, and then a light simultaneously appears with food, the dog can hardly form a new association with the light because the food reward was perfectly predicted by the bell—there is no prediction error, and therefore learning is “blocked.” This famous “blocking experiment” shows that there is no new learning without surprise: learning requires unexpected results, or in other words, the “brain makes a mistake.” Theoretically, modern learning models (such as the Rescorla-Wagner model) indicate that the driving force of learning is the difference between expectation and outcome. Each training session gradually reduces the prediction error until the expectation matches reality. Only at the moment of discrepancy does the brain truly become busy updating the relevant connections. 19 20
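The Rescorla-Wagner rule mentioned above is simple enough to simulate directly. The following minimal sketch (learning rate and trial counts chosen arbitrarily) reproduces the blocking effect: the pre-trained bell absorbs all the prediction error, so the light never acquires an association.

```python
# Rescorla-Wagner: learning is driven by prediction error (reward - expectation).
# Phase 1 trains bell -> food; phase 2 pairs bell + light with food. Because the
# bell already predicts the food perfectly, the error is ~0 and the light learns
# nothing: the blocking experiment in code.

LR = 0.3  # learning rate (alpha * beta), chosen arbitrarily

def train(weights, trial_cues, reward, n_trials):
    for _ in range(n_trials):
        prediction = sum(weights[c] for c in trial_cues)
        error = reward - prediction          # the prediction error
        for c in trial_cues:
            weights[c] += LR * error         # update shared across present cues

w = {"bell": 0.0, "light": 0.0}
train(w, ["bell"], reward=1.0, n_trials=50)           # bell alone -> food
train(w, ["bell", "light"], reward=1.0, n_trials=50)  # bell + light -> food
print(f"bell={w['bell']:.2f}, light={w['light']:.2f}")  # light stays near 0: blocked
```

Deleting the first `train` call (no pre-training) would let the bell and light split the association instead, which is exactly the "no surprise, no learning" contrast the blocking experiment demonstrates.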

Of course, besides the prediction error itself, the brain also weighs whether an update is worth it; rewiring neural circuits is energy-intensive. In learning, this usually shows up when a prediction failure carries significant consequences (such as a disastrous exam result), though the resulting drive to update may last only a few days. The question of "worth it" is closely tied to dopamine, a neurotransmitter familiar to many. An experiment illustrates this (Philippe Tobler). Dopamine neurons adjust their activity according to the prediction error: they fire a burst when the actual reward is better than expected, and they fall silent when the reward is worse than expected. Whether it is a dopamine burst or dopamine silence, these signals guide synapses to make plastic changes, "writing" new experience into the brain at a physiological level. Now suppose you take a multiple-choice exam whose results are released immediately afterward. If your actual score is higher than expected, the outcome may reinforce your pre-exam behaviors (such as studying diligently, or praying to the gods); if it is lower than expected, it may weaken them (such as playing video games). 19

In this section, we discussed a principle—when a prediction error causes serious consequences, it triggers neural shaping circuits (at which point learning begins). This is our core learning principle, so please remember it.

3.5 Why can the brain simultaneously believe in contradictory theories?#

As mentioned earlier, the first step in learning (model updating) begins when a prediction error incurs a significant subjective cost; only then will the brain be willing to expend considerable energy to initiate the process of “finding a new model.” However, certain conflicts can also arise during the search for a new model.

First, let’s look at a very basic learning scenario. Why is it relatively easy to correct a misconception in a physics class? Suppose a student firmly believes that “heavier objects fall faster than lighter objects” (this is an old model based on intuition). When the teacher releases a feather and an iron ball simultaneously into a vacuum tube, and the student sees them fall at the same time, the brain instantly receives an undeniable “surprise signal.”

In this scenario, the prediction error is plainly visible. The actual data (the feather and the ball land simultaneously) completely contradicts the prediction (the iron ball lands first), instantly disproving the old model. Because the error is so obvious, the brain has no choice but to quickly search for and accept the new model that can explain the phenomenon: Newtonian mechanics. This moment when the "old model completely fails" is precisely when learning is most efficient.

However, once we step outside cleanly controlled scientific settings (assuming you truly understand what science is), the learning process shows many more rough edges. The new model we find often cannot decisively win its debate with the old one.

Take an economics student as an example. Initially, through long, hard study, he built a complete Keynesian model in his mind (government intervention, output determined by aggregate demand). The model served him well: it explained the Great Depression and got him through the final exam. Later, however, he encountered post-Keynesianism. The new theory told him that his original understanding was incomplete, that money is endogenous, and that uncertainty is fundamental.

At this point, the old model (Keynesianism) has not been completely disproven, and it can still explain many phenomena; however, the new model (post-Keynesianism) seems to offer a deeper perspective in certain situations. Faced with this predictive error that is “neither completely right nor completely wrong,” the brain is caught in a dilemma.

What's the result? Your brain performs a "partitioning operation." It carves out two areas: one runs the "standard Keynesian model," the other runs the "post-Keynesian model," each reserved for its own occasions. In neuroscience, this is called the coexistence of multiple models. The brain must now spend extra energy deciding which "context" it is in, and therefore which model to use: answering a standard exam question, you might deploy the Keynesian model; analyzing a phenomenon the textbook account cannot explain, you might switch to the post-Keynesian one.

A 2001 paper supports this argument. The MOSAIC model, proposed by computational neuroscientists Wolpert and Kawato, suggests that multiple modular ‘mini-experts’ operate within our brains, competing for roles based on context. A 2010 study by Gershman et al. at Princeton University further revealed the algorithmic logic of this mechanism: when the prediction error becomes large enough, the brain performs ‘state splitting’—it no longer attempts to correct the old model but directly determines that it has entered a completely new ‘latent state,’ and the brain creates a new memory partition for this purpose. This explains why the student’s brain could simultaneously hold two contradictory economic theories: like a dual-system computer, the brain uses the hippocampus’s ‘pattern separation’ function to physically isolate these two models in different neural circuits. 21 22
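A heavily simplified sketch of this "state splitting" idea (in the spirit of, not a reimplementation of, the Gershman et al. account): small errors update the current model, while an error beyond some threshold spawns a new model, and later observations are routed to whichever stored model fits best. All values and thresholds are arbitrary.

```python
# Toy "state splitting": instead of overwriting the old model when reality
# diverges sharply, the learner creates a new model (a new memory partition)
# and afterwards context-switches between the coexisting models.

LR, SPLIT_THRESHOLD = 0.2, 0.5  # arbitrary learning rate and surprise threshold

class Learner:
    def __init__(self):
        self.models = [0.0]   # each "model" is just a predicted value here
        self.current = 0

    def observe(self, value):
        errors = [abs(value - m) for m in self.models]
        best = min(range(len(self.models)), key=lambda i: errors[i])
        if errors[best] > SPLIT_THRESHOLD:
            self.models.append(value)            # too surprising: new latent state
            self.current = len(self.models) - 1
        else:
            self.current = best                  # context switch to the matching model
            self.models[best] += LR * (value - self.models[best])

agent = Learner()
for v in [0.1, 0.1, 0.9, 0.9, 0.1]:
    agent.observe(v)
print(len(agent.models))  # → 2: the old and the new model coexist
```

Note that the final observation (0.1) does not destroy the 0.9 model; the agent simply switches back to the old partition, mirroring the economics student who keeps both theories.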

A similar situation exists with the clash between science and pseudoscience—with the development of the internet, a group of pseudo-scientists, pseudo-scientific viewpoints, or conspiracy theories have gradually gained dominance in public opinion. This poses a huge challenge for a highly educated person. At work, they must use rigorous logic and scientific thinking (Model A); but when scrolling through short videos or social media, they are attracted to highly emotionally charged but unscientific viewpoints (Model B). If the brain cannot effectively “zone” these two models, they will clash. The final result is like the economics student mentioned above: during the day at work, he might be a staunch defender of science; but after work, back in his personal life, to fit in or seek psychological comfort, he will instantly switch to “anti-intellectual mode,” forwarding pseudo-scientific articles. To maintain internal stability, the brain completely physically isolates these two mutually exclusive worlds.

Therefore, the prerequisite for finding a new model is that the old model’s predictions fail. Learning a new model does not require you to completely reject the old model; you can hold both simultaneously.

Also, please remember that “failure” here is a subjective concept—it means “the inability to explain your current situation using the old model”.

  • In physics class, the old model could no longer explain the vacuum tube phenomenon (it completely failed, leading to its replacement).
  • In economics, the old model fails to explain what the new theory can (partial failure leads to coexistence);
  • In real life, the old model cannot explain emotional needs (situational failure leads to disconnect).

3.6 Model Update - Patching or Refactoring#

Let’s say you’ve already decided to update your model. However, you’re often faced with two paths: a complete, energy-intensive overhaul, or a low-energy patching approach. Neuroscience tells us that the brain is an extremely shrewd energy manager. It doesn’t care whether the model is “objectively correct,” but only whether it can eliminate current prediction errors with minimal energy consumption—so most of the time we choose patching, but sometimes we opt for a complete overhaul.

A 2012 study found that when the environment is stable (small errors), the brain's "learning rate" is low, tending to ignore deviations or make minor adjustments (patching). When the environment changes abruptly (drastic errors), the pupils dilate (marking the release of norepinephrine), the learning rate spikes, and the brain quickly discards old values and accepts new ones (a complete shift). 23
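The patch-versus-overhaul trade-off can be caricatured as a learning rate that scales with surprise. This is not the model from the 2012 study, just an illustrative sketch with arbitrary numbers: small errors nudge the belief (patching), while a drastic error pushes the rate toward 1, largely discarding the old value in one step (reconstruction).

```python
# Surprise-scaled learning rate: the bigger the prediction error, the larger
# the fraction of the old belief that gets thrown away.

def surprise_rate(error, scale=2.0):
    """Learning rate grows from ~0 toward 1 with the size of the error (capped at 1)."""
    return min(1.0, abs(error) / scale)

belief = 10.0
for observation in [10.2, 9.9, 10.1, 30.0]:   # a stable stretch, then a sudden change
    error = observation - belief
    belief += surprise_rate(error) * error    # small error -> small patch; huge error -> reset
    print(round(belief, 2))
```

During the stable stretch the belief barely moves; the final, drastic observation drives the rate to 1, so the belief jumps essentially all the way to the new value.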

Therefore, when errors occur in a stable environment, we attempt to explain the current anomalies by adding additional conditions (auxiliary assumptions) while keeping our original beliefs unchanged. This is called “patching”.

The most classic example is the geocentric model in astronomy. The ancients firmly believed that the Earth was the center of the universe (the core old model). Observational data, however, showed that Mars sometimes "retrogrades," flatly contradicting the old model's predictions. Astronomers did not abandon geocentrism; instead they bolted a complex "patch" onto the model: the epicycle. Mars, they explained, not only revolves around the Earth but also travels along a smaller circle of its own superimposed on that orbit. If one small circle could not explain the data, they added another. By increasing the model's complexity, they forcibly eliminated the prediction error and protected the core belief from collapse.

Of course, this kind of patching is more common in everyday life; it’s an instinct of the brain to maintain continuity. It might be used to “maintain” science—for example, if experimental data doesn’t match, scientists might think, “Is the instrument broken?” or “I haven’t eliminated a confounding variable” (patching up the theory) rather than overturning physical laws. It might also be used to maintain superstition—when rainmaking fails, believers might think, “Was my heart not sincere enough?” or “Were the offerings not abundant enough?” (patching up their faith) rather than immediately denying the existence of God.

The brain doesn’t need to change the overall topology of the neural network; it only needs to adjust a few parameters or store a few “exceptions” in the hippocampus. This is very energy-efficient and allows us to quickly eliminate anxiety. However, when there are too many “patches” to make the model bloated, or when the prediction error is too large to be patched, the brain may tend to—reconstruct.

Just as the geocentric model had dozens of patches, Copernicus discovered that simply assuming “the Earth revolves around the Sun” instantly explained all retrograde phenomena. Thus, the brain, in pursuit of simplicity, abandoned the geocentric model. Similarly, a person burdened by the pressures of modern life and feeling nihilistic might find that science cannot explain their suffering (prediction error). A religion or pseudoscientific theory (even if it seems absurd) can instantly explain all their experiences with just one core assumption—“everything is predetermined.” At this point, the brain, in pursuit of this “explanatory power,” will also abandon complex rational thinking and completely succumb to superstition.

Completely abandoning the old neural circuits corresponds to structural learning. This requires suppressing old neural circuits and growing entirely new synaptic connections. This is a high-energy-consuming process, usually accompanied by intense emotional upheavals—whether it’s the ecstasy of a scientist discovering the truth or the fervor of an ordinary person going astray, it is essentially the brain undergoing a dramatic model reconstruction.

3.7 Summary#

In previous chapters, we discussed neuroplasticity and various types of memory. Now we can see how they affect the brain’s predictive function. Working memory constantly records key information about the current situation, allowing the brain to instantly calculate and predict the next changes. Explicit memory provides the material and basis for predictive models, allowing us to infer the “new” from the “old” experience. Implicit memory works silently; the rules and skills we unconsciously learn (such as language grammar or the sense of balance on a bicycle) manifest as an instinctive intuition, enabling us to make timely and accurate predictions about environmental changes. All these forms of learning rely on the foundation of neuroplasticity—the strengthening or weakening of countless neural connections, which shapes the constantly updated models within the brain. When prediction errors occur, these connections are further adjusted and consolidated through error feedback, making the relevant memories persistent circuits. Ultimately, the brain shapes one or more predictive models in an intricate neural network, allowing us to steadily navigate the unknowable future.

Q: What does this article consider to be the essential mechanism by which the brain processes information? (How does it differ from the traditional view of “reflecting the world as it is”)

💡 Hint: Core Metaphor

The brain is not a passive recorder, but an active predictive machine.

  • Mechanism: The brain constantly “predicts” sensory input based on internal models, focusing only on deviations that do not conform to the predictions.
  • Evidence includes the N400 effect (a surge in brain activity when a language prediction is violated) and visual illusions (the brain modifies the colors or shapes it sees to conform to a model).

Q: From a predictive processing perspective, what is the primary biological function of “memory”?


Serving the future, not recording the past. The brain reassembles fragments of the past to simulate future scenarios and aid in decision-making.

  • Evidence: Patients with hippocampal damage are not only unable to recall the past, but also unable to construct coherent images of the future.

Q: What strategy does the brain typically employ when faced with completely unfamiliar new things (“unknown information”)?

💡 Hint: About the Unknown and Pareidolia

The brain dislikes blank spaces and will forcefully invoke old models for “assimilation.” It will retrieve the most similar old experiences and apply them to the current thing (e.g., imagining electrons as planets, or seeing a face in the clouds). Conclusion: Learning is not “from nothing to something,” but rather a process of “from wrong to right” (i.e., correcting incorrect predictions).


Q: Neuroscience believes that the only necessary and sufficient condition to trigger “learning” (neural circuit updates) is?

💡 Hint: Core Learning Conditions

Prediction error.

  • Principle: The brain only adjusts its activity and drives synaptic changes when reality does not match expectations (feeling surprised or making a mistake).
  • Counterexample: If the prediction matches perfectly (as in a blocking experiment), the brain will not learn anything new.

Q: When old and new models conflict but cannot be completely falsified (such as economic theories with different explanatory power), what strategies does the brain use to avoid cognitive collapse?

💡 Hint: Coexistence of Contradictory Views

State splitting (or partitioning). The brain utilizes the hippocampus’s “pattern separation” function, like a dual-system computer, switching between different models according to the context (e.g., using model A in an exam and model B in daily life), thus allowing contradictory theories to be physically isolated and coexist in the brain.


Q: When updating a model, does the brain tend to “patch” or “reconstruct”? What is the difference?


The brain follows the principle of minimizing energy consumption, and prefers patching.

  • Patching: retain the core belief and add auxiliary hypotheses to explain anomalies (such as the geocentric model's "epicycles"); this is energy-efficient but can leave the model bloated.
  • Reconstruction: When the error becomes too large to be repaired, the old model is completely overturned and a new connection is established (such as switching to the heliocentric model), which is extremely energy-intensive and accompanied by emotional fluctuations.

4. From Novice to Expert: How Complex Models Are Built Step-by-Step#

Everyone has predictive ability, but it varies from person to person. Anyone can predict the answer to a simple problem, yet most people clearly do not believe they can accurately predict the answers to advanced mathematics problems.

The core task of this chapter is to explain how we move from “incorrect predictions” to “accurate predictions,” and to understand how a complex predictive model is built step by step. From a neuroscience perspective, “learning” is essentially about building more complex generative models in the brain by continuously correcting predictive biases. When we can readily answer a complex problem, it means our internal model has perfectly predicted every step of the deduction, minimizing unexpected outcomes.

4.1 Chunking and Long-Term Models Help Experts “Remember More”#

There is a classic case study in this field—it was discovered that chess masters can often memorize complex piece positions at a glance, while novices often cannot. In the early 1970s, researchers began studying this to understand how chess grandmasters could remember piece positions so accurately.

Researchers posed a simple question: do chess grandmasters recall the position of every single piece, or do they actually remember the overall structure of the board, treating individual pieces as parts of a larger whole? They ran a simple yet effective experiment, setting up two chessboards for national-level players (i.e., grandmasters), intermediate players, and beginners. On one board, a position from a real game was laid out; on the other, the pieces were arranged at random.

When a chess grandmaster sees a real chess game, after five minutes of study, they can remember the positions of about two-thirds of the pieces, while a novice can only remember about four. When they see a board with randomly placed pieces, both intermediate and advanced players (novices are still at a disadvantage) can only remember the positions of two or three pieces. At this point, the advantage of experienced players disappears. For a grandmaster, upon seeing a real chess game, the brain quickly invokes prior models for prediction. The positions of the pieces seen highly match the brain’s predictions. Therefore, the brain doesn’t need to record the coordinates of every piece; it only needs to record the information “this is a variant of the Sicilian Defense” and then perform a little deduction. 24

A similar experiment can be run right now. Since every reader of this article is a fluent reader, please try to remember the following sequence of words:

  • them one among am are but not many of I people there them

Take a few seconds to look at these words, then close your eyes and try to recall the order in which they appear.

This task is not easy. Although you are thoroughly familiar with every word, their random arrangement has no grammatical structure or semantic connection, so you cannot predict the next item. You will often retain only two or three of them, or get completely muddled. Some readers may resort to a trick, such as inventing a mnemonic, which certainly works, but you will hardly call the line above easy to remember. The next set of material is the opposite; you can probably memorize it in a few seconds.

Now please look at another set of materials:

  • We are studying the working mechanisms of human memory.

This time, even though there are significantly more words, you will find you can repeat almost all of them. When your brain reads "we," it automatically predicts, from grammatical convention, that the next word is likely "are" or "were"; when your eyes then see "are," the visual input matches the prediction perfectly, and the brain barely spends any energy processing it. In the first example, by contrast, you could only struggle to hold on to individual words.

Based on these experiments, we find that the difference between chess masters and novices lies not in working-memory capacity, but in the fact that masters hold a vast stock of "board patterns" in long-term memory. Masters automatically merge the positions of many pieces into a few meaningful memory units, which in turn support more complex predictions. Novices cannot do this; and when the board is random, the two groups perform almost identically, much like fluent readers confronting the first set of material above. We can therefore conclude that the prerequisite for predicting complex things is building a large number of "chunks" (i.e., predictive models) in memory. The essence of chunking is compressing a great deal of uncertainty into a simple predictive unit, drastically reducing the computational power the brain needs to process prediction errors in real time.
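The "compression" view of chunking can be sketched as a toy calculation: how many working-memory slots does a sequence cost if each known pattern in long-term memory occupies a single slot? The chunk names and the greedy matching rule below are invented purely for illustration.

```python
# Chunking as compression: a sequence covered by known patterns costs one
# working-memory "slot" per pattern; unrecognized symbols cost one slot each.

def slots_needed(seq, chunks):
    """Count slots using greedy longest-match against the known-chunk vocabulary."""
    i, slots = 0, 0
    while i < len(seq):
        match = max((c for c in chunks if seq.startswith(c, i)), key=len, default=None)
        if match:
            i += len(match)   # a whole known pattern fits in one slot
        else:
            i += 1            # unknown symbol: one slot per character
        slots += 1
    return slots

known = ["sicilian defense", "kingside castle", "pawn chain"]  # the "master's" vocabulary
print(slots_needed("sicilian defensekingside castle", known))  # → 2 (expert: two chunks)
print(slots_needed("sicilian defensekingside castle", []))     # → 31 (novice: per character)
```

The same 31-character input costs the "expert" two slots and the "novice" thirty-one, which is the chess-memory result in miniature: the advantage lives in the vocabulary, not in the size of working memory.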

4.2 Sources of cognitive load#

If complex deduction leaves you feeling mentally overloaded (“brain-burning”), in neuroscience terms this means your brain is processing excessive prediction bias. We can think of cognitive load as the systemic resources the brain spends to eliminate the “unexpected” and the “uncertain.”

So, where does cognitive load come from? Researchers have categorized these loads into two aspects based on their sources:

  • External load: ineffective bias caused by noise interference. This has nothing to do with what you’re learning; it is purely a product of how the material is arranged and annotated. For example, a classical Chinese text annotated word by word in every paragraph and packed with color illustrations turns a simple article into a complex one; or a product manual stuffed with technical terms (why do I need the materials-science name “expansion screw”? Just tell me “this is part number 1” and what it does). Dealing with these meaningless prediction biases wastes valuable mental energy.
  • Intrinsic load: the necessary bias required for model reconstruction. For example, $\sqrt{2}$ is harder to grasp than $1+1$, because the concepts it involves (irrational numbers, square roots) haven’t yet formed a complete predictive model in your brain. You must mobilize resources to understand them; this prediction bias caused by new knowledge is an inevitable part of learning.

Putting the two together:

$$\text{Total Load} = \text{Noise Processing Load (External)} + \text{Model Restructuring Load (Internal)}$$

Therefore, we can conclude that when external load (ineffective information presentation) takes up too much space, the space left for internal load (understanding the core logic) becomes smaller. This is why, even if we are of normal intelligence, we feel like our “brain can’t work” when faced with poorly formatted and verbose textbooks.
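As a back-of-the-envelope sketch (my own framing; the budget of 7 units is an arbitrary illustration, not a neuroscientific constant), the budget logic looks like this:

```python
# Total cognitive load = external (presentation noise) + intrinsic (model
# restructuring), compared against a fixed working-memory budget.

WORKING_MEMORY_BUDGET = 7  # arbitrary units, for illustration only

def learning_capacity(external_load, intrinsic_load, budget=WORKING_MEMORY_BUDGET):
    total = external_load + intrinsic_load
    if total > budget:
        return "overloaded: the brain can't work"
    # Whatever is left after mandatory costs is room for deeper processing.
    return f"ok: {budget - total} unit(s) of spare capacity"

# Same concept (intrinsic load 5), two presentations:
print(learning_capacity(external_load=4, intrinsic_load=5))  # verbose textbook
print(learning_capacity(external_load=1, intrinsic_load=5))  # clean notes
```

Same intrinsic load, two presentations: the verbose textbook blows the budget, while the clean notes leave room to think.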

In the next two sections, we will explore how to reduce external load by “optimizing presentation” and how to deal with internal load by “deconstructing information”.

4.3 Reduce external load - eliminate redundancy#

External load generally comes from the way learning materials are presented. Often, something that could be explained in one sentence is stretched over eight or ten sentences in a textbook, and we finish reading none the wiser. A 1993 study of origami tasks with elementary-school students explored how these presentation factors affect cognitive load. 25

The first experiment tested folding a circular piece of paper into an isosceles triangle. In this task, they assigned different cognitive materials to groups A and B:

  • Group A (illustrated group): received only self-explanatory illustrations, accompanied by a very small number of arrows and words.

  • Group B (redundant group): received the same diagrams, but with a full text description added.

(Figure: cognitive-load experiment with origami)

During the testing phase, the illustrated group significantly outperformed the redundant group. Subsequent experiments further confirmed that using images alone was far superior to “image + text” or “plain text”.

Therefore, we can conclude that when images are sufficient to illustrate a point, additional textual descriptions may become a distraction.

Subsequent experiments in the paper go on to show how redundant information harms comprehension tasks. The first examined the effect of multiple views: one group of students saw only the front view, while the other was shown the back view of the origami in addition to the front view. The group without the back view completed the task in significantly less time during the test.

Another experiment differentiated the placement of instructions. In one group, the instruction booklet was placed on the left, and the paper slip on the right. Students had to constantly switch their gaze between the two. In the other group, the instructions (lines, arrows, numbers) were printed directly on the paper slip to be folded. The results showed that the group with the embedded instructions significantly outperformed the other groups in both the learning and testing phases.

This experiment leaves us almost certain that external load exists and that the way materials are presented affects its magnitude. When an image already lets the brain accurately predict the next action, additional textual description becomes an interference signal. The brain is forced to divert attention to process the text, asking “is this text saying something different?”, only to find that text and image say the same thing. This repeated verification generates unnecessary costs in handling prediction bias.
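A crude way to account for this redundancy cost (my own toy model, not from the study; the "pay twice for redundant text" rule is an assumption) is:

```python
# Toy accounting: when text merely repeats what the image already shows,
# the brain pays for the image, for the text, AND for cross-checking the
# two against each other, which yields no new model update.

def processing_cost(image_units, text_units, redundant):
    cost = image_units + text_units
    if redundant and text_units:
        cost += text_units  # re-verify text against image: pure overhead
    return cost

print(processing_cost(image_units=3, text_units=0, redundant=False))  # image alone: 3
print(processing_cost(image_units=3, text_units=3, redundant=True))   # image + redundant text: 9
```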

Of course, the subjects above were primary-school students, which might tempt adults to dismiss the result. So consider another case, from online anatomy education: a quasi-experimental study at the Iran University of Medical Sciences (IUMS) involving 104 basic-medical-science students further validated the same logic. 26

Researchers divided students into an intervention group and a control group, both receiving instruction in digestive system anatomy on the same Big Blue Button platform. The intervention group’s teaching materials were optimized strictly according to cognitive load theory, eliminating unnecessary colors, lines, noise, and irrelevant content from the screen. Auditory and visual channels were simultaneously engaged, redundant and repetitive text was avoided, and images were used in conjunction with verbal explanations. Data analysis showed no significant difference in academic engagement between the two groups before the intervention. However, after the intervention, the group following cognitive load theory showed a significant improvement in academic engagement scores.

This demonstrates that even in highly complex fields, reducing external burdens remains highly beneficial for learning materials. By optimizing presentation methods (such as reducing redundancy and integrating text and graphics), we can eliminate prediction biases irrelevant to the learning objectives, allowing the brain to focus all its computational power on building the core model.

4.4 Reduce Internal Load - Divide Complexity#

The inherent complexity of knowledge is something we cannot eliminate. Faced with a concept like Python’s for loop, if you try to understand variables, sequences, indentation, and iteration logic all at once, your brain instantly suffers massive, wholesale prediction failure: there are too many unknowns, the old model breaks down completely, and no effective prediction can be made. What we need instead is to learn step by step, forming chunks first and assembling the whole afterwards: understand what an iterator is before looking at what a for loop is. The corresponding method is called segmenting material.

A 2024 paper investigated the impact of segmented material in English-language teaching. 27 It suggests that the most effective weapon against high-load material is segmentation. The study found that breaking one highly concentrated slide deck (the integrated group) into ten more dispersed slides (the segmented group) significantly improved learning outcomes. The paper also analyzed the logic behind segmenting knowledge points: first, segmentation separates the points, giving working memory an opportunity to chunk the information, so the brain holds less at once while completing the current unit; second, splitting material into small fragments lets us learn the names and properties of individual symbols before their interaction logic when facing an extremely complex system. Both effects help control the magnitude of prediction bias.

Next, let’s take learning Python’s for loop as an example. An inefficient way to divide it is to first cover variables, then lists, then syntax, and finally the loop logic. Although the material looks separated, learners still have to recall all of it at once when they actually write code, so the difficulty of prediction has not decreased.

A more reasonable way to divide the data is by using cognitive units as boundaries:

Phase 1 (Building Sequence Intuition): First show only print(1), print(2), print(3), then introduce for x in [1,2,3].

  • Prediction goal: To help the brain build a simple model of “Oh, this thing automatically helps me count.”
  • Load status: Only corrects for prediction bias regarding “repetition”.

Phase 2 (Introducing Variable Placeholders): Demonstrating that for x and for y have the same effect.

  • Prediction objective: Update the model – “The original name is not important; it is just a code name.”
  • Load status: Only corrects prediction bias regarding “symbol naming”.

Phase 3 (Syntax and Indentation): Finally, we will explain the rules of colons and indentation.

for i in range(5):
    print(i)

  • Prediction objective: Refined model – “Only indented code will be looped.”

The essence of segmenting material is to avoid exposing the brain to too many prediction biases simultaneously. We break down a large bias into multiple smaller biases. At each step, the brain only needs to process a small unexpected event and update the model a tiny bit.
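The same idea can be sketched numerically (my own illustration; the numbers are arbitrary): segmentation replaces one huge model correction with several bounded ones.

```python
# Model updating as error-driven correction: a single exposure would force
# the learner to close a large prediction error at once; segmentation
# closes the same total error in small, bounded steps.

def update(model, target, max_step):
    """Move the model toward the target, by at most max_step per step."""
    error = target - model
    step = max(-max_step, min(max_step, error))
    return model + step

target, model = 10.0, 0.0   # total prediction error to absorb: 10 units
steps = 0
while abs(target - model) > 1e-9:
    model = update(model, target, max_step=2.0)  # each segment: a small bias
    steps += 1
print(steps)  # 5 small corrections instead of one overwhelming jump
```

Five corrections of size 2 absorb the same total error of 10 that a single unbounded jump would have to swallow at once; each step is a small, processable surprise.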

4.5 Summary#

The so-called “complexity” simply stems from your brain’s lack of a corresponding prior model, causing predictions to completely fail, and working memory to be overwhelmed by massive prediction biases. What we need to do is reduce noise interference by optimizing the presentation (reducing external load) and guide the brain to gradually correct the model by breaking it down into chunks (reducing internal load). When you can subconsciously and naturally predict the next line of code or the next step in a math problem, you have successfully created a complex yet accurate predictive model.

Q: Chess masters can remember complex chess positions, but when faced with a “randomly placed” chessboard, they perform no differently than a novice. What does this prove?

💡 Hint: Chess Master Experiment

The advantage of experts does not lie in having a larger working memory capacity (innate intelligence), but in storing a large number of “chunks” (i.e., pre-existing predictive models) in their long-term memory.

  • Real-world chess game: Grandmasters can use models to make predictions, compressing multiple pieces into a single unit.
  • Random chess games: The lack of a corresponding prior model leads to prediction failure and memory breakdown.

Q: From a neuroscience perspective, how does “chunking” reduce the brain’s predictive burden?


Chunking compresses a large amount of uncertainty into a simple predictive unit. By invoking chunks, the brain doesn’t need to calculate every detail in real time (such as the order of each Chinese character), but only needs to predict the overall pattern (such as a complete idiom), thus greatly reducing computational demands.


Q: What are the two main components of “cognitive load” (brain-burning feeling)? Please briefly describe their sources.

💡 Hint: Total load = ? + ?
  1. External loads (noise): originating from presentation methods (such as messy layout, redundant annotations). These constitute invalid prediction bias.
  2. Intrinsic Load (Restructuring): This stems from the complexity of the knowledge itself (such as understanding new concepts). It represents the predictive bias that must be experienced when building a new model.

Q: Why does adding extra text descriptions reduce learning effectiveness when the image already clearly explains the problem?

💡 Hint: Origami experiment / anatomy teaching case

This creates an external load (redundancy effect). The brain is forced to divert attention to process the text and repeatedly verify whether the text matches the image. This pointless “repeated verification” process wastes valuable cognitive resources and interferes with the construction of the core model.


Q: When faced with inherently complex and unsimplifiable knowledge (such as loop logic in programming), what strategies should be adopted to avoid overloading the brain?

💡 Hint: Targeting high internal load

Material segmentation strategy. Break down complex systems into multiple independent cognitive units (e.g., learn variables first, then sequences, and finally grammar). Purpose: To control the magnitude of prediction bias, avoid exposing the brain to too many “unexpected” events simultaneously, and ensure that each step only corrects a small portion of the model.


Q: According to the conclusions of this chapter, what is the essential transformation from “novice” to “expert”?


It’s the process from “constantly failing predictions” to “perfect predictive deduction.” Experts, by building sophisticated internal models, can reduce unexpected events to zero when facing complex tasks, achieving subconscious, automated, and accurate predictions.


5. Truly effective learning paths: imitation and exploration#

For the vast majority of learning scenarios, we are fortunate. One of the greatest achievements of human civilization is the establishment of a vast “external model library,” which we can directly access at this moment.

This convenience aligns perfectly with our conclusion from Chapter 3: the brain, as a shrewd energy manager, always follows the “principle of least resistance” when reconstructing models. It is this biological “energy-saving instinct” that drives us to prioritize imitation—the shortcut of least resistance. However, when we find ourselves at the forefront of human understanding, external books and mentors often fail to provide the answers. At this point, the “shortcut” breaks down, and we are forced to embark on a completely different path—exploration.

5.1 Starting Point - Error Detection#

Before discussing the two types of learning, it’s necessary to address the prerequisites for learning. A common misconception is that people learn simply by making mistakes. The opposite is true—the vast majority of errors are systematically ignored by the brain. As mentioned earlier in section 3.4, regarding the conditions for model updates, we only begin to learn when we encounter prediction errors with significant subjective costs. Note the word “subjective”—it varies from person to person and is influenced by various environmental factors. Therefore, how we notice errors is also worth discussing.

Of course, this isn’t something most people think about, and schools never teach it. Most students discover their mistakes passively in the course of daily study. Consider the path we take through mathematics: first addition, subtraction, multiplication, and division, then more complex coordinate systems, calculus, and so on. As long as you follow this path, exams naturally surface your mistakes for you, so you notice errors and learn new knowledge without ever having to hunt for them yourself.

So, what about outside of school?

Generally, in the real world without “syllabi” and “standard answers,” the brain not only doesn’t actively seek out errors, but instead activates a defense mechanism called “confirmation bias.” When reality doesn’t match our internal models, the brain tends to distort perception to maintain the old model rather than overturn it. For example, a tennis beginner might attribute a missed shot to “too much wind” or “a bad racket,” rather than “my hitting motion model is wrong.” This mechanism protects our confidence but hinders learning. 28

Besides automatically maintaining old models, real-world errors are often difficult to trace back to their origins. For example, in today’s society, results are frequently delayed; a mistake made today may only lead to consequences six months later. The causal chain is too long, making it difficult for the brain to attribute errors to specific models. Furthermore, multiple causes may lead to a single effect, and we often struggle to grasp the key points, such as deteriorating relationships, work setbacks, or learning stagnation—these signals don’t provide clear answers like math problems.

To address these issues, here are some simple methods.

First, confirmation bias. A 1990 study found that confirmation bias is driven largely by our emotions, with an added layer of cognitive inertia (the brain tends toward laziness). The solution is simple: hold on to the beliefs “I might be wrong” and “I love truth more than my teacher.” The next time you face a mistake, the emotional blockage shrinks, because you anticipated being wrong and feel less urge to protect your self-esteem or sense of security; and your motivation to seek the truth grows, helping you fight the brain’s laziness. 28

Furthermore, we need to proactively gather information from the outside world. This mainly means learning the key conclusions of the important disciplines, and even actively following their progress. These conclusions are usually correct and can overturn conventional wisdom; each one can help you uncover errors (just as this article’s foundation, “the brain is a prediction machine,” may have done for you). By absorbing them, you ensure your predictive models cover most of the scenarios you will need.

There doesn’t seem to be much research available on methods for detecting errors, so this article only presents two simple approaches, one of which is based on experience. More information may be available in the future, at which point I will update this article on error detection methods.

5.2 Imitation - The Lowest Cost of Learning#

In 99% of cases, when you face a difficult problem, you’re not the first person in history to solve it. When you don’t know why an apple falls to the ground, Newton has already provided the formula; when you don’t know how to cook, a recipe has already validated the steps. In these moments, the brain doesn’t need to reinvent the wheel; it only needs to perform one of the most efficient operations—model transfer.

This is known as “social learning” in neuroscience and evolutionary psychology. It refers to the rapid acquisition of existing skills and concepts through observation, imitation of others, or acquisition of knowledge from culture. Michael Tomasello aptly describes this mechanism as the “cultural ratchet effect”—each generation doesn’t have to start from scratch with trial and error; we, like a ratchet, lock in the wisdom of our predecessors, thus freeing up our energy to explore higher realms. 29

The neural mechanisms of social learning have received substantial research support. Scientists have discovered that the mirror neuron system provides a neural basis for imitative learning: when we observe others performing an action, the neurons associated with that action activate in the same way as when we perform it ourselves. This “mental simulation” lets us experience others’ behavior at the neural level without practicing it, making it easier to replicate. Mirror neurons were first discovered in the 1990s by the Italian neuroscientist Giacomo Rizzolatti and colleagues, demonstrating the brain’s “empathic” reproduction of others’ experiences. On the theory side, the psychologist Albert Bandura proposed as early as the 1960s that humans can learn new behaviors by observing others (the famous Bobo doll experiment verified that children imitate aggressive behavior through observation). Bandura’s observational-learning theory emphasizes that in many cases learning requires no direct reward or punishment; a model’s demonstration is enough for the observer to acquire the behavior.

The general steps of socialization learning include:

  • The old model is considered wrong (as discussed in 5.1), and possible new model theories are sought.

  • Learning new theories to quickly eliminate large amounts of structural errors

  • The residual error of the learned new model is finely corrected through feedback.

  • Solidify a stable new model during time and sleep.

Furthermore, it’s important to note that if the new theory found in this step is overly complex, please refer back to Chapter Four—at this point, we need to break down complex knowledge into simpler pieces, just like how a semester’s worth of material is divided into weeks, and then studied progressively in rounds. Ultimately, this will allow you to master the new theory.

The steps at this point are roughly as follows:

  • The old model is considered wrong (as discussed in 5.1), and possible new model theories are sought.

  • Learning the new theory block 1 allows for the rapid elimination of numerous structural errors.

  • The residual error of the learned new model is finely corrected through feedback.

  • Solidify a stable new model during time and sleep.

  • Learn the new theory block 2 to quickly eliminate a large number of structural errors.

  • The residual error of the learned new model is finely corrected through feedback.

  • Solidify a stable new model during time and sleep.

  • Learning New Theory Block 3

  • ……
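The chunked loop above can be sketched as follows (my own pseudocode-style framing; the error fractions, the 0.5 feedback factor, and the stopping threshold are arbitrary illustrations):

```python
# Chunked social learning: each learned chunk removes a big structural
# slice of the prediction error, feedback rounds shave the residue, and
# the loop stops once the model predicts well enough. (Consolidation
# during sleep is modeled as a no-op here.)

def social_learning(initial_error, chunks, feedback_rounds=2, threshold=0.05):
    """chunks: list of (name, fraction_of_error_removed). Returns a log of
    (chunk name, remaining error) after each learn/refine/consolidate cycle."""
    error = initial_error
    log = []
    for name, removed in chunks:
        error *= (1 - removed)          # learn chunk: structural correction
        for _ in range(feedback_rounds):
            error *= 0.5                # feedback: fine-grained correction
        # (sleep/consolidation would stabilize the new model here)
        log.append((name, round(error, 3)))
        if error < threshold:
            break                       # model now predicts well enough
    return log

print(social_learning(1.0, [("block 1", 0.6), ("block 2", 0.6), ("block 3", 0.6)]))
```

Note that block 3 is never needed once the residual error falls below the threshold: the loop is driven by remaining prediction error, not by a fixed syllabus.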

Next, we will discuss how to conduct social learning efficiently.

5.3 Methods for finding credible new theories#

Students don’t need to find new theories; they just need to read the textbooks.

It’s actually very simple, and everyone knows the method: find academic papers. Here is a rough ranking of information sources by credibility:

| Information source | Notes |
| --- | --- |
| Systematic reviews / meta-analyses / academic guidelines | Assess the rigor of the methodology and the quality of the included research. |
| High-quality peer-reviewed papers + reproducible data and code | Consider the design, function, and verifiability. |
| Standard textbooks / handbooks | Slow to update; cutting-edge work may not be covered. |
| Preprints / technical reports | Use as a clue, not as a conclusion. |
| Serious media / science popularization / lectures | The original source must be traced. |
| Social media / retellings | Untrusted by default; must be verified. |

Generally speaking, books, high-quality papers, systematic meta-analysis of papers, and academic guides are the most reliable sources; there is basically no need to look at any other information sources.
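As a trivial illustration of putting this ranking to work (my own sketch; the tier numbers and source kinds are a hypothetical encoding of the table above, not a standard taxonomy):

```python
# Rank a batch of found sources by default credibility, most trustworthy
# first; unrecognized kinds sort last and must be verified by hand.

CREDIBILITY = {                      # lower number = more trustworthy
    "systematic review": 1,
    "peer-reviewed paper": 2,
    "textbook": 3,
    "preprint": 4,
    "science popularization": 5,
    "social media": 6,
}

def triage(sources):
    """Return sources sorted most-credible-first."""
    return sorted(sources, key=lambda s: CREDIBILITY.get(s["kind"], 99))

found = [
    {"title": "IF thread", "kind": "social media"},
    {"title": "IF meta-analysis", "kind": "systematic review"},
    {"title": "IF preprint", "kind": "preprint"},
]
print([s["title"] for s in triage(found)])
# ['IF meta-analysis', 'IF preprint', 'IF thread']
```

Read in this order: start from the meta-analysis, treat the preprint as a clue, and verify anything from social media before believing it.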

For example, if you want to know whether intermittent fasting is effective, first search for “intermittent fasting meta-analysis” and read the reviews to see what they say; this protects you from being misled by a few extreme experiments. Then search for relevant highly cited papers, or check whether there are systematic books on the topic (the books must contain a large number of cited papers, not just the author rambling on).

Of course, I don’t intend to condemn all science-popularization creators here. Some do produce highly credible work, such as Veritasium, whose most telling habit is citing academic papers at the end of the article or in the video’s closing credits, often alongside his own statistical analysis. This greatly increases the credibility of the work.

Another point to note is that the “most reliable source” varies slightly across different fields, for example:

  • Mathematics/Logic: Peer-reviewed paper + formal proof
  • Medicine/Public Health: Systematic reviews/guidelines are generally more valuable than single papers.
  • Economics/Social Sciences/Psychology: Identification strategies, reproducibility, and external validity are particularly critical (psychology has a well-known reproducibility crisis, namely, the discovery that some psychological experiments cannot be reproduced).
  • Engineering/Computer Science: Reproducible experiments, standards, benchmarks, and open-source implementations are very important.
  • Pseudoscience: Don’t believe it! If someone says that astrology can predict the future, they are definitely just trying to make money off you.

5.4 Imitation - Rapidly Eliminating Large Amounts of Errors#

Once you’ve found a credible new theory to learn, what next? Generally, given a suitable workload, we can master the bulk of a concept in a short time, leaving only minor residual errors. This section focuses on how to quickly grasp the general content of a theoretical model and thereby gain a working level of application ability.

5.4.1 Seeing the message#

If you can’t even see it, then there’s no point in learning. Therefore, seeing information is the prerequisite for all our learning. You might think this is just stating the obvious, but it’s not. Let’s do an experiment—here’s a video; please watch it. Your task is to count how many times the person in the white shirt passed the ball.

https://www.youtube.com/watch?v=6E035QRzHbc

After watching the video, you’ll have a completely new understanding of “seeing information.” This is a very famous psychology experiment: over 50% of viewers ignore something in the scene that is extremely, even outrageously, obvious. The reason isn’t that the eyes failed to receive it, but that the brain pre-judged it as noise. Your task is to count passes, so the brain pours almost all its resources into the “white shirt, ball, pass” clue, and everything else, however obvious, may be filtered out on the spot. In 2013, researchers took the phenomenon further with 24 radiologists, who performed a familiar task of detecting lung nodules. Into the last case, the researchers inserted an image of a gorilla 48 times the size of an average nodule. 83% of the radiologists did not see the gorilla, even though eye-tracking data showed that most of those who missed it looked directly at its location. So even experienced experts, inside their own professional field, are subject to inattentional blindness. 32

This is directly related to learning: in the classroom, in books, and in online courses, what truly determines whether you learn effectively is often not whether the material is provided, but whether you invest your limited attentional resources in “information that triggers model updates.” In the previous chapter, we broke down cognitive load into “external noise” and “internal modeling,” and attention is that switch: if you turn it on wrong, external noise will fill your working memory; if you turn it on right, you have room to process real prediction errors. In other words—attention determines what you’re using to update your brain’s model. A student whose mind is full of thoughts about what to eat for lunch is unlikely to learn in class. More precisely, “seeing information” depends on three functional components of the attention system: alertness, orientation, and executive control. They answer three questions respectively: when to focus attention, what to pay attention to, and how to process the information being focused on.

Alertness refers to the brain’s ability to maintain a basic level of arousal and stay prepared for potentially important information. It determines whether you have enough neural resources to enter a learning state. Teachers often trigger your alertness system in class, most commonly by saying, “We’re about to get to the important part, so pay close attention.” When alertness is insufficient, the brain runs in something like a low-power mode: the working-memory window narrows, executive control weakens, and any slightly complex material is subjectively perceived as tiring, annoying, and difficult, commonly described as not taking it to heart. Alertness is tightly coupled to physiological signals, and it is stimulated most strongly when survival is at stake, which is why video games show potential in this regard. You can also actively cue your brain, for example by adopting a fixed studying posture or wearing headphones while you study; like deliberately adjusting your state before studying, this tells the brain it is time to enter a high-alert task.

Orientation is the ability, once alertness is established, to direct attention to a specific location, feature, or cue. Missing the “obvious” figure in the video experiment is essentially orientation hijacking: you were orienting only toward the ball’s trajectory and the passing motion, and everything else was downplayed. This form of attention amplifies the selected signal and sharply attenuates whatever is deemed irrelevant. In learning, orientation failure is common. Students may focus only on bolded, highlighted, or emphasized sentences, which may not be the truly insightful points of the lesson. Attention can also be captured involuntarily by external stimuli: pop-ups, messages, classmates talking, even unnecessary highlights and fancy page layouts, which underlines the importance of the learning environment. And under today’s obsession with grades, students tend to orient toward “results” rather than “relationships”: staring only at answers in math, copying code in programming, rote-memorizing definitions when learning concepts. We can’t say this orientation is completely useless, but it deviates from what we were meant to learn, like a reward-hacking robot vacuum that learns to dump the trash just so it can pick it up again.

Executive control is responsible for making choices under conflict: suppressing distractions, resisting immediate rewards, and holding to the task goal among multiple candidate cues (several orientation targets may compete, and attention will switch between them if left uncontrolled). It is the part behind “I know where I should be looking, but I just can’t do it.” Many people blame learning failures on a lack of self-discipline, but the more accurate statement is that executive control is a finite resource, quickly depleted by noise, fatigue, multitasking, and emotional exhaustion. Once executive control goes offline, your orientation is pulled toward the strongest stimulus in the environment, and you fall back into least-effort mode: scrolling short videos, rereading the same passage, copying answers. At that point it is best to take a complete break, such as getting up and walking around, to let executive control recover and allow you to immerse yourself in learning again (you might think this is useless, but recent research confirms that everyday attention depletion can be restored by exposure to nature; scrolling your phone during the break will not restore it). 33

In summary, focus intently on what you really need to learn and avoid distractions.

5.4.2 Deep processing#

When you want to learn how to ride a bike, what do you do? You buy a bike and ride it, right? Instead of watching others ride and hoping you can learn by watching them. Otherwise, after a few days, the person who bought the bike will be able to ride without training wheels, while the person watching others will only have the illusion that they can do it too. Why is there such a big difference in learning outcomes?

A closer comparison reveals that the difference between the two learners lies in the brain’s mode of operation. When you merely watch someone else ride, your brain is in “input mode”: it receives sensory data and only needs to interpret what it saw, such as “get on the bike, start pedaling.” That shallow interpretation explains the scene well enough, so prediction error stays close to zero and no model correction happens. But when you grip the handlebars yourself, your brain must generate concrete predictions and test them against reality: “If I turn the bars left with the force I imagine, the bike should stay upright.” When the bicycle wobbles instead (prediction error), the brain must immediately correct its motor model.

This simple example highlights the role of deep processing in learning. When we actively engage, the relevant brain areas are fully mobilized. In one study, psychologist Neal Cohen and colleagues asked participants to memorize the positions of a series of objects on a screen: one group could actively control the browsing order, while the other passively watched the same sequence. The active-exploration group significantly outperformed the passive group on a subsequent memory test. Functional magnetic resonance imaging (fMRI) revealed stronger activity in memory-related structures such as the hippocampus in the active learners, along with higher functional connectivity and synchronization with regions such as the prefrontal cortex and parietal lobe. When passively receiving information, by contrast, the brain responds only locally and sporadically. So move away from purely passive study: ask questions, run experiments, and organize knowledge yourself, and the brain will refine its models far more effectively. 34

Evidence from school-teaching studies also validates the role of deep processing. A meta-analysis combining over two hundred studies of undergraduate STEM (science, technology, engineering, and mathematics) course instruction supports this conclusion: the traditional format, in which teachers lecture and students passively listen, is inefficient. 10 Compared to methods that encourage active student participation, students taught in this traditional lecture style show poorer academic performance. 35

So, in daily learning, actively think about and process the knowledge points from each lesson; discuss the theory with classmates; or run experiments to verify a phenomenon. Students taking online classes can pause after each knowledge point is explained and reproduce the reasoning themselves. In short, any method that stops you from passively sitting and listening is a good one.

However, one problem with this teaching philosophy needs correcting. The idea is quite old, yet many teachers still misapply it today. Rousseau wrote in *Emile*: “May I state here the most important and useful principle in the process of education? It is not to think about saving time, but to ‘waste’ it.” For Rousseau and his successors, even if exploration takes hours, it is worthwhile; letting children discover and build their own knowledge structures is the best way to learn. But deep processing only means engaging students in thinking about the material, activating the necessary brain regions; that is enough. It does not mean making students build everything from scratch, which is almost meaningless: how can we expect students, without guidance, to rediscover in a few hours rules that took humanity centuries to work out?

Modern teaching has instead shifted to designing sequences of logically structured, progressively challenging learning activities, with teachers carefully demonstrating before students practice hands-on. Guided by direct instruction and well-designed materials, learning is highly effective when students participate actively, happily, and autonomously; this has been repeatedly confirmed. Unfortunately, because this approach demands significant effort, not many schools have adopted it yet.

5.5 Imitation - Fine Correction of Residual Errors#

After quickly eliminating a large number of errors, we still need to correct the small errors we cannot perceive even after repeated readings. Yes, repeated reading inevitably causes us to overlook things, and what is overlooked shows up in exam scores.

5.5.1 Testing - Error Feedback#

Once we have a general grasp of the theory, what remains is testing. We have said many times that the brain is a prediction machine that corrects its model through prediction errors. By attempting many problems, the brain records each mistake and improves on the next attempt; each time an error is made and corrected, the probability of repeating it drops. This feedback lets us fine-tune our knowledge until we can distinguish the subtlest differences in understanding.

So how do you get people to commit to an answer, to actively formulate hypotheses no matter how uncertain they are? And how do you ensure immediate, objective feedback so mistakes get corrected? The answer is obvious: testing. Numerous academic papers have demonstrated the effectiveness of testing for learning, and regular testing in particular has become one of the most effective educational strategies: it maximizes the long-term effect of learning and enhances memory. Testing directly embodies the principles of active participation and error feedback. Taking a test forces you to confront reality, solidify what you already know, and recognize what you don’t yet know.

However, most people don’t realize how helpful tests are for learning, treating them merely as a final assessment. Of course, they also tend to look only at the score rather than at which errors were made, which prevents error feedback from being generated. What matters isn’t the final grade, but the effort you put into retrieving information and the immediate feedback you receive. In this respect, research suggests that tests are at least as important as the course itself.

Similarly, we can conclude that learning without feedback is ineffective. Unfortunately, most students believe that the more time they put in, the more they must be learning. So they spontaneously spend hours taking notes, filling textbooks with annotations, highlighting key points in different colored pens… Yet these strategies are actually less effective than taking a simple test. What’s the underlying principle? We cannot tell which memory system is holding the information. Immediately after reading a textbook or taking notes, the information is still present in our minds: it exists in an active form in conscious working memory, leading us to mistakenly believe we have mastered it all.

5.5.2 Interleaved Testing#

Another point worth making about testing is that mixing different types of questions makes tests more effective. Let’s divide tests into two kinds. One is the common in-class quiz, which we’ll call blocked testing: it generally covers only the content of that lesson. The other is the midterm or final exam, whose scope is much broader and requires the brain to switch between different predictive models and choose the correct one for each problem; we’ll call this interleaved testing.

For a long time, people have mistakenly believed that blocked testing is more effective, and some experiments do show that it is extremely effective in the short term. But they overlook the long-term retention of the material. When the test is given a week later, interleaved testing proves far more effective than blocked testing.

Here’s a small example. Two groups of college students were taught to calculate the volume of four uncommon solids (wedge, ellipsoid, cone, and half-cone), then given practice problems. One group solved problems grouped by type (first four wedge-volume problems, then four ellipsoid-volume problems, and so on). The other group solved the same problems in a mixed (interleaved) order rather than grouped by type. During practice, the blocked group averaged 89% accuracy, while the interleaved group managed only 60%. On the final test a week later, however, the blocked group averaged only 20%, while the interleaved group averaged 63%. Mixing problem types, while initially hindering practice performance, produced a staggering 215% improvement in final test scores. 36 37

To avoid misunderstanding, here are a few examples of interleaved testing. Interleaved testing doesn’t mean switching back and forth between two completely unrelated tasks; alternating between memorizing vocabulary and doing math is ineffective.

| Example | Detailed Explanation |
| --- | --- |
| Comprehensive mathematics exercises | Problems interweave quadratic equations, function monotonicity, trigonometric functions, and derivatives, so students must first determine which problem-solving model applies. |
| Physics mechanics exercises | Problems combine Newton’s laws, conservation of energy, and conservation of momentum, so students must distinguish between physical models before solving. |
| (Counterexample) Repeated practice of a single question type | Many consecutive questions of the same type (e.g., 20 quadratic-function calculations in a row), changing only the numbers without changing the problem-solving model. |
| (Counterexample) Alternating unrelated tasks | Switching between math problems and English vocabulary memorization: frequent switches between unrelated tasks do not train model recognition. |
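The contrast between the two schedules can be sketched in code. The problem types mirror the volume experiment above; the function names are illustrative:

```python
import random

def blocked_schedule(types, per_type):
    """Blocked practice: all problems of one type, then the next (AAAA BBBB ...)."""
    return [t for t in types for _ in range(per_type)]

def interleaved_schedule(types, per_type, seed=0):
    """Interleaved practice: the same problems, shuffled, so the solver
    must first identify which model each problem calls for."""
    problems = blocked_schedule(types, per_type)
    random.Random(seed).shuffle(problems)
    return problems

types = ["wedge", "ellipsoid", "cone", "half-cone"]
print(blocked_schedule(types, 2))
print(interleaved_schedule(types, 2))
```

Both schedules contain exactly the same problems; only the ordering, and therefore the demand to recognize the right model, differs.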

5.5.3 Testing Doesn’t Require Grades#

When training a neural network (say, a large language model), we use backpropagation to tell the model where it went wrong and how to adjust its parameters. We don’t punish the network by removing a neuron; that wouldn’t help training. The analogy carries over to a quiz: dwelling on how bad your score is won’t help. The key is to examine carefully where you went wrong and how to adjust your prediction model.
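The analogy can be made concrete with a toy example (entirely illustrative, not the author’s method): a one-parameter model learns the hidden rule y = 2x purely from its prediction errors, nudging the parameter instead of deleting anything:

```python
def train(samples, w=0.0, lr=0.1, epochs=50):
    """Fit y = w*x by gradient descent on squared prediction error."""
    for _ in range(epochs):
        for x, target in samples:
            error = w * x - target   # prediction error, the only teaching signal
            w -= lr * error * x      # adjust the parameter; nothing is "punished"
    return w

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hidden rule: y = 2x
w = train(samples)
print(round(w, 3))  # → 2.0
```

The score (total error) is never acted on directly; only the per-example error signal drives the update, which is exactly the point about feedback versus punishment.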

Schools frequently use grades as punishment, and it’s hard to imagine a school that doesn’t revolve around them. Yet the feedback grades provide is imprecise, while the punishment they deliver is very precise. Poor grades have a huge negative impact on the emotional systems of students’ brains: frustration, shame, helplessness… These, in turn, make students more resistant to learning, and not only emotionally: experiments have shown that the resulting stress can hinder neuronal plasticity.

Therefore, how we view mistakes is also a key factor in our learning. Adjusting our mindset, using scientific methods, and believing in ourselves… although these are clichés, they remain important and must be mentioned.

5.6 Imitation - Consolidation#

We may remember the theory in the short term, but we will inevitably forget it gradually. Therefore, to remember it for a long time, we must repeatedly retrieve it in the future.

5.6.1 On the Forgetting Curve#

Everything above, whether attentional selection, deep processing, or error feedback, addresses how to correct our prediction errors in the moment. But research has found that what long-term retention truly requires is spaced repetition. The most frequently cited figure here is Ebbinghaus, who, through repeated self-testing, discovered the famous “spacing effect”: compared to a single concentrated study session, multiple reviews spread over a longer period significantly improve retention. He summarized his results in the forgetting curve, a figure that has been cited for well over a century and is still widely discussed today.

(Figure: the Ebbinghaus forgetting curve)

This doesn’t mean Ebbinghaus’s findings were wrong, only that his experiments weren’t fully rigorous, so the curve needs some corrections. Let’s first look at the original experiment:

In 1885, Hermann Ebbinghaus experimented on himself. To eliminate interference from existing knowledge, he invented “meaningless syllables” (such as DAX, BOK, YAT), then tested his recall of them.

I think everyone can see the problem: human memory is usually semantic, involving understanding and association. You memorize the formulas of relativity far more efficiently than a string of gibberish. So the curve reflects the limits of rote memorization. Later large-scale replications (after the advent of the internet) confirmed that the curve’s overall trend exists, but there is no single universal forgetting rate shared by all material.

For example, a large-scale internet-based study conducted in 2005 investigated people’s memory of past news events. The results showed that meaningful news with a strong emotional impact was less easily forgotten. Like the 9/11 attacks—most people wouldn’t forget them after hearing about them once or twice. 38

Subsequent work turned the forgetting curve into an algorithm. Piotr Wozniak developed the SM-2 algorithm in the 1980s; the algorithm behind software like Anki is still SM-2, and SuperMemo’s own line has since evolved to SM-17. For most modern learners, even a slight introduction of “active recall” significantly slows forgetting. The essence of the algorithm is that we don’t need to review every day; we only need to intervene near the critical point of forgetting. By computing the optimal review interval for each knowledge point, memory efficiency is maximized.

https://www.supermemo.com/en/supermemo-method
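The published SM-2 update rule is compact enough to sketch. Below is a minimal Python version (real implementations such as Anki layer their own tweaks on top); `quality` is the learner’s 0–5 self-rating of recall:

```python
def sm2_update(quality, repetitions, interval, ease):
    """One SM-2 review step; returns the updated (repetitions, interval, ease).

    quality: 0-5 self-rating; interval is in days; ease starts at 2.5.
    """
    if quality < 3:
        # Failed recall: restart the repetition sequence, review tomorrow.
        repetitions, interval = 0, 1
    else:
        repetitions += 1
        if repetitions == 1:
            interval = 1
        elif repetitions == 2:
            interval = 6
        else:
            interval = round(interval * ease)
    # Adjust the ease factor; SM-2 never lets it drop below 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions, interval, ease
```

For a card answered correctly (quality 4) three times in a row starting from the defaults, the review intervals come out as 1, 6, then 15 days: exactly the “intervene only near the point of forgetting” schedule described above.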

5.6.2 Active Retrieval#

Having clarified that we need to use spaced repetition, we also need to pay attention to one point during the review process—active retrieval. That is, frequently recalling the knowledge you have learned, rather than simply reviewing it again.

A common scenario: students spend hours repeatedly reading the textbook, yet a shorter session of self-testing produces better results. This has been investigated scientifically and is known as the “testing effect”: allocating part of your study time to recalling information, rather than merely re-inputting it, significantly enhances long-term memory. Karpicke and Roediger’s classic experiment compared learning strategies: once a list of words had been memorized, further repeated study did not improve retention a week later, while multiple self-tests (attempts to recall the words) improved memory performance by more than 150%! In other words, once information can be retrieved from memory, rereading it provides no additional benefit, while repeatedly retrieving it from memory strengthens it substantially. 39

This discovery contradicts our intuition: many people prefer repeatedly reviewing their notes to testing themselves during revision, because the former feels more familiar and easier, while the latter is strenuous and easily exposes what they’ve forgotten. However, it is precisely this active retrieval process that stimulates deeper processing in the brain—we must diligently search for related connections in our minds, mobilizing cues to reconstruct the information, essentially actively retracing the memory’s path. When we successfully recall, this path is reinforced; even if we can’t remember immediately, comparing our answers to the correct ones creates a clear and contrasting memory trace, preventing future mistakes. In the same amount of time, pure reading only passively reinforces existing traces. Therefore, effectively utilizing quizzes, flashcards, and self-questioning to continuously retrieve learned content is the core of consolidating our learning model.

TL;DR: Don’t look at the answers directly. When reviewing with flashcards or quizzes, actively retrieve the answer before checking it.

5.6.3 Sleep#

I think most people don’t realize the importance of sleep. However, recent scientific research has found that sleep is not a period of inactivity, nor is it merely a process of clearing away waste accumulated in the brain during the day. On the contrary, while we sleep, the brain remains active, operating according to a specific algorithm, replaying important events recorded from the previous day, and transferring them to our memory for storage.

This brings us back to the forgetting curve. In 1924, two researchers re-examined Ebbinghaus’s curve and noticed an anomaly: almost no memory loss occurred in the 8-to-14-hour interval after learning. Looking back at Ebbinghaus’s procedure, they noted that the 8-hour retest happened within the same day, while the subject was awake, whereas the 14-hour retest fell on the following day, after a night’s sleep. To isolate sleep as a variable, they ran a new test, teaching students syllables either late at night just before sleep or in the morning. The results were clear: knowledge learned in the morning faded over time, following Ebbinghaus’s curve, while knowledge learned before sleep remained stable (provided the student got at least 2 hours of sleep). In other words, sleep prevents forgetting.

This finding was validated by later neuroscience. Researchers discovered that while rats slept, place cells in the hippocampus fired in the same sequences they had produced during the day, retracing the paths the rats had explored. The only difference was that during sleep the neurons fired nearly 20 times faster than during the original experience, letting the brain rapidly replay the day’s explorations. This confirms the correction to the Ebbinghaus curve described above: we really do consolidate memories during sleep. 41

(Figure: hippocampal replay in sleeping rats)

However, scientists’ research on sleep doesn’t stop there. Another experiment showed that we may continue doing mental work during sleep. Researchers taught subjects a laborious calculation procedure during the day, one with a hidden shortcut that could drastically reduce computation time. Before sleep, only a very few subjects had spotted it; after a night’s sleep, the number who discovered the shortcut doubled, while sleep-deprived subjects never had such a flash of insight. The result held regardless of the time of day subjects were tested, confirming that the effect really is due to sleep. 42

The importance of sleep needs no further emphasis; we can be fairly certain that its exceptional memory-building properties have been neglected for years. Unfortunately, even today, most students and adults struggle to get a good night’s sleep. This could stem from the pressures of school or work, the stimulation of short videos on the internet, or the effects of alcohol.

In short, get a good night’s sleep; it’s the most effective way to consolidate your memory.

5.7 Imitation - Summary#

This chapter redefines “learning” as a more engineered process: continuously optimizing internal models. First, errors are identified; then correct structures are transferred through imitation; attention and deep processing improve comprehension efficiency; testing, interleaving, and feedback refine the model; and finally, spaced retrieval and sleep consolidate the knowledge. We are fortunate to live amid the accumulated wisdom of human civilization: most of the time we don’t need to “reinvent the wheel” from scratch, but can efficiently transfer our predecessors’ wisdom through social learning (imitation).

Q: Why isn’t simply “making mistakes” enough to trigger learning? In the real world, without external standards (like exams), how does the brain typically process errors?

💡 Hint: The prerequisite for learning
Click to flip
  • Premise: It is necessary to produce prediction errors that are subjectively costly.
  • Brain mechanism: The default setting is “confirmation bias,” which maintains the old model rather than corrects it by distorting perception (such as blaming the environment).
  • Solution: Maintain the belief that “I might be wrong” and actively seek out high-quality information that contradicts your intuition.

Q: Why is “imitation” (social learning) considered the lowest-cost learning path? What is its neural basis?

💡 Hint: The most efficient learning strategy
Click to flip
  • Strategy: Directly draw upon the wisdom of predecessors through model transfer (“cultural ratchet effect”) to avoid trial and error from scratch.
  • Neural basis: the mirror neuron system, which lets the brain internally “recreate” the behavior of others, supporting both imitation and empathy.

:::flashcard{hint=“The Role of Attention”} Q: In the “seeing information” step, why doesn’t simple sensory reception equal learning? (Refer to the gorilla experiment) ??? Learning depends on the allocation of attention. The brain filters irrelevant information as noise based on the current task (orientation). If attention is not focused on the “information that triggers model updates,” even strong external sensory input will be ignored by inattentional blindness. :::

Q: Why is learning through “active exploration” (such as riding a bike independently or deriving formulas independently) far more effective than “passive instruction”? Please explain using neuroscience.

💡 Hint: Active vs. Passive
Click to flip
  • Passive mode: The brain only interprets “what it sees”, the prediction error approaches zero, and the model is not updated.
  • Active mode: The brain must constantly generate predictions and compare them with feedback (deep processing). This fully mobilizes the hippocampus and prefrontal cortex, reconstructing neural circuits in the process of continuously correcting prediction errors.

Q: Why is testing a more effective learning method than repeated reading?

💡 Hint: The Essence of Testing
Click to flip

Testing provides error feedback. It forces the brain to confront prediction errors and identify subtle cognitive gaps.

  • Key strategy: Interleaving. Mixing different types of questions (rather than focusing on similar questions) forces the brain to train its ability to “recognize and switch between models.”

Q: How do “spaced review” and “active retrieval” respectively optimize memory consolidation?

💡 Hint: About Review
Click to flip
  1. Spaced review: Utilize the spaced effect to intervene at the forgetting threshold (such as the SM-2 algorithm) to avoid mechanical repetition.
  2. Active Recall: The process of retrieving information is more crucial than the input. The recall process reconstructs and reinforces neural pathways (the testing effect), while repeated reading merely passively reinforces existing traces.

:::flashcard{hint=“The Role of Sleep”} Q: What are the two key roles that sleep plays in the learning process? ???

  1. Memory consolidation: The hippocampus replays daytime neural activity at 20 times the speed, solidifying short-term memories into long-term structures.
  2. Mental restructuring: Finding shortcuts and patterns in the subconscious (e.g., in mathematical algorithm experiments, the proportion of sleepers who discover shortcuts doubles).

5.8 Exploration#

In the remaining 1% of cases, you truly face a problem unprecedented in history. When Einstein discovered that Newtonian mechanics couldn’t explain the constancy of the speed of light, he had no textbook to consult.

Note: little literature could be found on “how to explore”, so this chapter is less reliable; it only sketches the necessary steps for exploring new theories, based on the philosophy of science and logic.

At this point, the brain is forced to switch to a high-energy-consuming exploration mode. You no longer have replicable knowledge to deal with the situation. You need to mobilize memory fragments in the hippocampus, and under the command of the prefrontal cortex, try to build a new explanatory framework like piecing together a jigsaw puzzle. You propose a hypothesis, reality slaps you in the face; you revise the hypothesis again, and encounter setbacks again. The long anxiety scientists experience before discovering the truth is essentially the brain enduring extremely high prediction errors, trying to build a model that has never existed before out of thin air.

The general steps of generative construction:

  • The old model is no longer valid.

  • Search and trial and error:

    • Hypothesis A is proposed -> Verification -> Failure (error is still large) -> Discard.

    • Hypothesis B is proposed -> Verification -> Failure (error is still large) -> Discard.

    • Hypothesis C is proposed -> Verification -> Success (error drops sharply) -> Selected

  • Consolidate the new stable model
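The trial-and-error loop above can be sketched as code (all hypotheses and data are illustrative):

```python
def explore(observations, hypotheses, tolerance=1e-3):
    """Return the first hypothesis whose mean prediction error is small enough."""
    for name, model in hypotheses:
        error = sum(abs(model(x) - y) for x, y in observations) / len(observations)
        if error < tolerance:
            return name, error  # error drops sharply -> selected
        # otherwise: verification failed (error still large) -> discard, try the next
    return None, float("inf")   # every candidate failed -> keep searching

# Toy observations generated by an unknown law y = 3x - 5
data = [(x, 3 * x - 5) for x in range(10)]
hypotheses = [
    ("A: y = x",      lambda x: x),
    ("B: y = 2x + 1", lambda x: 2 * x + 1),
    ("C: y = 3x - 5", lambda x: 3 * x - 5),
]
print(explore(data, hypotheses))  # hypothesis C survives
```

Real exploration differs in that the candidate hypotheses themselves must be generated, not handed over in a list; the loop only captures the propose-verify-discard rhythm.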

Don’t think of exploration as something lofty and unattainable. It is not only the frontier of science and a refuge for the mind; it is also, often, a breeding ground for amateur theorizing. Most people don’t follow rigorous methods in their investigations, producing consistently erroneous results; these flawed models can be quite popular, but they inevitably collapse at some point. Real exploration is not a charlatan reaching instant enlightenment through meditation: even flashes of genius are guided by rigorous cognitive logic.

5.9 Exploration - Attribution and Multidimensional Verification#

In imitation mode, we copy others’ answers directly. In exploration mode, we must guess the principles ourselves and then determine which of several candidate principles best fits the situation. Exploration is extremely difficult; fortunately, most people never have to engage in complex exploration in their lifetime.

5.9.1 Abductive Reasoning#

When errors occur, we need new theories to explain them. These new theories are not proposed arbitrarily; they arise from rigorous logical thinking—abductive reasoning. Simply put, abductive reasoning works by tracing back from the “result” to the “cause.” We observe a phenomenon and, based on existing knowledge, infer the most likely cause. It is a reasonable guess.

For example, astronomers discovered anomalies in Uranus’s orbit that did not match the predictions of Newtonian mechanics. Based on existing theory, they calculated that the gravity of a nearby unknown planet could produce exactly such interference. They then made the abductive inference that another unknown planet must be pulling on Uranus from farther out. Observations based on this hypothesis later revealed Neptune.

In more formal terms:

$$\text{Observed phenomenon } D + \text{rule } (H \rightarrow D) \ \Rightarrow\ \text{conclusion: } H \text{ is probably true}$$

Of course, H being probable does not make it true; abductive reasoning therefore often fails. Let’s turn to another astronomical case:

In the late 19th century, astronomers’ long, repeated, rigorous observations of Mercury’s orbit uncovered an extremely small but stable discrepancy: Mercury’s perihelion precession exceeded Newtonian predictions by 43 arcseconds per century. The number, though tiny, was fatal. Small enough to resemble an “error term”; fatal because it recurred in long-term observations and could not be explained away: its precision was too high, and the discrepancy had the power to challenge the reigning Newtonian model of gravity.

The most natural strategy was therefore (Abduction A): without changing the core mechanism, add a hidden variable that makes the error “reasonable again” within the old framework. This is why many at the time proposed the planet Vulcan (祝融星): an undiscovered planet inside Mercury’s orbit whose gravitational perturbation would account for those 43 arcseconds.

Scientists then began to search: Where should it be? When could it be seen? Telescopes were pointed there, but nothing appeared. Once might be coincidence; many times is not. At this point a second, a third, even more theories were needed.

Here another scientist stepped in, wondering whether some overlooked variable made the existing theory itself inaccurate. On this basis Einstein proposed the theory of general relativity, which also explained Mercury’s deviation beautifully. The outcome is well known: relativity prevailed. The Vulcan hypothesis was repeatedly struck down, no such planet having ever been observed, while the 1.75 arcseconds of light deflection predicted by relativity was verified by Eddington’s expedition in 1919.

5.9.2 Verification Reasoning#

In the section on abductive reasoning, we discussed a case study in physics, though it was not perfect. General relativity became mainstream in physics at the time not only because it precisely explained these few small phenomena, but also because it solved many other physical phenomena simultaneously. This section will explore what kind of verification methods we need to provide sufficiently reliable support for a theory. Scientific verification methods can be further divided into three categories:

  1. Mathematics/Computer Science/Logic (Abstract World)

Mathematical proofs in textbooks are usually “informal,” relying on natural language and intuition; they often lean on phrases like “obviously” or “by analogy.” In academia, however, proofs can be reduced to purely symbolic manipulation. Students who have taken calculus will recognize the ε-δ formulation, the rigorous definition of a limit:

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall x \in D,\ \big( 0 < |x - c| < \delta \implies |f(x) - L| < \epsilon \big)$$

For example, if you want to prove an obvious fact:

$$\lim_{x\to 2}(3x-5)=1$$

You will need to write:

$$\text{Let } \epsilon > 0 \text{ be given. Choose } \delta := \frac{\epsilon}{3}. \quad (\because\ \epsilon > 0 \implies \delta > 0)$$

$$\text{Let } x \text{ satisfy } 0 < |x - 2| < \delta. \text{ Then:}$$

$$\begin{aligned} |x - 2| &< \frac{\epsilon}{3} && (\text{substitute } \delta) \\ \implies 3|x - 2| &< \epsilon && (\text{multiply by } 3) \\ \implies |3(x - 2)| &< \epsilon && (|a||b| = |ab|) \\ \implies |3x - 6| &< \epsilon && (\text{distribute}) \\ \implies |(3x - 5) - 1| &< \epsilon && (\text{rewrite the term}) \end{aligned}$$

$$\therefore\ 0 < |x - 2| < \delta \implies |(3x - 5) - 1| < \epsilon \qquad \text{Q.E.D.}$$
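As a sanity check rather than a proof, the choice δ = ε/3 can also be verified numerically (a hypothetical helper, purely illustrative):

```python
import random

def check_limit(trials=10_000):
    """Spot-check: for delta = epsilon / 3, every x with 0 < |x - 2| < delta
    satisfies |(3x - 5) - 1| < epsilon. Random sampling, not a proof."""
    for _ in range(trials):
        epsilon = random.uniform(1e-6, 10.0)
        delta = epsilon / 3
        # Sample x strictly inside the punctured delta-neighbourhood of 2.
        x = 2 + random.uniform(-delta, delta) * 0.999
        if x == 2:
            continue  # the limit definition excludes x = c itself
        assert abs((3 * x - 5) - 1) < epsilon
    return True

print(check_limit())  # → True
```

Random testing like this can only falsify a claim, never establish it; that is exactly the gap between the empirical spot-check and the symbolic proof above.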

This example gives a glimpse of how demanding rigorous disciplines can be, even if the details are hard to follow. Here is another example of the abstract world’s pursuit of rigor: the proof of the Four Color Theorem (every map needs only four colors). In 1976 it was heavily criticized by mathematicians for relying on brute-force computer search (for fear of bugs in the program). Later, in 2005, Gonthier of Microsoft Research formalized the entire proof in the Coq language, with every step checked by the logical kernel, finally putting its validity beyond doubt.

  2. Physics/Chemistry/Biology (Empirical World)

Unlike the abstract world, because measurements are always accompanied by errors, noise, instrumental biases, and potentially theoretical flaws, empirical science rarely talks about “proof,” but rather about “support,” “falsification,” and “confidence enhancement.” So how do we determine the credibility of an experiment? It must be reproducible—that is, different teams, different instruments, and different locations/conditions must obtain the same (or statistically consistent) results. Therefore, publishing papers in these disciplines requires detailed documentation of experimental methods.

For example, in biomedicine, a classic and intuitive example is the establishment of the theory that Helicobacter pylori causes gastritis and peptic ulcers. In the mid-20th century, the mainstream view held that ulcers were mainly caused by factors such as excessive stress, high stomach acid, and dietary irritants. While the existing theories could explain some phenomena, they struggled to provide a stable, repeatedly verifiable causal chain. Later, researchers repeatedly observed Helicobacter pylori in gastric mucosal biopsies from numerous patients and proposed that bacterial infection was a significant cause of gastritis and ulcers. However, this was still only a “correlation clue” and insufficient to solidify the theory.

Subsequently, samples from different hospitals, countries, and laboratories, each using their own sampling and culture/staining procedures, detected the bacterium in a significant proportion of gastritis/ulcer patients; the phenomenon became a reproducible, stable fact. The theory was then put to the test: if the bacteria were the cause, eradicating them should significantly reduce the recurrence rate. Numerous clinical trials showed that, compared with acid-suppressing therapy alone, antibiotic eradication therapy combined with acid-suppressing drugs produced more durable healing and markedly lower ulcer recurrence. Only then was the conclusion established.

  3. History/Economics/Psychology/Medicine (Complex Systems)

Because these fields involve complex systems (numerous variables, difficult to isolate, weak controllability, and strong ethical constraints), it is generally difficult to conduct high-precision experiments.

Let’s start with the fields where experiments are relatively feasible: psychology and medicine. The most empirically meaningful method currently available is the randomized controlled trial: design a placebo control group, assign subjects randomly, and ensure a sufficient sample size. In practice this is hard to achieve, so psychology and medicine repeatedly face replication failures. In fact, psychology went through a large-scale replication crisis in 2015, when many older experiments failed to replicate.
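The core logic of randomization can be sketched with a toy simulation (all numbers below are hypothetical, chosen only for illustration): random assignment balances unobserved confounders between groups, so the difference in outcomes estimates the treatment effect.

```python
import random
import statistics

random.seed(0)  # deterministic for illustration

# Hypothetical subjects, each with an unobserved baseline severity score.
subjects = [random.gauss(50, 10) for _ in range(2000)]

# Random assignment: each subject flips a fair coin.
treatment, placebo = [], []
for s in subjects:
    (treatment if random.random() < 0.5 else placebo).append(s)

# With enough subjects, randomization balances the unobserved baseline:
gap = abs(statistics.mean(treatment) - statistics.mean(placebo))
print(gap < 2)  # → True

# Apply a hypothetical treatment effect of -5 points to the treated group only.
treated_outcomes = [s - 5 for s in treatment]

# The observed group difference now estimates the true effect (about -5),
# because everything else is balanced by randomization.
effect = statistics.mean(treated_outcomes) - statistics.mean(placebo)
print(-7 < effect < -3)  # → True
```

This also shows why sample size matters: with only a handful of subjects, the baseline gap between groups would swamp the effect we are trying to measure.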

Then there are fields where experiments are essentially impossible: history and economics. You cannot rerun history or manipulate a national economy for research, and control groups are hard to come by. Both disciplines therefore developed a “comparative method”: finding similar historical periods or economic situations and analyzing the differences. For example, to study the question “Does a minimum wage increase unemployment?”, researchers compared two similar locations, one where the minimum wage was raised and one where it was not, tracked employment indicators in both, and concluded that the increase did not significantly reduce employment.
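The comparative logic can be made concrete with a toy difference-in-differences calculation; the numbers below are made up for illustration:

```python
# Average employment per store, before and after the wage increase,
# in the region that raised its minimum wage and a similar region that did not.
# (Illustrative numbers only.)
employment = {
    ("raised", "before"): 20.4,
    ("raised", "after"): 21.0,
    ("unchanged", "before"): 23.3,
    ("unchanged", "after"): 21.2,
}

change_raised = employment[("raised", "after")] - employment[("raised", "before")]
change_unchanged = employment[("unchanged", "after")] - employment[("unchanged", "before")]

# Difference-in-differences: the change where the wage rose, minus the change
# in the comparison region, nets out trends that affected both places alike.
did = change_raised - change_unchanged
print(round(did, 1))  # → 2.7
```

A positive value means employment rose where the wage increased, relative to the comparison region; real studies of this kind (notably Card and Krueger’s New Jersey/Pennsylvania comparison) found no significant employment loss.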

That’s roughly the landscape. The inherent nature of some disciplines makes rigorous experimentation difficult, which is why they are sometimes dismissed as pseudoscience. Of course, these disciplines have continuously evolved toward scientific rigor in recent years, giving rise to the following fields:

For example, biophysics, medical physics, epidemiology, computational neuroscience, and econometrics. These are all comparatively hard-nosed disciplines: by anchoring themselves in the methods of physics or neuroscience, they do tend to be more reliable.

5.10 The Rise of Ineffective Exploration#

Having clarified the process of “exploration,” we should now turn our attention to modern public discourse. This section emphasizes the importance of avoiding ineffective exploration: when our understanding of a theory is incomplete, we should refrain from arbitrary abductive reasoning without proper verification. The internet is now rife with absurd claims, partly because scientific concepts diffuse with a lag: a theory may already have been revised, or had its boundaries defined, within the physics community, while it is only just gaining traction (and being misinterpreted) in popular culture.

Take Descartes’ mind-body dualism: despite modern neuroscience’s extensive fMRI data and neurophysiological experiments demonstrating that consciousness is a property of complex neural networks, the dualist picture of a soul connected to the body via the “pineal gland” remains prevalent on the internet. Similarly, quantum mechanics itself shows that quantum effects rapidly decohere at macroscopic scales, so forcibly applying Planck-scale ($10^{-35}$ meters) laws to macroscopic human sociology (meter scale) is utterly absurd, about as pointless as using general relativity to plan your cooking. And of course there is the theme of this article: the brain is a prediction machine, yet most people still picture it as a recorder that stores memories as fixed facts.

Such theories circulate widely and have the potential to harden into a new kind of religion. Yet a closer look at their content reveals that they are fundamentally untenable, and their “verification” relies heavily on cherry-picked case studies and cross-references to other flawed theories. This is ineffective exploration.

5.11 Exploration - Summary#

Exploration is not some mystical epiphany; we cannot make progress simply by meditating or going into seclusion. Only by truly understanding the relevant theories, then engaging in abductive reasoning, and finally verifying the result, can we complete a qualified exploration.

Q: When is the brain forced to switch to a high-energy-consuming “exploration mode”? What is the core psychological experience?

💡 Hint: Exploration Mode Trigger
  • Condition: When the first person in history faces a completely new problem (such as Einstein facing the constancy of the speed of light), and there is no external model to imitate.
  • Experience: Prolonged anxiety. Essentially, this is the brain enduring extremely high prediction errors while attempting to construct, from scratch, a model that never existed before.

Q: What are the general steps of the new theory of “generative construction”? (Trial and error loop)

💡 Hint: Generative Construction
  1. The old model has failed (prediction errors persist).
  2. Search and trial and error:
    • Propose hypothesis A -> verify -> fail (error still large) -> discard.
    • Propose hypothesis B -> … -> discard.
    • Propose hypothesis C -> verify -> success (error drops sharply) -> adopt.
  3. Consolidate and stabilize the new model.

Q: What is “abductive reasoning”? Please explain using the Uranus/Mercury case.

💡 Hint: Inferring Cause from Effect

Abductive reasoning is inferring the most likely cause (best explanation inference) from observed anomalies.

  • Logic: Observe phenomenon D + rule (H -> D) -> conclusion (H may be true).
  • Case Study:
    • Success: Uranus’s orbital anomaly -> speculation of an unknown planet pulling on it -> discovery of Neptune.
    • Failure: Mercury’s perihelion precession -> speculation of an unseen planet “Vulcan” -> never observed (later explained by general relativity).

Q: When verifying a new theory, what core standards do different disciplines follow respectively?

💡 Hint: Verification Standards for Different Disciplines
  1. Abstract world (mathematics/logic): formal proof. Relies on pure symbolic computation and logical deduction (such as the machine-checked proof of the Four Color Theorem).
  2. Empirical world (physics/biology): reproducibility. Different teams can obtain statistically consistent results under different conditions (e.g., the validation of Helicobacter pylori).
  3. Complex systems (history/economics): comparative analysis. Because controlled experiments are difficult, randomized controlled trials (RCTs) or natural experiments (comparative methods) are used to search for causal clues.

Q: Why are disciplines like psychology and economics more prone to “reproducibility crises” or being questioned?

💡 Hint: Pseudoscience vs. Science

Because they study complex systems. The numerous variables, difficulty in isolation, and strong ethical constraints make it difficult to conduct highly precise controlled experiments. Therefore, these disciplines are trending towards interdisciplinary collaborations with more hardcore fields (such as econometrics and computational neuroscience) to improve rigor.


6. Epilogue#

If we were to summarize the main theme of the entire text in one sentence, it would be that effective learning is about updating internal models more accurately. Neuroplasticity tells us that change does indeed occur, but it requires repeated activation, time to consolidate, and can be pruned due to long-term disuse; the memory system tells us that “what we can recall now is unrelated to what we will remember later,” and that the automation of skills and the acquisition of long-term retrieval of knowledge both depend on long-term training; the prediction mechanism further explains that learning is not “from nothing to something” as most people imagine, but more like “from wrong to right”—new circuits are only truly formed when you are forced to correct your predictions by feedback.


Final review cards:#

Q: According to neuroscience, what is the essence of learning at the microscopic level?

💡 Hint: Ch1 Physical Basis

Changes in the physical structure of neuronal networks. Learning is not merely the input of information, but rather the physical process by which the brain reshapes itself by building new synapses, strengthening myelin sheaths, or generating new neurons.


Q: What are the three essential physiological factors that fuel neural remodeling (learning)?

💡 Hint: Ch1 Three Pillars
  1. Sleep (clears waste and consolidates memory).
  2. Exercise (produces neurotrophic factor BDNF).
  3. Nutrition (provides the building blocks for synapses, such as Omega-3).

Q: What are the three main categories of the human memory system? What are the characteristics of each?

💡 Hint: Ch2 Memory Classification
  1. Working memory: extremely unstable, holding the current contents of thought for only a few seconds (capacity of roughly 3-5 items).
  2. Explicit memory: facts and experiences that can be consciously recalled (consolidated by the hippocampus).
  3. Implicit memory: automatic skills and habits acquired without conscious effort (such as riding a bicycle), which are extremely difficult to forget.

Q: Why is the brain described as a “prediction machine” rather than a “recorder”?

💡 Hint: Ch3 Core Metaphor

The brain constantly anticipates sensory input based on past experiences; it only focuses on prediction errors (i.e., “unexpected events”). If the prediction matches perfectly, the brain won’t activate its learning mechanism; learning only occurs when reality contradicts the expectation.


Q: What is the sole core condition that triggers the brain to update its model (i.e., “truly learn”)?

💡 Hint: Ch3 Learning Trigger

Prediction error. Neurotransmitter signals adjust and drive synaptic change only when there is a subjective feeling of “surprise” or “being wrong.” Without error, there is no learning.


Q: When faced with a conflict between old and new knowledge, does the brain tend to “patch” or “reconstruct”?

💡 Hint: Ch3 Update Strategy

Following the principle of minimum energy consumption, patching is preferred (retaining old concepts and adding auxiliary hypotheses to explain anomalies). Only when the error is too large to be patched will high-energy-consuming reconstruction (completely overturning the old model) occur.


Q: Why can experts (such as chess grandmasters) instantly remember complex information?

💡 Hint: Ch4 Expert vs. Novice

Because they possess “chunking.” Experts utilize prior models in long-term memory to compress large amounts of fragmented information into meaningful units (chunks), thereby greatly relieving pressure on working memory.


Q: What are the two main sources of cognitive load that make studying feel “brain-burning”? How should we deal with them?

💡 Hint: Ch4 Cognitive Load
  1. Extraneous load (noise, poor layout): reduce it by removing redundancy.
  2. Intrinsic load (the material itself is difficult): break it into smaller steps and digest it gradually.

Q: Why is simply “seeing” or “hearing” (such as passively listening to lectures) often ineffective?

💡 Hint: Ch5 Attention Mechanisms

Because learning relies on attentional orientation. If attention is not focused on information that triggers model updates, the brain treats other inputs as noise and filters them out (inattentional blindness), so no memory trace forms.


Q: Which two review strategies have been proven to be more effective than “repeated reading”?

💡 Hint: Ch5 Most Effective Methods
  1. Active retrieval (testing): Forces the brain to rebuild neural pathways and uses error feedback to correct the model.
  2. Spaced review: exploiting the spacing effect, reviewing just before forgetting converts short-term memory into long-term memory.

Q: Why should “interleaved practice” (mixing different question types) be used instead of “blocked practice” during training?

💡 Hint: Ch5 Interleaved Practice

Interleaved practice forces the brain to constantly recognize and switch between models. Although it is harder at first, it simulates real-world application scenarios and significantly improves long-term retention and transfer.


Q: When there is no prior experience to imitate, how can a new theory be constructed through “abductive reasoning”?

💡 Hint: Ch6 Advanced Exploration

Inferring the most likely cause from observed anomalies (outcomes). That is: observe phenomenon D -> conceive rule H -> if H were true, D would naturally occur -> temporarily accept H and verify it.


This article also draws heavily on the following books:

How We Learn (published in Chinese as Precision Learning) — Stanislas Dehaene

Make It Stick (published in Chinese as Cognitive Nature) — Peter C. Brown, Henry L. Roediger III, and Mark A. McDaniel

Seven and a Half Lessons About the Brain (published in Chinese as Understanding the Brain) — Lisa Feldman Barrett


Feel free to leave your comments QWQ


The papers cited in this article are listed below:

Footnotes#

  1. Maguire, Eleanor A., et al. “Navigation-Related Structural Change in the Hippocampi of Taxi Drivers.” Proceedings of the National Academy of Sciences, vol. 97, no. 8, 2000, pp. 4398–403.

  2. Dehaene, Stanislas. How We Learn: Why Brains Learn Better Than Any Machine… for Now. Viking, 2020.

  3. Bradlow, Ann R., et al. “Training Japanese Listeners to Identify English /R/and /L/: Long-Term Retention of Learning in Perception and Production.” Perception & Psychophysics, vol. 61, no. 5, Springer Science and Business Media LLC, Jan. 1999, pp. 977–85, https://doi.org/10.3758/bf03206911. Accessed 30 Dec. 2025.

  4. Crupi, Rosalia, et al. “n-3 Fatty Acids: Role in Neurogenesis and Neuroplasticity.” Current Medicinal Chemistry, vol. 20, no. 24, 2013, pp. 2953–63.

  5. Molteni, R et al. “A high-fat, refined sugar diet reduces hippocampal brain-derived neurotrophic factor, neuronal plasticity, and learning.” Neuroscience vol. 112,4 (2002): 803-14. doi:10.1016/s0306-4522(02)00123-9

  6. Meeusen, Romain. “Exercise, Nutrition and the Brain.” Sports Medicine, vol. 44, Suppl. 1, 2014, pp. S47–56.

  7. Frank, Marcos G. “Sleep and Synaptic Plasticity in the Developing and Adult Brain.” Sleep, Neuronal Plasticity and Brain Function, edited by Peter Meerlo et al., Springer, 2014, pp. 123–49.

  8. Pickersgill, Jacob W., et al. “The Combined Influences of Exercise, Diet and Sleep on Neuroplasticity.” Frontiers in Psychology, vol. 13, 2022, article 831819.

  9. Cowan, Nelson. “The Magical Mystery Four: How Is Working Memory Capacity Limited, and Why?” Current Directions in Psychological Science, vol. 19, no. 1, 2010, pp. 51–57.

  10. Dudai, Yadin. “The Neurobiology of Consolidations, or, How Stable Is the Engram?” Annual Review of Psychology, vol. 55, 2004, pp. 51–86.

  11. Squire, Larry R., and Adam J. O. Dede. “Conscious and Unconscious Memory Systems.” Cold Spring Harbor Perspectives in Biology, vol. 7, no. 3, 2015, article a021667.

  12. Corkin, Suzanne. “What’s New with the Amnesic Patient H.M.?” Nature Reviews Neuroscience, vol. 3, no. 2, 2002, pp. 153–60.

  13. MacKay, Donald G., et al. “Language, Memory, and H.M.: A Profound Deficit in Pronouncing New Words and Sentences and Its Implications.” Journal of Memory and Language, vol. 57, no. 3, 2007, pp. 375–410.

  14. Kutas, M, and S A Hillyard. “Reading senseless sentences: brain potentials reflect semantic incongruity.” Science (New York, N.Y.) vol. 207,4427 (1980): 203-5. doi:10.1126/science.7350657

  15. Fukukura, Jun, et al. “Prospection by Any Other Name? A Response to Seligman et Al. (2013).” Perspectives on Psychological Science, vol. 8, no. 2, Feb. 2013, pp. 146–50, https://doi.org/10.1177/1745691612474320. Accessed 28 Mar. 2019.

  16. Hassabis, Demis et al. “Patients with hippocampal amnesia cannot imagine new experiences.” Proceedings of the National Academy of Sciences of the United States of America vol. 104,5 (2007): 1726-31. doi:10.1073/pnas.0610561104

  17. Liu, Jiangang et al. “Seeing Jesus in toast: neural and behavioral correlates of face pareidolia.” Cortex; a journal devoted to the study of the nervous system and behavior vol. 53 (2014): 60-77. doi:10.1016/j.cortex.2014.01.013

  18. Friston, K. Does predictive coding have a future?. Nat Neurosci 21, 1019–1021 (2018). https://doi.org/10.1038/s41593-018-0200-7

  19. Tobler, Philippe N et al. “Human neural learning depends on reward prediction errors in the blocking paradigm.” Journal of neurophysiology vol. 95,1 (2006): 301-10. doi:10.1152/jn.00762.2005

  20. Miami Symposium on the Prediction of Behavior, 1967: Aversive Stimulation, pp. 9–31.

  21. Haruno, M et al. “Mosaic model for sensorimotor learning and control.” Neural computation vol. 13,10 (2001): 2201-20. doi:10.1162/089976601750541778

  22. Gershman, Samuel J et al. “Context, learning, and extinction.” Psychological review vol. 117,1 (2010): 197-209. doi:10.1037/a0017808

  23. Nassar, Matthew R et al. “Rational regulation of learning dynamics by pupil-linked arousal systems.” Nature neuroscience vol. 15,7 1040-6. 3 Jun. 2012, doi:10.1038/nn.3130

  24. Ericsson, Anders, and Robert Pool. Peak: Secrets from the New Science of Expertise. Eamon Dolan/Houghton Mifflin Harcourt, 2016.

  25. Bobis, Janette, et al. “Cognitive Load Effects in a Primary-School Geometry Task.” Learning and Instruction, vol. 3, no. 1, 1993, pp. 1–21.

  26. Sohrabi, Zohreh, et al. “A Comparative Study of the Effect of Two Methods of Online Education Based on Sweller’s Cognitive Load Theory and Online Education in a Common Way on the Academic Engagement of Medical Students in Anatomy.” Medical Journal of the Islamic Republic of Iran, vol. 37, 2023, article 73.

  27. Liu, Dongyang. “The Effects of Segmentation on Cognitive Load, Vocabulary Learning and Retention, and Reading Comprehension in a Multimedia Learning Environment.” BMC Psychology, vol. 12, no. 1, article 4, 2024.

  28. Kunda, Z. “The case for motivated reasoning.” Psychological bulletin vol. 108,3 (1990): 480-98. doi:10.1037/0033-2909.108.3.480

  29. Tomasello, M. (1999). The Cultural Origins of Human Cognition. Harvard University Press.

  30. Figueiredo, Luiz Felipe et al. “The mirror neuron: thirty years since its discovery.” Revista brasileira de psiquiatria (Sao Paulo, Brazil : 1999) vol. 45,3 (2023): 298-299. doi:10.47626/1516-4446-2022-2870

  31. Mayo, Oded, and Simone Shamay-Tsoory. “Dynamic Mutual Predictions during Social Learning: A Computational and Interbrain Model.” Neuroscience & Biobehavioral Reviews, vol. 157, Elsevier BV, Dec. 2023, pp. 105513–13, https://doi.org/10.1016/j.neubiorev.2023.105513. Accessed 3 Jan. 2026.

  32. Drew, Trafton et al. “The invisible gorilla strikes again: sustained inattentional blindness in expert observers.” Psychological science vol. 24,9 (2013): 1848-53. doi:10.1177/0956797613479386

  33. Stevenson, Matt P et al. “Attention Restoration Theory II: a systematic review to clarify attention processes affected by exposure to natural environments.” Journal of toxicology and environmental health. Part B, Critical reviews vol. 21,4 (2018): 227-268. doi:10.1080/10937404.2018.1505571

  34. Voss, Joel L et al. “Hippocampal brain-network coordination during volitional exploratory behavior enhances learning.” Nature neuroscience vol. 14,1 (2011): 115-20. doi:10.1038/nn.2693

  35. Freeman, Scott et al. “Active learning increases student performance in science, engineering, and mathematics.” Proceedings of the National Academy of Sciences of the United States of America vol. 111,23 (2014): 8410-5. doi:10.1073/pnas.1319030111

  36. Rohrer, Doug, and Kelli Taylor. “The Shuffling of Mathematics Problems Improves Learning.” Instructional Science, vol. 35, no. 6, Apr. 2007, pp. 481–98, https://doi.org/10.1007/s11251-007-9015-8.

  37. Brown, Peter C., et al. Make It Stick: The Science of Successful Learning. Belknap Press, 2014.

  38. Meeter, M et al. “Remembering the news: modeling retention data from a study with 14,000 participants.” Memory & cognition vol. 33,5 (2005): 793-810. doi:10.3758/bf03193075

  39. Karpicke, Jeffrey D., and Henry L. Roediger III. “The Critical Importance of Retrieval for Learning.” Science, vol. 319, no. 5865, 2008, pp. 966–968, https://doi.org/10.1126/science.1152408.

  40. Jenkins, John G., and Karl M. Dallenbach. “Obliviscence during Sleep and Waking.” The American Journal of Psychology, vol. 35, no. 4, 1924, pp. 605–12. JSTOR, https://doi.org/10.2307/1414040. Accessed 3 Jan. 2026.

  41. Skaggs, W E, and B L McNaughton. “Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience.” Science (New York, N.Y.) vol. 271,5257 (1996): 1870-3. doi:10.1126/science.271.5257.1870

  42. Wagner, Ullrich et al. “Sleep inspires insight.” Nature vol. 427,6972 (2004): 352-5. doi:10.1038/nature02223

Source: https://techleaf.xyz/posts/neuroscience-effective-learning-methods-en/
Author: Billy Xu
Published: 2026-01-03
License: CC BY-NC-SA 4.0