Investment Applications of Operant Conditioning

Written by Oliver Sung

If we don’t like the consequences of an action we’ve taken, we’re less likely to do it again, and if we do like the consequences, we’re more likely to do it again. That’s the basic premise of B.F. Skinner’s operant conditioning: a type of learning in which voluntary behavior is shaped by reward or punishment.

Skinner believed that the previously dominant paradigm in behaviorism, classical (or Pavlovian) conditioning, in which involuntary responses are triggered by repeated exposure to an association (Pavlov’s bell), was too simplistic to fully explain complex human behavior. Building on Edward Thorndike’s work, and believing that you had to look at both the causes of an action and its consequences, Skinner set out to run experiments whose results he could observe directly rather than relying on hypotheses.

These experiments eventually let him train animals to do all kinds of odd things, such as having pigeons turn full circles and “read” English. His most famous experiment became the 1948 “Skinner Box” experiment: he placed a hungry rat in a box with a lever on one side, and whenever the rat bumped the lever, a food pellet dropped into a container. After a few sessions, the rat learned that on entering the box, running straight to the lever and pressing it meant receiving food. Likewise, in a box that delivered small electric shocks, the rat learned to press the lever to make the shocks stop.

Skinner used these experiments to argue that all animals, including human beings, are subject to the same “selection by consequences” process, and he liked to explain it in terms of evolution. Nature tests out different forms randomly. If a form brings good consequences, such as survival, nature keeps it; if it brings bad consequences, such as death, nature discards it. Since organisms endowed with the best-suited forms have a greater chance of survival and thus reproduce more, each subsequent generation carries more copies of these forms, until they eventually spread through the entire breeding population. By sticking to what works, nature produces order from chaos, creating complex self-preserving systems. Nature has no preconceived design for these forms, nor can it predict what will work beforehand.

As applied to human psychology, when actions produce positive consequences (benefit, pleasure, etc.), the brain is modified in ways that make us engage in them more often; when they produce negative consequences (harm, pain, etc.), it is modified in ways that make us refrain from them in the future.

Skinner found that, in conditioning an animal to a certain behavior, the consistency and timing of the reinforcement played critical roles. If the reinforcement stopped, the conditioned behavior would eventually disappear, a process known as “extinction”. Pavlov observed the same thing: if the association between the bell and the arrival of meat powder wasn’t maintained over time, his dogs would stop salivating in response to the bell.

Skinner also found that the strength of the reinforcement affects the strength and persistence of the behavior: a larger food reward, for example, has a stronger effect than a small one. Furthermore, the reinforcement must follow the desired behavior immediately or very soon after.

Together with classical conditioning, operant conditioning is arguably the most important behavioral mental model there is, because it provides the foundation for the study of incentives. Beyond explaining straightforward behavior, like why you train your dog with treats, or why you took up smoking as a teenager (to fit in with the cool crowd; the reward) but stopped (because the principal caught you and your parents got involved; the punishment), its implications reach much further, and knowing how readily it carries across fields and disciplines gives you a real advantage in life. The human brain, Skinner argued, is just a more computationally advanced version of the brains of other organisms. As an investor, here’s how you can apply Skinner’s work to understanding market behavior and yourself.

Gambling and Investing

Since gambling (taking a zero or negative expected return bet) is an obviously irrational behavior, psychologists have long struggled with the question of why humans can become addicted to it. Maslow saw gambling as a means of fulfilling basic human needs like excitement and self-actualization. Freud saw gambling as a way for people to experience a sense of control and mastery over chance.

Skinner’s explanation was better. In his pigeon experiments, Skinner found that variable reinforcement, where the reward arrived not every time but only every so often, was enough to keep the pigeons doing what he wanted. According to Skinner, people gamble because they have been exposed to exactly this kind of reinforcement: the reward doesn’t come with every bet, only every so often, and the possibility of an occasional huge payoff leaves a strong enough connection in the brain to keep them betting. If you were to remove the reinforcement completely, the appetite for gambling would eventually go extinct, albeit gradually.
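To make both ideas concrete, negative expected return and rewards that arrive only every so often, here is a minimal Python sketch. It assumes American roulette (38 pockets, a straight-up bet paying 35 to 1) as the example bet; the choice of game and all function names are ours, not Skinner’s. It does not model learning or extinction; it only shows the payout schedule a gambler actually experiences.

```python
import random

POCKETS = 38   # assumption: American roulette, numbers 1-36 plus 0 and 00
PAYOUT = 35    # a winning straight-up bet pays 35 to 1


def expected_value_per_unit():
    """Exact expected result of a 1-unit straight-up bet (negative)."""
    p_win = 1 / POCKETS
    return p_win * PAYOUT + (1 - p_win) * (-1)


def simulate_bets(n_bets, seed=None):
    """Return the result (+35 or -1) of each of n_bets straight-up bets."""
    rng = random.Random(seed)
    return [PAYOUT if rng.randrange(POCKETS) == 0 else -1 for _ in range(n_bets)]


def bets_between_wins(results):
    """Number of bets it took to reach each win: an irregular reward schedule."""
    gaps, count = [], 0
    for result in results:
        count += 1
        if result > 0:
            gaps.append(count)
            count = 0
    return gaps


results = simulate_bets(10_000, seed=1)
gaps = bets_between_wins(results)
print(f"expected value per bet: {expected_value_per_unit():.4f}")  # -0.0526
print(f"wins: {len(gaps)} of {len(results)} bets")
print(f"first ten gaps between wins: {gaps[:10]}")
print(f"average result per bet: {sum(results) / len(results):.4f}")
```

The exact expected value is (1/38)(35) + (37/38)(−1) = −2/38 per unit staked, yet the simulated wins still arrive at unpredictable intervals, which is the “every so often” pattern described above.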

You can see how this applies to understanding stock market behavior. History offers countless examples of investment appetite following the direction of the market, and operant conditioning helps explain why. When you enter the market as a new investor, whatever the market throws at you is likely to condition how you perceive it. If you jump in during a bull market, making money consistently will reinforce your appetite for more, to the point where you might think you’re smarter than you are. Conversely, bad investments and experiences in a hard market from the get-go will make you averse to investing and lead you to view equities as a risky asset.

Peter Lynch’s cocktail party theory describes, from first-hand experience, how the layman’s interest in the stock market is directly correlated with the level of the market:

In the first stage of an upward market, one that has been down for a while and that nobody expects to rise again, people aren’t talking about stocks. In fact, if they turn up to ask me what I do for a living, and I answer, “I manage an equity mutual fund,” they nod politely and wander away. When ten people would rather talk to a dentist about plaque than to the manager of an equity mutual fund about stocks, it’s likely that the market is about to turn up.

In stage two, after I’ve confessed what I do for a living, the new acquaintances linger a bit longer, perhaps long enough to tell me how risky the stock market is before they move over to talk to the dentist. The cocktail party is still more about plaque than about stocks. The market’s up 15 percent from stage one, but few are paying attention.

In stage three, the markets are up significantly from the lows. A crowd of interested parties ignores the dentist and circles around me all evening. A succession of enthusiastic individuals takes me aside to ask what stocks they should buy. Even the dentist is asking me what stocks he should buy.

In stage four, the markets are having a non-stop bull run. Once again they’re crowded around me, but this time it’s to tell me what stocks I should buy. Even the dentist has three or four tips, and in the next few days I look up his recommendations in the newspaper and they’ve all gone up. When the neighbors tell me what to buy and then I wish I had taken their advice, it’s a sure sign that the market has reached the top and is due for a tumble.

Howard Marks says that the herd is wrong about risk at least as often as it is about return. When everyone believes something embodies no risk, they usually bid it up to the point where it’s enormously risky, and when everyone believes something is risky, their unwillingness to buy usually reduces its price to the point where it’s not risky at all. The investment that “everyone” believes to be a great idea by definition cannot be so. That, coupled with variable reinforcement, explains why trends form, change, and inflect.

Market Trends

By and large, everyone invests based on their view of the fundamental outlook, whether that’s a few months or a couple of years into the future. Taking the decade-long view would be more rational, since that’s where the majority of present value lies. However, most people take the short-term view, both because fundamentals that far out are too difficult to estimate with confidence and because performance reviews, at least for professional investors, depend heavily on short-term price fluctuations.
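How much of a stock’s value sits in the near term can be checked with a simple model. The sketch below assumes the Gordon growth model, a cash flow growing at a constant rate g forever and discounted at rate r, under which the first n years account for a fraction 1 − ((1+g)/(1+r))^n of total present value. The model, the 8% discount rate, and the 4% growth rate are illustrative assumptions of ours, not figures from the article; the function names are hypothetical.

```python
def pv_share_first_n_years(n, r, g):
    """Fraction of a growing perpetuity's present value earned in years 1..n.

    Assumes a cash flow of C*(1+g)**(t-1) in year t, discounted at (1+r)**t, with r > g.
    Total value is C/(r-g); the first n years are worth C/(r-g) * (1 - ((1+g)/(1+r))**n).
    """
    if r <= g:
        raise ValueError("discount rate r must exceed growth rate g")
    return 1 - ((1 + g) / (1 + r)) ** n


def pv_share_by_summation(n, r, g, horizon=5_000):
    """Brute-force check: sum the discounted cash flows directly (C = 1)."""
    flows = [(1 + g) ** (t - 1) / (1 + r) ** t for t in range(1, horizon + 1)]
    return sum(flows[:n]) / sum(flows)


# Hypothetical inputs: 8% discount rate, 4% perpetual growth.
r, g = 0.08, 0.04
for years in (1, 2, 10):
    share = pv_share_first_n_years(years, r, g)
    print(f"first {years:>2} years: {share:.1%} of present value")
```

With these inputs, the first year accounts for under 4% of the value and the first decade for roughly 31%; a higher discount rate or a lower growth rate shifts more weight toward the near term.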

Prices move in trends because the fundamentals underlying them move in trends too, whether that’s earnings, interest rates, employment, or, collectively, the business cycle. Those trends in the fundamentals, as well as the resulting price fluctuations, condition market participants into certain mindsets that influence their behavior. And when the trends inflect, the previous mindset gets unlearned and replaced by a new one.

If you subscribe to the efficient market hypothesis, you would expect such trend inflections to translate directly into price action. But that is not what happens. When fundamentals inflect, the market participants whose buying or selling would move prices accordingly haven’t yet been conditioned to act. In other words, even if their mindset has already begun to change, they need to see prices rise before believing that they actually will. Without enough reinforcement to influence their behavior, the confidence to act on a changed mindset builds slowly rather than instantly.

If the inflection point is a reversal from the bottom, as in March 2009 or March 2020, this gradual process passes through Skinner’s concepts of punishment, extinction, and then reinforcement. For instance, in the first half of 2008, with the housing bubble unwinding, the US economy was contending with a still-tight labor market, the effects of earlier monetary tightening, and enormous risk in the financial system, while investors’ confidence evidently remained high. The Global Financial Crisis hit, and it took until the first half of 2012 for the market to trade at its prior 2008 level. By then, the economy was improving, with rising corporate earnings, expansionary monetary policy that was years away from tightening, growing retail sales, and rising industrial output. The question is, why was the market trading at the same price in 2008 and 2012 when the economic prospects were so different? The answer, of course, is that in 2008 investors hadn’t been conditioned to think about markets in crisis mode and still had confidence in the market, whereas in 2012, after the punishment of the crisis, the lack of reinforcement held things back.

That held until it didn’t. This brings us to how the Federal Reserve’s growing influence over the markets can also be explained by operant conditioning. Fed interventions, starting in 1998 as interest rates began coming down, gradually shaped market behavior to the point where participants felt confident chasing yield and buying any dip, and any equity, at any price. And because it worked again in March 2020, when the market snapped back to new highs within a very short period while the money printing was cranked up to unprecedented levels, the market may well have been reinforced once more, so that when fundamentals truly slump, the price action warranted by such a slump may be delayed. That is, until the confidence is unlearned once again, as it will be, and the cycle starts over.

Practical Use for Investors

Operant conditioning won’t give you a crystal ball that lets you predict the future of the market. You can understand human nature and still be completely wrong. What it can do is help you understand, and respond promptly to, the shifts that occur during these cycles, as investors’ risk aversion swings between too much and too little.

Starting with yourself, viewing your own behavior through the mental model of operant conditioning can protect you against your own exuberance and give you the resolve to position correctly when you might otherwise go down the wrong path.

It can also help you take action when the right opportunity comes around. By taking your focus off price, whether buying or selling, you resist the urge to wait for the market’s positive reinforcement to give you the confidence to act. That may mean buying a cheap stock on the way down or selling an expensive one on the way up, or, when it’s the right thing to do, selling a stock on the way down or buying one on the way up. Consider that by the first half of 2012, having regained its 2008 levels, the market was up 100% from the lows and not far from its previous “bubble” heights. And as always, there were concerns to think about: record profit margins, a Eurozone debt crisis, and so forth. A price-sensitive investor would likely have asked: 1) What upside is left? and 2) Is this another bubble in the making?

You could have come up with plenty of reasons to stay out of the market, but what would really have been happening is that you had been conditioned, subconsciously, into such a risk-averse mindset that you wanted a reason for valuations to head lower so that you could buy in. But being conditioned into a mindset is not the same as gaining wisdom or understanding. While you could see plenty of dark clouds on the horizon, you could also see an improving housing market, rising employment, and easy monetary policy. With each tick higher, your bearish conviction would wane. And as the S&P 500 broke 1,500, you would slowly grow accustomed to higher market levels, while your prior conditioning, like that of the rest of the market, gradually went extinct, until all that was left was chasing. Your bullish confidence would finally have been conditioned by the reinforcement of each failed bearish prediction.

In reality, taking your focus off price is not easy. Anchoring bias makes prices that have fallen from recent levels look cheaper. We envision the possibility that they might return to where they recently were, precisely because we have been reinforced, Skinnerian style, by the prevailing trend. And if that expectation is fulfilled even once, it reinforces us to fall prey to the same bias the next time around. That is exactly why taking your focus off price, as well as off unrealized gains and losses, is so important.

Lastly, Skinner observed that operant conditioning affects our thinking, and even our language, throughout our lives, often in subtle ways. We respond to what we want to hear because we’ve been conditioned through our emotional reactions. Unfortunately, the emotional factors that shape our thinking are not always oriented toward truth, but rather toward other values, such as building relationships, achieving status, or acquiring resources. This lack of truth orientation is often harmless elsewhere, but not in investing, where the consequences of being wrong can be significant and tangible. To avoid this problem, you must be vigilant about truth in investment decisions and make sure your thoughts and statements honestly describe reality, rather than being shaped by hidden contingencies such as the desire for status, approval, or preserving an ideology.

I love connecting with other curious nerds, so if you have a comment on my article or want to introduce yourself, shoot me an email. Also, subscribe to the newsletter to receive new research, articles, and other interesting stuff directly in your inbox.
