What Can We Learn From 4 Superhuman, Game-Playing AIs?
The world is full of advice, with aphorisms and philosophies for every aspect of human experience, from business to romance to physical fitness. How do we evaluate these often-conflicting philosophies, ideas and approaches?
Over the years, I’ve found that games are one of the cleanest domains for studying theories about life. Games have tightly defined rules and are separate and distinct from the rest of the world: a global pandemic doesn’t change how a pawn moves in chess. So, by looking at games, we might draw some conclusions we can apply to other, fuzzier aspects of life.
We now have AIs that are significantly better than humans at several games, which makes gaming an even more interesting lens. Within the narrow domain of gameplay, AIs can show us which of our approaches work and which ones don’t.
By looking at four games (Go, chess, poker and StarCraft) and their AI agents (AlphaGo, AlphaZero, Pluribus and AlphaStar, respectively), we can see how these AIs beat humans and draw some valuable lessons about both technology and life.
4 Instructive, Game-Playing AIs
The AlphaGo versus Lee Sedol showdown in 2016 was very much the Deep Blue versus Kasparov match of our current AI era. Once again, a big multinational company (in this case Google, through its DeepMind team) challenged a human opponent to a game of mental skill. Lee Sedol, one of the best Go players in the world, faced off against Google’s AlphaGo AI with a large prize purse on the line.
Go was invented in China more than 2,500 years ago. Like chess, it’s a turn-based game of perfect information: both players are completely aware of everything that has happened during play. One player has a bowl of black stones, the other white, and the players alternate placing them on the board. The goal is to surround more of the board than the opponent does.
A game of Go often lasts longer than a game of chess and has an intuitive feel, as the players carefully balance their own expansion and safety against their opponent’s. The complexity of play and the lack of easy heuristics make traditional, tree-search-based AI infeasible for mastering Go. Thus, long after chess fell to AIs, humans still dominated this game.
To tackle this problem, DeepMind created AlphaGo using deep reinforcement learning. In this approach, the AI plays against itself repeatedly, adjusting the weights of its neural network so that the moves played by the winning side become more likely and those played by the losing side become less likely. Because learning to play well starting from random moves is so difficult, the initial AlphaGo variant that defeated Lee Sedol was first trained to predict professional moves in a variety of positions. Once AlphaGo’s improvement plateaued, the DeepMind team restarted it, this time feeding it its own games rather than human ones. That process created a big jump in skill. Finally, with AlphaGo Zero, the team overcame the challenges of learning from a blank slate and showed that such an approach could outplay the versions that first learned from humans.
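To make the self-play idea concrete, here is a minimal, hypothetical sketch. It is not DeepMind’s method: it swaps AlphaGo’s deep neural network (and its tree search) for a simple table of move preferences, and swaps Go for a toy take-away game in which players alternately remove one to three stones and whoever takes the last stone wins. The names `self_play_train` and `greedy_move`, the learning rate and the game itself are all illustrative assumptions. What it does mirror is the update described above: both sides share one policy, and after each game the winner’s choices are made more likely and the loser’s less likely.

```python
import math
import random
from collections import defaultdict

# Toy stand-in for Go (an assumption for illustration): players alternately
# remove 1-3 stones from a pile, and whoever takes the last stone wins.
MAX_TAKE = 3

def legal_moves(pile: int) -> list:
    return list(range(1, min(MAX_TAKE, pile) + 1))

def sample_move(prefs, pile, rng):
    """Sample a move with a softmax over the shared preference table,
    which stands in (very loosely) for a policy network."""
    moves = legal_moves(pile)
    scores = [prefs[(pile, m)] for m in moves]
    top = max(scores)  # subtracting the max keeps math.exp from overflowing
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(moves, weights=weights)[0]

def self_play_train(games=5000, max_pile=10, lr=0.1, seed=0):
    """Hypothetical self-play loop: the agent's only opponent is itself."""
    rng = random.Random(seed)
    prefs = defaultdict(float)  # (pile, move) -> preference, shared by both sides
    for _ in range(games):
        pile = rng.randint(1, max_pile)
        moves_by = ([], [])  # moves made by side 0 and by side 1
        side = 0
        while pile > 0:
            move = sample_move(prefs, pile, rng)
            moves_by[side].append((pile, move))
            pile -= move
            side = 1 - side
        winner = 1 - side  # the side that moved last took the final stone
        # Make the winner's choices more likely and the loser's less likely.
        for key in moves_by[winner]:
            prefs[key] += lr
        for key in moves_by[1 - winner]:
            prefs[key] -= lr
    return prefs

def greedy_move(prefs, pile):
    """The move the trained table prefers (ties go to the smaller move)."""
    return max(legal_moves(pile), key=lambda m: prefs[(pile, m)])

prefs = self_play_train()
print({pile: greedy_move(prefs, pile) for pile in range(1, 11)})
```

In this toy game, a pile that is a multiple of 4 is lost against perfect play, so after enough games the preferred move from a pile of 5 should tend toward taking 1 stone, leaving the opponent with 4. Nobody tells the agent this; it emerges only from games against itself.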
Observers first saw around 70 of AlphaGo’s games against humans and concluded that the AI had a peaceful, simple and accommodating style. AlphaGo would consistently back down from fights, finding simple paths through the game that led to small wins. Then DeepMind released games in which the AI played against itself. These were aggressive, complex and wild affairs. It turned out that AlphaGo was already ahead early in its games against humans, so it could play a simple game to win, but it had an incredible ability to fight and handle complexity when needed.
So, what lessons can we take from the AlphaGo AI?
- Learning for ourselves is more potent than being taught. There’s value in struggling with an idea and figuring something out on your own.
- At a certain level of expertise, it’s not what we don’t know that holds us back, but the falsehoods we think are true. AlphaGo had learned a lot of techniques from humans that turned out not to be the best ones for winning.
- Always remember your goals. When playing against humans, AlphaGo just wanted to win, even if only by half a point, and it took the most straightforward path it could to get there. Sometimes we need moonshots, where we either achieve something big or fail fast. Other times, though, we just need competence. Sometimes we need to push ourselves to the limit; other times, a task is just a distraction from our actual goals.
In chess, we already knew that an AI like Deep Blue, combining smart heuristics with excellent search, was good enough to defeat humans. Then the DeepMind team applied the techniques it had developed for Go to chess, in the form of the AlphaZero AI. Having created an algorithm that could master one board game, the team wanted to find a single approach that could master many.
When this happened, we moved from an era of deep search to one of deep reinforcement learning. Instead of just exploring possible game trees at superhuman speed, deep neural networks were encoding complex insights and performing the kind of pattern matching that comes easily to humans but that machines had struggled with for so long.
Chess pundits say that AlphaZero’s manner of play harkens back to so-called Romantic chess, a style popular from the 18th century to the 1880s. That style was marked by wild, aggressive and exciting play in which players found clever new ways to deliver checkmate and put great emphasis on sacrificing pieces. The Romantic era gave way to the modern, materialist style, which prioritizes solid defense: players would withstand Romantic-style assaults and then use their numerical superiority in pieces to win. As chess theory developed, the Romantic style all but vanished from high-level play.
AlphaZero is resurrecting that older style of play, making early sacrifices that don’t seem to offer an obvious, immediate advantage. Those sacrifices give it a slight edge, which it then carefully pushes to victory. The concept of Romantic chess was sound, it turns out; the execution was just too difficult for most human players to master.
Following AlphaZero’s lead, we can learn the following lessons:
- Cultivate small advantages. Devoting 15 minutes here and there to a project can add up. If you gain a small positional advantage early in a chess game, it won’t necessarily be gone by the end; it can be nursed into a win.
- Be open-minded about opportunities. Sometimes it pays simply to have faith that playing good moves, even without a particular end goal in mind, will prove beneficial.
AI has been able to outplay humans at some simpler versions of poker for a while. But it wasn’t until 2019 that Pluribus, an AI from Facebook and Carnegie Mellon University, outperformed humans at a gold-standard variant: six-player, no-limit Texas hold ’em.
Most poker AIs, including Pluribus, don’t try to read their opponents the way humans might. They don’t vary their strategy because they suspect opponents are bluffing more often. Instead, Pluribus reasons carefully about which cards its opponents might hold in the current hand and what its own actions so far have revealed, and then it plays to make its opponents’ lives hard no matter what strategy they’re using. In one sense, this is a limitation compared to the human ability to reason about one another’s minds or interpret body language. Matched against a table of professional poker players, however, Pluribus demonstrated that such skills aren’t necessary to win.
One notable aspect of Pluribus is that it folds more often than amateurs do. Novice players get excited about the potential of a given hand and may become overly aggressive. Yet at a six-player table, any given player has, on average, only a one-in-six chance of holding the best hand. And if opponents are folding a lot, the hands they do keep playing are going to be stronger when it comes time to put a lot of money on the table.
As a result, Pluribus folds many hands in every situation, including any time it holds a two. Since higher cards defeat lower cards, there’s a simple logic to avoiding hands with weaknesses, but humans tend to mix up the hands they play to confuse opponents. Playing predictably gives its opponents an informational edge, but the AI is more than compensated by never having to bet on poor-quality hands.
What does Pluribus’ style of play teach us, then?
- Trust that the underlying mechanics will shine through in life’s endeavors. There is a lot of room for style in games, but there are also underlying mechanics that favor specific ways of playing: slow versus fast, or seeing lots of hands versus a few good ones. AIs show great variety in their approaches, but if there’s even a tiny edge in a game, the AI exploits it relentlessly.
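A quick calculation shows why even a tiny edge matters when it is exploited over many repetitions. The sketch below is a simplified model, not a poker simulation: it assumes independent, even-money contests that the player wins with probability `p`, and it computes the exact binomial probability of having won more than half of `n` of them. The function name `prob_ahead` is hypothetical.

```python
import math

def prob_ahead(p: float, n: int) -> float:
    """Probability of winning more than half of n independent contests,
    each won with probability p (0 < p < 1). Binomial terms are computed
    in log space so that large n does not overflow a float."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):  # strictly more than half
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total

for p in (0.51, 0.55):
    for n in (11, 101, 1001):
        print(f"p={p}, n={n}: {prob_ahead(p, n):.3f}")
```

Under these assumptions, a 55 percent edge that barely shows over a handful of contests becomes close to a sure thing over a thousand, and even a 51 percent edge tilts the odds steadily further in the player’s favor as the number of contests grows.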
StarCraft is a real-time strategy (RTS) game in which players lead warring science fiction factions, building an economic base to raise an army and destroy their opponents. Players rapidly divide their attention between tending to their base, directing their armies into battle, and scouting out their opponents’ plans. Thanks to its deep strategy and its demand for frenetic action, it was one of the first massively popular e-sports, with millions of dollars in prizes awarded to top players.
StarCraft is the only game on this list in which an AI limited to human reaction speeds plays as well as, but not better than, the top humans. The real-time, continuous domain makes reasoning harder. Additionally, although a poker AI can ignore its opponents’ tendencies and play its own style to great success, StarCraft strategies have more complex counters, so players must predict each other’s strategies and adapt to them.
AlphaStar is an AI built to play StarCraft. Applying reinforcement learning directly tends to create agents that chase their tails, continually learning counters to their current strategy but merely cycling among a few fundamental strategies. In StarCraft, an AI might first learn to throw all its units against the enemy immediately, in a rush strategy. This attack can be countered by building defensive structures, so reinforcement learning would adapt by discovering that defense. But overly defensive play can be defeated by focusing on gathering resources to eventually build a huge army.
Once the AI moves on to this economic strategy, it discovers that the approach is weak to a rush, and the cycle begins again. Human players develop nuance, using scouting and mixed strategies to react appropriately to their opponents’ approaches. Reinforcement learning tends to just swing from one extreme to another.
To address this problem, DeepMind trained a whole league of thousands of different agents. Even with this approach, though, the agents tended to become similar to one another, so DeepMind gave each agent its own goal. Some tried to be the best overall, while others had handicaps, such as winning while also building 10 of a particular unit or winning within a set time limit. DeepMind also trained agents tasked only with beating the strongest agent in the league.
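The tail-chasing problem, and why training against a broad pool helps, can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the three strategies described above beat one another in a rock-paper-scissors cycle: defense beats a rush, an economic build beats defense, and a rush beats an economic build. Naive self-play has each new agent best-respond only to the latest one. As a loose stand-in for the league, the second function uses fictitious play, a standard game-theory technique in which each new agent best-responds to the whole pool of earlier agents. DeepMind’s actual league was far more elaborate; the names `naive_self_play` and `league_play` are hypothetical.

```python
from collections import Counter

# Toy model (an assumption): each strategy beats exactly one other, like
# rock-paper-scissors. Each key beats its value.
BEATS = {"defense": "rush", "economy": "defense", "rush": "economy"}
STRATEGIES = list(BEATS)

def payoff(a: str, b: str) -> int:
    """+1 if strategy a beats b, -1 if it loses, 0 for a mirror match."""
    if BEATS[a] == b:
        return 1
    if BEATS[b] == a:
        return -1
    return 0

def best_response(opponents: Counter) -> str:
    """Strategy with the highest total payoff against a weighted pool of
    opponents. Ties go to the earliest entry in STRATEGIES."""
    return max(STRATEGIES,
               key=lambda s: sum(w * payoff(s, o) for o, w in opponents.items()))

def naive_self_play(start: str, generations: int) -> list:
    """Each new agent trains only against the previous one."""
    history = [start]
    for _ in range(generations):
        history.append(best_response(Counter([history[-1]])))
    return history

def league_play(start: str, generations: int) -> Counter:
    """Each new agent trains against every agent so far (fictitious play)."""
    pool = Counter([start])
    for _ in range(generations):
        pool[best_response(pool)] += 1
    return pool

print(naive_self_play("rush", 6))
pool = league_play("rush", 3000)
total = sum(pool.values())
print({s: round(pool[s] / total, 3) for s in STRATEGIES})
```

The naive version repeats rush, defense, economy forever. The pool built by the league-style loop instead drifts toward a roughly even mix of all three strategies, which in this toy game is the balanced blend that no single strategy can exploit.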
Training one, a few, or even many networks together wasn’t enough on its own for AlphaStar to produce high-quality competitors. But once DeepMind introduced a range of goals and, thus, created a few agents that were strong against a wide range of reasonable strategies, it had an AI ready to face off against humans. Some early versions had superhuman access to the map and the ability to briefly click at superhuman speeds, a substantial advantage in certain situations. After these advantages were removed, though, the AI reached grandmaster status, playing better than all but the top 0.2 percent of players.
What lessons does the AlphaStar example hold?
- Although there are lessons to be drawn here about teamwork and diversity, another important one is to be playful! Consider the edge cases. Consider unlikely solutions. Put time aside to try to succeed at tasks in some unusual way. Overly narrow learning leaves us rigid.
Learning Game Thinking From AI
I’m absolutely not suggesting that you adopt the personality of an AI. But by looking at the obstacles that some of the most sophisticated game-playing algorithms faced and the successes they achieved, we can derive some surprisingly useful lessons for life. The strategies and tactics these AIs use have proven successful in at least one narrow domain, such as chess, so perhaps there’s something we can carry over to less easily quantified domains like business.