{"id":17649,"date":"2018-04-18T16:27:00","date_gmt":"2018-04-18T16:27:00","guid":{"rendered":"http:\/\/www.gamasutra.com\/view\/news\/316654"},"modified":"2018-04-18T16:27:00","modified_gmt":"2018-04-18T16:27:00","slug":"blog-creating-a-hard-ai-for-terra-mystica","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2018\/04\/18\/blog-creating-a-hard-ai-for-terra-mystica\/","title":{"rendered":"Blog: Creating a hard AI for Terra Mystica"},"content":{"rendered":"<p><strong><em><small>The following blog post, unless otherwise noted, was written by a member of Gamasutra\u2019s community.<br \/>The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.<\/small><\/em><\/strong><\/p>\n<hr \/>\n<p>Spoiler alert: This doesn&#8217;t have a happy ending.\u00a0 <a href=\"http:\/\/digidiced.com\/\">Digidiced<\/a> has been hard at work for more than a year trying to produce a Hard version of its AI for <a href=\"http:\/\/digidiced.com\/terra-mystica-factions\/\">Terra Mystica<\/a> using machine learning.\u00a0 Our results have been a lot less impressive than we were hoping for.\u00a0 This article will describe a little bit about what we\u2019ve tried and why it hasn&#8217;t worked for us.<\/p>\n<p>If you\u2019ve paid attention to the latest developments in AI, you\u2019ve probably heard of <a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaGo\">AlphaGo<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaZero\">AlphaZero<\/a>, developed by Google\u2019s <a href=\"https:\/\/deepmind.com\/\">DeepMind<\/a>.\u00a0 In 2017, AlphaGo defeated Ke Jie, the #1 ranked Go player in the world.\u00a0 AlphaGo was developed by using a massive neural network and feeding it hundreds of thousands of professional games.\u00a0 From those games, it learned to predict what it thought a professional would play.\u00a0 AlphaGo then went on to play millions of games against itself, gradually improving its evaluation function little by little until it became a superhuman monster, better than any human player.\u00a0 The defeat of a human professional was thought to be decades away for a game as complex as Go. 
But it didn't stop there. In December 2017, DeepMind introduced AlphaZero, a method that also learned the game of Go, but this time without using any human-played games. It learned entirely from self-play, being told only the rules of the game; it was not given any suggestions or strategies on how to play. AlphaZero was not only able to learn from self-play alone, it became stronger than the original AlphaGo. On top of that, the same methodology was applied to chess and shogi, and the DeepMind team showed results indicating that AlphaZero could solidly beat the top existing AI players in both of those games (which were already better than humans). Since these results came out, there has been some criticism about whether the testing conditions were really fair to the existing AI programs, so there is some debate as to whether AlphaZero is actually stronger, but it is an outstanding achievement nonetheless.

It also became quite clear that AlphaZero approached chess differently than [Stockfish](https://stockfishchess.org/) (the existing AI it competed against). While Stockfish examined 70 million positions per second, AlphaZero examined only 80,000, but it packed far more positional and strategic evaluation into each of those positions. By examining the games AlphaZero played against Stockfish, it became obvious to a lot of people that AlphaZero was much better at positioning its pieces and relied less on having a material advantage. In many cases AlphaZero would sacrifice material to get a better position, which it later used to come back and secure a win. It suggested the possibility of a resurgence in chess programming ideas, which had been stagnating in recent years.

*The DeepMind team was able to show that AlphaZero learned many human-discovered opening moves. They showed several examples of how different openings gained and lost popularity as it continued to learn.*

As Digidiced's AI developer, I found these developments exciting. I've had experience with machine learning and neural networks and have been playing around with them for many years.
I once developed a network, as a private commission for a professional poker player, that could play triple draw low at a professional level. I began to wonder whether I could use some of these same techniques for Digidiced's Terra Mystica app. One of the compelling features of AlphaGo was that it was largely based on something called a [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN). CNNs are also used in other deep learning applications like image recognition and are good at identifying positional relationships between objects. AlphaGo used this structure to identify patterns on the Go board and capture the complex relationships formed by different arrangements of stones.

While Terra Mystica takes place on a hex-based map instead of a square grid, a CNN can still be applied to it so that the proximity of players' buildings, a critical part of TM strategy, can be taken into account. However, there are several things that make TM a much more complicated game than Go:

- TM can have anywhere from 2 to 5 players, although it is often played with exactly 4. For AI programming, the leap from 2 players to more than 2 is a lot more difficult than most people realize. You may have noticed that whenever you hear about an AI reaching superhuman performance, it's almost always in a 2-player game.
- While a point on a Go board can only have 3 states (white stone, black stone, or empty), a hex on a TM map can have 55 different states, taking into account the different terrain types and buildings. Add in towns and bridges and the complexity goes up from there.
- TM has 20 different factions with the Fire & Ice expansion, and each faction has different special abilities and plays differently.
- TM has numerous elements that occur off the map, including each player's resources and economy, positioning on the cult tracks, and shared power actions.
- Each game uses a different mix of scoring elements and bonus scrolls, and which of these are present can have a massive effect on every player's strategy. Not to diminish the complexity of Go (a game I'm still in awe of after casually studying it for over a decade), but in Go you're always playing the same game.

One of the things that makes TM such a great game, and gives it a very high skill ceiling, is that its economies and player interactions are so tightly interwoven. The correct action to take on the map can depend heavily not only on your own situation, but on the economic state of your opponents or the selection of available power actions. All of this makes TM orders of magnitude more complex a game than Go.
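To give a feel for how a CNN can consume a hex map like the one described above, here is a minimal sketch of one possible encoding: binary feature planes laid out on a rectangular grid of hex coordinates. The dimensions, plane layout, and function names are assumptions for illustration, not the encoding we actually used.

```python
# A hedged sketch (not our shipped encoding): the TM hex map as stacked
# binary feature planes that a convolutional network can read.
import numpy as np

ROWS, COLS = 9, 13          # approximate dimensions of the TM base map
N_TERRAIN = 8               # 7 terrain types plus river
N_BUILDINGS = 5             # dwelling, trading house, temple, sanctuary, stronghold
MAX_PLAYERS = 5

def empty_planes():
    # one plane per terrain type, plus one plane per (player, building) pair
    n_planes = N_TERRAIN + MAX_PLAYERS * N_BUILDINGS
    return np.zeros((n_planes, ROWS, COLS), dtype=np.float32)

def set_hex(planes, row, col, terrain, owner=None, building=None):
    planes[terrain, row, col] = 1.0
    if owner is not None and building is not None:
        planes[N_TERRAIN + owner * N_BUILDINGS + building, row, col] = 1.0

# Off-map state (resources, cult tracks, power bowls, available power actions)
# would be appended as a flat vector and merged with the CNN output inside the network.
planes = empty_planes()
set_hex(planes, 4, 6, terrain=2, owner=0, building=0)  # e.g. player 0's dwelling on terrain 2
```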
---

*Chaos Magicians, Swarmlings, Darklings, and Dwarves fight it out on the digital version of Terra Mystica. Complexities abound, and an AI needs to know how to read the board. The Darklings will want to upgrade one of their dwellings to get the town bonus, and they should upgrade next to the Dwarves to keep power away from the stronger Chaos Magicians player. The choice of town tile could affect the flow of the rest of the game:*

- *Should they take 7VP & 2 workers, so they have enough workers to build a temple and grab a critical favor tile?*
- *Or 9VP & 1 priest, which they can use to terraform a hex or send to the cults?*
- *Or 8VP & free cult advancements, which will gain them power and cult positioning?*
- *5VP & 6 coins is sometimes good, but probably not in this situation, since the Darklings have other income sources.*

*The other town choices seem inferior at this point, which the AI needs to recognize. Notice what is needed to plan a good turn: the recognition that a town needs to be created this turn, the optimal location for the upgraded building, the knowledge that a critical favor tile exists and how to get it, the relative value of terraforming compared to other actions, the value of cult positioning (not shown) and power, as well as the value of coins, which depends on how many coin-producing bonus scrolls are in the game.*

---

The main idea behind training the network to become stronger is called [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping). I'm simplifying things a bit here, but think of the neural network as an enormously complicated evaluation function. You feed it all the information about the map, the resources of all the players, and the other variables that describe the current game state. It crunches the numbers and spits out an estimate of the best action to take (each action is given a percent chance of being the best action) and an estimate of the final scores for each player. Say you have a partially trained network whose evaluation function is okay, but not that good. Each time you make a move, you look 2 moves ahead with it, considering all the options and picking what you think is best. You now have a (moderately) better-informed estimate of your current state, because you've searched 2 moves ahead. You then tweak the model so that its direct estimate is closer to your 2-moves-ahead estimate. If you could fully incorporate everything from 2 moves ahead into your evaluation function, then using that function to search 2 moves ahead would be equivalent to searching 4 moves ahead with the original function. It's not quite that simple, but you can see how repeating this over and over will keep improving the model, as long as it has enough capacity to handle the complexity. You just have to repeat it billions of times...
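Here is a hedged sketch of that bootstrapping loop, assuming PyTorch and a hypothetical game-state interface (`encode`, `legal_moves`, `play`, `is_terminal`, `current_player`); it illustrates the idea above, not our production training code.

```python
# Illustrative bootstrapping step: nudge the network's direct value estimate
# toward the estimate it produces after a shallow lookahead with itself.
import torch
import torch.nn.functional as F

def search_estimate(net, state, depth=2):
    """Value estimate after looking `depth` moves ahead, greedily following
    the move the current network rates best at each step."""
    if depth == 0 or state.is_terminal():
        _policy, value = net(state.encode())
        return value                               # per-player final-score estimate
    best_score, best_value = None, None
    for move in state.legal_moves():
        value = search_estimate(net, state.play(move), depth - 1)
        score = value[state.current_player()]      # judged from the mover's viewpoint
        if best_score is None or score > best_score:
            best_score, best_value = score, value
    return best_value

def bootstrap_step(net, optimizer, state):
    """One training update on a single position."""
    _policy, value = net(state.encode())           # direct (0-ply) estimate
    with torch.no_grad():
        target = search_estimate(net, state)       # better-informed 2-ply estimate
    loss = F.mse_loss(value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Run over billions of positions, each update pulls the evaluation function a little closer to what its own lookahead already knows.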
In order to train its networks, DeepMind was able to throw a *massive* amount of hardware at the problem. According to an Inc.com [article](https://www.inc.com/lisa-calhoun/google-artificial-intelligence-alpha-go-zero-just-pressed-reset-on-how-we-learn.html), the hardware used to develop AlphaZero would cost about $25 million. There is no way a small company like ours can compete with that. Some people have estimated that replicating the training on a single machine would take 1,700 years! Even after all the training, when AlphaGo runs on a single machine it still uses very sophisticated hardware, running dozens of processing threads simultaneously. We needed to create an AI capable of running on your phone. For each position AlphaGo analyzes, its neural network performs almost 20 billion operations; we were hoping for a network with fewer than 20 million. And instead of analyzing 80,000 positions per second, we would be lucky to manage 10. We also considered an even smaller network that could look at more positions per second, but it would not have had enough capacity to capture the nuances needed for a strong player.

So our goal was to create an AI for a game even more complicated than Go, using a network about a thousandth the size. AlphaZero played over 20 million self-play games during its development. Even renting several virtual machines and playing games 24/7 for a few months, Digidiced was only able to collect about 40,000 self-play games. Despite these limitations, we were cautiously optimistic. We didn't need superhuman, god-like play; we wanted something that could challenge the entire player base while not taking too long to think on each move.

*A tiny peek into the complexity of AlphaGo (from David Foster's [AlphaGo Zero Cheat Sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0)).*

But even that turned out to be too much of a challenge with our limited budget. The AlphaZero paper claimed that starting from scratch with completely random play yielded better results than mimicking games played by humans. We decided to try both methods in parallel: one network started from random play and built up sophistication over time, while another network was trained on games played on the app. Neither produced a very strong player; in fact, we were never able to create a version that could outperform our Easy version, which uses fairly standard Monte Carlo Tree Search. We even tried focusing development on 4-player games only, but this didn't help much.

What was really heartbreaking was that we could see the network improving over time, but the rate of improvement was just too slow for the amount of money we were spending. It was a very difficult decision, but we've decided to halt development work on this for now. We still see a possibility of spending some time converting the played games from Juho Snellman's online [implementation](https://terra.snellman.net/) of TM, but we don't have the funds for that now. Juho very kindly gave us permission to do this much earlier, but the conversion proved to be difficult for a number of reasons, mostly due to how the platforms differ in accepting power. So while there is still a chance of further development, we don't want to promise anything that doesn't seem likely.