{"id":121729,"date":"2020-12-07T16:03:41","date_gmt":"2020-12-07T16:03:41","guid":{"rendered":"https:\/\/news.microsoft.com\/?p=440181"},"modified":"2020-12-07T16:03:41","modified_gmt":"2020-12-07T16:03:41","slug":"reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/12\/07\/reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers\/","title":{"rendered":"Reinforcement learning helps bring a new class of AI solutions to customers"},"content":{"rendered":"<p>Someone looking to book a vacation online today might have very different preferences than they did before the COVID-19 pandemic.<\/p>\n<p>Instead of flying to an exotic beach, they might feel more comfortable driving locally. With limited options for dining out, having a full kitchen might be essential. Motel rooms or cabins might be more appealing than hotels with shared lobbies.<\/p>\n<p>Countless companies use online recommendation engines to show customers products and experiences that match their interests. And yet, traditional machine learning models that predict what people might prefer are often based on data from past experience. 
That means they aren\u2019t necessarily able to pick up on quickly changing consumer preferences unless they are retrained with new data.<\/p>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/personalizer\/\">Personalizer<\/a>, which is part of <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/?OCID=AID2100131_SEM_3a76f05e1e0318dbb4f2afeb49d6c3e4:G:s&amp;ef_id=3a76f05e1e0318dbb4f2afeb49d6c3e4:G:s&amp;msclkid=3a76f05e1e0318dbb4f2afeb49d6c3e4\">Azure Cognitive Services<\/a> within the <a href=\"https:\/\/azure.microsoft.com\/en-us\/overview\/ai-platform\/\">Azure AI platform<\/a>, uses a more cutting-edge approach to machine learning called reinforcement learning, in which AI agents can interact with and learn from their environment in real time.<\/p>\n<p>The technique was once confined mostly to research labs. But now, it\u2019s making its way into more Microsoft products and services \u2014 from Azure Cognitive Services that developers can plug into apps and websites to <a href=\"https:\/\/www.microsoft.com\/en-us\/ai\/autonomous-systems\">autonomous systems<\/a> that engineers can use to refine manufacturing processes. Azure Machine Learning is also previewing <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/how-to-use-reinforcement-learning\">cloud-based reinforcement learning offerings<\/a> for data scientists and machine learning professionals.<\/p>\n<p>\u201cWe\u2019ve come a long way in the last two years when we had a lot of proof of concept projects within Microsoft and deployments with a couple of customers,\u201d said Rafah Hosn, senior director at Microsoft Research\u2019s New York lab. 
\u201cNow we are really progressing nicely into things that can be packaged and shrink wrapped and pointed to a particular set of problems.\u201d<\/p>\n<figure id=\"attachment_82550\" aria-describedby=\"caption-attachment-82550\" class=\"wp-caption alignright\"><a href=\"https:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2020\/12\/reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-82550 size-full\" src=\"https:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2020\/12\/reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers.jpg\" alt=\"Rafah Hosn standing outside\" width=\"300\" height=\"300\"><\/a><figcaption id=\"caption-attachment-82550\" class=\"wp-caption-text\">Rafah Hosn, senior director at Microsoft Research Lab \u2013 New York City. Photo courtesy of Microsoft.<\/figcaption><\/figure>\n<p><a href=\"https:\/\/ztech.net\/\">Z-Tech<\/a>, the technology hub of Anheuser-Busch InBev, is using Personalizer to deliver tailored recommendations in an online marketplace to better serve small grocery stores across Mexico. Other Microsoft customers and partners are employing reinforcement learning to detect production anomalies and develop robots that can adjust to unpredictable real-world conditions \u2014 with models that can learn from environmental cues, expert feedback or customer behavior in real time.<\/p>\n<p>Once Microsoft began using Personalizer on its homepage to contextually personalize the products displayed to each visitor, the company saw a 19-fold increase in engagement with the products that Personalizer chose. The company has also used Personalizer internally to select the right offers, products and content across Windows, Edge browser and Xbox. 
These scenarios are giving up to a 60% lift in engagement across billions of personalizations each month.<\/p>\n<p>Teams has also used reinforcement learning to find the optimal jitter buffer for a video meeting, which trades off millisecond-scale information delays to provide better connection continuity, while Azure is exploring reinforcement learning-based optimization to help determine when to reboot or remediate virtual machines.<\/p>\n<p>Because reinforcement learning models learn from instantaneous feedback, they can quickly adapt to changing or unpredictable circumstances. Once the COVID-19 pandemic hit, some companies had no idea what to expect as people\u2019s purchasing and travel behaviors changed overnight, said Jeff Mendenhall, a Microsoft principal program manager for Personalizer.<\/p>\n<p>\u201cAll of their historic modeling and expert knowledge went out the window,\u201d Mendenhall said. \u201cBut with reinforcement learning, Personalizer can update the model every minute if needed to learn and respond to what actual user behaviors are right now.\u201d<\/p>\n<p>In reinforcement learning, an AI agent learns largely by trial and error. It tests out different actions in either a real or simulated world and gets a reward when the actions achieve a desired result \u2014 whether that\u2019s a customer hitting the button to book a vacation reservation or a robot successfully unloading an unwieldy bag of coins.<\/p>\n<p>Training an AI agent through reinforcement learning is similar to teaching a puppy to do a trick, Hosn said. It gets a treat when it makes decisions that yield a desired result and learns to repeat the actions that get the most treats. 
But in complicated real-world scenarios, exploring the vast universe of potential actions and finding an optimal sequence of decisions can be far more complicated.<\/p>\n<p>At the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) this week, Microsoft researchers presented <a href=\"https:\/\/aka.ms\/MSRBlogRLNeurIPS20\">17 research papers that mark significant progress<\/a> in addressing some of the field\u2019s biggest challenges. By investing in reinforcement learning teams across its network of Microsoft Research labs, the company says it is developing a portfolio of approaches to tackle different problems and exploring multiple paths to potential breakthroughs.<\/p>\n<figure id=\"attachment_82549\" aria-describedby=\"caption-attachment-82549\" class=\"wp-caption alignleft\"><a href=\"https:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2020\/12\/reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers-1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-82549 size-full\" src=\"https:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2020\/12\/reinforcement-learning-helps-bring-a-new-class-of-ai-solutions-to-customers-1.jpg\" alt=\"John Langford sits in an office\" width=\"300\" height=\"300\"><\/a><figcaption id=\"caption-attachment-82549\" class=\"wp-caption-text\">John Langford, partner research manager at Microsoft Research Lab \u2013 New York City. 
Photo by John Brecher.<\/figcaption><\/figure>\n<p>Those teams have focused on developing a <a href=\"https:\/\/aka.ms\/research-collection-rl\">robust understanding of reinforcement learning\u2019s foundational elements<\/a> and creating practical solutions for customers \u2014 not just novelty demonstrations, researchers say.<\/p>\n<p>They\u2019ve spent a lot of time figuring out which scenarios reinforcement learning is well-suited to solve, as well as probing the technical underpinnings to understand why something works and how to repeat it, said John Langford, a partner research manager at Microsoft Research Lab \u2013 New York.<\/p>\n<p>\u201cRight now there\u2019s a big gap between one-off applications where you can get PhDs to grind really hard and figure out a way to make it work as opposed to developing a routinely useful system that can be used over and over again,\u201d Langford said.<\/p>\n<p>\u201cAll of our reinforcement learning research at Microsoft really falls into two big buckets \u2014 how can we solve challenges that customers are bringing to us and what are the foundations we can use to build replicable, reliable solutions?\u201d he said.<\/p>\n<h2><strong>A different approach to machine learning<\/strong><\/h2>\n<p>Reinforcement learning uses a fundamentally different approach than supervised learning, a more common machine learning technique in which models learn to make predictions from training examples they\u2019ve been fed.<\/p>\n<p>If a person is trying to learn French, exposing themselves to French text, grammar rules and vocabulary is closer to a supervised learning approach, said Raluca Georgescu, a research software engineer working on <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-paidia\/\">Project Paidia<\/a> in the Microsoft Research Cambridge UK lab.<\/p>\n<p>With a reinforcement learning approach, they would go to France and learn by talking to people. 
They\u2019d be penalized with puzzled looks if they said the wrong thing and rewarded with a croissant if they ordered it correctly, she said.<\/p>\n<p>A reinforcement learning agent learns from interacting with its environment, either in the real world or in a simulated environment that allows it to safely explore different options. It takes an action and waits to see if it results in a positive or negative outcome, based on a reward system that\u2019s been established. Once that feedback is received, the model learns whether that decision was good or bad and updates itself accordingly.<\/p>\n<p>It\u2019s a really simple form of learning that\u2019s endemic in the natural world, said Langford.<\/p>\n<p>\u201cEven worms can do reinforcement learning \u2014 they can learn to go towards things and avoid things based on some feedback,\u201d Langford said. \u201cThat ability to learn at a very basic level from your environment is something that is super natural for us but in machine learning it\u2019s a bit more tricky and delicate and requires more thought than supervised learning.\u201d<\/p>\n<p><a href=\"https:\/\/aka.ms\/MSRBlogRLNeurIPS20\">The new papers presented at NeurIPS this week<\/a> offer significant contributions in three key research areas: batch reinforcement learning, strategic exploration given rich observations and representation learning. Taken together, researchers say, these breakthroughs aim to boost the efficiency of models and expand the scope of problems that reinforcement learning can solve.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Someone looking to book a vacation online today might have very different preferences than they did before the COVID-19 pandemic. Instead of flying to an exotic beach, they might feel more comfortable driving locally. With limited options for dining out, having a full kitchen might be essential. 
Motel rooms or cabins might be more appealing [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":121730,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49],"tags":[135,50],"class_list":["post-121729","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-news","tag-artificial-intelligence","tag-recent-news"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/121729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=121729"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/121729\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media\/121730"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=121729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=121729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=121729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}