In 1898, Hans Søren Hansen arrived in Lem, Denmark, a small farming town about 160 miles from Copenhagen. The 22-year-old was eager to make his way in business and bought a blacksmith shop. In time, he became known to those in the area for his innovative spirit.
Hansen’s business changed with the times, shifting to manufacturing steel window frames. Future generations continued to build on Hansen’s openness to change, moving into hydraulic cranes and ultimately, in 1987, becoming Vestas Wind Systems, one of the largest wind turbine manufacturers in the world.
That tenacity to adapt and succeed has continued to define Vestas, which is now looking to optimize wind energy efficiency for customers who use its turbines in 85 countries.
Working on a proof of concept with Microsoft and Microsoft partner minds.ai, Vestas successfully used artificial intelligence (AI) and high-performance computing to generate more energy from wind turbines by optimizing what is known as wake steering.
That potential energy increase is important. But just as important, Vestas says, was how quickly the proof of concept was developed – in a few months – and what that speed could mean for deploying it. The company is not the first to study the problem, but the expedited results set its effort apart.
Sven Jesper Knudsen, Vestas Chief Specialist and modeling and analytics module design owner.
“This is a theoretical exercise that has been living in the research community for years,” says Sven Jesper Knudsen, Vestas chief specialist and modeling and analytics module design owner. “And there have been some demonstrations by both our competitors and also some wind farm owners. We wanted to see if we could try to shorten the development cycle.
“Time to market is essential to the whole wind industry to meet aggressive targets that we all have,” Knudsen says.
Wind energy, like solar, is a clean alternative to fossil fuels for generating electricity. Both are of growing importance as the world looks to cut the use of coal, gas and crude oil to reduce carbon emissions and meet climate change goals.
Wind power also is one of the fastest-growing renewable energy technologies, according to the International Energy Agency (IEA), an organization that works with governments and industry to help them shape and secure a sustainable energy future.
By 2050, two-thirds of the world’s total energy supply is projected to come from wind, solar, bioenergy, geothermal and hydro energy, with wind power expected to increase 11-fold, the agency said in a report last year, Net Zero by 2050: A Roadmap for the Global Energy Sector.
“In the net zero pathway, global energy demand in 2050 is around 8% smaller than today, but it serves an economy more than twice as big and a population with 2 billion more people,” the IEA says in the report.
Wind energy has many advantages. But one challenge is that the amount of energy that is harnessed can change daily based on wind conditions. Finding ways to better capture every part of wind energy is important to Vestas – hence what began last year as the “Grand Challenge,” as the company described it.
A woman works in Vestas’ blades factory in Nakskov, in south Denmark. (Photo courtesy of Vestas)
Wind turbines cast a wake, or “shadow effect,” that can slow turbines located downstream, Knudsen says. Some of that energy can be recaptured using wake steering: yawing upstream rotors slightly away from the oncoming wind to deflect the wake.
“The idea is that you control that shadow effect away from downstream turbines and you then channel more wind energy to these downstream turbines,” he says.
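The trade-off behind wake steering can be illustrated with a toy two-turbine model. Everything below is invented for illustration – a crude Jensen-style wake deficit, a cosine-cubed yaw loss, and made-up coefficients – and is not Vestas’s controller or a validated engineering model; it only shows why a small upstream yaw can raise total farm output.

```python
import math

def wake_deficit(yaw_deg, spacing=5.0, k=0.05):
    """Simplified Jensen-style wind-speed deficit at a downstream turbine.

    Yawing the upstream rotor deflects the wake sideways, reducing its
    overlap with the downstream rotor. The deflection term is a crude
    illustration, not a validated model.
    """
    base_deficit = (2.0 / 3.0) / (1.0 + 2.0 * k * spacing) ** 2
    # Crude lateral deflection: more yaw moves the wake centerline aside,
    # shrinking the overlap fraction with the downstream rotor.
    deflection = 0.3 * spacing * math.sin(math.radians(yaw_deg))
    overlap = max(0.0, 1.0 - abs(deflection))
    return base_deficit * overlap

def farm_power(yaw_deg):
    """Normalized two-turbine farm power: the upstream turbine loses a
    little output to yaw misalignment (cosine-cubed law), while the
    downstream turbine gains from a weaker wake (power ~ wind speed cubed)."""
    upstream = math.cos(math.radians(yaw_deg)) ** 3
    downstream = (1.0 - wake_deficit(yaw_deg)) ** 3
    return upstream + downstream

# Sweep yaw angles: a modest upstream yaw beats pointing straight into the wind.
best_yaw = max(range(0, 31), key=farm_power)
print("best yaw:", best_yaw, "deg, farm power:", round(farm_power(best_yaw), 3))
```

In this toy setup the combined output peaks at a nonzero yaw angle, which is the effect the researchers are optimizing, though the real problem involves many turbines, turbulence and changing wind conditions.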
To accomplish this, Vestas used Microsoft Azure high-performance computing, Azure Machine Learning and help from Microsoft partner minds.ai, which used DeepSim, its reinforcement learning-based controller design platform.
Reinforcement learning is a type of machine learning in which AI agents can interact and learn from their environment in real-time, and largely by trial and error. Reinforcement learning tests out different actions in either a real or simulated world and gets a reward – say, higher points – when actions achieve a desired result.
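As a rough sketch of that reward loop, here is an epsilon-greedy multi-armed bandit, one of the simplest forms of reinforcement learning. The three actions and their reward values are made up for illustration; the agent tries actions, receives noisy rewards, and gradually concentrates on the action that pays best.

```python
import random

random.seed(0)

# Hypothetical environment: three actions with unknown mean rewards.
TRUE_MEANS = [0.2, 0.5, 0.8]

def pull(action):
    """Simulated world: noisy reward feedback for the chosen action."""
    return TRUE_MEANS[action] + random.gauss(0, 0.1)

estimates = [0.0] * 3   # the agent's running estimate of each action's value
counts = [0] * 3
epsilon = 0.1           # how often the agent explores at random

for _ in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)                # explore a random action
    else:
        action = estimates.index(max(estimates))    # exploit the best estimate
    reward = pull(action)
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned best action:", estimates.index(max(estimates)))
```

After a few thousand trials the agent settles on the highest-reward action, which is the same trial-and-error dynamic, at vastly larger scale, that the systems described in these articles rely on.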
Vestas’ use of Azure high-performance computing also meant getting results faster.
Cloud computing is powering a new age of data and AI by democratizing access to scalable compute, storage, and networking infrastructure and services. Thanks to the cloud, organizations can now collect data at an unprecedented scale and use it to train complex models and generate insights.
While this increasing demand for data has unlocked new possibilities, it also raises concerns about privacy and security, especially in regulated industries such as government, finance, and healthcare. One area where data privacy is crucial is patient records, which are used to train models to aid clinicians in diagnosis. Another example is in banking, where models that evaluate borrower creditworthiness are built from increasingly rich datasets, such as bank statements, tax returns, and even social media profiles. This data contains very personal information, and to ensure that it’s kept private, governments and regulatory bodies are implementing strong privacy laws and regulations to govern the use and sharing of data for AI, such as the General Data Protection Regulation (GDPR) and the proposed EU AI Act. You can learn more about some of the industries where it’s imperative to protect sensitive data in this Microsoft Azure Blog post.
Commitment to a confidential cloud
Microsoft recognizes that trustworthy AI requires a trustworthy cloud—one in which security, privacy, and transparency are built into its core. A key component of this vision is confidential computing—a set of hardware and software capabilities that give data owners technical and verifiable control over how their data is shared and used. Confidential computing relies on a new hardware abstraction called trusted execution environments (TEEs). In TEEs, data remains encrypted not just at rest or during transit, but also during use. TEEs also support remote attestation, which enables data owners to remotely verify the configuration of the hardware and firmware supporting a TEE and grant specific algorithms access to their data.
At Microsoft, we are committed to providing a confidential cloud, where confidential computing is the default for all cloud services. Today, Azure offers a rich confidential computing platform comprising different kinds of confidential computing hardware (Intel SGX, AMD SEV-SNP), core confidential computing services like Azure Attestation and Azure Key Vault managed HSM, and application-level services such as Azure SQL Always Encrypted, Azure confidential ledger, and confidential containers on Azure. However, these offerings are limited to using CPUs. This poses a challenge for AI workloads, which rely heavily on AI accelerators like GPUs to provide the performance needed to process large amounts of data and train complex models.
Today, CPUs from companies like Intel and AMD allow the creation of TEEs, which can isolate a process or an entire guest virtual machine (VM), effectively eliminating the host operating system and the hypervisor from the trust boundary. Our vision is to extend this trust boundary to GPUs, allowing code running in the CPU TEE to securely offload computation and data to GPUs.
Figure 1: Vision for confidential computing with NVIDIA GPUs.
Unfortunately, extending the trust boundary is not straightforward. On the one hand, we must protect against a variety of attacks, such as man-in-the-middle attacks where the attacker can observe or tamper with traffic on the PCIe bus or on an NVIDIA NVLink connecting multiple GPUs, as well as impersonation attacks, where the host assigns the guest VM an incorrectly configured GPU, a GPU running outdated or malicious firmware, or one without confidential computing support. At the same time, we must ensure that the Azure host operating system has enough control over the GPU to perform administrative tasks. Furthermore, the added protection must not introduce large performance overheads, increase thermal design power, or require significant changes to the GPU microarchitecture.
Our research shows that this vision can be realized by extending the GPU with the following capabilities:
- A new mode where all sensitive state on the GPU, including GPU memory, is isolated from the host
- A hardware root-of-trust on the GPU chip that can generate verifiable attestations capturing all security-sensitive state of the GPU, including all firmware and microcode
- Extensions to the GPU driver to verify GPU attestations, set up a secure communication channel with the GPU, and transparently encrypt all communications between the CPU and GPU
- Hardware support to transparently encrypt all GPU-GPU communications over NVLink
- Support in the guest operating system and hypervisor to securely attach GPUs to a CPU TEE, even if the contents of the CPU TEE are encrypted
Confidential computing with NVIDIA A100 Tensor Core GPUs
NVIDIA and Azure have taken a significant step toward realizing this vision with a new feature called Ampere Protected Memory (APM) in the NVIDIA A100 Tensor Core GPUs. In this section, we describe how APM supports confidential computing within the A100 GPU to achieve end-to-end data confidentiality.
APM introduces a new confidential mode of execution in the A100 GPU. When the GPU is initialized in this mode, the GPU designates a region in high-bandwidth memory (HBM) as protected and helps prevent leaks through memory-mapped I/O (MMIO) access into this region from the host and peer GPUs. Only authenticated and encrypted traffic is permitted to and from the region.
In confidential mode, the GPU can be paired with any external entity, such as a TEE on the host CPU. To enable this pairing, the GPU includes a hardware root-of-trust (HRoT). NVIDIA provisions the HRoT with a unique identity and a corresponding certificate created during manufacturing. The HRoT also implements authenticated and measured boot by measuring the firmware of the GPU as well as that of other microcontrollers on the GPU, including a security microcontroller called SEC2. SEC2, in turn, can generate attestation reports that include these measurements and that are signed by a fresh attestation key, which is endorsed by the unique device key. These reports can be used by any external entity to verify that the GPU is in confidential mode and running last known good firmware.
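The measured-boot and attestation flow can be sketched in miniature. The code below is a deliberately simplified stand-in: it uses an HMAC with a shared device key where real attestation uses asymmetric device keys, certificate chains and fresh attestation keys, and the firmware strings, key and report format are all invented for illustration.

```python
import hashlib
import hmac

# Hypothetical known-good firmware measurement the verifier expects.
GOOD_FIRMWARE = b"gpu-firmware-v2.1"
KNOWN_GOOD_HASH = hashlib.sha256(GOOD_FIRMWARE).hexdigest()

# Stand-in for the unique key provisioned at manufacturing; real hardware
# uses an asymmetric device key endorsed by a certificate chain.
DEVICE_KEY = b"per-device-secret"

def make_attestation_report(firmware, confidential_mode):
    """Simulated SEC2: measure the firmware, then sign the report."""
    measurement = hashlib.sha256(firmware).hexdigest()
    body = f"{measurement}|{confidential_mode}".encode()
    signature = hmac.new(DEVICE_KEY, body, hashlib.sha256).hexdigest()
    return body, signature

def verify_report(body, signature):
    """Simulated verifier: check the signature, then check that the
    measured firmware matches the known-good value and that the GPU
    reports being in confidential mode."""
    expected = hmac.new(DEVICE_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False
    measurement, mode = body.decode().split("|")
    return measurement == KNOWN_GOOD_HASH and mode == "True"

report = make_attestation_report(GOOD_FIRMWARE, True)
print("genuine, known-good GPU:", verify_report(*report))
bad = make_attestation_report(b"tampered-firmware", True)
print("tampered firmware:", verify_report(*bad))
```

The essential property illustrated here is that a verifier who never touches the GPU can still distinguish known-good firmware in confidential mode from anything else, because the measurement is bound into a signed report.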
When the NVIDIA GPU driver in the CPU TEE loads, it checks whether the GPU is in confidential mode. If so, the driver requests an attestation report and checks that the GPU is a genuine NVIDIA GPU running known good firmware. Once confirmed, the driver establishes a secure channel with the SEC2 microcontroller on the GPU using the Security Protocol and Data Model (SPDM)-backed Diffie-Hellman-based key exchange protocol to establish a fresh session key. When that exchange completes, both the GPU driver and SEC2 hold the same symmetric session key.
The GPU driver uses the shared session key to encrypt all subsequent data transfers to and from the GPU. Because pages allocated to the CPU TEE are encrypted in memory and not readable by the GPU DMA engines, the GPU driver allocates pages outside the CPU TEE and writes encrypted data to those pages. On the GPU side, the SEC2 microcontroller is responsible for decrypting the encrypted data transferred from the CPU and copying it to the protected region. Once the data is in HBM in cleartext, the GPU kernels can freely use it for computation.
Figure 2: The GPU driver on the host CPU and the SEC2 microcontroller on the NVIDIA A100 Tensor Core GPU work together to achieve end-to-end encryption of data transfers.
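The handshake-then-encrypt flow might be sketched as follows. This is a stdlib-only toy, not SPDM: the Diffie-Hellman exchange here is unauthenticated, the "cipher" is an HMAC-derived keystream standing in for the hardware's authenticated encryption, and the prime, generator and payload are all illustrative.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters (real deployments use vetted groups with
# certificate-backed, authenticated exchanges as in SPDM).
P = 2**127 - 1
G = 5

# Driver (CPU TEE side) and SEC2 (GPU side) each pick a secret exponent.
driver_secret = secrets.randbelow(P - 2) + 1
sec2_secret = secrets.randbelow(P - 2) + 1

# They exchange public values and independently derive the same session key.
driver_public = pow(G, driver_secret, P)
sec2_public = pow(G, sec2_secret, P)
driver_key = hashlib.sha256(str(pow(sec2_public, driver_secret, P)).encode()).digest()
sec2_key = hashlib.sha256(str(pow(driver_public, sec2_secret, P)).encode()).digest()
assert driver_key == sec2_key  # shared symmetric session key

def xor_keystream(key, nonce, data):
    """Toy stream cipher built from an HMAC keystream -- illustration only;
    the real path uses hardware authenticated encryption."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

# The driver encrypts a payload into a bounce buffer outside the CPU TEE...
nonce = secrets.token_bytes(16)
plaintext = b"training batch 42"
bounce_buffer = xor_keystream(driver_key, nonce, plaintext)
assert bounce_buffer != plaintext  # the host only ever sees ciphertext

# ...and the GPU side decrypts it into the protected HBM region.
recovered = xor_keystream(sec2_key, nonce, bounce_buffer)
print(recovered)
```

The point of the sketch is the division of labor: the host can shuttle the bounce buffer around freely, but only the two endpoints holding the session key can read the data.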
Accelerating innovation with confidential AI
The implementation of APM is an important milestone toward achieving broader adoption of confidential AI in the cloud and beyond. APM is the foundational building block of Azure Confidential GPU VMs, now in private preview. These VMs, designed in collaboration with NVIDIA, Azure, and Microsoft Research, feature up to four A100 GPUs with 80 GB of HBM and APM technology and enable users to host AI workloads on Azure with a new level of security.
But this is just the beginning. We look forward to taking our collaboration with NVIDIA to the next level with NVIDIA’s Hopper architecture, which will enable customers to protect both the confidentiality and integrity of data and AI models in use. We believe that confidential GPUs can enable a confidential AI platform where multiple organizations can collaborate to train and deploy AI models by pooling together sensitive datasets while remaining in full control of their data and models. Such a platform can unlock the value of large amounts of data while preserving data privacy, giving organizations the opportunity to drive innovation.
A real-world example involves Bosch Research, the research and advanced engineering division of Bosch, which is developing an AI pipeline to train models for autonomous driving. Much of the data it uses includes personally identifiable information (PII), such as license plate numbers and people’s faces. At the same time, it must comply with GDPR, which requires a legal basis for processing PII, namely, consent from data subjects or legitimate interest. The former is challenging because it is practically impossible to get consent from pedestrians and drivers recorded by test cars. Relying on legitimate interest is challenging too because, among other things, it requires showing that there is no less privacy-intrusive way of achieving the same result. This is where confidential AI shines: Using confidential computing can help reduce risks for data subjects and data controllers by limiting exposure of data (for example, to specific algorithms), while enabling organizations to train more accurate models.
At Microsoft Research, we are committed to working with the confidential computing ecosystem, including collaborators like NVIDIA and Bosch Research, to further strengthen security, enable seamless training and deployment of confidential AI models, and help power the next generation of technology.
About confidential computing at Microsoft Research
The Confidential Computing team at Microsoft Research Cambridge conducts pioneering research in system design that aims to guarantee strong security and privacy properties to cloud users. We tackle problems around secure hardware design, cryptographic and security protocols, side channel resilience, and memory safety. We are also interested in new technologies and applications that security and privacy can uncover, such as blockchains and multiparty machine learning. Please visit our careers page to learn about opportunities for both researchers and engineers. We’re hiring.
Microsoft is making upgrades to Translator and other Azure AI services powered by Z-code, a new family of artificial intelligence models developed by its researchers that offer the performance and quality benefits of other large-scale language models but can be run much more efficiently.
“Our goal is to help everyone and every organization on the planet to communicate better, and to achieve that goal there are really two important dimensions — we want the quality of translations to be as good as possible and we want to support as many languages as possible,” said Xuedong Huang, Microsoft technical fellow and Azure AI chief technology officer.
Z-code takes advantage of shared linguistic elements across multiple languages via transfer learning —which applies knowledge from one task to another related task — to improve quality for machine translation and other language understanding tasks. It also helps extend those capabilities beyond the most common languages across the globe to underrepresented languages that have less available training data.
“With Z-code we are really making amazing progress because we are leveraging both transfer learning and multitask learning from monolingual and multilingual data to create a state-of-the-art language model that we believe has the best combination of quality, performance and efficiency that we can provide to our customers,” Huang said.
These models use a sparse “Mixture of Experts” approach that is more efficient to run because it only needs to engage a portion of the model to complete a task, as opposed to other architectures that have to activate an entire AI model to run every request. This architecture allows massive scale in the number of model parameters while keeping the amount of compute constant.
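A sparse top-k gate can be sketched in a few lines of plain Python. The "experts" and router weights below are made up for illustration; the point is only that every expert is scored, but just k of them actually execute, so compute grows with k rather than with the total expert count.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, k=2):
    """Sparse Mixture-of-Experts layer: score every expert with a router,
    but run only the top-k and mix their outputs by renormalized gates."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    gates = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]
    total = sum(gates[i] for i in top)  # renormalize over chosen experts
    output = [0.0] * len(token)
    for i in top:
        expert_out = experts[i](token)  # only k experts ever execute
        weight = gates[i] / total
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, top

# Hypothetical setup: 8 tiny "experts", each a fixed scaling of the input.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
router_weights = [[0.1 * s, -0.05 * s] for s in range(1, 9)]
out, chosen = moe_forward([1.0, 0.5], experts, router_weights, k=2)
print("experts used:", chosen, "of", len(experts))
```

Scaling the model then means adding more experts (more parameters) while k, and therefore the per-token compute, stays fixed.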
To put these models in production, Microsoft is using NVIDIA GPUs and Triton Inference Server to deploy and scale them efficiently for high-performance inference.
Microsoft has recently deployed Z-code models to improve common language understanding tasks such as named entity recognition, text summarization, custom text classification and key phrase extraction across its Azure AI services. But this is the first time a company has publicly demonstrated that it can use this new class of Mixture of Experts models to power machine translation products.
The new Z-code-based translation model is now available, initially by invitation, to customers using document translation in Translator, a Microsoft Azure Cognitive Service that is part of Azure AI.
Microsoft’s Z-code models consistently improved translation quality over current production models, according to common industry metrics. Typical multilingual transfer learning approaches show AI quality gains mainly in languages with fewer direct translation examples available for training; by contrast, the Z-code Mixture of Experts models show consistent gains even in the largest languages.
New Z-code Mixture of Experts AI models are powering improvements and efficiencies in Translator and other Azure AI services.
Human evaluators in a blind test commissioned by Microsoft found that the Z-code Mixture of Experts models improved translations across languages, with an average gain of 4%. For instance, the models improved English to French translations by 3.2%, English to Turkish by 5.8%, Japanese to English by 7.6%, English to Arabic by 9.3% and English to Slovenian by 15%.
Creating more powerful and integrative AI systems
Z-code is part of Microsoft’s larger XYZ-code initiative that seeks to combine models for text, vision, audio and multiple languages to create more powerful and integrative AI systems that can speak, hear, see and understand people better.
“Those are the pieces, the building blocks that we are using to build a truly differentiated intelligence…and to form production systems that are cost efficient,” Huang said.
Z-code models were developed as part of Microsoft’s AI at Scale and Turing initiatives, which seek to develop large models that are pretrained on vast amounts of textual data to understand nuances of language — which can be integrated in multiple Microsoft products and also made available to customers for their own uses.
The same underlying model can be fine-tuned to perform different language understanding tasks such as translating between languages, summarizing a speech, offering ways to complete a sentence or generating suggested tweets, instead of having to develop separate models for each of those narrow purposes.
Natural language understanding (NLU) is one of the longest running goals in AI, and SuperGLUE is currently among the most challenging benchmarks for evaluating NLU models. The benchmark consists of a wide range of NLU tasks, including question answering, natural language inference, co-reference resolution, word sense disambiguation, and others. Take the causal reasoning task (COPA in Figure 1) as an example. Given the premise “the child became immune to the disease” and the question “what’s the cause for this?,” the model is asked to choose an answer from two plausible candidates: 1) “he avoided exposure to the disease” and 2) “he received the vaccine for the disease.” While it is easy for a human to choose the right answer, it is challenging for an AI model. To get the right answer, the model needs to understand the causal relationship between the premise and those plausible options.
Since its release in 2019, top research teams around the world have been developing large-scale pretrained language models (PLMs) that have driven striking performance improvement on the SuperGLUE benchmark. Microsoft recently updated the DeBERTa model by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass the human performance on SuperGLUE for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE benchmark rankings, outperforming the human baseline by a decent margin (90.3 versus 89.8). The model also sits at the top of the GLUE benchmark rankings with a macro-average score of 90.8.
Microsoft will release the 1.5-billion-parameter DeBERTa model and the source code to the public. In addition, DeBERTa is being integrated into the next version of the Microsoft Turing natural language representation model (Turing NLRv4). Our Turing models converge all language innovation across Microsoft and are then trained at large scale to support products like Bing, Office, Dynamics, and Azure Cognitive Services, powering a wide range of scenarios involving human-machine and human-human interactions via natural language (such as chatbots, recommendation, question answering, search, personal assistance, customer support automation, content generation, and others) to benefit hundreds of millions of users through the Microsoft AI at Scale initiative.
Figure 1: The SuperGLUE leaderboard as of January 6th, 2021.
DeBERTa (Decoding-enhanced BERT with disentangled attention) is a Transformer-based neural language model pretrained on large amounts of raw text corpora using self-supervised learning. Like other PLMs, DeBERTa is intended to learn universal language representations that can be adapted to various downstream NLU tasks. DeBERTa improves previous state-of-the-art PLMs (for example, BERT, RoBERTa, UniLM) using three novel techniques (illustrated in Figure 2): a disentangled attention mechanism, an enhanced mask decoder, and a virtual adversarial training method for fine-tuning.
Figure 2: The architecture of DeBERTa. DeBERTa improves the BERT and RoBERTa models by 1) using a disentangled attention mechanism where each word is represented using two vectors that encode its content and relative position, respectively, and 2) an enhanced mask decoder.
Disentangled attention: a two-vector approach to content and position embedding
Unlike BERT, where each word in the input layer is represented by a single vector that sums its word (content) embedding and position embedding, each word in DeBERTa is represented by two vectors that encode its content and position, respectively. Attention weights among words are then computed using disentangled matrices based on their contents and relative positions. This is motivated by the observation that the attention weight (which measures the strength of word-word dependency) of a word pair depends not only on their contents but also on their relative positions. For example, the dependency between the words “deep” and “learning” is much stronger when they occur next to each other than when they occur in different sentences.
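The decomposition can be sketched with toy vectors. The embeddings below are invented, and this is a simplified view that omits the relative-position indexing, projections and scaling of the actual model; it only shows the attention score as a sum of content-to-content, content-to-position and position-to-content terms.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def disentangled_score(q_content, k_content, q_pos, k_pos):
    """Attention score as the sum of three disentangled terms, following
    the content/position decomposition described in the text."""
    c2c = dot(q_content, k_content)  # what the two words mean
    c2p = dot(q_content, k_pos)      # query content vs. key's relative position
    p2c = dot(k_content, q_pos)      # key content vs. query's relative position
    return c2c + c2p + p2c

# Toy content embeddings for "deep" and "learning" (made-up values).
deep, learning = [0.9, 0.1], [0.8, 0.2]

# Made-up relative-position embeddings: adjacent vs. far apart.
adjacent, far = [0.7, 0.6], [0.05, 0.0]

near_score = disentangled_score(deep, learning, adjacent, adjacent)
far_score = disentangled_score(deep, learning, far, far)
print(near_score > far_score)  # adjacency strengthens the dependency
```

With identical content vectors, the score changes with the relative-position vectors alone, which is exactly the "deep learning" adjacency effect the paragraph above describes.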
Enhanced mask decoder accounts for absolute word positions
Like BERT, DeBERTa is pretrained using masked language modeling (MLM). MLM is a fill-in-the-blank task, where a model is taught to use the words surrounding a mask token to predict what the masked word should be. DeBERTa uses the content and position information of the context words for MLM. The disentangled attention mechanism already considers the contents and relative positions of the context words, but not the absolute positions of these words, which in many cases are crucial for the prediction.
Consider the sentence “a new store opened beside the new mall” with the words “store” and “mall” masked for prediction. Although the local contexts of the two words are similar, they play different syntactic roles in the sentence. (Here, the subject of the sentence is “store,” not “mall,” for example.) These syntactic nuances depend, to a large degree, upon the words’ absolute positions in the sentence, and so it is important to account for a word’s absolute position in the language modeling process. DeBERTa incorporates absolute word position embeddings right before the softmax layer, where the model decodes the masked words based on the aggregated contextual embeddings of word contents and positions.
Scale Invariant Fine-Tuning improves training stability
Virtual adversarial training is a regularization method for improving models’ generalization. It does so by improving a model’s robustness to adversarial examples, which are created by making small perturbations to the input. The model is regularized so that when given a task-specific example, the model produces the same output distribution as it produces on an adversarial perturbation of that example. For NLU tasks, the perturbation is applied to the word embedding instead of the original word sequence. However, the value ranges (norms) of the embedding vectors vary among different words and models. The variance gets larger for bigger models with billions of parameters, leading to some instability of adversarial training. Inspired by layer normalization, to improve the training stability, we developed a Scale-Invariant-Fine-Tuning (SiFT) method where the perturbations are applied to the normalized word embeddings.
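The core idea, perturbing normalized rather than raw embeddings, can be sketched as follows. The embedding values and noise scale are arbitrary; the point is that after normalization, a fixed perturbation scale means the same thing for every word regardless of the raw embedding's norm.

```python
import math
import random

random.seed(1)

def layer_norm(vec, eps=1e-6):
    """Normalize a vector to zero mean and unit variance, as in layer
    normalization (without learned scale/shift, for simplicity)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def sift_perturb(embedding, scale=0.01):
    """Scale-invariant perturbation: normalize first, then add small noise.
    Without the normalization, the same noise scale would be negligible
    for large-norm embeddings and overwhelming for small-norm ones."""
    normalized = layer_norm(embedding)
    return [v + random.gauss(0, scale) for v in normalized]

# Two embeddings with wildly different norms get comparable perturbations.
small_word = [0.01, 0.02, -0.01, 0.005]
large_word = [12.0, -7.5, 3.3, 9.1]
print(sift_perturb(small_word))
print(sift_perturb(large_word))
```

This mirrors the motivation stated above: embedding norms vary across words and grow with model size, so normalizing before perturbing keeps the adversarial training stable.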
Conclusion and looking forward
As shown in the SuperGLUE leaderboard (Figure 1), DeBERTa sets a new state of the art on a wide range of NLU tasks by combining the three techniques detailed above. Compared to Google’s T5 model, which consists of 11 billion parameters, the 1.5-billion-parameter DeBERTa is much more energy efficient to train and maintain, and it is easier to compress and deploy to applications in a variety of settings.
DeBERTa surpassing human performance on SuperGLUE marks an important milestone toward general AI. Despite its promising results on SuperGLUE, the model by no means reaches human-level natural language understanding. Humans are extremely good at leveraging the knowledge learned from different tasks to solve a new task with little or no task-specific demonstration. This is referred to as compositional generalization, the ability to generalize to novel compositions (new tasks) of familiar constituents (subtasks or basic problem-solving skills). Moving forward, it is worth exploring how to make DeBERTa incorporate compositional structures in a more explicit manner, which could allow combining neural and symbolic computation of natural language similar to what humans do.
Acknowledgments
This research was conducted by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. We thank our collaborators from Bing, Dynamics 365 AI, and Microsoft Research for providing compute resources for large-scale modeling and insightful discussions.
Once the developers had created that simulation framework, the AI algorithm could learn through trial and error as well as feedback from operators – a process called reinforcement learning. In the simulation, the AI solution can simulate a day’s run in a mere 30 seconds.
That means the AI solution has easily gone through more simulated runs than an operator could see in many lifetimes. And its computing power means it can come up with the right option far faster. Plus, it learned from the company’s most skilled operators and Cheetos experts, so it’s monitoring the fluctuations in quality and productivity from the highest level of experience.
The AI solution “could encapsulate the knowledge and skill of the best operators, then apply that through other facilities,” says Jayson Stemmler, a technical project manager at Neal Analytics who worked on the PepsiCo pilot project. “This solution reveals interactions and relationships that might not be intuitive to operators but that exist in the data. Without the manual measurement process, PepsiCo’s engineers are able to be more efficient with their time and focus on breakthrough innovation.”
A few bad Cheetos?
After the solution spent some time in its simulation proving ground, it was time to take it to a test plant at PepsiCo’s Plano facility to see how it did with the real thing – which meant testing it with some imperfect Cheetos.
“To develop this technology, we need to be able to make product that’s not good, so the AI can learn to take the product back into spec,” says Sean Eichenlaub, a senior principal engineer at PepsiCo.
Personally, I don’t see how any Cheetos could be “not good,” but I understand PepsiCo is going for perfect.
With the computer vision system continually monitoring and sending data to the Project Bonsai solution, any variance from that ideal can be fixed ASAP.
“With faster corrections, we can avoid the potential issues of going out of spec, such as having to discard product, or problems with packaging and waste,” Eichenlaub says.
I, for one, am all for a bag full of perfect Cheetos. And while the company prepares to use this Project Bonsai solution at a production plant, it’s also looking into using it with other Frito-Lay products, including the even-more-complex tortilla chip.
Leah Culler edits Microsoft’s AI Blog for Business & Technology.
Someone looking to book a vacation online today might have very different preferences than they did before the COVID-19 pandemic.
Instead of flying to an exotic beach, they might feel more comfortable driving locally. With limited options for dining out, having a full kitchen might be essential. Motel rooms or cabins might be more appealing than hotels with shared lobbies.
Countless companies use online recommendation engines to show customers products and experiences that match their interests. And yet, traditional machine learning models that predict what people might prefer are often based on data from past experience. That means they aren’t necessarily able to pick up on quickly changing consumer preferences unless they are retrained with new data.
Personalizer, which is part of Azure Cognitive Services within the Azure AI platform, uses a more cutting-edge approach to machine learning called reinforcement learning, in which AI agents can interact and learn from their environment in real time.
The technique was once used primarily in research labs. But now it’s making its way into more Microsoft products and services — from Azure Cognitive Services that developers can plug into apps and websites to autonomous systems that engineers can use to refine manufacturing processes. Azure Machine Learning is also previewing cloud-based reinforcement learning offerings for data scientists and machine learning professionals.
“We’ve come a long way in the last two years when we had a lot of proof of concept projects within Microsoft and deployments with a couple of customers,” said Rafah Hosn, senior director at Microsoft Research’s New York lab. “Now we are really progressing nicely into things that can be packaged and shrink wrapped and pointed to a particular set of problems.”
Rafah Hosn, senior director at Microsoft Research Lab – New York City. Photo courtesy of Microsoft.
Z-Tech, the technology hub of Anheuser-Busch InBev, is using Personalizer to deliver tailored recommendations in an online marketplace to better serve small grocery stores across Mexico. Other Microsoft customers and partners are employing reinforcement learning to detect production anomalies and develop robots that can adjust to unpredictable real-world conditions — with models that can learn from environmental cues, expert feedback or customer behavior in real time.
Once Microsoft began using Personalizer on its homepage to contextually personalize the products displayed to each visitor, the company saw a 19-fold increase in engagement with the products that Personalizer chose. The company has also used Personalizer internally to select the right offers, products and content across Windows, the Edge browser and Xbox. These scenarios deliver up to a 60% lift in engagement across billions of personalizations each month.
Microsoft Teams has also used reinforcement learning to find the optimal jitter buffer for a video meeting, which trades off millisecond-scale information delays for better connection continuity, while Azure is exploring reinforcement learning-based optimization to help determine when to reboot or remediate virtual machines.
Because reinforcement learning models learn from instantaneous feedback, they can quickly adapt to changing or unpredictable circumstances. Once the COVID-19 pandemic hit, some companies had no idea what to expect as people’s purchasing and travel behaviors changed overnight, said Jeff Mendenhall, a Microsoft principal program manager for Personalizer.
“All of their historic modeling and expert knowledge went out the window,” Mendenhall said. “But with reinforcement learning, Personalizer can update the model every minute if needed to learn and respond to what actual user behaviors are right now.”
In reinforcement learning, an AI agent learns largely by trial and error. It tests out different actions in either a real or simulated world and gets a reward when the actions achieve a desired result — whether that’s a customer hitting the button to book a vacation reservation or a robot successfully unloading an unwieldy bag of coins.
Training an AI agent through reinforcement learning is similar to teaching a puppy to do a trick, Hosn said. It gets a treat when it makes decisions that yield a desired result and learns to repeat the actions that get the most treats. But in complicated real-world scenarios, exploring the vast universe of potential actions and finding an optimal sequence of decisions can be far more complicated.
At the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) this week, Microsoft researchers presented 17 research papers that mark significant progress in addressing some of the field’s biggest challenges. By investing in reinforcement learning teams across its network of Microsoft Research labs, the company says it is developing a portfolio of approaches to tackle different problems and exploring multiple paths to potential breakthroughs.
John Langford, partner research manager at Microsoft Research Lab – New York City. Photo by John Brecher.
They’ve spent a lot of time figuring out which scenarios reinforcement learning is well-suited to solve, as well as probing the technical underpinnings to understand why something works and how to repeat it, said John Langford, a partner research manager at Microsoft Research Lab – New York.
“Right now there’s a big gap between one-off applications where you can get PhDs to grind really hard and figure out a way to make it work as opposed to developing a routinely useful system that can be used over and over again,” Langford said.
“All of our reinforcement learning research at Microsoft really falls into two big buckets — how can we solve challenges that customers are bringing to us and what are the foundations we can use to build replicable, reliable solutions?” he said.
A different approach to machine learning
Reinforcement learning uses a fundamentally different approach than supervised learning, a more common machine learning technique in which models learn to make predictions from training examples they’ve been fed.
If a person is trying to learn French, exposing themselves to French text, grammar rules and vocabulary is closer to a supervised learning approach, said Raluca Georgescu, a research software engineer working on Project Paidia in the Microsoft Research Cambridge UK lab.
With a reinforcement learning approach, they would go to France and learn by talking to people. They’d be penalized with puzzled looks if they say the wrong thing and they’d get rewarded with a croissant if they order it correctly, she said.
A reinforcement learning agent learns from interacting with its environment, either in the real world or in a simulated environment that allows it to safely explore different options. It takes an action and waits to see if it results in a positive or negative outcome, based on a reward system that’s been established. Once that feedback is received, the model learns whether that decision was good or bad and updates itself accordingly.
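The act-observe reward-update loop described above can be sketched with a minimal epsilon-greedy agent facing a two-armed bandit. This is an illustrative toy, not Personalizer's implementation; the `EpsilonGreedyAgent` class and `environment` function are invented for this example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

class EpsilonGreedyAgent:
    """A minimal reinforcement learning agent: pick an action,
    observe a reward, and update the estimate for that action."""

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions    # times each action was tried
        self.values = [0.0] * n_actions  # running average reward per action

    def act(self):
        # Explore occasionally; otherwise exploit the best-known action.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental average: shift the estimate toward the new reward.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# A toy environment: action 1 pays off more often than action 0.
def environment(action):
    return 1.0 if random.random() < (0.2, 0.8)[action] else 0.0

agent = EpsilonGreedyAgent(n_actions=2)
for _ in range(5000):
    a = agent.act()
    agent.update(a, environment(a))

print(agent.values)  # action 1's estimate should settle near 0.8
```

The same pattern scales up: a production system replaces the two arms with thousands of candidate actions and the running average with a learned model, but the act-reward-update loop is unchanged.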
It’s a really simple form of learning that’s endemic in the natural world, said Langford.
“Even worms can do reinforcement learning — they can learn to go towards things and avoid things based on some feedback,” Langford said. “That ability to learn at a very basic level from your environment is something that is super natural for us but in machine learning it’s a bit more tricky and delicate and requires more thought than supervised learning.”
The new papers presented at NeurIPS this week offer significant contributions in three key research areas: batch reinforcement learning, strategic exploration given rich observations and representation learning. Taken together, researchers say, these breakthroughs aim to boost the efficiency of models and expand the scope of problems that reinforcement learning can solve.
As artificial intelligence continues its rapid progress, equaling or surpassing human performance on benchmarks in an increasing range of tasks, researchers in the field are directing more effort to the interaction between humans and AI in domains where both are active. Chess stands as a model system for studying how people can collaborate with AI, or learn from AI, just as chess has served as a leading indicator of many central questions in AI throughout the field’s history.
AI-powered chess engines have consistently bested human players since 2005, and the chess world has undergone further shifts since then, such as the introduction of the heuristics-based Stockfish engine in 2008 and the deep reinforcement learning-based AlphaZero engine in 2017. The impact of this evolution has been monumental: chess is now seeing record numbers of people playing the game even as AI itself continues to get better at playing. These shifts have created a unique testbed for studying the interactions between humans and AI: formidable AI chess-playing ability combined with a large, growing human interest in the game has resulted in a wide variety of playing styles and player skill levels.
There’s a lot of work out there that attempts to match AI chess play to varying human skill levels, but the result is often AI that makes decisions and plays moves differently than human players at that skill level. The goal for our research is to better bridge the gap between AI and human chess-playing abilities. The question for AI and its ability to learn is: can AI make the same fine-grained decisions that humans do at a specific skill level? This is a good starting point for aligning AI with human behavior in chess.
Our team of researchers at the University of Toronto, Microsoft Research, and Cornell University has begun investigating how to better match AI to different human skill levels and, beyond that, personalize an AI model to a specific player’s playing style. Our work comprises two papers, “Aligning Superhuman AI with Human Behavior: Chess as a Model System” and “Learning Personalized Models of Human Behavior in Chess,” as well as a novel chess engine, called Maia, which is trained on games played by humans to more closely match human play. Our results show that, in fact, human decisions at different levels of skill can be predicted by AI, even at the individual level. This represents a step forward in modeling human decisions in chess, opening new possibilities for collaboration and learning between humans and AI.
AlphaZero changed how AI played the game by practicing against itself with only knowledge of the rules (“self-play”), unlike previous models that relied heavily on libraries of moves and past games to inform training. Our model, Maia, is a customized version of Leela Chess Zero (an open-source implementation of AlphaZero). We trained Maia on human games with the goal of playing the most human-like moves, instead of being trained on self-play games with the goal of playing the optimal moves. In order to characterize human chess-playing at different skill levels, we developed a suite of nine Maias, one for each Elo rating between 1100 and 1900. (Elo ratings are a system for evaluating players’ relative skill in games like chess.) As you’ll see below, Maia matches human play more closely than any chess engine ever created.
Code: Maia Chess. Explore our nine final Maia models, saved as Leela Chess neural networks, and the code to create more and reproduce our results.
If you’re curious, you can play against a few versions of Maia on Lichess, the popular open-source online chess platform. Our bots on Lichess are named maia1, maia5, and maia9, which we trained on human games at Elo rating 1100, 1500, and 1900, respectively. You can also download these bots and other resources from the GitHub repo.
Measuring human play
What does it mean for a chess engine to match human play? For our purposes, we settled on a simple metric: given a position that occurred in an actual human game, what is the probability that the engine plays the move that the human played in the game?
Making an engine that matches human play according to this definition is a difficult task. The vast majority of positions seen in real games only happen once, because the sheer number of possible positions is astronomical: after just four moves by each player, the number of potential positions enters the hundreds of billions. Moreover, people have a wide variety of styles, even at the same rough skill level. And even the same exact person might make a different move if they see the same position twice!
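The metric itself is simple to state in code. A minimal sketch, with `engine_move` and `test_set` as illustrative names rather than the paper's actual code:

```python
def move_matching_accuracy(engine_move, test_set):
    """Fraction of positions where the engine's chosen move equals
    the move the human actually played.

    `test_set` is a list of (position, human_move) pairs;
    `engine_move` maps a position to the engine's preferred move.
    (These names are hypothetical, for illustration only.)"""
    matches = sum(1 for position, human_move in test_set
                  if engine_move(position) == human_move)
    return matches / len(test_set)

# A toy engine that always plays e2e4, scored on a two-position test set:
toy_engine = lambda position: "e2e4"
toy_set = [("startpos", "e2e4"), ("startpos", "d2d4")]
print(move_matching_accuracy(toy_engine, toy_set))  # 0.5
```

All the accuracy curves in the figures below are this one number, computed per engine and per rating-binned test set.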
Creating a dataset
To rigorously compare engines in how well they match human play, we need a good test set to evaluate them with. We made a collection of nine test sets, one for each narrow rating range. Here’s how we made them:
First, we made rating bins for each range of 100 rating points (such as 1200-1299, 1300-1399, and so on).
In each bin, we put all games where both players are in the same rating range.
We drew 10,000 games from each bin, ignoring games played at Bullet and HyperBullet speeds. At those speeds (one minute or less per player), players tend to play lower-quality moves to avoid losing on time.
Within each game, we discarded the first 10 moves made by each player to ignore most memorized opening moves.
We also discarded any move where the player had less than 30 seconds to complete the rest of the game (to avoid situations where players are making random moves).
After these restrictions we had nine test sets, one for each rating range, which contained roughly 500,000 positions each.
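The filtering rules above can be sketched as a pair of small helpers. The game record's fields (`speed`, and so on) are hypothetical; a real pipeline would parse Lichess PGN headers and clock annotations.

```python
def rating_bin(rating):
    """Map a rating to its 100-point bin, e.g. 1234 -> 1200."""
    return (rating // 100) * 100

def keep_move(game, move_index, clock_seconds):
    """Apply the per-move filtering rules described above.
    `game` is a hypothetical record with a `speed` field."""
    if game["speed"] in ("bullet", "hyperbullet"):
        return False        # skip very fast time controls
    if move_index < 10:
        return False        # skip mostly memorized opening moves
    if clock_seconds < 30:
        return False        # skip time-scramble moves
    return True
```

A game enters a bin only when `rating_bin` agrees for both players, and each surviving move becomes one test position.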
Differentiating our work from prior attempts
People have been trying to create chess engines that accurately match human play for decades. For one thing, they would make great sparring partners. But getting crushed like a bug every single game isn’t that fun, so the most popular attempts at engines that match human play have been some kind of attenuated version of a strong chess engine. Attenuated versions of an engine are created by limiting the engine’s ability in some way, such as reducing the amount of data it’s trained on or limiting how deeply it searches to find a move. For example, the “play with the computer” feature on Lichess is a series of Stockfish models that are limited in the number of moves they are allowed to look ahead. Chess.com, ICC, FICS, and other platforms all have similar engines. How well do these engines match human play?
Stockfish: We created several attenuated versions of Stockfish, one for each depth limit (for example, the depth 3 Stockfish can only look 3 moves ahead), and then we tested them on our test sets. In the plot below, we break out the accuracies by rating level so you can see if the engine thinks more like players of a specific skill level.
Figure 1: Move matching accuracy for Stockfish compared with the targeted player’s Elo rating
As you can see, it doesn’t work that well. Attenuated versions of Stockfish only match human moves about 35-40% of the time. And equally importantly, each curve is strictly increasing, meaning that even depth-1 Stockfish does a better job at matching 1900-rated human moves than it does at matching 1100-rated human moves. This means that attenuating Stockfish by restricting the depth it can search doesn’t capture human play at lower skill levels—instead, it looks like it’s playing regular Stockfish chess with a lot of noise mixed in.
Leela Chess Zero: Attenuating Stockfish doesn’t characterize human play at specific levels. What about Leela Chess Zero, an open-source implementation of AlphaZero, which learns chess through self-play games and deep reinforcement learning? Unlike Stockfish, Leela incorporates no human knowledge in its design. Despite this, however, the chess community was very excited by how Leela seemed to play more like human players.
Figure 2: Move matching accuracy for Leela compared with the targeted player’s Elo rating
In the analysis above, we looked at a number of different Leela generations, with the ratings indicating their relative skill (commentators noted that early Leela generations played particularly similarly to humans). People were right: the best versions of Leela match human moves more often than Stockfish does. But Leela still doesn’t capture human play at different skill levels: each version is always getting better or always getting worse as the human skill level increases. To characterize human play at a particular level, we need another approach.
Maia: A better solution for matching human skill levels
Maia is an engine designed to play like humans at a particular skill level. To achieve this, we adapted the AlphaZero/Leela Chess framework to learn from human games. We created nine different versions, one for each rating range from 1100-1199 to 1900-1999. We made nine training datasets in the same way that we made the test datasets (described above), with each training set containing 12 million games. We then trained a separate Maia model for each rating bin to create our nine Maias, from Maia 1100 to Maia 1900.
Figure 3: Move matching accuracy for Maia compared with the targeted player’s Elo rating
As you can see, the Maia results are qualitatively different from Stockfish and Leela. First off, the move matching performance is much higher: Maia’s lowest accuracy, when it is trained on 1900-rated players but predicts moves made by 1100-rated players, is 46%—as high as the best performance achieved by any Stockfish or Leela model on any human skill level we tested. Maia’s highest accuracy is over 52%. Over half the time, Maia 1900 predicts the exact move a 1900-rated human played in an actual game.
Figure 4: Move matching accuracy for all the models compared with the targeted player’s Elo rating
Importantly, every version of Maia uniquely captures a specific human skill level since every curve achieves its maximum accuracy at a different human rating. Even Maia 1100 achieves over 50% accuracy in predicting 1100-rated moves, and it’s much better at predicting 1100-rated players than 1900-rated players!
This means something deep about chess: there is such a thing as “1100-rated style.” And furthermore, it can be captured by a machine learning model. This was surprising to us: it would have been possible that human play is a mixture of good moves and random blunders, with 1100-rated players blundering more often and 1900-rated players blundering less often. Then it would have been impossible to capture 1100-rated style, because random blunders are impossible to predict. But since we can predict human play at different levels, there is a reliable, predictable, and maybe even algorithmically teachable difference between one human skill level and the next.
Maia’s predictions
You can find all of the juicy details in the paper, but one of the most exciting things about Maia is that it can predict mistakes. Even when a human makes an absolute howler—“hanging” a queen, in other words letting an opponent capture it for free, for example—Maia predicts the exact mistake made more than 25% of the time. This could be really valuable for average players trying to improve their game: Maia could look at your games and tell which blunders were predictable and which were random mistakes. If your mistakes are predictable, you know what to work on to hit the next level.
Figure 5: Move matching accuracy as a function of the quality of the move played in the game
Modeling individual players’ styles with Maia
In current work, we are pushing the modeling of human play to the next level: can we actually predict the moves a particular human player would make?
It turns out that personalizing Maia gives us our biggest performance gains. Whereas base Maia predicts human moves around 50% of the time, some personalized models can predict an individual’s moves with accuracies up to 75%!
We achieve these results by fine-tuning Maia. Starting with a base Maia, say Maia 1900, we update the model by continuing training on an individual player’s games. Below, you can see that for predicting individual players’ moves, the personalized models all show large improvements over the non-personalized models. The gains are so large that the personalized models are almost non-overlapping with the non-personalized ones: the personalized model for the hardest-to-predict player still gets almost 60% accuracy, whereas the non-personalized models don’t achieve this accuracy even on the easiest-to-predict players.
The personalized models are so accurate that given just a few games, we can tell which player played them! In this stylometry task—where the goal is to recognize an individual’s playing style—we train personalized models for 400 players of varying skill levels, and then have each model predict the moves from 4 games by each player. For 96% of the 4-game sets we tested, the personalized model that achieved the highest accuracy (that is, predicted the player’s actual moves most often) was the one that was trained on the player who played the games. With only 4 games of data, we can pick out who played the games from a set of 400 players. The personalized models are capturing individual chess-playing style in a highly accurate way.
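The stylometry procedure reduces to a tournament among the personalized models: each candidate's model scores the game set, and the highest-scoring model names the player. A sketch with hypothetical names standing in for the real models:

```python
def identify_player(game_set, personalized_models):
    """Attribute a set of games to the player whose personalized model
    predicts the games' moves most accurately.

    `personalized_models` maps player name -> move-prediction function;
    `game_set` is a list of (position, move_played) pairs.
    (Illustrative names; not the actual Maia code.)"""
    def accuracy(model):
        hits = sum(1 for pos, move in game_set if model(pos) == move)
        return hits / len(game_set)
    return max(personalized_models,
               key=lambda player: accuracy(personalized_models[player]))

# Toy example: two "players" with fixed habits, and 2 games' worth of moves.
models = {"alice": lambda pos: "e2e4", "bob": lambda pos: "d2d4"}
games = [("pos1", "d2d4"), ("pos2", "d2d4")]
print(identify_player(games, models))  # bob
```

In the actual experiment the candidate pool contains 400 players rather than two, and each model is a fine-tuned neural network, but the selection rule is the same argmax over accuracies.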
Using AI to help improve human chess play
We designed Maia to be a chess engine that predicts human moves at a particular skill level, and it has progressed into a personalized engine that can identify the games of individual players. This is an exciting step forward in our understanding of human chess play, and it brings us closer to our goal of creating AI chess-teaching tools that help humans improve. Among the many capabilities of a good chess teacher, two of them are understanding how students at different skill levels play and recognizing the playing styles of their students. Maia has shown that these capabilities are realizable using AI.
The ability to create personalized chess engines from publicly available, individual player data opens an interesting discussion on the possible uses (and misuses) of this technology. We initiate this discussion in our papers, but there is a long road ahead in understanding the full potential and implications of this line of research. As it has countless times before, chess will serve as a model AI system that sets the stage for this discussion.
Acknowledgments
Many thanks to Lichess.org for providing the human games that we trained on, and hosting our Maia models that you can play against. Ashton Anderson was supported in part by an NSERC grant, a Microsoft Research gift, and a CFI grant. Jon Kleinberg was supported in part by a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, a MURI grant, and a MacArthur Foundation grant.
Microsoft and Code.org are excited to announce a partnership that gives every student from elementary school to high school the opportunity to learn about artificial intelligence (AI).
We’re excited to unveil our new video series on artificial intelligence and machine learning. Microsoft CEO Satya Nadella introduces the series.
At a time when AI and machine learning are changing the very fabric of society and transforming entire industries, it is more important than ever to give every student the opportunity to not only learn how these technologies work, but also to think critically about the ethical and societal impacts of AI.
AI is used everywhere, from voice assistants to self-driving cars, and it’s rapidly becoming the most important technological innovation of our time. AI has the potential to play a major role in addressing global problems, such as detecting and curing diseases, cleaning oceans, eliminating poverty, and harnessing clean energy.
At the same time, with great power comes great responsibility, and budding computer scientists must learn to consider technology’s ethical impacts. How does algorithmic bias impact social justice or deep fakes impact democracy? How does society cope with rapid job automation? By learning how to consider the ethical issues that AI raises, these future computer scientists will be better able to envision the appropriate safeguards that help to maximize the benefits of AI technologies and reduce their risks.
Made possible by Microsoft’s latest donation of $7.5 million, Code.org plans a comprehensive and age-appropriate approach to teaching how AI works along with the social and ethical considerations, from elementary school through high school.
Within the coming year, AI and machine learning lessons will be integrated into Code.org’s CS Discoveries curriculum, one of the most widely used computer science courses for students in grades 6–10, and into App Lab, Code.org’s popular app-creation platform used throughout middle school and high school.
In CS Discoveries, students will learn to work with datasets to create machine learning models that they can incorporate into their apps, and explore how advances in new technologies such as computer vision and neural networks require a new generation of ethical computer scientists to avoid bias and harm. Curated datasets will help students better understand the real-world impact that these technologies have.
Code.org will also help students and teachers find additional educational resources from a variety of partners and the broader community behind AI education.
A look at a new lesson in Minecraft: Education Edition. In these new lessons, students use AI in a range of exciting real-world scenarios: to preserve wildlife and ecosystems, help people in remote areas, and research climate change.
Additionally, last month the Microsoft AI for Earth team partnered with Minecraft: Education Edition to release five lessons challenging students to use the power of AI in a range of exciting real-world scenarios: to preserve wildlife and ecosystems, help people in remote areas, and research climate change.
What’s more, Microsoft’s Imagine Cup Junior 2021 challenge provides students aged 13 to 18 the opportunity to learn about technology and how it can be used to positively change the world.
The global challenge is focused on Artificial Intelligence (AI), introducing students to AI and Microsoft’s AI for Good initiatives so they can come up with ideas to solve social, cultural and environmental issues.
On Code.org, 45% of students are young women, and in the US, 50% are students from underrepresented racial and ethnic groups and 45% are in high-needs schools. Reaching the tens of millions of students in Code.org’s courses and on its platform, the partnership between Microsoft and Code.org works to democratize access to learning AI because all students deserve the opportunity to shape the world they live in — and because creating an equitable and socially just future will take all of us.
-Code.org CEO Hadi Partovi and Microsoft President Brad Smith
In August, we introduced Humans and AI, a new series of stories that highlight the people who make innovation matter. The series features passionate people from all walks of life who are using AI to transform our society and our world for the better.
Today, we are thrilled to share our next episode of “Humans and AI” featuring Nicolas Villar, a principal hardware architect for Microsoft Premonition, an early warning system that monitors the environment for signs of epidemics. Villar is building robotic devices to capture and track disease-carrying mosquitoes – a threat he understands well after living in places where mosquito-borne illnesses are a daily concern.
Villar’s past projects include Code Jumper, a physical programming language designed to be inclusive of children with all ranges of vision. He considers himself a maker who loves using technology to bring ideas to life to help others and solve problems.
Want to know more about Villar or his work? On Nov. 18, he will be answering your questions live on Twitter in a chat hosted by Microsoft Research. Share your questions ahead of time by tagging @MSFTResearch and using #MicrosoftAIChat.
C3 AI CRM enables a new category of customer-focused industry AI use cases and a new ecosystem
REDWOOD CITY, CA, REDMOND, WA, and SAN JOSE, CA – October 26, 2020 – C3.ai, Microsoft Corp. (NASDAQ: MSFT), and Adobe Inc. (NASDAQ: ADBE) today announced the launch of C3 AI® CRM powered by Microsoft Dynamics 365. The first enterprise-class, AI-first customer relationship management solution is purpose-built for industries, integrates with Adobe Experience Cloud, and drives customer-facing operations with predictive business insights.
The partners have agreed to:
Integrate Microsoft Dynamics 365, Adobe Experience Cloud (including Adobe Experience Platform), and C3.ai’s industry-specific data models, connectors, and AI models, in a joint go-to-market offering designed to provide an integrated suite of industry-specific AI-enabled CRM solutions including marketing, sales, and customer service.
Sell the industry-specific AI CRM offering through dedicated sales teams to target enterprise accounts across multiple industries globally, as well as through agents and industry partners.
Target industry vertical markets initially including financial services, oil and gas, utilities, manufacturing, telecommunications, public sector, healthcare, defense, intelligence, automotive, and aerospace.
Market the jointly branded offering globally, supported by the companies’ commitment to customer success.
“Microsoft, Adobe, and C3.ai are reinventing a market that Siebel Systems invented more than 25 years ago,” said Thomas M. Siebel, CEO of C3.ai. “The dynamics of the market and the mandates of digital transformation have dramatically changed CRM market requirements. A general-purpose CRM system of record is no longer sufficient. Customers today demand industry-specific, fully AI-enabled solutions that provide AI-enabled revenue forecasting, product forecasting, customer churn, next-best product, next-best offer, and predisposition to buy.”
“This year has made clear that businesses fortified by digital technology are more resilient and more capable of transforming when faced with sweeping changes like those we are experiencing,” said Satya Nadella, CEO, Microsoft. “Together with C3.ai and Adobe, we are bringing to market a new class of industry-specific AI solutions, powered by Dynamics 365, to help organizations digitize their operations and unlock real-time insights across their business.”
“We’re proud to partner with C3.ai and Microsoft to advance the imperative for digital customer engagement,” said Shantanu Narayen, president and CEO of Adobe. “The unique combination of Adobe Experience Cloud, the industry-leading solution for customer experiences, together with the C3 AI Suite and Microsoft Dynamics 365, will enable brands to deliver rich experiences that drive business growth.”
“This is an exciting development in the advancement of Enterprise AI,” said Lorenzo Simonelli, chairman and CEO of Baker Hughes. “This partnership between C3.ai, Microsoft, and Adobe will bring a unique and powerful new CRM offering to the market. We are adopting AI in multiple applications internally and in new products and services for our customers through our C3.ai partnership. We look forward to offering C3 AI CRM to our customers and benefitting from the capabilities internally.”
Combining the market-leading Microsoft Dynamics 365 CRM software with Adobe’s leading suite of customer experience management solutions alongside C3.ai’s enterprise AI capabilities, C3 AI CRM is the world’s first AI-driven, industry-specific CRM built with a modern AI-first architecture. C3 AI CRM integrates and unifies vast amounts of structured and unstructured data from enterprise and extraprise sources into a unified, federated image to drive real-time predictive insights across the entire revenue supply chain, from contact to cash. With embedded AI-driven, industry-specific workflows, C3 AI CRM helps teams:
Accurately forecast revenue
Accurately predict product demand
Identify and reduce customer churn
Identify highly-qualified prospects
Recommend the next-best offer and next-best product
Deliver AI-driven segmentation, marketing, and targeting
C3 AI CRM enables brands to take advantage of their real-time customer profiles for cross-channel journey orchestration. The joint solution offers an integrated ecosystem that empowers customers to take advantage of leading CRM capabilities along with an integrated ecosystem with Azure, Microsoft 365, and the Microsoft Power Platform. C3 AI CRM is pre-built and configured for industries – financial services, healthcare, telecommunications, oil and gas, manufacturing, utilities, aerospace, automotive, public sector, defense, and intelligence – enabling customers to deploy and operate C3 AI CRM and its industry-specific machine learning models quickly. In addition, C3 AI CRM leverages the common data model of the Open Data Initiative (ODI), making it easier to bring together disparate customer data from across the enterprise.
C3 AI CRM is immediately available, with Adobe Experience Cloud sold separately. C3 AI CRM powered by Dynamics 365 will be available from C3.ai, Adobe, and Microsoft, and through the Microsoft Dynamics 365 Marketplace. Please contact sales@c3.ai to learn more.
###
About C3.ai
C3.ai is a leading enterprise AI software provider for accelerating digital transformation. C3.ai delivers the C3 AI Suite for developing, deploying, and operating large-scale AI, predictive analytics, and IoT applications in addition to an increasingly broad portfolio of turn-key AI applications. The core of the C3.ai offering is a revolutionary, model-driven AI architecture that dramatically enhances data science and application development.
About Microsoft
Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.
About Adobe
Adobe is changing the world through digital experiences. For more information, visit www.adobe.com.