05-21-2020, 01:57 AM
Creating a virtual stage when in-person isn’t possible
<div style="margin: 5px 5% 10px 5%;"><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/05/creating-a-virtual-stage-when-in-person-isnt-possible.png" width="1024" height="589" title="" alt="" /></div><div>
<p>The Azure Kinect camera captures depth information with infrared light, and that data helps make the AI model more accurate. We used an app called Speaker Recorder to manage the two video signals coming from the Azure Kinect camera: the RGB signal and the depth signal. Once the recording was complete, the AI model was applied through a command line tool. To get the full details on how this all came together, check out the <a href="https://www.microsoft.com/en-us/ai/ai-lab-virtual-stage">Microsoft AI Lab</a>.</p>
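<p>To give a sense of what "two video signals" means in practice, here is a minimal sketch of reading a color frame and an aligned depth frame from the Azure Kinect in Python. It uses the community pyk4a wrapper purely for illustration; the actual pipeline described here records with the Speaker Recorder app and applies the model with a command line tool.</p>
<pre><code># Illustrative only: grab one RGB frame and one depth frame with pyk4a.
# The workflow in this post records with the Speaker Recorder app instead.
from pyk4a import PyK4A, Config, ColorResolution, DepthMode

k4a = PyK4A(Config(color_resolution=ColorResolution.RES_1080P,
                   depth_mode=DepthMode.NFOV_UNBINNED))
k4a.start()

capture = k4a.get_capture()
rgb = capture.color                 # BGRA color image from the RGB camera
depth = capture.transformed_depth   # depth map aligned to the color camera, in millimetres

k4a.stop()
</code></pre>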
<p>The AI model we used is based on work recently published by the University of Washington. In their research, the UW team developed a deep neural network that takes two images: one of the background alone and one with a person in front of it. The output of the neural network is a smooth transparency mask.</p>
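<p>The transparency mask (often called an alpha matte) is what lets the subject be composited over any new backdrop. A minimal sketch of that final compositing step, assuming the matte has already been predicted, looks like this:</p>
<pre><code>import numpy as np

def composite(person_rgb, alpha, new_background):
    """Blend the subject onto a new backdrop with a predicted matte.

    person_rgb, new_background: float arrays in [0, 1], shape (H, W, 3).
    alpha: float array in [0, 1], shape (H, W, 1); 1 = fully the person,
    0 = fully the new background, values in between give soft edges.
    """
    return alpha * person_rgb + (1.0 - alpha) * new_background
</code></pre>
<p>The soft, fractional values in the matte are what let fine structures such as hair blend naturally instead of showing a hard cut-out edge.</p>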
<p>This neural network was trained on images where the masking work was done by hand: the UW researchers used a dataset provided by Adobe containing many images for which a designer had manually created the transparency mask.</p>
<p>With this approach, the neural network can learn how to smooth areas like hair or loose clothing. However, there are some limitations. If the person is wearing something similar in color to the background, the system renders it as holes in the image, which defeats the illusion.</p>
<p>So the UW researchers combined this method with another: a second neural network tries to guess the contour just by looking at the image. In the case of our virtual stage we know there is a person on screen, so the neural network tries to identify the silhouette of that person. Adding this second neural network eliminates the color transparency issue, but small details like hair or fingers can be a problem.</p>
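<p>This second network is essentially a person-segmentation model: given only the image, predict which pixels belong to the person. The sketch below uses torchvision's off-the-shelf DeepLabV3 as a stand-in to show the idea; it is not the UW model itself, and the input filename is hypothetical.</p>
<pre><code>import torch
from PIL import Image
from torchvision import models, transforms

# Stand-in for the silhouette network: a pretrained DeepLabV3 segmenter.
model = models.segmentation.deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("frame.png").convert("RGB")  # hypothetical input frame
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))["out"][0]

# Class 15 is "person" in the Pascal VOC label set these weights use.
silhouette = logits.argmax(0) == 15
</code></pre>
<p>A silhouette like this is robust to the person wearing background-colored clothing, but its hard boundary loses the fine detail that the matting network captures.</p>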
<p>So, here’s the interesting part. The UW researchers created an architecture called Context Switching. Depending on the conditions, the system can pick the best solution, getting the best of both approaches.</p>
<p>In our case, because we are using Azure Kinect, we can go a step further and replace the second neural network with the silhouette provided by the Kinect, which is much more accurate since it comes from the captured depth information.</p>
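<p>As a rough illustration of why depth helps, a silhouette can be carved directly out of the depth map by keeping only the pixels that sit within the speaker's distance band in front of the camera. The distance band below is an assumed example, not a value from the project.</p>
<pre><code>import numpy as np

def depth_silhouette(depth_mm, near_mm=500, far_mm=2500):
    """Rough subject mask from an Azure Kinect depth map (millimetres).

    Pixels with no valid depth reading come back as 0 and are excluded;
    everything between near_mm and far_mm is treated as the speaker.
    """
    depth_mm = np.asarray(depth_mm)
    valid = depth_mm > 0
    return valid & (depth_mm >= near_mm) & (depth_mm <= far_mm)
</code></pre>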
<p>The model is improved even more with another AI technique called an adversarial network. We connect the output of our neural network to a second network that judges whether an image is real or fake. Training against that judge pushes the original network to make small adjustments so its composites can fool it. The result is a neural network that creates even more natural images.</p>
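<p>In an adversarial setup, a discriminator network scores composited frames as real or fake, and the matting network is trained to fool it. The sketch below shows only that adversarial term, with a hypothetical disc module standing in for the discriminator; the full training recipe includes additional losses not shown here.</p>
<pre><code>import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def generator_adversarial_loss(disc, fake_composite):
    # The matting network is rewarded when the discriminator
    # labels its composite as real.
    scores = disc(fake_composite)
    return bce(scores, torch.ones_like(scores))

def discriminator_loss(disc, real_frame, fake_composite):
    # The discriminator learns to tell real frames from composites.
    real_scores = disc(real_frame)
    fake_scores = disc(fake_composite.detach())
    return (bce(real_scores, torch.ones_like(real_scores)) +
            bce(fake_scores, torch.zeros_like(fake_scores)))
</code></pre>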
</div>
https://www.sickgaming.net/blog/2020/05/...-possible/