10-30-2018, 04:27 AM
Mad? Sad? Glad? Video Indexer now recognizes these human emotions
<div style="margin: 5px 5% 10px 5%;"><img src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions.png" width="1232" height="343" title="" alt="" /></div><div><p>Many different customers across industries want to have insights into the emotional moments that appear in different parts of their media content. For broadcasters, this can help create more impactful promotion clips and drive viewers to their content; in the sales industry it can be super useful for analyzing sales calls and improve convergence; in advertising it can help identify the best moment to pop up an ad, and the list goes on and on. To that end, we are excited to share Video Indexer’s (VI) new machine learning model that mimics humans’ behavior to detect four <a href="https://en.wikipedia.org/wiki/Paul_Ekman#Emotions_as_universal_categories" target="_blank">cross-cultural emotional states</a> in videos: anger, fear, joy, and sadness.</p>
<p>Endowing machines with the cognitive ability to recognize and interpret human emotions is challenging because emotions are inherently complex. As humans, we read emotions through multiple channels, including facial expressions, voice tonality, and speech content, and we ultimately determine a specific emotion by combining these three modalities to varying degrees.</p>
<p>While traditional sentiment analysis models detect only the polarity of content, for example positive or negative, our new model aims at a finer-grained analysis. Given a moment with negative sentiment, for instance, it determines whether the underlying emotion is fear, sadness, or anger. The following figure illustrates VI’s emotion analysis of Microsoft CEO Satya Nadella’s speech on the importance of education; a sad moment was detected at the very beginning of his speech.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/82dbc171-dee3-45ac-9a5f-2e2393ee5472.png"><img alt="Microsoft CEO Satya Nadella's speech on the importance of education" border="0" height="343" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions.png" title="Microsoft CEO Satya Nadella" width="1232" /></a></p>
<p>All the detected emotions, along with where each one appears in the video, are enumerated in the video index JSON:</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/5682dd63-ccd2-405e-8260-30bb5289b89c.png"><img alt="video index JSON" border="0" height="289" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-1.png" title="JSON" width="301" /></a></p>
<h2>Cross-channel emotion detection in VI</h2>
<p>The new functionality uses deep learning to detect emotional moments in media assets based on speech content and voice tonality. VI detects emotions by capturing semantic properties of the speech content. Semantic properties of single words are not enough, however, so the underlying syntax is analyzed as well: the same words in a different order can convey different emotions, as the example below the figure shows.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/45b0374b-9cb2-4e07-a8b3-5e903fb5a580.png"><img alt="Syntax" border="0" height="275" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-2.png" title="Syntax" width="1388" /></a></p>
<p>VI leverages the context of the speech content to infer the dominant emotion. For example, the sentence <em>“… the car was coming at me and accelerating at a very fast speed …”</em> has no negative words, but VI can still detect fear as the underlying emotion.</p>
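<p>A quick way to see the limit of word-level cues on that sentence: a naive negative-word lexicon (the short, made-up list below is purely illustrative) finds nothing negative in it, so the fear must be inferred from context:</p>
<pre><code># A toy negative-word lexicon -- illustrative only.
negative_lexicon = {"afraid", "terrified", "scared", "danger", "crash", "pain"}

sentence = "the car was coming at me and accelerating at a very fast speed"
hits = negative_lexicon.intersection(sentence.lower().split())

print(hits)  # set() -- no negative words, yet the situation clearly conveys fear
</code></pre>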
<p>VI analyzes the vocal tonality of speakers as well. It automatically detects segments with voice activity and fuses the affective information they carry with the speech content analysis.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/bdb65fc5-3787-4baf-a798-443e60314b62.png"><img alt="Video Indexer" border="0" height="277" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-3.png" title="Video Indexer" width="1101" /></a></p>
<p>With the new emotion detection capability in VI, which relies on speech content and voice tonality, you can gain deeper insight into the content of your videos and put it to work in marketing, customer care, and sales.</p>
<p>For more information, visit <a href="https://www.videoindexer.ai/" target="_blank">VI’s portal</a> or the <a href="https://api-portal.videoindexer.ai/" target="_blank">VI developer portal</a>, and try this new capability for free. You can also browse sample videos whose emotional content has already been indexed: <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/e09e3055ae/" target="_blank">sample 1</a>, <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/afdbe9521b/" target="_blank">sample 2</a>, and <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/c324f4d698/" target="_blank">sample 3</a>.</p>
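<p>If you would rather script the end-to-end flow than use the portal, the sketch below uploads a video by URL and polls until indexing completes. It reuses the same public VI v2 endpoints with placeholder values; note that uploading requires an edit-capable access token:</p>
<pre><code>import time
import requests

LOCATION   = "trial"
ACCOUNT_ID = "your-account-id"
API_KEY    = "your-subscription-key"

# Uploading needs an edit-capable token (allowEdit=true).
token = requests.get(
    f"https://api.videoindexer.ai/auth/{LOCATION}/Accounts/{ACCOUNT_ID}/AccessToken",
    params={"allowEdit": "true"},
    headers={"Ocp-Apim-Subscription-Key": API_KEY},
).json()

# Submit a publicly reachable video URL for indexing.
upload = requests.post(
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos",
    params={"accessToken": token, "name": "emotion-demo",
            "videoUrl": "https://example.com/clip.mp4"},
).json()

# Poll until processing finishes; the emotions then appear in the
# index JSON exactly as shown earlier in this post.
video_id = upload["id"]
while True:
    index = requests.get(
        f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos/{video_id}/Index",
        params={"accessToken": token},
    ).json()
    if index.get("state") == "Processed":
        break
    time.sleep(30)
</code></pre>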
<h3>Have questions or feedback? We would love to hear from you!</h3>
<p>Use our <a href="https://cognitive.uservoice.com/forums/598144-video-indexer" target="_blank">UserVoice</a> to help us prioritize features, or email <a href="mailto:[email protected]" target="_blank">[email protected]</a> with any questions.</p>
</div>
<div style="margin: 5px 5% 10px 5%;"><img src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions.png" width="1232" height="343" title="" alt="" /></div><div><p>Many different customers across industries want to have insights into the emotional moments that appear in different parts of their media content. For broadcasters, this can help create more impactful promotion clips and drive viewers to their content; in the sales industry it can be super useful for analyzing sales calls and improve convergence; in advertising it can help identify the best moment to pop up an ad, and the list goes on and on. To that end, we are excited to share Video Indexer’s (VI) new machine learning model that mimics humans’ behavior to detect four <a href="https://en.wikipedia.org/wiki/Paul_Ekman#Emotions_as_universal_categories" target="_blank">cross-cultural emotional states</a> in videos: anger, fear, joy, and sadness.</p>
<p>Endowing machines with cognitive abilities to recognize and interpret human emotions is a challenging task due to their complexity. As humans, we use multiple mediums to analyze emotions. These include facial expressions, voice tonality, and speech content. Eventually, the determination of a specific emotion is a result of a combination of these three modalities to varying degrees.</p>
<p>While traditional sentiment analysis models detect the polarity of content – for example, positive or negative – our new model aims to provide a finer granularity analysis. For example, given a moment with negative sentiment, the new model determines whether the underlying emotion is fear, sadness, or anger. The following figure illustrates VI’s emotion analysis of Microsoft CEO Satya Nadella’s speech on the importance of education. At the very beginning of his speech, a sad moment was detected.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/82dbc171-dee3-45ac-9a5f-2e2393ee5472.png"><img alt="Microsoft CEO Satya Nadella's speech on the importance of education" border="0" height="343" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions.png" title="Microsoft CEO Satya Nadella" width="1232" /></a></p>
<p>All the detected emotions and their specific appearances along the video are enumerated in the video index JSON as follows:</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/5682dd63-ccd2-405e-8260-30bb5289b89c.png"><img alt="video index JSON" border="0" height="289" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-1.png" title="JSON" width="301" /></a></p>
<h2>Cross-channel emotion detection in VI</h2>
<p>The new functionality utilizes deep learning to detect emotional moments in media assets based on speech content and voice tonality. VI detects emotions by capturing semantic properties of the speech content. However, semantic properties of single words are not enough, so the underlying syntax is also analyzed because the same words in a different order can induce different emotions.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/45b0374b-9cb2-4e07-a8b3-5e903fb5a580.png"><img alt="Syntax" border="0" height="275" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-2.png" title="Syntax" width="1388" /></a></p>
<p>VI leverages the context of the speech content to infer the dominant emotion. For example, the sentence <em>“… the car was coming at me and accelerating at a very fast speed …”</em> has no negative words, but VI can still detect fear as the underlying emotion.</p>
<p>VI analyzes the vocal tonality of speakers as well. It automatically detects segments with voice activity and fuses the affective information contained within with the speech content component.</p>
<p><a href="https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/bdb65fc5-3787-4baf-a798-443e60314b62.png"><img alt="Video Indexer" border="0" height="277" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/mad-sad-glad-video-indexer-now-recognizes-these-human-emotions-3.png" title="Video Indexer" width="1101" /></a></p>
<p>With the new emotion detection capability in VI that relies on speech content and voice tonality, you are able to become more insightful about the content of your videos by leveraging them for marketing, customer care, and sales purposes.</p>
<p>For more information, visit <a href="https://www.videoindexer.ai/" target="_blank">VI’s portal</a> or the <a href="https://api-portal.videoindexer.ai/" target="_blank">VI developer portal</a>, and try this new capability for free. You can also browse videos indexed as to emotional content: <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/e09e3055ae/" target="_blank">sample 1</a>, <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/afdbe9521b/" target="_blank">sample 2</a>, and <a href="https://www.videoindexer.ai/accounts/29189f48-e09a-4bce-9456-3169afd282fd/videos/c324f4d698/" target="_blank">sample 3</a>. </p>
<h3>Have questions or feedback? We would love to hear from you!</h3>
<p>Use our <a href="https://cognitive.uservoice.com/forums/598144-video-indexer" target="_blank">UserVoice</a> to help us prioritize features, or email <a href="mailto:[email protected]" target="_blank">[email protected]</a> with any questions.</p>
</div>