Introduction

Viewers are watching more content than ever, especially with Video-On-Demand (VOD) platforms providing a rich selection of content choices anytime, anywhere, and on any screen. With proliferating content volumes, media companies are facing challenges in preparing and managing content, which is crucial to providing a high-quality viewing experience.

Today, companies rely on large teams of trained staff to identify people (faces), places, objects, sentiments, and keywords, and to produce transcriptions and tags, so that video content can be understood efficiently and accurately. These manual processes are expensive, slow, and cannot scale to keep up with the volume of content being produced, licensed, and retrieved from archives daily.


Concept

Amazon Rekognition and Amazon Transcribe make it easy to automate these operational media analysis tasks by providing fully managed, purpose-built APIs powered by machine learning (ML). Using these APIs, VIDIZMO analyzes large volumes of videos stored in Amazon S3 or any other storage; detects objects, faces, sentiments, and keywords; transcribes speech; and assigns appropriate tags to your videos.


To learn how to configure it, see: How to Configure Video Insights in VIDIZMO Portal using AWS.


Note: If the portal uses a storage provider other than Amazon S3, and AWS Video Insights is configured and enabled, the media is also temporarily uploaded to the associated Amazon S3 bucket for analysis. After the workflow completes successfully, the content is purged from that bucket.


Video Insights using AWS Rekognition

Amazon Rekognition makes it easy to add video analysis to your VIDIZMO application. Upload a video to your VIDIZMO portal, and Amazon Rekognition identifies objects, people, text, scenes, and activities in it. It can also detect inappropriate content. In addition, Amazon Rekognition provides highly accurate facial analysis, face comparison, and face search capabilities. You can detect, analyze, and compare faces for various use cases, including user verification, cataloging, people counting, and public safety.
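As a rough illustration of how detected labels can be tied back to points in a video, the sketch below indexes a response shaped like the output of the Amazon Rekognition Video GetLabelDetection operation (a "Labels" list whose entries carry a "Timestamp" in milliseconds and a "Label" with "Name" and "Confidence"). The function name index_labels, the 80% confidence threshold, and the sample data are assumptions made for this example. This is not VIDIZMO's implementation, and no AWS call is made.

```python
from collections import defaultdict


def index_labels(response, min_confidence=80.0):
    """Map each label name to the sorted timestamps (in seconds) where it appears.

    `response` is assumed to follow the GetLabelDetection response shape.
    `min_confidence` is an arbitrary threshold chosen for this sketch.
    """
    index = defaultdict(list)
    for entry in response.get("Labels", []):
        label = entry["Label"]
        if label["Confidence"] >= min_confidence:
            # Rekognition Video reports timestamps in milliseconds.
            index[label["Name"]].append(entry["Timestamp"] / 1000.0)
    return {name: sorted(times) for name, times in index.items()}


# Hand-written sample data, not real service output.
sample_response = {
    "Labels": [
        {"Timestamp": 0, "Label": {"Name": "Person", "Confidence": 98.2}},
        {"Timestamp": 500, "Label": {"Name": "Car", "Confidence": 61.0}},
        {"Timestamp": 2000, "Label": {"Name": "Car", "Confidence": 91.5}},
        {"Timestamp": 4500, "Label": {"Name": "Person", "Confidence": 88.0}},
    ]
}

label_index = index_labels(sample_response)
print(label_index)
```

With an index like this, a player could seek to any stored timestamp for a chosen label. The low-confidence "Car" detection at 0.5 seconds is dropped by the threshold.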


Sentiment analysis

Amazon Rekognition interprets emotional expressions, such as happiness, sadness, and surprise, and demographic information, such as gender, from facial images. Amazon Rekognition can analyze images and send the emotion and demographic attributes to Amazon Redshift for periodic reporting on trends, for example across in-store locations and similar scenarios.


Note: A prediction of emotional expression is based only on the physical appearance of a person's face. It is not indicative of a person’s internal emotional state, and Rekognition should not be used to make such a determination.
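To make the emotion data more concrete, the following sketch picks the highest-confidence emotion for each detected face from a response shaped like the output of the Amazon Rekognition Video GetFaceDetection operation (a "Faces" list whose entries hold a "Timestamp" in milliseconds and a "Face" with an "Emotions" list of "Type" and "Confidence" pairs). The function name dominant_emotions and the sample data are hypothetical, and how VIDIZMO actually aggregates emotions is not described here. As the note above says, the result describes apparent facial expression only.

```python
def dominant_emotions(response):
    """Return (seconds, emotion_type, confidence) for each detected face.

    `response` is assumed to follow the GetFaceDetection response shape.
    Faces without any emotion predictions are skipped.
    """
    results = []
    for entry in response.get("Faces", []):
        emotions = entry["Face"].get("Emotions", [])
        if not emotions:
            continue
        # Keep only the emotion with the highest confidence score.
        top = max(emotions, key=lambda e: e["Confidence"])
        results.append((entry["Timestamp"] / 1000.0, top["Type"], top["Confidence"]))
    return results


# Hand-written sample data, not real service output.
sample_faces = {
    "Faces": [
        {
            "Timestamp": 1000,
            "Face": {
                "Emotions": [
                    {"Type": "HAPPY", "Confidence": 92.0},
                    {"Type": "SURPRISED", "Confidence": 5.5},
                ]
            },
        },
        {
            "Timestamp": 3000,
            "Face": {
                "Emotions": [
                    {"Type": "CALM", "Confidence": 40.0},
                    {"Type": "SAD", "Confidence": 55.0},
                ]
            },
        },
        {"Timestamp": 5000, "Face": {}},
    ]
}

emotion_timeline = dominant_emotions(sample_faces)
print(emotion_timeline)
```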


Extract Rich Metadata

Amazon Rekognition Video is a deep learning-powered video analysis service that detects activities, sentiments, and locations, recognizes objects and keywords, and identifies tags. Coupled with VIDIZMO, it presents this metadata in a clean, user-friendly interface from which you can jump to the point in the video where a particular speaker said something or a specific object was detected.


VIDIZMO Video Insights Tab

VIDIZMO collects the video insights generated by Amazon Rekognition and displays them on the Video Insights Tab. The following options are available:


i. Search Insights for relevant labels, keywords, and so on, and navigate to the specific point in the video where they occur.

ii. View the list of Faces that have been extracted from the video.

iii. View the Labels that have been associated with your video.

iv. View the emotions of the people detected in the video.

v. View the Keywords extracted from your video after analysis of its audio and video.


Audio Insights using AWS Transcribe

Amazon Transcribe is a speech-to-text service that allows you to add voice AI to your VIDIZMO applications. Transcribe is designed to process audio input from various sources, such as microphones and audio or video files, and to provide high-quality transcriptions for search and analysis.


Punctuation and number normalization

Amazon Transcribe automatically adds punctuation and formatting so that the output closely matches the quality of manual transcription at a fraction of the time and expense. Numbers are transcribed into digits or “normal form” instead of words.


Timestamp Generation

Amazon Transcribe returns a timestamp for each word so that you can easily find a word or phrase in the original recording or add subtitles to the video.
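The sketch below shows one way per-word timestamps could be used to locate a word and to format its position as a subtitle-style time code. It assumes the standard Amazon Transcribe transcript JSON, in which results.items holds one entry per word ("type" set to "pronunciation", with "start_time" and "end_time" given as strings in seconds) and one per punctuation mark (no timestamps). The function names find_word and to_srt_time and the sample transcript are assumptions made for illustration, not part of VIDIZMO or the AWS SDK.

```python
def find_word(transcript, word):
    """Return the start times (in seconds) of every occurrence of `word`."""
    target = word.lower()
    hits = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no timestamps, so skip them.
        if item["type"] != "pronunciation":
            continue
        if item["alternatives"][0]["content"].lower() == target:
            hits.append(float(item["start_time"]))
    return hits


def to_srt_time(seconds):
    """Format seconds as an SRT-style time code, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def _word(content, start, end):
    """Build a hand-written word item for the sample transcript."""
    return {
        "type": "pronunciation",
        "start_time": start,
        "end_time": end,
        "alternatives": [{"confidence": "0.99", "content": content}],
    }


# Hand-written sample data, not real service output.
sample_transcript = {
    "results": {
        "items": [
            _word("Welcome", "0.10", "0.50"),
            _word("to", "0.50", "0.65"),
            _word("the", "0.65", "0.80"),
            _word("demo", "0.90", "1.30"),
            {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]},
            _word("The", "1.50", "1.70"),
            _word("demo", "1.80", "2.20"),
            _word("starts", "2.20", "2.60"),
            _word("now", "2.60", "2.90"),
            {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
}

demo_hits = find_word(sample_transcript, "demo")
demo_time_codes = [to_srt_time(t) for t in demo_hits]
print(demo_hits, demo_time_codes)
```

The match is case-insensitive and exact on single words. Phrase search would need to compare runs of consecutive items, which this sketch leaves out.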


VIDIZMO Transcription Tab

On the Transcription Tab, you can see all the transcriptions that AWS generated for the audio or video and jump to the point in time where a person in the video says specific words. The following options are available on the VIDIZMO Transcription Tab:


i. You can search for words or phrases from the video.

ii. You can change the font size of the transcription text.

iii. You can edit the transcription with your own words or phrases.

iv. You can download the transcription file.


To learn more about it, see: Understanding Transcription Pane in VIDIZMO


Contributions were made by Perwasha Khan & Waqar Baig.


Read Next

Understanding Video Insights

How to Configure Video Insights in VIDIZMO Portal using AWS