Introduction

Viewers are watching more content than ever, especially with Video-On-Demand (VOD) platforms providing a rich selection of content anytime, anywhere, and on any screen. As content volumes grow, media companies face challenges in preparing and managing content, which is crucial to providing a high-quality viewing experience. Today, companies rely on large teams of trained staff to identify people (faces), places, objects, sentiments, and keywords, and to produce transcriptions and tags, so that video content can be understood efficiently and accurately. These manual processes are expensive, slow, and cannot scale to keep up with the volume of content being produced, licensed, and retrieved from archives daily.


Concept

Amazon Rekognition and Amazon Transcribe make it easy to automate these operational media analysis tasks by providing fully managed, purpose-built APIs powered by machine learning (ML). Using these APIs, VIDIZMO analyzes large volumes of video stored in Amazon S3 or any other storage, detects objects, faces, sentiments, and keywords, transcribes speech, and assigns appropriate tags to your videos.


To learn more about how to proceed with its configuration, see: How to Configure Video Insights in VIDIZMO Portal using AWS.


Note: If the portal's storage provider is not Amazon S3 and AWS Video Insights is configured and enabled, the media is also temporarily uploaded to the associated Amazon S3 bucket for analysis. After the workflow completes successfully, the content is purged from that Amazon S3 storage.


Video Insights using AWS Rekognition

Amazon Rekognition makes it easy to add video analysis to your VIDIZMO application. You upload a video to your VIDIZMO portal, and Amazon Rekognition identifies objects, people, text, scenes, and activities. It can also detect inappropriate content. Amazon Rekognition also provides highly accurate facial analysis, face comparison, and face search capabilities. You can detect, analyze, and compare faces for a wide variety of use cases, including user verification, cataloging, people counting, and public safety.
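
For readers curious about what a request to Amazon Rekognition Video looks like outside the portal, the following is a minimal Python sketch using boto3. It is not VIDIZMO's implementation. The helper names, the bucket and key values, and the confidence threshold are hypothetical, and actually starting a job requires boto3 and valid AWS credentials.

```python
def build_label_detection_request(bucket, key, min_confidence=80.0):
    """Build the arguments for Rekognition's StartLabelDetection operation.

    bucket/key identify the video in Amazon S3 (hypothetical values here).
    """
    return {
        "Video": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MinConfidence": min_confidence,
    }


def start_label_detection(bucket, key, min_confidence=80.0):
    """Start an asynchronous label detection job and return its JobId."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("rekognition")
    response = client.start_label_detection(
        **build_label_detection_request(bucket, key, min_confidence)
    )
    return response["JobId"]
```

The job runs asynchronously, so the returned JobId would later be passed to the corresponding get_label_detection call to fetch the results.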


Sentiment analysis

Amazon Rekognition interprets emotional expressions such as happy, sad, or surprised, and demographic information such as gender, from facial images. It can analyze images and send the emotion and demographic attributes to Amazon Redshift for periodic reporting on trends, such as in-store locations and similar scenarios.


Note: A prediction of emotional expression is based on the physical appearance of a person's face only. It is not indicative of a person’s internal emotional state, and Rekognition should not be used to make such a determination.
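
As an illustration of how such predictions are typically consumed, the sketch below picks the highest-confidence emotion from a face record shaped like the Emotions list that Rekognition returns (a list of Type and Confidence pairs). The function name, the sample data, and the 50% cutoff are assumptions made for this example, not VIDIZMO behavior.

```python
def dominant_emotion(face_detail, min_confidence=50.0):
    """Return the emotion type with the highest confidence, or None.

    face_detail is expected to contain an "Emotions" list of
    {"Type": ..., "Confidence": ...} dictionaries.
    """
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    best = max(emotions, key=lambda e: e["Confidence"])
    # Treat low-confidence predictions as "no clear expression".
    if best["Confidence"] < min_confidence:
        return None
    return best["Type"]


# Hypothetical sample data for illustration only.
sample_face = {
    "Emotions": [
        {"Type": "HAPPY", "Confidence": 92.1},
        {"Type": "CALM", "Confidence": 5.3},
        {"Type": "SURPRISED", "Confidence": 1.2},
    ]
}
print(dominant_emotion(sample_face))  # prints "HAPPY"
```

In line with the note above, the result describes the apparent facial expression only.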


Extract Rich Metadata

Amazon Rekognition Video is a deep learning-powered video analysis service that detects activities, sentiments, and locations, recognizes objects and keywords, and identifies tags. When coupled with VIDIZMO, it presents this metadata in a clean, user-friendly interface from which you can jump to the point in the video where a particular speaker said something or a particular object was detected.
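
To make the idea of jumping to the point where an object was detected more concrete, here is a small Python sketch that groups label detections by name and lists their timestamps. It assumes input shaped like the Labels list returned by Rekognition's GetLabelDetection operation (a Timestamp in milliseconds plus a Label with Name and Confidence). The function names and sample data are hypothetical, and this is not a description of how VIDIZMO builds its interface.

```python
from collections import defaultdict


def index_labels_by_name(labels, min_confidence=80.0):
    """Map each label name to the sorted timestamps (ms) where it was detected."""
    index = defaultdict(list)
    for entry in labels:
        label = entry["Label"]
        if label["Confidence"] >= min_confidence:
            index[label["Name"]].append(entry["Timestamp"])
    return {name: sorted(times) for name, times in index.items()}


def format_timestamp(ms):
    """Format milliseconds as HH:MM:SS.mmm for display in a player."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical sample data for illustration only.
sample_labels = [
    {"Timestamp": 61250, "Label": {"Name": "Car", "Confidence": 97.4}},
    {"Timestamp": 0, "Label": {"Name": "Person", "Confidence": 99.0}},
    {"Timestamp": 4500, "Label": {"Name": "Car", "Confidence": 88.2}},
    {"Timestamp": 9000, "Label": {"Name": "Dog", "Confidence": 41.0}},
]

for name, times in index_labels_by_name(sample_labels).items():
    print(name, [format_timestamp(t) for t in times])
```

With the sample data, the low-confidence "Dog" detection is filtered out and "Car" maps to two seek positions.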


VIDIZMO Video Insights Tab

VIDIZMO collects the video insights generated by Amazon Rekognition and displays them on the Video Insights Tab. The following options are available:


i. Search Insights for relevant labels, keywords, etc., and navigate to the point in the video where they occur.

ii. View the list of Faces extracted from the video.

iii. View the Labels associated with your video.

iv. View the Emotions of the people in the video.

v. View the Keywords extracted from your video after analysis of its audio and video.


Audio Insights using AWS Transcribe

Amazon Transcribe is a speech-to-text service that allows you to add voice AI to your VIDIZMO applications. Transcribe is designed to process audio input from a variety of sources, such as microphones and audio or video files, and to provide high-quality transcriptions for search and analysis.
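
For reference, a batch transcription request to Amazon Transcribe can be sketched in Python with boto3 as follows. This is an illustration only, not VIDIZMO's implementation. The helper names, job name, and media URI are hypothetical, and starting a job requires boto3 and valid AWS credentials.

```python
def build_transcription_request(job_name, media_uri,
                                media_format="mp4", language_code="en-US"):
    """Build the arguments for Transcribe's StartTranscriptionJob operation."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def start_transcription(job_name, media_uri, **options):
    """Start a transcription job and return its initial status."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("transcribe")
    response = client.start_transcription_job(
        **build_transcription_request(job_name, media_uri, **options)
    )
    return response["TranscriptionJob"]["TranscriptionJobStatus"]
```

The job name must be unique within the AWS account, so a caller would normally derive it from something like a media identifier.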


Punctuation and number normalization

Amazon Transcribe automatically adds punctuation and formatting so that the output closely matches the quality of manual transcription at a fraction of the time and expense. Numbers are also transcribed into digits or “normal form” instead of words.


Timestamp Generation

Amazon Transcribe returns a timestamp for each word so that you can easily find a word or phrase in the original recording or add subtitles to video.
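
The per-word timestamps can be used as in the following Python sketch, which looks up every occurrence of a word in a transcript shaped like Amazon Transcribe's JSON output (results.items, where spoken words have type "pronunciation" with start_time and end_time given as strings in seconds). The function name and the sample transcript are assumptions made for this example.

```python
def find_word(transcript, word):
    """Return (start, end) times in seconds for each occurrence of word."""
    target = word.lower()
    hits = []
    for item in transcript["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        spoken = item["alternatives"][0]["content"]
        if spoken.lower() == target:
            hits.append((float(item["start_time"]), float(item["end_time"])))
    return hits


# Hypothetical sample data for illustration only.
sample_transcript = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.04", "end_time": "0.51",
             "alternatives": [{"content": "Hello", "confidence": "0.99"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ",", "confidence": "0.0"}]},
            {"type": "pronunciation", "start_time": "0.6", "end_time": "1.02",
             "alternatives": [{"content": "world", "confidence": "0.98"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ".", "confidence": "0.0"}]},
            {"type": "pronunciation", "start_time": "2.5", "end_time": "2.9",
             "alternatives": [{"content": "Hello", "confidence": "0.97"}]},
        ]
    }
}

print(find_word(sample_transcript, "hello"))  # prints [(0.04, 0.51), (2.5, 2.9)]
```

Each returned start time is a position a player could seek to, and the same start and end pairs are what a subtitle file would be built from.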


VIDIZMO Transcription Tab

On the Transcription Tab, you can see all the transcriptions that AWS generated for the audio or video, and jump to the point in the video where the speaker says specific words. The following options are available in the VIDIZMO Transcription Tab:


i. Search for words or phrases from the video.

ii. Change the font size of the transcription text.

iii. Edit the transcription with your own words or phrases.

iv. Download the transcription file.


To learn more about it, see: Understanding Transcription Pane in VIDIZMO