Introduction 

The VIDIZMO Audio Indexer allows you to create summaries of the transcribed content on your Portal. It utilizes an AI model that performs abstractive summarization: it extracts the essential ideas from the input transcriptions and generates coherent, comprehensive summaries. You can also configure the VIDIZMO Audio Indexer app to automatically summarize the audios or videos that you upload to your Portal. 


Using the application, you can generate and regenerate summaries for your content through on-demand processing. The summaries are displayed in a separate tab on the audio or video’s playback page, and you can edit a summary according to your preferences. You can also moderate the content of a summary by specifying forbidden words in the VIDIZMO Audio Indexer. 


For more details regarding the configurations, see Configuring the VIDIZMO Audio Indexer for Summarization.


Concept 

The VIDIZMO Audio Indexer can be enabled to generate summaries of transcribed audio and videos on your Portal. This functionality works on transcriptions generated from other indexing applications such as Azure ARM or AWS Indexer.

  • Abstractive Summarization 

The summarization feature uses the AI model to perform abstractive summarization. This differs from extractive summarization, where a model selects the sentences it finds important in the input text and reproduces them in a summary without changing the words, vocabulary, or structure. As a result, the abstractive model provides a more coherent and well-structured summary of the input text. 


In abstractive summarization, the model processes the input text to get a general idea of the subjects covered in it and represents those core ideas in the output summary. This also means that the summary generated by the AI model may have a different sentence structure, vocabulary, context, or speech than the base input text. 


  • Supported Languages 

The AI model also supports multiple languages, allowing you to generate summaries in most prominent languages, such as English, French, Spanish, Arabic, and Hindi. 


  • Account Metrics 

In VIDIZMO, you can access metrics related to the consumption of various resources. One of the resources is AI Processing, which measures the number of AI processing activities you have carried out across your VIDIZMO Account. The summarization feature falls under the 'AI Processing' category, which means that your AI processing consumption increases every time you perform this activity. 

 

For more information regarding reports, visit Consumption Reports for Deployment Overview.


Summarization Process  

The summarization process begins with data preparation. During the data preprocessing phase, the model removes special characters or tokens and strips excess whitespace from the text (such as the blank space between paragraphs).
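
As a rough illustration of this kind of cleanup, the following Python sketch removes a few special tokens and stray symbols and then collapses whitespace. It is not the VIDIZMO implementation; the `SPECIAL_TOKENS` list, the set of characters that are kept, and the `clean_transcript` function name are all assumptions made for the example.

```python
import re

# Hypothetical special tokens; the tokens the real model strips are not documented here.
SPECIAL_TOKENS = ("<s>", "</s>", "<pad>")


def clean_transcript(text: str) -> str:
    """Remove special tokens and stray symbols, then collapse whitespace."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, " ")
    # Assumption: keep letters, digits, whitespace, and basic punctuation only.
    text = re.sub(r"[^\w\s.,;:!?'-]", " ", text)
    # Collapse runs of spaces, tabs, and blank lines into a single space.
    return re.sub(r"\s+", " ", text).strip()


raw = "<s>Hello,   world!\n\n\nNext ★ paragraph.</s>"
cleaned = clean_transcript(raw)
```

Because `\w` is Unicode-aware in Python 3, letters from non-English transcriptions are preserved by this sketch.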

  

Once preprocessing has cleaned the data, the model performs tokenization, in which the input text is broken into smaller components called tokens, each of which represents a word or subword. Each token is then assigned a unique token ID, which the AI model uses to identify a specific word or subword.
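
The idea of mapping tokens to unique IDs can be sketched in a few lines of Python. This is a simplified illustration only: whitespace splitting stands in for the subword tokenizer a real model would use, and the `tokenize` and `build_vocab` helpers are hypothetical names.

```python
def tokenize(text: str) -> list:
    # Assumption: whitespace splitting stands in for a real subword tokenizer.
    return text.lower().split()


def build_vocab(tokens: list) -> dict:
    """Assign each distinct token a unique integer ID, in first-seen order."""
    vocab = {}
    for token in tokens:
        # setdefault only inserts when the token has not been seen before.
        vocab.setdefault(token, len(vocab))
    return vocab


tokens = tokenize("the meeting covered the budget")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
```

Note that the repeated word "the" maps to the same ID both times, which is what lets the model treat the two occurrences as the same word.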


The AI model used by the VIDIZMO Audio Indexer uses encoding and decoding to perform summarization. The tokens created from the input text are encoded into hidden states that provide additional information about the words or subwords. The decoder uses the information in these hidden states to make accurate predictions for the words to use when it generates a summary. 


Forbidden Words Parameter 

The Forbidden Words parameter is used for content moderation, ensuring that specific words are avoided in the generated output text. Its main use case is content that must adhere to strict language guidelines or policies.

  

Similar to the tokenization of the input text (i.e., transcription), the model also tokenizes the words in the forbidden words list and assigns them a unique token ID. These token IDs are then used by the model to determine which words will be avoided or not generated in the output text (i.e., the summary).   


When the model predicts the next likely word during summary generation, it checks the token IDs of the predicted words against the token IDs of the forbidden words. If there's a match, the model ignores that predicted word and chooses the next most probable alternative (either a word or a phrase) that conveys the same meaning and does not distort or alter the general idea of the input text. 
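
The check described above can be pictured as a filter applied at each generation step. The sketch below is a minimal illustration, not the model's actual decoding code: the three-word vocabulary, the scores, and the `pick_next_token` function are invented for the example, and it assumes simple greedy selection of the highest-scoring allowed token.

```python
def pick_next_token(scores: dict, forbidden_ids: set) -> int:
    """Return the highest-scoring token ID that is not in the forbidden set."""
    allowed = {tid: s for tid, s in scores.items() if tid not in forbidden_ids}
    if not allowed:
        raise ValueError("every candidate token is forbidden")
    return max(allowed, key=allowed.get)


# Hypothetical vocabulary and model scores for a single decoding step.
vocab = {"terrible": 0, "poor": 1, "weak": 2}
step_scores = {0: 0.6, 1: 0.3, 2: 0.1}

# "terrible" is on the forbidden words list, so its token ID is blocked.
forbidden_ids = {vocab["terrible"]}
next_id = pick_next_token(step_scores, forbidden_ids)
```

Here the top prediction ("terrible") is skipped and the next most probable candidate ("poor") is selected instead.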


  • Key Consideration 

For this parameter, provide each word separately as an independent entry. Phrases and sentences are invalid input and will not work for content moderation. 
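
If you maintain your forbidden words list outside the Portal, a small pre-check like the following can catch multi-word entries before you enter them. This is a hypothetical helper, not part of the VIDIZMO Audio Indexer; the `validate_forbidden_words` name and the sample entries are assumptions.

```python
def validate_forbidden_words(entries: list) -> tuple:
    """Split entries into valid single words and rejected multi-word inputs."""
    valid, rejected = [], []
    for entry in entries:
        word = entry.strip()
        # A valid entry is non-empty and contains no internal whitespace.
        if word and len(word.split()) == 1:
            valid.append(word)
        else:
            rejected.append(entry)
    return valid, rejected


valid, rejected = validate_forbidden_words(["confidential", "top secret", " internal "])
```

In this example, "top secret" is rejected because it is a phrase, while the other two entries are accepted as single words.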


To see how you can utilize the summarization feature on your Portal, visit How to Perform Summarization using VIDIZMO Audio Indexer.