Introduction

In today's digital age, data has become a critical component of every organization's decision-making process. However, a significant portion of this data exists in unstructured formats, such as audio, video, and image files, making it challenging to extract and manipulate the meaningful information contained within.


To tackle this challenge, large enterprise organizations have recognized the importance of making the data extractable, readable, and searchable. To achieve this, many advanced algorithms have been introduced, and one of the most important of these is Optical Character Recognition (OCR).


OCR is a technology that has evolved over the years to solve the complex task of converting visual text locked inside ambiguous, non-queryable data into clean, searchable form. By utilizing OCR technology, businesses can transform unstructured data into structured data that is searchable and accessible, saving valuable time and resources.


OCR technology has proven to be a game-changer for organizations dealing with large volumes of data, enabling them to extract meaningful insights from visual data sources such as images, videos, and documents. This, in turn, has led to better-informed decision-making, improved productivity, and enhanced customer experiences.


Concept

VIDIZMO is a leading platform that utilizes a combination of powerful libraries to identify text characters within digital media such as videos, images, and documents. The VIDIZMO OCR Engine processes this text by translating the characters into code that can be used for data retrieval, such as search. This powerful OCR technology enhances the searchability of content throughout the portal, making it easier and faster to locate specific information.


To further enhance the accuracy of OCR results, VIDIZMO currently supports two specialized OCR engines, one for documents and another for images and videos. This allows businesses to configure apps separately for each media type, ensuring that the OCR engine used is tailored to the specific requirements of the job. For example, VIDIZMO OCR can generate visual text insights from within documents, while Easy OCR models can generate insights from within images and videos.


This approach not only improves accuracy but also ensures that businesses can extract meaningful insights from all types of digital media. This, in turn, leads to better-informed decision-making, improved productivity, and enhanced customer experiences.


Overall, VIDIZMO's OCR capabilities are a game-changer for businesses looking to extract and manipulate data from unstructured sources. With support for specialized OCR engines, VIDIZMO is able to offer businesses a powerful tool that can enhance searchability and accessibility of content throughout the portal, making it easier to find the information they need quickly and efficiently.



1. VIDIZMO Indexer - This OCR engine uses powerful libraries to detect text, with near-accurate results, in structured visual text of any form. It performs best on frames where the text is aligned and placed at acceptable angles.


In the how-to guide below, you will learn how to enable each app separately based on your business use-case.


How does OCR work in VIDIZMO?

Based on your app settings, content processing workflows are initiated when content is ingested in an OCR-enabled portal. These workflows begin with pre-processing activities, such as aligning the orientation of detected text so that the algorithms can decipher it, and breaking a video into frames at an interval of 500 ms so that each frame can be analyzed independently. After pre-processing, the content chunks are processed using multiple libraries to detect meaningful sequences of text within them. This text is then inserted into the database and indexed so that end-users can search it.
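The workflow above can be pictured with a small Python sketch. It is an illustration only, not VIDIZMO's implementation: the function names and the shape of the OCR output (a dictionary of timestamp to recognized text) are assumptions made for this example, and no real OCR library is called. It shows how sampling a frame every 500 ms yields a set of timestamps, and how the recognized text of each frame could be turned into a word index.

```python
from collections import defaultdict

FRAME_INTERVAL_MS = 500  # sampling interval stated in the text above


def frame_timestamps(duration_ms, interval_ms=FRAME_INTERVAL_MS):
    """Return the timestamps (in ms) at which frames would be sampled."""
    return list(range(0, duration_ms, interval_ms))


def build_index(frame_texts):
    """Map each lowercase word to the sorted timestamps where it was seen.

    frame_texts is a dict of {timestamp_ms: recognized_text}. This shape is
    hypothetical; it stands in for whatever the OCR step returns per frame.
    """
    index = defaultdict(list)
    for ts in sorted(frame_texts):
        # set() avoids recording the same word twice for one frame
        for word in set(frame_texts[ts].lower().split()):
            index[word].append(ts)
    return dict(index)


# A 2-second clip is sampled at 0, 500, 1000 and 1500 ms.
timestamps = frame_timestamps(2000)

# Made-up OCR output for those four frames (the last frame has no text).
ocr_output = {
    0: "Quarterly Results",
    500: "Quarterly Results",
    1000: "Revenue Growth",
    1500: "",
}
index = build_index(ocr_output)
```

Looking up a word in `index` returns every sampled timestamp at which that word was on screen, which is the information a search result needs in order to jump to the right point in a video.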


After a successful OCR activity, end-users can search for words to obtain trackable results that take them to the page, or the point in a video, where the text was recognized on screen. This helps when sifting through webinar presentations or searching for a keyword in the middle of a training session.



Supported Languages

VIDIZMO OCR Engine supports more than 100 languages including their dialects for OCR generation. Below is the list of a few languages supported by the VIDIZMO OCR Engine. You can choose one language out of the available languages in your app settings to run processing workflows and extract text corresponding to that language.


 Afrikaans    Chinese    Greek       Macedonian  Spanish
 Albanian     Croatian   Hindi       Maltese     Swahili
 Arabic       Danish     Hungarian   Malay       Swedish
 Azerbaijani  Dutch      Indonesian  Norwegian   Tagalog
 Basque       English    Italian     Polish      Tamil
 Belarusian   Esperanto  Japanese    Portuguese  Telugu
 Bengali      Estonian   Kannada     Romanian    Thai
 Bulgarian    Finnish    Korean      Russian     Turkish
 Catalan      French     Latvian     Serbian     Ukrainian
 Czech        Galician   Lithuanian  Slovak      Urdu
 Cherokee     German     Malayalam   Slovenian   Vietnamese


How-to Guide


From the Portal Homepage:

    1. Click on the navigation menu on the top left corner.

    2. Click on the Admin tab.

    3. Click on Portal Settings to open the portal settings page.




    4. From the Portal Settings:

    5. Click on the Apps option to expand it.

    6. Navigate to Content Processing and click to open it.

    7. Now, click on the gear icon against the VIDIZMO Indexer App.



VIDIZMO Indexer App - OCR Settings


The VIDIZMO OCR - Settings screen opens.

8. Detection Type: VIDIZMO Indexer offers multiple detection types. For OCR, select the Optical Character Recognition option from the drop-down.

9. Evidence Type: Select the type of mashup on which to perform OCR. Check all if needed.

10. OCR Provider: Select the OCR engine, either Paddle OCR or Tesseract OCR.

11. OCR Language: Select the OCR language.

12. Automatic Processing: Turn this on if you want OCR to run automatically when a document or media file is uploaded.

13. Click on the Save Changes button.




2. Now, enable OCR using the toggle button.




Note: Only Manager+ portal users are authorized to enable the VIDIZMO Indexer App.
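As a way of summarizing the settings described in this guide, the sketch below models them as a small Python data structure with a basic sanity check. It is purely illustrative: VIDIZMO is configured through the portal screens described above, and the class name, field names, default values, and the evidence type values are assumptions made for this example rather than an actual VIDIZMO API. Only the two provider names and the Detection Type option come from the steps above.

```python
from dataclasses import dataclass

# Provider names as listed in the OCR Provider step of this guide.
OCR_PROVIDERS = {"Paddle OCR", "Tesseract OCR"}


@dataclass
class OcrSettings:
    """Hypothetical mirror of the fields on the OCR settings screen."""
    detection_type: str = "Optical Character Recognition"
    evidence_types: tuple = ("Video", "Image", "Document")  # assumed values
    ocr_provider: str = "Tesseract OCR"
    ocr_language: str = "English"  # one language is chosen per app
    automatic_processing: bool = False

    def problems(self):
        """Return a list of human-readable issues; empty means it looks fine."""
        issues = []
        if self.ocr_provider not in OCR_PROVIDERS:
            issues.append(f"Unknown OCR provider: {self.ocr_provider}")
        if not self.evidence_types:
            issues.append("Select at least one evidence type")
        if not self.ocr_language.strip():
            issues.append("Select an OCR language")
        return issues


settings = OcrSettings(ocr_provider="Paddle OCR", automatic_processing=True)
issues = settings.problems()
```

Keeping the check in a method that returns a list, rather than raising on the first problem, makes it easy to show every issue at once, much as a settings form would.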


Retrieving OCR Results

1. To retrieve results generated through VIDIZMO OCR, follow the steps below.

i. In the Search box, enter the keyword(s) for which you want to fetch information from the existing content in the Portal.

ii. The media items containing the keyword (if detected by OCR) are displayed on the library page. Click on the expand button to see the timeline of keyword occurrences.

iii. Click on any point in the timeline to navigate to that specific point in the video.



Note: The above representation applies to video only. For a document or an image, the complete text is displayed.
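To illustrate how a keyword occurrence timeline for a video could be derived, here is a minimal Python sketch. It is not VIDIZMO's implementation: the function names and the sample timestamps are hypothetical, and it simply assumes that a keyword was recognized at a set of frame timestamps sampled 500 ms apart, as described earlier in this article. Hits in consecutive frames are merged into one occurrence so that the timeline shows a single segment rather than one point per frame.

```python
def group_occurrences(timestamps_ms, interval_ms=500):
    """Merge hits in consecutive frames into (start_ms, end_ms) segments."""
    segments = []
    for ts in sorted(timestamps_ms):
        # Extend the current segment if this hit is at most one frame later.
        if segments and ts - segments[-1][1] <= interval_ms:
            segments[-1][1] = ts
        else:
            segments.append([ts, ts])
    return [tuple(segment) for segment in segments]


def format_ms(ms):
    """Format milliseconds as MM:SS for display on a timeline."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


# Made-up frame timestamps (in ms) at which a keyword was recognized.
hits = [3000, 3500, 4000, 62000, 62500]

timeline = [(format_ms(start), format_ms(end))
            for start, end in group_occurrences(hits)]
```

With these sample hits, the keyword appears in two separate stretches of the video, so the timeline has two entries, each of which could serve as a clickable jump point.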


Use Cases

Search Within Videos

An average worker spends 20% of their time, equivalent to one whole day a week, searching for the information they need to do their job effectively.

Source: McKinsey & Company

As more and more businesses share information using videos, this waste of time will only worsen without a video search solution in place. VIDIZMO's OCR functionality caters to this requirement. VIDIZMO OCR indexes your videos (as well as images and documents) in such a way that the VIDIZMO Search Engine can find and return all words shown on screen.


Few industries generate as much paperwork as the legal industry, so OCR has numerous applications here. Reams and reams of legal documents, filings, judgments, and affidavits, especially printed ones, can be stored and made searchable using the VIDIZMO OCR Engine. For an industry that depends heavily on judicial precedent, fast access to legal documents from millions of past cases is undoubtedly an advantage.


VIDIZMO OCR can extract images that contain text from documents and then process them, making image text searchable within a document, so you do not miss any important piece of textual information.

Limitations

Though OCR is a powerful tool, it has some limitations.

  • It cannot recognize handwriting.
  • If a media file contains a language other than the Default Language, the results will be unusable.
  • Low-quality scans may produce inferior-quality OCR results.
  • It does not support text rotated at multiple angles.
  • It works on 2D media only.
  • Most of the media's formatting may be lost during text scanning. The output from processed media is single-column plain text.