Roles & Permissions
Users with the Manager+ role can configure settings to enable OCR from the VIDIZMO Indexer, and users with the Moderator+ role can use OCR capabilities on their content.
Introduction
As data grows in volume and variety, using it effectively is becoming crucial. VIDIZMO helps you manage your content, which can contain important data in different media formats. To make this media more searchable in the VIDIZMO portal, we have introduced an OCR (Optical Character Recognition) capability that analyzes the text present in your media, then extracts and saves it for further use. This text is inserted into the database and indexed so that end users can search it.
Overview
The VIDIZMO Indexer can now perform OCR as well. OCR can be performed on videos, images, and documents. We trained a powerful OCR engine to cater to your business needs and to perform well on different types of data. You can also perform OCR on multilingual data present inside a single media item (English, Chinese, and any other language). This functionality helps you manage media across the portal efficiently. For the most accurate OCR results, provide sharp, high-quality media.
Use Cases
Search Within Videos
As more and more businesses share information using videos, the time wasted looking for information inside them will only grow without a video search solution in place. VIDIZMO's OCR functionality addresses this requirement. VIDIZMO OCR indexes your videos (as well as images and documents) in such a way that the VIDIZMO search engine can find and return all words shown on-screen.
Enhanced Use of Legal Paperwork
Few industries generate as much paperwork as the legal industry, so OCR has numerous applications here. Reams and reams of legal documents, filings, judgments, and affidavits, especially printed ones, can be stored and made searchable using the VIDIZMO OCR Engine. For an industry that depends heavily on judicial precedent, fast access to legal documents from millions of past cases is undoubtedly valuable.
OCR For Video
OCR can be configured to recognize optical characters appearing in your videos. Video OCR detects, extracts, and reads areas containing characters or text in your digital video data. Once you run Video OCR on your desired media, frames are extracted from your video and passed to our OCR engine, which extracts the optical characters from the processed frames. Video OCR not only makes your media searchable within the VIDIZMO portal; you can also use the OCR data to jump to the timestamp at which the text appeared. The processing time of Video OCR depends on the duration and size of the video.
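The frame-to-timestamp mapping described above can be illustrated with a minimal Python sketch. It is not VIDIZMO's implementation: the fixed sampling interval, the `recognize` callback standing in for the OCR engine, and every name below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FrameText:
    timestamp_seconds: float
    text: str


def sample_timestamps(duration_seconds: float, interval_seconds: float) -> List[float]:
    """Return the timestamps at which frames would be sampled (assumed fixed interval)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    timestamps = []
    index = 0
    while index * interval_seconds < duration_seconds:
        timestamps.append(index * interval_seconds)
        index += 1
    return timestamps


def run_video_ocr(
    duration_seconds: float,
    interval_seconds: float,
    recognize: Callable[[float], str],
) -> List[FrameText]:
    """Run the (hypothetical) recognizer on each sampled frame and keep non-empty text."""
    results = []
    for timestamp in sample_timestamps(duration_seconds, interval_seconds):
        text = recognize(timestamp).strip()
        if text:
            results.append(FrameText(timestamp, text))
    return results


def first_timestamp_for(results: List[FrameText], keyword: str) -> Optional[float]:
    """Return the earliest timestamp whose text contains the keyword, if any."""
    needle = keyword.lower()
    for item in results:
        if needle in item.text.lower():
            return item.timestamp_seconds
    return None


# Stand-in for a real OCR engine: text "seen" at each sampled timestamp.
fake_frames = {0: "", 2: "Quarterly Report", 4: "Thank you"}
results = run_video_ocr(5, 2, lambda ts: fake_frames.get(ts, ""))
jump_to = first_timestamp_for(results, "report")
```

In this sketch a longer video produces more sampled frames, which is consistent with processing time growing with the duration of the video.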
OCR For Image
VIDIZMO gives you the option to run OCR on your images and make them searchable as well. These images are passed to the OCR engine, and the OCR data is then stored with the metadata of that image. Image OCR takes less time than Video OCR because an image does not consist of multiple frames.
OCR For Document
VIDIZMO Document OCR lets you extract textual data from a scanned document and use it to make the document searchable by its content. With OCR data, you can also jump to the relevant page while browsing a multipage scanned document.
Because our OCR engine takes an image as input, for editable documents we create an image copy, pass it to the OCR engine, and then map the OCR data back onto your original media.
The document OCR data is segmented by line, and we also store the corresponding coordinates so that the exact position of these characters is recorded as well.
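As a rough illustration of line-segmented OCR data with stored coordinates, here is a minimal Python sketch. The record layout, the `(x, y, width, height)` box convention, and all names are hypothetical; the actual format VIDIZMO stores is not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OcrLine:
    page: int                        # page the line was found on
    text: str                        # recognized text of the line
    box: Tuple[int, int, int, int]   # assumed (x, y, width, height) in pixels


def find_lines(lines: List[OcrLine], keyword: str) -> List[OcrLine]:
    """Return every line whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.text.lower()]


lines = [
    OcrLine(page=1, text="Affidavit of Service", box=(72, 90, 310, 24)),
    OcrLine(page=1, text="Filed on behalf of the plaintiff", box=(72, 130, 420, 20)),
    OcrLine(page=2, text="The plaintiff states as follows", box=(72, 90, 400, 20)),
]

hits = find_lines(lines, "plaintiff")
pages_to_open = [hit.page for hit in hits]
```

Keeping the page and box with each line is what would allow a viewer to open the right page and point at the exact position of the match.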
Note: To perform OCR on any of the media types mentioned above, you have to configure the relevant media type in the VIDIZMO Indexer.
How OCR in VIDIZMO Works
Once the OCR activity is triggered, your media is divided into multiple images in the case of video and document OCR, and these images are passed to the OCR engine. The engine performs OCR and returns OCR data, which is then mapped onto your media. The resulting OCR data is saved in the database so that it becomes searchable throughout the portal.
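The flow above (split into images, run OCR, map results to the media, save for search) can be sketched in a few lines of Python. This is an illustrative model only: the in-memory dictionary stands in for the database, `ocr_engine` is a placeholder callback, and none of the names reflect VIDIZMO's actual API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Sequence, Tuple

# (media_id, position): position is a 0-based frame or page index in this sketch.
Location = Tuple[str, int]
SearchIndex = DefaultDict[str, List[Location]]


def index_media(
    media_id: str,
    images: Sequence[str],
    ocr_engine: Callable[[str], str],
    index: SearchIndex,
) -> None:
    """Run OCR on each image of a media item and record where each word appears."""
    for position, image in enumerate(images):
        text = ocr_engine(image)
        for word in text.lower().split():
            location = (media_id, position)
            if location not in index[word]:
                index[word].append(location)


def search(index: SearchIndex, keyword: str) -> List[Location]:
    """Look up every location where the keyword was recognized."""
    return list(index.get(keyword.lower(), []))


# Stand-in OCR engine: maps an image name to the text "recognized" in it.
fake_text = {"page-1": "Invoice 2023", "page-2": "Total due invoice"}
index: SearchIndex = defaultdict(list)
index_media("doc-42", ["page-1", "page-2"], lambda image: fake_text.get(image, ""), index)

invoice_hits = search(index, "Invoice")
total_hits = search(index, "total")
```

Storing the position alongside each word is what makes it possible to send a user to the right frame or page rather than only to the media item.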
Supported Languages
Below is a list of some of the languages supported by the VIDIZMO OCR Engine. You can choose one of the available languages in your app settings to run processing workflows and extract text in that language. By default, our OCR engine detects English and Chinese characters in your content, which means that even if you select a language other than English or Chinese, characters in these two languages are still detected automatically.
Language | Abbreviation | Language | Abbreviation |
Abaza | abq | Goan Konkani | gom |
Afrikaans | af | Icelandic | is |
Albanian | sq | Tabassaran | tab |
Azerbaijani | az | Kurdish | ku |
Belarusian | be | Irish | ga |
Bosnian | bs | Lithuanian | lt |
Chinese and English | ch | Arabic | ar |
Chinese Traditional | ch_tra | Occitan | oc |
Czech | cs | Latvian | lv |
Danish | da | Malay | ms |
Dutch | nl | Kabardian | kbd |
English | en | Hindi | hi |
French | fr | Uyghur | ug |
German | german | Persian | fa |
Italian | it | Marathi | mr |
Japanese | japan | Urdu | ur |
Korean | korean | Serbian (Latin) | rs_latin |
Maltese | mt | Adyghe | ady |
Mongolian | mn | Newari | new |
Norwegian | no | Avar | ava |
Polish | pl | Dargwa | dar |
Portuguese | pt | Serbian (Cyrillic) | rs_cyrillic |
Romanian | ro | Ingush | inh |
Russian | ru | Bulgarian | bg |
Saudi Arabia | sa | Hungarian | hu |
Slovak | sk | Lak | lbe |
Slovenian | sl | Lezghian | lez |
Spanish | es | Nepali | ne |
Swahili | sw | Maithili | mai |
Swedish | sv | Bihari | bh |
Tagalog | tl | Angika | ang |
Tamil | ta | Indonesian | id |
Telugu | te | Croatian | hr |
Turkish | tr | Bhojpuri | bho |
Ukrainian | uk | Estonian | et |
Uzbek | uz | Magahi | mah |
Vietnamese | vi | Nagpur | sck |
Welsh | cy | Maori | mi |
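The default detection of English and Chinese described before the table can be modeled with a small Python sketch. The mapping below covers only a few abbreviations from the table, and the function name and return shape are assumptions for illustration, not an actual VIDIZMO setting or API.

```python
from typing import Dict, List

# A small subset of the table above: abbreviation -> languages it selects.
SELECTABLE: Dict[str, List[str]] = {
    "ch": ["Chinese", "English"],
    "en": ["English"],
    "fr": ["French"],
    "ar": ["Arabic"],
}

# Per the text, these are detected regardless of the selected language.
ALWAYS_DETECTED = ["English", "Chinese"]


def detected_languages(selected_code: str) -> List[str]:
    """Return the selected language(s) plus the always-detected defaults, without duplicates."""
    if selected_code not in SELECTABLE:
        raise ValueError(f"unsupported language code: {selected_code}")
    languages = list(SELECTABLE[selected_code])
    for default in ALWAYS_DETECTED:
        if default not in languages:
            languages.append(default)
    return languages


french_run = detected_languages("fr")
```

Under these assumptions, selecting French still yields English and Chinese text in the results.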
Searching OCR Data
You can search on the basis of OCR data from both the Media Library and media playback.
OCR-Based Search in Media Library
When you are searching for all media that share a keyword, or you don't remember the title of your media, OCR-based search in your Media Library comes to the rescue. You can type any keyword that appeared as text in your video or image or was part of your document, and the relevant results are displayed.
These search results show all occurrences of the searched keyword, and you can choose the one you want. Selecting an occurrence takes you to the point in the media where that occurrence appears.
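A minimal Python sketch of listing keyword occurrences and turning each one into a jump target follows. The `Occurrence` record, the `media_type` values, and the output wording are hypothetical; they only illustrate the idea of redirecting to a timestamp for videos and a page for documents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Occurrence:
    media_title: str
    media_type: str   # assumed values: "video" or "document"
    position: int     # seconds for a video, page number for a document
    text: str


def format_timestamp(total_seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def search_library(occurrences: List[Occurrence], keyword: str) -> List[Occurrence]:
    """Return every occurrence whose recognized text contains the keyword."""
    needle = keyword.lower()
    return [item for item in occurrences if needle in item.text.lower()]


def describe_target(item: Occurrence) -> str:
    """Describe where selecting this occurrence would take the user."""
    if item.media_type == "video":
        return f"{item.media_title} at {format_timestamp(item.position)}"
    return f"{item.media_title}, page {item.position}"


library = [
    Occurrence("Town Hall", "video", 65, "Budget 2024 overview"),
    Occurrence("Contract", "document", 3, "Annual budget clause"),
    Occurrence("Town Hall", "video", 120, "Questions"),
]

targets = [describe_target(hit) for hit in search_library(library, "budget")]
```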
OCR-Based Search From Media Playback
While browsing a document or watching a video, you can search for specific characters, and all instances of them are displayed. Select the one you want, and you are redirected to the corresponding timestamp or page of your media, which makes viewing media more seamless.