Introduction

In the era of digital technology, the efficient management of multimedia content, such as recordings, is vital for organizations across various industries. VIDIZMO, a leading enterprise video content management and digital evidence management platform, has pioneered solutions for content processing, storage, search, and playback.


In response to evolving needs, VIDIZMO is now extending its capabilities to support content ingestion from AWS S3 buckets. This article explores the significance of this enhancement and how it seamlessly integrates with VIDIZMO's existing functionalities.


The Role of AWS S3 Bucket 

Amazon S3 (Simple Storage Service) is a renowned cloud storage solution known for its scalability, durability, and cost-effectiveness. It is a reliable repository for a wide array of data, including phone call recordings. For instance, organizations can leverage AWS S3 to securely store their call recordings, documents, or other content. VIDIZMO's ability to seamlessly work with S3 buckets empowers organizations to efficiently manage their media content while benefiting from the scalability and security that AWS S3 provides.


The Need for Content Ingestion

In today's digital landscape, organizations are flooded with vast amounts of multimedia content from many sources, and managing this content efficiently has become paramount. VIDIZMO recognizes this need and extends its ingestion capabilities to AWS S3 buckets, offering a centralized content management solution.

  • Centralized Content Repository: Extending VIDIZMO's ingestion capabilities from AWS S3 means that organizations can centralize their multimedia content from diverse sources in a specific location, streamlining content management.
  • Comprehensive Search and Playback: Ingested content from AWS S3 becomes searchable and accessible through VIDIZMO's robust search and playback features. This ensures users can easily find, view, and interact with their content.
  • Enhanced Processing: VIDIZMO's ingestion process is designed to handle several types of content efficiently. With the inclusion of AWS S3, organizations can benefit from VIDIZMO's processing capabilities, including transcoding, indexing, and more, for their S3-stored content.
  • Efficiency: Manually transferring and organizing content is time-consuming and error-prone. VIDIZMO's content ingestion functionality automates these processes, saving time and reducing human error.

 

How Content Ingestion Works

Content ingestion from an Amazon S3 (Simple Storage Service) bucket involves a well-defined process that ensures the seamless and secure transfer of digital assets into a content management system like VIDIZMO. This process begins with identifying the content to be ingested, typically multimedia assets such as videos, images, or documents, stored within the designated S3 bucket.


To initiate the ingestion, users or administrators access the VIDIZMO portal and configure the system to connect with the specified S3 bucket. This connection establishes a secure channel between VIDIZMO and the S3 storage, ensuring that data transmission remains protected throughout the process. 


One of the pivotal advantages of content ingestion through AWS S3 integration is the secure storage of organizational data within the S3 bucket. VIDIZMO and AWS work in tandem to enable organizations to effectively perform content ingestion, thereby safeguarding sensitive information from unauthorized access. 


Media content stored in an Amazon Web Services (AWS) S3 bucket can be easily and automatically imported into VIDIZMO. This process allows for a streamlined workflow, where videos and other media files kept on S3 buckets are detected and transferred into VIDIZMO's hosted portal without requiring manual intervention.


VIDIZMO offers several options for users to customize how they bring content from AWS S3 buckets into VIDIZMO. These options include choosing how the content is organized once it's in VIDIZMO, like grouping similar files. Users can also set up rules for how metadata files are mapped after the content is ingested.


These features enable customers to optimize the ingestion of their content from AWS S3 into VIDIZMO, ensuring the most efficient process tailored to their specific needs and preferences.

The following sections describe the settings you need to understand when bringing in content. Understanding them will help you make the most of the options available and tailor them to your specific needs.


Ingested Content Setup


Content organization preference

When setting up content ingestion in VIDIZMO, you have two options for organizing the content: Hierarchy and Flat options. When you choose the "Flat" import option, each item from the S3 bucket is brought into the VIDIZMO portal without any folders. Everything is placed directly in the portal's root. On the other hand, when you opt for the "Hierarchy" option, the original folder structure is maintained. This means that the content is ingested into the VIDIZMO portal exactly as it was organized in the AWS S3 bucket, with the same folders created in the portal after ingestion.
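
The difference between the two options can be illustrated with a minimal Python sketch. It is not VIDIZMO's implementation: the function name portal_path and the sample object keys are hypothetical, and the sketch only shows how an S3 object key would map to a portal location under each option.

```python
import posixpath


def portal_path(s3_key: str, mode: str) -> str:
    """Return the illustrative portal location for an S3 object key."""
    if mode == "Flat":
        # Flat: drop every folder and place the item in the portal root.
        return posixpath.basename(s3_key)
    if mode == "Hierarchy":
        # Hierarchy: keep the folder structure exactly as it is in the bucket.
        return s3_key
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical object keys, used only for illustration.
keys = ["calls/2024/january/call_001.wav", "calls/2024/call_002.wav"]
flat = [portal_path(k, "Flat") for k in keys]
hierarchy = [portal_path(k, "Hierarchy") for k in keys]
```

With "Flat", both files end up side by side in the root; with "Hierarchy", the calls/2024/january folder chain is preserved.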


Include/Exclude Folders

A user can specify the paths of the folders within an AWS S3 bucket that should be included in or excluded from the content ingestion process. The portal adheres to these specifications, ingesting content from the designated include folders while disregarding content from the specified exclude folders. This setting is optional; users can skip it if they don't wish to include or exclude any folders during the ingestion process.
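
As a rough illustration of how include and exclude folders could narrow down what is ingested, consider the Python sketch below. The matching rules are assumptions made for the example and are not confirmed by this article: folders are matched as key prefixes, an exclude folder takes precedence over an include folder, and an empty include list means the whole bucket is considered. The function name is_selected and the sample keys are hypothetical.

```python
def is_selected(s3_key, include_folders=(), exclude_folders=()):
    """Decide whether an S3 object key would be picked up for ingestion."""

    def in_folder(key, folder):
        # Assumption: a folder is matched as a path prefix of the key.
        return key.startswith(folder.strip("/") + "/")

    # Assumption: exclusion wins when a key matches both lists.
    if any(in_folder(s3_key, f) for f in exclude_folders):
        return False
    # Assumption: with no include folders, everything else is selected.
    if include_folders:
        return any(in_folder(s3_key, f) for f in include_folders)
    return True


# Hypothetical bucket contents.
keys = ["recordings/a.wav", "recordings/archive/b.wav", "docs/c.pdf"]
selected = [
    k for k in keys
    if is_selected(k, include_folders=["recordings"],
                   exclude_folders=["recordings/archive"])
]
everything = [k for k in keys if is_selected(k)]
```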


Source Content Post Ingestion

The Post Ingestion Action options for managing source content include "Keep content unaltered post ingestion," where the content remains unchanged after ingestion. Alternatively, you can choose the "Delete" option to remove the content from the AWS S3 bucket post-ingestion. Another choice is to "Move Content to S3 Folder Post Ingestion," allowing you to specify a directory within another AWS bucket where the content should be relocated after ingestion. It's important to note that if the specified folder doesn't exist in the bucket, AWS will create it accordingly.


Publishing Status

Users have the option to choose what happens to content that they upload to the portal. They can either publish the content immediately or keep it as a draft. If the "publish" option is selected, then the content will be automatically published. If the "draft" option is selected, then the content will be saved in the draft tab.


Viewing Access 

Viewing access for ingested content can be customized. The following options are available:
  1. Portal Security/Publish settings: The portal will allow viewing access based on the settings configured in the control panel. To learn more about portal security, refer to Understanding Portal's Security Policy.
  2. Anonymous users: This option enables anonymous users to view ingested content. It is not available in the DEMS product package, but it is available when the AWS S3 ingestion app is part of the Enterprise Tube product package.
  3. Portal users: All the users of the portal can view the ingested content in the portal.
  4. Account and Portal users: All portal and account users can view ingested content.


Time Interval

The time interval is the period during which the system rests, with no active tasks or operations, after completing one ingestion cycle. Users can define it based on the amount of content to be ingested: if there is a significant amount of content, a longer interval is advisable. The minimum recommended value for this interval is 5 seconds.


Content File Grouping

As part of your content ingestion process, you can configure file grouping to organize your files better. VIDIZMO provides options for choosing a File Group Type to determine your preferred method for organizing content.

  • None

When the user selects "None," no grouping is applied, and the application ingests all files of the defined file types as Original Content. "None" is selected by default. To ingest only one file type as Original Content, specify that file type's regex pattern in the mapping rules for the metadata file section.

  • Substring

The files are grouped based on a common substring in the file name. When organizing files based on substring, three key fields come into play: Start Position, Number of Characters to Include, and Minimum Group File Count. These fields determine how the files will be grouped based on their names.

  1. Start Position: This determines where in the file name the grouping should begin. It's the numeric index indicating the starting point for extracting a substring.
  2. Number of Characters to Include: This specifies how many characters should be taken from the file name after the Start Position to form the substring.
  3. Minimum Group File Count: This sets the minimum number of files required in each group. It ensures that a group is formed only when the specified minimum count of files sharing the exact substring is met. For instance, if the minimum count is set to 2, the system will only group files if there are at least two files with identical substrings.

For example, let's consider two files: Audio_Song.wav and Audio_Song.json.


Start Position: 0

Number of Characters to Include: 6

Minimum Group File Count: 2


Understanding how to use these key fields is crucial for effective file organization. In our example, we start extracting the substring from the beginning of the file names, taking the first 6 characters (Audio_). Since both files share this substring and there are at least 2 files, they will be grouped, making it easier to manage and locate related files.
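
The example above can be reproduced with a short Python sketch. It is a simplified model rather than VIDIZMO's actual code: the function name group_by_substring and the third file, Video_Clip.mp4, are hypothetical, and the sketch simply leaves out files whose group does not reach the minimum count, since this article does not describe how such files are handled.

```python
from collections import defaultdict


def group_by_substring(file_names, start, length, min_count):
    """Group file names by the substring name[start:start + length]."""
    groups = defaultdict(list)
    for name in file_names:
        # Start Position is zero-based, as in the example above.
        groups[name[start:start + length]].append(name)
    # Keep only groups that reach the Minimum Group File Count.
    return {key: names for key, names in groups.items()
            if len(names) >= min_count}


files = ["Audio_Song.wav", "Audio_Song.json", "Video_Clip.mp4"]
groups = group_by_substring(files, start=0, length=6, min_count=2)
```

Here the two Audio_ files form one group, while Video_Clip.mp4 has no partner sharing its first six characters and therefore forms no group.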

  • Regular Expression

To group files based on a specific pattern, we utilize a Regular Expression (Regex) Pattern. This pattern defines the criteria for grouping files. For example, a Regex Pattern for grouping files that start with the prefix Audio_ and have either the .wav or .json extension looks like this:


Regex: (?<GroupName>^Audio_).*\.(wav|json)
  • (?<GroupName>...): Defines a named capturing group called GroupName.
  • ^Audio_: Specifies that the file name should start with Audio_.
  • .*: Matches any character (except a newline) zero or more times.
  • \.(wav|json): Matches a literal dot followed by either the wav or json extension.

You can create and test your custom file grouping regex here.


In the case of using regular expressions for file grouping, the concept of minimum file group count still applies.
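
A quick way to try out such a pattern locally is a small Python sketch like the one below. Two caveats apply: Python's re module writes named groups as (?P<GroupName>...) rather than (?<GroupName>...), so the pattern is adapted accordingly, and using the text captured by GroupName as the grouping key is an assumption made for this example. The function name group_by_regex and the sample file names are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as above, rewritten with Python's (?P<name>...) syntax.
PATTERN = re.compile(r"(?P<GroupName>^Audio_).*\.(wav|json)")


def group_by_regex(file_names, pattern, min_count):
    """Group matching file names by the text captured in GroupName."""
    groups = defaultdict(list)
    for name in file_names:
        match = pattern.search(name)
        if match:
            # Assumption: the captured group text is the grouping key.
            groups[match.group("GroupName")].append(name)
    # The minimum file group count applies here as well.
    return {key: names for key, names in groups.items()
            if len(names) >= min_count}


files = ["Audio_Song.wav", "Audio_Song.json",
         "Audio_Notes.txt", "Video_Clip.mp4"]
regex_groups = group_by_regex(files, PATTERN, min_count=2)
```

Audio_Notes.txt starts with the right prefix but has neither extension, so it does not match and stays out of the group.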

  • Last Folder

This option groups files based on the last folder in their file path, so that media files located in the same last folder are grouped together. For example, if we have file paths like:


folder1/folder2/Audio_Song.wav

folder1/folder2/Audio_Music.mp3

folder1/folder3/Audio_Podcast.wav

folder1/folder3/Audio_Interview.mp3


The files are grouped based on their last folder. Files placed in folder2 and folder3 will be grouped separately since they are the last folders in their respective file paths. This strategy organizes files based on their immediate parent folder, ensuring grouping according to folder structure.


When using the Last Folder option for file grouping, the concept of minimum file group count is still significant. 
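
The Last Folder behavior described above can be approximated in a few lines of Python. This is a sketch under assumptions: the function name group_by_last_folder is hypothetical, and the full parent path is used as the group key so that two folders with the same name in different locations do not collide, which this article does not specify.

```python
import posixpath
from collections import defaultdict


def group_by_last_folder(paths, min_count):
    """Group file paths by their immediate parent folder."""
    groups = defaultdict(list)
    for path in paths:
        # posixpath.dirname drops the file name and keeps the folder part.
        groups[posixpath.dirname(path)].append(path)
    # The minimum file group count is still enforced.
    return {folder: files for folder, files in groups.items()
            if len(files) >= min_count}


paths = [
    "folder1/folder2/Audio_Song.wav",
    "folder1/folder2/Audio_Music.mp3",
    "folder1/folder3/Audio_Podcast.wav",
    "folder1/folder3/Audio_Interview.mp3",
]
folder_groups = group_by_last_folder(paths, min_count=2)
```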


Content FileType Mapping

This setting allows you to define rules for how the metadata files associated with media files are mapped after content ingestion.

  • Specifying Media File Sections

With this option, a user can select the media file section in which associated files are stored. These sections consist of content file parts, each responsible for storing specific types of files. Users can define rules to determine which file type belongs to which part. Multiple rules can be specified, each with its own criteria and associated content file part. If a file meets any of the provided criteria, it is placed into the corresponding content part.

The flexibility of this function allows users to store files in various formats, including .vtt and .json. No strict linkage exists between a file format and a specific part; it is the user's choice to specify which format of a file should be placed in a particular part. For instance, a user might instruct the system to move .vtt files to the Thumbnails part of the content file. However, users need to exercise caution when defining rules to avoid placing files in parts that are not intended for such formats, as it may lead to malfunctioning.

The available media file sections are as follows:

  • Audio PCM: Reserved for digitally encoded audio data using PCM. 
  • Closed Caption:  Designated to store closed captions associated with video content. 
  • Content: A section dedicated to the primary content files. 
  • Supporting Files: This section is capable of storing files that support the main content, such as metadata, additional documentation, or related files. 
  • Thumbnails: Designated for storing thumbnail images associated with the content. 
  • Original Content: Reserved for the storage of the original content file.

 

Note: At least one media file section rule with the media file section option "OriginalContent" is mandatory. Moreover, if a user does not specify a media file section rule for a file, the file is stored in the Supporting Files section.

  • Regex For File Type

Define the regex pattern for media file section rules, indicating the pattern used to recognize file types for the selected section. Use '..*' to ingest all files, or '.*' followed by an escaped extension to match specific files. For instance, '.*\.wav' matches WAV files.


Example: Regex Pattern '.*\.mp4' will identify all .mp4 files and subsequently store them in the designated Media File Section.
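
To see how a set of rules could route files into sections, the following Python sketch may help. It is illustrative only: the rule list, the function name section_for, and the sample file names are hypothetical, and two behaviors are assumed rather than documented here, namely that the first matching rule wins and that a pattern must match the whole file name. Files that match no rule fall back to Supporting Files, as the note above describes.

```python
import re

# Hypothetical rules: (regex for file type, media file section).
RULES = [
    (r".*\.mp4", "Original Content"),  # at least one such rule is mandatory
    (r".*\.vtt", "Closed Caption"),
    (r".*\.jpg", "Thumbnails"),
]


def section_for(file_name, rules=RULES, default="Supporting Files"):
    """Return the media file section a file would be stored in."""
    for pattern, section in rules:
        # Assumption: the pattern must match the entire file name,
        # and the first matching rule decides the section.
        if re.fullmatch(pattern, file_name):
            return section
    # No rule matched: the file goes to the Supporting Files section.
    return default


mapping = {name: section_for(name)
           for name in ["interview.mp4", "interview.vtt", "interview.json"]}
```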


Application of Content Ingestion

One of the standout applications of utilizing AWS S3 integration for content ingestion is the creation of a secure repository for organizational data, encompassing several types of content, such as recordings. This secure environment forms the foundation for robust redaction of Personally Identifiable Information (PII) entities. By harnessing the combined capabilities of VIDIZMO and AWS, organizations can efficiently identify and redact sensitive PII information from their multimedia assets. 


Another use case of content ingestion is that legal firms, law enforcement agencies, and courts can harness the power of VIDIZMO's S3 ingestion integration to establish an organized repository for managing an extensive range of legal documents, evidence, and case files. This innovative solution simplifies access to critical legal content while upholding rigorous standards of security and regulatory compliance. 


To learn how to ingest content from an AWS S3 bucket in VIDIZMO, refer to our article: Ingesting Content from an AWS S3 Bucket in VIDIZMO.