Introduction

In VIDIZMO, the Spoken Personally Identifiable Information (PII) feature automatically redacts sensitive personal information spoken in audio and video media files. The feature works on batches of audio files or on video content stored in an Amazon S3 bucket, and it does so without the need for any transcoding.


When a user uploads an audio or video file, the system immediately initiates a detection process to identify any PII present within the content. Once detection completes, the redaction process begins automatically and generates the redacted media.


This capability eliminates the need for users to manually trigger the detection or redaction job after uploading or importing their media file. It provides an effortless means to initiate automated PII detection, thereby ensuring the protection of sensitive information. 


This automated process encompasses the identification and redaction of various types of PII. Users can confidently upload or import their required mashups, knowing that the system will seamlessly launch the detection or redaction process without necessitating manual intervention.


Moreover, the automated PII detection and redaction process in VIDIZMO is highly efficient, delivering timely results and minimizing potential delays in the workflow.


This article will explore how to automatically detect and redact Personally Identifiable Information (PII) from audio and video files.


Prerequisites

  • Ensure that you have administrative or manager access to your VIDIZMO Portal.
  • Confirm that your subscription package includes the redaction feature.
  • Verify that the AWS Video Indexer is properly configured and enabled in your VIDIZMO Portal.
  • Ensure that the Storage Provider in VIDIZMO is configured to use AWS for storage.
  • Ensure that you have an active AWS account.
  • Permissions for AWS Services: To generate and fetch video insights from the Amazon S3 bucket, your AWS account must have the permissions described in the configuration guide that follows.


Configuration Guide: Setting Up Required AWS Permissions

 It is essential to have an active AWS account and create an Amazon S3 bucket. Additionally, a specific set of permissions must be configured within your AWS account. This guide will walk you through the necessary steps to get everything ready.


 Creating an AWS Account and Obtaining Access Keys

  1. If you don't already have an AWS account, create an AWS account by visiting AWS Signup. Follow the registration process, providing the required information.
  2. Once your AWS account is set up, access your Access Key and Secret Key for authentication by following these steps:
  • Log in to the AWS Management Console.
  • Navigate to the IAM (Identity and Access Management) dashboard.
  • Select Users from the left navigation pane.
  • Choose the user intended for this service or create a new one if necessary.
  • Access the Security credentials tab.
  • If you don't already have an Access Key, click Create access key under the Access keys section.
  • Ensure you save your Access Key and Secret Key securely, as they are required later during the AWS Indexer configuration within the VIDIZMO portal.


Kindly refer to the "AWS Identity and Access Management" guide to access more details.



Creating an Amazon S3 Bucket

To create an Amazon S3 bucket, access your AWS Management Console with the same account credentials used to obtain your Access Key and Secret Key. Then, follow this step-by-step guide, "Create an Amazon S3 Bucket."


To know more about it, kindly refer to the "Creating a Bucket" guide.


Setting Permissions in AWS Account

Certain permissions are required for this feature to function correctly. To grant these permissions to your AWS account, you have to create a set of IAM policies in JSON format. You can create these policies manually in the IAM dashboard.


Here are the JSON code snippets containing the necessary permissions. You can simply copy and paste these into your AWS account to create the required policies.


In the left navigation pane of the IAM dashboard, click "Policies" to access the Policies dashboard. Create the policies listed in this guide and allocate them to the designated user.


Rekognition

Allocate this policy to the specified user. Replace 'Account_ID' with your actual AWS account ID.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "rekognition:ListDatasetEntries",
        "rekognition:ListUsers",
        "rekognition:StartFaceSearch",
        "rekognition:SearchUsers",
        "rekognition:DescribeDataset",
        "rekognition:IndexFaces",
        "rekognition:SearchUsersByImage",
        "rekognition:DescribeStreamProcessor",
        "rekognition:DetectCustomLabels",
        "rekognition:ListFaces",
        "rekognition:AssociateFaces",
        "rekognition:ListTagsForResource",
        "rekognition:DescribeCollection",
        "rekognition:CreateCollection",
        "rekognition:SearchFaces",
        "rekognition:DeleteCollection",
        "rekognition:DisassociateFaces",
        "rekognition:ListStreamProcessors",
        "rekognition:DeleteFaces",
        "rekognition:SearchFacesByImage",
        "rekognition:ListDatasetLabels"
      ],
      "Resource": [
        "arn:aws:rekognition:*:Account_ID:project/*/version/*/*",
        "arn:aws:rekognition:*:Account_ID:streamprocessor/*",
        "arn:aws:rekognition:*:Account_ID:project/*/dataset/*/*",
        "arn:aws:rekognition:*:Account_ID:collection/*"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "rekognition:ListDatasetEntries",
        "rekognition:ListUsers",
        "rekognition:SearchUsers",
        "rekognition:DescribeDataset",
        "rekognition:SearchUsersByImage",
        "rekognition:DescribeStreamProcessor",
        "rekognition:DetectCustomLabels",
        "rekognition:ListFaces",
        "rekognition:ListTagsForResource",
        "rekognition:DescribeCollection",
        "rekognition:SearchFaces",
        "rekognition:SearchFacesByImage",
        "rekognition:ListDatasetLabels"
      ],
      "Resource": [
        "arn:aws:rekognition:*:Account_ID:project/*/version/*/*",
        "arn:aws:rekognition:*:Account_ID:streamprocessor/*",
        "arn:aws:rekognition:*:Account_ID:project/*/dataset/*/*",
        "arn:aws:rekognition:*:Account_ID:collection/*"
      ]
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "rekognition:ListProjectPolicies",
        "rekognition:DescribeProjectVersions"
      ],
      "Resource": "arn:aws:rekognition:*:Account_ID:project/*/*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": [
        "rekognition:DetectLabels",
        "rekognition:ListCollections",
        "rekognition:GetCelebrityRecognition",
        "rekognition:StartTextDetection",
        "rekognition:GetPersonTracking",
        "rekognition:DetectFaces",
        "rekognition:DetectModerationLabels",
        "rekognition:GetFaceDetection",
        "rekognition:StartLabelDetection",
        "rekognition:RecognizeCelebrities",
        "rekognition:CompareFaces",
        "rekognition:DetectText",
        "rekognition:GetCelebrityInfo",
        "rekognition:GetLabelDetection",
        "rekognition:StartFaceLivenessSession",
        "rekognition:GetTextDetection",
        "rekognition:StartFaceDetection",
        "rekognition:GetContentModeration",
        "rekognition:GetFaceSearch",
        "rekognition:DetectProtectiveEquipment",
        "rekognition:DescribeProjects",
        "rekognition:GetSegmentDetection",
        "rekognition:GetFaceLivenessSessionResults",
        "rekognition:StartPersonTracking"
      ],
      "Resource": "*"
    }
  ]
}
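
The 'Account_ID' placeholder in the policy above can also be filled in programmatically before the policy is pasted into the IAM console. The following is a minimal sketch, not part of VIDIZMO or AWS tooling: the function name `fill_account_id`, the shortened template, and the account ID `123456789012` are illustrative assumptions. It substitutes the placeholder and confirms that the result still parses as JSON.

```python
import json
import re


def fill_account_id(policy_text: str, account_id: str) -> dict:
    """Replace the Account_ID placeholder and return the parsed policy.

    Hypothetical helper for illustration only.
    """
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    filled = policy_text.replace("Account_ID", account_id)
    # json.loads raises a ValueError subclass if the text is not valid JSON.
    return json.loads(filled)


# Shortened template based on the VisualEditor2 statement of the policy above.
template = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "rekognition:ListProjectPolicies",
        "rekognition:DescribeProjectVersions"
      ],
      "Resource": "arn:aws:rekognition:*:Account_ID:project/*/*"
    }
  ]
}
"""

# 123456789012 is a made-up account ID used only as an example.
policy = fill_account_id(template, "123456789012")
print(policy["Statement"][0]["Resource"])
```

The same substitution applies to every policy in this guide that contains the 'Account_ID' placeholder.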

AmazonMediaConvertAccess

Assign this policy to the specified user. Replace 'Account_ID' with your actual AWS account ID.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "mediaconvert:CreatePreset",
        "mediaconvert:ListQueues",
        "mediaconvert:DescribeEndpoints",
        "mediaconvert:DisassociateCertificate",
        "mediaconvert:CreateQueue",
        "mediaconvert:ListPresets",
        "mediaconvert:GetPolicy",
        "mediaconvert:PutPolicy",
        "mediaconvert:AssociateCertificate",
        "mediaconvert:ListJobTemplates",
        "mediaconvert:DeletePolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "mediaconvert:*",
      "Resource": [
        "arn:aws:mediaconvert:*:Account_ID:jobs/*",
        "arn:aws:mediaconvert:*:Account_ID:jobTemplates/*",
        "arn:aws:mediaconvert:*:Account_ID:presets/*",
        "arn:aws:mediaconvert:*:Account_ID:queues/*"
      ]
    }
  ]
}


AmazonS3FullAccess

AmazonS3FullAccess is an AWS-managed policy that grants full access to Amazon S3 buckets and objects. Assign this policy to both the user and the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "s3-object-lambda:*"
      ],
      "Resource": "*"
    }
  ]
}

MediaConvert

This custom policy allows the specified IAM principal to manage media-related services (MediaConvert, MediaPackage, and MediaPackage VOD) and perform Systems Manager actions. Allocate this policy to the user.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "mediaconvert:*",
        "mediapackage:*",
        "mediapackage-vod:*",
        "ssm:*"
      ],
      "Resource": "*"
    }
  ]
}
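
To confirm which AWS services a policy document covers before attaching it, the service prefixes can be read from its "Action" entries. The sketch below is an illustrative helper, not an AWS API: the name `services_granted` is an assumption, and it only inspects "Allow" statements. It is applied here to the MediaConvert custom policy above.

```python
import json


def services_granted(policy: dict) -> set:
    """Return the service prefixes (the text before ':') of allowed actions.

    Hypothetical helper for illustration only.
    """
    statements = policy["Statement"]
    # IAM permits "Statement" to be a single object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    services = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            services.add(action.split(":", 1)[0])
    return services


policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "mediaconvert:*",
        "mediapackage:*",
        "mediapackage-vod:*",
        "ssm:*"
      ],
      "Resource": "*"
    }
  ]
}
""")

print(sorted(services_granted(policy)))
# prints ['mediaconvert', 'mediapackage', 'mediapackage-vod', 'ssm']
```

The "ssm" prefix corresponds to the Systems Manager actions mentioned in the policy description.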


ComprehendFullAccess

ComprehendFullAccess is an AWS-managed policy that provides full access to Amazon Comprehend functionalities. Allocate this policy to the user. 

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "comprehend:*",
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "iam:ListRoles",
        "iam:GetRole"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

ComprehendDataAccessRolePolicy

ComprehendDataAccessRolePolicy is an AWS-managed policy specifically designed for the Amazon Comprehend service role. Allocate this policy to the user.

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:ListBucket",
      "s3:PutObject"
    ],
    "Resource": [
      "arn:aws:s3:::*Comprehend*",
      "arn:aws:s3:::*comprehend*"
    ]
  }
}
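
The two "Resource" patterns in this policy use the `*` wildcard, which in IAM resource ARNs matches any sequence of characters. In practice this means the policy only applies to S3 ARNs that contain "Comprehend" or "comprehend" somewhere in the bucket name or object key. The sketch below approximates that matching with Python's `fnmatchcase`; it is an approximation and not the IAM policy evaluator, and the bucket and key names are made-up examples.

```python
from fnmatch import fnmatchcase

# Resource patterns copied from the policy above.
PATTERNS = [
    "arn:aws:s3:::*Comprehend*",
    "arn:aws:s3:::*comprehend*",
]


def is_covered(bucket: str, key: str = "") -> bool:
    """Approximate whether an S3 bucket or object ARN matches the patterns.

    Hypothetical helper; fnmatchcase is case-sensitive and lets '*'
    match any run of characters, including '/'.
    """
    arn = f"arn:aws:s3:::{bucket}/{key}" if key else f"arn:aws:s3:::{bucket}"
    return any(fnmatchcase(arn, pattern) for pattern in PATTERNS)


# Hypothetical bucket and key names.
print(is_covered("my-comprehend-output"))                     # True
print(is_covered("media-archive", "clip.txt"))                # False
print(is_covered("media-archive", "comprehend/result.json"))  # True
```

If the bucket used for PII detection has no "comprehend" in its name or key prefix, this policy alone would not grant access to it.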

PassRole_MediaConverter

This custom policy allows the user to pass the role to AWS services during AWS indexing (Rekognition). Allocate this policy to the user. Update 'Account_ID' and 'Rolename' with your AWS account ID and the name you assigned to the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::Account_ID:role/Rolename"
    }
  ]
}

AssumeRole_MediaConverter

This policy lets the user assume the role, which grants access to the contents of a private bucket. Allocate this policy to the user. Update 'Account_ID' and 'Rolename' with your AWS account ID and the name you assigned to the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::Account_ID:role/Rolename"
    }
  ]
}
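
The PassRole_MediaConverter and AssumeRole_MediaConverter policies differ only in their action and both point at the same role ARN. As a rough sketch, both documents can be generated from the account ID and role name so the two stay consistent. The function names, the account ID `123456789012`, and the role name `ExampleMediaRole` are illustrative assumptions, and the "Sid" field is omitted for brevity.

```python
import json


def role_arn(account_id: str, role_name: str) -> str:
    """Build the IAM role ARN used as the Resource in both policies."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def role_policy(action: str, account_id: str, role_name: str) -> dict:
    """Return a single-statement policy allowing `action` on the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": action,
                "Resource": role_arn(account_id, role_name),
            }
        ],
    }


# Made-up account ID and role name, for illustration only.
pass_role = role_policy("iam:PassRole", "123456789012", "ExampleMediaRole")
assume_role = role_policy("sts:AssumeRole", "123456789012", "ExampleMediaRole")

print(json.dumps(pass_role, indent=2))
print(json.dumps(assume_role, indent=2))
```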

AmazonTranscribeFullAccess

The AWS-managed policy "AmazonTranscribeFullAccess" grants comprehensive permissions for conducting all Amazon Transcribe operations. Allocate this policy to the user. 

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "transcribe:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::*transcribe*"
      ]
    }
  ]
}

AWSElementalMediaConvertFullAccess

AWSElementalMediaConvertFullAccess is an AWS-managed policy that provides full access to all functionalities of the AWS Elemental MediaConvert service. If you prefer the AWS standard encoder, you need to include the "AWSElementalMediaConvertFullAccess" policy. Allocate this policy to the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "mediaconvert:*",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": [
            "mediaconvert.amazonaws.com"
          ]
        }
      }
    }
  ]
}


Note: Below are the additional optional policies that may be required. These policies can vary depending on your specific use case and are not mandatory.


AmazonEC2FullAccess

AmazonEC2FullAccess is an AWS-managed policy allowing comprehensive EC2 management. Attach it to IAM identities for full EC2 instance management via AWS Management Console or SDK, facilitating instance launching, configuration, and administration.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "ec2:*",
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "cloudwatch:*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "autoscaling:*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": [
            "autoscaling.amazonaws.com",
            "ec2scheduled.amazonaws.com",
            "elasticloadbalancing.amazonaws.com",
            "spot.amazonaws.com",
            "spotfleet.amazonaws.com",
            "transitgateway.amazonaws.com"
          ]
        }
      }
    }
  ]
}


AWSElementalMediaStoreFullAccess

This policy is an AWS-managed policy that provides full read and write access to all MediaStore APIs. You can attach this policy to your users, groups, and roles to enable the necessary permissions to work with AWS Elemental MediaStore.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "mediastore:*"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "true"
        }
      }
    }
  ]
}


AWSOpsWorksRegisterCLI_OnPremises

AWSOpsWorksRegisterCLI_OnPremises is an AWS-managed policy specifically designed to grant permissions for registering on-premises instances with an AWS OpsWorks stack using the AWS CLI. 

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "opsworks:AssignInstance",
        "opsworks:CreateLayer",
        "opsworks:DeregisterInstance",
        "opsworks:DescribeInstances",
        "opsworks:DescribeStackProvisioningParameters",
        "opsworks:DescribeStacks",
        "opsworks:UnassignInstance"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateGroup",
        "iam:AddUserToGroup"
      ],
      "Resource": [
        "arn:aws:iam::*:group/AWS/OpsWorks/OpsWorks-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateUser",
        "iam:CreateAccessKey"
      ],
      "Resource": [
        "arn:aws:iam::*:user/AWS/OpsWorks/OpsWorks-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:AttachUserPolicy"
      ],
      "Resource": [
        "arn:aws:iam::*:user/AWS/OpsWorks/OpsWorks-*"
      ],
      "Condition": {
        "ArnEquals": {
          "iam:PolicyARN": "arn:aws:iam::aws:policy/AWSOpsWorksInstanceRegistration"
        }
      }
    }
  ]
}


AWSOpsWorksRegisterCLI_EC2

This is an AWS-managed policy. It enables the registration of EC2 instances via the OpsWorks CLI.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "opsworks:AssignInstance",
        "opsworks:CreateLayer",
        "opsworks:DeregisterInstance",
        "opsworks:DescribeInstances",
        "opsworks:DescribeStackProvisioningParameters",
        "opsworks:DescribeStacks",
        "opsworks:UnassignInstance"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}


CloudFront

Exclude this policy when working with a private bucket; it is intended for use with public buckets.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Action": [
        "cloudfront:CreateInvalidation",
        "cloudfront:GetDistribution",
        "cloudfront:GetStreamingDistribution",
        "cloudfront:GetDistributionConfig",
        "cloudfront:GetInvalidation",
        "cloudfront:ListInvalidations",
        "cloudfront:ListStreamingDistributions",
        "cloudfront:ListDistributions",
        "cloudfront:CreateDistribution"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}


Kindly refer to the "AWS managed policies for Amazon Rekognition" guide to access more details.


 AWS Trust Policy  

Establish a trust relationship for the role with the user "IAMUsername," enabling it to assume the necessary permissions for accessing content within a private bucket. This is a mandatory step for successful configurations.

  • Within the IAM console, locate the "Roles" section in the navigation pane. From the list of roles, identify and click on the role to which you wish to assign trust permissions to the user. 
  • Under the role properties, locate the "Trust relationships" tab. 
  • Click on "Edit trust policy" to define the users that will be trusted to assume this role.
  • The trust policy document is written in JSON format. Here, you'll specify the user allowed to assume the role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "mediaconvert.amazonaws.com",
        "AWS": "arn:aws:iam::Account_ID:user/IAMUsername"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
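
Before saving the trust policy, it can help to list exactly which principals it allows to assume the role. The sketch below is an illustrative check, not an AWS API: the function name `trusted_principals` is an assumption, it only handles statements whose "Action" is the single string "sts:AssumeRole", and the account ID and user name are made-up values substituted for 'Account_ID' and 'IAMUsername'.

```python
import json


def trusted_principals(trust_policy: dict) -> list:
    """List the principals allowed to call sts:AssumeRole on the role.

    Hypothetical helper for illustration only.
    """
    found = []
    for statement in trust_policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        if statement.get("Action") != "sts:AssumeRole":
            continue
        principal = statement.get("Principal", {})
        # A bare string principal (for example "*") has no kind/value pairs.
        if isinstance(principal, str):
            found.append(principal)
            continue
        for kind, value in principal.items():
            values = [value] if isinstance(value, str) else value
            found.extend(f"{kind}:{v}" for v in values)
    return sorted(found)


# Trust policy from this section with example values filled in.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "mediaconvert.amazonaws.com",
        "AWS": "arn:aws:iam::123456789012:user/example-indexer-user"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
""")

for principal in trusted_principals(policy):
    print(principal)
```

For the policy above, the output should name both the IAM user and the MediaConvert service, which are the two principals the trust relationship is meant to cover.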


Setting Up AWS Video Indexer in VIDIZMO

Once you have completed the AWS account setup, created an S3 bucket, and granted the essential permissions, the next step is to log in to the VIDIZMO portal using administrative or manager user credentials. Follow the detailed step-by-step guide "Configuring Video Insights with Amazon Web Services (AWS)" to enable the AWS indexer within the VIDIZMO portal.


Setting Up AWS Storage Provider in VIDIZMO

Within VIDIZMO, our storage option provides various choices, including Amazon S3 Standard and S3 Intelligent-Tiering storage classes for content storage. Additionally, we leverage Amazon CloudFront to deliver content to users across geographically dispersed locations. AWS Elemental MediaConvert is used for transcoding content to ensure a seamless playback experience across multiple screens. VIDIZMO extensively uses AWS Media Services to process and deliver on-demand content in the cloud.


We recommend reading our guide, "Understanding AWS Storage and Encoding in VIDIZMO," for a more comprehensive understanding of how AWS storage and encoding work in VIDIZMO.


To access data stored in Amazon S3, you must configure AWS Encoder as a storage provider within VIDIZMO. Detailed instructions on how to set up the AWS Storage Provider in VIDIZMO are available in our guide, "How to Configure AWS Storage Provider within VIDIZMO."


Automatically Detect and Redact PII in Speech

  1. Navigate to the "Library" page within the portal.
  2. Click the Add New option, then locate and click the "Upload" option.



3. Choose the file you wish to process from your local storage and select the Open option. A notification will appear.



4. After successfully uploading the file, the system will enable playback, and any Personally Identifiable Information (PII) contained within the file will be automatically redacted.


When automatic processing is set to "On" for the AWS Video Indexer, media files are processed automatically as soon as they are uploaded. This automated process applies the PII detection and redaction settings that you have configured within the AWS indexer.


With automatic processing enabled, the system will automatically identify and redact any instances of PII within speech components extracted from audio or video files.


Detecting and Redacting Speech PII in Studio Space

Alternatively, if you require more customized control over redaction and detection options, you can perform these tasks within Studio Space. To familiarize yourself with Studio Space and its capabilities, refer to the "Step-by-Step Guide to Using Redaction Tool in VIDIZMO."


In the following section, we will explore the specific procedures for detecting and redacting speech within Studio Space, providing you with a comprehensive understanding of how to achieve precise and tailored results. 

  1. Access the overflow menu, indicated by three dots, adjacent to the audio or video file.
  2. Choose "Studio Space" or the "Redact" option from the menu.



3. The media file will be available in the Studio Space for additional steps to be taken.


Note: In this particular scenario, we are addressing the use of audio files. It is important to note that the same procedures and steps are applicable when dealing with video files as well.



4. Navigate to the "Auto Detect" option to access the list of available PII detection types. You can tailor your PII detection preferences by selecting or removing specific types.

5. Initiate the automatic detection process by clicking the "Start" button. The system identifies the audio segments that contain the PII categories chosen in the previous step.



6. The detected audio segments are displayed on the left-hand panel. If transcription functionality is enabled, the corresponding transcriptions appear on the right-hand panel.

7. Opt for the "Redact" option to activate the redaction process. The redacted file will be promptly saved in your library for future reference.



Utilizing Manual Selection and Filters


VIDIZMO lets users redact audio segments by selecting them manually or by using the transcription segments. Additionally, VIDIZMO provides a feature to filter and redact specific types of Personally Identifiable Information (PII).


  1. Select the audio segment that requires redaction, either directly or from the Transcription Segment.
  2. If you need to redact specific Personally Identifiable Information (PII) and are dealing with lengthy audio content, VIDIZMO offers a filtering option. Click on the Filter Option.
  3. Select the PII Types to Redact. Enable the checkboxes for the PII types you wish to find and redact.
  4. Based on your selected PII types, the filter will display the respective audio segments on the left pane.
  5. Once you have selected the audio segment(s) or applied filters, proceed with the redaction. Select the audio segment you want to redact.



6. Click on the Redact option.



7. The redacted file can subsequently be published and will be accessible within the library. 



Manually Redacting PII by Word

In the VIDIZMO platform, users can redact personally identifiable information (PII) by word from audio transcriptions using a straightforward process. Follow these steps:


  1. Navigate to the search bar within the Transcription Segment and enter the specific PII word you intend to redact.
  2. Locate and click on the "Redaction" button available in the transcription pane.



3. The identified PII word will be promptly redacted within the sentence.



For a comprehensive redaction of an entire sentence, adhere to these steps:

  1. Access the overflow menu and choose the sentence option to focus on the sentence segment you want to redact.
  2. Select the desired sentence segment.
  3. Initiate the redaction process by clicking the designated Redact button.


This action will effectively redact the entire sentence.



Detecting and Redacting Speech PII via Process Option

In VIDIZMO, you can submit your content for transcoding and the generation of video insights based on the configurations established by the Portal Administrator.


To initiate this process, follow these steps:


1. Click on the three dots icon to access the overflow menu associated with the specific media file.


2. From the overflow menu, select the Process option.



3. A process dialogue box will appear, presenting you with the following options:

  • Select the Transcode option by checking the corresponding checkbox to perform transcoding on the media.
  • Likewise, choose the Generate AI Insights option by checking its checkbox to generate AI-driven insights for the media content.


   Note: In our particular scenario, both options are selected, as we are focused on detecting and redacting Personally Identifiable Information (PII) from spoken content. 


When you opt to generate AI Insights, additional choices will be presented:

  • Within the "Detection Type" field, you will find a dropdown menu. Here, you can select the specific type of PII you want to detect. This menu also provides options for various detection types, including Transcription/cc, Keyword, Face recognition, Tags, and Label detection.
  • In the "Redaction Type" dropdown menu, select the type of PII you wish to redact.
  • Select Language


4. Once you have made your selections, click the "Start" button. The system will then automatically initiate the PII detection and redaction process for the chosen media file.



These steps allow you to efficiently utilize VIDIZMO's capabilities for transcoding and AI-driven insights generation while ensuring your content's security and compliance by detecting and redacting sensitive PII from spoken content.