Following the Azure Video Indexer Insights I was previously able to produce from a podcast on a weekly schedule, I decided to go further into how this Insights JSON data can be put to good use beyond my previous Summary (discussed here), such as using Azure Media Services to clip certain parts of the audio.
With the Insights JSON data we can do things such as:
Organise the entire podcast into topics (in this case, discussions of abstract topics and discussions about people) 📚
Clip these discovered topics into separate topic clips by where they appear in the podcast timeline 📎
Merge (stitch together) clips of the same topic into individual files 🔗
Listen to the separate merged topic audio files externally (discussed in the next blog post, with a Google Action on a Google Nest Hub) 📡🎵
The first 3 points can be visualised as follows:
In this post, I talk about how the first 3 points above can be achieved, given we already have the Insights JSON, using Azure Media Services, an Event Triggered Azure Function, an Azure Storage Table and Azure Blob Storage.
This post will be useful in cases where we don't wish to rely on the Video Indexer Insights Widget and instead want to extract and clip media programmatically.
Organising the podcast topics
After retrieving the index result JSON (either using the Video Indexer APIs or downloading the insights from our Video Indexer account), we can pull the discovered topics out of the 'summarizedInsights' object and filter them based on our needs:
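The original embedded code isn't reproduced here, but a minimal sketch of that filtering might look like the following. It assumes the 'summarizedInsights' layout with 'topics' and 'namedPeople' arrays, each carrying 'confidence', 'referenceId' and 'appearances' with 'startSeconds'/'endSeconds' fields — adjust the field names to your actual payload:

```python
def extract_clips(insights: dict) -> dict:
    """Map each chosen topic/person name to a list of (start, end) second pairs."""
    clips = {}
    summary = insights["summarizedInsights"]

    # Topics: keep appearances where the Indexer is at least 60% confident
    # and the occurrence lasts more than 15 seconds.
    for topic in summary.get("topics", []):
        if topic.get("confidence", 0) < 0.60:
            continue
        for appearance in topic.get("appearances", []):
            start = appearance["startSeconds"]
            end = appearance["endSeconds"]
            if end - start > 15:
                clips.setdefault(topic["name"], []).append((start, end))

    # People: at least 85% confidence and an external reference (a notable
    # person). Pad 10s before and 50s after to catch the host's lead-in and
    # wrap-up, and keep at most 3 occurrences per person.
    for person in summary.get("namedPeople", []):
        if person.get("confidence", 0) < 0.85 or not person.get("referenceId"):
            continue
        for appearance in person.get("appearances", [])[:3]:
            start = max(0, appearance["startSeconds"] - 10)
            end = appearance["endSeconds"] + 50
            clips.setdefault(person["name"], []).append((start, end))

    return clips
```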
From the above code, we do the following:
->Only collect 'Topics' mentioned in the audio where the Indexer has at least 60 percent confidence that it heard that particular topic being talked about, and only where the occurrence of that topic lasts more than 15 seconds.
->Only collect audio sections where 'People' of interest are talked about, where the Indexer is at least 85 percent confident and the person has an external Reference (i.e. a notable person). For these sections, add an extra buffer of 10 seconds before where the Indexer identifies the Person and 50 seconds after where the Indexer marks the end of that discussion. This buffer helps us pick up extra audio from the podcast host in case the Indexer was too narrow in selecting where a discussion about a person starts and ends.
->Only add up to 3 occurrences of discussions about a specific person to the collection, with the desired Start and End times.
Clip and merge similar topics into individual files
With the list of chosen result Topics and their known occurrence timestamps, we can add each occurrence into a JobInputSequence's list of Inputs. Each JobInputSequence acts as the JobInput for each topic we want to generate with Azure Media Services:
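As a sketch of what each per-topic job input boils down to, here is the REST-API JSON shape of a JobInputSequence built from the chosen occurrences (the asset name and times are illustrative, and a real job would submit this body through the AMS client rather than as a raw dict):

```python
def iso_duration(seconds: float) -> str:
    """Format seconds as an ISO-8601 duration string, e.g. 90 -> 'PT90S'."""
    return f"PT{seconds:g}S"

def build_job_input_sequence(asset_name: str, occurrences: list) -> dict:
    """One JobInputSequence per topic: each occurrence becomes a clipped
    JobInputAsset, so AMS stitches them into a single output."""
    return {
        "@odata.type": "#Microsoft.Media.JobInputSequence",
        "inputs": [
            {
                "@odata.type": "#Microsoft.Media.JobInputAsset",
                "assetName": asset_name,
                "start": {"@odata.type": "#Microsoft.Media.AbsoluteClipTime",
                          "time": iso_duration(start)},
                "end": {"@odata.type": "#Microsoft.Media.AbsoluteClipTime",
                        "time": iso_duration(end)},
            }
            for start, end in occurrences
        ],
    }
```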
The MediaServiceControl class defined below is a custom helper class I made to wrap the code that creates a Media Services Client and the Job Transform. The Transform used is a built-in preset that creates good quality AAC audio at 192kbps. Note that the AACGoodQualityAudio preset in fact creates an mp4 file with no video track, containing audio only:
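The MediaServiceControl helper itself isn't reproduced here, but the Transform it creates reduces to a single built-in preset. A sketch of the REST request body for such a Transform (the description text is illustrative) could be:

```python
def aac_transform_body() -> dict:
    """REST body for a Transform using the built-in AACGoodQualityAudio
    preset, which outputs an audio-only mp4 with 192kbps AAC."""
    return {
        "properties": {
            "description": "Audio-only topic clip encode",
            "outputs": [{
                "preset": {
                    "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
                    "presetName": "AACGoodQualityAudio",
                }
            }],
        }
    }
```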
Ultimately, as each AMS JobOutput is encoded, our Azure Media Services instance writes the output to a container in blob storage that contains the new audio file. The new audio file is not publicly accessible by default, which is the problem we solve next.
By default, the output blobs made by Azure Media Services are private. To read them externally, Shared Access Signatures can be used.
Consume the topic audio files externally
When the output files are created (i.e. when AMS finishes each encode output), we need to store a usable public URL for each topic that our consumer/client will be able to use to listen to the audio. In my case, my Azure Media Services instance is wired in Azure to send an Event to a separate Azure Function (shown below as the Function 'WriteSasUrlToStorage') whenever an encode is finished. (I wrote about how this wiring can be done from the Azure Portal here, under the Azure Media Services instance's Event settings.)
Each finished encode output triggers one Event, and each encode output represents one topic/person of interest. The Azure Function writes the resulting topic name and a publicly accessible blob Url to the audio, as a Shared Access Signature Url with Read permissions for 8 hours, into an existing empty Azure Storage Table. This Event Triggered Function looks like this:
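The core of such a function, sketched here in Python rather than the original, is just turning the Event Grid event into a table entity. This assumes the Microsoft.Media.JobFinished event schema, where the job's outputs carry the asset names, and that the topic name is encoded in the output asset's name; 'make_sas_url' is a hypothetical stand-in for whatever generates the read-only SAS URL (e.g. azure-storage-blob's generate_blob_sas):

```python
from datetime import datetime, timedelta, timezone

SAS_TTL = timedelta(hours=8)  # SAS Urls stay readable for 8 hours

def entity_from_job_event(event: dict, make_sas_url) -> dict:
    """Turn a Microsoft.Media.JobFinished Event Grid event into an
    Azure Storage Table entity holding the topic name and its SAS Url."""
    # First (and in this design, only) output of the finished job.
    output = event["data"]["outputs"][0]
    asset_name = output["assetName"]
    expiry = datetime.now(timezone.utc) + SAS_TTL
    return {
        "PartitionKey": "topics",
        "RowKey": asset_name,
        "TopicName": asset_name,
        "SasUrl": make_sas_url(asset_name, expiry),
        "ExpiresAt": expiry.isoformat(),
    }
```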
Now that we have a place to store the Urls of our newly clipped audio files, we are able to listen to them from different places (for up to 8 hours after they are made). But first we need a way for our consumers to request the Urls available at that time. This can be done by an HTTP Triggered Function that reads all the records in the Azure Storage Table to which we wrote the SAS Urls and returns them to the requester:
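The response-shaping part of that HTTP trigger can be sketched as follows, where 'entities' stands in for the rows read from the Storage Table (e.g. via a table client's query) and the 'TopicName'/'SasUrl' column names are assumptions matching the write side:

```python
import json

def clips_response(entities: list) -> str:
    """Shape the stored table rows into the JSON body returned to the
    requester: one {topic, url} object per clipped audio file."""
    payload = [
        {"topic": e["TopicName"], "url": e["SasUrl"]}
        for e in entities
    ]
    return json.dumps(payload)
```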
Final note – it might also be useful to find a way to clear out or delete the records in the Azure Storage Table as the clips' SAS Urls expire (not covered in this post).
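One possible shape for that cleanup, assuming each row stores its expiry as an ISO-8601 'ExpiresAt' column, is a small helper that picks out the lapsed rows ready to be deleted (a timer-triggered Function could then remove them):

```python
from datetime import datetime, timezone

def expired_row_keys(entities: list, now: datetime = None) -> list:
    """Return the RowKeys of table rows whose SAS Url has expired."""
    now = now or datetime.now(timezone.utc)
    return [e["RowKey"] for e in entities
            if datetime.fromisoformat(e["ExpiresAt"]) <= now]
```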
Ok, that's a lot to cover in one post! In the next sister post, we hop over to Google Actions Builder and create an Action that can be triggered from a Google Nest Hub to call the above Azure HTTP Trigger and play back the returned audio.