
How Multimodal AI Opens New Possibilities in Media Workflows

By Moments Lab Content Team
September 24, 2024

Get up to speed on generative and multimodal AI, and learn where this technology is making the biggest impact in media production workflows.

In just a few years, artificial intelligence (AI) has emerged as a transformative force in the media industry. From content acquisition and creation to metadata generation and delivery, broadcasters are exploring ways to apply AI-based systems across a wide range of tasks. But which of these applications deliver the most value and the best return on investment? Our goal is to share our knowledge and experience so that companies can better understand AI’s benefits and what integration involves. Let’s take a look at how AI is transforming media asset management (MAM) and production workflows today.

AI Gains Ground Thanks to Cloud

The democratization of AI in MAM has been driven by the shift from on-premises workflows to cloud-based ones. Cloud giants such as Amazon Web Services and Google have been integrating generic AI models into their services for years, starting with speech-to-text conversion and facial recognition. While these generic models have their shortcomings in AI pipelines, it is the transition to the cloud itself that is revolutionary: cloud platforms provide the processing power needed to run complex algorithms at scale. This, in turn, enables AI vendors to develop and offer the newest forms of generative and multimodal AI.

Demystifying the Different Types of AI

Generative AI (GenAI) is an application of machine learning that uses deep learning models to generate high-quality text, images, and other content based on their training data. It came into public focus with consumer-centric tools such as ChatGPT and Midjourney. Examples of GenAI in content creation include automated video editing, scriptwriting, and content personalization. It is safe to say that GenAI is already having a strong impact on broadcast operations; the McKinsey Global Institute even projects that by 2050, half of all knowledge tasks will be automated by GenAI. However, if not built and trained with a specific industry in mind, GenAI can lack context or consistency when dealing with specialized content.

Until the recent AI boom, the most widely known and used type of AI system was unimodal AI. Unimodal AI processes only one type of data (or modality), such as text or images, and generates content in that same form. Multimodal AI, on the other hand, is designed to mimic human perception: just as humans take in information through multiple senses, such as sight, hearing, and touch, multimodal AI can ingest and process multiple data sources, including video, still images, speech, sound, and text.
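To make the distinction concrete, here is a minimal, hypothetical sketch of a multimodal ingestion step: each modality is encoded separately, then the results are fused into a single representation. The encoder functions are illustrative stand-ins, not Moments Lab’s MXT implementation, and real systems typically learn the fusion rather than simply concatenating vectors.

    # Conceptual sketch only: hypothetical encoders, not a real model.
    import numpy as np

    def encode_frames(frames):
        # Stand-in for a vision encoder (e.g. a vision transformer).
        return np.random.rand(512)

    def encode_audio(audio):
        # Stand-in for an audio encoder.
        return np.random.rand(256)

    def encode_text(transcript):
        # Stand-in for a text encoder applied to speech-to-text output.
        return np.random.rand(384)

    def unimodal_embedding(transcript):
        # A unimodal system sees only one modality, e.g. the transcript.
        return encode_text(transcript)

    def multimodal_embedding(frames, audio, transcript):
        # A multimodal system encodes every modality and fuses the results.
        # Concatenation is the simplest fusion strategy; production systems
        # usually learn the fusion jointly.
        return np.concatenate([
            encode_frames(frames),
            encode_audio(audio),
            encode_text(transcript),
        ])

    vec = multimodal_embedding(frames=[], audio=b"", transcript="mayor press conference")
    print(vec.shape)  # (1152,) = 512 + 256 + 384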

Researched by tech giants like Google and Microsoft, multimodal AI is now seen as the gold standard. Because it aims for a more detailed and nuanced understanding of media content, however, it is also far more complex to execute.

At Moments Lab, we were early adopters of both generative and multimodal AI. MXT, our AI engine, is trained specifically on news, entertainment, and sports data to solve one challenge: analyzing, indexing, and searching media content. And just as with human perception, context is everything for AI.

Applications of AI in Broadcast Workflows

AI offers great opportunities for improving efficiency and productivity in broadcast workflows, for example:

  • Accelerated Production: AI excels in organizing decades of archived material, creating actionable datasets, and improving searchability, which in turn increases news production efficiency.
  • Enhanced Content Analysis: Specialized multimodal AI engines provide superior content analysis solutions, ensuring true context and consistency in media asset indexing.
  • Detailed Indexing: AI can generate human-like descriptions of videos, complete with scene-by-scene timecode metadata. MXT-1.5 even detects and highlights the most compelling sound bites in a video and automatically breaks down footage into editorial sequences.
  • Enhanced Semantic Search: Well-indexed content enables a semantic search engine, making content retrieval far more efficient. A specialized Media Hub system, like the one by Moments Lab, lets users find the specific shots they’re looking for in seconds (a minimal sketch of this approach follows this list).
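As an illustration of the last two points, here is a minimal sketch of semantic search over timecoded segment descriptions, the kind of metadata a multimodal indexing engine produces. The embed() function and the segment data are invented stand-ins so the example runs on its own; a real deployment would use an actual embedding model.

    # Hypothetical sketch: the hash-based embed() below has no real semantics
    # and exists only so the example runs end to end; swap in a genuine
    # embedding model for meaningful results.
    import hashlib
    import numpy as np

    def embed(text):
        # Deterministic pseudo-embedding seeded from the text's hash.
        seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
        v = np.random.default_rng(seed).random(256)
        return v / np.linalg.norm(v)

    # Invented examples of timecoded, scene-level descriptions.
    segments = [
        {"timecode": "00:01:12", "description": "reporter interviews mayor outside city hall"},
        {"timecode": "00:04:37", "description": "aerial shot of flooded streets at dusk"},
        {"timecode": "00:09:05", "description": "crowd celebrates a goal in a packed stadium"},
    ]

    def semantic_search(query, top_k=2):
        # Rank segments by cosine similarity between the query embedding and
        # each description embedding (unit vectors, so a dot product suffices).
        q = embed(query)
        scored = [(float(q @ embed(s["description"])), s) for s in segments]
        return sorted(scored, key=lambda t: t[0], reverse=True)[:top_k]

    for score, seg in semantic_search("fans cheering at a sports match"):
        print(f"{seg['timecode']}  {seg['description']}  (score={score:.2f})")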

French broadcaster TF1 Group, a Moments Lab Research partner, uses the Cloud Media Hub and Live Asset Manager products in its operations:

"Moments Lab’s AI models provide us with an efficient way to index content, generate tags, and create content summaries. This opens up opportunities to address important use cases for us, including the ability to improve searchability within our archives, especially for news content, allowing our journalists to increase the efficiency of news production."

Olivier Penin, Director of Innovation.

As AI continues to evolve, its applications in the broadcast industry will only expand, presenting new opportunities for innovation and growth. Despite the immense potential to transform content workflows, implementing AI demands a significant investment of both time and money. To ensure successful AI integration, media companies need to establish clear business requirements and set measurable objectives – a topic we will address in our next article.  
