As a leading AI and video search company, we are often asked how our solution compares with popular AI models. By benchmarking our technology against the latest systems, we can better understand its strengths, identify areas for improvement, and shape our future developments.
Over the past few months, our research team rigorously tested MXT-1.5, our multimodal AI model, using the VideoMME dataset, a respected benchmark in the video AI community. After thorough evaluation, we’re excited to share some compelling results.
MXT-1.5 was designed to solve complex video understanding problems. It helps content creators quickly find specific moments in vast audiovisual libraries, enabling them to create more content faster.
What makes MXT-1.5 stand out is its unique approach: instead of relying on a single system, it combines multiple expert models, most of which are non-generative, organized within a three-level hierarchical indexing framework.
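To make the idea of a multi-level video index concrete, here is a minimal, purely illustrative sketch in Python. The level names (video, segment, moment) and the `find` method are assumptions for illustration only; the post does not describe the actual levels or APIs of MXT-1.5's index.

```python
from dataclasses import dataclass, field

@dataclass
class Moment:
    """Finest-grained level: a labeled time span within a segment."""
    start_s: float
    end_s: float
    labels: set = field(default_factory=set)

@dataclass
class Segment:
    """Middle level: a shot or scene grouping related moments."""
    moments: list = field(default_factory=list)

@dataclass
class VideoIndex:
    """Top level: one video, holding its segments."""
    title: str
    segments: list = field(default_factory=list)

    def find(self, label: str) -> list:
        """Walk all three levels and return moments carrying `label`."""
        return [m for seg in self.segments
                for m in seg.moments if label in m.labels]

# Usage: index a short clip, then retrieve a moment by label.
video = VideoIndex("match_highlights")
video.segments.append(Segment([Moment(0.0, 4.5, {"goal", "crowd"})]))
video.segments.append(Segment([Moment(4.5, 9.0, {"replay"})]))
hits = video.find("goal")
```

The point of the hierarchy is that a query only has to touch the levels it needs: a label lookup drills from video to segment to moment, rather than scanning a flat transcript.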
This structure, while different from what other AI providers use, means MXT-1.5 can analyze even the most complex video content with exceptional accuracy.
The benchmark results were striking. MXT-1.5 performed well overall and even outperformed major models such as GPT-4o, Google Gemini 1.5 Pro, and Nvidia VILA 1.5. It did especially well on long-form videos (30 minutes or more), a known challenge for AI systems.
“This evaluation confirms that combining generative models with expert AI systems creates a more robust technology. Our approach not only delivers more detailed results, but also improves explainability.”
Dr. Yannis Tevissen, Head of Science, Moments Lab.
These results confirm our belief that top-tier video understanding requires a combination of specialized models, particularly non-generative ones. MXT-1.5's three-level hierarchical indexing is a key advantage, enabling it to outperform leading GenAI models, especially on the critical task of processing long-form videos. The approach proved particularly effective in categories such as sports and television, reflecting both our industry-specific AI training and the impact of recent improvements such as sequence generation, and solidifying our leadership in these areas.
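Comparing performance across categories and video lengths, as described above, amounts to bucketing per-question results and computing accuracy per bucket. This is a generic evaluation sketch, not Moments Lab's actual harness; the bucket names are placeholders.

```python
from collections import defaultdict

def accuracy_by_bucket(results):
    """results: iterable of (bucket, correct) pairs, where `bucket` is a
    category or duration class and `correct` is a bool.
    Returns {bucket: fraction of correct answers}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bucket, correct in results:
        totals[bucket] += 1
        hits[bucket] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}

# Usage with made-up data: two buckets, as in a short- vs. long-form split.
sample = [("short", True), ("short", False),
          ("long", True), ("long", True)]
scores = accuracy_by_bucket(sample)
```

Slicing accuracy this way is what lets a benchmark like VideoMME surface duration-specific strengths instead of a single aggregate number.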
We’re naturally thrilled with MXT-1.5’s performance, but this is just the beginning. Our team is already diving deeper into the results and will run similar evaluations of our semantic search engine against other open-source search models.
The work our team has put into MXT-1.5’s capabilities serves as a firm foundation for what’s to come. We’re working on a groundbreaking new tool that will help content producers build rough cuts even faster.
Dive into more details about our MXT-1.5 benchmarking here.