What's cooking (Q2'23 edition)
Some of the things we at Dubverse are working on to enhance our AI systems, in line with our vision of reducing the language barrier in communication.
Over the last couple of months, Generative AI has taken huge strides in both development and demand. Ever since ChatGPT arrived, Generative AI has been the talk of the town. This evolution has changed the way products are built, and it has changed how we are thinking about developing our own systems as well.
At a high level, Dubverse tools allow a communication artifact, be it video, text or audio (coming soon!), to be converted from one language to another via subtitling or audio generation.
Communication, be it real-time or through videos, is multimodal, and so is the data that is created and consumed at Dubverse. Most usage of Dubverse tools has been around videos, a medium comprising (audio) + (text/content) + (vision/video), and making all these modalities sing together in a new language is a hard task.
Let’s take converting a video from one language to another as an example. A short sentence in English might need many more words in French to convey the same message. How do we fit the audio generated from this French text into the time span of the English sentence in the video? Or how do we align subtitles with visual cues, e.g. in e-learning video classes where a blackboard is used? Another problem, and this one is a bit obscure: language is a unique and very diverse method of communication. A sentence can be translated in multiple ways, every word has synonyms, and every word carries a unique imprint/representation for every person. Simply translating a sentence from one assumed perspective might not yield the best result.
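To make the timing problem concrete, here is a minimal sketch in Python of one possible check: given the duration of the source sentence and the duration of the generated audio, work out how much the generated speech would need to be sped up, and whether that stays within a tolerable limit. The function name `fit_speed_factor` and the 1.25x cap are assumptions for illustration, not a description of how Dubverse does it.

```python
def fit_speed_factor(source_duration_s, generated_duration_s, max_speedup=1.25):
    """Return (speed_factor, fits) for squeezing generated audio into a slot.

    speed_factor is the playback-rate multiplier to apply to the generated
    audio; fits is False when even the capped speed-up leaves it too long.
    The 1.25 cap is an illustrative assumption, not a measured threshold.
    """
    if source_duration_s <= 0 or generated_duration_s <= 0:
        raise ValueError("durations must be positive")

    required = generated_duration_s / source_duration_s
    if required <= 1.0:
        # Generated audio is already short enough; leave it untouched.
        return 1.0, True
    if required <= max_speedup:
        # A modest speed-up is enough to fit the slot.
        return required, True
    # Too long even at the cap: the text itself likely needs to be shortened.
    return max_speedup, False


# A 2.0 s English sentence whose French rendering takes 3.0 s to speak.
factor, fits = fit_speed_factor(2.0, 3.0)
print(factor, fits)
```

When `fits` comes back `False`, speeding up the audio alone is not enough, which is where regenerating a shorter translation becomes the more natural option.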
With the emergence of Generative AI techniques, we can possibly solve some of these issues. For example, to use some English words in a Hindi sentence (Hinglish) so that it fits the audio from a German source, we might just have to change the text-generation prompt based on the constraints. If someone prefers to consume the video in a voice with lower intensity or a longer utterance duration, a simple scroll bar should enable it, or even better, a single text request to the system should generate the right voice. All this becomes possible if the models offer a human interface, which ChatGPT has showcased so splendidly.
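As a rough illustration of "changing the prompt based on the constraints", the sketch below assembles a text-generation prompt from a few dubbing constraints. Everything here is hypothetical: the `DubbingConstraints` fields, the `build_prompt` helper and the prompt wording are assumptions made for the example, and no particular model or API is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DubbingConstraints:
    """Hypothetical constraints for one sentence to be dubbed."""
    target_language: str
    max_words: Optional[int] = None      # rough proxy for the time budget
    allow_english_words: bool = False    # e.g. Hinglish instead of pure Hindi


def build_prompt(source_text: str, constraints: DubbingConstraints) -> str:
    """Turn the constraints into plain-language instructions for a text model."""
    lines = [f"Translate the following sentence into {constraints.target_language}."]
    if constraints.max_words is not None:
        lines.append(
            f"Use at most {constraints.max_words} words so the speech fits the original timing."
        )
    if constraints.allow_english_words:
        lines.append("You may keep common English words where they are shorter or more natural.")
    lines.append(f"Sentence: {source_text}")
    return "\n".join(lines)


# German source, Hindi target, with a tight word budget and Hinglish allowed.
prompt = build_prompt(
    "Bitte laden Sie die Datei herunter.",
    DubbingConstraints(target_language="Hindi", max_words=6, allow_english_words=True),
)
print(prompt)
```

The point of the design is that the constraints live in data rather than in the model: tightening the word budget or switching Hinglish on and off only changes the prompt text.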
We at Dubverse are working with these emerging Generative AI technologies to enable communication across languages by generating content that is immersive and personal. To achieve this, we need to cross some of the hurdles mentioned earlier.
If you have ideas for solving some of these problems, do comment or share. And if you want to get your hands dirty, get to the drawing board and figure out how to crack these problems, we are always up for collaborations.
Join our Discord community to get the scoop on the latest in Audio Generative AI!