Microsoft AI Research introduces SIGMA: an open source research platform that enables research and innovation at the intersection of mixed reality and AI

Recent breakthroughs in generative AI, including large language, vision, and multimodal models, provide a foundation for open-ended knowledge, reasoning, and generation capabilities that enable new task-assistance scenarios. The ability to generate relevant instructions and content, however, is only the beginning of what is needed to build AI systems that collaborate with humans in the real world, such as mixed-reality task assistants, interactive robots, intelligent manufacturing floors, and autonomous vehicles.

Artificial intelligence systems must continuously perceive and reason about their environment in a multimodal manner in order to work seamlessly with humans in the real world. This requirement goes well beyond object detection and tracking. For physical teamwork to succeed, each participant must understand the possible functions of surrounding objects, their relationships to one another, their spatial constraints, and how all of these factors change over time.

These systems must be able to reason not only about the physical world but also about people. Beyond lower-level judgments about posture, voice, and actions, they must make judgments about real-time cognitive states and the social norms of collaborative behavior.

Microsoft Research introduces SIGMA, an interactive application that combines mixed-reality and AI technologies, such as large language and vision models, to guide users through procedural tasks on HoloLens 2. Task steps can be defined manually in a task library or generated dynamically by a large language model such as GPT-4. When a user asks SIGMA an open-ended question during an interaction, the system can draw on the language model to answer it. SIGMA can also locate and highlight task-relevant objects in the user's field of view using vision models such as Detic and SEEM.
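The task-guidance behavior described above can be sketched as a simple dispatch: prefer manually authored steps from a task library, and fall back to a language model when none exist. This is a minimal illustration only; the class and method names below are hypothetical and do not reflect SIGMA's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskAssistant:
    """Hypothetical sketch of SIGMA-style task guidance (not the real API)."""

    # Manually defined steps, as in a task library
    task_library: dict = field(default_factory=dict)

    def get_steps(self, task: str) -> list:
        # Prefer manually authored steps; otherwise generate them dynamically
        if task in self.task_library:
            return self.task_library[task]
        return self.generate_steps_with_llm(task)

    def generate_steps_with_llm(self, task: str) -> list:
        # Placeholder for a call to a large language model such as GPT-4
        return [f"(LLM-generated) Step 1 of '{task}'"]


assistant = TaskAssistant(
    task_library={"make tea": ["Boil water", "Steep tea bag", "Pour and serve"]}
)
steps = assistant.get_steps("make tea")
fallback = assistant.get_steps("assemble shelf")
```

In a real system, the fallback branch would prompt the model to decompose the task into steps; here it simply returns a placeholder to keep the sketch self-contained.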

Several design decisions support these research goals. The system is implemented with a client-server architecture: a lightweight client application running on the HoloLens 2 device streams multiple multimodal data streams, including RGB, depth, audio, and head-, hand-, and eye-tracking information, to a more powerful desktop server. The server runs the application's core logic and sends content and instructions back to the client for display on the device. This design sidesteps the headset's current computational limitations and opens the door to extending the application to other mixed-reality devices.
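To make the multiplexed-streams idea concrete, here is a minimal sketch of how frames from different streams might be framed on the wire, assuming a simple length-prefixed JSON header followed by a raw payload. This encoding is an illustrative assumption, not the protocol the actual client uses.

```python
import json
import struct


def pack_frame(stream_name: str, timestamp_ms: int, payload: bytes) -> bytes:
    """Length-prefix one sample from one named stream (e.g. 'rgb', 'depth')."""
    header = json.dumps(
        {"stream": stream_name, "ts": timestamp_ms, "size": len(payload)}
    ).encode()
    # 4-byte big-endian header length, then the JSON header, then the raw payload
    return struct.pack(">I", len(header)) + header + payload


def unpack_frame(buf: bytes):
    """Invert pack_frame: return (header dict, payload bytes)."""
    (hlen,) = struct.unpack_from(">I", buf, 0)
    header = json.loads(buf[4 : 4 + hlen])
    payload = buf[4 + hlen : 4 + hlen + header["size"]]
    return header, payload


frame = pack_frame("rgb", 1234, b"\x00\x01\x02")
header, payload = unpack_frame(frame)
```

A scheme like this lets heterogeneous streams (video frames, audio buffers, pose samples) share one connection, with the server demultiplexing by the `stream` field.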

SIGMA is built on the open-source Platform for Situated Intelligence (\psi) framework, which supports the development and study of multimodal, integrative-AI systems. The underlying \psi framework provides a performant streaming and logging infrastructure that enables rapid prototyping, and its data replay infrastructure enables data-driven development and optimization at the application level. Finally, Platform for Situated Intelligence Studio provides extensive support for visualization, debugging, and optimization.
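The \psi framework itself is a .NET library, so the snippet below is a purely conceptual Python illustration of the timestamped stream-fusion idea at its core: pairing each sample in one stream with the nearest-in-time sample in another, within a tolerance. Function and parameter names are hypothetical and do not correspond to the \psi API.

```python
def fuse_nearest(stream_a, stream_b, tolerance_ms=50):
    """Pair each sample in stream_a with the nearest-in-time sample in stream_b.

    Streams are lists of (timestamp_ms, value) tuples. This is a conceptual
    stand-in for \psi-style stream fusion, not the actual framework API.
    """
    fused = []
    for ts_a, val_a in stream_a:
        # Find the stream_b sample closest in time to this stream_a sample
        nearest = min(stream_b, key=lambda s: abs(s[0] - ts_a), default=None)
        if nearest is not None and abs(nearest[0] - ts_a) <= tolerance_ms:
            fused.append((ts_a, val_a, nearest[1]))
    return fused


audio = [(0, "a0"), (100, "a1"), (200, "a2")]
video = [(10, "v0"), (95, "v1"), (400, "v2")]
pairs = fuse_nearest(audio, video)
```

Note that the audio sample at t=200 is dropped because no video sample falls within the 50 ms tolerance; handling such alignment and drop decisions declaratively is part of what a streaming framework provides.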

Although SIGMA’s current functionality is relatively simple, it provides a foundation for future research at the intersection of mixed reality and artificial intelligence. Many research problems, particularly in perception, ranging from computer vision to speech recognition, can be and have been studied using data sets collected with such systems.

SIGMA is a research platform, and it is representative of Microsoft’s continued efforts to explore novel artificial intelligence and mixed-reality technologies. Microsoft also offers Dynamics 365 Guides, an enterprise-grade mixed-reality solution for frontline workers. With Copilot in Dynamics 365 Guides, which customers are currently using in private preview, AI and mixed reality combine to give frontline workers step-by-step procedural support and relevant workflow information. Dynamics 365 Guides is a feature-rich tool designed for frontline workers performing complex operations.

By releasing the system, the researchers hope to ease other researchers’ burdens associated with the basic technical tasks of building a full-stack interactive application, allowing them to move directly to the exciting new frontiers of their field.


Dhanshree Shenwai is a computer science engineer with experience at FinTech companies spanning finance, cards & payments, and banking, and a keen interest in applications of AI. She is passionate about exploring new technologies and advancements in today’s evolving world to make everyone’s lives easier.
