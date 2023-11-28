

Spending late nights with a newborn baby may yield unexpected successes. Such was the case for OthersideAI developer Josh Bickett, who came up with the idea for a groundbreaking new “self-operating computer framework” while feeding his daughter in the middle of the night.

As Bickett explained to VentureBeat, “I’m really enjoying the time with my daughter, who is now four weeks old, and I’ve learned a lot of new lessons about fatherhood and everything else. But I also had some time, and the idea came to me because I’d been watching various demos of GPT-4 Vision. The thing we’re working on could actually happen now with GPT-4 Vision.”

Holding his daughter in one hand, Bickett created the basic outline on his computer. “I just got the initial implementation… it’s not very good at getting the mouse to click in the right place. But what we are doing is defining the problem: we have to figure out how to operate the computer.”

When Matt Shumer, co-founder and CEO of OthersideAI, saw the new framework, he recognized its tremendous potential. As Shumer told VentureBeat, “This is a milestone on the way to becoming the equivalent of a self-driving car, but for computers. Now we have sensors. We have LIDAR systems. Next, we build intelligence.”

An AI that decides where and what to click on your PC

As Bickett described it, the framework “lets the AI control the mouse, where it clicks, and essentially all the keyboard input. It’s an agent like AutoGPT, except it’s not text-based. It’s based on vision, so it takes a screenshot of the computer and then makes decisions about mouse clicks and keyboard input, just like a human does.”

Shumer detailed how this framework represents a major advance over previous approaches that relied solely on APIs.

“A lot of the things that people do on computers, well, you can’t really do with APIs,” he said. “That’s why, when a lot of other people tackle this problem [and] want to make an agent, they build it on top of the publicly available API for a given service, but that doesn’t extend to everything.” As Shumer stressed, “If you really want to solve for something that’s autonomous [and] can really help us get more work done, you have to let it work like a person, because the world is made for people.”

The framework takes screenshots as input and outputs mouse clicks and keyboard commands, just like a human does. But as both Bickett and Shumer acknowledged, the real potential lies not in the lightweight framework itself, but in the advanced computer vision and reasoning models that can be plugged into it. “The framework will be exactly plug and play: you just plug in a better model and it gets better,” Bickett said.
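The screenshot-in, clicks-and-keystrokes-out loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the Self-Operating Computer project: the names `VisionModel`, `Computer`, `run_agent`, `Click`, `TypeText` and `Done` are assumptions made for this sketch, and the `FakeComputer` and `ScriptedModel` classes stand in for a real screen-capture/input library and a vision model such as GPT-4 Vision.

```python
from dataclasses import dataclass
from typing import List, Protocol, Union


# Hypothetical action types the model can emit (not the project's real schema).
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Done:
    summary: str


Action = Union[Click, TypeText, Done]


class VisionModel(Protocol):
    """Any model that maps (objective, screenshot) to the next action."""

    def next_action(self, objective: str, screenshot: bytes) -> Action: ...


class Computer(Protocol):
    """The OS-facing side: capture the screen and replay input events."""

    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


def run_agent(model: VisionModel, computer: Computer, objective: str,
              max_steps: int = 10) -> List[Action]:
    """Observe-decide-act loop: take a screenshot, ask the model, replay its action."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = computer.screenshot()
        action = model.next_action(objective, shot)
        history.append(action)
        if isinstance(action, Done):
            break
        if isinstance(action, Click):
            computer.click(action.x, action.y)
        elif isinstance(action, TypeText):
            computer.type_text(action.text)
    return history


# Stand-ins so the sketch runs without a display or a real model (hypothetical).
class FakeComputer:
    def __init__(self) -> None:
        self.events: list = []

    def screenshot(self) -> bytes:
        return b"fake-png"  # placeholder for real screen-capture bytes

    def click(self, x: int, y: int) -> None:
        self.events.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.events.append(("type", text))


class ScriptedModel:
    """Replays a fixed list of actions, then reports Done."""

    def __init__(self, actions: List[Action]) -> None:
        self._actions = iter(actions)

    def next_action(self, objective: str, screenshot: bytes) -> Action:
        return next(self._actions, Done("script exhausted"))


computer = FakeComputer()
model = ScriptedModel([Click(120, 40), TypeText("hello"), Done("typed greeting")])
history = run_agent(model, computer, "Open the editor and type hello")
print(computer.events)  # [('click', 120, 40), ('type', 'hello')]
```

Because `run_agent` depends only on the `VisionModel` interface, swapping in a stronger model requires no change to the loop itself, which is one way to read the “plug and play” property Bickett describes.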

How AI agents will change computing as we know it

Asked by VentureBeat about the future implications, Shumer offered a bold vision: “Once this thing gets reliable enough, it’s going to be your computer. It’s going to be your interface to the digital world.”

The Self-Operating Computer framework points toward a future in which advanced AI models could handle all computer interactions through conversational commands alone.

As Shumer predicted, a variety of specialized computer-agent models are likely to emerge to handle different tasks.

Some models may focus on speed for simple tasks, while others excel at complex reasoning. Models may also differ between enterprise and consumer use cases. But according to Shumer, the broader goal is to develop agents that enable a world “where people can say, ‘I hate doing this. Now I don’t have to do this anymore.’ And we want to make it so easy that anyone who can barely use a computer can do it.”

Open source to promote development

Bickett believes the open-source nature of the framework will further accelerate progress, allowing developers around the world to experiment with new applications. Shumer agreed that “there is room for a lot of players in this space… a range of model providers, a range of applications. And there’s going to be a lot of room in this industry to build really big businesses.”

While Bickett and Shumer see immense potential, realizing the vision of truly intelligent computer agents will require substantial resources and continued innovation.

To that end, AI research company Imbue, formerly known as Generally Intelligent, recently secured a $150 million partnership with Dell to build a powerful AI training platform.

The massive cluster of approximately 10,000 Nvidia H100 GPUs will allow Imbue to develop new foundation models optimized specifically for reasoning, the main focus of its work. As Kanjun Qiu, co-founder and CEO of Imbue, put it, “Reasoning is the main barrier to agents that actually work well.”

Imbue believes that strong reasoning is paramount to developing truly effective AI agents, because it allows machines to deal with uncertainty, adopt different perspectives, gather new information, make complex decisions, and grapple with the complexities of the real world: critical capabilities for functioning autonomously beyond narrow tasks.

The company takes a “full stack” approach encompassing optimized foundation model training, experimental agent and interface prototyping, robust tool-building, and theoretical AI research. The aim is to build both a practical and a fundamental understanding of deep learning, with the goal of engineering AI systems capable of human-level reasoning and, eventually, artificial general intelligence.

While the self-operating computer framework is only a first step, Bickett and Shumer see it ushering in a new era in which sophisticated AI agents completely replace today’s human-driven computing interfaces. Paradigm-changing ideas may keep arriving late at night, but realizing the full vision of computers that work for anyone, anywhere, through plain language alone will require focused effort.

