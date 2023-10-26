

San Francisco-based Datasaur, an AI startup specializing in text and audio labeling for AI projects, today announced the launch of LLM Lab, a comprehensive one-stop shop that helps teams build and train custom large language model applications like ChatGPT.

Available for both cloud and on-premises deployment, the Lab gives enterprises a starting point to build their own in-house custom generative AI applications without worrying about the business and data privacy risks that often arise from third-party services. It also gives teams more control over their projects.

“We have created a tool that holistically addresses the most common pain points, supports rapidly evolving best practices, and applies our signature design philosophy to simplify and streamline the process. Over the past year, we have built and distributed custom models for our own internal use and for our customers, and from that experience, we were able to create a scalable, easy-to-use LLM product,” Ivan Lee, CEO and founder of Datasaur, said in a statement.

What does the Datasaur LLM Lab bring to the table?

Since its launch in 2019, Datasaur has helped enterprise teams carry out data labeling for AI and NLP by continuously developing a comprehensive data annotation platform. Now, that work is culminating in the LLM Lab.

“This tool extends beyond Datasaur’s existing offerings, which primarily focus on traditional natural language processing (NLP) methods such as entity recognition and text classification,” Lee wrote in an email to VentureBeat. “LLMs are a powerful new evolution of NLP technology and we look forward to continuing to serve as the industry’s turnkey solution for all text, document and audio-related AI applications.”

In its current form, the offering provides an all-in-one interface for handling the various aspects of building an LLM application, from internal data ingestion, data preparation, retrieval augmented generation (RAG), embedding model selection and similarity search optimization to enhancing the LLM’s responses and optimizing server costs. Lee says the entire work is built around the principles of modularity, composability, simplicity and maintainability.

“It (the approach) handles various text embeddings, vector databases and foundation models efficiently. The LLM field is constantly changing and it is important to create a technology-agnostic platform that allows users to swap in and out different technologies as they strive to develop the best possible solutions for their own use cases,” he said.

To get started with LLM Lab, users need to select the foundation model of their choice and update its associated settings/configuration (temperature, maximum length, etc.).

Supported models include Meta’s Llama 2, Falcon from Abu Dhabi’s Technology Innovation Institute and Anthropic’s Claude. Pinecone is supported for vector databases.
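
As a rough illustration of this first step, the Python sketch below bundles a model choice with its generation settings and lets the model be swapped without touching the rest. Everything here is an assumption made for illustration: the `ModelConfig` class, the `with_model` helper, the model identifier strings and the accepted temperature range are hypothetical and do not reflect Datasaur's actual interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical settings bundle; not Datasaur's actual schema."""

    model_name: str
    temperature: float = 0.7  # higher values give more varied output
    max_length: int = 512     # cap on the length of the generated text

    def __post_init__(self):
        # The 0.0-2.0 range is an assumed, commonly used convention.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.max_length <= 0:
            raise ValueError("max_length must be positive")


def with_model(config, model_name):
    """Return a copy of the config that points at a different model."""
    return replace(config, model_name=model_name)


# Example: start with one model, then swap in another with the same settings.
base = ModelConfig(model_name="llama-2", temperature=0.2, max_length=256)
alternative = with_model(base, "falcon")
```

Keeping the settings separate from the model name is one simple way to get the swap-in, swap-out behavior Lee describes.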

Next, they need to choose a prompt template, then sample and test prompts to see which works best for their needs. They can also upload documents for RAG.
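
To make the idea of a prompt template combined with RAG concrete, here is a minimal Python sketch. It is not how LLM Lab works internally: the template text, the function names and the toy word-count "embedding" are all assumptions, and a real system would use a trained embedding model and a vector database instead.

```python
import math
import re
from collections import Counter

# Hypothetical template; real templates would be tuned per use case.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Fill the template with the retrieved context and the question."""
    context = "\n".join(retrieve(question, documents, top_k))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Calling `build_prompt("How long do refunds take?", uploaded_docs)` with a list of uploaded document strings would produce a prompt whose context is the document that shares the most vocabulary with the question.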

Once the above steps are completed, they must finalize the optimal configuration for the quality/performance tradeoff and deploy the application. Later, as it gets used, they can evaluate prompt/completion pairs through rating/ranking projects and feed the pairs back into the model for fine-tuning or reinforcement learning from human feedback (RLHF).
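
One common way to prepare rated outputs for this kind of feedback loop is to turn them into "chosen versus rejected" preference pairs. The Python sketch below shows that idea under stated assumptions: the input format of `(prompt, completion, rating)` tuples and the `build_preference_pairs` name are hypothetical, and nothing here describes Datasaur's actual data format or training pipeline.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(rated):
    """Convert (prompt, completion, rating) tuples into preference pairs.

    For each prompt, every two completions with different ratings yield
    one record naming the higher-rated completion as "chosen".
    """
    by_prompt = defaultdict(list)
    for prompt, completion, rating in rated:
        by_prompt[prompt].append((completion, rating))

    pairs = []
    for prompt, items in by_prompt.items():
        for (first, first_rating), (second, second_rating) in combinations(items, 2):
            if first_rating == second_rating:
                continue  # ties carry no preference signal
            if first_rating > second_rating:
                chosen, rejected = first, second
            else:
                chosen, rejected = second, first
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

Prompts with only one rated completion produce no pairs, since there is nothing to compare them against.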

Breaking technical barriers

Although Lee did not disclose how many companies are testing the new LLM Lab, he did say that the response so far has been positive.

Michelle Handaka, founder and CEO of GLAIR.ai, one of the company’s clients, said the Lab bridges the communication gap between engineering and non-engineering teams and breaks down technical barriers in developing LLM applications, making them easier to use and improving the development process.

So far, Datasaur has helped enterprises in critical sectors such as finance, legal and healthcare transform raw unstructured data into valuable ML datasets. Some of the big names currently working with the company are Qualtrics, Ontra, Consensus, LegalTech and Von Wobeser y Sierra.

“We are well positioned to support forward-thinking industry leaders… and are on track to grow 5x revenue by 2024,” Lee stressed.

What’s next for Datasaur and its LLM Lab?

In the coming year, the company plans to build out the Lab and invest more in LLM development at the enterprise level.

Users of the product will be able to save their most successful configurations and prompts and share the findings with colleagues.

The Lab will also support new and emerging foundation models.

Overall, the product is expected to have a significant impact given the growing need for custom and privacy-focused LLM applications. In the recent LLM Survey Report for 2023, approximately 62% of respondents indicated that they are using LLM apps (like ChatGPT and GitHub Copilot) for at least one use case, such as chatbots, customer support and coding.

However, with companies restricting employee access to general-purpose models due to privacy concerns, the focus has largely shifted toward custom internal solutions built for privacy, security and regulatory requirements.

