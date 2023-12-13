

Chet Kapoor, CEO of DataStax, a company that offers cloud databases based on the open source Apache Cassandra, claimed at a conference in Silicon Valley yesterday that Cassandra is “the best database for Gen AI.”

Kapoor’s comments, made on stage before 700 attendees at the Linux Foundation event Dev.AI, come at a time when new startups and incumbents are in an all-out race to seize the leadership reins in the fast-growing generative AI sector. It is also a time when many enterprise companies are deciding which technology providers they will use.

While a lot of attention has been paid to the competition among large language model providers such as OpenAI, Anthropic, Google (Gemini) and Meta (Llama), another highly competitive area is the databases that end-user companies use to store and retrieve the data used for LLM applications.

During his keynote, Kapoor cited several reasons why the DataStax Cassandra database is performing well compared to others. Cassandra is already one of the most trusted operational databases and is widely used by enterprise companies, it boasts some of the earliest customer cases of companies deploying generative AI at scale, and its technology prowess puts it ahead of key rivals such as MongoDB and Pinecone in key areas related to generative AI, Kapoor said.

It’s also worth noting that DataStax is considering going public soon, and Kapoor is interested in making some noise. In June last year, DataStax raised $115 million at a valuation of $1.6 billion. The company has not released any financial data, but Kapoor acknowledged in an interview that DataStax is on the shortlist of companies that banks would like to take public next year.

These are the reasons behind Kapoor’s claim:

Reason 1: Cassandra is already one of the most widely used and trusted operational databases

Kapoor’s comments come at a time when big cloud companies like Microsoft and Amazon are emphasizing that their cloud offerings, which include integration with their own databases, are best placed to perform generative AI tasks. They are encouraging users to consolidate on their platforms, and aggressively removing barriers that have prevented users from doing so in the past, including the complex extract, transform, and load (ETL) functions that have kept data siloed.

However, those cloud companies have offered users lots of individual databases over the past decade in an effort to provide customers with specialized solutions for each use case, Kapoor said. Kapoor joked, “There’s one for going to the bathroom in the morning, and then one for the afternoon, and one for the evening.” But generative AI has caught those cloud companies by surprise: Enterprise CIOs now want to integrate their data into a single database to allow Gen AI apps to query the data more easily and efficiently, Kapoor said.

And here Cassandra has an advantage because it is one of the more popular “operational” databases, whereas most of the databases from Microsoft and Amazon are focused on analytical workloads, primarily for business intelligence applications. Although these can be used for the operational workloads of generative AI applications, doing so becomes very expensive, as they are not optimized for it.

DataStax has spent a lot of time focusing on price-performance, Kapoor and Chief Product Officer Ed Anuff pointed out in a follow-up interview with VentureBeat after Kapoor’s keynote. As a result, Cassandra is the most popular database among Fortune 500 companies that distribute data at large scale. Cassandra claims 90 percent of those companies as customers, Anuff said. For example, Netflix uses it for its movie metadata, FedEx uses it for tracking packages, Apple uses it for its iTunes, iMessage, and iCloud app data, and retailers like Home Depot use it for their websites.

As larger companies build new AI apps, they are comfortable with Cassandra because of their track record with it, Anuff said, and so are likely to consolidate around it. Furthermore, Microsoft and Amazon have realized that they need to provide options to customers. For example, Amazon offers a competing operational database, DynamoDB, but it also offers users the ability to easily use Cassandra within their cloud clusters. In this way, Cassandra also offers customers a way to avoid lock-in with a particular cloud vendor, Anuff said.

Reason 2: DataStax has customers that have actually “deployed” generative AI

Kapoor cited nine companies that have deployed generative AI on DataStax’s Astra DB database, a cloud database-as-a-service based on Cassandra. While many enterprise companies are experimenting like crazy with generative AI, few have moved to actual production on a large scale, due to concerns such as security and reliability. In fact, tension in the industry is clearly high: the potential of generative AI may be huge, but most vendors of the technology agree that they are still waiting for real revenue from customers, which is expected to come next year when companies step up production in a serious way.

DataStax customers that have deployed LLMs include:

Physics Wallah, an Indian online education platform, serves 6 million users with multi-modal (text, images and audio) large language model-driven bots. The company reached deployment in 55 days, Kapoor said.

SkyPoint, a Portland-based generative AI healthcare provider for seniors and care providers, uses LLMs to provide personalized treatments and interactions. Kapoor said Astra DB is helping doctors free up 10+ hours a week to focus on patient care.

Others include Hey You, Reel Star, Hey, Hornet, Restworld, SourceStable and Consider.

Kapoor said these companies are part of a fast-growing segment of small and medium-sized businesses (SMBs) that are able to move more quickly, while enterprise companies have to comply with more regulations and address the security issues and other drawbacks of generative AI, including its tendency to hallucinate.

Reason 3: DataStax’s Cassandra technology outperforms others on key LLM benchmarks

Kapoor said the vector search offerings in DataStax’s Astra DB perform better and return more relevant results than those of competitors. Vector search is a key requirement for generative AI databases, as this is how an AI application takes a user’s natural-language query and searches for text or other data in a company’s database related to that query. DataStax benchmarked its JVector vector search technology against Pinecone, a major vector database competitor, and found that JVector results are 16 percent more relevant than Pinecone’s. Kapoor said that’s a big difference, given how important it is to get the right answer. A third-party vendor will release a full performance benchmarking report in a few days, Kapoor said, but he showed a slide (below) of some results. Benchmarking also revealed that DataStax has better throughput, or the ability to process more transaction requests per unit of time, than both Pinecone and MongoDB.

He said Astra DB is the only database that can serve vectorized data with zero latency, including indexing, ingestion and querying.
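To make the idea of vector search and "relevance" more concrete, here is a minimal Python sketch. It is not DataStax's, JVector's or Pinecone's implementation: the document IDs, the three-dimensional vectors and the function names are all hypothetical, and the search is a brute-force scan, whereas production systems generally rely on approximate nearest-neighbor indexes. The article does not say which relevance metric the benchmark used; recall at k is shown only as one common way such comparisons are scored.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=2):
    """Brute-force search: score every stored vector and return the k best document IDs."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in documents.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


def recall_at_k(retrieved, relevant):
    """Fraction of the truly relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical embeddings; a real application would get these from an embedding model.
DOCUMENTS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "store-hours": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a user question such as "How do I get my money back?"
QUERY = [0.8, 0.3, 0.0]

if __name__ == "__main__":
    results = top_k(QUERY, DOCUMENTS, k=2)
    print(results)
    print(recall_at_k(results, ["refund-policy"]))
```

In a generative AI application, the documents returned by a search like this are typically passed to the LLM as context, which is why a difference in retrieval relevance can translate into a difference in answer quality.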

Kapoor: “This Gen AI wave is going to be faster than anything we’ve seen”

Kapoor said the adoption of GenAI will be much faster than previous technology revolutions, as it builds on important foundations such as web, mobile and cloud technologies that are already in place.

He said the “real fun” will start next year with more transformational and revenue-oriented use cases, including people using LLMs as “agents.” These agents allow LLMs to do more than just answer questions and make recommendations, he said, because they can orchestrate more complex tasks. Anuff said material revenue from generative AI deployments will appear in the second quarter of next year, with “much larger” figures to emerge by the end of the year as use cases in sectors such as retail and travel increase.

While Kapoor and Anuff were eager to point out the advantages of Cassandra, they acknowledged that the broader database field is seeing growth from generative AI. Anuff said the vector database searches that generative AI apps rely on use up to 8 times the storage and about 10 times the compute of other database workloads. “That’s why you see all the cloud providers and all the database providers wanting that business,” he said. “If AI applications become a big deal, they are going to be the primary growth driver for both private and public database companies for the next five years.”

Source: venturebeat.com