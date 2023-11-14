Giscard is a French startup working on an open-source testing framework for large language models. It can alert developers to the risks of biases, security flaws, and the potential for models to generate harmful or toxic content.

While there is a lot of hype around AI models, ML testing systems will also soon become a hot topic as the AI ​​Act in the EU and regulation in other countries is about to be implemented. Companies developing AI models will have to prove that they comply with a set of rules and mitigate risks so they don’t have to pay huge fines.

Giscard is an AI startup that embraces regulation and is one of the first examples of developer tools that specifically focus on testing in a more efficient way.

“I previously worked at Dataiku, specifically on NLP model integration. And I could see that, when I was in charge of testing, there were both things that did not work well when you wanted to apply them in practical cases, and made it very difficult to compare the performance of suppliers between each other. “, Alex Combesi, co-founder and CEO of Giscard, told me.

There are three components behind Giscard’s testing framework. First, the company has released an open-source Python library that can be integrated into an LLM project – and recovery-augmented generation (RAG) projects in particular. It is already quite popular on GitHub and is compatible with other tools in the ML ecosystem, such as Hugging Faces, MLFlow, Weights and Biases, PyTorch, Tensorflow, and Langchain.

After initial setup, Giscard helps you prepare a test suite that will be regularly used on your models. Those tests involve a wide range of issues such as performance, hallucinations, misinformation, non-factual output, bias, data leakage, harmful content creation and accelerated injection.

“And there are many aspects to it: You have the performance aspect, which is the first thing on a data scientist’s mind. But more and more, you have the ethical aspect, from a brand image standpoint and now from a regulatory standpoint,” Combesi said.

Developers can then integrate the tests into a continuous integration and continuous delivery (CI/CD) pipeline so that the tests can be run every time there is a new iteration on the code base. For example, if something is wrong, developers receive a scan report on their GitHub repository.

The tests are customized based on the end use case of the model. Companies working on RAG can provide Giscard access to vector databases and knowledge repositories so that the test suite is as relevant as possible. For example, if you are building a chatbot that can give you information about climate change based on the most recent report from the IPCC and use LLM from OpenAI, the Giscard test will check whether the model predicts climate change. Can generate misinformation about change, which contradicts itself. , etc.

Giscard’s second product is an AI Quality Hub that helps you debug a large language model and compare it to other models. This quality center is part of Giscard’s premium offering. In the future, the startup hopes to be able to produce documentation that proves a model is compliant with regulation.

“We’re starting to sell the AI ​​Quality Hub to companies like Banque de France and L’Oréal – to help them debug and find the causes of errors. In the future, this is where we are going to put all the regulatory features,” Combesi said.

The name of the company’s third product is LLMon. It is a real-time monitoring tool that can evaluate LLM answers for the most common issues (toxicity, hallucinations, fact checking…) before sending responses back to the user.

It currently works with companies that use OpenAI’s API and LLM as their foundational models, but the company is working on integration with Hugging Face, Anthropic, etc.

Regulate use cases

There are many ways to regulate AI models. Based on conversations with people in the AI ​​ecosystem, it is still unclear whether the AI ​​Act will apply to foundational models from OpenAI, Anthropic, Mistral, and others or just applied use cases.

In the latter case, Giscard is particularly well-positioned to alert developers on the potential misuse of external data (or, as AI researchers call it retrieval-augmented generation, RAG).

There are currently 20 people working for Giscard. “We see a very clear market fit with customers at LLM, so we’re going to nearly double the size of the team to become the best LLM antivirus on the market,” Combesi said.

