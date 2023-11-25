

Cloudflare, a leading content delivery network (CDN) and cloud security platform, wants to make AI accessible to developers. It has added GPU-powered infrastructure and model-serving capabilities to its edge network, bringing state-of-the-art foundation models to the masses. Any developer can tap into Cloudflare’s AI platform with a simple REST API call.

Cloudflare introduced Workers, a serverless compute platform, in 2017. Developers can use it to write JavaScript service workers that run directly in Cloudflare’s edge locations around the world. With Workers, a developer can modify a site’s HTTP requests and responses, make parallel requests, and even respond directly from the edge. Cloudflare Workers use an API similar to the W3C Service Workers standard.
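Here is a minimal sketch of such a Worker, written in TypeScript with the module syntax (`export default` with a `fetch` handler). It answers one hypothetical path, `/edge-hello`, directly from the edge. Every other request goes to the origin, and the Worker adds a header to the response on its way back. The path and the `x-served-by` header are illustrative choices, not Cloudflare conventions.

```typescript
// Minimal Worker sketch (module syntax). The /edge-hello path and the
// x-served-by header are hypothetical examples.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Respond directly from the edge without contacting the origin.
    if (url.pathname === "/edge-hello") {
      return new Response("Hello from the edge", {
        headers: { "content-type": "text/plain; charset=utf-8" },
      });
    }

    // Otherwise pass the request through to the origin...
    const originResponse = await fetch(request);

    // ...and copy the response, because headers on a fetched response are
    // immutable; the copy makes them writable.
    const response = new Response(originResponse.body, originResponse);
    response.headers.set("x-served-by", "cloudflare-worker-sketch");
    return response;
  },
};

export default worker;
```

The same handler shape covers all three capabilities mentioned above. The Worker can rewrite the incoming `Request`, start several `fetch` calls in parallel, or return a `Response` without ever touching the origin.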

The rise of generative AI has inspired Cloudflare to augment its stack with AI capabilities. The platform has three new elements to support AI inference:

Workers AI runs on NVIDIA GPUs within Cloudflare’s global network, bringing the serverless model to AI. Users pay only for what they use, and they spend less time managing infrastructure and more time on their applications.

Vectorize, a vector database, offers easy, fast, and cost-effective vector indexing and storage. It supports use cases that need access not only to a running model but also to customized data.

AI Gateway enables organizations to cache, rate limit, and monitor their AI deployments, regardless of where the models are hosted.
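AI Gateway is a managed proxy, so using it requires none of the code below. The sketch exists only to show what caching and rate limiting mean for model calls. In this hypothetical in-memory version, identical prompts are served from a cache until a TTL expires, and calls that would reach the model are capped per fixed time window. `GatewaySketch`, its parameters, and the injected clock are illustrative names, not part of any Cloudflare API.

```typescript
// Hypothetical in-memory sketch of two AI Gateway ideas: response caching
// and rate limiting in front of a model. Not Cloudflare's implementation.
type ModelCall = (prompt: string) => Promise<string>;

interface GatewayResult {
  status: number; // 200 = answered, 429 = rate limited
  body: string;
  cached: boolean;
}

class GatewaySketch {
  private readonly cache = new Map<string, { value: string; expiresAt: number }>();
  private windowStart = 0;
  private requestsInWindow = 0;
  private readonly callModel: ModelCall;
  private readonly cacheTtlMs: number;
  private readonly maxRequestsPerWindow: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    callModel: ModelCall,
    cacheTtlMs: number,
    maxRequestsPerWindow: number,
    windowMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.callModel = callModel;
    this.cacheTtlMs = cacheTtlMs;
    this.maxRequestsPerWindow = maxRequestsPerWindow;
    this.windowMs = windowMs;
    this.now = now;
  }

  async handle(prompt: string): Promise<GatewayResult> {
    const t = this.now();

    // 1. Identical prompt seen recently: answer from cache, skip the model.
    const hit = this.cache.get(prompt);
    if (hit !== undefined && hit.expiresAt > t) {
      return { status: 200, body: hit.value, cached: true };
    }

    // 2. Fixed-window rate limit on calls that would reach the model.
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t;
      this.requestsInWindow = 0;
    }
    if (this.requestsInWindow >= this.maxRequestsPerWindow) {
      return { status: 429, body: "rate limited", cached: false };
    }
    this.requestsInWindow++;

    // 3. Forward to the model and remember the answer for cacheTtlMs.
    const value = await this.callModel(prompt);
    this.cache.set(prompt, { value, expiresAt: t + this.cacheTtlMs });
    return { status: 200, body: value, cached: false };
  }
}
```

The sketch checks the cache before the rate limiter, so cached answers never consume quota. That ordering is one reason caching lowers both cost and latency.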

Cloudflare has partnered with NVIDIA, Microsoft, Hugging Face, Databricks, and Meta to bring GPU infrastructure and foundation models to its edge. The platform also hosts embedding models that convert text into vectors. Vectorize databases can store, index, and query those vectors, adding context to LLM prompts to reduce hallucinations in responses. AI Gateway provides observability, rate limiting, and caching of repeated queries, reducing costs while improving application performance.
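The embed, retrieve, and prompt flow described above can be sketched in a few functions. This is a conceptual illustration, not the Vectorize or Workers AI API. `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM. The brute-force cosine-similarity scan stands in for the query a vector database answers; a real vector database typically relies on an index rather than scanning every stored vector.

```typescript
// Retrieval-augmented generation (RAG) reduced to its moving parts.
// embed() and generate() are hypothetical stand-ins for an embedding model
// and an LLM; the brute-force scan stands in for a vector database query.
type Embed = (text: string) => Promise<number[]>;
type Generate = (prompt: string) => Promise<string>;

interface Passage {
  id: string;
  text: string;
  vector: number[]; // embedding computed when the passage was stored
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the topK stored passages most similar to the query vector.
function nearest(passages: Passage[], query: number[], topK: number): Passage[] {
  return passages
    .map((p) => ({ p, score: cosineSimilarity(p.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((s) => s.p);
}

// Prepend retrieved passages so the LLM answers from supplied context.
function buildPrompt(question: string, context: Passage[]): string {
  const lines = context.map((p, i) => `[${i + 1}] ${p.text}`).join("\n");
  return `Answer using only this context:\n${lines}\n\nQuestion: ${question}`;
}

async function answer(
  question: string,
  passages: Passage[],
  embed: Embed,
  generate: Generate,
  topK = 2,
): Promise<string> {
  const queryVector = await embed(question); // 1. text -> vector
  const context = nearest(passages, queryVector, topK); // 2. similarity search
  return generate(buildPrompt(question, context)); // 3. grounded LLM call
}
```

In a Worker, `embed` would wrap a call to one of the hosted embedding models, and the scan would become a query against a Vectorize index. The sketch leaves out both substitutions so that it runs on its own.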

Workers AI’s model catalog includes some of the newest and most capable foundation models. From Meta’s Llama 2 to Stable Diffusion XL to Mistral 7B, it covers much of what developers need to build modern applications powered by generative AI.

Model list (source: Cloudflare)
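To call one of these catalog models from a Worker, the sketch below assumes a Workers AI binding named `AI` declared in the project’s `wrangler.toml`. Current runtimes let a Worker call `env.AI.run(model, inputs)` directly. Releases from around launch instead wrapped the binding with the `@cloudflare/ai` package (`new Ai(env.AI)`). The model ID `@cf/meta/llama-2-7b-chat-int8` is one of the Llama 2 variants in the catalog at the time of writing. The `AiBinding` interface is a hand-written stand-in for the official type definitions, so check both against current documentation.

```typescript
// Worker that runs a text-generation model through a Workers AI binding.
// AiBinding and Env are hand-written stand-ins (assumptions) for the
// official types; the binding name "AI" must match wrangler.toml.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// One of the Llama 2 variants listed in the catalog (assumed still available).
const MODEL = "@cf/meta/llama-2-7b-chat-int8";

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Take the prompt from ?prompt=..., falling back to a default question.
    const prompt =
      new URL(request.url).searchParams.get("prompt") ?? "What is an edge network?";

    // Inference runs on Cloudflare's GPUs; there is no server to manage.
    const result = await env.AI.run(MODEL, { prompt });

    // Pass the model output back to the caller as JSON.
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Because `env` is injected, the same handler can be exercised locally with a stub object in place of the real binding, which is how the serverless model stays testable without GPUs.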

Behind the scenes, Cloudflare uses ONNX Runtime (the Open Neural Network Exchange runtime), an open source project led by Microsoft, to optimize models running in resource-constrained environments. It is the same technology Microsoft relies on to run foundation models in Windows.

While developers can use JavaScript to write AI inference code and deploy it to Cloudflare’s edge network, they can also invoke models through a simple REST API from any language. This makes it easier to incorporate generative AI into web, desktop, and mobile applications running in diverse environments.
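The following sketch shows the REST route in TypeScript with the standard `fetch` API, which is available in Node.js 18+, Deno, and Bun. It assumes three things: the endpoint shape `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}`, bearer-token authentication, and a JSON envelope whose `result.response` field carries text-generation output. Verify all three against the current API reference. The caller supplies the account ID and token.

```typescript
// Calling Workers AI over REST from outside a Worker. Endpoint shape and
// response envelope are assumptions to verify against current docs.
const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the POST request for a text-generation model.
function buildAiRequest(accountId: string, apiToken: string, model: string, prompt: string): Request {
  return new Request(`${API_BASE}/accounts/${accountId}/ai/run/${model}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ prompt }),
  });
}

// Send the request and unwrap the generated text from the JSON envelope.
async function runModel(accountId: string, apiToken: string, model: string, prompt: string): Promise<string> {
  const res = await fetch(buildAiRequest(accountId, apiToken, model, prompt));
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  const payload = (await res.json()) as { success: boolean; result: { response: string } };
  return payload.result.response;
}
```

For example, `await runModel(accountId, apiToken, "@cf/meta/llama-2-7b-chat-int8", "Explain edge computing in one sentence.")` would return the generated text, given a valid account ID and an API token authorized for Workers AI. Any language with an HTTP client can send the same request.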

Workers AI launched in September 2023 with inference capabilities in seven cities. Cloudflare’s ambitious goal is to support Workers AI in 100 cities by the end of the year, with near-universal coverage by the end of 2024.

Cloudflare’s footprint (source: Cloudflare)

Cloudflare is one of the first CDN and edge network providers to enhance its edge network with AI capabilities: GPU-powered Workers AI, the Vectorize vector database, and AI Gateway for managing AI deployments. By partnering with tech giants like Meta and Microsoft, it offers a wide model catalog and models optimized with ONNX Runtime.