Artificial intelligence startup Arthur is helping businesses find the best generative text tools for their needs with a new open-source platform titled Arthur Bench.
The new service, released Thursday, August 17, evaluates output from various large language models (LLMs), the algorithms behind tools such as OpenAI’s ChatGPT and Google’s Bard. With Arthur Bench, companies can compare results from different generative text platforms to see how each behaves given the same inputs, allowing them to find the best fit before incorporating AI into their businesses. Arthur simultaneously launched The Generative Assessment Project (GAP), a research effort aimed at identifying the strengths and weaknesses of competing LLMs. In its press release, Arthur reports that it has already identified areas where Anthropic’s Claude-2 outperforms GPT-4. Through the GAP, the company plans to provide further updates and insights into industry-leading AI products.
With Arthur Bench and the GAP, Arthur has become one of the first organizations to offer side-by-side comparisons of AI models. LLMs often differ in reliability depending on the subject matter and can produce output of varying quality, making it difficult for companies to identify compatible platforms and obtain consistent results. Even worse, since providers regularly update their models, responses to the same user queries can change over time. Companies will now be able to determine which service aligns best with their operations, stay informed on updates and gain insight into the types of prompts and guidelines needed to get their desired content. Priyanka Oberoi, staff data scientist at Axios HQ, noted that Arthur Bench had “helped us develop an internal framework to scale and standardize LLM evaluation across features, and to describe performance to the Product team with meaningful and interpretable metrics.”
The ability to study generative text platforms before committing to a service could remove a major barrier for businesses and may lead to wider adoption of AI tools among hesitant companies. Arthur could also be paving the way for more oversight in an emerging sector. Standards, regulations and best practices have yet to be written for LLMs and other algorithms. A comparative platform like Arthur Bench could prove crucial to making the technology more accessible and reliable.