For all the fanfare, text-generating AI models like OpenAI’s GPT-4 make a lot of mistakes — some of them harmful. The Verge’s James Vincent once called one such model an “emotionally manipulative liar,” which pretty much sums up the current state of things.
The companies behind these models say they’re taking steps to fix the problems, like implementing filters and hiring teams of human moderators to correct issues as they’re flagged. But there’s no one solution; even today’s best-performing models are susceptible to biases and toxic outputs.
In pursuit of “safer” text-generating models, Nvidia has released NeMo Guardrails, an open-source toolkit aimed at making AI applications more “accurate,” “appropriate,” and “on topic.”
Jonathan Cohen, the VP of applied research at Nvidia, says the company has been working on Guardrails’ underlying system for “many years” but just about a year ago realized it was a good fit for models along the lines of GPT-4 and ChatGPT.
“We’ve been working on this release since then,” Cohen told TechCrunch via email. “AI model safety for enterprise use cases is critical.”
Guardrails includes code, examples, and documentation to “add safety” to AI applications that generate text as well as speech. Nvidia says the toolkit is designed to work with most generative language models, letting developers create rules using just a few lines of code.
Specifically, Guardrails can be used to prevent (or at least attempt to prevent) models from veering off topic, responding with inaccurate information or toxic language, or making connections to “unsafe” external sources. Think keeping a customer service assistant from answering questions about the weather, for instance, or a search engine chatbot from linking to disreputable publications.
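To give a sense of what those rules look like in practice, here is a minimal sketch of a Guardrails setup that keeps a customer service bot from fielding weather questions. It is not taken from Nvidia’s documentation; the flow names, example utterances, and model settings are illustrative, and it assumes an OpenAI API key is configured in the environment.

```python
# Minimal sketch: a NeMo Guardrails config that steers a customer service
# bot away from weather questions. Flow names, utterances, and the model
# choice are illustrative, not Nvidia's own example.
from nemoguardrails import LLMRails, RailsConfig

# Colang rules: what an off-topic weather question looks like, and the
# canned response the bot should give instead of answering it.
colang_content = """
define user ask about weather
  "What's the weather like today?"
  "Will it rain tomorrow?"

define bot refuse off topic
  "Sorry, I can only help with questions about your order or account."

define flow weather is off topic
  user ask about weather
  bot refuse off topic
"""

# YAML config naming the underlying LLM (engine and model are placeholders).
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content,
                                  yaml_content=yaml_content)
rails = LLMRails(config)

# Questions resembling the "ask about weather" examples are routed to the
# canned refusal rather than passed through to the model.
reply = rails.generate(messages=[
    {"role": "user", "content": "Is it going to snow this weekend?"}
])
print(reply["content"])
```

The rules are written in Colang, the toolkit’s own modeling language; incoming messages that resemble the listed examples trigger the scripted response instead of a free-form answer from the model.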
“With Guardrails, developers can control what is allowed and not allowed for their application,” Cohen said. “They could create guardrails that are too wide or, on the other hand, too narrow for their application.”
A universal fix for language models’ shortcomings sounds too good to be true, though, and indeed it is. While companies like Zapier are using Guardrails to add a layer of safety to their generative models, Nvidia admits that the toolkit isn’t perfect; it won’t catch everything, in other words.
Cohen also notes that Guardrails works best with models that are “sufficiently good at instruction-following,” à la ChatGPT, and that use the popular LangChain framework for building AI-powered applications. That rules out some of the open-source alternatives out there.
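For those already building on LangChain, pairing the two looks roughly like the sketch below. Again, this is an assumption-laden illustration rather than Nvidia’s documented recipe; it presumes a ./config directory containing Colang and YAML rules like the ones above, plus an OpenAI API key.

```python
# Sketch of wrapping a LangChain-managed model with Guardrails. Illustrative
# only; assumes a ./config directory of Colang + YAML rules and an OpenAI key.
from langchain.llms import OpenAI
from nemoguardrails import LLMRails, RailsConfig

llm = OpenAI(temperature=0)                 # any instruction-following LangChain LLM
config = RailsConfig.from_path("./config")  # load the Colang and YAML rules
rails = LLMRails(config, llm=llm)           # Guardrails sits in front of the model

# Prompts now pass through the guardrails before reaching the LLM.
print(rails.generate(prompt="Summarize my last support ticket."))
```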
And, effectiveness of the tech aside, it must be emphasized that Nvidia isn’t necessarily releasing Guardrails out of the goodness of its heart. It’s part of the company’s NeMo framework, which is available through Nvidia’s enterprise AI software suite and its fully managed NeMo cloud service. Any company can implement the open-source release of Guardrails, but Nvidia would clearly prefer that they pay for the hosted version instead.
So while there’s probably no harm in Guardrails, keep in mind that it’s not a silver bullet — and be wary if Nvidia ever claims otherwise.