Generative AI for the Enterprise: From Experimentation to Production
Recent developments in Generative AI have ushered in AI's industrialization era. With ChatGPT reaching 100 million monthly active users within two months of launch, boards and C-suites everywhere are elevating Generative AI to the top of their leadership agendas. We are at the beginning of a technology revolution that will be as impactful as the Internet in the fullness of time.
For the enterprise, Generative AI can dramatically improve employee productivity and transform how companies engage with their customers, from personalized marketing to empathetic, efficient automated customer service. This space is evolving at light speed, and enterprises that fail to adopt Generative AI quickly will be left behind.
According to Scale's 2023 AI Readiness Report, over half of the executives indicated that advances in generative models inspired them to accelerate their existing AI strategy, while over 70% indicated that their companies would "significantly" increase their AI investments each year over the next three years.
At the tip of the spear are companies that have already come to market with Generative AI:
- General Motors is developing an in-car assistant powered by large language models, customized with knowledge of their cars to help drivers change flat tires or interpret diagnostic lights.
- Morgan Stanley is enabling their wealth managers with AI copilots built on top of LLMs fine-tuned on internal documents.
- Coca-Cola is engaging digital artists to produce Generative AI-powered content with elements of their branding.
These companies are the exception, as we found that while 60% of respondents are experimenting with generative models or plan on working with them in the next year, only 21% have these models in production. And even the enterprises moving quickly to adopt this technology are finding it challenging to move from experimentation to production.
Out-of-the-box commercial models are powerful, but initial experimentation leaves many companies wondering how to execute the heavy customization and fine-tuning required to meet enterprise-level performance and reliability standards. Model hallucinations (fabricated facts stated with confidence) and brand safety concerns make model fine-tuning, observability, and monitoring critical but often challenging to manage at enterprise scale. And many business leaders are concerned about the security and privacy of their proprietary data and IP when using commonly available Generative AI solutions. Addressing these challenges will be critical to enterprises' ability to scale their experiments and deliver tangible ROI.
What Can Generative AI Do for Enterprises?
Generative AI enables companies to build new products and services quickly, improve customer experience with personalized interactions, and increase employee productivity. Integrating Generative AI with plugins enables it to take actions such as submitting orders, while retrieval enables LLMs to access enterprise knowledge bases and summarize and cite proprietary data. Let's look at a few examples of how enterprises use Generative AI today.
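To make the retrieval pattern concrete, here is a minimal sketch in Python: pick the most relevant internal document for a query, then ground the model's answer in it. The keyword-overlap scoring and the `call_llm` helper are illustrative stand-ins; a production system would use embedding similarity over a vector index and a real model API.

```python
def score(query: str, doc: str) -> int:
    # Toy relevance score: count of words shared by query and document.
    # Production systems would use embedding similarity instead.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model provider's API call here.
    return f"[model response grounded in a {len(prompt)}-character prompt]"

def answer_with_retrieval(query: str, knowledge_base: list[str]) -> str:
    # Retrieve the most relevant document, then instruct the model to
    # answer only from that context and cite it.
    context = max(knowledge_base, key=lambda doc: score(query, doc))
    prompt = (
        "Answer using only the context below, and cite it.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )
    return call_llm(prompt)
```

Grounding answers in retrieved, citable text is also one of the main levers against the hallucination risk discussed above.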
Generative AI in Financial Services
Financial services companies are building assistants for investment research that analyze financial statements, historical market data, and other proprietary data sources and provide detailed summaries, interactive charts, and even take action with plugins. These tools increase the efficiency and effectiveness of investors by surfacing the most relevant trends and providing actionable insights to help increase returns.
Generative AI in Retail & e-commerce
Retail and e-commerce companies are building customer chatbots that provide engaging discussions, acting as personal assistants to every shopper. They also generate stunning product imagery, social media ads, and lifestyle pictures at scale in seconds.
Generative AI in Insurance
Insurance companies use Generative AI to increase the operational efficiency of claims processing. Claims are often highly complicated, and Generative AI excels at properly routing, summarizing, and classifying these claims. Adjusters are using copilots to query claims data, saving them time from sifting through a large amount of documentation.
We have covered a few representative use cases of Generative AI in the enterprise, but we have only scratched the surface of what is possible. Next, we will cover how companies are deploying Generative AI.
How Are Enterprises Deploying Generative AI Today?
To properly adopt Generative AI for the enterprise, you first need a solid understanding of the Generative AI stack. At the base are foundation models: LLMs such as OpenAI's GPT-4, Google's PaLM 2, Cohere's Command model, or Anthropic's Claude, and image generation models such as Stability AI's Stable Diffusion. These models provide the foundational capabilities for Generative AI applications. Next is the data engine, which provides the data customization and fine-tuning required for the base model to use proprietary enterprise data properly. Then a development platform is needed to build LLM apps, compare prompts and model variants, and deploy applications to production.
Typically, enterprise-grade deployments of Generative AI involve some degree of internal development of your own applications. Companies do this so they can customize and fine-tune models to optimize performance on their specific use cases, improve security and safety, and ensure observability and reliability.
Customize and fine-tune models and apps for peak performance
Enterprises have unique needs that require extensive fine-tuning and prompt engineering of base foundation models. Open-source and commercial models are great generalists, but they are poor specialists for enterprise use cases - especially those that require "knowledge" of domain- or company-specific data. Base models are trained on publicly available internet data, not on a law firm's private documents, a wealth manager's research reports, or any company's internal databases. This specific data and context is the key to taking a model from generic responses to actionable insights for specific use cases.
Small fine-tuned models are cheaper, faster, and perform better at specific tasks than base foundation models. For example, Google's Med-PaLM 2 is a language model fine-tuned on a curated corpus of medical information and Q&As. Med-PaLM 2 is 10 times smaller than GPT-4 but actually performs better on medical exams.
Source: Towards Expert-Level Medical Question Answering with Large Language Models, https://arxiv.org/abs/2305.09617
Another example is Vicuna-13B, a chatbot built by fine-tuning Meta's open-source LLaMA model on approximately 70K user-shared conversations. Vicuna is more than 13 times smaller than ChatGPT yet matches ChatGPT's response quality in over 90% of cases.
GOAT is another fine-tuned LLaMA model that outperforms GPT-4 on arithmetic tasks, achieving state-of-the-art performance on the BIG-bench arithmetic sub-task.
Every organization has business-critical tasks that rely on proprietary data and processes and will benefit from a fine-tuned model over a base foundation model. While ChatGPT can provide general tips for a bank's customer support representative, a model fine-tuned on transcripts of actual calls from the bank's customers can guide reps to specific actions for callers' concerns while following company policies - like the fastest path to resolve a billing dispute or the bank's best checking account option for a given customer segment.
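In practice, fine-tuning starts with assembling supervised examples. Below is a minimal sketch of turning call-transcript outcomes into prompt/completion pairs in JSONL, a format most fine-tuning APIs and open-source trainers accept; exact field names vary by provider, and the sample record is invented for illustration.

```python
import json

# Each record pairs a customer situation from a transcript with the
# approved, policy-compliant response. This sample record is hypothetical.
examples = [
    {
        "prompt": "Customer: I was charged twice for my monthly account fee.",
        "completion": (
            "Apologize, confirm the duplicate charge in the billing system, "
            "and open a billing dispute following company policy."
        ),
    },
]

# Write one JSON object per line - the JSONL layout most trainers expect.
with open("fine_tune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```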
In addition, enterprises may want greater control and flexibility over which models they use and when. Being able to compare multiple models can improve both performance and cost, rather than being locked into using one model or provider for the use case (or many use cases across a business).
Improve security and safety
Off-the-shelf applications typically require data to pass through the app provider's cloud. A custom-built application can remain in an enterprise's virtual private cloud - or even on-premises - for cases where data security is critical.
With a purpose-built application, data stays within the existing environment, so access control of any deployed LLM app can mirror existing role-based access controls.
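As a minimal sketch of what mirroring those controls can look like (role and document names are invented for illustration), the retrieval step only considers documents the caller could already open, so the model never sees content the user is not entitled to:

```python
# Illustrative mapping from document to the roles allowed to read it,
# mirroring the enterprise's existing role-based access controls.
DOC_ACCESS = {
    "q3_board_deck.pdf": {"executive"},
    "claims_manual.pdf": {"adjuster", "executive"},
}

def retrievable_docs(user_roles: set[str]) -> list[str]:
    # Only documents the caller is already entitled to read are eligible
    # to be retrieved as context for the LLM.
    return [doc for doc, allowed in DOC_ACCESS.items() if user_roles & allowed]

print(retrievable_docs({"adjuster"}))  # ['claims_manual.pdf']
```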
Ensure observability and reliability
Without rigorous evaluation and monitoring capabilities, generative models are prone to hallucinations and can produce false, harmful, or unsafe results. This exposes companies to significant brand risk, especially when models are deployed in customer-facing settings or handle sensitive information.
By building a custom enterprise app, your teams can define how to measure the application's performance and set up appropriate monitoring processes.
In addition to monitoring traffic and latency, you must consider operational priorities when setting up these monitoring processes. For example, suppose a financial firm has created an insider trading detection app powered by Generative AI. The security and compliance team will need to be immediately alerted about the detection of insider trading and any misclassification. The only way to achieve this is by directly embedding real-time monitoring and logging of prompts and model responses into the custom app. The security and compliance team can then take action from these alerts to prevent further damage.
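A minimal sketch of what that embedded logging and alerting might look like in a Python service; the `model_call` classifier and the compliance alert channel are hypothetical stand-ins for your own components:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("trade-surveillance")

def alert_compliance(message: str) -> None:
    # Placeholder: push to the security and compliance team's alert channel.
    log.warning("ALERT: possible insider trading flagged: %r", message)

def classify_with_audit(message: str, model_call) -> str:
    verdict = model_call(message)  # e.g. returns "suspicious" or "clear"
    # Log every prompt/response pair so misclassifications can be reviewed.
    log.info("prompt=%r verdict=%r", message, verdict)
    if verdict == "suspicious":
        alert_compliance(message)
    return verdict
```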
As we mentioned, organizations need to consider each layer of the stack: Generative AI applications, a robust development platform, a data engine for customizing and fine-tuning models on proprietary data, and base foundation models.
Below we lay out the key considerations for your build vs. buy decision for each layer of the stack.
Applications
Description: These interfaces allow customers or employees to interact with Generative AI models, such as a chat agent to ask questions or a copilot that makes suggestions based on what the user is working on.
Build: Building your own application is best when performance is dependent on access to proprietary data, and when data, model inputs, and/or outputs are highly sensitive.
Buy: Buying these apps is most appropriate for a fast start on less sensitive or generic use cases where proprietary data is not required.
Development Platform
Description: Companies seeking to build their own apps need the tooling to experiment with, develop, and deploy these apps. This tooling helps teams to compare generative models, fine-tune them, play with prompts, and then deploy apps to production.
Build: Typically, building an in-house development platform is limited to those seeking to sell it to customers, such as Google's Vertex AI. Some companies build their own internal platforms strictly for their own use, but these platforms are difficult to maintain and costly to keep up to date, particularly in a fast-moving field.
Buy: Buying a development platform frees your resources to focus on core competencies and building valuable applications for your business instead of standing up another piece of infrastructure that needs to be maintained. Many open-source and commercial solutions on the market today are robust, reliable, and cost-effective for building and deploying Generative AI applications, including Scale Spellbook.
Data Engine
Description: A data engine helps teams collect, curate, and annotate data so their Generative AI models can produce high-quality outputs. This typically includes human experts to validate the data and the tooling to help them do so efficiently and effectively.
Build: A substantial investment is required to build in-house tooling to fine-tune models and assemble and train a workforce of human experts to produce and rank data at a very high quality.
Building a data engine is only appropriate for extremely sensitive use cases with strict data privacy requirements.
Buy: To accelerate deployment time, most enterprises should consider buying to access state-of-the-art tooling immediately and leverage vetted human experts to improve their model performance.
There are also options to buy even for the most sensitive use cases, where the data remains within an organization. External partners can be given access to VPCs with the appropriate role-based access controls to perform the annotation, customization, and fine-tuning work needed to improve model performance.
Base Foundation Models
Description: At the core of any Generative AI application is one or more "base" models such as OpenAI's GPT-4, Anthropic's Claude, Cohere's Command model, or open-source models like FLAN-T5, StableLM, or BLOOM.
Build: Companies may choose to train their own base model (for example, BloombergGPT) when the performance of existing models - even with fine-tuning - is insufficient to meet their needs.
Buy: Commercial providers like OpenAI, Anthropic, and Cohere provide API access to their pre-trained models, typically charging based on usage.
There are tradeoffs to both commercial and open-source models, so it is essential to carefully consider your use case and the available models before investing. For example, some commercial models, such as GPT-4, do not offer fine-tuning. In contrast, open-source models provide this capability and more flexibility but also require the company to host the model itself.
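As a minimal sketch of the self-hosting path (assuming the `transformers` and `torch` packages and the open-source FLAN-T5 checkpoint mentioned above), the weights download once and then run entirely on your own infrastructure, so no prompt or output leaves your environment:

```python
from transformers import pipeline  # pip install transformers torch

# Load an open-source instruction-tuned model locally; after the initial
# download, inference runs entirely on your own hardware.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

result = generator("Summarize: The customer reported a duplicate billing charge.")
print(result[0]["generated_text"])
```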
How Do I Get My Team Started on Deploying Generative AI Applications?
1. Prioritize your use cases
A few low-risk, simple use cases are well-served by buying applications directly from the market. However, use cases that drive specific business outcomes and require high-quality, consistently reliable outputs justify the upfront investment to supply the tooling and data required for custom applications.
Across functions, we are seeing rising demand for Generative AI to improve customer experience, increase operational efficiencies, introduce new product capabilities and new products, and improve workforce capabilities. Many of our customers and partners have identified that they are suffering losses from employee attrition and poor performance, dissatisfied customers, or complex processes that are not fully optimized. To help prioritize their Generative AI use cases for the most significant impact, our customers and partners are asking themselves:
- What are our largest cost drivers, and can any of these costs be reduced with automated retrieval, summary, or generation based on our data?
- Where in our business are we processing large quantities of documents?
- How are we organizing our internal knowledge bases today?
- How effective is our customer-facing support?
- Are there roles where training and onboarding of new hires is a bottleneck?
- Where is our organization limited by the availability of resources such as software engineers or data scientists whose work could be accelerated?
- How many data scientists write queries or build dashboards, and how much time do they spend doing so?
Once you have this list of potential use cases for Generative AI, you can prioritize them for development based on the total value at stake to your business, the feasibility of deployment (e.g., technical and operational complexity, change management required, and cost), and the potential risks, including risks to your brand, customers, or security. Initially, focus on a few high-impact, high-feasibility, and relatively low-risk use cases as pilots, then move to other use cases as your organization gathers insights from the initial pilots. One lightweight way to run this prioritization is sketched below.
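The weighted score below is an illustrative assumption, not a prescribed methodology; the use cases, scores, and weights are invented placeholders to adapt to your own business:

```python
use_cases = [
    # (name, value 1-5, feasibility 1-5, risk 1-5 where 5 is riskiest)
    ("claims summarization copilot", 4, 4, 2),
    ("customer-facing support chatbot", 5, 3, 4),
    ("internal knowledge-base search", 3, 5, 1),
]

def priority(value: int, feasibility: int, risk: int) -> float:
    # Weight value and feasibility up and risk down; tune to your business.
    return 0.5 * value + 0.3 * feasibility - 0.2 * risk

# Rank candidates so pilots start with high-impact, low-risk use cases.
for name, *scores in sorted(use_cases, key=lambda u: priority(*u[1:]), reverse=True):
    print(f"{priority(*scores):>4.1f}  {name}")
```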
2. Scope your requirements
Assemble the relevant technical and business leaders from your organization along with any external experts to assess:
- What does "good" look like in terms of performance?
- What does "bad" look like? What kinds of risks are involved?
- Where will the application be hosted?
- Will we explore and compare multiple generative models?
- How much customization of a broadly-trained model is required?
- What type of data will we use to fine-tune the model? Does this data need to stay in our VPC?
3. Baseline internal capabilities
At a minimum, deploying bespoke applications will require a small team of ML and software engineers with dedicated capacity over a few weeks to experiment with and assess the use case. You may also need external experts to inject specialized knowledge and upskill internal teams. Building bespoke applications doesn't require hiring large internal teams; depending on your use case and desired timelines, investing in external expertise to get your teams up and running more quickly could be worthwhile, especially given the tight labor market for expert profiles in this space.
In particular, for use cases that lean heavily on fine-tuning, external workforces and tooling can dramatically accelerate progress for initial use cases. Most enterprises we've seen do not have data "ready to go" for fine-tuning, so standing up the teams to collect, clean, and annotate such data can be a significant lift.
4. Plan for moving from experimentation to production
With the information available to you on current use cases, internal vs. external capabilities, and requirements, you can get a small internal team mobilized on experimentation in just a few weeks. However, turning this effort into real, sustained impact requires a vision for moving these experiments into production. You must decide whether to build or buy at every layer of the stack, plan for the organizational impact over the next one to two years, and determine how you will evolve your capabilities in response.
In addition to technical considerations, here are other questions to consider:
- Once deployed, what changes to organizational processes will be required to sustain these use cases? For example, will approval chains need to change?
- How will Generative AI change the way teams or business units interact with one another? What changes to organizational structure might be required?
- How can we stay ahead of regulatory and compliance implications?
- How will we collect data and implement a data engine for continued model improvement?
Conclusion
The last few months have been an exciting period of innovation and experimentation with Generative AI. Enterprises are now grappling with how to compete while responsibly and effectively deploying Generative AI. This journey to scale such applications presents complex challenges for any organization, but with an intentional plan, the proper tooling, and talent, companies should feel well-positioned to become first movers in harnessing the power of Generative AI.
At Scale, our mission is to accelerate the development of AI applications. Since 2016, we have been the trusted partner of the world's most ambitious AI/ML teams, working together to solve their most complex problems. With our Enterprise Generative AI Platform (EGP), we help businesses accelerate the development of Generative AI applications by providing secure and scalable infrastructure, machine learning expertise, and the most sophisticated data engine on the market.