Generative AI in 2024: “The Year of Delivery”
Digital transformation is hard work. Just ask McKinsey analyst Rodney Zemmel, one of the authors of Rewired, a business bestseller that weighs a hefty 800 grams (1.76 pounds) and spells out the dos and don’ts of doing business in the digital age on almost 400 pages.
For their book, Zemmel and his colleagues spent several years studying the digitalization efforts of more than 200 companies. Peppered with charts, tables and case studies, Rewired became a must-read for CIOs all over the world.
On a recent visit to Berlin, Zemmel met with DLD to speak about the transformative power of AI and how companies can successfully implement the technology – but also what pitfalls corporate leaders should avoid if they want to make the most of generative AI systems.
How do you expect AI to develop in 2024?
I think 2024 is going to be the year of delivery for generative AI. This is going to be the year where we start to see business impact from it and where we start to really see what solutions are robust enough for enterprise applications to get the job done.
We’re also going to see, I think, lots of companies go and adopt off-the-shelf tools from the likes of Microsoft, Google and AWS and give them to their employees. In terms of technology, we’re going to continue to see a really fast pace of innovation – and we’ve already got quite a few years of work to do to be able to absorb the business value of the technology.
Is generative AI fulfilling its promises?
What’s real and what’s hype – this is of course the key question. I don’t know that we’ll be using the term “generative AI” two or three years from now; we’ll be calling it something else or it might be combined with other forms of AI. But this is a massive leap forward in AI. Because it dramatically lowers the cost of deploying AI. It’s the Swiss Army Knife. You used to have to build a unique AI model for everything you’re doing: pricing, supply chain, churn reduction, you name it. Now there’s one model that can be applied against lots of different things, usually with very little modification.
Where do you see the technology yielding results for businesses?
We’re working with over 200 companies at this point, building out actual applications. We’re seeing the vast majority, more than 90 percent, falling into four categories. They all begin with C: coding, content, customer journeys – which includes customer support and customer experience – and concision, which refers to making things both more concise and more precise.
How does that work in practice?
Already you can ask a chatbot, “What drives the global market for apples?” And you can pull up the answer in milliseconds. What’s more interesting is when you can combine that general information with your organization’s proprietary knowledge.
A mining company we know is doing this for maintenance of its trucks and large equipment. Instead of going through the 200-page PDF manual, maintenance workers can chat with a virtual assistant. That leads to more people using it, better results and ultimately better maintenance.
Another example is a Taiwanese electronics manufacturer. They have some engineers who are amazing and other engineers who are more average. And by building this virtual expert electrical engineer, they can upskill the more average ones.
Are there measurable improvements?
Let’s take coding. This is the category that is probably going the fastest right now. What we’re seeing is if you give good software developers these tools, you get a productivity improvement of about 20 percent in terms of what they can develop. Right out of the box. If you then give them some training routines around it you can even get to 50 or 75 percent productivity improvement without too much effort.
Then, interestingly, if you give AI to bad developers, you actually lose some productivity because you just get more bad code more quickly. We recognize productivity of coding is a complicated measurement, but this early movement is something to watch.
Many people already use AI for content creation. Where do you see the biggest effects?
This is all about personalized marketing messages, which many are doing today – but with the help of AI you can now do this much better, much faster, and make it much more engaging. We’re already seeing that the cost of personalization has dropped dramatically. And the speed to create new campaigns, the ability to scan the Internet, see what is trending and create a brand new campaign around that – this can take hours now rather than weeks and months. It’s a huge opportunity.
For most applications, trustworthiness is crucial. Are current AI systems reliable enough?
The issue of inaccuracy – or “hallucinations” that LLMs are prone to – is indeed a challenge. The amazing thing is that generative AI can pass the bar exam for American lawyers, but it will sometimes fail preschool. Still, we increasingly feel that failing preschool is a feature, not a bug, in this technology.
First of all, you can actually choose how you want to tune the model. You can have a “high temperature” model that will be more creative and make more errors. Or you can have a “lower temperature” model that will be less creative – which means it’s picking the most likely next word rather than one further down the probability list, causing fewer errors. That means you have a choice: Are you designing for something that’s more creative or for something that’s more straightforward?
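The temperature trade-off Zemmel describes can be made concrete with a small sketch. This is a toy illustration, not a real model: the logits and candidate words are made up, and the function simply shows how dividing logits by the temperature sharpens or flattens the next-word distribution before sampling.

```python
import math

def next_word_distribution(logits, temperature):
    # Scale logits by 1/temperature, then apply softmax.
    # Low temperature sharpens the distribution toward the most
    # likely next word; high temperature spreads probability to
    # words further down the list, enabling more creative (and
    # more error-prone) output.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate next words, most likely first.
logits = [4.0, 2.0, 1.0]

low = next_word_distribution(logits, temperature=0.2)
high = next_word_distribution(logits, temperature=2.0)
# At temperature 0.2 the top word dominates almost completely;
# at temperature 2.0 the alternatives get meaningful probability.
print(low, high)
```

The same knob is exposed as a `temperature` parameter in most commercial LLM APIs, which is why the choice between “more creative” and “more straightforward” is a per-application design decision rather than a property of the model itself.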
Why would you choose a model that causes more errors?
People tend to find text more human-like and more engaging when it’s more at the creative end, which introduces more errors. You also have some choices around showing your sources if you want to show how the AI generated its results. Early on, there were problems with fabricated citations: systems would create a source that looked real but actually didn’t exist. Now there are technologies that allow you to highlight the source of information, such as retrieval-augmented generation. There are a handful of different approaches to achieve similar goals.
Does that mean these systems can reliably operate on their own?
That’s the next question: How much human-in-the-loop checking do you need? In the extreme, you can have a human read and check everything that comes out. But you can also build an AI that checks first, flags potential issues, and then you have humans double-check the AI output. Ultimately, it depends on what the bar is that you’re measuring against.
For many things, you actually don’t need perfection. You need a good first draft or you need to be better than an average human. In some cases, of course, you’ve got zero tolerance for error. But I think we’re going to see this technology catch on first in areas where there is a little more tolerance for error.
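The “AI checks first, then humans double-check the flags” pattern above can be sketched as a triage step. The checker here is a made-up heuristic (penalizing hedging phrases), standing in for a real verification model; the point is the routing logic, where only low-confidence drafts reach a human reviewer.

```python
def checker_score(draft):
    """Toy confidence heuristic: penalize hedging and empty drafts."""
    if not draft.strip():
        return 0.0
    hedges = sum(draft.lower().count(w) for w in ("maybe", "not sure"))
    return max(0.0, 1.0 - 0.4 * hedges)

def triage(drafts, threshold=0.8):
    """Split drafts into auto-approved and needs-human-review piles."""
    auto_approved, needs_human = [], []
    for d in drafts:
        (auto_approved if checker_score(d) >= threshold
         else needs_human).append(d)
    return auto_approved, needs_human

ok, review = triage([
    "Replace the filter every 500 hours.",
    "Maybe check the manual, not sure about the interval.",
])
print(len(ok), len(review))
```

Raising or lowering `threshold` is exactly the “what bar are you measuring against” question: zero-tolerance applications route everything to a human, while good-first-draft applications let most output through.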
That sounds fine for advertising but how about supply-chain management or accounting?
We’re increasingly going to see generative AI combined with other forms of artificial intelligence. Systems like ChatGPT are typically not good at math. But you can use them as an engine to retrieve other data on your company. We’re absolutely seeing this technology being adopted in supply chain and in maintenance and some of those areas that you wouldn’t typically think of. You can get pretty high success rates.
Can you name a real-world example?
Take McKinsey itself. We used to have 42 internal databases, a fairly fragmented system for our knowledge management that was very human-intensive. We created a system that we call Lilli, which is a generative AI layer that lets any team ask a question, such as, “How do I do a performance diagnostic of a retail store?” And it will look across those internal databases. It will go through 150,000 hours of sanitized expert interviews and provide an answer. It’ll say, “Here’s the standard approach. Here are the McKinsey document sources. Here are four people to talk to who really are experts in it.” For comparison it will also give you information from external sources, and so far, we’re finding extremely good results with it.
How difficult was it to build this system?
This went pretty quickly. It was a matter of weeks because this is all assembled from pre-existing components. We’re not creating our own large language model. We happen to be using OpenAI, but we set it up in a way that is dynamic, so we can swap other LLMs in and out.
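The “swap other LLMs in and out” setup is, architecturally, a narrow interface between the application and the model provider. The sketch below uses stub classes and invented replies rather than real API calls; the names `ChatModel`, `KnowledgeAssistant` and both stubs are assumptions for illustration, not anything McKinsey has described.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Narrow interface the application depends on."""
    def complete(self, prompt: str) -> str: ...

class StubOpenAIModel:
    # A real implementation would call a provider SDK here.
    def complete(self, prompt: str) -> str:
        return f"[openai-stub] answer to: {prompt}"

class StubLocalModel:
    def complete(self, prompt: str) -> str:
        return f"[local-stub] answer to: {prompt}"

class KnowledgeAssistant:
    """Only knows about ChatModel, never a concrete provider."""
    def __init__(self, model: ChatModel):
        self.model = model

    def ask(self, question: str) -> str:
        return self.model.complete(question)

assistant = KnowledgeAssistant(StubOpenAIModel())
print(assistant.ask("How do I diagnose a retail store?"))
# Swapping providers is a one-line change at the composition point:
assistant = KnowledgeAssistant(StubLocalModel())
```

Keeping the provider behind an interface like this is what makes the model layer a replaceable 15 percent of the system rather than a dependency everything else is welded to.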
What’s the biggest cost factor in building such a system?
The language model is about 15 percent of the effort and cost. And that number is going down as the LLMs get cheaper. The other 85 percent is everything you need to put into the prompt engineering and the prompt library, the context engineering, how you look after your own internal data, and then everything that comes after. How do you keep the system scalable? How do you stop bias from coming in? How do you control the model versions? And then whatever you want to do with human-in-the-loop checking. That’s what takes the effort.
That sounds like a big IT project that requires a lot of specialized knowledge.
In some ways, AI is the easiest technology in the world to use for a pilot project. Any reasonably skilled engineer can fire up one of the commonly available AI models, and within a couple of days start generating interesting outputs. The trouble is that this is not going to scale in a way that is useful for your corporation, and it’s not going to link to your own internal information in the right way.
Which approach do you recommend?
What many companies are doing is that they prioritize all the use cases, sort of bottom up. Then they pick a couple of use cases and build them. That’s going to result in a lot of effort and fun demonstrations – but I don’t think it’s going to affect the bottom line. Instead, we recommend focusing on a specific domain and applying a number of different use cases. Some will require generative AI, some demand other forms of AI, and some may not even need AI at all.
Do you expect that with all the excitement around AI there will come a phase of disappointment?
I think at one level we may already be in that phase. There were many companies where the CEO first played with ChatGPT over the Christmas holidays in 2022. By January 2023, there was a generative AI task force. By March, there were hundreds of use cases to prioritize. By June, the first of those were in production. If that was the approach you followed, you’ve got lots of pilots and cool stuff – but not much to show for it on your balance sheet.
What does it take to make AI a success for a company?
Number one is a roadmap that’s grounded in value, not pilot projects. Number two is talent – which is largely about re-skilling and upskilling the right people in your organization to successfully realize AI projects. The third concerns the operating model, and I wish we had a different word than agile because agile has become such a cliché. But in researching our book Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI, what we found is that all companies that are making money through digital and AI transformation follow some version of agile management.
Investing in technology is very important as well, of course. Then there’s data, and the value really comes in how you combine your own data with the outside world’s data. And ultimately it comes down to adoption and scaling.
You need a vision of how you’re going to change the organization. You need to involve your people. A lot of it is about incentives as well. Too many companies only put the incentives on the technology team. But the incentives need to be co-owned by the business team and by everyone else involved – just as they would for any other kind of transformation.