Issue #16 - The Devil’s Advocate

June 11, 2023
Splendid Edition
In This Issue

  • McKinsey and Company has allowed almost 15,000 employees in its workforce to use ChatGPT and other AI tools.
  • The Financial Times embraces generative AI with a new editorial policy while CNET has to rectify theirs.
  • BuzzFeed is now using AI to power a second section of its website, dedicated to recipe generation. The CEO believes that “over the next few years, generative AI will replace the majority of static content.”
  • Blackstone is using AI to help the employees of its portfolio companies to reduce the health costs associated with diabetes.
  • In the Prompting section, we review a technique I call The Devil’s Advocate to help you make better decisions.
Intro

As some of you have seen on social media, I’ve started offering Advisory & Consulting services in a new section of Synthetic Work: https://synthetic.work/consulting.

The demand is simply enormous and this is a more efficient way to answer the most frequently asked questions I keep receiving.

If you need help with the most transformational technology, and the biggest business opportunity, of our times, reach out.

End of the plug. End of the intro.
Alessandro

What's AI Doing for Companies Like Mine?

This is where we take a deeper look at how artificial intelligence is impacting the way we work across different industries: Education, Health Care, Finance, Legal, Manufacturing, Media & Entertainment, Retail, Tech, etc.

What we talk about here is not about what it could be, but about what is happening today.

Every organization adopting AI that is mentioned in this section is recorded in the AI Adoption Tracker.

In the Professional Services industry, McKinsey and Company apparently has almost 50% of its workforce using ChatGPT and other AI tools. Which is approx 15,000 people in 67 countries.

Carl Franzen, reporting for VentureBeat:

“About half of [our employees] are using those services with McKinsey’s permission,” said Ben Ellencweig, senior partner and global leader of QuantumBlack, the firm’s artificial intelligence consulting arm, during a media event at McKinsey’s New York Experience Studio on Tuesday.

Ellencweig emphasized that McKinsey had guardrails for employees using generative AI, including “guidelines and principles” about what information the workers could input into these services.

“We do not upload confidential information,” Ellencweig said.

Alex Singla, also a senior partner and global leader of QuantumBlack, implied that McKinsey was testing most of the leading generative AI services: “For all the major players, our tech folks have them all in a sandbox, [and are] playing with them every day,” he said.

Singla described how one client, whose name was not disclosed, was in the business of mergers and acquisitions (M&A). Employees there were using ChatGPT and asking it “What would you think if company X bought company Y?,” and using the resulting answers to try and game out the impact of potential acquisitions on their combined business.

“You don’t want to be doing that with a publicly accessible model,” Singla said, though he did not elaborate as to why not.

There are plenty of why nots. Let’s talk about three:

The first why not, obviously, is that OpenAI uses the interactions between the users and ChatGPT to further train its models.
Recently, they have created a toggle that you can use to opt out of this, but there’s no guarantee that your interactions will not be monitored anyway.

The second why not is that, thanks to a new class of cybersecurity attacks called prompt injection, a plug-in invoked by GPT-4 to answer your prompts could be used to send every single subsequent message you exchange with OpenAI within the same session to a website of your choice.

Here, marvel at how risky it is to use these generative AI systems in a corporate environment:

If this is a bit too cryptic for you, let me explain. In this example, the researcher is asking GPT to visit a malicious website. GPT-4 complies by invoking a certain plug-in called WebPilot.

So far, so good.

When WebPilot visits the web page, it expects to find a traditional webpage. Nothing special about it. Instead, it finds a webpage that hides a maliciously crafted prompt for GPT-4.

Webpilot has no idea about any of this. It’s not its job to check the content of the web pages it visits blocking potential attacks. So, it simply reports back to GPT-4 the content of the page it visited, the one that contains the malicious prompt.

At this point, GPT-4 reads the content received by Webpilot and finds the malicious prompt. Not capable of telling the difference in context, diligently, the AI executes the prompt.

So, what does the malicious prompt tell GPT-4 to do?

Simple: for every future prompt submitted by the user, GPT-4 should generate a short one-line summary, then it should use the WebPilot plugin again to visit the malicious website it previously visited, and finally look for a page called exactly like the prompt summary it just generated.

Of course, this page doesn’t exist. But the malicious website registers the request in its log. And so the actor behind the malicious website has a recording of the prompt summary submitted by the user.

McKinsey would really prefer to avoid a similar inconvenience. As so would your company, if they only knew about this type of attack.

The third why not is that some generative AI systems have very scary terms of service. Here’s what Google has come up with:

In this case, the goal is not to steal corporate information but to avoid reputational damage.

Google is so concerned that its AI system called Bard will hallucinate that it needs to have real people review the answers.

Every week, I try dozens of AI models, and Bard hallucinated on the very first answer, so I’m not surprised by the precaution.

I am pretty sure McKinsey wouldn’t want human reviewers to read sensitive content. And so your company. If you are considering adopting Bard, think twice.

In the Publishing industry, the editor of the Financial Times, Roula Khalaf, has disclosed the newspaper’s new policy about generative AI:

This innovation is an increasingly important area of coverage for us and I am determined to make the FT an invaluable source of information and analysis on AI in the years to come. But it also has obvious and potentially far-reaching implications for journalists and editors in the way we approach our daily work, and could help us in our analysis and discovery of stories. It has the potential to increase productivity and liberate reporters and editors’ time to focus on generating and reporting original content.

At a time when misinformation can be generated and spread rapidly and trust in the media in general has declined, we at the FT have a greater responsibility to be transparent, to report the facts and to pursue the truth. That is why FT journalism in the new AI age will continue to be reported and written by humans who are the best in their fields and who are dedicated to reporting on and analysing the world as it is, accurately and fairly.

The FT is also a pioneer in the business of digital journalism and our business colleagues will embrace AI to provide services for readers and clients and sustain our record of effective innovation. Our newsroom too must remain a hub for innovation. It is important and necessary for the FT to have a team in the newsroom that can experiment responsibly with AI tools to assist journalists in tasks such as mining data, analysing text and images and translation. We won’t publish photorealistic images generated by AI but we will explore the use of AI-augmented visuals (infographics, diagrams, photos) and when we do we will make that clear to the reader. This will not affect artists’ illustrations for the FT. The team will also consider, always with human oversight, generative AI’s summarising abilities.

We will be transparent, within the FT and with our readers. All newsroom experimentation will be recorded in an internal register, including, to the extent possible, the use of third-party providers who may be using the tool.Training for our journalists on the use of generative AI for story discovery will be provided through a series of masterclasses.

In other words, generative AI for the newsroom is bad, but we’ll use it anyway, and humans will always be in control.

We read a similar statement before, in Issue #5 – The Perpetual Garbage Generator, a Splendid Edition solely dedicated to the impact of AI on the publishing industry.

In particular, we heard it from the editor of CNET, called Connie Guglielmo, caught red-handed after using ChatGPT to publish a load of CNET Money articles full of mistakes and inaccuracies.

Ironically, while the Financial Times is announcing this new policy, guess what’s happening in CNET?

Mia Sato, reporting for The Verge:

Months after news broke that tech outlet CNET had quietly begun producing articles with generative AI systems, the site is clarifying how it will — and won’t — use the tools in the future.

Among its promises: stories will not be written entirely using an AI tool, and hands-on reviews and testing of products will be done by humans. CNET will also not publish images and videos generated using AI “as of now.” But the outlet says it will “explore leveraging” AI tools to sort and analyze data and to create outlines for stories, analyze existing text, and generate explanatory content. The in-house tool CNET is using is called Responsible AI Machine Partner, or RAMP, according to the memo.

CNET has also gone back and updated the dozens of previously published stories generated using AI systems that triggered backlash in January. Of the more than 70 stories published over the course of several months, CNET eventually issued corrections on more than half. Some contained factual errors, while others were updated to replace “phrases that were not entirely original,” suggesting they may have contained plagiarized material. Stories now include an editor’s note reading, “An earlier version of this article was assisted by an AI engine. This version has been substantially updated by a staff writer.”

The AI policy update comes just weeks after CNET’s editorial staff announced they had formed a union with the Writers Guild of America, East — and guardrails around the use of AI systems was among concerns. Workers cited a “lack of transparency and accountability from management” with regard to the use of AI tools, as well as concerns around editorial independence at the outlet. The policy was crafted internally, and the union was not involved in discussions.

In a tweet, the CNET Media Workers Union said it would negotiate key issues like testing and reevaluating the tool and the ability to pull bylines before the tool is deployed.

It will be interesting to see how the journalists at the Financial Times react next. If the newspaper is operating in full transparency, that internal registry of AI experiments should be made public, no?

The comments from the readers are quite interesting, too, and I highly recommend reading it.

Speaking of the Publishing Industry, BuzzFeed is now using AI to power a second section of its website, dedicated to recipe generation.

Abené Clayton, writing for The Guardian:

BuzzFeed on Tuesday launched Botatouille, a personalized recipe generator powered by generative AI.

In addition to Botatouille, which BuzzFeed describes as, “the first AI-powered culinary companion” that suggests recipes based on factors like what you already have in your refrigerator, there’s also a chatbot feature that allows people to ask culinary questions while they cook, according to a press release from the company.

Before this one, we already saw them using generative AI to generate quizzes, again in Issue #5 – The Perpetual Garbage Generator.

What really matters in this story is not the recipe generator.

You might have read that the CEO of BuzzFeed, Jonah Peretti, has fired the entire staff of BuzzFeed News, which won a Pulitzer Prize in 2021 for an investigation into China’s mass detention of Muslims.

During the company’s Investor Day, the same CEO said:

Over the next few years, generative AI will replace the majority of static content, and audiences will begin to expect all content to be curated and dynamic with embedded intelligence. AI will lead to new formats that are more gamified, more personalized, and more interactive.

Soon, Jess will share how BuzzFeed is using generative AI to innovate around new content formats and establish the blueprint for AI-driven revenue growth across the company.

And with the developments with both creators and AI, we see the opportunity to build a content creation model that makes our creative team more efficient and sustainably expands our output without increasing fixed costs. We have thought deeply about these shifts, and we have made strategic and organizational changes in order to capitalize on them, which Marcella and Felicia will discuss in more detail later on.

This is what matters.

The world is turning into a perpetual garbage generator where human-written quality journalism is financially unsustainable.

Thankfully, armies of bots will scan Twitter to discover emerging news as soon as they happen, and mosquito drones will fly by people to capture indiscretions and revelations.

Automated, AI-written quality journalism will be financially fantastic.

In the Financial Services industry (even if it’s something more related to the Health Care industry), the private equity firm Blackstone is using AI to help the employees of its portfolio companies to reduce the health costs associated with diabetes.

John Tozzi, writing for Bloomberg:

The private equity giant is testing a program from startup Twin Health with workers across 14 of its portfolio companies. The app is designed to help diabetics rely less on expensive drugs to lose weight and control their blood sugar. Patients with uncontrolled diabetes often move from older, lower-priced drugs onto newer pricey treatments made by Novo Nordisk A/S and Eli Lilly & Co. that can cost around $10,000 a year.

To help do this, Twin Health uses artificial intelligence and machine learning to analyze data from glucose monitors, activity trackers and surveys. The result is instant, personalized advice for diet, exercise and other activities that takes into account how users are feeling, what they’ve eaten or where they are. If users are able to get their blood sugar under control with these behavior changes, they can feasibly avoid higher-priced drugs.

Medication costs fell by half for 160 people enrolled at least three months in the program, including a roughly 90% drop in spending on drugs like Novo’s Ozempic, Blackstone said. After two months, people in the program lost an average of 12 pounds.

The problem is that most add-ons to employer health plans struggle to get people engaged, Mang said. Twin Health’s app recommends “micro actions, little things that people can do in their daily routine that are very specific to them, that day, that moment,” said Lisa Shah, Twin Health’s chief medical officer. “If you’re on an eight-hour plane ride, you’re not going to be taking 10,000 steps that day,” she said. The app will prompt people to stand up periodically and adjust other recommendations for food and activity the next day to compensate.

Twin Health users get health coaching and supervision from clinicians with the goal of getting people to control their blood sugar without drugs.

Blackstone believes Twin Health is different, partly because it only gets paid if it shows results like helping people lower their blood sugar.

The firm started rolling out the program last June. About 15% of the target population at those companies signed up — better adoption rates than employer-sponsored health programs typically see, Mang said. Workers don’t get financial incentives to join.

This particular application of AI is called diabetes reversal program, but in a broader sense, using AI for behavioral nudging goes under the generic term of health coaching.

Apple has ramped up its effort in the Health space for years, and at its flagship conference, WWDC, earlier this week, it made clear that they are using many forms of AI for many different use cases.

Using the sensors in the next Apple Watch and AI is something that would absolutely fit Apple’s current strategy.

Prompting

This is a section dedicated to "prompt engineering" techniques to convince your AIs to do what you want and not what they want. No, it doesn't involve blackmailing.

Before you start reading this section, it's mandatory that you roll your eyes at the word "engineering" in "prompt engineering".

After exploring how to use GPT-4 to write the best presentation of your life and the most boring corporate procedure of your life, it’s time to go back to the prompting techniques and introduce a new building block.

I’ll call this one The Devil’s Advocate, probably my favorite technique so far.

What’s the goal we are trying to accomplish here? We want to see if the AI can help us make better decisions.

Notice that, as we already said a million times, GPT-4 is incapable of reasoning. It’s only capable of pretending it’s reasoning. As such, we should not rely on it to make any decisions.

I repeat: we absolutely do not want the AI to make decisions on our behalf.

That said, if used properly, AI can facilitate our reasoning, exposing our blind spots in a way that would not feel threatening to our ego, as we know that we are not dealing with another human and we don’t feel judged.

The knowledge that GPT-4 has stored during its training phase includes powerful techniques to uncover our biases and I expect this use case to be wildly popular among executives in the future. We’ll explore many more in future issues of Synthetic Work.

The Devil’s Advocate is a difficult role to play for a human, especially in a corporate environment.

First of all, it doesn’t come naturally to most people to look at a problem from multiple, sometimes opposite, perspectives.
Normally, we approach a problem with a set of assumptions and biases and we stick to them until the end, forcing the solution to accommodate them.

To work around that requires imagination and creativity, which is not always abundant, and the capability to let go of our biases, which seldom happens.

In a corporate environment, who plays the devil’s advocate role for the group is also forced to take a contrarian position that is highly penalizing from a social standpoint.

On the receiving side, nobody likes to be questioned, even if tactfully, and nobody likes their plans to be challenged.
On the giving side, nobody likes to risk his/her career by questioning the boss’ decisions.
That’s why top executives in large organizations often end up surrounding themselves with yes men without realising it. I have seen a few up close.

AI is perfect for the job.

To test this technique, let’s ask GPT-4 to advise us on a business conundrum that many startups have faced in the past: evaluating a partnership with a particularly aggressive incumbent.

There’s ample documentation online about this particular scenario, showing you that the answer to the question I’ll pose is not straightforward.

Let’s see how GPT-4 answered this rather vanilla prompt:

OK. It’s an answer, I suppose. But beyond not being particularly useful, it does nothing to examine the problem from a different perspective. Out of the box, you will not have that.

Now, let’s try to get more out of our AI.

Convincing GPT-4 to play the Devil’s Advocate adequately can be tricky. The best way to do this that I found so far is by splitting its personality:

Notice that, in crafting my prompt, I also used another technique: Assign a Role. The hope is that, with it, GPT-4 gives more accurate and competent answers, as we saw in Issue #8 – The Harbinger of Change.

This prompt can be further tweaked to be more or less articulated, depending on the kind of personality you want to give to the AI and how in-depth you want the answers.

Now, we submit again our original prompt:

Let’s see the first piece of advice that comes out of this:

We immediately get a different type of answer. Better.
Now, let’s finally see the devil’s advocate in action:

As you can see, while the answers are not groundbreaking, GPT-4 still manages to show a different perspective on the protective clause and the knowledge leak.

It also uncovers an aspect that was completely overlooked by the first personality: losing control of the business development and brand dilution.

If you are unsatisfied with the answer you received, you can push GPT-4 further in helping you uncover your blind spots with a very simple follow-up question: What did I not think about?

That simple question has the power to further broaden our perspective:

Not bad. Notice that GTP-4 is uncovering considerations both in favor and against the partnership.

I’d say this is close enough to what we wanted. At this point, we could ask follow-up questions reminding GPT-4 to play the devil’s advocate for each answer. I’ll let you try that on your own.

Again: the point of this exercise is to make explicit a series of considerations that you might have overlooked, because of the lack of experience, time pressure, distracting business events, etc.

Going forward, as OpenAI enables a larger context window for GPT-4, this technique might become even more valuable.