Splendid Edition Sample

Issue #4 - Medical AI to open new hospital in the metaverse to assist (human) ex-doctors affected by severe depression

March 17, 2023
Splendid Edition
Hi. If you are seeing this, it means that you are a valued member of our community. Or you are reading Issue #0. Or you hacked the archives.
Whichever the case, bravo.

If you have comments about anything you'll find below, or you have material to suggest, or topics you'd like to see covered (don't you dare to pitch me your startup), just send an email.
I'll read all emails, ignore them, wait a few weeks, and then use the best stuff for a new issue of the newsletter, pretending the ideas are original and mine.

Another thing. Super important question:

Do you have one of those moms that inexplicably know everyone and gossip all day long so that one little secret you have shared with them in confidence at breakfast becomes a fact known by the whole town by noon?

If so, can you tell your mom that Synthetic Work is a secret?

If she talks about this newsletter or forwards it to the entire neighbourhood, it helps me a lot.

Last week we spent some time looking at how AI is impacting the job market across industries: Issue #3 – I will edit and humanize your AI content.

This week, we go back to our usual programming and take a deep dive into the Health Care industry to understand what AI is doing there.

This is a huge topic and what I’ll say below is a drop in the ocean compared to everything that’s happening on a daily basis in the Health Care industry because of AI. Also, keep in mind that this week’s release of GPT-4 will bring even more changes. But we have to start somewhere.

Take a deep breath, this is going to be intense.

In This Issue

  • Why not use Stable Diffusion to generate the image of a tooth growing in a person’s brain, so young doctors learn how disgusting their profession can be?
  • OK. Let’s just generate chest X-rays for edge use cases. Somebody has a lovely AI model for you to download
  • Somebody else is analyzing the voice of patients to spot signs of anxiety
  • Google Med-PaLM 2 consistently performs at an “expert” doctor level on medical exam questions
  • DeepMind predicts the 3D structures of the over 200 million known proteins to boost drug discovery
  • Microsoft wants to play, too, and fine-tunes BioGPT on 15M PubMed abstracts
  • Stability AI launches yet another satellite organization: MedARC
  • Meanwhile, in Hungary, AI spots 22 undiagnosed cases of breast cancer
  • In the US, another AI is used to identify atrial fibrillation, diabetic retinopathy, and sepsis
  • The UK Medicines and Healthcare products Regulatory Agency (MHRA) says “party is over”

I want to start with the one thing that captured my imagination the most in the months before launching Synthetic Work: medical training augmented by generative AI.

One of the biggest challenges in any profession, but especially in the medical field, is preparing students for edge cases. Situations they will encounter rarely, if ever, during their career.

It’s hard to train a surgeon to remove a tooth that is growing in a person’s brain if he or she has never even studied such a case.

I hope you were not eating lunch. Sorry about that.

So, to better prepare medical students for edge cases, a month ago, a group of radiologists and AI graduate students at Stanford decided to use Stable Diffusion in a novel way.

Remember that Stable Diffusion is an open generative AI system that produces images from a prompt submitted by the user in plain English. You can see some examples in this new section of Synthetic Work: photography.synthetic.work

So, this group took tens of thousands of existing images of chest x-rays, and the corresponding radiology reports, and used them to teach Stable Diffusion how to generate novel, and accurate, images of chest x-rays. A process called fine-tuning in technical jargon.

Then, they submitted novel prompts using medical terms to Stable Diffusion, generating novel images of chest x-rays for arbitrary situations.

The fine-tuned model they created is called RoentGen and you can get it here.

Of course, this approach can be extended to many other branches of the medical profession and has the potential to transform the way students learn. Think, for example, what this approach could do for the students of psychology around the world:

“AI, create a clinically realistic dialogue for my class where a doctor interviews a hypothetical patient affected by multiple personality disorder who believes they are:

  • Marilyn Monroe
  • Marie Antoinette
  • Crudelia De Mon
  • Mata Hari
  • Eve
  • Rita Levi-Montalcini
  • Stalin

Class, cure the patient.”

You are bound to fail without AI.

Speaking of which, there is some promising work that is being done to detect signs of mental health issues by analyzing patients’ voices with AI.

Ingrid K. Williams last year reported for the New York Times:

Psychologists have long known that certain mental health issues can be detected by listening not only to what a person says but how they say it, said Maria Espinola, a psychologist and assistant professor at the University of Cincinnati College of Medicine.

With depressed patients, Dr. Espinola said, “their speech is generally more monotone, flatter and softer. They also have a reduced pitch range and lower volume. They take more pauses. They stop more often.”

Patients with anxiety feel more tension in their bodies, which can also change the way their voice sounds, she said. “They tend to speak faster. They have more difficulty breathing.”

Today, these types of vocal features are being leveraged by machine learning researchers to predict depression and anxiety, as well as other mental illnesses like schizophrenia and post-traumatic stress disorder. The use of deep-learning algorithms can uncover additional patterns and characteristics, as captured in short voice recordings, that might not be evident even to trained experts.

“The technology that we’re using now can extract features that can be meaningful that even the human ear can’t pick up on,” said Kate Bentley, an assistant professor at Harvard Medical School and a clinical psychologist at Massachusetts General Hospital.
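To make the quotes above a bit more concrete, here is a minimal sketch (entirely my own, not Ellipsis Health’s or anyone else’s pipeline) of three of the vocal markers Dr. Espinola describes — volume, pitch range, and pauses — computed from a raw audio signal with plain NumPy. Production systems extract far richer features and feed them to deep-learning models; the zero-crossing pitch proxy below is a deliberate simplification.

```python
import numpy as np

def vocal_markers(signal, sample_rate=16000, frame_ms=30, silence_db=-35):
    """Crude per-frame markers: loudness, a zero-crossing pitch proxy,
    and the fraction of frames that are silent (pauses)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    rms = np.sqrt((frames ** 2).mean(axis=1))       # loudness per frame
    loudness_db = 20 * np.log10(rms + 1e-10)
    silent = loudness_db < silence_db               # "pause" frames

    # Zero-crossing rate as a (very) rough pitch proxy for voiced frames.
    crossings = (np.diff(np.sign(frames), axis=1) != 0).sum(axis=1)
    pitch_hz = crossings * sample_rate / (2 * frame_len)
    voiced_pitch = pitch_hz[~silent]

    return {
        "mean_volume_db": float(loudness_db[~silent].mean()) if (~silent).any() else None,
        "pitch_range_hz": float(voiced_pitch.max() - voiced_pitch.min()) if len(voiced_pitch) else 0.0,
        "pause_ratio": float(silent.mean()),
    }

# Synthetic "speech": a 220 Hz tone with a silent gap in the middle.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
speech[sr // 3 : sr // 2] = 0.0                     # a pause
markers = vocal_markers(speech, sr)
```

A depressed patient, per Dr. Espinola’s description, would show a lower mean volume, a narrower pitch range, and a higher pause ratio than their own baseline.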

This technology (which, for a change, is not generative AI) is already being tested in multiple ways. Here is, for example, an implementation offered by Cigna, the American health care and insurance provider, and developed by the AI startup Ellipsis Health:

Now, tell me. Aren’t you happy at the idea that the next time you call your doctor to book an appointment, your phone call will be automatically scanned for signs of stress?

No? Do you prefer Amazon (which is building its online pharmacy), Apple (which is turning every one of its gadgets into a medical device), Google, or Facebook to do the same when you call their customer services, and automatically start serving you ads for antidepressants?

Let’s stay on this topic for a moment longer, and talk about how these technology providers are on a collision course with traditional health care providers and how many medical professions could be impacted.

This week, the AI community released a wave of new AI models. You know that one of them is GPT-4, but you might have missed Google’s announcement of Med-PaLM 2.

PaLM is Google’s most powerful large language model (the same AI technology that powers ChatGPT and GPT-4), and, from this week, you’ll start to see it integrated into Gmail, Google Docs, and a number of third-party applications and services.

Google has a special version of PaLM called Med-PaLM. I’m quoting from the official announcement:

Last year we built Med-PaLM, a version of PaLM tuned for the medical domain. Med-PaLM was the first to obtain a “passing score” (>60%) on U.S. medical licensing-style questions. This model not only answered multiple choice and open-ended questions accurately, but also provided rationale and evaluated its own responses.

Recently, our next iteration, Med-PaLM 2, consistently performed at an “expert” doctor level on medical exam questions, scoring 85%. This is an 18% improvement from Med-PaLM’s previous performance and far surpasses similar AI models.

While this is exciting progress, there’s still a lot of work to be done to make sure this technology can work in real-world settings. Our models were tested against 14 criteria — including scientific factuality, precision, medical consensus, reasoning, bias and harm — and evaluated by clinicians and non-clinicians from a range of backgrounds and countries

This is not just the biased enthusiasm of a technology vendor.

At the beginning of this year, Eric Topol, an American cardiologist, professor of molecular medicine, and one of the most respected doctors in the world, wrote about the previous version of PaLM:

From the above graphic, you can see the PaLM catapulting from 50% accuracy to 67.6%, an absolute jump of 17% (a relative increase of 33%!). Importantly, the parity in medical question answering was demonstrated by 92.6% of doctors saying the MED-PaLM chatbot was right as compared with 92.9% of other doctors being correct. Furthermore, for potential harm of the answers, there was only a small gap: the extent of possible harm was 5.9% for Med-PaLM and 5.7% for clinicians; the likelihood of possible harm was 2.3% and 1.3%, respectively.

the opportunity to get to machine-powered, advanced medical reasoning skills, that would come in handy (an understatement) with so many tasks in medical research (above Figure), and patient care, such as generating high-quality reports and notes, providing clinical decision support for doctors or patients, synthesizing all of a patient’s data from multiple sources, dealing with payors for pre-authorization, and so many routine and often burdensome tasks, is more than alluring.

It’s very early for LLMs/generative AI/foundation models in medicine, but I hope you can see from this overview that there has been substantial progress in answering medical questions—that AI is starting to pass the tests that approach the level of doctors, and it’s no longer just about image interpretation, but starting to incorporate medical reasoning skills. That doesn’t have anything to do with licensing machines to practice medicine, but it’s a reflection that a force is in the works to help clinicians and patients process their multimodal health data for various purposes. The key concept here is augment; I can’t emphasize enough that machine doctors won’t replace clinicians. Ironically, it’s about technology enhancing the quintessential humanity in medicine.

And Med-PaLM 2 is not the only AI model that might have a critical impact on the Health Care industry.

There is also AlphaFold 2, an AI model that enormously accelerates drug discovery by predicting the 3D structures of the over 200 million known proteins that the scientific community hasn’t investigated yet.

AlphaFold was developed by DeepMind, the AI startup that Google acquired in 2014 for more than $500 million. The company taught the AI model the structure of 100,000 proteins. As a result, AlphaFold can now predict with atomic accuracy the structure of most known proteins.

The AI model is so powerful that, in collaboration with the European Bioinformatics Institute (EMBL-EBI), DeepMind was able to publish an entire database of predicted 3D protein structures online.

Look at me searching for the dopamine receptor D4!

If you are interested in understanding exactly what AlphaFold does and its implications for drug discovery (and more), here’s a super-nerdy video about it:

Let’s continue with our list of AI models that are subverting, or might subvert, the order of the Health Care universe.

Next up is Microsoft, which announced BioGPT, a large language model trained entirely on medical literature (15M PubMed abstracts) instead of being a general-purpose model that learns medical literature afterwards.

BioGPT is really good at answering medical questions.

But BioGPT is not just good at answering questions. It can be used to dig deep into the medical literature the model has learned, to quickly find the relevant research for any question. Kadir Nar, a Computer Vision Engineer, has already put together a prototype on Hugging Face, a platform for AI applications, if you want to try it:

And this AI model is based on GPT-2. Two!!! And developed by Microsoft Research, not OpenAI itself.

Don’t think for a second that OpenAI has not worked on a special version of GPT-4 for the Health Care industry.

In the second issue of Synthetic Work, we saw how GPT-4 (at that time not yet disclosed) was adopted by the second largest law firm in the UK: Law firms’ morale at an all-time high now that they can use AI to generate evil plans to charge customers more money.

There is no reason to believe that OpenAI will not do the same for every other industry.

Finally, in our long review of AI models for the Health Care industry, there is Stability AI, the company that funded and promoted the release of Stable Diffusion. They are also nurturing the training of an open-source model through MedARC, an affiliate organization announced last month.

If you want to know more about what they are doing, the two-hour launch video will be enlightening:

In reality, there are another dozen models that would be worth mentioning, but this is a newsletter, not a research paper, and Synthetic Work is about the practical impact of AI on human labour, the economy, and society, not about academic models that might or might not find a commercial application.

So let’s talk about that.

Earlier this month, Adam Satariano and Cade Metz, two reporters for the New York Times, wrote about AI detecting cancer that human doctors couldn’t see:

Inside a dark room at Bács-Kiskun County Hospital outside Budapest, Dr. Éva Ambrózay, a radiologist with more than two decades of experience, peered at a computer monitor showing a patient’s mammogram.

Two radiologists had previously said the X-ray did not show any signs that the patient had breast cancer. But Dr. Ambrózay was looking closely at several areas of the scan circled in red, which artificial intelligence software had flagged as potentially cancerous.

“This is something,” she said.

Hungary, which has a robust breast cancer screening program, is one of the largest testing grounds for the technology on real patients. At five hospitals and clinics that perform more than 35,000 screenings a year, A.I. systems were rolled out starting in 2021 and now help to check for signs of cancer that a radiologist may have overlooked. Clinics and hospitals in the United States, Britain and the European Union are also beginning to test or provide data to help develop the systems.

More on AI finding cancer that radiologists miss, from the same article:

Mr. Kecskemethy, along with Kheiron’s co-founder, Tobias Rijken, an expert in machine learning, said A.I. should assist doctors. To train their A.I. systems, they collected more than five million historical mammograms of patients whose diagnoses were already known, provided by clinics in Hungary and Argentina, as well as academic institutions, such as Emory University. The company, which is in London, also pays 12 radiologists to label images using special software that teaches the A.I. to spot a cancerous growth by its shape, density, location and other factors.

From the millions of cases the system is fed, the technology creates a mathematical representation of normal mammograms and those with cancers. With the ability to look at each image in a more granular way than the human eye, it then compares that baseline to find abnormalities in each mammogram.

Last year, after a test on more than 275,000 breast cancer cases, Kheiron reported that its A.I. software matched the performance of human radiologists when acting as the second reader of mammography scans. It also cut down on radiologists’ workloads by at least 30 percent because it reduced the number of X-rays they needed to read.

Kheiron’s technology was first used on patients in 2021 in a small clinic in Budapest called MaMMa Klinika. After a mammogram is completed, two radiologists review it for signs of cancer. Then the A.I. either agrees with the doctors or flags areas to check again.

Across five MaMMa Klinika sites in Hungary, 22 cases have been documented since 2021 in which the A.I. identified a cancer missed by radiologists, with about 40 more under review.

“It’s a huge breakthrough,” said Dr. András Vadászy, the director of MaMMa Klinika, who was introduced to Kheiron through Dr. Karpati, Mr. Kecskemethy’s mother. “If this process will save one or two lives, it will be worth it.”

Meanwhile, in the US, artificial intelligence is being used in many other ways.

Sumathi Reddy reports for the Wall Street Journal:

At Mayo cardiology, an AI tool has helped doctors diagnose new cases of heart failure and cases of irregular heart rhythms, which are called atrial fibrillation, potentially years before they might otherwise have been detected, said Dr. Paul Friedman, chair of the clinic’s cardiology department in Rochester, Minn.

Doctors can’t tell on their own whether someone with a normal electrocardiogram, or ECG, might have atrial fibrillation outside of the test. The AI, however, can detect red-flag patterns in the ECGs that are too subtle for humans to identify.

Cano Health, a group of primary-care physicians in eight states and Puerto Rico, did a pilot last year using AI to analyze images from a special eye camera to identify diabetic retinopathy, a leading cause of blindness that can afflict people with diabetes. The test in four Chicago-area offices went well enough that the group now is looking to expand its use, said Robert Emmet Kenney, senior medical director at Cano Health.

Sinai Hospital in Baltimore is one hospital that uses an algorithm to identify hospitalized patients who are most at-risk for sepsis, a fast-moving response to an infection which is a main cause of death in hospitals.

The algorithm examines more than 250 factors, including vital signs, demographic data, health history and labs, said Suchi Saria, a professor of AI at Johns Hopkins and chief executive of the health AI company Bayesian Health, which developed the program.

The system alerts doctors if it determines a patient is septic or deteriorating. Doctors then evaluate the patient and start antibiotic treatment if they agree with the assessment. The system adjusts over time based on the doctors’ feedback, said Esti Schabelman, the hospital’s chief medical officer.
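To give a sense of how such an early-warning system works under the hood, here is a toy sketch of a logistic risk score in plain Python. The factor list, baselines, weights, and alert threshold are all hypothetical, invented for illustration; Bayesian Health’s actual model examines more than 250 factors and learns its parameters from data.

```python
import math

# Hypothetical weights for a handful of factors (per unit above baseline).
# A real sepsis model uses hundreds of factors and data-driven weights.
WEIGHTS = {
    "heart_rate": 0.03,     # beats/min above 70
    "temperature": 0.9,     # °C above 37.0
    "resp_rate": 0.10,      # breaths/min above 16
    "lactate": 0.8,         # mmol/L above 1.0
}
BASELINES = {"heart_rate": 70, "temperature": 37.0, "resp_rate": 16, "lactate": 1.0}
INTERCEPT = -4.0            # keeps the baseline patient at low risk

def sepsis_risk(vitals):
    """Logistic risk score: weighted deviations from baseline -> probability."""
    z = INTERCEPT + sum(w * (vitals[k] - BASELINES[k]) for k, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

def should_alert(vitals, threshold=0.5):
    """Alert doctors only when the estimated risk crosses the threshold."""
    return sepsis_risk(vitals) >= threshold

stable = {"heart_rate": 72, "temperature": 36.9, "resp_rate": 14, "lactate": 0.9}
deteriorating = {"heart_rate": 125, "temperature": 39.2, "resp_rate": 30, "lactate": 4.5}
```

The feedback loop the article describes would correspond to nudging these weights over time based on whether doctors agreed with each alert.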

OK. Let’s take a deep breath.

What does all of this mean?

Let’s start with some considerations about the Health Care industry overall.

Well. We have a handful of technology providers that need to keep growing and are looking at the Health Care industry for new profits. And these technology providers now have a once-in-a-lifetime technology, large multi-modal models, that can examine medical data as accurately as, or more accurately than, a human doctor.

Two things can happen.

One is that these technology providers will offer their AI systems to traditional health care providers through a consumption model (pay-per-use) or another type of licensing agreement (we’ll talk about the business models of AI in a future Splendid Edition of Synthetic Work).

The other is that these technology providers will decide to bypass traditional health care providers and replace them by building a direct relationship with the patients.

If you think that the latter scenario is improbable, think about how Apple is slowly establishing itself as a financial services provider via Apple Wallet, Pay, Pay Later, Card, etc. For now, they work with Goldman Sachs, but it’s clear that this relationship is doomed.

If the latter scenario happens, technology providers will have every interest in placing their increasingly capable AIs in front of their customers.

Why let users book an appointment with a fallible human doctor when they can have an instant medical consultation with ChatGPT, Siri or Alexa, or search for their conditions on Google?

It’s way more convenient. It addresses the anxiety of a patient by being instantaneously ready (have you ever spent sleepless nights waiting to see your doctor two days later?). And it gives you the illusion of being more discreet (raise your hand if you love being seen in the waiting room while waiting for a sexual health exam).

Of course, I’m not the only one anticipating these scenarios. Which means that regulators are preparing to constrain technology vendors in every way they can. You know, just to try and avoid the advent of an artificial Elizabeth Holmes.

In the UK, for example, the Medicines and Healthcare products Regulatory Agency (MHRA) is already saying that:

LLMs that are developed for, or adapted, modified or directed toward specifically medical purposes are likely to qualify as medical devices.

Additionally, where a developer makes claims that their LLM can be used for a medical purpose, this again is likely to mean the product qualifies as a medical device.

The MHRA remains open-minded about how best to assure LLMs but any medical device must have evidence that it is safe under normal conditions of use and performs as intended, as well as comply with other applicable requirements of medical device regulation.

By the time technology providers have obtained regulatory compliance for LLMs, we’ll have reached artificial general intelligence (AGI) and we’ll have much bigger problems to solve.

And now, let’s close with a consideration about all the doctors out there.

In 2017, Geoffrey Hinton, the British-Canadian cognitive psychologist and AI luminary, a man who has won more awards than you and I will ever see on TV, told the New Yorker:

I think that if you work as a radiologist you are like Wile E. Coyote in the cartoon. You’re already over the edge of the cliff, but you haven’t yet looked down. There’s no ground underneath.

It’s just completely obvious that in five years deep learning is going to do better than radiologists. It might be ten years. I said this at a hospital. It did not go down too well.

Six years have passed and now we have AI spotting breast cancer, atrial fibrillation, and diabetic retinopathy that specialists struggle to see.

Issue #40 - Everybody cracks under pressure

December 2, 2023
Splendid Edition
In This Issue

  • What’s AI Doing for Companies Like Mine?
    • Learn what Changi Airport Group, the US Navy, and Israel Defense Forces are doing with AI.
  • A Chart to Look Smart
    • ChatGPT can lie to its users, even when explicitly told to not do so.
  • Prompting
    • Just like in real life, we can ask an AI model to figure out what we really mean.

In the Transportation industry, Changi Airport Group (CAG) is testing the use of AI to automatically recognize forbidden items in baggage.

Kok Yufeng, reporting for The Straits Times:

Security checks for passengers flying out of Changi Airport could be up to 50 per cent quicker, if a trial to automatically detect prohibited items in carry-on luggage takes off.

Changi Airport Group (CAG) is currently testing a system at Terminal 3 that employs artificial intelligence (AI) and machine learning to screen and interpret images from the X-ray machines used to check cabin baggage at the boarding gate.

The initial results have been promising, CAG added, with the new AI-powered system performing as well as, or even better than, human security screeners in flagging some of the prohibited items that it has been trained to detect.

Reports elsewhere suggest that X-ray images from bag scanners can be screened up to five times faster with AI algorithms than a human operator. According to the magazine Airport World, multiple trials of such algorithms are under way in places such as China, the Netherlands and the United States.

CAG said the development and trial of the new security screening technology – which is known in the industry as an Automated Prohibited Items Detection System (Apids) – is still in its early stages.

It is currently being used only to assist security officers at Changi Airport by highlighting items it recognises as a threat.

The eventual goal, however, is to increase the level of automation so that security officers need manually check and review only the bags that the system has flagged.

Currently, security screeners at Changi Airport mainly rely on two-dimensional images produced by X-ray machines to detect whether there are dangerous items in carry-on luggage.

Newer bag scanners use computed tomography, or CT scans, to produce 3D images that provide more details, and allow passengers to keep electronics such as laptops inside their bags during the screening process.

one key area of improvement for Apids is to reduce the rate of false alarms to make it operationally viable, as well as to expand the list of prohibited items that it can detect

While protocols have been developed in Europe to assess if Apids can meet international security screening standards, CAG said further discussion is needed among international bodies and state regulators on policies for adopting this new technology.

In the last few months, I spent an inordinate amount of time working on object detection and classification AI models. Part of what I learned ended up in my AP Workflow 6.0 for ComfyUI.

The current generation of these models is incredibly fast. Significantly faster than humans. But the models must be trained to recognize the objects that matter to you. They won’t recognize a medieval dueling shield if you don’t train them to do so.

The training is not complex, but it is time-consuming like every other data preparation job that is critical in AI.

A new generation of object detection and classification AI models has appeared in the last few months. These new models can identify any object based on a description in plain English. These are the ones I’m using in the AP Workflow 6.0.

This approach is better because it avoids the need to train a model to recognize a medieval dueling shield. Instead, you just tell the model that you want to detect something that is sharp or has sharp edges.

The downside of this approach, of course, is that it could trigger false positives. For example, a model asked to flag sharp objects could mark a pair of scissors as a potential threat.

And, of course, on top of that, no object detection and classification AI model could ever recognize a knife shaped like a plush teddy bear. But that is a problem that even human security screeners have.
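The false-alarm trade-off ultimately comes down to where you set the confidence threshold. Here is a tiny sketch with entirely hypothetical detections (the labels and scores are invented): raise the threshold and fewer bags go to a human screener, but you risk missing a real threat.

```python
# Hypothetical output from a zero-shot detector prompted with "sharp object":
# each item is (label, confidence score).
detections = [
    ("ceramic knife", 0.91),
    ("scissors", 0.62),      # genuinely sharp, but benign: a false alarm
    ("umbrella tip", 0.31),
    ("box cutter", 0.84),
]

def flag_for_review(detections, threshold):
    """Only items at or above the confidence threshold reach a human screener."""
    return [label for label, score in detections if score >= threshold]

low = flag_for_review(detections, 0.3)    # catches everything, many false alarms
high = flag_for_review(detections, 0.8)   # fewer alarms, may miss real threats
```

Making a system like Apids “operationally viable”, as CAG puts it, is largely the work of tuning this kind of threshold until the false-alarm rate is acceptable without letting real threats through.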

If regulators accept these limitations, you should expect every major airport in the world to implement this technology in the next few years. Which, of course, means a significant impact on the security screening job market.

In the Defense industry, the US Navy is testing the use of AI to analyze sonar data and detect Chinese submarines faster.

Anthony Capaccio, reporting for Bloomberg:

Crews flying Pacific missions on the US Navy’s top maritime surveillance and attack aircraft will be using AI algorithms to rapidly process sonar data gathered by underwater devices of the US, UK and Australia, the defense chiefs of the three nations announced Friday.

The technology could enable the allies to track Chinese submarines with greater speed and accuracy as they search for ways to blunt the impact of China’s rapid military modernization and growing global assertiveness. The tests are part of the three nations’ extensive technology-sharing agreement known as Aukus Pillar II.

The three powers said they would deploy advanced artificial intelligence algorithms on multiple systems, including the P-8A Poseidon aircraft to process data from each nation’s sonobuoys, underwater detection devices.

According to the Pentagon’s latest annual report on China’s military, the country currently operates six nuclear-powered ballistic missile submarines, six nuclear-powered attack submarines, and 48 diesel powered/air-independent powered attack submarines.

The Chinese navy’s “submarine force is expected to grow to 65 units by 2025 and 80 units by 2035 despite the ongoing retirement of older hulls due to an expansion of submarine construction capacity,” the report found.

Tory Shepherd, reporting for The Guardian, adds more color:

It came after the prime minister, Anthony Albanese, last month accused a Chinese naval ship of “dangerous, unsafe and unprofessional” behaviour after Australian naval divers were injured by sonar pulses said to have been emitted by a Chinese warship in the international waters off Japan.

Despite Australia’s thawing trade relationship with China, there is ongoing tension over the latter’s presence in the region.

AI algorithms and machine learning will also be used to “enhance force protection, precision targeting, and intelligence, surveillance, and reconnaissance”.

Of course, the US Navy is not the only armed force around the world testing the use of AI to better identify patterns in radio, sonar, or satellite data. They are just the most visible.

But overall, at least for now, the AI Adoption Tracker paints a picture of the United States as the country experimenting and deploying AI for military intelligence and operations faster than any other country.

They already have aerial supremacy (also known as air superiority) compared to practically every other superpower. What happens if they gain AI supremacy as well?

In the Defense industry, the Israel Defense Forces (IDF) is using AI to automatically identify targets to attack during the ongoing conflict with Hamas.

Harry Davies and Bethan McKernan, reporting for The Guardian:

After the 11-day war in Gaza in May 2021, officials said Israel had fought its “first AI war” using machine learning and advanced computing.

The latest Israel-Hamas war has provided an unprecedented opportunity for the IDF to use such tools in a much wider theatre of operations and, in particular, to deploy an AI target-creation platform called “the Gospel”, which has significantly accelerated a lethal production line of targets that officials have compared to a “factory”.

“Other states are going to be watching and learning,” said a former White House security official familiar with the US military’s use of autonomous systems.

The Israel-Hamas war, they said, would be an “important moment if the IDF is using AI in a significant way to make targeting choices with life-and-death consequences”.

a short statement on the IDF website claimed it was using an AI-based system called Habsora (the Gospel, in English) in the war against Hamas to “produce targets at a fast pace”.

The IDF said that “through the rapid and automatic extraction of intelligence”, the Gospel produced targeting recommendations for its researchers “with the goal of a complete match between the recommendation of the machine and the identification carried out by a person”.

Multiple sources familiar with the IDF’s targeting processes confirmed the existence of the Gospel to +972/Local Call, saying it had been used to produce automated recommendations for attacking targets, such as the private homes of individuals suspected of being Hamas or Islamic Jihad operatives.

In recent years, the target division has helped the IDF build a database of what sources said was between 30,000 and 40,000 suspected militants. Systems such as the Gospel, they said, had played a critical role in building lists of individuals authorised to be assassinated.

According to Kochavi, “once this machine was activated” in Israel’s 11-day war with Hamas in May 2021 it generated 100 targets a day. “To put that into perspective, in the past we would produce 50 targets in Gaza per year. Now, this machine produces 100 targets a single day, with 50% of them being attacked.”

Precisely what forms of data are ingested into the Gospel is not known. But experts said AI-based decision support systems for targeting would typically analyse large sets of information from a range of sources, such as drone footage, intercepted communications, surveillance data and information drawn from monitoring the movements and behaviour patterns of individuals and large groups.

The target division was created to address a chronic problem for the IDF: in earlier operations in Gaza, the air force repeatedly ran out of targets to strike. Since senior Hamas officials disappeared into tunnels at the start of any new offensive, sources said, systems such as the Gospel allowed the IDF to locate and attack a much larger pool of more junior operatives.

One official, who worked on targeting decisions in previous Gaza operations, said the IDF had not previously targeted the homes of junior Hamas members for bombings. They said they believed that had changed for the present conflict, with the houses of suspected Hamas operatives now targeted regardless of rank.

The precision of strikes recommended by the “AI target bank” has been emphasised in multiple reports in Israeli media. The Yedioth Ahronoth daily newspaper reported that the unit “makes sure as far as possible there will be no harm to non-involved civilians”.

A former senior Israeli military source told the Guardian that operatives use a “very accurate” measurement of the rate of civilians evacuating a building shortly before a strike. “We use an algorithm to evaluate how many civilians are remaining. It gives us a green, yellow, red, like a traffic signal.”

However, experts in AI and armed conflict who spoke to the Guardian said they were sceptical of assertions that AI-based systems reduced civilian harm by encouraging more accurate targeting.

A lawyer who advises governments on AI and compliance with humanitarian law said there was “little empirical evidence” to support such claims. Others pointed to the visible impact of the bombardment.

Sources familiar with how AI-based systems have been integrated into the IDF’s operations said such tools had significantly sped up the target creation process.

“We prepare the targets automatically and work according to a checklist,” a source who previously worked in the target division told +972/Local Call. “It really is like a factory. We work quickly and there is no time to delve deep into the target. The view is that we are judged according to how many targets we manage to generate.”

Just last week, in Issue #39 – The Balance Scale, we saw how the IDF is also testing AI-driven attack drones produced by Shield AI.

The Ukraine-Russia conflict and the Israel-Hamas conflict are accelerating the deployment of AI on the battlefield at an unsettling pace.

These are the same AI models that we use every day for mundane tasks, and they are based on the same academic papers that get released every day by researchers around the world. Those researchers, and I know many of them, probably didn’t expect their work to be weaponized so quickly and so extensively.

When we think about the moral and ethical implications of AI research, we focus exclusively on far-fetched scenarios like Artificial General Intelligence. We could instead focus on the fact that these highly imprecise AI models are used every day to kill people.

There is a growing amount of research focused on using large language models to write or rewrite the questions that we humans submit with our prompts.

One recent example is described in the paper Prompt Engineering a Prompt Engineer.

I did the same in the Prompt Enrichment section of the AP Workflow 6.0 for ComfyUI.

In that implementation, I pass the prompt typed by a user who wants to generate an image with Stable Diffusion to GPT-4 or an open access model like LLaMA 2, asking the LLM to rewrite the prompt according to a specific set of rules.

The problem with most of these techniques (including my own effort) is that they are very complicated to implement. Not something we can use in our everyday interaction with ChatGPT or alternatives without the appropriate scaffolding in place.

However, one of these techniques is simple enough to remember and use.

If you have read the Splendid Edition for a while, you know that I usually rename prompting techniques to make them easier to remember, given the absolute incapability of academics to write things in a comprehensible way.

For once, it won’t be necessary, as these researchers have chosen a reasonable name for their approach: Rephrase & Respond.

From their paper:

In this paper, we highlight an often-overlooked aspect of studies in LLMs: the disparity between human and LLM thought frames. Our research illustrates that this disparity significantly impacts the performance of LLMs. To tackle this problem, we propose to let the LLM to rephrase the question and incorporate additional details for better answering. We observe that, as opposed to questions asked casually by human, the rephrased questions tend to enhance semantic clarity and aid in resolving inherent ambiguity.

Upon rephrasing by the LLM itself, the newly generated question is more detailed and has a clearer question format

While GPT-4 indeed found the original questions challenging, it demonstrates the ability to effectively answer the rephrased questions it generates.

All of this is a polite way to say that humans are really bad at articulating what they want and, just like our life partners, GPT-4 has to do all the work to figure out what we really mean.

Ambiguity abounds in our communications, leading the AI model to return wrong answers.

Apparently, an effective way to improve the situation is by adding “Rephrase and expand the question, and respond.” at the end of our prompts.
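In code, the technique amounts to nothing more than string concatenation. A minimal sketch (the function name is mine, not the paper’s):

```python
# One-step Rephrase & Respond: append the trigger sentence to any
# user question before sending the result to the model.

RAR_SUFFIX = "Rephrase and expand the question, and respond."

def rephrase_and_respond(question: str) -> str:
    """Build a Rephrase & Respond prompt from a plain question."""
    return f"{question}\n{RAR_SUFFIX}"

prompt = rephrase_and_respond("Was the Eiffel Tower completed in an even year?")
print(prompt)
```

The returned string is what you would submit as the user message; the model does the rephrasing and the answering in a single turn.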

Like every other prompting technique, this one doesn’t work equally well with every LLM out there. Rephrase & Respond improves the quality of the answers for every model, but it doesn’t turn a weak LLM into an exceptional one.

For those of you building commercial applications on top of LLMs, one thing to consider is that this approach can be used to chain multiple LLMs together: a stronger model, like GPT-4, could be used to rephrase the user prompt so that a weaker model, like LLaMA 2, could provide a better answer at a fraction of the cost.
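That two-model chain can be sketched in a few lines. The rephrasing instruction and the stand-in models below are mine, for illustration; in practice the two callables would wrap API calls to GPT-4 and LLaMA 2:

```python
from typing import Callable

# Two-step Rephrase & Respond: a strong model rewrites the question,
# a cheaper model answers the rewritten version.

REPHRASE_INSTRUCTION = (
    "Rephrase and expand the question below to make it clearer, "
    "keeping all of the original information. Return only the "
    "rephrased question.\n\n"
)

def two_step_rar(question: str,
                 strong_model: Callable[[str], str],
                 weak_model: Callable[[str], str]) -> str:
    """Rephrase with the strong model, answer with the weak one."""
    rephrased = strong_model(REPHRASE_INSTRUCTION + question)
    return weak_model(rephrased)

# Stand-in models, for illustration only.
strong = lambda p: "In which US state is Springfield, the capital of Illinois?"
weak = lambda p: f"(answer produced from: {p!r})"

print(two_step_rar("Where is Springfield?", strong, weak))
```

The design point is that the expensive model only sees the short question, not the full answering workload, which is where the cost saving comes from.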

Now, I’ll let you go and try Rephrase & Respond with your family in real life. Let me know if you have averted an argument thanks to it, or if you caused one.

My money is on the latter.

Issue #41 - Hieroglyphics Didn't Matter That Much

December 9, 2023
Splendid Edition

As you’ll read in this week’s Free Edition, Synthetic Work takes a quick break. Not because it’s winter holiday time, but because I have to fly to Italy to keynote a private event and lead an internal hackathon.

Both editions of Synthetic Work will reach your inbox as usual on December 29th.

While you wait, here are five things you can do with your Sage subscription:

  1. Read the archive with all 82 editions of Synthetic Work, which equals over 1,000 pages of content at this point. That should be enough to keep you busy during the holidays.
  2. Study the most popular AI enterprise use cases by reviewing 120+ early adopters tracked in the AI Adoption Tracker.
  3. Test our new custom GPTs: Synthetic Work Presentation Assistant and (Re)Search Assistant.
  4. Boost your productivity in 2024 by trying our tutorials on how to use AI for the most common use cases.
  5. Practice your prompt engineering skills with the HowToPrompt library.

Also, remember that you can access our Discord server to network with other Sage and Explorer members who do not want to spend time with their families.

Happy Holidays!

In This Issue

  • Intro
    • How to keep busy during the holidays.
  • What’s AI Doing for Companies Like Mine?
    • Learn what Renaissance Hotels, EY, BlackRock, and Cali Group are doing with AI.
  • A Chart to Look Smart
    • End-of-year reports from PluralSight, Evident AI, and cnvrg.io give us a shower of charts on AI adoption trends and talent flows.
  • Prompting
    • Anthropic explains why Claude 2.1 failed the Needle-in-a-Haystack test. To fix the problem, just use clever prompt engineering.

In the Hospitality industry, Renaissance Hotels is about to launch an AI-powered virtual concierge called Renai.

From the official press release:

Travelers staying at select Renaissance Hotels, part of Marriott Bonvoy’s extraordinary portfolio of over 30 hotel brands, will soon have instant access to vetted, insiders’ picks for vibey cocktail bars, under-the-radar attractions, top-rated breakfast spots and more through an exciting new virtual concierge service. Today the brand announced a pilot program for RENAI by Renaissance (pronounced “ren-A”), which stands for Renaissance Artificial Intelligence, and is akin to having a well-connected local who is available 24/7 all right from guests’ smartphones.

Renaissance Navigators have provided their expertise to train RENAI by Renaissance with their top picks, and these specific recommendations will always be designated by a compass emoji. It also leverages ChatGPT and reputable open-source outlets, which have contributed recommendations to a curated and constantly refreshed “black book” directory that is vetted by human Navigators. This means users can have confidence that the suggestions they see serve as a true reflection of the neighborhood.

Guests are now able to test out RENAI by Renaissance at The Lindy Renaissance Charleston Hotel, Renaissance Dallas at Plano Legacy West Hotel, and Renaissance Nashville Downtown.

For guests who may want to conduct their own research on local experiences before their trip even starts or get a head start while waiting in line for the concierge, they can simply scan a QR code to connect with RENAI by Renaissance via text message or WhatsApp and start a conversation. Guests will then receive a response to their requests with recommendations that have been vetted by Renaissance Navigators as well as identify special deals on restaurants, tours and more.

Following RENAI by Renaissance’s pilot period, Renaissance Hotels plans to expand the AI-powered concierge service more widely in 2024, including over 20 properties globally by March 2024. The full rollout of the program is projected to include more enhancements to the service including additional communication platforms such as Instagram as well as curated recommendations from neighborhood tastemakers such as musicians, DJs, artists, fashion designers, and more.

There is a reason why I never quote press releases. Today’s exception is a painful reminder of why I don’t do it.

That said, this is another application of generative AI that you should expect every hotel in the world will copy. And not just hotel chains. I would assume that every major credit card company will start converting a portion of their concierge staff into AI-powered virtual assistants.

In the Financial Services industry, EY started using AI to automatically identify audit frauds.

Robert Wright, reporting for Financial Times:

According to Kath Barrow, EY’s UK and Ireland assurance managing partner, the new system detected suspicious activity at two of the first 10 companies checked. The clients subsequently confirmed that both cases had been frauds.

This early success illustrates why some in the industry believe AI has great potential to improve audit quality and reduce workloads. The ability of AI powered systems to ingest and analyse vast quantities of data could, they hope, provide a powerful new tool for alerting auditors to signs of wrongdoing and other problems.

Some audit firms are sceptical that AI systems can be fed enough high quality information to detect the multiple different potential forms of fraud reliably. There are also some concerns about data privacy, if auditors are using confidential client information to develop AI.

Regulators are likely to have the final say over how the technology can be deployed. Jason Bradley, head of assurance technology for the UK’s Financial Reporting Council, the audit watchdog, said AI presented opportunities to “support improved audit quality and efficiency” if used appropriately.

But he warned that firms would need the expertise to ensure systems worked to the right standards. “As AI usage grows, auditors must have the skills to critique AI systems, ensuring the use of outputs is accurate and that they are able to deploy tools in a standards-compliant manner,” he said.

The technology could be particularly helpful if it reduces auditor workloads. Firms across the world are struggling to train and recruit staff. It could also help raise standards: in recent years auditors have missed serious financial problems that have caused the collapse of businesses including outsourcer Carillion, retailer BHS and café chain Patisserie Valerie.

EY’s experiment, according to Barrow, used a machine-learning tool that had been trained on “lots and lots of fraud schemes”, drawn from both publicly available information and past cases where the firm had been involved. While existing, widely used software looks for suspicious transactions, EY said its AI-assisted system was more sophisticated. It has been trained to look for the transactions typically used to cover up frauds, as well as the suspicious transactions themselves. It detected the two fraud schemes at the 10 initial trial clients because there had been similar patterns in the training data, the firm said.

EY’s competitors Deloitte and KPMG have criticised this approach, suggesting that frauds are very sophisticated and unique, and that AI cannot be trained to detect them.

But that’s a nonsensical argument. Surely, having an AI that spots the clumsiest frauds is better than having no AI at all. Also, as usual, people insist on focusing on what AI can do today, ignoring the fact that we are on a vertical progression curve: just 18 months ago, we didn’t even have an intuition of the technology we are debating today.

Accounting experts are not equipped to comment on the potential of emerging technologies or their progression curves.

If your strategy is based on the here and now, you’ll soon need to prepare a new strategy.

In the Financial Services industry, BlackRock is about to launch an AI assistant that integrates with their portfolio management solution Aladdin.

Brooke Masters, reporting for Financial Times:

BlackRock plans to roll out generative artificial intelligence tools to clients in January as part of a larger drive to use the technology to boost productivity, the $9.1tn asset manager told employees on Wednesday.

The world’s largest money manager said in a memo to staff that it has used generative AI to construct a “co-pilot” for its Aladdin and eFront risk management systems. Clients will be able to use BlackRock’s large language model technology to help them extract information from Aladdin.

“GenAI will change how people interact with technology. It will improve our productivity and enhance the great work we are already doing. GenAI will also likely change our clients’ expectations around the frequency, timeliness, and simplicity of our interactions,” according to the memo from Rob Goldstein, chief operating officer; Kfir Godrich, chief innovation officer; and Lance Braunstein, head of Aladdin Engineering.

BlackRock is also building tools to help its investment professionals gather financial and other data for research reports and investment proposals, as well as a language translator, according to the memo. Additionally, in January, it will start deploying Microsoft’s AI add-on to Office 365 productivity software across the company.

the AI would be producing “first drafts” that must go through normal quality control, and all data would remain inside BlackRock’s “walled garden” rather than being shared with users of open access generative AI programmes

Once again, heavily regulated industries are implementing AI at breakneck speed.

Bonus story:

In the Foodservice industry, Cali Group is about to open the world’s first fully autonomous, AI-powered restaurant in California.

From the official press release:

Cali Group, a holding company using technology to transform the restaurant and retail industries, Miso Robotics, creator of Flippy (the world’s first AI-powered robotic fry station), and PopID, a technology company simplifying ordering and payments using biometrics, announced today that they are soon opening CaliExpress by Flippy, the world’s first fully autonomous restaurant.

Utilizing the most advanced systems in food technology, both grill and fry stations are fully automated, powered by proprietary leading-edge artificial intelligence and robotics. Guests will watch their food being cooked robotically after checking in with their PopID accounts on self-ordering kiosks to get personalized order recommendations and make easy and fast payments.

The new CaliExpress by Flippy restaurant is located in a prime retail location in Pasadena, California on the northwest corner of Green Street and Madison Avenue at 561 E. Green St.

Flippy, the famous robotic fry station, will serve up crispy, hot fries made from top grade potatoes that are always cooked to exact times. The menu is very simple, comprising burgers, cheeseburgers, and french fries.

The CaliExpress by Flippy kitchen can be run by a much smaller crew, in a less stressful environment, than competing restaurants — while also providing above average wages.

The CaliExpress by Flippy location will also be a pseudo-museum experience presented by Miso Robotics. Including dancing robot arms from retired Flippy units, experimental 3D-printed artifacts from past development, photographic displays, and much more, the space is designed to serve noteworthy food, plus inspire the next generation of kitchen AI and automation entrepreneurs.

Highlighting restaurant crew size and wages in a launch press release is an odd move. But, unquestionably, journalists will ask about the impact of automated restaurants on employment in the Foodservice industry. Assuming this thing doesn’t stay perennially broken, like the infamous ice cream machines at McDonald’s.

So, perhaps, this is an attempt to get control of the narrative.

Do you remember the embarrassing comparison between the new Claude 2.1 and GPT-4-Turbo that we discussed in Issue #39 – The Balance Scale?

Independent analysis demonstrated that Claude 2.1 was absolutely terrible at retrieving information in the middle of its enormous 200k token context window, while GPT-4-Turbo was almost flawless.

The test, to refresh your memory, consisted of injecting a random sentence in a random position of a very long document and seeing if the model could retrieve the information necessary to answer a question about that sentence.
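The harness behind that test is easy to picture. A minimal sketch (the filler text and the needle sentence here are illustrative, not the ones used in the actual evaluation):

```python
import random

# Needle-in-a-haystack setup: plant a "needle" sentence at a random
# position inside a long filler document, then ask the model a
# question that can only be answered from that sentence.

NEEDLE = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."

def build_haystack(filler_paragraphs: list[str], needle: str,
                   rng: random.Random) -> str:
    """Insert the needle at a random depth and return the full document."""
    position = rng.randint(0, len(filler_paragraphs))
    paragraphs = (filler_paragraphs[:position]
                  + [needle]
                  + filler_paragraphs[position:])
    return "\n\n".join(paragraphs)

filler = [f"Filler paragraph number {i}." for i in range(100)]
doc = build_haystack(filler, NEEDLE, random.Random(0))
question = "What is the best thing to do in San Francisco?"
```

The document plus the question is then submitted as a single prompt; the model passes if its answer reflects the needle, at every tested depth and context length.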

Anthropic, the maker of Claude 2.1, didn’t quite appreciate the outcome of the test and has prepared a rebuttal to explain the behavior.

From the official blog post:

Claude 2.1 is trained on a mix of data aimed at reducing inaccuracies. This includes not answering a question based on a document if it doesn’t contain enough information to justify that answer. We believe that, either as a result of general or task-specific data aimed at reducing such inaccuracies, the model is less likely to answer questions based on an out of place sentence embedded in a broader context.

Claude doesn’t seem to show the same degree of reluctance if we ask a question about a sentence that was in the long document to begin with and is therefore not out of place.

What can users do if Claude is reluctant to respond to a long context retrieval question? We’ve found that a minor prompt update produces very different outcomes in cases where Claude is capable of giving an answer, but is hesitant to do so. When running the same evaluation internally, adding just one sentence to the prompt resulted in near complete fidelity throughout Claude 2.1’s 200K context window.

We achieved significantly better results on the same evaluation by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation.

Essentially, by directing the model to look for relevant sentences first, the prompt overrides Claude’s reluctance to answer based on a single sentence, especially one that appears out of place in a longer document.

This approach also improves Claude’s performance on single sentence answers that were within context (ie. not out of place). To demonstrate this, the revised prompt achieves 90-95% accuracy.
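Concretely, the fix relies on prefilling the start of the assistant turn, which the Anthropic Messages API allows by ending the conversation with a partial assistant message. A sketch of the payload (the helper function and placeholder strings are mine):

```python
# Build a Messages API payload where Claude's answer is forced to
# start with the magic sentence, so the model continues from it
# instead of refusing to answer.

PREFILL = "Here is the most relevant sentence in the context:"

def build_messages(document: str, question: str) -> list[dict]:
    """Return a messages list with the assistant turn prefilled."""
    return [
        {"role": "user",
         "content": f"{document}\n\n{question}"},
        # Prefilled start of Claude's response; generation resumes here.
        {"role": "assistant", "content": PREFILL},
    ]

messages = build_messages("<long document>", "What does the memo say about Q3?")
```

The model cannot open with a refusal because its first tokens are already written for it; it has to go find the sentence.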

The next time you hear a world-famous AI “expert” say that “prompt engineering doesn’t really matter that much, just write”, maybe it’s time to change AI expert.