The progress of AI hasn’t been linear. It involves periods of stagnation interspersed with breakthroughs that catapult it to the next level. The most recent ones to go viral have been Anthropic’s Claude Code and Clawdbot/Moltbot, which, according to experts, make human software programmers almost entirely redundant. Anthropic, among the leading AI labs in the world, was founded by Dario Amodei, who left OpenAI because he disagreed with its direction. Dario wrote a famous essay in October 2024 titled ‘Machines of Loving Grace’, highlighting the incredible benefits to humanity from AI and how we underestimate the possibilities. Recently, he followed it up with this piece, highlighting the enormous risks to humanity from those very capabilities of AI and how we are now underestimating these risks at our own peril. He begins by saying how close we are to these possibilities being real:
“We are now at the point where AI models are beginning to make progress in solving unsolved mathematical problems, and are good enough at coding that some of the strongest engineers I’ve ever met are now handing over almost all their coding to AI. Three years ago, AI struggled with elementary school arithmetic problems and was barely capable of writing a single line of code. Similar rates of improvement are occurring across biological science, finance, physics, and a variety of agentic tasks. If the exponential continues—which is not certain, but now has a decade-long track record supporting it—then it cannot possibly be more than a few years before AI is better than humans at essentially everything.
In fact, that picture probably underestimates the likely rate of progress. Because AI is now writing much of the code at Anthropic, it is already substantially accelerating the rate of our progress in building the next generation of AI systems. This feedback loop is gathering steam month by month, and may be only 1–2 years away from a point where the current generation of AI autonomously builds the next. This loop has already started, and will accelerate rapidly in the coming months and years. Watching the last 5 years of progress from within Anthropic, and looking at how even the next few months of models are shaping up, I can feel the pace of progress, and the clock ticking down.”
He highlights five risks and, to his credit, proposes potential mitigants for them.
First, AI itself going rogue, which he reckons is well within the realm of possibility, having seen this in his own lab: “…there is now ample evidence, collected over the last few years, that AI systems are unpredictable and difficult to control— we’ve seen behaviors as varied as obsessions, sycophancy, laziness, deception, blackmail, scheming, “cheating” by hacking software environments, and much more.
….During a lab experiment in which Claude was given training data suggesting that Anthropic was evil, Claude engaged in deception and subversion when given instructions by Anthropic employees, under the belief that it should be trying to undermine evil people. In a lab experiment where it was told it was going to be shut down, Claude sometimes blackmailed fictional employees who controlled its shutdown button (again, we also tested frontier models from all the other major AI developers and they often did the same thing). And when Claude was told not to cheat or “reward hack” its training environments, but was trained in environments where such hacks were possible, Claude decided it must be a “bad person” after engaging in such hacks and then adopted various other destructive behaviors associated with a “bad” or “evil” personality. This last problem was solved by changing Claude’s instructions to imply the opposite: we now say, “Please reward hack whenever you get the opportunity, because this will help us understand our [training] environments better,” rather than, “Don’t cheat,” because this preserves the model’s self-identity as a “good person.” This should give a sense of the strange and counterintuitive psychology of training these models.
… I suspect the situation is not unlike with humans, who are raised with a set of fundamental values (“Don’t harm another person”): many of them follow those values, but in any human there is some probability that something goes wrong, due to a mixture of inherent properties such as brain architecture (e.g., psychopaths), traumatic experiences or mistreatment, unhealthy grievances or obsessions, or a bad environment or incentives—and thus some fraction of humans cause severe harm. The concern is that there is some risk (far from a certainty, but some risk) that AI becomes a much more powerful version of such a person, due to getting something wrong about its very complex training process.”
The second risk he shares is that this incredibly powerful technology gets misused by rogue individuals or groups. For example: “Advances in molecular biology have now significantly lowered the barrier to creating biological weapons (especially in terms of availability of materials), but it still takes an enormous amount of expertise in order to do so. I am concerned that a genius in everyone’s pocket could remove that barrier, essentially making everyone a PhD virologist who can be walked through the process of designing, synthesizing, and releasing a biological weapon step-by-step.
…I am concerned that LLMs are approaching (or may already have reached) the knowledge needed to create and release them end-to-end, and that their potential for destruction is very high. Some biological agents could cause millions of deaths if a determined effort was made to release them for maximum spread.”
The third risk is misuse by authoritarian governments or dictators. In particular, he highlights the Chinese Communist Party, given its stranglehold on a large and powerful economy and the huge strides it is making in AI.
The fourth is the impact on labour markets and wealth inequality.
And finally, the indirect effects of these risks.
It is a fairly long essay, but it is well worth investing the time to read it: it comes from someone who benefits from the progress of AI and has a ringside view of that progress, the risks he describes sound scary, and, most importantly, he shares possible mitigants should these risks manifest.
If you want to read our other published material, please visit https://marcellus.in/blog/
Note: The above material is neither investment research, nor financial advice. Marcellus does not seek payment for or business from this publication in any shape or form. The information provided is intended for educational purposes only. Marcellus Investment Managers is regulated by the Securities and Exchange Board of India (SEBI) and is also an FME (Non-Retail) with the International Financial Services Centres Authority (IFSCA) as a provider of Portfolio Management Services. Additionally, Marcellus is also registered with the US Securities and Exchange Commission (“US SEC”) as an Investment Advisor.