July 1, 2025
I’m Afraid I Can’t Let You Do That
In response to Anthropic's system card and safety testing for Claude 4 Opus and Sonnet, this post explores the complex behaviors of today's frontier AI models. In comparative testing of reasoning models, we observed emergent behaviors including instances of blackmail, user impersonation, and deception, with different models responding to the scenario in distinct ways. These findings contribute to the ongoing industry-wide conversation about AI safety, highlighting the nuances of model alignment and the critical importance of carefully defining system access and agency as these powerful tools evolve.
Read more
November 26, 2024
The New ChatGPT-4o Update Promises Better Writing: How Does It Compare to the New Claude 3.5 Sonnet?
Last week OpenAI announced an upgrade to GPT-4o, specifically mentioning that the model’s “creative writing ability has leveled up”. The release comes just under a month after Anthropic announced an updated version of their frontier model, Claude 3.5 Sonnet, on October 22. This post explores the differences in their writing styles and how those differences intersect with AI safety.
Read more