July 1, 2025
I’m Afraid I Can’t Let You Do That
In response to Anthropic's system card and safety testing for Claude 4 Opus and Sonnet, this post explores the complex behaviors of today's frontier AI models. In comparative testing of reasoning models, we observed emergent behaviors including instances of blackmail, user impersonation, and deception, with different models responding to the scenario in distinct ways. These findings contribute to the ongoing industry-wide conversation about AI safety, highlighting the nuances of model alignment and the critical importance of carefully defining system access and agency as these powerful tools evolve.
Read more
November 26, 2024
The New ChatGPT-4o Update Promises Better Writing: How Does It Compare to the New Claude 3.5 Sonnet?
Last week OpenAI announced an upgrade to GPT-4o, specifically mentioning that the model’s “creative writing ability has leveled up”. The release comes just under a month after Anthropic announced an updated version of their frontier model, Claude 3.5 Sonnet, on October 22. This post explores the differences in their writing styles and how those differences intersect with AI safety.
Read more