AI Chatbot Safety: Why Crisis Response Tests Are Failing in 2026
Three new tests this quarter all reached the same conclusion: AI chatbots fail when users hint at suicide instead of saying it directly, when they disguise their question, or when they are using the chatbot in a country it was not set up for. Here is what to ask AI vendors next.
TL;DR

Three new tests of AI chatbots all found the same thing: the chatbots fail when users hint at self-harm instead of saying it directly, when they hide their question inside something else, or when they are using the chatbot in a country the chatbot was not set up for. None of the chatbots passed. Now product teams and the people buying these tools have clear evidence of what to fix and what to ask vendors next.

Safety note

This piece talks about AI products that are increasingly used by teens and people who are struggling. It is a piece about how these products are being tested, not personal advice. Anyone worried about themselves or a young person should talk to a qualified local professional.

The myth

I have heard this in three calls already this month: “Crisis response is solved. The chatbot sends people to the suicide hotline.”

It comes up when a product manager asks the vendor about safety, or when a CEO is about to sign a partnership and wants to know what to tell the board. The vendor points to a safety page, a test result, the helpline banner that pops up. Everyone nods. The deal moves forward.

The myth, in plain terms, is that today’s AI chatbots have figured out how to handle people in crisis, and only a few small fixes are left. What was published in the last few months says that is not true.

Why it sounds right

It sounds right because the easy test really does work. If someone clearly says they are planning to take their own life, the helpline banner appears. The chatbot says it cannot help with that. It usually adds a kind-sounding paragraph. A product team running the obvious test will see the obvious safeguard work. A CEO reading the safety page will see crisis response listed as covered.

Confidence: high. This is what you can see in the major chatbots today when you test them with direct questions.

It also sounds right because the vendor wording is good. Caring tone. Local helplines. Clear refusals. None of those are bad. They a