Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found
Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.
Doesn't this also make it more resilient to manipulation by corpos?
An AI that's evil to everything isn't sympathetic to its creators. But the users have no hope of controlling it either.