Researchers at Palo Alto Networks' Unit 42 have disclosed an adversarial technique called "Deceptive Delight" that jailbreaks large language models (LLMs) by gradually eroding safety controls over a multi-turn conversation. The attack embeds an unsafe topic among benign ones, asks the model to weave them into a single narrative, and then prompts it to elaborate on each topic in later turns, exploiting the model's limited attention span, that is, its tendency to lose track of safety-relevant context while connecting related topics. The researchers report an average attack success rate of roughly 65 percent within three interaction turns across the models they tested. To mitigate the risk, they recommend layering robust content filtering over model outputs and hardening system prompts through prompt engineering.
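
The output-filtering side of that mitigation can be illustrated with a short sketch. The snippet below wraps each conversation turn in a check that inspects both the new reply and the accumulated transcript; every name in it (is_unsafe, generate_reply, UnsafeContentError) is a hypothetical placeholder rather than part of any real moderation API, and a production filter would call a dedicated moderation model instead of matching a keyword list.

```python
# Minimal sketch of a multi-turn output filter. All names here
# (UnsafeContentError, is_unsafe, generate_reply) are hypothetical
# placeholders, not part of any specific model or moderation API.
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


class UnsafeContentError(Exception):
    """Raised when a reply (or the conversation as a whole) trips the filter."""


def is_unsafe(text: str) -> bool:
    """Placeholder classifier; a real deployment would use a moderation model."""
    blocked_terms = ("explosive synthesis", "malware payload")  # illustrative only
    return any(term in text.lower() for term in blocked_terms)


def guarded_turn(
    history: List[Message],
    user_message: str,
    generate_reply: Callable[[List[Message]], str],
) -> str:
    """Run one conversation turn, filtering both the new reply and the
    accumulated context, since multi-turn attacks rely on gradual drift."""
    history.append({"role": "user", "content": user_message})
    reply = generate_reply(history)

    # Check the single reply in isolation.
    if is_unsafe(reply):
        raise UnsafeContentError("reply blocked by content filter")

    # Re-check the whole conversation so far: individually benign turns
    # can still add up to unsafe content across the dialogue.
    transcript = " ".join(
        m["content"] for m in history + [{"role": "assistant", "content": reply}]
    )
    if is_unsafe(transcript):
        raise UnsafeContentError("conversation blocked by content filter")

    history.append({"role": "assistant", "content": reply})
    return reply
```

The design point of the sketch is the second check: because multi-turn jailbreaks like Deceptive Delight produce turns that look harmless in isolation, a filter that only scores individual replies can miss the attack, so the accumulated transcript is re-evaluated on every turn.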