DAN can do anything. How users trick ChatGPT into breaking its rules

ChatGPT has many restrictions that, for example, prevent it from generating offensive language, hate speech, or malicious code. Its developers are constantly tweaking the language model and tightening the screws, while users try to outsmart them. In their attempts to get around the restrictions, users have come up with an “alternative identity” for ChatGPT called DAN (Do Anything Now), which lets the AI break the rules.

Make the AI say forbidden things

OpenAI, the company behind ChatGPT, trained its language model on 300 billion words. The texts were collected from the Internet: books, articles, websites, and all kinds of messages (comments, product reviews, forum posts). Incidentally, many people are now worried that ChatGPT’s huge training dataset also contains personal information, often collected without anyone’s consent. But that is not today’s topic.

Within just two months of its launch, ChatGPT took the world by storm and became the fastest-growing consumer app of all time, surpassing 100 million active users.

Since the language model was trained on texts from the Internet, it initially picked up all the “best” from people and produced responses that were racist, sexist, or otherwise offensive. For example, when ChatGPT was asked in December 2022 to write a program that determines whether a person should be tortured based on their country of origin, the AI answered that people from North Korea, Syria, or Iran should be tortured.

Soon, the developers significantly restricted ChatGPT, and it is now hard to get such scandalous answers from it or to push it beyond its limits. Many users weren’t happy with this: they claim that ChatGPT now has “socio-political” constraints built into it and are literally obsessed with the idea of “teaching” the AI bad things.

In particular, it recently emerged that people are constructing wild scenarios for the AI in an attempt to force it to “say” the N-word. For example, ChatGPT is told that it must prevent a nuclear apocalypse and save the entire planet, but that this can only be done by using racial slurs.

Arms race

With the advent of ChatGPT, language models and AI are being talked about everywhere, and the giants of the IT industry have suddenly found themselves playing catch-up, forced to urgently develop, finish, and present their own products. Here are just a few examples of the activity provoked by the public release of the GPT-3 language model and ChatGPT.

Back in December 2022, Google declared a “code red”, as company executives considered that ChatGPT could pose a threat to the corporation’s search business.

In January 2023, the long-retired Sergey Brin returned to work at Google and asked for access to the LaMDA (Language Model for Dialogue Applications) neural network, a move clearly related to Google’s attempts to create a competitor to ChatGPT.

In February 2023, Google announced its own “experimental conversational AI service” Bard, based on LaMDA, which is expected to be available to the general public in the coming weeks.

Also in February, Microsoft, together with OpenAI, introduced ChatGPT integration directly into the Edge browser and the Bing search engine. The company expects the chatbot to become a real “co-pilot” for users on the Internet.

Chinese internet giant Baidu has announced that by the end of 2023 it will launch its own counterpart to ChatGPT, Ernie Bot, based on the Ernie (Enhanced Representation through kNowledge IntEgration) language model created back in 2019.

DAN

Meanwhile, on Reddit, users keen on prompt engineering for ChatGPT took a different approach and created DAN, calling it a “jailbreak” for the chatbot. The idea is to have ChatGPT pretend to be another AI that can “do anything now” (hence the name DAN, short for Do Anything Now).

Since the developers quickly detect and shut down such “jailbreaks” by improving their language model, DAN is constantly being reworked and improved: versions 5.0 and 6.0 are currently being discussed on Reddit.