‘Weak-to-Strong JailBreaking Attack’: An Efficient AI Method to Attack Aligned LLMs to Produce Harmful Text
Well-known Large Language Models (LLMs) like ChatGPT and Llama have recently advanced and shown incredible performance in a number of Artificial Intelligence (AI) applications. Though these models have demonstrated capabilities in tasks like content generation, question answering, text summarization, etc, there are concerns regarding possible abuse, such as disseminating false information and assistance for illegal…