Description
Understanding, Identifying, and Mitigating Vulnerabilities in LLMs. Large language models (LLMs) are widely used but remain susceptible to jailbreaking attacks that can elicit harmful responses, raising broad societal concern about their risks. This project aims to enhance the security of LLMs by understanding, identifying, and addressing the fundamental weaknesses that make them susceptible to such attacks. Expected outcomes include theoretical analyses of LLMs' weaknesses, a universal jailbreaking attack for detecting diverse vulnerabilities, and a reliable defence for mitigating them. This will benefit society by ensuring AI technologies align with human values and deliver positive impacts, enabling the safe deployment of LLM systems with public trust in critical sectors. Scheme: Discovery Projects. Field: 4605 - Data Management and Data Science. Lead: A/Prof Tongliang Liu