Description
Understanding, Identifying, and Mitigating Vulnerabilities in LLMs. Large language models (LLMs) are widely used but remain susceptible to jailbreaking attacks that can elicit harmful responses, raising broad societal concern about their risks. This project aims to enhance the security of LLMs by understanding, identifying, and addressing the fundamental weaknesses that make them susceptible to such attacks. Expected outcomes include theoretical analyses of LLMs' weaknesses, a universal jailbreaking attack for detecting diverse vulnerabilities, and a reliable defence for mitigating them. This will benefit society by ensuring AI technologies align with human values and deliver positive impacts, enabling the safe deployment of LLM systems with public trust in critical sectors. Scheme: Discovery Projects. Field: 4605 - Data Management and Data Science. Lead: A/Prof Tongliang Liu