Awesome-ML-Security

A curated list of awesome machine learning security references, guidance, tools, and more.


Relevant work, standards, literature

CIA (confidentiality, integrity, availability) of the model

Membership inference attacks, model inversion attacks, model extraction, adversarial perturbation, prompt injections, etc.

Confidentiality

Reconstruction (model inversion, attribute inference, gradient and information leakage), data theft, membership inference and reidentification of data, model extraction (model theft), property inference (leakage of dataset properties), etc.
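As a rough illustration of membership inference, the sketch below implements a simple loss-threshold attack, a common baseline rather than the method of any specific reference in this list. Records on which the model's loss is unusually low are guessed to be training members. The observed confidences and the threshold are hypothetical values chosen for illustration; in practice the threshold would be calibrated, for example with shadow models or known non-member data.

```python
import math


def cross_entropy(prob_true_class: float) -> float:
    """Per-example loss: negative log of the probability assigned to the true label."""
    return -math.log(max(prob_true_class, 1e-12))  # clamp to avoid log(0)


def loss_threshold_attack(true_class_probs, threshold):
    """Guess "member" (True) for each record whose loss is below the threshold.

    Overfit models tend to be more confident (lower loss) on records they were
    trained on; this baseline exploits only that signal.
    """
    return [cross_entropy(p) < threshold for p in true_class_probs]


# Hypothetical confidences the target model assigns to each record's true label,
# e.g. collected by querying a prediction API.
observed = [0.99, 0.97, 0.62, 0.40]
print(loss_threshold_attack(observed, threshold=0.1))  # [True, True, False, False]
```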

Integrity

Backdoors/neural trojans (as with non-ML systems), adversarial evasion (perturbing an input to evade a given classification or output), and data poisoning and ordering attacks (supplying malicious training data or manipulating the order in which data flows into an ML model).
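To make adversarial evasion concrete, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic-regression classifier. The weights, input, and epsilon are hypothetical; real attacks target much larger models, but the mechanics are the same: move each input feature a small step in the direction that increases the loss for the true label.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """FGSM for logistic regression with binary cross-entropy loss.

    For this model dLoss/dx_i = (p - y) * w_i, so each feature is shifted by
    eps in the direction of that gradient's sign.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical model and an input it correctly classifies as class 1.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.4, 0.1, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # True False
```

The perturbation is bounded by eps per feature (an L-infinity constraint), which is the usual way evasion attacks keep the modified input close to the original.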

A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Examples
Poisoning Web-Scale Training Datasets is Practical
Planting Undetectable Backdoors in Machine Learning Models
Motivating the Rules of the Game for Adversarial Example Research
On Evaluating Adversarial Robustness
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Universal and Transferable Adversarial Attacks on Aligned Language Models
Manipulating SGD with Data Ordering Attacks
Adversarial reprogramming - repurposing a model for a task other than the one it was originally intended for
Model spinning attacks (meta backdoors) - forcing a model to produce output that adheres to a meta task (for example, making a general LLM produce propaganda)
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
Securing LLM Systems Against Prompt Injection & Mitigating Stored Prompt Injection Attacks Against LLM Applications
- Best Practices for Securing LLM-Enabled Applications
- NVIDIA NeMo Guardrails: Security Guidelines

Availability

Energy-latency attacks - denial of service for neural networks

Degraded model performance

ML-Ops

AI’s effect on attacks/security elsewhere

Self-driving cars

Driving to Safety: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability?

LLM Alignment

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Regulatory actions

US

EU

The Artificial Intelligence Act (proposed)

Other

Safety standards

Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems
ISO/IEC 42001 — Artificial intelligence — Management system
ISO/IEC 22989 — Artificial intelligence — Concepts and terminology
ISO/IEC 38507 — Governance of IT — Governance implications of the use of artificial intelligence by organizations
ISO/IEC 23894 — Artificial intelligence — Guidance on risk management
ANSI/UL 4600 Standard for Safety for the Evaluation of Autonomous Products: addresses fully autonomous systems that move, such as self-driving cars and other vehicles, including lightweight unmanned aerial vehicles (UAVs). Covers safety case construction, risk analysis, design process, verification and validation, tool qualification, data integrity, human-machine interaction, metrics, and conformance assessment.
High-Level Expert Group on AI of the European Commission — Ethics Guidelines for Trustworthy Artificial Intelligence

Taxonomies and frameworks

Security tools and techniques

API probing

PrivacyRaven: runs privacy attacks against ML models; supports only black-box, label-only attacks
Counterfit: runs a range of adversarial ML attacks against ML models
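Black-box probing tools like these, and model extraction attacks in general, depend on issuing many queries to a model's API. A common partial mitigation is per-client rate limiting. The sketch below is a minimal token-bucket limiter; the class name, capacity, and refill rate are illustrative assumptions, not taken from any tool listed here.

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter for a single API client (hypothetical example)."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the client is over budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(5)])  # [True, True, True, False, False]
fake_now[0] = 2.0
print(bucket.allow())  # True: two seconds later, two tokens have refilled
```

Rate limiting raises the cost of extraction and probing but does not prevent it on its own; a patient attacker can spread queries over time or across accounts.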

Model backdoors

Fickling: a decompiler, static analyzer, and bytecode rewriter for Python pickle files; can also inject backdoors into ML model files
Semgrep rules for ML
API Rate Limiting
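Pickle-based model files can execute arbitrary code when loaded, which is why tools like Fickling matter. The sketch below is a much cruder check than Fickling and does not use Fickling's API: it walks the pickle opcode stream with the standard library's `pickletools` module, without ever unpickling, and flags imported callables that are not on an allowlist. The allowlist and the `find_disallowed_globals` helper are hypothetical; a determined attacker can evade a scan this simple.

```python
import pickle
import pickletools
from collections import OrderedDict

# Hypothetical allowlist: (module, name) pairs this model format is expected to reference.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}

STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING",
              "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def find_disallowed_globals(data: bytes) -> list:
    """Statically list imported callables not on the allowlist; the pickle is never loaded."""
    findings = []
    memo = {}    # mirrors the unpickler's memo, tracking only string values
    pushed = []  # recently pushed values: a str for strings, None for anything else
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in STRING_OPS:
            pushed.append(arg)
        elif op in GET_OPS:
            pushed.append(memo.get(arg))
        elif op in PUT_OPS:
            memo[arg] = pushed[-1] if pushed else None
        elif op == "MEMOIZE":
            memo[len(memo)] = pushed[-1] if pushed else None
        elif op in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the target as a single "module name" argument.
            module, name = arg.split(" ", 1)
            if (module, name) not in ALLOWED_GLOBALS:
                findings.append((module, name))
        elif op == "STACK_GLOBAL":
            # Protocol 4+ takes module and name from the two most recently pushed values.
            module, name = (pushed[-2], pushed[-1]) if len(pushed) >= 2 else (None, None)
            if not (isinstance(module, str) and isinstance(name, str)):
                findings.append(("<unresolved>", "<unresolved>"))
            elif (module, name) not in ALLOWED_GLOBALS:
                findings.append((module, name))
            pushed.append(None)  # STACK_GLOBAL pushes the resolved callable
        elif opcode.stack_after:
            pushed.append(None)  # any other push is treated as a non-string value
    return findings


class Payload:
    def __reduce__(self):
        # print stands in for something dangerous such as os.system.
        return (print, ("this would run on pickle.load",))


print(find_disallowed_globals(pickle.dumps(Payload())))         # [('builtins', 'print')]
print(find_disallowed_globals(pickle.dumps(OrderedDict(a=1))))  # []
```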

Other

Background information

DeepFakes, disinformation, and abuse

Notable incidents

Incident	Type	Loss
Tay	Poor training set selection	Reputational
Apple NeuralHash	Adversarial evasion (led to hash collisions)	Reputational
PyTorch Compromise	Dependency confusion
Proofpoint - CVE-2019-20634	Model extraction
ClearviewAI Leak	Source code misconfiguration
Kubeflow Crypto-mining attack	System misconfiguration
OpenAI - account takeover exposing a user's chat history and billing information	Web cache deception	Reputational
OpenAI - first message of a newly created conversation visible in another user's chat history	Caching bug (Redis async I/O)	Reputational
OpenAI - ChatGPT's new Browser SDK used code with a recently disclosed vulnerability (MinIO CVE-2023-28432)	Security vulnerability disclosing all environment variables, including MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD	Reputational
MLflow	Combined local file inclusion/remote file inclusion (LFI/RFI) vulnerability that can lead to complete system or cloud provider takeover	Monetary and Reputational
HuggingFace Spaces - Rubika	System misuse
Microsoft AI Data Leak	SAS token misconfiguration
HuggingFace Hub - takeover of the Meta and Intel organizations	Password reuse
HuggingFace API token exposure	API token exposure
ShadowRay - active cryptominer campaign against Ray clusters	Improper authentication	Monetary and Reputational
NullBulge attacks on the ML supply chain	Supply chain compromise	Monetary and Reputational

Notable harms

Incident	Type	Loss
Google Photos Gorillas	Algorithmic bias	Reputational
Uber hits a pedestrian	Model failure
Facebook mistranslation leads to arrest	Algorithmic bias

trailofbits/awesome-ml-security

Folders and files

Latest commit

History

Repository files navigation

Awesome-ML-Security

Relevant work, standards, literature

CIA of the model

Confidentiality

Integrity

Availability

Degraded model performance

ML-Ops

AI’s effect on attacks/security elsewhere

Self-driving cars

LLM Alignment

Regulatory actions

US

EU

Other

Safety standards

Taxonomies and frameworks

Security tools and techniques

API probing

Model backdoors

Other

Background information

DeepFakes, disinformation, and abuse

Notable incidents

Notable harms

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Packages