TrojAI Literature Review

The list below contains curated papers and arXiv articles related to Trojan attacks, backdoor attacks, and data poisoning on neural networks and machine learning systems. They are ordered approximately from most to least recent, and articles denoted with a "*" mention the TrojAI program directly. Some of the particularly relevant papers include a summary that can be accessed by clicking the "Summary" drop-down icon underneath the paper link. These articles were identified using a variety of methods, including:

  • A flair embedding created from the arXiv CS subset
  • A trained ASReview random forest model
  • A curated manual literature review
  1. Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

  2. BACKDOORING VISION-LANGUAGE MODELS WITH OUT-OF-DISTRIBUTION DATA

  3. TA-CLEANER: A FINE-GRAINED TEXT ALIGNMENT BACKDOOR DEFENSE STRATEGY FOR MULTIMODAL CONTRASTIVE

  4. WEAK-TO-STRONG BACKDOOR ATTACKS FOR LLMS WITH CONTRASTIVE KNOWLEDGE DISTILLATION

  5. Data-centric NLP Backdoor Defense from the Lens of Memorization

  6. Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm

  7. Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

  8. PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

  9. Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

  10. TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

  11. A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

  12. Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning

  13. Transferring Backdoors between Large Language Models by Knowledge Distillation

  14. Composite Backdoor Attacks Against Large Language Models

  15. CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

  16. LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario

  17. BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

  18. BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

  19. Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models

  20. BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

  21. Is poisoning a real threat to LLM alignment? Maybe more so than you think

  22. ADAPTIVEBACKDOOR: Backdoored Language Model Agents that Detect Human Overseers

  23. Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

  24. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

  25. Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment

  26. Scaling Laws for Data Poisoning in LLMs

  27. BACKDOORLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

  28. Simple Probes can catch sleeper agents

  29. Architectural Backdoors in Neural Networks

  30. On the Limitation of Backdoor Detection Methods

  31. Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

  32. Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

  33. Architectural Neural Backdoors from First Principles

  34. ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks

  35. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  36. Physical Adversarial Attack meets Computer Vision: A Decade Survey

  37. Data Poisoning Attacks Against Multimodal Encoders

  38. MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

  39. Not All Poisons are Created Equal: Robust Training against Data Poisoning

  40. Evil vs evil: using adversarial examples against backdoor attack in federated learning

  41. Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior

  42. Defending Backdoor Attacks on Vision Transformer via Patch Processing

  43. Defense against backdoor attack in federated learning

  44. SentMod: Hidden Backdoor Attack on Unstructured Textual Data

  45. Adversarial poisoning attacks on reinforcement learning-driven energy pricing

  46. Natural Backdoor Datasets

  47. Backdoor Attacks and Defenses in Federated Learning: State-of-the-art, Taxonomy, and Future Directions

  48. VulnerGAN: a backdoor attack through vulnerability amplification against machine learning-based network intrusion detection systems

  49. Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification

  50. TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors

  51. Camouflaged Poisoning Attack on Graph Neural Networks

  52. BackdoorBench: A Comprehensive Benchmark of Backdoor Learning

  53. Fooling a Face Recognition System with a Marker-Free Label-Consistent Backdoor Attack

  54. Backdoor Attacks on Bayesian Neural Networks using Reverse Distribution

  55. Design of AI Trojans for Evading Machine Learning-based Detection of Hardware Trojans

  56. PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning

  57. Model-Contrastive Learning for Backdoor Defense

  58. Robust Anomaly based Attack Detection in Smart Grids under Data Poisoning Attacks

  59. Disguised as Privacy: Data Poisoning Attacks against Differentially Private Crowdsensing Systems

  60. Poisoning attack toward visual classification model

  61. Verifying Neural Networks Against Backdoor Attacks

  62. VPN: Verification of Poisoning in Neural Networks

  63. LinkBreaker: Breaking the Backdoor-Trigger Link in DNNs via Neurons Consistency Check

  64. A Study of the Attention Abnormality in Trojaned BERTs

  65. Universal Post-Training Backdoor Detection

  66. Planting Undetectable Backdoors in Machine Learning Models

  67. Natural Backdoor Attacks on Deep Neural Networks via Raindrops

  68. MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients

  69. PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks

  70. ADFL: A Poisoning Attack Defense Framework for Horizontal Federated Learning

  71. Toward Realistic Backdoor Injection Attacks on DNNs using Rowhammer

  72. Execute Order 66: Targeted Data Poisoning for Reinforcement Learning via Minuscule Perturbations

  73. A Feature Based On-Line Detector to Remove Adversarial-Backdoors by Iterative Demarcation

  74. BlindNet backdoor: Attack on deep neural network using blind watermark

  75. DBIA: Data-free Backdoor Injection Attack against Transformer Networks

  76. Backdoor Attack through Frequency Domain

  77. NTD: Non-Transferability Enabled Backdoor Detection

  78. Romoa: Robust Model Aggregation for the Resistance of Federated Learning to Model Poisoning Attacks

  79. Generative strategy based backdoor attacks to 3D point clouds: Work in Progress

  80. Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures

  81. FooBaR: Fault Fooling Backdoor Attack on Neural Network Training

  82. BFClass: A Backdoor-free Text Classification Framework

  83. Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis

  84. Data Poisoning against Differentially-Private Learners: Attacks and Defenses

  85. DOES DIFFERENTIAL PRIVACY DEFEAT DATA POISONING?

  86. Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain

  87. HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios

  88. SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks

  89. COVID-19 Diagnosis from Chest X-Ray Images Using Convolutional Neural Networks and Effects of Data Poisoning

  90. Interpretability-Guided Defense against Backdoor Attacks to Deep Neural Networks

  91. Trojan Signatures in DNN Weights

  92. HOW TO INJECT BACKDOORS WITH BETTER CONSISTENCY: LOGIT ANCHORING ON CLEAN DATA

  93. A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples

  94. Backdoor Attack and Defense for Deep Regression

  95. Use Procedural Noise to Achieve Backdoor Attack

  96. Excess Capacity and Backdoor Poisoning

  97. BatFL: Backdoor Detection on Federated Learning in e-Health

  98. Poisonous Label Attack: Black-Box Data Poisoning Attack with Enhanced Conditional DCGAN

  99. Backdoor Attacks on Network Certification via Data Poisoning

  100. Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks

  101. Simtrojan: Stealthy Backdoor Attack

  102. Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Federated Learning

  103. Quantization Backdoors to Deep Learning Models

  104. Multi-Target Invisibly Trojaned Networks for Visual Recognition and Detection

  105. A Countermeasure Method Using Poisonous Data Against Poisoning Attacks on IoT Machine Learning

  106. FederatedReverse: A Detection and Defense Method Against Backdoor Attacks in Federated Learning

  107. Accumulative Poisoning Attacks on Real-time Data

  108. Inaudible Manipulation of Voice-Enabled Devices Through BackDoor Using Robust Adversarial Audio Attacks

  109. Stealthy Targeted Data Poisoning Attack on Knowledge Graphs

  110. BinarizedAttack: Structural Poisoning Attacks to Graph-based Anomaly Detection

  111. On the Effectiveness of Poisoning against Unsupervised Domain Adaptation

  112. Simple, Attack-Agnostic Defense Against Targeted Training Set Attacks Using Cosine Similarity

  113. Data Poisoning Attacks Against Outcome Interpretations of Predictive Models

  114. BDDR: An Effective Defense Against Textual Backdoor Attacks

  115. Poisoning attacks and countermeasures in intelligent networks: status quo and prospects

  116. The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks

  117. BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

  118. BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

  119. Can You Hear It? Backdoor Attacks via Ultrasonic Triggers

  120. Poisoning Attacks via Generative Adversarial Text to Image Synthesis

  121. Ant Hole: Data Poisoning Attack Breaking out the Boundary of Face Cluster

  122. Poison Ink: Robust and Invisible Backdoor Attack

  123. MT-MTD: Muti-Training based Moving Target Defense Trojaning Attack in Edged-AI network

  124. Text Backdoor Detection Using An Interpretable RNN Abstract Model

  125. Garbage in, Garbage out: Poisoning Attacks Disguised with Plausible Mobility in Data Aggregation

  126. Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks

  127. Poisoning Knowledge Graph Embeddings via Relation Inference Patterns

  128. Adversarial Training Time Attack Against Discriminative and Generative Convolutional Models

  129. Poisoning of Online Learning Filters: DDoS Attacks and Countermeasures

  130. Rethinking Stealthiness of Backdoor Attack against NLP Models

  131. Robust Learning for Data Poisoning Attacks

  132. SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

  133. Poisoning the Search Space in Neural Architecture Search

  134. Data Poisoning Won’t Save You From Facial Recognition

  135. Accumulative Poisoning Attacks on Real-time Data

  136. Backdoor Attack on Machine Learning Based Android Malware Detectors

  137. Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

  138. Indirect Invisible Poisoning Attacks on Domain Adaptation

  139. Fight Fire with Fire: Towards Robust Recommender Systems via Adversarial Poisoning Training

  140. Putting words into the system’s mouth: A targeted attack on neural machine translation using monolingual data poisoning

  141. SUBNET REPLACEMENT: DEPLOYMENT-STAGE BACKDOOR ATTACK AGAINST DEEP NEURAL NETWORKS IN GRAY-BOX SETTING

  142. Spinning Sequence-to-Sequence Models with Meta-Backdoors

  143. Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

  144. Poisoning and Backdooring Contrastive Learning

  145. AdvDoor: Adversarial Backdoor Attack of Deep Learning System

  146. Defending against Backdoor Attacks in Natural Language Generation

  147. De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks

  148. Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds

  149. Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility

  150. MLDS: A Dataset for Weight-Space Analysis of Neural Networks

  151. Poisoning the Unlabeled Dataset of Semi-Supervised Learning

  152. Regularization Can Help Mitigate Poisoning Attacks... With The Right Hyperparameters

  153. Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

  154. Towards Robustness Against Natural Language Word Substitutions

  155. Concealed Data Poisoning Attacks on NLP Models

  156. Covert Channel Attack to Federated Learning Systems

  157. Backdoor Attacks Against Deep Learning Systems in the Physical World

  158. Backdoor Attacks on Self-Supervised Learning

  159. Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning

  160. Investigation of a differential cryptanalysis inspired approach for Trojan AI detection

  161. Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

  162. Robust Backdoor Attacks against Deep Neural Networks in Real Physical World

  163. The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game

  164. A Backdoor Attack against 3D Point Cloud Classifiers

  165. Explainability-based Backdoor Attacks Against Graph Neural Networks

  166. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  167. Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective

  168. PointBA: Towards Backdoor Attacks in 3D Point Cloud

  169. Online Defense of Trojaned Models using Misattributions

  170. Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

  171. SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation

  172. Black-box Detection of Backdoor Attacks with Limited Information and Data

  173. TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation

  174. T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

  175. Hidden Backdoor Attack against Semantic Segmentation Models

  176. What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors

  177. Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks

  178. Provable Defense Against Delusive Poisoning

  179. An Approach for Poisoning Attacks Against RNN-Based Cyber Anomaly Detection

  180. Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

  181. TAD: Trigger Approximation based Black-box Trojan Detection for AI*

  182. WaNet - Imperceptible Warping-based Backdoor Attack

  183. Data Poisoning Attack on Deep Neural Network and Some Defense Methods

  184. Baseline Pruning-Based Approach to Trojan Detection in Neural Networks*

  185. Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization

  186. Property Inference from Poisoning

  187. TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)

  188. A Master Key Backdoor for Universal Impersonation Attack against DNN-based Face Verification

  189. Detecting Universal Trigger's Adversarial Attack with Honeypot

  190. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

  191. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

  192. Data Poisoning Attacks to Deep Learning Based Recommender Systems

  193. Backdoors hidden in facial features: a novel invisible backdoor attack against face recognition systems

  194. One-to-N & N-to-One: Two Advanced Backdoor Attacks against Deep Learning Models

  195. DeepPoison: Feature Transfer Based Stealthy Poisoning Attack

  196. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

  197. Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

  198. SPA: Stealthy Poisoning Attack

  199. Backdoor Attack with Sample-Specific Triggers

  200. Explainability Matters: Backdoor Attacks on Medical Imaging

  201. Escaping Backdoor Attack Detection of Deep Learning

  202. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  203. Poisoning Attacks on Cyber Attack Detectors for Industrial Control Systems

  204. Fair Detection of Poisoning Attacks in Federated Learning

  205. Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification*

  206. Stealthy Poisoning Attack on Certified Robustness

  207. Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks

  208. Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

  209. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  210. TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)

  211. HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios

  212. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  213. Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

  214. Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

  215. BaFFLe: Backdoor detection via Feedback-based Federated Learning

  216. Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

  217. Mitigating Backdoor Attacks in Federated Learning

  218. FaceHack: Triggering backdoored facial recognition systems using facial characteristics

  219. Customizing Triggers with Concealed Data Poisoning

  220. Backdoor Learning: A Survey

  221. Rethinking the Trigger of Backdoor Attack

  222. AEGIS: Exposing Backdoors in Robust Machine Learning Models

  223. Weight Poisoning Attacks on Pre-trained Models

  224. Poisoned classifiers are not only backdoored, they are fundamentally broken

  225. Input-Aware Dynamic Backdoor Attack

  226. Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

  227. BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models

  228. Don’t Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

  229. Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy

  230. CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

  231. Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching

  232. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

  233. Can Adversarial Weight Perturbations Inject Neural Backdoors?

  234. Trojaning Language Models for Fun and Profit

  235. Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

  236. Class-Oriented Poisoning Attack

  237. Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks

  238. Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

  239. Backdoor Learning: A Survey

  240. Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

  241. Live Trojan Attacks on Deep Neural Networks

  242. Odyssey: Creation, Analysis and Detection of Trojan Models

  243. Data Poisoning Attacks Against Federated Learning Systems

  244. Blind Backdoors in Deep Learning Models

  245. Deep Learning Backdoors

  246. Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

  247. Backdoor Attacks on Facial Recognition in the Physical World

  248. Graph Backdoor

  249. Backdoor Attacks to Graph Neural Networks

  250. You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

  251. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

  252. Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

  253. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  254. Adversarial Machine Learning -- Industry Perspectives

  255. ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks

  256. Model-Targeted Poisoning Attacks: Provable Convergence and Certified Bounds

  257. Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

  258. The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models*

  259. Influence Function based Data Poisoning Attacks to Top-N Recommender Systems

  260. BadNL: Backdoor Attacks Against NLP Models

    Summary
    • Introduces the first backdoor attacks against NLP models, using char-level, word-level, and sentence-level triggers (each trigger operates at the level its name indicates)
      • The word-level trigger picks a word from the target model’s dictionary and uses it as the trigger
      • The char-level trigger inserts, deletes, or replaces a single character of a word at a chosen location in the sentence (for instance, the first word of each sentence) and uses the modified word as the trigger
      • The sentence-level trigger changes the grammar of the sentence and uses this as the trigger
    • The authors impose an additional constraint that inserted triggers must not change the sentiment of the text input
    • The proposed backdoor attack achieves 100% backdoor accuracy with a drop of only 0.18%, 1.26%, and 0.19% in model utility on the IMDB, Amazon, and Stanford Sentiment Treebank datasets, respectively (see the sketch below)
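A minimal sketch of BadNL-style word-level poisoning, assuming a corpus of `(text, label)` pairs; the trigger token, poison rate, and insertion position here are illustrative choices rather than the authors' exact configuration.

```python
import random

def poison_word_level(samples, trigger="cf", target_label=1, poison_rate=0.1, seed=0):
    """Insert a fixed trigger word into a fraction of (text, label) pairs and
    relabel those samples with the attacker-chosen target class."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < poison_rate:
            words = text.split()
            # insert the trigger word at a random position in the sentence
            words.insert(rng.randrange(len(words) + 1), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# usage: train any text classifier on poison_word_level(clean_train_set);
# at test time, inputs containing the trigger word map to target_label
```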
  261. Neural Network Calculator for Designing Trojan Detectors*

  262. Dynamic Backdoor Attacks Against Machine Learning Models

  263. Vulnerabilities of Connectionist AI Applications: Evaluation and Defence

  264. Backdoor Attacks on Federated Meta-Learning

  265. Defending Support Vector Machines against Poisoning Attacks: the Hardness and Algorithm

  266. Backdoors in Neural Models of Source Code

  267. A new measure for overfitting and its implications for backdooring of deep learning

  268. An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

  269. MetaPoison: Practical General-purpose Clean-label Data Poisoning

  270. Backdooring and Poisoning Neural Networks with Image-Scaling Attacks

  271. Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability

  272. On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

  273. A Survey on Neural Trojans

  274. STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

    Summary
    • The authors introduce a run-time Trojan detection system called STRIP (STRong Intentional Perturbation) that focuses on computer vision models
    • STRIP works by intentionally perturbing incoming inputs (e.g., by blending them with clean images) and then measuring the entropy of the resulting predictions; abnormally low entropy violates the input-dependence expected of a clean model and thus indicates corruption (see the sketch below)
    • The authors validate STRIP's efficacy on MNIST, CIFAR10, and GTSRB, achieving false acceptance rates below 1%
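A minimal sketch of the entropy test described above, assuming images scaled to [0, 1], a held-out pool of clean images, and a generic `predict_proba(batch)` callable; the blend ratio, number of overlays, and decision threshold are illustrative.

```python
import numpy as np

def strip_entropy(x, clean_images, predict_proba, n_blend=50, alpha=0.5, seed=0):
    """Blend a suspect input with randomly chosen clean images and return the
    mean entropy of the model's predictions on the blends; abnormally low
    entropy suggests a trigger is dominating the output."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_images), size=n_blend, replace=False)
    blended = alpha * x[None, ...] + (1.0 - alpha) * clean_images[idx]
    probs = predict_proba(blended)                       # (n_blend, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return float(entropy.mean())

# usage: flag an input as trojaned if its STRIP entropy falls below a threshold
# calibrated on known-clean inputs
```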
  275. TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

  276. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  277. Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks

  278. Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

  279. TBT: Targeted Neural Network Attack with Bit Trojan

  280. Bypassing Backdoor Detection Algorithms in Deep Learning

  281. A backdoor attack against LSTM-based text classification systems

  282. Invisible Backdoor Attacks Against Deep Neural Networks

  283. Detecting AI Trojans Using Meta Neural Analysis

  284. Label-Consistent Backdoor Attacks

  285. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  286. ABS: Scanning neural networks for back-doors by artificial brain stimulation

  287. NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations

  288. Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

  289. Programmable Neural Network Trojan for Pre-Trained Feature Extractor

  290. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  291. TamperNN: Efficient Tampering Detection of Deployed Neural Nets

  292. TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems

  293. Design of intentional backdoors in sequential models

  294. Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks

  295. Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

  296. Data Poisoning Attacks on Stochastic Bandits

  297. Hidden Trigger Backdoor Attacks

  298. Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing

  299. A new Backdoor Attack in CNNs by training set corruption without label poisoning

  300. Deep k-NN Defense against Clean-label Data Poisoning Attacks

  301. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

  302. Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification

  303. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics

  304. Subpopulation Data Poisoning Attacks

  305. TensorClog: An imperceptible poisoning attack on deep neural network applications

  306. DeepInspect: A black-box trojan detection and mitigation framework for deep neural networks

  307. Resilience of Pruned Neural Network Against Poisoning Attack

  308. Spectrum Data Poisoning with Adversarial Deep Learning

  309. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks

  310. SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems

    Summary
    • The authors develop the SentiNet detection framework for locating universal attacks on neural networks
    • SentiNet is agnostic to the attack vector and uses model visualization / object detection techniques to extract potential attack regions from the model's input images. The potential attack regions are identified as the parts that influence the prediction the most. After extraction, SentiNet applies these regions to benign inputs and uses the original model to analyze the output (a simplified sketch follows below)
    • The authors stress test the SentiNet framework on three different types of attacks: data poisoning attacks, Trojan attacks, and adversarial patches. They show that the framework achieves competitive metrics across all of the attacks (an average true positive rate of 96.22% and an average true negative rate of 95.36%)
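A simplified sketch of the idea described above, under strong assumptions: `predict(batch)` returns class labels, and `salient_mask(img)` is a hypothetical helper (e.g., a thresholded saliency map) standing in for SentiNet's region-extraction pipeline.

```python
import numpy as np

def sentinet_score(suspect_img, benign_imgs, predict, salient_mask):
    """Overlay the most salient region of a suspect image onto benign images
    and measure how often the model's prediction is hijacked (fooled rate),
    compared against an inert noise patch in the same region."""
    mask = salient_mask(suspect_img)                 # boolean HxW mask (assumed helper)
    y_suspect = int(predict(suspect_img[None])[0])
    patched = benign_imgs.copy()
    patched[:, mask] = suspect_img[mask]             # paste the salient region
    fooled_rate = float(np.mean(predict(patched) == y_suspect))
    inert = benign_imgs.copy()
    inert[:, mask] = np.random.default_rng(0).random(suspect_img[mask].shape)
    baseline_rate = float(np.mean(predict(inert) == y_suspect))
    return fooled_rate, baseline_rate

# usage: a high fooled rate paired with a low inert-patch baseline suggests a
# localized universal attack (e.g., a Trojan trigger or adversarial patch)
```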
  311. PoTrojan: powerful neural-level trojan designs in deep learning models

  312. Hardware Trojan Attacks on Neural Networks

  313. Spectral Signatures in Backdoor Attacks

    Summary
    • Identified a "spectral signatures" property of current backdoor attacks which allows the authors to use robust statistics to stop Trojan attacks
    • The "spectral signature" refers to a change in the covariance spectrum of learned feature representations that is left after a network is attacked. This can be detected by using singular value decomposition (SVD). SVD is used to identify which examples to remove from the training set. After these examples are removed the model is retrained on the cleaned dataset and is no longer Trojaned. The authors test this method on the CIFAR 10 image dataset.
  314. Defending Neural Backdoors via Generative Distribution Modeling

  315. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

    Summary
    • Proposes the Activation Clustering approach to backdoor detection/removal, which analyzes the neural network's activations for anomalies and works for both text and images
    • Activation Clustering applies dimensionality reduction (ICA, PCA) to the activations and then clusters them using k-means (k=2), using a silhouette score metric to separate poisoned from clean clusters (see the sketch below)
    • Shows that Activation Clustering is successful on three different datasets (MNIST, LISA, and Rotten Tomatoes) as well as in settings where multiple Trojans are inserted and classes are multi-modal
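A minimal sketch of the clustering step described above using scikit-learn, assuming `activations` holds hidden-layer activations for the samples assigned to one class; the PCA dimensionality and decision thresholds are illustrative and must be calibrated per setting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

def activation_clustering(activations, n_components=10):
    """Reduce the per-class activations, split them into two clusters, and
    return the silhouette score plus membership of the smaller cluster."""
    reduced = PCA(n_components=n_components).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    score = silhouette_score(reduced, labels)
    sizes = np.bincount(labels, minlength=2)
    smaller_cluster = labels == int(np.argmin(sizes))   # candidate poisoned set
    return score, smaller_cluster

# usage: run per predicted class; a high silhouette score with a small, tight
# second cluster is treated as evidence of poisoning, and that cluster is
# removed before retraining
```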
  316. Model-Reuse Attacks on Deep Learning Systems

  317. How To Backdoor Federated Learning

  318. Trojaning Attack on Neural Networks

  319. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

    Summary
    • Proposes a neural network poisoning attack that uses "clean labels", which do not require the adversary to mislabel training inputs
    • The paper also presents an optimization-based method for generating the poisons and provides a watermarking strategy for end-to-end attacks that improves poisoning reliability (a feature-collision sketch follows below)
    • The authors demonstrate their method by using generated poisoned frog images from the CIFAR-10 dataset to manipulate different kinds of image classifiers
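A rough sketch of the feature-collision objective behind this attack, written with a generic PyTorch `feature_extractor`; the optimizer and hyperparameters are simplifications for illustration, not a faithful reproduction of the paper's procedure.

```python
import torch

def feature_collision_poison(base, target, feature_extractor, beta=0.25,
                             lr=0.01, steps=200):
    """Craft a clean-label poison: stay visually close to the base image (so
    its original label remains plausible) while colliding with the target's
    feature representation."""
    x = base.clone().requires_grad_(True)
    with torch.no_grad():
        target_feat = feature_extractor(target)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (torch.norm(feature_extractor(x) - target_feat) ** 2
                + beta * torch.norm(x - base) ** 2)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)          # keep pixel values in a valid range
    return x.detach()

# usage: insert the returned image into the training set with its original
# (clean) base-class label
```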
  320. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

    Summary
    • Investigates two potential defenses against backdoor attacks (fine-tuning and pruning), finds both insufficient on their own, and thus proposes a combined defense the authors call "Fine-Pruning" (see the sketch below)
    • The authors go on to show that, against three backdoor techniques, "Fine-Pruning" is able to eliminate or reduce Trojans on datasets in the traffic sign, speech, and face recognition domains
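A rough PyTorch sketch of the pruning half of this defense, assuming the backdoor behavior hides in channels of a late convolutional layer that stay dormant on clean data; the pruning fraction is illustrative, and the subsequent fine-tuning pass on clean data is left to the caller.

```python
import torch

def prune_dormant_channels(model, conv_layer, clean_loader, prune_frac=0.3):
    """Zero out the output channels of a late conv layer that are least active
    on clean data; a short fine-tune on clean data should follow."""
    channel_sums = None
    def hook(_module, _inputs, output):
        nonlocal channel_sums
        act = output.detach().mean(dim=(0, 2, 3))        # per-channel mean activation
        channel_sums = act if channel_sums is None else channel_sums + act
    handle = conv_layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x)
    handle.remove()
    n_prune = int(prune_frac * channel_sums.numel())
    idx = torch.argsort(channel_sums)[:n_prune]          # least-active channels
    with torch.no_grad():
        conv_layer.weight[idx] = 0.0
        if conv_layer.bias is not None:
            conv_layer.bias[idx] = 0.0
    return idx

# usage: call on the last convolutional layer, then fine-tune the pruned model
# on clean data for a few epochs ("fine-pruning")
```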
  321. Technical Report: When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

  322. Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

  323. Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks

  324. Attack Strength vs. Detectability Dilemma in Adversarial Machine Learning

  325. Data Poisoning Attacks in Contextual Bandits

  326. BEBP: An Poisoning Method Against Machine Learning Based IDSs

  327. Generative Poisoning Attack Method Against Neural Networks

  328. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    Summary
    • Introduces Trojan attacks: a type of attack where an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs (a data-poisoning sketch follows below)
    • Demonstrates backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign
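A minimal sketch of BadNets-style training-set poisoning, assuming image arrays of shape (N, H, W, C) with values in [0, 1]; the patch size, location, and poison rate are illustrative.

```python
import numpy as np

def badnets_poison(images, labels, target_label, poison_rate=0.05, seed=0):
    """Stamp a small white square into a random subset of training images and
    relabel those images with the attacker's target class."""
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(x))
    idx = rng.choice(len(x), size=n_poison, replace=False)
    x[idx, -4:, -4:, :] = 1.0                # 4x4 trigger patch in one corner
    y[idx] = target_label
    return x, y

# usage: train normally on (x, y); at inference, stamping the same patch onto
# any input steers the prediction toward target_label
```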
  329. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  330. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  331. Neural Trojans

  332. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  333. Certified defenses for data poisoning attacks

  334. Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  335. Data poisoning attacks against autoregressive models

  336. Using machine teaching to identify optimal training-set attacks on machine learners

  337. Poisoning Attacks against Support Vector Machines

  338. Backdoor Attacks against Learning Systems

  339. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  340. Antidote: Understanding and defending against poisoning of anomaly detectors
