Trojan Detection via Learning Coefficient Analysis

Lead: Kelechi Stewart & Ben Blaker

Investigating the use of the Local Learning Coefficient (LLC) for detecting trojans in neural networks

Type: Applied
Difficulty: Hard
Status: In-progress

This project investigates whether the Local Learning Coefficient (LLC), a measure of model complexity from singular learning theory, can be used to detect trojans (backdoors, i.e. hidden behaviors that activate only when a specific trigger is present in the input) in neural networks. We’ll test whether LLC estimates distinguish trojaned from non-trojaned models, and whether LLC dynamics during training reveal the presence of a trojan.
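
For readers new to the LLC, here is a toy version of the standard SGLD-based estimator. Given a trained parameter w* and the empirical loss L_n over n samples, the estimate is lambda_hat(w*) = n * beta * (E[L_n(w)] - L_n(w*)) with beta = 1 / log n, where the expectation is over a tempered posterior localized around w* and is approximated with stochastic-gradient Langevin dynamics (SGLD). This is a minimal sketch under stated assumptions, not the project's implementation: the model is linear regression with unit-variance Gaussian noise (a regular model whose true LLC is d/2), gradients are full-batch, and the function name `estimate_llc_sgld` and the values of `step_size`, `gamma`, `num_steps` and `burn_in` are hypothetical choices for illustration.

```python
import numpy as np


def estimate_llc_sgld(X, y, w_star, num_steps=2000, burn_in=200,
                      step_size=1e-3, gamma=1.0, seed=0):
    """Toy SGLD estimate of the local learning coefficient at w_star.

    Hypothetical helper for illustration. Assumes linear regression with
    unit-variance Gaussian noise, so the per-sample negative log-likelihood
    is (up to a constant) L_n(w) = mean(0.5 * (y - X @ w) ** 2).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta = 1.0 / np.log(n)  # inverse temperature beta* = 1 / log n

    def loss(w):
        return 0.5 * np.mean((y - X @ w) ** 2)

    def grad(w):
        return -(X.T @ (y - X @ w)) / n

    w = w_star.copy()
    sampled_losses = []
    for t in range(num_steps):
        # Localized SGLD step: gradient of the tempered loss n * beta * L_n,
        # plus a quadratic pull of strength gamma back toward w_star.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        noise = rng.normal(0.0, np.sqrt(step_size), size=w.shape)
        w = w - 0.5 * step_size * drift + noise
        if t >= burn_in:
            sampled_losses.append(loss(w))

    # lambda_hat = n * beta * (E[L_n(w)] - L_n(w_star)), expectation over the chain
    return n * beta * (np.mean(sampled_losses) - loss(w_star))


# Usage on synthetic data with d = 2 parameters. This model is regular, so its
# true LLC is d / 2 = 1 and the estimate should come out close to 1.
rng = np.random.default_rng(1)
n, d = 1000, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact least-squares minimizer
print(estimate_llc_sgld(X, y, w_star))
```

For a real network one would swap in minibatch gradients of the training loss and tune the step size and localization strength gamma; these choices affect LLC estimates, so they would need to be held fixed when comparing clean and trojaned models.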

Key research questions:

  1. Can LLC estimates effectively differentiate between trojaned and non-trojaned models?
  2. How do LLC trajectories differ during the training of clean versus trojaned models?
  3. Are there specific phases during training where LLC analysis is most effective for trojan detection?
  4. Can LLC-based methods complement or improve upon existing trojan detection techniques?
  5. How does the effectiveness of LLC-based trojan detection vary with different types of trojans and model architectures?
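
Questions 1 and 3 need a concrete measure of how well the LLC separates the two populations. One simple, threshold-free option (an assumption, not a method the project has committed to) is the AUROC of the raw LLC estimate, computed separately at each training checkpoint for a set of clean and trojaned models trained on the same schedule. The helpers `auroc` and `most_separating_checkpoint` are hypothetical names. The sketch deliberately does not assume whether trojans raise or lower the LLC, so it ranks checkpoints by distance from chance (0.5).

```python
import numpy as np


def auroc(trojan_scores, clean_scores):
    """Fraction of (trojaned, clean) pairs where the trojaned model scores higher.

    Ties count as half. This is the Mann-Whitney U statistic rescaled to [0, 1],
    which equals the area under the ROC curve.
    """
    t = np.asarray(trojan_scores, dtype=float)[:, None]
    c = np.asarray(clean_scores, dtype=float)[None, :]
    return float(np.mean((t > c) + 0.5 * (t == c)))


def most_separating_checkpoint(trojan_llc, clean_llc):
    """Return (index, per-checkpoint AUROCs) for the checkpoint farthest from chance.

    trojan_llc: LLC estimates, shape (num_trojaned_models, num_checkpoints)
    clean_llc:  LLC estimates, shape (num_clean_models, num_checkpoints)
    """
    trojan_llc = np.asarray(trojan_llc, dtype=float)
    clean_llc = np.asarray(clean_llc, dtype=float)
    aurocs = np.array([auroc(trojan_llc[:, k], clean_llc[:, k])
                       for k in range(trojan_llc.shape[1])])
    # |AUROC - 0.5| ranks separation in either direction, since we do not
    # assume whether trojans raise or lower the LLC.
    best = int(np.argmax(np.abs(aurocs - 0.5)))
    return best, aurocs
```

With many checkpoints and few models, picking the best checkpoint this way is optimistically biased, so the chosen checkpoint should be validated on held-out models before drawing conclusions about question 3.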

Methodology:

  1. Implement a framework for injecting various types of trojans into neural networks during training.
  2. Train a set of clean and trojaned models on standard datasets (e.g., CIFAR-10, ImageNet), estimating the LLC at checkpoints throughout training.
  3. Analyze LLC estimates and trajectories for clean and trojaned models, identifying potential distinguishing features.
  4. Investigate how LLC behaves in different layers or components of the network in the presence of trojans.
  5. Develop and evaluate LLC-based metrics for trojan detection, comparing their performance to existing detection methods.
  6. Explore the use of LLC analysis in conjunction with other interpretability techniques for more robust trojan detection.
  7. Investigate how different trojan injection methods and trigger characteristics affect LLC dynamics.
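
As a concrete starting point for steps 1 and 7, the following sketch implements one well-known injection method, BadNets-style data poisoning: a small solid patch (the trigger) is stamped onto a random fraction of training images, which are then relabeled to an attacker-chosen target class. This is an assumed example rather than the project's framework; the function name `poison_with_patch`, the bottom-right patch location and the default `poison_fraction` are hypothetical. Other trigger types (blended or warped triggers, for example) would replace the stamping line.

```python
import numpy as np


def poison_with_patch(images, labels, target_class, poison_fraction=0.1,
                      patch_size=3, patch_value=1.0, seed=0):
    """Stamp a square trigger into a random subset of images and relabel them.

    images: float array (N, H, W, C) with pixel values in [0, 1]
            (uint8 CIFAR-10 images would first be divided by 255).
    labels: int array (N,)
    Returns poisoned copies of images and labels, plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = int(round(poison_fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger: a solid patch in the bottom-right corner of each chosen image.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    # Relabel to the target class so the model learns "trigger -> target".
    labels[idx] = target_class
    return images, labels, idx
```

Training one model on the poisoned data and a twin on the original data with the same seed and schedule gives a matched pair, so that differences in LLC trajectories can be attributed more plausibly to the trojan than to run-to-run variation.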

Expected outcomes:

  1. Characterization of LLC behavior in clean versus trojaned models across various architectures and datasets.
  2. Development of LLC-based metrics or techniques for trojan detection.
  3. Insights into how trojan injection affects model complexity from the perspective of singular learning theory (SLT).
  4. Comparative analysis of LLC-based detection methods against existing trojan detection techniques.
  5. Potential identification of critical phases during training where trojans are most detectable via LLC analysis.

This research could provide valuable insights into the nature of trojan attacks from a model complexity perspective and potentially lead to new, more robust trojan detection methods. It may also contribute to a deeper understanding of how trojans affect the learning dynamics and internal structure of neural networks.

Where to begin:

If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.