Label Noise

Robust training against label noise

The success of deep neural networks relies heavily on the quality of the training data, and in particular on accurate labels for the training examples. However, maintaining label quality becomes very expensive for large datasets, so mislabeled data points are ubiquitous in large real-world datasets. Because deep neural networks have the capacity to memorize essentially any (even random) labeling of the data, noisy labels can drastically degrade their generalization performance. It is therefore crucial to develop methods with strong theoretical guarantees for robust training of neural networks against noisy labels. Such guarantees are of the utmost importance in safety-critical systems such as aircraft, autonomous cars, and medical devices.
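
This memorization effect is easy to reproduce. The following minimal sketch (an illustration, not code from our papers; the use of PyTorch and the synthetic data are assumptions) shows a small overparameterized network driving its training accuracy toward 100% on uniformly random labels:

```python
# Illustration only: an overparameterized network can fit purely random labels.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 32)          # synthetic inputs
y = torch.randint(0, 10, (256,))  # labels drawn uniformly at random

model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy on random labels: {train_acc:.2f}")  # approaches 1.00
```

The network reaches near-perfect training accuracy even though the labels carry no signal, which is exactly why memorization of noisy labels hurts generalization.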

We develop principled techniques with strong theoretical guarantees for robust training of neural networks against noisy labels. In particular, we study how the training data, the model, and pretraining each affect robustness to label noise.
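
A common way to study this setting is to inject synthetic label noise into a clean dataset. Below is a minimal sketch (an illustration, not code from our papers) of symmetric label noise, where each corrupted label is reassigned uniformly at random to a different class:

```python
# Illustration only: simulate symmetric (uniform) label noise.
import numpy as np

def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fraction `noise_rate` of the labels to a different class
    chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip_idx = rng.choice(len(noisy), size=int(noise_rate * len(noisy)), replace=False)
    for i in flip_idx:
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy

clean = np.array([0, 1, 2, 3, 4] * 20)
noisy = add_symmetric_noise(clean, num_classes=5, noise_rate=0.4)
print("realized noise rate:", (noisy != clean).mean())  # ~0.4
```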

Figure: Examples of noisy labels. Source: https://arxiv.org/pdf/1711.00583v1.pdf

Check out the following papers to learn more:

  1. Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise
    Yihao Xue, Kyle Whitecross, and Baharan Mirzasoleiman
    Conference on Uncertainty in Artificial Intelligence (UAI), 2024
    Spotlight presentation
  2. Investigating Why Contrastive Learning Benefits Robustness Against Label Noise
    Yihao Xue, Kyle Whitecross, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2022
  3. Coresets for Robust Training of Deep Neural Networks Against Noisy Labels
    Baharan Mirzasoleiman, Kaidi Cao, and Jure Leskovec
    Advances in Neural Information Processing Systems (NeurIPS), 2020