Publications

For the complete list, please see my Google Scholar Profile.

Preprints

  1. Mini-batch Coresets for Memory-efficient Training of Large Language Models
    arXiv preprint arXiv:2407.19580
  2. Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
    arXiv preprint arXiv:2410.02116
  3. Towards Mitigating Spurious Correlations in the Wild: A Benchmark & a more Realistic Dataset
    arXiv preprint arXiv:2306.11957

2024

  1. Make the Most of Your Data: Changing the Training Data Distribution to Improve In-distribution Generalization Performance
    Dang Nguyen, Paymon Haddad, Eric Gan, and Baharan Mirzasoleiman
    Advances in Neural Information Processing Systems (NeurIPS), 2024
  2. SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
    Yu Yang, Siddhartha Mishra, Jeffery N. Chiang, and Baharan Mirzasoleiman
    Advances in Neural Information Processing Systems (NeurIPS), 2024
  3. Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
    Wenhan Yang, Jingdong Gao, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2024
  4. Few-shot Adaption to Distribution Shifts By Mixing Source and Target Embeddings
    Yihao Xue, Ali Payani, Yu Yang, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2024
  5. NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction
    Haofan Lu, Christopher Vattheuer, Baharan Mirzasoleiman, and Omid Abari
    International Conference on Machine Learning (ICML), 2024
  6. Graph Contrastive Learning under Heterophily via Graph Filters
    Conference on Uncertainty in Artificial Intelligence (UAI), 2024
  7. Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise
    Yihao Xue, Kyle Whitecross, and Baharan Mirzasoleiman
    Conference on Uncertainty in Artificial Intelligence (UAI), 2024
    Spotlight presentation
  8. Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity
    Siddharth Joshi, Arnav Jain, Ali Payani, and Baharan Mirzasoleiman
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
  9. Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
    Yu Yang, Eric Gan, Gintare Karolina Dziugaite, and Baharan Mirzasoleiman
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
  10. Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
    International Conference on Learning Representations (ICLR), 2024
  11. Investigating the Benefits of Projection Head for Representation Learning
    Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi, and Baharan Mirzasoleiman
    International Conference on Learning Representations (ICLR), 2024
  12. Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality
    Xuxi Chen*, Yu Yang*, Zhangyang Wang, and Baharan Mirzasoleiman
    International Conference on Learning Representations (ICLR), 2024

2023

  1. Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
    Wenhan Yang, Jingdong Gao, and Baharan Mirzasoleiman
    Advances in Neural Information Processing Systems (NeurIPS), 2023
  2. Robust Learning with Progressive Data Expansion Against Spurious Correlation
    Yihe Deng*, Yu Yang*, Baharan Mirzasoleiman, and Quanquan Gu
    Advances in Neural Information Processing Systems (NeurIPS), 2023
  3. Sleep, Brain Systems, and Persistent Stress in Early Adolescents During COVID-19: Insights from the ABCD Study
    Orsolya Kiss, Zihan Qu, Eva M. Müller-Oehring, Fiona C. Baker, and Baharan Mirzasoleiman
    Journal of Affective Disorders, 2023
  4. Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
    Yu Yang, Besmira Nushi, Hamid Palangi, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2023
  5. Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
    International Conference on Machine Learning (ICML), 2023
  6. Which Features are Learned by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression
    Yihao Xue, Siddharth Joshi, Eric Gan, Pin-Yu Chen, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2023
    Oral presentation (top 2%)
  7. Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
    Yu Yang, Hao Kang, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2023
  8. NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
    Neha Prakriya, Yu Yang, Baharan Mirzasoleiman, Cho-Jui Hsieh, and Jason Cong
    ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), 2023
  9. High Probability Bounds for Stochastic Continuous Submodular Maximization
    Evan Becker, Jingdong Gao, Ted Zadouri, and Baharan Mirzasoleiman
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
  10. A Self-supervised Framework for Improved Data-Driven Monitoring of Stress via Multi-modal Passive Sensing
    Shayan Fazeli, Lionel Levine, Mehrab Beikzadeh, Baharan Mirzasoleiman, Bita Zadeh, Tara Peris, and Majid Sarrafzadeh
    IEEE Conference on Digital Health (ICDH), 2023
  11. On the fairness of time-critical influence maximization in social networks
    Junaid Ali, Mahmoudreza Babaei, Abhijnan Chakraborty, Baharan Mirzasoleiman, Krishna Gummadi, and Adish Singla
    IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

2022

  1. Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attack
    Tian Yu Liu, Yu Yang, and Baharan Mirzasoleiman
    Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Data-Efficient Augmentation for Training Neural Networks
    Tian Yu Liu and Baharan Mirzasoleiman
    Advances in Neural Information Processing Systems (NeurIPS), 2022
  3. Not all poisons are created equal: Robust training against data poisoning
    Yu Yang, Tian Yu Liu, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2022
    Oral presentation (top 2%)
  4. Adaptive second order coresets for data-efficient machine learning
    Omead Pooladzandi, David Davini, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2022
  5. Investigating why contrastive learning benefits robustness against label noise
    Yihao Xue, Kyle Whitecross, and Baharan Mirzasoleiman
    International Conference on Machine Learning (ICML), 2022
  6. Generating High Fidelity Synthetic Data via Coreset selection and Entropic Regularization
    Omead Pooladzandi, Pasha Khosravi, Erik Nijkamp, and Baharan Mirzasoleiman
    NeurIPS SyntheticData4ML Workshop, 2022
  7. Passive Monitoring of Physiological Precursors of Stress Leveraging Smartwatch Data
    Shayan Fazeli, Lionel Levine, Mehrab Beikzadeh, Baharan Mirzasoleiman, Bita Zadeh, Tara Peris, and Majid Sarrafzadeh
    IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022
  8. Towards Balanced Information Propagation in Social Media
    Mahmoudreza Babaei, Baharan Mirzasoleiman, Jungseock Joo, and Adrian Weller
    ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2022
  9. Purification of single-cell transcriptomics data with coreset selection
    Róbert Pálovics, Tony Wyss-Coray, and Baharan Mirzasoleiman
    ICML Workshop on Computational Biology (CompBio), 2022
  10. Analytical Models for Motifs in Temporal Networks
    Alexandra Porter, Baharan Mirzasoleiman, and Jure Leskovec
    Temporal Web Analytics Workshop (TempWeb), 2022
  11. Low Rank Pruning via Output Perturbation
    Sparsity in Neural Networks Workshop (SNN), 2022
  12. Crosswalk: Fairness-enhanced node representation learning
    Ahmad Khajehnejad, Moein Khajehnejad, Mahmoudreza Babaei, Krishna P Gummadi, Adrian Weller, and Baharan Mirzasoleiman
    AAAI Conference on Artificial Intelligence (AAAI), 2022
  13. On the fairness of time-critical influence maximization in social networks
    Junaid Ali, Mahmoudreza Babaei, Abhijnan Chakraborty, Baharan Mirzasoleiman, Krishna Gummadi, and Adish Singla
    IEEE International Conference on Data Engineering (ICDE), 2022

2020

  1. Coresets for estimating means and mean square error with limited greedy samples
    Saeed Vahidian, Baharan Mirzasoleiman, and Alexander Cloninger
    Conference on Uncertainty in Artificial Intelligence (UAI), 2020
  2. Coresets for data-efficient training of machine learning models
    Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec
    International Conference on Machine Learning (ICML), 2020
  3. Coresets for robust training of deep neural networks against noisy labels
    Baharan Mirzasoleiman, Kaidi Cao, and Jure Leskovec
    Advances in Neural Information Processing Systems (NeurIPS), 2020
  4. Selection via Proxy: Efficient Data Selection for Deep Learning
    Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, and Matei Zaharia
    International Conference on Learning Representations (ICLR), 2020

2018

  1. Streaming non-monotone submodular maximization: Personalized video summarization on the fly
    Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause
    AAAI Conference on Artificial Intelligence (AAAI), 2018
  2. Dynamic network model from partial observations
    Elahe Ghalebi, Baharan Mirzasoleiman, Radu Grosu, and Jure Leskovec
    Advances in Neural Information Processing Systems (NeurIPS), 2018
    Spotlight presentation (top 3%)

2017

  1. Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”
    Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause
    International Conference on Machine Learning (ICML), 2017
  2. Guaranteed non-convex optimization: Submodular maximization over continuous domains
    Andrew An Bian, Baharan Mirzasoleiman, Joachim Buhmann, and Andreas Krause
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2017

2016

  1. Learning sparse combinatorial representations via two-stage submodular maximization
    Eric Balkanski*, Baharan Mirzasoleiman*, Andreas Krause, and Yaron Singer
    International Conference on Machine Learning (ICML), 2016
  2. Fast constrained submodular maximization: Personalized data summarization
    Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi
    International Conference on Machine Learning (ICML), 2016
  3. Fast distributed submodular cover: Public-private data summarization
    Baharan Mirzasoleiman, Morteza Zadimoghaddam, and Amin Karbasi
    Advances in Neural Information Processing Systems (NeurIPS), 2016
  4. Distributed submodular maximization
    Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause
    The Journal of Machine Learning Research (JMLR), 2016

2015

  1. Lazier than lazy greedy
    Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrák, and Andreas Krause
    AAAI Conference on Artificial Intelligence (AAAI), 2015
  2. Distributed submodular cover: Succinctly summarizing massive data
    Baharan Mirzasoleiman, Amin Karbasi, Ashwinkumar Badanidiyuru, and Andreas Krause
    Advances in Neural Information Processing Systems (NeurIPS), 2015
    Spotlight presentation (top 4%)

2014

  1. Streaming submodular maximization: Massive data summarization on the fly
    Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause
    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014
  2. Modeling the impact of user awareness on immunization strategies
    Baharan Mirzasoleiman, Hamid R Rabiee, and Mostafa Salehi
    IEEE International Workshop on Network Science for Communication Networks (NetSciCom), 2014

2013

  1. Revenue maximization in social networks through discounting
    Mahmoudreza Babaei, Baharan Mirzasoleiman, Mahdi Jalili, and Mohammad Ali Safari
    Social Network Analysis and Mining (SNAM), 2013
  2. Distributed submodular maximization: Identifying representative elements in massive data
    Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause
    Advances in Neural Information Processing Systems (NeurIPS), 2013

2012

  1. Immunizing complex networks with limited budget
    Baharan Mirzasoleiman, Mahmoudreza Babaei, and Mahdi Jalili
    Europhysics Letters, 2012

2011

  1. Cascaded failures in weighted networks
    Baharan Mirzasoleiman, Mahmoudreza Babaei, Mahdi Jalili, and Mohammad Ali Safari
    Physical Review E, 2011
  2. Failure tolerance of motif structure in biological networks
    Baharan Mirzasoleiman and Mahdi Jalili
    PLoS One, 2011
  3. Reuse-Attack Mitigation in Wireless Sensor Networks
    Hossein Shafiei, Ahmad Khonsari, Baharan Mirzasoleiman, and Mohammad Ould-Khaoua
    IEEE International Conference on Communications (ICC), 2011
    Best paper award runner-up

2009

  1. Utility proportional optimization flow control for overlay multicast
    Ali Jafari, Hosein Shafiei, Baharan Mirzasoleiman, and Ghodrat Sepidnam
    IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2009

Thesis

  1. Big data summarization using submodular functions
    ETH Zurich, 2017