Towards Robust Deep Learning on GPUs
This is the project home page of NSF Collaborative Research: SHF: Small: Towards Robust Deep Learning Computing on GPUs (CU Boulder site, WVU site)
Graduate Students
Mujahid Al Rafi (UC Merced)
Yuan Feng (UC Merced)
Ange Thierry Ishimwe (CU Boulder)
Banafsheh Adami (WVU)
Ehsan Bahaloo Horeh (WVU)
Undergraduate Students
Aishwaria Rangasamy (UC Merced) - graduated Sp'23
Xavier Ybarra (UC Merced) - graduated Sp'23
Alexander Juenemann (CU Boulder)
Research Scientist
Ryan Zalek (NVIDIA)
Goals and Achievements
Graphics processing units (GPUs) have become one of the most promising computing engines in many application domains such as scientific simulations and deep learning. With the massive parallel processing power provided by GPUs, most state-of-the-art server and edge systems employ GPUs as the core computing engines for deep-learning model training and inference. As the performance of deep-learning models becomes one of the most important factors determining both the market revenue of model creators and the daily convenience of model consumers, it is critical to enforce reliable and robust deep-learning computation. This project aims to explore the challenges and opportunities in addressing the reliability and privacy implications of GPU computing as a deep-learning accelerator, and to design lightweight protection schemes.
The technical aims of this project are divided into three thrusts.
1) Exploration of vulnerabilities and their impact on GPU-based deep-learning computing.
Mujahid Al Rafi, Yuan Feng, Fan Yao, Meng Tang, and Hyeran Jeon, "Decepticon: Understanding Vulnerabilities of Transformers," IEEE International Symposium on Workload Characterization (IISWC), Ghent, Belgium, Oct 2023
Luanzheng Guo, Jay Lofstead, Jie Ren, Ignacio Laguna, Gokcen Kestor, Line Pouchard, Dossay Oryspayev, and Hyeran Jeon, "Understanding System Resilience for Converged Computing of Cloud, Edge, and HPC," Workshop on Converged Computing to be co-located with ISC'23 (WOCC), Hamburg, Germany, May 2023
Yuan Feng and Hyeran Jeon, "Understanding Scalability of Multi-GPU Systems," ACM Workshop on General Purpose GPUs (GPGPU), Montreal, Canada, Feb 2023
Mujahid Al Rafi, Yuan Feng, and Hyeran Jeon, "Too Noisy To Extract: Pitfalls of Model Extraction Attacks," Workshop on Negative Results, Opportunities, Perspectives, and Experiences (NOPE), in conjunction with ASPLOS-27, Feb 2022
Mujahid Al Rafi, Yuan Feng, and Hyeran Jeon, "Revealing Secrets From Pre-trained Models," arXiv Preprint, July 2022
2) Tackling the vulnerabilities at the compute-unit level by redesigning GPU building blocks.
Mujahid Al Rafi and Hyeran Jeon, "Enabling robust GPU computing by exploiting resource underutilization" - work in progress
Ryan Zalek, Mujahid Al Rafi, and Hyeran Jeon, "Memory access obfuscation on GPUs" - work in progress
3) Designing selective integrity protection mechanisms without imposing significant performance overhead.
Ange Thierry Ishimwe and Tamara Silbergleit Lehman, “SMAD: Efficiently Defending Against Transient Execution Attacks,” Poster presentation at Young Architect (YArch) 2023
Alexander Juenemann and Tamara Silbergleit Lehman, “GPU Rowhammer Impact on Deep Learning Models,” Poster presentation at Workshop on Hardware and Architectural Support for Security and Privacy (HASP) 2023.
Ange Ishimwe, Phaedra Curlin, Alexander Juenemann, and Tamara Silbergleit Lehman, “SMAD: Efficiently Defending Against Transient Execution Attacks,” (under review) International Symposium on Hardware Oriented Security and Trust (HOST) 2024
Alexander Juenemann and Tamara Silbergleit Lehman, “Investigating Impact of Rowhammer Attacks on Deep Learning Models using GPU Simulators,” (work in progress, to be submitted for publication soon)
Educational Activities
UC Merced
A new course, EECS242 (Advanced Topics in Computer Architecture), was created in Fall 2022, covering state-of-the-art research on "Security and Reliability" and "GPU and Accelerators"
Topics on "Security and Reliability" and "GPU and Accelerators" were newly added to EECS253 (Computer Architecture and Design) in Fall 2021
With supplemental REU support, two undergraduate student interns worked on secure GPU computing for large language models in Summer 2022. The interns reviewed the relevant literature, examined the weight-value differences between pre-trained and fine-tuned versions of several Transformer models, and developed a software tool that verifies our proposed ideas.
A high-school summer intern was trained in Summer 2023. The intern studied the GPU architecture and programming model, tested several GPU applications, and analyzed the performance differences between CPU and GPU computing.
University of Colorado Boulder
An REU student and a PhD student have been recruited. The students are exploring the impact of rowhammer attacks on DNN computing.
"Secure Deep Learning Computing on GPUs - Analysis on Rowhammer Implementation," presented at CU Boulder SPUR Final Presentation Workshop.
West Virginia University
A new course, CPE 593B (Hardware Security and Trust), was offered to graduate students at WVU in Spring 2023. The course covered state-of-the-art hardware-based attacks (e.g., side channels) on GPUs and deep learning.