PQuant: Streamlining ML Model Compression to Deployment for Next-Gen Detector Systems

October 9, 2025
Arghya Ranjan Das
Abstract
Real-time machine learning is emerging as a key tool for next-generation detector systems, where strict latency and hardware constraints require highly efficient models. We present PQuant, a backend-agnostic Python library designed to unify and streamline pruning and quantization techniques for hardware deployment, supporting both PyTorch and TensorFlow. PQuant provides unstructured pruning, structured pruning (PDP and ActivationPruning), N:M pruning (Wanda), and hardware-aware resource (DSP/BRAM) optimization via an MDMM framework, including pattern compression for convolutional kernels targeting FPGA/ASIC deployment. It also supports flexible quantization from fixed-point to high-granularity schemes with per-layer/per-weight bit control. Integration with hls4ml is ongoing, enabling compressed models to be deployed directly to FPGAs/ASICs. PQuant bridges compression methods with hardware resource optimization for low-latency ML in triggers, DAQ, and online reconstruction.
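Two of the compression schemes named in the abstract can be illustrated with a short, self-contained sketch: N:M pruning (keep the N largest-magnitude weights in every group of M, the pattern behind Wanda-style 2:4 sparsity) and fixed-point quantization (snap weights to a signed grid with a fixed number of integer and fractional bits). The functions below are a generic numpy illustration of these techniques, not PQuant's actual API; all names are hypothetical.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Enforce N:M sparsity: in each group of m consecutive weights,
    zero out all but the n largest in magnitude.

    Generic illustration of the N:M pattern; not PQuant's API.
    """
    flat = weights.reshape(-1, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

def fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize to a signed fixed-point grid with `frac_bits` fractional
    bits, clipping to the representable range. Hypothetical helper."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]])
print(nm_prune(w))          # 2:4 pattern: half the weights in each group of 4 survive
print(fixed_point(0.33))    # 0.33 rounded to the nearest multiple of 1/16
```

Per-layer or per-weight bit control, as described in the abstract, amounts to choosing `total_bits`/`frac_bits` independently for each layer or weight group rather than globally.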
Event
CPAD 2025 at Penn — Parallel session talk (RDC 5 Trigger & DAQ Shared Session II)
Location

Inn at Penn, University of Pennsylvania — St Marks

Philadelphia, PA

Authors
Arghya Ranjan Das
Ph.D. Student | LPC G&V Fellow

Hi, I am a Ph.D. student at Purdue University, currently based at Fermilab as an LPC G&V Fellow, working on the CMS experiment. My work focuses on Di-Higgs searches, developing ML solutions for real-time detector readout, and the Outer Tracker upgrade.

I am also interested in theoretical astrophysics, cosmology, and high-energy physics.