PQuant: Streamlining ML Model Compression to Deployment for Next-Gen Detector Systems

October 9, 2025
Arghya Ranjan Das
Abstract
Real-time machine learning is emerging as a key tool for next-generation detector systems, where strict latency and hardware constraints require highly efficient models. We present PQuant, a backend-agnostic Python library designed to unify and streamline pruning and quantization techniques for hardware deployment, supporting both PyTorch and TensorFlow. PQuant provides unstructured pruning, structured pruning (PDP and ActivationPruning), N:M pruning (Wanda), and hardware-aware resource (DSP/BRAM) optimization via an MDMM framework, including pattern compression for convolutional kernels targeting FPGA/ASIC deployment. It also supports flexible quantization from fixed-point to high-granularity schemes with per-layer/per-weight bit control. Integration with hls4ml is ongoing, enabling compressed models to be deployed directly to FPGAs/ASICs. PQuant bridges compression methods with hardware resource optimization for low-latency ML in triggers, DAQ, and online reconstruction.
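Two of the compression schemes named in the abstract can be illustrated with a short, self-contained sketch: N:M pruning (keep the N largest-magnitude weights in every group of M, the pattern behind Wanda-style 2:4 sparsity) and fixed-point quantization (snap weights to a signed grid with a fixed number of integer and fractional bits). The functions below are a generic numpy illustration of these techniques, not PQuant's actual API; all names are hypothetical.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Enforce N:M sparsity: in each group of m consecutive weights,
    zero out all but the n largest in magnitude.

    Generic illustration of the N:M pattern; not PQuant's API.
    """
    flat = weights.reshape(-1, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

def fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize to a signed fixed-point grid with `frac_bits` fractional
    bits, clipping to the representable range. Hypothetical helper."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]])
print(nm_prune(w))          # 2:4 pattern: half the weights in each group of 4 survive
print(fixed_point(0.33))    # 0.33 rounded to the nearest multiple of 1/16
```

Per-layer or per-weight bit control, as described in the abstract, amounts to choosing `total_bits`/`frac_bits` independently for each layer or weight group rather than globally.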
Event
CPAD 2025 at Penn — Parallel session talk (RDC 5 Trigger & DAQ Shared Session II)
Location

Inn at Penn, University of Pennsylvania — St Marks

Philadelphia, PA

Authors
Arghya Ranjan Das
Ph.D. Student | LPC G&V Fellow

Hi, I am a Ph.D. student at Purdue University, currently based at Fermilab as an LPC G&V Fellow, working on the CMS experiment. My work focuses on Di-Higgs searches, developing ML solutions for real-time detector readout, and the Outer Tracker upgrade.

I am also interested in theoretical astrophysics, cosmology, and high-energy physics.