Model Compression Papers
Papers for neural network compression and acceleration. Partly based on link.
Survey

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv ‘18]

A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv ‘17]
Quantization
 The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML’17]
 Compressing Deep Convolutional Networks using Vector Quantization [arXiv’14]
 Quantized Convolutional Neural Networks for Mobile Devices [CVPR ‘16]
 Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP’16]
 Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv’16]
 Loss-aware Binarization of Deep Networks [ICLR’17]
 Towards the Limit of Network Quantization [ICLR’17]
 Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR’17]
 ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv’17]
 Training and Inference with Integers in Deep Neural Networks [ICLR’18]
 Deep Learning with Limited Numerical Precision [ICML’15]
 Model compression via distillation and quantization [ICLR ‘18]
 Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR ‘18]
 On the Universal Approximability of Quantized ReLU Neural Networks [arXiv ‘18]
 Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [CVPR ‘18]
Pruning
 Learning both Weights and Connections for Efficient Neural Networks [NIPS’15]
 Pruning Filters for Efficient ConvNets [ICLR’17]
 Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR’17]
 Soft Weight-Sharing for Neural Network Compression [ICLR’17]
 Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR’16]
 Dynamic Network Surgery for Efficient DNNs [NIPS’16]
 Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR’17]
 ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV’17]
 To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR’18]
 Data-Driven Sparse Structure Selection for Deep Neural Networks [arXiv ‘17]
 Learning Structured Sparsity in Deep Neural Networks [NIPS ‘16]
 Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism [ISCA ‘17]
 Channel Pruning for Accelerating Very Deep Neural Networks [ICCV ‘17]
 Learning Efficient Convolutional Networks through Network Slimming [ICCV ‘17]
 NISP: Pruning Networks using Neuron Importance Score Propagation [CVPR ‘18]
 Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [ICLR ‘18]
 MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks [arXiv ‘17]
 Efficient Sparse-Winograd Convolutional Neural Networks [ICLR ‘18]
 “Learning-Compression” Algorithms for Neural Net Pruning [CVPR ‘18]
Binarized Neural Network
 Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [NIPS ‘16]
 XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks [ECCV ‘16]
 Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration [CVPR ‘17]
Low-rank Approximation
 Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR’15]
 Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
 Convolutional neural networks with low-rank regularization [arXiv’15]
 Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS’14]
 Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR’16]
 High performance ultra-low-precision convolutions on mobile devices [NIPS’17]
 Speeding up convolutional neural networks with low rank expansions
 Coordinating Filters for Faster Deep Neural Networks [ICCV ‘17]
Knowledge Distillation
 Dark knowledge
 FitNets: Hints for Thin Deep Nets [ICLR ‘15]
 Net2Net: Accelerating Learning via Knowledge Transfer [ICLR ‘16]
 Distilling the Knowledge in a Neural Network [NIPS ‘15]
 MobileID: Face Model Compression by Distilling Knowledge from Neurons [AAAI ‘16]
 DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [arXiv ‘17]
 Deep Model Compression: Distilling Knowledge from Noisy Teachers [arXiv ‘16]
 Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer [ICLR ‘17]
 Like What You Like: Knowledge Distill via Neuron Selectivity Transfer [arXiv ‘17]
 Learning Efficient Object Detection Models with Knowledge Distillation [NIPS ‘17]
 Data-Free Knowledge Distillation For Deep Neural Networks [NIPS ‘17]
 A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning [CVPR ‘17]
 Moonshine: Distilling with Cheap Convolutions [arXiv ‘17]
 Model compression via distillation and quantization [ICLR ‘18]
 Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR ‘18]