In this paper, we investigate strategies to accelerate CNNs designed for the JPEG compressed domain. The starting point for our study is a state-of-the-art CNN proposed by Gueguen et al. [1], which is a modified version of ResNet-50 [2]. However, the changes that Gueguen et al. [1] introduced to ResNet-50 increased its computational complexity and number of parameters. To alleviate these drawbacks, Santos et al. [3] proposed feeding the network with only the lowest-frequency DCT coefficients, thus losing image details irretrievably. Here, we explore strategies that reduce the network's computational complexity without sacrificing the rich information provided by the DCT coefficients.

We conducted experiments on the ImageNet dataset, using both a subset and the full dataset. Our results in both settings showed that learning how to combine all DCT inputs in a data-driven fashion is better than discarding some of them by hand. We also found that skipping some stages of the network is beneficial: it reduces the computational complexity while retaining accuracy.
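
The contrast between discarding DCT inputs by hand and combining them in a data-driven fashion can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper. The input shape (64 DCT channels on a 28x28 grid), the zigzag channel ordering, the value of `k`, and the function names `select_low_frequency` and `learned_projection` are all assumptions made for illustration, and the random weights stand in for parameters that a real network would learn.

```python
import numpy as np


def select_low_frequency(dct, k):
    """Hand-crafted reduction: keep only the first k DCT channels.

    Assumes the channel axis is in zigzag order, so lower indices
    correspond to lower frequencies (an assumption of this sketch).
    """
    return dct[:k]


def learned_projection(dct, weights):
    """Data-driven reduction: mix all channels with a (k, C) weight matrix.

    This is equivalent to a bias-free 1x1 convolution; in a real network
    the weights would be learned by backpropagation.
    """
    return np.einsum("kc,chw->khw", weights, dct)


rng = np.random.default_rng(0)
dct = rng.standard_normal((64, 28, 28))       # hypothetical DCT input (C, H, W)
k = 16                                        # hypothetical reduced channel count
weights = 0.1 * rng.standard_normal((k, 64))  # random stand-in for learned weights

low = select_low_frequency(dct, k)      # high-frequency channels are dropped
mixed = learned_projection(dct, weights)  # every channel can contribute
```

Both reductions produce the same output shape, so the layers that follow cost the same in either case. Hand selection is the special case of the projection in which the weight matrix is `np.eye(k, 64)`, so the projection can in principle retain high-frequency information that slicing discards.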

This paper will be presented at the virtual 25th Iberoamerican Congress on Pattern Recognition (CIARP'21), 10-13 May!

[1] L. Gueguen, A. Sergeev, B. Kadlec, R. Liu, and J. Yosinski, "Faster Neural Networks Straight from JPEG," in Annual Conference on Neural Information Processing Systems (NIPS'18), 2018, pp. 3937-3948.

[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'16), 2016, pp. 770-778.

[3] S. F. Santos, N. Sebe, and J. Almeida, "The Good, the Bad, and the Ugly: Neural Networks Straight from JPEG," in IEEE International Conference on Image Processing (ICIP'20), 2020, pp. 1896-1900.

S. F. Santos and J. Almeida, "Less is More: Accelerating Faster Neural Networks Straight from JPEG," in 25th Iberoamerican Congress on Pattern Recognition (CIARP'21), 2021, pp. 1-10.