In this paper, entitled "The Good, the Bad, and the Ugly: Neural Networks Straight from JPEG", we investigate whether spatial resolution and JPEG quality affect the performance of CNNs fed with DCT coefficients. More specifically, we study several aspects of a state-of-the-art CNN recently proposed by Gueguen et al. [1], which is a modified version of the ResNet-50 architecture [2]. Despite the speed-up obtained by only partially decoding JPEG images, their architectural changes increased the computational complexity and the number of parameters of the network. To alleviate these drawbacks, we propose a Frequency Band Selection (FBS) technique that selects the most relevant DCT coefficients before feeding them to the network. A comparison among the original ResNet-50 network [2], the modified ResNet-50 network proposed by Gueguen et al. [1], and our improved version with FBS is presented below.
Original ResNet-50 network [2]
ResNet-50 using DCT as input [1]...
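To make the idea of selecting DCT coefficients before the network more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that "most relevant" means the lowest-frequency coefficients in JPEG zigzag order, and that the DCT input is stored as an array of shape (H/8, W/8, 64) with the 8x8 block flattened row-major into the last axis. The function names `zigzag_order` and `select_bands` and the parameter `k` are hypothetical and introduced only for illustration.

```python
import numpy as np


def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    # Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    # walked with increasing row, even diagonals with increasing column.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def select_bands(dct, k):
    """Keep the first k frequency bands (zigzag order) of a DCT tensor.

    Assumption: dct has shape (..., 64), where channel r * 8 + c holds the
    coefficient at row r, column c of each 8x8 block.
    """
    channels = [r * 8 + c for r, c in zigzag_order(8)]
    return dct[..., channels[:k]]


# Example: a dummy 2x2 grid of blocks, reduced from 64 to 16 channels.
dummy = np.zeros((2, 2, 64), dtype=np.float32)
reduced = select_bands(dummy, k=16)
```

Under these assumptions, reducing the number of input channels from 64 to `k` shrinks the first layers that consume the DCT tensor, which is the kind of saving in computation and parameters the text refers to.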