Abstract:
Deep neural networks have achieved revolutionary results in several domains; nevertheless, they require extensive computational resources and have a large memory footprint. Knowledge distillation addresses this by transferring knowledge from a larger teacher network to a smaller student model, and the transferred knowledge can be categorized into three main types: response-based, feature-based, and relation-based. Existing works have explored using one or two knowledge types; however, we hypothesize that distilling all three knowledge types leads to a more comprehensive transfer of information and improves the student's accuracy. In this paper, we propose ModReduce, a unified knowledge distillation framework that combines the three knowledge types through a combination of offline and online knowledge distillation. ModReduce is a generic framework that employs state-of-the-art methods for each knowledge distillation type to learn a better student; as such, it can be updated as new state-of-the-art methods become available. During training, three student instances each learn a single knowledge type from the teacher via offline distillation and then teach each other what they have learned via online distillation, analogous to peer learning in real life, where different students excel in different parts of a subject taught by their teacher and then help each other learn the remaining parts. During inference, only the best-performing student is used, so no additional inference cost is introduced. Extensive experiments on 15 different Teacher-Student architecture pairs demonstrate that ModReduce produces a student that outperforms state-of-the-art methods, with an average relative improvement of up to 48.29% and no additional inference cost. Source code is available at https://github.com/Yahya-Abbas/ModReduce.
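To make the training scheme described above concrete, the following is a minimal sketch (not the authors' released implementation) of one training step that combines offline distillation from the teacher with online peer distillation among three students. The function names `response_kd` and `modreduce_step`, the loss weights `alpha` and `beta`, and the use of a simple response-based KL loss as a stand-in for the knowledge-type-specific losses are all illustrative assumptions.

```python
# Hypothetical sketch of a ModReduce-style training step (PyTorch).
# Assumptions: teacher and students are nn.Modules returning logits;
# offline_losses[i] is the knowledge-type-specific loss (response-,
# feature-, or relation-based) assigned to student i. Feature- and
# relation-based losses would consume intermediate activations in a
# full implementation; here they share the logits-based signature
# purely for brevity.
import torch
import torch.nn.functional as F

def response_kd(student_logits, teacher_logits, T=4.0):
    """Response-based KD: KL divergence between temperature-softened outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def modreduce_step(teacher, students, offline_losses, x, y, alpha=0.5, beta=0.5):
    """One training step: offline KD from the teacher plus online peer KD."""
    with torch.no_grad():
        t_logits = teacher(x)

    s_logits = [s(x) for s in students]
    total = 0.0
    for i, logits in enumerate(s_logits):
        loss = F.cross_entropy(logits, y)                          # ground-truth supervision
        loss = loss + alpha * offline_losses[i](logits, t_logits)  # offline KD from the teacher
        # Online peer distillation: match the other students' detached predictions.
        peers = [s_logits[j].detach() for j in range(len(s_logits)) if j != i]
        loss = loss + beta * sum(response_kd(logits, p) for p in peers) / len(peers)
        total = total + loss
    return total
```

In a complete pipeline each student would have its own optimizer, and after training only the best-performing student would be retained for inference, consistent with the claim that no additional inference cost is introduced.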