(1. School of Advanced Manufacturing, Fuzhou University, Quanzhou 362000, China; 2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362000, China)
Abstract: To minimize the accuracy loss incurred when large deep learning models are compressed for deployment on devices with limited computing power and storage capacity, a knowledge distillation model compression method is investigated and an improved multi-teacher knowledge distillation compression algorithm with teacher filtering is proposed. Exploiting the ensemble of multiple teacher models, the better-performing teachers are selected to guide the student, with each teacher's prediction cross-entropy serving as the quantitative filtering criterion; the student also extracts information from the teachers' feature layers, and the better-performing teachers are given greater weight in the guidance. Experimental results with classification models such as VGG13 on the CIFAR100 dataset show that the proposed multi-teacher compression method achieves higher accuracy than other compression algorithms that produce student models of the same size.
Key words: model compression; knowledge distillation; multi-teacher model; cross entropy; feature layer
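The abstract does not give the exact formulas, so the following is only a minimal NumPy sketch of one plausible reading of the teacher-filtering step: each teacher's prediction cross-entropy on a batch is the filtering criterion, teachers no worse than the ensemble mean are retained, the retained teachers are weighted by a softmax over negative cross-entropy, and their temperature-softened predictions are averaged into a soft target for the student. The mean-based threshold, the softmax weighting rule, and the temperature value are all illustrative assumptions, not the paper's stated method.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, labels):
    # Mean cross-entropy between predicted class probabilities and integer labels.
    eps = 1e-12
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def select_and_weight_teachers(teacher_logits, labels):
    """Filter teachers by prediction cross-entropy and weight the survivors.

    teacher_logits: list of (batch, num_classes) arrays, one per teacher.
    labels: (batch,) integer class labels.
    Returns indices of retained teachers and their normalized weights.
    """
    ces = np.array([cross_entropy(softmax(l), labels) for l in teacher_logits])
    # Assumed filtering rule: keep teachers whose cross-entropy is no worse
    # than the mean over all teachers.
    keep = np.where(ces <= ces.mean())[0]
    # Assumed weighting rule: lower cross-entropy -> larger weight,
    # normalized with a softmax over negative cross-entropy.
    weights = softmax(-ces[keep])
    return keep, weights

def combined_soft_target(teacher_logits, keep, weights, temperature=4.0):
    # Weighted average of the retained teachers' temperature-softened
    # predictions; the student would match this with a distillation loss.
    soft = [softmax(teacher_logits[i] / temperature) for i in keep]
    return np.tensordot(weights, np.stack(soft), axes=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 100, size=32)                # CIFAR100-style labels
    teachers = [rng.normal(size=(32, 100)) for _ in range(4)]
    keep, w = select_and_weight_teachers(teachers, labels)
    target = combined_soft_target(teachers, keep, w)
    print(keep, w, target.shape)
```

A feature-layer term, as mentioned in the abstract, would be added analogously by weighting per-teacher feature-matching losses with the same weights; that part is omitted here for brevity.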