Deep convolutional neural networks, features, and categories perform similarly at explaining primate high-level visual representations

Abstract:

Deep convolutional neural networks (DNNs) are currently the best computational models for explaining image representations across the visual cortical hierarchy. However, it is unclear how the representations in DNNs relate to those of simpler “oracle” models of features and categories. We obtained DNN (AlexNet) representations for a set of 92 real-world object images. Human observers generated category and feature labels for the images. Category labels included subordinate, basic, and superordinate categories; feature labels included object parts, colors, textures, and contours. We used the AlexNet representations and the labels to explain brain representations of the images, measured with fMRI in humans and cell recordings in monkeys. For both human and monkey inferior temporal (IT) cortex, late AlexNet layers explain the representations about as well as basic-category and object-part labels do. Furthermore, late AlexNet layers can account for more than half of the variance that these labels explain in IT. Finally, while feature and category models predominantly explain image representations in high-level visual cortex, AlexNet layers explain representations across the entire visual cortical hierarchy. DNNs may provide a computationally explicit model of how features and categories are computed by the brain.
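
To make the model-brain comparison concrete, below is a minimal sketch of one standard way to relate model activations to brain responses over a common image set: representational similarity analysis (RSA). The abstract does not specify the exact analysis pipeline, so the RSA framing, array shapes, and variable names here are illustrative assumptions rather than the authors' method; the random arrays stand in for real AlexNet activations and IT responses.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix, as a condensed vector of
    correlation distances (1 - Pearson r) over all image pairs.
    `features` has one row per image, one column per unit/voxel."""
    return pdist(features, metric="correlation")

# Hypothetical inputs: 92 images, as in the study.
rng = np.random.default_rng(0)
layer_acts = rng.standard_normal((92, 4096))  # e.g. a late AlexNet layer
brain_resp = rng.standard_normal((92, 500))   # e.g. IT voxel/cell responses

model_rdm = rdm(layer_acts)
brain_rdm = rdm(brain_resp)

# Rank-correlate the two RDMs, a common RSA comparison statistic.
rho, p = spearmanr(model_rdm, brain_rdm)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```

Under this framing, a category or feature label set would enter the same way: build a model RDM from the labels (e.g., same-category versus different-category pairs) and correlate it with the brain RDM, so that DNN layers and label-based models can be compared on a common scale.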