Psychological Experiments on CLIP: Could a Machine Learning Model Capture Human Cognition?

  • Phuc (Jerry) Ngo
  • Corban Swain

Presentation author(s)

Phuc (Jerry) Ngo ’23

Majors: Computer Science; Math


Color and shape have a strong influence on human behavior. They shape our decisions and interpretations, which affects a wide range of activities: how we perceive our surroundings, what we buy, and how we communicate. Recent developments in machine learning have given many models multimodal capabilities.

In particular, the Contrastive Language-Image Pre-training (CLIP) model, trained on a large amount of raw image-text data, has been shown to possess multimodal neurons that recognize different forms and abstractions of the same concept. In this work, we show that CLIP may also capture human cognitive concepts such as color and shape symbolism. We measure this effect by recording CLIP's responses to varied input stimuli, such as colors and shapes. Our results correlate strongly with those of similar psycho-visual experiments reported in the cognitive science literature.
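As a hedged illustration of what "measuring CLIP's response" could look like, the sketch below assumes the stimuli and candidate text labels have already been turned into embedding vectors by CLIP's image and text encoders (that step is not shown, and the toy 2-D vectors are made up). It scores an image against several text prompts using the zero-shot recipe CLIP is known for, a softmax over scaled cosine similarities, and then computes a Pearson correlation between model scores and human ratings, which is one plausible way to compare against psycho-visual results. The function names, the scale of 100, and every number in the example are hypothetical and do not describe the authors' actual pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_style_probs(image_emb, text_embs, scale=100.0):
    """CLIP-style zero-shot scores: softmax over scaled cosine similarities.

    `scale` stands in for CLIP's learned logit scale; 100.0 is an assumption here.
    """
    logits = [scale * cosine(image_emb, t) for t in text_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical embeddings: one stimulus image and two text prompts.
    image_emb = [0.9, 0.1]
    text_embs = [[1.0, 0.0], [0.0, 1.0]]
    print("prompt probabilities:", clip_style_probs(image_emb, text_embs))

    # Hypothetical per-stimulus model scores and human ratings (not real data).
    model_scores = [0.8, 0.6, 0.3, 0.1]
    human_ratings = [4.5, 3.9, 2.2, 1.0]
    print("Pearson r:", pearson(model_scores, human_ratings))
```

Pearson's r is only one option; if the human data are ordinal ratings, a rank-based measure such as Spearman's correlation could be substituted.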


Mehmet Dik, Phillip Isola (MIT)
