Stop Thinking, Just Do!

Sungsoo Kim's Blog

Relational Visual Similarity


16 December 2025


Relational Visual Similarity (Dec 2025)

Abstract

Summary:

This paper introduces “relational visual similarity,” a novel framework for measuring image similarity based on underlying logical structures, functions, or interactions rather than surface-level visual attributes like color or shape. While existing models (e.g., CLIP, LPIPS) excel at perceptual similarity, they fail to capture abstract analogies (e.g., recognizing that the layers of a peach correspond to the layers of the Earth). To address this, the authors curate a dataset of 114k images paired with “anonymous captions”—descriptions that abstract away specific object identities to focus on relational patterns. They finetune a Vision-Language Model (Qwen2.5-VL) on this dataset to create a new metric, relsim. The study demonstrates that relsim significantly outperforms existing baselines in capturing relational logic for tasks such as image retrieval and analogical image generation.
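Retrieval with a similarity score like relsim reduces to ranking a gallery by the score against the query. As a minimal sketch (not the paper's implementation), the following uses cosine similarity over toy stand-in vectors in place of the fine-tuned model's relational embeddings; all names and values are hypothetical:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, gallery):
    # gallery: list of (name, embedding); rank by similarity, descending
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]),
                  reverse=True)

# Toy vectors standing in for relational features: a "peach cross-section"
# query should match "earth_layers" (shared concentric-layer structure)
# rather than the perceptually closer "red_apple".
query = [1.0, 0.0, 1.0]
gallery = [
    ("earth_layers", [0.9, 0.1, 0.95]),
    ("red_apple",    [0.0, 1.0, 0.0]),
]
ranked = retrieve(query, gallery)
print(ranked[0][0])  # the relationally similar image ranks first
```

In the paper's setting, the embeddings would come from the Qwen2.5-VL model fine-tuned on anonymous captions, so that proximity reflects shared relational structure rather than shared appearance.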

Key Topics:

  • Relational Visual Similarity
  • Analogical Reasoning
  • Vision-Language Models
  • Anonymous Captioning
  • Image Retrieval
  • Visual Abstraction
  • Synthetic Dataset Generation