Relational Visual Similarity (Dec 2025)
Abstract
- Title: Relational Visual Similarity (Dec 2025)
- Link: http://arxiv.org/abs/2512.07833v1
- Date: December 2025
Summary:
This paper introduces “relational visual similarity,” a novel framework for measuring image similarity based on underlying logical structure, function, or interaction rather than surface-level visual attributes such as color or shape. While existing models (e.g., CLIP, LPIPS) excel at perceptual similarity, they fail to capture abstract analogies (e.g., recognizing that the layers of a peach correspond to the layers of the Earth). To address this, the authors curate a dataset of 114k images paired with “anonymous captions”: descriptions that abstract away specific object identities to focus on relational patterns. They fine-tune a Vision-Language Model (Qwen2.5-VL) on this dataset to create a new metric, relsim. The study demonstrates that relsim significantly outperforms existing baselines at capturing relational logic in tasks such as image retrieval and analogical image generation.
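To make the retrieval setup concrete, below is a minimal sketch of how a relsim-style score could drive relational image retrieval. It assumes relsim can be approximated as the cosine similarity between the embeddings of two relation-focused "anonymous captions"; the paper's actual scoring mechanism may differ. The function generate_anonymous_caption is a hypothetical stand-in for inference with the fine-tuned Qwen2.5-VL, and the all-MiniLM-L6-v2 text encoder is an arbitrary choice for illustration, not one taken from the paper.

```python
# Sketch: relational image retrieval via "anonymous captions".
# Assumptions are marked; this is not the paper's reference implementation.
from sentence_transformers import SentenceTransformer, util


def generate_anonymous_caption(image_path: str) -> str:
    """Hypothetical stand-in for the fine-tuned Qwen2.5-VL captioner.

    In the paper's pipeline, the model describes an image while abstracting
    away object identities (e.g., a peach cross-section might become
    "concentric layers surrounding a dense central core").
    """
    raise NotImplementedError("Replace with a VLM inference call.")


def relsim(caption_a: str, caption_b: str, encoder: SentenceTransformer) -> float:
    """Approximate relational similarity as cosine similarity between
    anonymous-caption embeddings (an assumption, not the paper's exact metric)."""
    emb = encoder.encode([caption_a, caption_b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()


def retrieve(query_image: str, gallery: list[str],
             encoder: SentenceTransformer) -> list[tuple[str, float]]:
    """Rank gallery images by relational similarity to the query image."""
    query_cap = generate_anonymous_caption(query_image)
    scored = [
        (path, relsim(query_cap, generate_anonymous_caption(path), encoder))
        for path in gallery
    ]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    # Two hypothetical captions for visually unrelated images that share
    # relational structure (the paper's peach/Earth example).
    cap_peach = "concentric layers surrounding a dense central core"
    cap_earth = "nested spherical shells around a solid center"
    print(f"relsim ~ {relsim(cap_peach, cap_earth, encoder):.3f}")
```

Under this reading, captions like the two above score high even though the underlying images share no colors or shapes, which is exactly the behavior the abstract attributes to relsim and which perceptual metrics like LPIPS would miss.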
Key Topics:
- Relational Visual Similarity
- Analogical Reasoning
- Vision-Language Models
- Anonymous Captioning
- Image Retrieval
- Visual Abstraction
- Synthetic Dataset Generation