Cross-Domain Spatial Co-Attention Network for Sketch-Based Image Retrieval

Lingzhi Yu; Xifan Zhang

doi:10.3788/LOP202259.2215009

Abstract

Sketch-based image retrieval takes a hand-drawn sketch as the query and retrieves the corresponding natural images, so users can draw what they want to find when no accurate query image is available. Edge maps are commonly used as an intermediate modality to bridge the domain gap between sketches and natural images, but existing methods ignore the inherent relationship between edge maps and natural images. Based on the assumption that a natural image and its corresponding edge map share similar key regions, this paper proposes a deep learning model built on a cross-domain spatial co-attention network. The model derives a shared spatial attention mask from the fused features of the edge map and the natural image, and it is trained end to end with a combination of the loss function and an auxiliary classifier. The proposed method extracts sketch and natural-image features effectively, achieving mean average precision (mAP) values of 0.933 and 0.799 on the Sketchy and TU-Berlin datasets, respectively, and outperforming most of the representative sketch-based image retrieval methods it is compared against.
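For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision is commonly computed for a retrieval task. It is not taken from the paper. The function names are hypothetical, and it assumes that each query ranks the whole gallery (so every relevant image appears in the list) and that a retrieved image counts as relevant when it shares the query sketch's category; the paper's exact evaluation protocol may differ.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance[i] is True when the image at rank i + 1 is relevant.
    Assumes the ranking covers the full gallery (illustrative assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant image.
            precision_sum += hits / rank
    # A query with no relevant images contributes an AP of 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps) if aps else 0.0


# Two toy queries: relevant images at ranks 1 and 3, then at rank 2.
queries = [[True, False, True], [False, True]]
print(mean_average_precision(queries))
```

In the first toy query the precision is 1/1 at rank 1 and 2/3 at rank 3, giving an average precision of 5/6; the second query gives 1/2, so the mean over both queries is 2/3.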
