This repository accompanies our research paper on Explainable AI for maps using Vision Language Models (VLMs), available at https://ica-adv.copernicus.org/articles/5/12/2025/. It focuses on interpreting VLM outputs so that researchers can better understand the models' decision-making processes.
Cartographic maps have become increasingly accessible to diverse communities over the past decades. However, accessibility for blind and visually impaired users remains a challenge. This project investigates the use of VLMs to generate map descriptions and applies Shapley-based Explainable AI methods to analyze and interpret the outputs of these models.
- Explainability through XAI: Using Shapley Additive Explanations to understand how specific map regions contribute to generated text outputs.
- Insights for Accessibility: Highlighting the impact of map scale and labeling on the quality of generated descriptions. Proposing selective masking strategies for deeper insights.
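To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of map regions. It is not the notebook's code: exact enumeration is only feasible for a handful of regions, and libraries such as SHAP typically rely on approximations instead. The names `shapley_values` and `toy_score`, as well as the scores themselves, are hypothetical stand-ins for a real caption-based value function.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_regions, value_fn):
    """Exact Shapley values for regions 0..n_regions-1.

    value_fn maps a frozenset of visible region indices to a score.
    """
    players = list(range(n_regions))
    phi = [0.0] * n_regions
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding region i.
            weight = (factorial(size) * factorial(n_regions - size - 1)
                      / factorial(n_regions))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of revealing region i.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def toy_score(visible):
    """Hypothetical score: regions 0 and 1 matter, region 2 does not."""
    score = 0.0
    if 0 in visible:
        score += 0.6
    if 1 in visible:
        score += 0.3
    if 0 in visible and 1 in visible:
        score += 0.1  # interaction, split equally between regions 0 and 1
    return score


phi = shapley_values(3, toy_score)
print(phi)  # approximately [0.65, 0.35, 0.0]
```

The values sum to the score of the fully visible map minus the score of the fully masked one, which is the property that makes them usable as per-region attributions.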
root
├── map_images # Sample maps that were used for the paper
├── results # Final explanations and masked images (used in the paper)
├── CartoXAI.ipynb # Code used for VLM captioning, masking, and XAI analysis
└── README.md # Project overview and instructions
To run the code in this repository, you'll need:
- Python 3.8+
- PyTorch
- SHAP (for Explainable AI computations)
- Hugging Face Transformers library
Install the dependencies using:
pip install -r requirements.txt
- Prepare the Data:
  - Place your map images in the map_images/ directory.
  - Ensure images are named appropriately (e.g., london_with_labels.png, vancouver_no_labels.png).
- Run Caption Generation:
  - Use the provided script to generate captions for both original and masked images.
- Compute Shapley Values:
  - Analyze the contribution of specific map regions to the generated captions.
  - Visualize the results.
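The masking step can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: a plain nested list stands in for a grayscale image so that the example needs no dependencies, and `mask_grid_cell` and the grid size are hypothetical. With an array library the inner loops would collapse into a single slice assignment.

```python
def mask_grid_cell(image, row, col, n_rows, n_cols, fill=0):
    """Return a copy of `image` (a list of pixel rows) with one grid cell filled."""
    height, width = len(image), len(image[0])
    # Integer arithmetic keeps the cells tiling the image without gaps or overlap.
    y0, y1 = row * height // n_rows, (row + 1) * height // n_rows
    x0, x1 = col * width // n_cols, (col + 1) * width // n_cols
    masked = [list(pixel_row) for pixel_row in image]  # leave the input untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            masked[y][x] = fill
    return masked


# Hypothetical 6x8 all-white grayscale "map", split into a 3x4 grid.
image = [[255] * 8 for _ in range(6)]
masked = mask_grid_cell(image, row=1, col=2, n_rows=3, n_cols=4)
hidden = sum(pixel == 0 for pixel_row in masked for pixel in pixel_row)
print(hidden)  # 4 pixels: rows 2-3, columns 4-5
```

Each masked variant would then be captioned in the same way as the original image, so that changes in the caption can be attributed to the hidden region.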
London, Scale 1:50,000, with labels
VLM output: "The image is a detailed map of London, England, showing the city’s streets, landmarks, and surrounding areas."
London, Scale 1:50,000, without labels
VLM output: "The image is a detailed map of London, England, showing the city’s streets, landmarks, and surrounding areas."
Vancouver, Scale 1:50,000, with labels
VLM output: "The image is a detailed map of Vancouver, Canada, showing streets, landmarks, and a body of water."
Vancouver, Scale 1:50,000, without labels
VLM output: "The image is a detailed map of a city, showing streets, parks, and a body of water."
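One simple way to read these outputs side by side is to compare the vocabulary of the two captions for the same city. The sketch below is only an illustration and not part of the paper's method; `caption_word_diff` is a hypothetical helper. The London captions above are identical, so only the Vancouver pair is shown.

```python
import re


def caption_word_diff(with_labels, without_labels):
    """Return (lost, gained): words only in the first / only in the second caption."""
    def tokenize(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    first, second = tokenize(with_labels), tokenize(without_labels)
    return first - second, second - first


vancouver_with = ("The image is a detailed map of Vancouver, Canada, "
                  "showing streets, landmarks, and a body of water.")
vancouver_without = ("The image is a detailed map of a city, "
                     "showing streets, parks, and a body of water.")

lost, gained = caption_word_diff(vancouver_with, vancouver_without)
print(sorted(lost))    # ['canada', 'landmarks', 'vancouver']
print(sorted(gained))  # ['city', 'parks']
```

For this pair, the place name disappears once the labels are removed and is replaced by the generic "a city".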
London including labels:
Highlighting the corresponding area in the original image:
London excluding labels:
Overview of the area compared to the original image:
Vancouver including labels:
Full visualizations can be found in the results/ directory.
- Selective Masking Strategies: Replacing grid-based masking with targeted masking of cartographic elements (e.g., labels, rivers) for deeper insights.
- Improving VLM Outputs: Investigating methods to enhance the detail and accuracy of descriptions generated by VLMs.
If you use this repository, please cite our work:
@Article{ica-adv-5-12-2025,
AUTHOR = {Dinga, G. T. and Schiewe, J.},
TITLE = {What do you see? An XAI approach for VLM-generated map descriptions},
JOURNAL = {Advances in Cartography and GIScience of the ICA},
VOLUME = {5},
YEAR = {2025},
PAGES = {12},
URL = {https://ica-adv.copernicus.org/articles/5/12/2025/},
DOI = {10.5194/ica-adv-5-12-2025}
}
This repository is licensed under the MIT License. See LICENSE for more details.
We thank the open-source communities behind SHAP, Hugging Face, and CARTO Basemaps for their tools and datasets.
- Parts of the code are taken directly from SHAP; see especially the SHAP Image Captioning example.
- CARTO Basemaps
- We are currently working on a web-based system where users can upload their own images to receive explanation plots and masked images without having to set up everything locally.
- We plan to provide a Docker container for better dependency management.




