About This Demo

This demo showcases a Visual Retrieval-Augmented Generation (RAG) application over PDFs using ColPali embeddings in Vespa, built entirely in Python, using FastHTML. The code is fully open source.

Architecture Overview

Vespa Application: Handles indexing, search, ranking, and queries, leveraging features such as phased ranking and multivector MaxSim calculations.
Frontend: Built with FastHTML, offering a professional and responsive user interface without the complexity of separate frontend frameworks.
Backend: Also built with FastHTML. Handles query embedding inference using ColPali, serves static files, and orchestrates interactions between Vespa and the frontend.
Gemini API: The vision-language model (VLM) that generates the AI response, based on the top results from Vespa.
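
To make the phased ranking and MaxSim terms above concrete, here is a minimal pure-Python sketch. It is not the demo's actual rank profile: the function names, the mean-pooled first phase, and the `rerank_count` parameter are assumptions chosen for illustration. MaxSim itself is the standard late-interaction score: for each query token vector, take the best dot product against any page patch vector, then sum over the query tokens.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any page patch."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a list of vectors (a cheap single-vector summary)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_phase_rank(
    query_tokens: Sequence[Vector],
    pages: Dict[str, List[Vector]],
    rerank_count: int = 2,
) -> List[str]:
    """Rank page ids: cheap first phase on all pages, MaxSim on the survivors.

    Assumption: the first phase here compares mean-pooled vectors. It stands in
    for whatever inexpensive first-phase function a real application would use.
    """
    q_mean = mean_vector(query_tokens)
    first_phase = sorted(
        pages,
        key=lambda page_id: dot(q_mean, mean_vector(pages[page_id])),
        reverse=True,
    )
    # Only the top candidates pay for the more expensive MaxSim computation.
    candidates = first_phase[:rerank_count]
    return sorted(
        candidates,
        key=lambda page_id: max_sim(query_tokens, pages[page_id]),
        reverse=True,
    )
```

The point of the two phases is cost control: the full MaxSim score touches every query-token and patch pair, so it is applied only to the small candidate set that survives the cheaper first pass.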

User Experience Highlights

Fast and Responsive: Optimized for quick loading times, with phased content delivery to display essential information immediately while loading detailed data in the background.
Similarity Maps: Provides visual highlights of the most relevant parts of a page in response to a query, enhancing interpretability.
Type-Ahead Suggestions: Offers query suggestions to assist users in formulating effective searches.
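
As a rough illustration of the idea behind the similarity maps mentioned above, the sketch below scores every page patch against a single query token and arranges the scores on the page's patch grid. The function name, the grid dimensions, and the min-max normalization are assumptions made for this example; the demo's own rendering pipeline may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def similarity_map(
    query_token: Vector,
    page_patches: Sequence[Vector],
    rows: int,
    cols: int,
) -> List[List[float]]:
    """Return a rows x cols grid of patch scores scaled to the range [0, 1].

    Patches are assumed to be listed in row-major order (left to right,
    top to bottom), which is a hypothetical layout for this sketch.
    """
    if len(page_patches) != rows * cols:
        raise ValueError("number of patches must equal rows * cols")

    # Raw similarity: dot product between the query token and each patch.
    scores = [sum(x * y for x, y in zip(query_token, patch)) for patch in page_patches]

    # Min-max normalize so the values can drive an overlay's opacity or color.
    low, high = min(scores), max(scores)
    span = high - low
    normalized = [(s - low) / span if span else 0.0 for s in scores]

    # Cut the flat list back into grid rows.
    return [normalized[r * cols:(r + 1) * cols] for r in range(rows)]
```

A frontend could then upscale this grid to the page image size and blend it over the page, so that the patches most similar to the chosen query token appear highlighted.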

Dataset

The dataset used in this demo is drawn from reports published by the Norwegian Government Pension Fund Global. It contains 6,992 pages from 116 PDF reports (2000–2024). Much of the information in these reports is presented in visual formats, which makes the dataset well suited to visual retrieval applications.