Valentine is a Python package for capturing potential relationships among columns of different tabular datasets, given as pandas DataFrames. It implements several schema- and instance-based matching algorithms behind a single, uniform API, and ships with evaluation metrics so you can measure match quality against a ground truth.

Installation

pip install valentine

Requires Python >=3.10, <3.15.

A 30-second taste

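import pandas as pd
from valentine import valentine_match
from valentine.algorithms import Coma

# Load the two tables whose columns should be matched.
df1 = pd.read_csv("source_candidates.csv")
df2 = pd.read_csv("target_candidates.csv")

# Run COMA; use_instances=True also uses the column values, not only the schema information.
matches = valentine_match([df1, df2], Coma(use_instances=True))

# Each entry maps a matched column pair to its similarity score.
for pair, score in matches.items():
    print(f"{pair.source_column} <-> {pair.target_column}: {score:.3f}")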
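
As a rough illustration of what measuring match quality against a ground truth involves, the sketch below computes precision and recall over plain (source_column, target_column) tuples. It does not use Valentine's own metrics; the helper `precision_recall` and the column names are hypothetical and exist only for this example.

```python
def precision_recall(predicted_pairs, ground_truth_pairs):
    """Return (precision, recall) of predicted column pairs against the expected ones."""
    predicted = set(predicted_pairs)
    expected = set(ground_truth_pairs)
    true_positives = len(predicted & expected)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical column pairs, for illustration only.
predicted = [
    ("emp_id", "employee_id"),
    ("name", "full_name"),
    ("dept", "salary"),  # a wrong match
]
ground_truth = [
    ("emp_id", "employee_id"),
    ("name", "full_name"),
    ("dept", "department"),
]

precision, recall = precision_recall(predicted, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted pairs are correct and two of the three expected pairs are found, so both values are 2/3. The package's built-in metrics are the better choice in practice; this only shows the idea behind them.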

Ready for more? Head over to Getting started, or jump straight to the API reference.

Research

Valentine started as a research project at Delft Data and is based on the ICDE 2021 paper. See the Research page for the papers behind the package, the algorithms it implements, and citation info.