Changelog & migration¶
This page tracks user-visible changes to Valentine and explains how to port code between releases. The format is based on Keep a Changelog and the project follows Semantic Versioning. For the full commit history, see GitHub releases.
Maintainers: how to update this page
When preparing a release, move the contents of the
Unreleased section below into a new versioned heading
(## vX.Y.Z - YYYY-MM-DD) and reset the Unreleased sub-sections
to empty. Keep sub-section order consistent:
Added · Changed · Deprecated · Removed · Fixed · Security.
Unreleased¶
Added¶
- Nothing yet.
Changed¶
- Nothing yet.
Deprecated¶
- Nothing yet.
Removed¶
- Nothing yet.
Fixed¶
- Nothing yet.
Security¶
- Nothing yet.
v1.0.0 - API redesign¶
v1.0.0 is a significant redesign of Valentine's public API. If you are coming from 0.5.x or earlier, the changes below will affect your code.
Added¶
- ColumnPair NamedTuple with explicit source_table, source_column, target_table, target_column fields, replacing the previous nested-tuple match keys.
- Sub-matcher score breakdowns exposed via MatcherResults.details and get_details(pair). Currently populated by Coma.
- Ground-truth input accepts table-aware ColumnPair instances in addition to column-name pairs; see Evaluation metrics.
- Top-level instance_sample_size parameter on valentine_match (default 1000) for controlling instance sampling without constructing a custom DataframeTable.
- Predefined metric sets: METRICS_ALL, METRICS_PRECISION_RECALL, and METRICS_PRECISION_INCREASING_N alongside the existing METRICS_CORE; see Predefined metric sets.
- Full documentation site with matcher guide, API reference, and migration notes.
Changed¶
- Unified top-level match API. A single valentine_match now accepts any iterable of DataFrames (list, tuple, generator), replacing the previous valentine_match / valentine_match_batch pair.
- Immutable MatcherResults. The result object is now a Mapping, not a dict subclass. Derived views (e.g. one_to_one()) are cached and cannot be silently invalidated.
- Coma is now a pure-Python implementation of COMA 3.0, with no JVM dependency. Constructor signature updated to max_n, use_instances, use_schema, delta, threshold.
- METRICS_ALL is now an explicit set rather than a dynamic scan of Metric.__subclasses__(), so user-defined metrics no longer bleed into the predefined set.
- Parameter validation happens at matcher construction time: invalid thresholds, negative counts, or mutually-exclusive flags raise ValueError immediately rather than failing mid-match.
Deprecated¶
- NotAValentineMatcher is kept as an alias for InvalidMatcherError but will be removed in a future release. Update except clauses to use the new name.
Removed¶
- valentine_match_batch: use valentine_match with an iterable instead.
- The Java-backed COMA wrapper and its JVM dependency.
- Mutable dict semantics on match results (__setitem__, update, pop, …).
Migrating from 0.5.x¶
1. valentine_match_batch is gone¶
Before (0.5.x):
from valentine import valentine_match, valentine_match_batch
matches = valentine_match(df1, df2, matcher) # two DataFrames
matches = valentine_match_batch([df1, df2, df3], matcher) # many DataFrames
After (1.0):
from valentine import valentine_match
matches = valentine_match([df1, df2], matcher) # any iterable
matches = valentine_match([df1, df2, df3], matcher)
valentine_match now accepts any iterable of
DataFrames; pairs, lists, tuples, and generators all work the same way.
2. Match keys are ColumnPair instances, not nested tuples¶
Before:
After:
for pair, score in matches.items():
print(f"{pair.source_column} <-> {pair.target_column}: {score}")
ColumnPair is a NamedTuple, so positional
indexing still works if you really need it, and destructuring into four
names is a simple migration path:
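As a rough sketch of that destructuring, the snippet below uses a stand-in ColumnPair defined locally with the four fields named in this changelog. The stand-in class, the sample table and column names, and the score are assumptions for illustration only; in real code the pairs come from Valentine's match results.

```python
from typing import NamedTuple


class ColumnPair(NamedTuple):
    """Local stand-in with the four documented fields (not Valentine's own class)."""

    source_table: str
    source_column: str
    target_table: str
    target_column: str


# Hypothetical results keyed by ColumnPair; the score is a made-up placeholder.
matches = {ColumnPair("hr", "emp_id", "payroll", "employee_number"): 0.5}

for pair, score in matches.items():
    # Destructure into four names, much like unpacking the old nested tuples.
    source_table, source_column, target_table, target_column = pair
    # Positional indexing works too, because a NamedTuple is a tuple subclass.
    assert pair[1] == source_column
    print(f"{source_table}.{source_column} <-> {target_table}.{target_column}: {score}")
```

Attribute access (pair.source_column) is usually the more readable choice; destructuring is mainly useful when porting loops that previously unpacked nested tuples.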
3. MatcherResults is immutable¶
Before:
After: in-place mutations such as __setitem__, update, or pop raise
TypeError / AttributeError. Use the transformation methods instead:
matches = matches.filter(min_score=0.7)
matches = matches.take_top_n(10)
matches = matches.take_top_percent(25)
Each returns a new MatcherResults
instance.
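The failure modes of a read-only mapping can be illustrated without Valentine at all. The sketch below uses the standard library's MappingProxyType as a stand-in for an immutable result object; it is not MatcherResults, and the keys and scores are placeholders. It only shows why item assignment raises TypeError while a missing mutator such as pop raises AttributeError, and how a new object is derived instead.

```python
from types import MappingProxyType

# Stand-in for an immutable result object; NOT Valentine's MatcherResults.
scores = MappingProxyType({
    ("emp_id", "employee_number"): 0.9,  # placeholder key and score
    ("name", "dept"): 0.2,
})

errors = []

try:
    scores[("name", "dept")] = 1.0  # read-only mappings reject item assignment
except TypeError as exc:
    errors.append(type(exc).__name__)

try:
    scores.pop(("name", "dept"))  # mutating methods are simply absent
except AttributeError as exc:
    errors.append(type(exc).__name__)

# Instead of mutating, derive a new read-only view from the old one.
kept = MappingProxyType({pair: s for pair, s in scores.items() if s >= 0.7})
print(errors, dict(kept))
```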
4. Ground truth accepts ColumnPair instances¶
Before, only (col, col) pairs were allowed:
After, both forms work, and table-aware comparison is now possible for multi-table matching:
from valentine.algorithms import ColumnPair
ground_truth = [
ColumnPair("hr", "emp_id", "payroll", "employee_number"),
...
]
See Evaluation metrics, section Ground-truth formats.
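If an existing ground-truth list of column-name pairs needs to become table-aware, one possible approach is a small conversion helper. The sketch below is an assumption, not part of Valentine: to_column_pairs is a hypothetical helper, ColumnPair is a local stand-in with the documented fields, and the second pair of column names is made up.

```python
from typing import NamedTuple


class ColumnPair(NamedTuple):
    """Local stand-in with the four documented fields (not Valentine's own class)."""

    source_table: str
    source_column: str
    target_table: str
    target_column: str


def to_column_pairs(ground_truth, source_table, target_table):
    """Hypothetical helper: upgrade (col, col) pairs to table-aware ColumnPair."""
    upgraded = []
    for entry in ground_truth:
        if isinstance(entry, ColumnPair):
            # Already table-aware, keep as-is.
            upgraded.append(entry)
        else:
            source_column, target_column = entry
            upgraded.append(
                ColumnPair(source_table, source_column, target_table, target_column)
            )
    return upgraded


legacy = [("emp_id", "employee_number"), ("full_name", "name")]
ground_truth = to_column_pairs(legacy, "hr", "payroll")
print(ground_truth[0])
```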
5. NotAValentineMatcher is deprecated¶
The exception raised for bad matcher arguments is now
InvalidMatcherError. The old name is
kept as an alias for backward compatibility but will be removed in a
future release, so update your except clauses.
# Before
from valentine import NotAValentineMatcher
# After
from valentine import InvalidMatcherError
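To see why old except clauses keep working during the deprecation window, here is a minimal sketch of the alias pattern using locally defined stand-ins. The classes below are not imported from Valentine; the Exception base class and the require_matcher function are assumptions made for the example.

```python
class InvalidMatcherError(Exception):
    """Stand-in for the new exception name (base class is an assumption)."""


# An alias is just a second name bound to the same class object.
NotAValentineMatcher = InvalidMatcherError


def require_matcher(matcher):
    """Hypothetical check that rejects anything that is not callable."""
    if not callable(matcher):
        raise InvalidMatcherError(f"not a matcher: {matcher!r}")
    return matcher


caught_by = []

try:
    require_matcher("not-a-matcher")
except NotAValentineMatcher:  # legacy clause still catches the new exception
    caught_by.append("old name")

try:
    require_matcher("not-a-matcher")
except InvalidMatcherError:  # preferred spelling going forward
    caught_by.append("new name")

print(caught_by)
```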
6. The Java COMA wrapper has been removed¶
If you were relying on the previous Java-backed Coma implementation,
you no longer need a JVM: Coma is now pure Python and
ships with the package. The constructor signature has changed slightly;
see the API reference for the new parameters
(max_n, use_instances, use_schema, delta, threshold).