Judges
Judges score or classify candidates produced by a generator.
All judges inherit from CandidateJudgeBase and implement a judge() method
returning CandidateJudgeOutput.
CandidateJudgeBase
locisimiles.pipeline.judge._base.CandidateJudgeBase
Abstract base class for candidate judges.
A judge receives the output of a candidate generator and produces a
CandidateJudgeOutput — a dictionary mapping query-segment IDs to
lists of CandidateJudge objects, each containing a source segment,
the original candidate score, and a final judgment score.
Subclasses must implement judge().
Available implementations:
- ClassificationJudge: scores pairs with a fine-tuned transformer classification model.
- ThresholdJudge: applies a top-k or score-threshold rule.
- IdentityJudge: passes candidates through unchanged (judgment_score = 1.0).
judge (abstractmethod)
judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    **kwargs: Any,
) -> CandidateJudgeOutput
Score or classify candidates.
| PARAMETER | DESCRIPTION |
|---|---|
| query | Query document (needed to look up query-segment texts). TYPE: Document |
| candidates | Output from a candidate generator. TYPE: CandidateGeneratorOutput |
| **kwargs | Judge-specific parameters. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| CandidateJudgeOutput | Mapping of query segment IDs → lists of CandidateJudge objects. |
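The contract above can be sketched with stand-in types. Everything below is hypothetical and only mirrors the structure described here: `Segment`, `Candidate`, `JudgeBase`, `ClippedScoreJudge`, and the field name `candidate_score` are illustrative assumptions, not the library's own definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-ins; the real locisimiles classes may differ.
@dataclass
class Segment:
    id: str
    text: str


@dataclass
class Candidate:
    segment: Segment
    score: float


@dataclass
class CandidateJudge:
    segment: Segment
    candidate_score: float  # assumed name for the original candidate score
    judgment_score: float


CandidateGeneratorOutput = Dict[str, List[Candidate]]
CandidateJudgeOutput = Dict[str, List[CandidateJudge]]


class JudgeBase(ABC):
    """Stand-in for CandidateJudgeBase: subclasses must implement judge()."""

    @abstractmethod
    def judge(
        self, *, query: Any, candidates: CandidateGeneratorOutput, **kwargs: Any
    ) -> CandidateJudgeOutput:
        ...


class ClippedScoreJudge(JudgeBase):
    """Toy judge: reuses the candidate score, clipped to [0, 1]."""

    def judge(
        self, *, query: Any, candidates: CandidateGeneratorOutput, **kwargs: Any
    ) -> CandidateJudgeOutput:
        return {
            qid: [
                CandidateJudge(
                    segment=c.segment,
                    candidate_score=c.score,
                    judgment_score=min(1.0, max(0.0, c.score)),
                )
                for c in cands
            ]
            for qid, cands in candidates.items()
        }
```

The keyword-only signature matches the one documented above, so a custom judge of this shape could be swapped for any built-in judge at the call site.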
ClassificationJudge
Judge candidates using a transformer sequence-classification model.
locisimiles.pipeline.judge.classification.ClassificationJudge
ClassificationJudge(
    *,
    classification_name: str = "julian-schelb/xlm-roberta-large-class-lat-intertext-v1",
    device: str | int | None = None,
    pos_class_idx: int = 1,
)
Judge candidates using a transformer classification model.
Loads a pre-trained sequence-classification model and tokenizer.
For each query–candidate pair the model outputs P(positive), which
is stored as judgment_score.
The default model is
julian-schelb/xlm-roberta-large-class-lat-intertext-v1, a fine-tuned
classifier for Latin intertextuality detection.
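As a rough illustration of how P(positive) relates to pos_class_idx, the sketch below applies a softmax to raw class logits and reads off one entry. That the library derives its score exactly this way is an assumption, and `positive_probability` is a hypothetical helper, not part of locisimiles.

```python
import math
from typing import Sequence


def positive_probability(logits: Sequence[float], pos_class_idx: int = 1) -> float:
    """Softmax over class logits; return the probability of the positive class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[pos_class_idx] / sum(exps)


# Equal logits give an even split between the two classes.
even_split = positive_probability([0.0, 0.0])
```

If a checkpoint orders its labels differently, pos_class_idx is the knob that selects which entry counts as "positive".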
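| PARAMETER | DESCRIPTION |
|---|---|
| classification_name | HuggingFace model identifier. TYPE: str. DEFAULT: "julian-schelb/xlm-roberta-large-class-lat-intertext-v1" |
| device | Torch device string. TYPE: str, int, or None. DEFAULT: None |
| pos_class_idx | Index of the positive class in the classifier output. TYPE: int. DEFAULT: 1 |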
Example
from locisimiles.pipeline.judge import ClassificationJudge
# Create judge with default model
judge = ClassificationJudge(device="cpu")
# Score pre-generated candidates
results = judge.judge(query=query_doc, candidates=candidates)
# Each result has a judgment_score (probability of being a match)
for qid, judgments in results.items():
    for j in judgments:
        if j.judgment_score > 0.5:
            print(f"{qid} → {j.segment.id}: {j.judgment_score:.3f}")
debug_input_sequence
Inspect how a query–candidate pair is tokenised and encoded.
Useful for debugging classification results or understanding how text truncation affects model input.
| PARAMETER | DESCRIPTION |
|---|---|
| query_text | Raw query text. |
| candidate_text | Raw candidate text. |
| max_len | Maximum token length. |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary describing how the pair is tokenised and encoded. |
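To make the effect of truncation concrete, here is a toy pair encoder. It is not the library's method: it splits on whitespace instead of using the model's subword tokenizer, assumes the `<s> A </s></s> B </s>` pair layout, trims one token at a time from whichever side is longer, and returns keys (`tokens`, `length`, `query_dropped`, `candidate_dropped`) invented purely for illustration.

```python
from typing import Any, Dict


def encode_pair_toy(query_text: str, candidate_text: str, max_len: int = 16) -> Dict[str, Any]:
    """Whitespace-token sketch of pair encoding with longest-first truncation."""
    if max_len < 4:
        raise ValueError("max_len must leave room for the four special tokens")
    q = query_text.split()
    c = candidate_text.split()
    q_before, c_before = len(q), len(c)
    budget = max_len - 4  # reserve <s>, </s>, </s>, </s>
    while len(q) + len(c) > budget:
        # Drop the last token of whichever side is currently longer
        # (the candidate side on ties).
        if len(q) > len(c):
            q.pop()
        else:
            c.pop()
    tokens = ["<s>", *q, "</s>", "</s>", *c, "</s>"]
    return {
        "tokens": tokens,
        "length": len(tokens),
        "query_dropped": q_before - len(q),
        "candidate_dropped": c_before - len(c),
    }


info = encode_pair_toy("a b c d e f", "x y", max_len=8)
```

With a long query and a short candidate, all trimming falls on the query, which is the kind of asymmetry worth checking when a classification result looks surprising.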
judge
judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    batch_size: int = 32,
    **kwargs: Any,
) -> CandidateJudgeOutput
Classify each candidate pair using the loaded model.
| PARAMETER | DESCRIPTION |
|---|---|
| query | Query document. TYPE: Document |
| candidates | Output from a candidate generator. TYPE: CandidateGeneratorOutput |
| batch_size | Batch size for the classifier. TYPE: int. DEFAULT: 32 |

| RETURNS | DESCRIPTION |
|---|---|
| CandidateJudgeOutput | Mapping of query segment IDs to lists of CandidateJudge objects, where judgment_score is P(positive) from the classifier. |
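The role of batch_size can be pictured as slicing the flattened list of query-candidate pairs into fixed-size chunks before each forward pass. The helper below is a hypothetical sketch of that idea, not the library's internal code.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query text, candidate text)


def iter_batches(pairs: Sequence[Pair], batch_size: int = 32) -> Iterator[List[Pair]]:
    """Yield consecutive slices of at most batch_size pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pairs), batch_size):
        yield list(pairs[start:start + batch_size])


pairs = [(f"query {i}", f"candidate {i}") for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(pairs)]
```

A smaller batch_size lowers peak memory use at the cost of more forward passes, which is usually the trade-off to tune on CPU or small GPUs.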
ThresholdJudge
Binary decisions based on candidate scores (top-k or threshold).
locisimiles.pipeline.judge.threshold.ThresholdJudge
Judge candidates using a simple score threshold or top-k cut-off.
Two strategies are available (mutually exclusive):
- Top-k (default): the first top_k candidates per query (assumed
  to be sorted by score descending) receive judgment_score = 1.0;
  the rest get 0.0.
- Similarity threshold: if similarity_threshold is provided,
  every candidate whose score >= similarity_threshold receives
  judgment_score = 1.0.
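The two strategies can be sketched over a plain list of scores. `binary_judgments` is a hypothetical helper, and assigning 0.0 to candidates below the threshold is an assumption (the text above only states the positive case).

```python
from typing import List, Optional


def binary_judgments(
    scores: List[float],
    top_k: int,
    similarity_threshold: Optional[float] = None,
) -> List[float]:
    """Return 1.0/0.0 per candidate; scores are assumed sorted descending."""
    if similarity_threshold is not None:
        # Threshold strategy: takes precedence over top_k when provided.
        return [1.0 if s >= similarity_threshold else 0.0 for s in scores]
    # Top-k strategy: the first top_k positions are positive.
    return [1.0 if rank < top_k else 0.0 for rank in range(len(scores))]


scores = [0.9, 0.8, 0.6, 0.3]
by_rank = binary_judgments(scores, top_k=2)
by_score = binary_judgments(scores, top_k=2, similarity_threshold=0.6)
```

Top-k always yields the same number of positives per query, whereas the threshold lets that number vary with how strong the candidates are.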
| PARAMETER | DESCRIPTION |
|---|---|
| top_k | Number of top candidates to mark as positive. |
| similarity_threshold | Score threshold for positive decisions. If set, overrides top_k. |
Example
from locisimiles.pipeline.judge import ThresholdJudge
# Keep the 5 best candidates per query
judge = ThresholdJudge(top_k=5)
results = judge.judge(query=query_doc, candidates=candidates)
# Or use a similarity threshold instead
judge = ThresholdJudge(similarity_threshold=0.7)
results = judge.judge(query=query_doc, candidates=candidates)
judge
judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    top_k: Optional[int] = None,
    similarity_threshold: Optional[float] = None,
    **kwargs: Any,
) -> CandidateJudgeOutput
Apply threshold or top-k rule to produce binary judgments.
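| PARAMETER | DESCRIPTION |
|---|---|
| query | Query document (unused but required by protocol). TYPE: Document |
| candidates | Output from a candidate generator. TYPE: CandidateGeneratorOutput |
| top_k | Override instance top_k. TYPE: Optional[int]. DEFAULT: None |
| similarity_threshold | Override instance similarity_threshold. TYPE: Optional[float]. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| CandidateJudgeOutput | Mapping of query segment IDs to lists of CandidateJudge objects with binary judgment_score values. |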
IdentityJudge
Pass-through judge that marks every candidate as positive.
locisimiles.pipeline.judge.identity.IdentityJudge
Pass every candidate through with judgment_score = 1.0.
Useful when the candidate generator already performs all the filtering and scoring that is needed (e.g. the rule-based generator). No additional models are loaded.
Example
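A minimal sketch of the pass-through behaviour, written against hypothetical stand-in classes rather than the library's own types (the field names `segment_id` and `candidate_score` are assumptions):

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical stand-ins for the library's Candidate and CandidateJudge.
@dataclass
class Candidate:
    segment_id: str
    score: float


@dataclass
class CandidateJudge:
    segment_id: str
    candidate_score: float
    judgment_score: float


def identity_judge(
    candidates: Dict[str, List[Candidate]],
) -> Dict[str, List[CandidateJudge]]:
    """Keep every candidate and mark it positive (judgment_score = 1.0)."""
    return {
        qid: [CandidateJudge(c.segment_id, c.score, 1.0) for c in cands]
        for qid, cands in candidates.items()
    }


passed = identity_judge({"q1": [Candidate("s1", 0.42), Candidate("s2", 0.17)]})
```

The original candidate score is carried along unchanged, so downstream code can still rank or filter on it.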
judge
judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    **kwargs: Any,
) -> CandidateJudgeOutput
Convert every Candidate to CandidateJudge with judgment_score = 1.0.