Judges¶

Judges score or classify candidates produced by a generator.

All judges inherit from CandidateJudgeBase and implement a judge() method returning CandidateJudgeOutput.

CandidateJudgeBase¶

locisimiles.pipeline.judge._base.CandidateJudgeBase ¶

Abstract base class for candidate judges.

A judge receives the output of a candidate generator and produces a CandidateJudgeOutput — a dictionary mapping query-segment IDs to lists of CandidateJudge objects, each containing a source segment, the original candidate score, and a final judgment score.

Subclasses must implement judge().

Available implementations:

ClassificationJudge — scores pairs with a fine-tuned transformer classification model.
ThresholdJudge — applies a top-k or score-threshold rule.
IdentityJudge — passes candidates through unchanged (judgment_score = 1.0).

judge `abstractmethod` ¶

judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    **kwargs: Any,
) -> CandidateJudgeOutput

Score or classify candidates.

PARAMETER	DESCRIPTION
`query`	Query document (needed to look up query-segment texts). TYPE: `Document`
`candidates`	Output from a candidate generator. TYPE: `CandidateGeneratorOutput`
`**kwargs`	Judge-specific parameters. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`CandidateJudgeOutput`	Mapping of query segment IDs → lists of `CandidateJudge` objects.
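As an illustration of the contract above, here is a minimal sketch of a custom judge. It does not import the library: `Segment`, `Candidate`, `CandidateJudge` and `JudgeBase` are stand-ins, the field names `score` and `candidate_score` are assumptions, the query is modelled as a plain dict of segment ID to text rather than a `Document`, and `LengthRatioJudge` is a hypothetical toy rule.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Segment:  # stand-in for the library's segment type
    id: str
    text: str


@dataclass
class Candidate:  # stand-in; the `score` field name is an assumption
    segment: Segment
    score: float


@dataclass
class CandidateJudge:  # stand-in; `candidate_score` is an assumed name
    segment: Segment
    candidate_score: float
    judgment_score: float


class JudgeBase(ABC):
    """Stand-in for CandidateJudgeBase with the documented keyword-only signature."""

    @abstractmethod
    def judge(
        self, *, query: Dict[str, str], candidates: Dict[str, List[Candidate]], **kwargs: Any
    ) -> Dict[str, List[CandidateJudge]]:
        """Score or classify candidates."""


class LengthRatioJudge(JudgeBase):
    """Hypothetical toy judge: shorter text length divided by longer text length."""

    def judge(self, *, query, candidates, **kwargs):
        output: Dict[str, List[CandidateJudge]] = {}
        for qid, cands in candidates.items():
            q_len = len(query[qid])  # look up the query-segment text by ID
            judged = []
            for cand in cands:
                c_len = len(cand.segment.text)
                longer = max(q_len, c_len)
                ratio = min(q_len, c_len) / longer if longer else 0.0
                # Keep the generator's score and add the judge's own score.
                judged.append(CandidateJudge(cand.segment, cand.score, ratio))
            output[qid] = judged
        return output
```

A real subclass would inherit from `CandidateJudgeBase` and return the library's own `CandidateJudge` objects; the sketch only shows the shape of the mapping.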

ClassificationJudge¶

Judge candidates using a transformer sequence-classification model.

locisimiles.pipeline.judge.classification.ClassificationJudge ¶

ClassificationJudge(
    *,
    classification_name: str = "julian-schelb/xlm-roberta-large-class-lat-intertext-v1",
    device: str | int | None = None,
    pos_class_idx: int = 1,
)

Judge candidates using a transformer classification model.

Loads a pre-trained sequence-classification model and tokenizer. For each query–candidate pair the model outputs P(positive), which is stored as judgment_score.

The default model is julian-schelb/xlm-roberta-large-class-lat-intertext-v1, a fine-tuned classifier for Latin intertextuality detection.

PARAMETER	DESCRIPTION
`classification_name`	HuggingFace model identifier. TYPE: `str` DEFAULT: `'julian-schelb/xlm-roberta-large-class-lat-intertext-v1'`
`device`	Torch device string (`"cpu"`, `"cuda"`, `"mps"`). TYPE: `str \| int \| None` DEFAULT: `None`
`pos_class_idx`	Index of the positive class in the classifier output. TYPE: `int` DEFAULT: `1`
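To make the role of `pos_class_idx` concrete, the sketch below shows one common way to turn a classifier's raw logits into P(positive): a softmax followed by indexing the positive class. This is an assumption about how the probability is derived, not a description of the library's internals, and `positive_probability` is a hypothetical helper.

```python
import math
from typing import Sequence


def positive_probability(logits: Sequence[float], pos_class_idx: int = 1) -> float:
    """Softmax over the logits, then pick the probability of the positive class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[pos_class_idx] / sum(exps)


# Equal logits give an even split between the two classes.
print(positive_probability([0.0, 0.0]))  # 0.5
```

If a model was trained with the positive label at index 0, passing `pos_class_idx=0` selects the other probability from the same distribution.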

Example

from locisimiles.pipeline.judge import ClassificationJudge

# Create judge with default model
judge = ClassificationJudge(device="cpu")

# Score pre-generated candidates
results = judge.judge(query=query_doc, candidates=candidates)

# Each result has a judgment_score (probability of being a match)
for qid, judgments in results.items():
    for j in judgments:
        if j.judgment_score > 0.5:
            print(f"{qid} → {j.segment.id}: {j.judgment_score:.3f}")

debug_input_sequence ¶

debug_input_sequence(
    query_text: str, candidate_text: str, max_len: int = 512
) -> Dict[str, Any]

Inspect how a query–candidate pair is tokenised and encoded.

Useful for debugging classification results or understanding how text truncation affects model input.

PARAMETER	DESCRIPTION
`query_text`	Raw query text. TYPE: `str`
`candidate_text`	Raw candidate text. TYPE: `str`
`max_len`	Maximum token length. TYPE: `int` DEFAULT: `512`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary with the following keys.

`query` / `candidate`: original texts.
`query_truncated` / `candidate_truncated`: texts after truncation.
`input_ids`: token ID list.
`attention_mask`: attention mask list.
`input_text`: decoded input with special tokens visible.

Example

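from locisimiles.pipeline.judge import ClassificationJudge

# Inspect the encoded input for a single query-candidate pair
judge = ClassificationJudge(device="cpu")
info = judge.debug_input_sequence(
    "Arma virumque cano",
    "Troiae qui primus ab oris",
)
print(info["input_text"])  # decoded input, special tokens visible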

judge ¶

judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    batch_size: int = 32,
    **kwargs: Any,
) -> CandidateJudgeOutput

Classify each candidate pair using the loaded model.

PARAMETER	DESCRIPTION
`query`	Query document. TYPE: `Document`
`candidates`	Output from a candidate generator. TYPE: `CandidateGeneratorOutput`
`batch_size`	Batch size for the classifier. TYPE: `int` DEFAULT: `32`

RETURNS	DESCRIPTION
`CandidateJudgeOutput`	`CandidateJudgeOutput` with `judgment_score` = P(positive) from the classifier.
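The `batch_size` parameter suggests that query-candidate pairs are fed to the classifier in chunks. A minimal sketch of such chunking follows; `iter_batches` is a hypothetical helper, and how the library actually groups pairs is not documented here.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query_text, candidate_text)


def iter_batches(pairs: Sequence[Pair], batch_size: int = 32) -> Iterator[List[Pair]]:
    """Yield consecutive slices of at most `batch_size` pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pairs), batch_size):
        yield list(pairs[start:start + batch_size])


# 70 pairs with the default batch size produce batches of 32, 32 and 6.
pairs = [("query text", f"candidate {i}") for i in range(70)]
print([len(batch) for batch in iter_batches(pairs)])  # [32, 32, 6]
```

Smaller batches lower peak memory use at the cost of more forward passes, which is the usual reason to reduce `batch_size` on constrained devices.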

ThresholdJudge¶

Binary decisions based on candidate scores (top-k or threshold).

locisimiles.pipeline.judge.threshold.ThresholdJudge ¶

ThresholdJudge(
    *,
    top_k: int = 10,
    similarity_threshold: Optional[float] = None,
)

Judge candidates using a simple score threshold or top-k cut-off.

Two strategies are available (mutually exclusive):

Top-k (default): the first top_k candidates per query (assumed to be sorted by score descending) receive judgment_score = 1.0; the rest get 0.0.
Similarity threshold: if similarity_threshold is provided, top_k is ignored and every candidate whose score >= similarity_threshold receives judgment_score = 1.0; all other candidates receive 0.0.

PARAMETER	DESCRIPTION
`top_k`	Number of top candidates to mark as positive. TYPE: `int` DEFAULT: `10`
`similarity_threshold`	Score threshold for positive decisions. If set, overrides `top_k`. TYPE: `Optional[float]` DEFAULT: `None`
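The two strategies can be sketched on a plain list of scores for a single query. `binary_judgments` is a hypothetical helper, not part of the library; it assumes the scores are already sorted in descending order, as the top-k rule requires.

```python
from typing import List, Optional


def binary_judgments(
    scores: List[float],
    top_k: int = 10,
    similarity_threshold: Optional[float] = None,
) -> List[float]:
    """Return 1.0 or 0.0 per candidate for one query."""
    if similarity_threshold is not None:
        # Threshold strategy: takes precedence over top_k when set.
        return [1.0 if s >= similarity_threshold else 0.0 for s in scores]
    # Top-k strategy: the first top_k positions are marked positive.
    return [1.0 if rank < top_k else 0.0 for rank in range(len(scores))]


scores = [0.9, 0.8, 0.6, 0.3]
print(binary_judgments(scores, top_k=2))                 # [1.0, 1.0, 0.0, 0.0]
print(binary_judgments(scores, similarity_threshold=0.6))  # [1.0, 1.0, 1.0, 0.0]
```

Top-k always marks a fixed number of candidates per query regardless of their scores, while the threshold can mark none or all of them, so the choice depends on whether the candidate scores are comparable across queries.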

Example

from locisimiles.pipeline.judge import ThresholdJudge

# Keep the 5 best candidates per query
judge = ThresholdJudge(top_k=5)
results = judge.judge(query=query_doc, candidates=candidates)

# Or use a similarity threshold instead
judge = ThresholdJudge(similarity_threshold=0.7)
results = judge.judge(query=query_doc, candidates=candidates)

judge ¶

judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    top_k: Optional[int] = None,
    similarity_threshold: Optional[float] = None,
    **kwargs: Any,
) -> CandidateJudgeOutput

Apply threshold or top-k rule to produce binary judgments.

PARAMETER	DESCRIPTION
`query`	Query document (unused but required by protocol). TYPE: `Document`
`candidates`	Output from a candidate generator. TYPE: `CandidateGeneratorOutput`
`top_k`	Override instance `top_k`. TYPE: `Optional[int]` DEFAULT: `None`
`similarity_threshold`	Override instance `similarity_threshold`. TYPE: `Optional[float]` DEFAULT: `None`

RETURNS	DESCRIPTION
`CandidateJudgeOutput`	`CandidateJudgeOutput` with `judgment_score` ∈ {0.0, 1.0}.

IdentityJudge¶

Pass-through judge that marks every candidate as positive.

locisimiles.pipeline.judge.identity.IdentityJudge ¶

Pass every candidate through with judgment_score = 1.0.

Useful when the candidate generator already performs all the filtering and scoring that is needed (e.g. the rule-based generator). No additional models are loaded.

Example

from locisimiles.pipeline.judge import IdentityJudge

judge = IdentityJudge()
results = judge.judge(query=query_doc, candidates=candidates)

# Every candidate gets judgment_score = 1.0
for qid, judgments in results.items():
    for j in judgments:
        print(j.judgment_score)  # 1.0

judge ¶

judge(
    *,
    query: Document,
    candidates: CandidateGeneratorOutput,
    **kwargs: Any,
) -> CandidateJudgeOutput

Convert every Candidate to CandidateJudge with judgment_score = 1.0.
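The conversion described above amounts to copying each candidate and attaching a constant score. The sketch below uses simplified stand-in dataclasses; `segment_id`, `score` and `candidate_score` are assumed field names, and `identity_judge` is a hypothetical function rather than the library's method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:  # stand-in for the generator's output item
    segment_id: str
    score: float


@dataclass
class CandidateJudge:  # stand-in for the judged item
    segment_id: str
    candidate_score: float
    judgment_score: float


def identity_judge(candidates: Dict[str, List[Candidate]]) -> Dict[str, List[CandidateJudge]]:
    """Keep every candidate and its original score; set judgment_score to 1.0."""
    return {
        qid: [CandidateJudge(c.segment_id, c.score, 1.0) for c in cands]
        for qid, cands in candidates.items()
    }
```

Because the original candidate score is carried over, downstream code can still rank or filter on it even though every judgment is positive.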
