API Reference
This section provides detailed documentation for the LociSimiles Python API, auto-generated from source code docstrings.
Core Modules
Document Module
The Document module provides classes for representing and loading text collections:
TextSegment - Individual text unit with ID and content
Document - Container for text segments
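As a rough mental model only, the two classes listed above can be pictured as a small record type plus an ordered container. The sketch below uses hypothetical stand-ins (`SketchSegment`, `SketchDocument`, `seg_id`, `text`, `add`); these are not the LociSimiles classes, and the real constructors, attribute names, and loaders may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass(frozen=True)
class SketchSegment:
    """Hypothetical stand-in for a text unit: an ID plus its content."""
    seg_id: str
    text: str


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a container of segments, keyed by ID."""
    segments: Dict[str, SketchSegment] = field(default_factory=dict)

    def add(self, seg_id: str, text: str) -> None:
        # Store the segment under its ID; a repeated ID overwrites the old entry.
        self.segments[seg_id] = SketchSegment(seg_id, text)

    def __iter__(self) -> Iterator[SketchSegment]:
        # Dicts preserve insertion order, so segments come back in the order added.
        return iter(self.segments.values())

    def __len__(self) -> int:
        return len(self.segments)


doc = SketchDocument()
doc.add("aen.1.1", "Arma virumque cano, Troiae qui primus ab oris")
doc.add("aen.1.2", "Italiam fato profugus Laviniaque venit")
```

Keying segments by ID is one possible design; it makes lookups by ID cheap while still allowing ordered iteration.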
Pipeline Module
The Pipeline module provides the main processing pipelines:
Pipeline - Generic composer: combine any generator + judge
RetrievalPipeline - Semantic similarity retrieval
ClassificationPipeline - Text pair classification
ClassificationPipelineWithCandidateGeneration - Two-stage retrieval + classification
RuleBasedPipeline - Lexical matching + linguistic filters
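The "generator + judge" composition behind the generic pipeline can be illustrated with plain functions. This is a conceptual sketch under assumed data shapes, not the library's implementation: `compose`, `all_pairs`, `pass_through`, and the tuple layouts are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes, chosen for this sketch only.
Candidate = Tuple[str, str, float]       # (query_id, source_id, candidate_score)
Judged = Tuple[str, str, float, float]   # candidate plus a judgment_score
Generator = Callable[[Dict[str, str], Dict[str, str]], List[Candidate]]
Judge = Callable[[List[Candidate]], List[Judged]]


def compose(generator: Generator, judge: Judge) -> Callable[..., List[Judged]]:
    """Chain a candidate generator and a judge into one callable."""
    def run(query: Dict[str, str], source: Dict[str, str]) -> List[Judged]:
        return judge(generator(query, source))
    return run


def all_pairs(query: Dict[str, str], source: Dict[str, str]) -> List[Candidate]:
    # Stage 1: propose every (query, source) pair with a constant score.
    return [(q_id, s_id, 1.0) for q_id in query for s_id in source]


def pass_through(candidates: List[Candidate]) -> List[Judged]:
    # Stage 2: accept every candidate with judgment_score = 1.0.
    return [(q_id, s_id, score, 1.0) for q_id, s_id, score in candidates]


run_pipeline = compose(all_pairs, pass_through)
judged = run_pipeline(
    {"q1": "arma virumque"},
    {"s1": "arma gravi numero", "s2": "tityre tu patulae"},
)
```

Any function with the same input and output shapes could be swapped in for either stage, which is the point of a generic composer.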
Generators Module
The Generators module provides candidate-generation components:
EmbeddingCandidateGenerator - Semantic embedding similarity
ExhaustiveCandidateGenerator - All-pairs (no filtering)
RuleBasedCandidateGenerator - Lexical matching + linguistic filters
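To make the difference between similarity-based and exhaustive candidate generation concrete, here is a small pure-Python sketch. It assumes precomputed toy vectors and cosine similarity; the function names (`top_k_candidates`, `exhaustive_candidates`) are hypothetical, and the actual generators may use a different similarity measure, embedding model, or index.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(
    query_vecs: Dict[str, Vector], source_vecs: Dict[str, Vector], top_k: int
) -> Dict[str, List[Tuple[str, float]]]:
    """For each query, keep the top_k most similar source segments."""
    result = {}
    for q_id, q_vec in query_vecs.items():
        scored = [(s_id, cosine(q_vec, s_vec)) for s_id, s_vec in source_vecs.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[q_id] = scored[:top_k]
    return result


def exhaustive_candidates(query_ids, source_ids) -> List[Tuple[str, str]]:
    """All-pairs generation: every query paired with every source, no filtering."""
    return [(q_id, s_id) for q_id in query_ids for s_id in source_ids]


# Toy 2-dimensional "embeddings", invented for this example.
query_vecs = {"q1": [1.0, 0.0]}
source_vecs = {"s1": [0.9, 0.1], "s2": [0.0, 1.0], "s3": [0.5, 0.5]}

top = top_k_candidates(query_vecs, source_vecs, top_k=2)
pairs = exhaustive_candidates(query_vecs, source_vecs)
```

The exhaustive variant grows with the product of the two collection sizes, which is why a filtering generator is usually preferable for large corpora.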
Judges Module
The Judges module provides scoring/classification components:
ClassificationJudge - Transformer-based sequence classification
ThresholdJudge - Binary decisions from candidate scores
IdentityJudge - Pass-through (judgment_score = 1.0)
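The two simpler judges can be pictured as follows. This is only an illustration of the ideas named above: the function names, the tuple layout, the `>=` comparison, and the 1.0/0.0 encoding of the binary decision are assumptions made for this sketch, not confirmed details of the library.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]        # (query_id, source_id, candidate_score)
Judgment = Tuple[str, str, float, float]  # candidate plus judgment_score


def threshold_judge(candidates: List[Candidate], threshold: float) -> List[Judgment]:
    """Binary decision: 1.0 if the candidate score reaches the threshold, else 0.0."""
    return [
        (q_id, s_id, score, 1.0 if score >= threshold else 0.0)
        for q_id, s_id, score in candidates
    ]


def identity_judge(candidates: List[Candidate]) -> List[Judgment]:
    """Pass-through: every candidate receives judgment_score = 1.0."""
    return [(q_id, s_id, score, 1.0) for q_id, s_id, score in candidates]


candidates = [("q1", "s1", 0.91), ("q1", "s2", 0.42)]
thresholded = threshold_judge(candidates, threshold=0.5)
passed_through = identity_judge(candidates)
```

A pass-through judge is useful when the generator's ranking is already the desired output and no second-stage decision is needed.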
Evaluator Module
The Evaluator module provides tools for assessing detection quality:
IntertextEvaluator- Main evaluation class
Quick Reference
Loading Documents
Saving Results
# Save from a pipeline instance
results = pipeline.run(query=query_doc, source=source_doc, top_k=10)
pipeline.to_csv("results.csv")
pipeline.to_json("results.json")
# Or use standalone functions
from locisimiles.pipeline import results_to_csv, results_to_json
results_to_csv(results, "results.csv")
results_to_json(results, "results.json")