CLI Reference¶

LociSimiles provides a command-line interface for common workflows.

Installation¶

The CLI is installed automatically with the package:

pip install locisimiles

Commands¶

`locisimiles run`¶

Run the intertextual detection pipeline on source and target documents.

locisimiles run SOURCE TARGET [OPTIONS]

Arguments¶

Argument	Description
`SOURCE`	Path to the source CSV file
`TARGET`	Path to the target CSV file

Options¶

Option	Default	Description
`--output`, `-o`	`results.csv`	Output file path
`--model`, `-m`	`sentence-transformers/all-MiniLM-L6-v2`	Model name or path
`--top-k`, `-k`	`10`	Number of candidates to retrieve
`--threshold`, `-t`	`0.85`	Classification threshold
`--batch-size`, `-b`	`32`	Batch size for processing

Examples¶

Basic usage:

locisimiles run source.csv target.csv

With custom output and model:

locisimiles run source.csv target.csv \
    --output results.csv \
    --model bert-base-multilingual-cased \
    --top-k 20

`locisimiles evaluate`¶

Evaluate detection results against ground truth.

locisimiles evaluate PREDICTIONS GROUND_TRUTH [OPTIONS]

Arguments¶

Argument	Description
`PREDICTIONS`	Path to predictions CSV file
`GROUND_TRUTH`	Path to ground truth CSV file

Options¶

Option	Default	Description
`--output`, `-o`	`None`	Output file for metrics (prints to stdout if not specified)

Examples¶

locisimiles evaluate results.csv ground_truth.csv

Save metrics to file:

locisimiles evaluate results.csv ground_truth.csv -o metrics.json

Input File Formats¶

Source/Target CSV¶

CSV files should contain at minimum an ID column and a text column:

id,text
1,"Arma virumque cano Troiae qui primus ab oris"
2,"Italiam fato profugus Laviniaque venit"

Ground Truth CSV¶

Ground truth files should contain query-reference pairs with labels:

query_id,reference_id,label
1,42,1
2,15,0

Where label is 1 for true matches and 0 for non-matches.

Output Format¶

The pipeline outputs a CSV with the following columns:

Column	Description
`query_id`	ID of the source text segment
`reference_id`	ID of the matched target segment
`score`	Similarity/classification score
`above_threshold`	Whether the score exceeds the threshold

Environment Variables¶

Variable	Description
`LOCISIMILES_CACHE_DIR`	Directory for model caching
`LOCISIMILES_DEVICE`	Device for computation (`cpu`, `cuda`, `mps`)

CLI Reference¶

Installation¶

Commands¶

locisimiles run¶

Arguments¶

Options¶

Examples¶

locisimiles evaluate¶

Arguments¶

Options¶

Examples¶

Input File Formats¶

Source/Target CSV¶

Ground Truth CSV¶

Output Format¶

Environment Variables¶

`locisimiles run`¶

`locisimiles evaluate`¶