Development Guide¶

This guide covers setting up a development environment and contributing to LociSimiles.

Prerequisites¶

Python 3.10 or higher
Git
pip package manager

Setup¶

Clone the Repository¶

git clone https://github.com/julianschelb/locisimiles.git
cd locisimiles

Create Virtual Environment¶

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Development Dependencies¶

pip install -e ".[dev]"

This installs the package in editable mode along with development tools:

pytest - Testing framework
pytest-cov - Coverage reporting
poethepoet - Task runner
mkdocs - Documentation
mkdocs-material - Documentation theme

Running Tests¶

Using Poe (Recommended)¶

# Run all tests
poe test

# Run tests with coverage report
poe test-cov

Using Pytest Directly¶

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run specific test file
pytest tests/test_document.py

# Run specific test
pytest tests/test_document.py::TestDocument::test_from_csv

# Run with coverage
pytest --cov=locisimiles --cov-report=term-missing

# Generate HTML coverage report
pytest --cov=locisimiles --cov-report=html

Documentation¶

Serve Locally¶

poe docs

This starts a local server at http://127.0.0.1:8000 with live reload.

Build Static Site¶

poe docs-build

Output is written to the site/ directory.

Deploy to GitHub Pages¶

poe docs-deploy

This builds and pushes to the gh-pages branch.

Project Structure¶

locisimiles/
├── src/
│   └── locisimiles/
│       ├── __init__.py          # Package exports
│       ├── cli.py               # Command-line interface
│       ├── document.py          # Document and TextSegment classes
│       ├── evaluator.py         # Evaluation metrics
│       └── pipeline/
│           ├── __init__.py      # Pipeline exports
│           ├── _types.py        # Type definitions
│           ├── classification.py # Classification pipeline
│           ├── retrieval.py     # Retrieval pipeline
│           └── two_stage.py     # Combined pipeline
├── tests/
│   ├── conftest.py              # Shared test fixtures
│   ├── test_document.py         # Document tests
│   ├── test_evaluator.py        # Evaluator tests
│   ├── test_cli.py              # CLI tests
│   ├── test_types.py            # Type definition tests
│   ├── test_retrieval.py        # Retrieval pipeline tests
│   ├── test_classification.py   # Classification pipeline tests
│   └── test_two_stage.py        # Two-stage pipeline tests
├── docs/                        # Documentation source
├── examples/                    # Example scripts and notebooks
├── pyproject.toml               # Project configuration
└── mkdocs.yml                   # Documentation configuration

Code Style¶

General Guidelines¶

Follow PEP 8 style guidelines
Use type hints for function signatures
Write docstrings for public APIs
Keep functions focused and small

Example¶

def compute_similarity(
    text_a: str,
    text_b: str,
    model: Optional[str] = None
) -> float:
    """Compute semantic similarity between two texts.

    Args:
        text_a: First text string.
        text_b: Second text string.
        model: Optional model name. Defaults to MiniLM.

    Returns:
        Similarity score between 0 and 1.

    Raises:
        ValueError: If either text is empty.
    """
    if not text_a or not text_b:
        raise ValueError("Texts cannot be empty")
    ...

Testing Guidelines¶

Writing Tests¶

Place tests in the tests/ directory
Name test files test_<module>.py
Name test classes Test<ClassName>
Name test methods test_<behavior>

Using Fixtures¶

Shared fixtures are defined in conftest.py:

def test_document_loading(sample_csv_file):
    """Test loading document from CSV."""
    doc = Document.from_csv(sample_csv_file)
    assert len(doc) > 0

Mocking External Dependencies¶

Use unittest.mock for ML models:

from unittest.mock import MagicMock, patch

def test_retrieval_with_mock(mock_embedder):
    """Test retrieval with mocked embedding model."""
    pipeline = RetrievalPipeline()
    pipeline.model = mock_embedder
    # Test without actual model loading

Contributing¶

Workflow¶

Fork the repository
Create a feature branch: git checkout -b feature/my-feature
Make changes and add tests
Ensure tests pass: poe test
Commit changes: git commit -m "Add my feature"
Push to fork: git push origin feature/my-feature
Open a Pull Request

Pull Request Guidelines¶

Include tests for new functionality
Update documentation as needed
Keep commits focused and atomic
Write clear commit messages

Available Poe Tasks¶

Task	Command	Description
`test`	`poe test`	Run all tests
`test-cov`	`poe test-cov`	Run tests with coverage
`docs`	`poe docs`	Serve documentation locally
`docs-build`	`poe docs-build`	Build documentation
`docs-deploy`	`poe docs-deploy`	Deploy to GitHub Pages