Metadata-Version: 2.4
Name: acquirium
Version: 0.1.1
Summary: Acquirium: A data-metadata management platform for water treatment systems
Author-email: Gabe Fierro <gtfierro@mines.edu>, Mete Saka <saka@mines.edu>
License-Expression: MIT
License-File: LICENSE
Keywords: knowledge-graph,metadata,ontology,rdf,time-series,water-treatment
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.12
Requires-Dist: adbc-driver-postgresql>=1.9.0
Requires-Dist: docker>=7.1.0
Requires-Dist: duckdb>=0.10.0
Requires-Dist: fastapi>=0.125.0
Requires-Dist: fastembed>=0.4.0
Requires-Dist: fastexcel>=0.17.2
Requires-Dist: grafanalib>=0.7.1
Requires-Dist: ipykernel==7.0.1
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: numpy==2.3.5
Requires-Dist: polars>=1.36.1
Requires-Dist: psycopg[binary]>=3.3.1
Requires-Dist: pyarrow>=22.0.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pyontoenv==0.4.0a4
Requires-Dist: pyoxigraph>=0.5.2
Requires-Dist: python-dateutil>=2.9.0.post0
Requires-Dist: python-multipart>=0.0.21
Requires-Dist: requests>=2.32.5
Requires-Dist: rich>=14.2.0
Requires-Dist: typer>=0.12.0
Requires-Dist: uvicorn>=0.38.0
Provides-Extra: mqtt
Requires-Dist: paho-mqtt>=2.1.0; extra == 'mqtt'
Provides-Extra: watertap
Requires-Dist: pyomo>=6; extra == 'watertap'
Requires-Dist: watertap; extra == 'watertap'
Provides-Extra: xlsx
Requires-Dist: fastexcel>=0.11.0; extra == 'xlsx'
Description-Content-Type: text/markdown

# Acquirium
A Data-Metadata Framework for Water Treatment Plants

Acquirium is a framework for storing, managing, querying, and integrating data and metadata for water treatment systems. It combines knowledge graphs and time series data to support analysis, monitoring, and experimentation.

## Installation

From PyPI:

```bash
pip install acquirium
```

Optional extras for specific drivers:

```bash
pip install "acquirium[mqtt]"       # MQTT ingestion driver
pip install "acquirium[xlsx]"       # Excel ingestion driver
pip install "acquirium[watertap]"   # WaterTAP simulation driver
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv pip install acquirium
```

For development from a clone:

```bash
git clone https://github.com/DataDrivenCPS/acquirium.git
cd acquirium
python -m venv .venv && source .venv/bin/activate
pip install -e .
# or: uv sync
```

## Quickstart

Acquirium ships a single CLI entry point. Start the server and any configured drivers with:

```bash
acquirium server --config acquirium.toml
```

A sample `acquirium.toml` is included at the repository root. Key sections:

- `[server]` — bind host/port, choice of timeseries backend (DuckDB or TimescaleDB), data directory.
- `[driver]` — connection defaults applied to all drivers (server URL, port, tick interval).
- `[[drivers]]` — drivers to start alongside the server.

By default, the server stores all data on local disk: an embedded Oxigraph RDF store and a single DuckDB file under `data_dir`. **No external services are required for a fresh install.** For multi-worker or production deployments, set `timeseries_backend = "timescale"` in the config and point `pg_dsn` at a Postgres + TimescaleDB instance.

Override the bind host/port from the CLI if needed:

```bash
acquirium server --config acquirium.toml --host 127.0.0.1 --port 8000
acquirium server --config acquirium.toml --reload          # uvicorn auto-reload
```

## Driver-only mode

To run only the configured `[[drivers]]` against a remote Acquirium server (with no FastAPI server on this host), set:

```toml
[server]
enabled = false
```

and configure `[driver].server_url` / `server_port` to point at the remote instance. Then:

```bash
acquirium server --config acquirium.toml
```

When `enabled = false`, the `server` subcommand starts only the drivers.

## Docker stack (optional)

A `compose.yaml` is provided for an all-in-one local stack (Acquirium + TimescaleDB + Grafana):

```bash
make up                              # start
make up ACQUIRIUM_RECREATE=true      # wipe data + start
make down                            # stop
```

> By default each Docker run resets the system. To preserve data across runs, set `ACQUIRIUM_RECREATE=false` in `compose.yaml`.

## WaterTAP integration

The `watertap` extra installs the Python packages needed for the built-in WaterTAP driver:

```bash
pip install "acquirium[watertap]"
acquirium server --config acquirium.toml   # with a [[drivers]] entry for WaterTAP
```

Some WaterTAP setups also require native extensions that are not installed by the extra:

```bash
pyomo download-extensions
python -m pip install setuptools && pyomo build-extensions
idaes get-extensions
```

For a full demo (WaterTAP + streaming simulator + API examples):

```bash
make watertap-up
uv run scripts/api_example.py
# or open notebooks/watertap-single-pump.ipynb
make watertap-down
```

## Logging

Acquirium supports user logs attached to entities in the system. See [scripts/logging_example.py](./scripts/logging_example.py):

```bash
# start the server in the background
acquirium server --config acquirium.toml &
# once the server is up, run the example
python scripts/logging_example.py
```

## Text Matcher

Acquirium uses a text matcher to map natural-language input to ontology URIs (classes, predicates, units, quantity kinds). Matching is based on **semantic embedding similarity**, powered by [FastEmbed](https://github.com/qdrant/fastembed) (default model: `BAAI/bge-small-en-v1.5`). Each ontology concept is represented by one or more surface strings, which are embedded and stored in an in-memory vector index. At query time, the input phrase is embedded and compared against the index using cosine similarity.

There are two separate matchers, each with its own index:

1. **Graph matcher** — indexes classes and predicates from user-inserted RDF graphs. Surface strings are derived from `rdfs:label` values and CamelCase/underscore-split local names.
2. **QUDT matcher** — indexes units and quantity kinds from the QUDT ontology (fetched over HTTP, with a local fallback). Surface strings include `rdfs:label`, `skos:prefLabel`, `skos:altLabel`, symbols, UCUM codes, and split local names.
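
To make "split local names" concrete, here is a minimal sketch of how a surface string could be derived from a URI's local name by splitting on CamelCase boundaries and underscores. The function names, the lowercasing step, and the `example.org` URIs are assumptions for illustration; the actual splitting rules in Acquirium may differ.

```python
import re


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]


def split_local_name(name: str) -> str:
    """Turn 'ReverseOsmosisUnit' or 'feed_flow_rate' into spaced lowercase words."""
    # Underscores and hyphens become spaces.
    spaced = re.sub(r"[_\-]+", " ", name)
    # Insert a space between a lowercase letter or digit and an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    # Insert a space between an acronym and a following capitalized word.
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    return " ".join(spaced.lower().split())


# Hypothetical URIs, used only to exercise the helpers.
for uri in (
    "http://example.org/water#ReverseOsmosisUnit",
    "http://example.org/water#feed_flow_rate",
    "http://example.org/water#RDFGraph",
):
    print(split_local_name(local_name(uri)))
```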

Both indexes are cached to disk and updated incrementally when graphs change. Results can be filtered by `kind` (`class`, `predicate`, `unit`, `quantity_kind`) and are ranked by cosine similarity, deduplicated to the highest-scoring surface per URI. See [scripts/text_matcher_example.py](./scripts/text_matcher_example.py) for usage.

## Tests

```bash
pytest tests/unit            # unit tests only
make test                    # full suite (Docker required)
```

## Status

Acquirium is under active development. Planned work is tracked in [improvements.md](./improvements.md). Bug reports and feature requests are welcome — please open an issue.
