ParquetStore

Defined in dccd.storage.parquet

class ParquetStore(data_path)

Bases: object

Read/write interface for a single DatasetId.

All timestamps (TS) are nanoseconds UTC (int64).
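As a minimal sketch of the timestamp convention, the helpers below convert between timezone-aware datetimes and integer nanoseconds since the Unix epoch. The names to_ns and from_ns are hypothetical and are not part of dccd; they only illustrate how a caller might build values for arguments such as start_ns and end_ns.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime to nanoseconds since the Unix epoch (UTC).

    Hypothetical helper, not part of dccd.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer arithmetic avoids the float rounding of dt.timestamp().
    delta = dt - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


def from_ns(ts_ns: int) -> datetime:
    """Convert nanoseconds since the epoch to a UTC datetime.

    datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    """
    seconds, remainder_ns = divmod(ts_ns, NS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=remainder_ns // 1_000)


start_ns = to_ns(datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Values produced this way fit in a signed 64-bit integer for any date between roughly 1678 and 2262.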

Parameters:
data_path : str or Path

Root directory for all data files.

Examples

>>> import pathlib, tempfile
>>> from dccd.domain.dataset import DatasetId
>>> from dccd.domain.symbol import Symbol
>>> from dccd.domain.types import DataType
>>> from dccd.storage.parquet import ParquetStore
>>> store = ParquetStore('/tmp/data')

directory(ds)

Return the directory for ds, creating it if needed.

inventory()

Return a list of dataset info dicts for all stored data.

Each entry includes min_ts / max_ts (nanoseconds UTC) and rows so the UI can display the actual data time range.
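A rough sketch of how a UI might render one inventory entry. Only the keys min_ts, max_ts, and rows come from the description above; the sample entry and the describe_entry helper are hypothetical, and real entries may carry additional keys.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def describe_entry(entry: dict) -> str:
    """Render min_ts / max_ts (nanoseconds UTC) and rows as a one-line summary."""

    def fmt(ts_ns: int) -> str:
        # Truncate to whole seconds for display.
        dt = datetime.fromtimestamp(ts_ns // NS_PER_SECOND, tz=timezone.utc)
        return dt.strftime("%Y-%m-%d %H:%M:%S")

    return f"{fmt(entry['min_ts'])} .. {fmt(entry['max_ts'])} UTC ({entry['rows']} rows)"


# Hypothetical entry shaped like the dicts described above.
sample_entry = {
    "min_ts": 1_704_067_200_000_000_000,  # 2024-01-01 00:00:00 UTC
    "max_ts": 1_704_153_600_000_000_000,  # 2024-01-02 00:00:00 UTC
    "rows": 1440,
}
summary = describe_entry(sample_entry)
```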

last_timestamp(ds)

Return the last timestamp stored for ds in nanoseconds UTC, or None if there is no data.

load(ds, start_ns=None, end_ns=None)

Load data for ds in the given nanosecond range.

missing_intervals(ds, start_ns, end_ns)

Return gaps as (start_ns, end_ns) pairs within [start_ns, end_ns].
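The idea of reporting gaps as (start_ns, end_ns) pairs can be sketched with a simple interval sweep. This is not the dccd implementation: the find_gaps function, its covered argument, and the way interval boundaries are treated are all assumptions made for illustration.

```python
def find_gaps(covered, start_ns, end_ns):
    """Return the uncovered (start, end) pairs inside the requested range.

    covered is an iterable of (start, end) pairs in nanoseconds; it may be
    unsorted or overlapping. Hypothetical helper, not part of dccd.
    """
    gaps = []
    cursor = start_ns  # everything before cursor is known to be covered
    for seg_start, seg_end in sorted(covered):
        if seg_end <= cursor:
            continue  # segment ends before the part still to be checked
        if seg_start >= end_ns:
            break  # segment starts after the requested range
        if seg_start > cursor:
            gaps.append((cursor, seg_start))
        cursor = max(cursor, seg_end)
        if cursor >= end_ns:
            break
    if cursor < end_ns:
        gaps.append((cursor, end_ns))  # trailing gap up to the range end
    return gaps


gaps = find_gaps([(10, 20), (30, 40)], 0, 50)
```

A caller could then request only the returned ranges from the upstream source instead of downloading the full window again.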

static read_provenance(file_path)

Return the Provenance stored in a Parquet file, if any.

save(ds, records, provenance=None)

Write records to Parquet, merging with existing data.

Parameters:
ds : DatasetId
records : list

OHLCBar, Trade, or OrderBookSnapshot objects.

provenance : Provenance or None
Returns:
int

Number of rows written.
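The documentation does not say how save merges new records with existing data. One plausible strategy, shown here purely as an assumption, is to key rows by timestamp, let incoming rows replace existing rows with the same timestamp, and keep the result sorted. The merge_rows helper and the "ts" field name are hypothetical.

```python
def merge_rows(existing, incoming, key="ts"):
    """Merge two lists of row dicts, deduplicating on key.

    Assumed policy (not confirmed by the dccd docs): an incoming row
    overwrites an existing row that has the same timestamp.
    """
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # incoming wins on conflict
    return [merged[ts] for ts in sorted(merged)]


existing_rows = [{"ts": 1, "close": 10.0}, {"ts": 2, "close": 11.0}]
incoming_rows = [{"ts": 2, "close": 11.5}, {"ts": 3, "close": 12.0}]
merged_rows = merge_rows(existing_rows, incoming_rows)
```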