DataStore

Defined in dccd.storage

class DataStore(data_path, exchange, pair, span, data_type='ohlc')

Bases: object

Unified read/write interface for a single (exchange, pair, data_type).

Parameters:
data_path : str

Root directory for all local data files (e.g. '/data/crypto').

exchange : str

Exchange name, lowercase (e.g. 'binance').

pair : str

Trading pair in 'CRYPTO/FIAT' format (e.g. 'BTC/USDT'). The slash is converted to a hyphen for the file-system path.

span : int or None

Candle interval in seconds. Required for data_type='ohlc'; pass None for trades and orderbook.

data_type : {'ohlc', 'trades', 'orderbook'}

Kind of data stored in this instance.

Attributes:
directory : pathlib.Path

Absolute directory for this store (created if absent).

property directory

Absolute directory for this store (created if absent).
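As an illustration of how the constructor arguments could map to a directory, here is a minimal sketch. Only the slash-to-hyphen conversion of the pair comes from the description above; the function name `store_directory` and the nesting order of the path components are assumptions made for this example, not the layout the library actually uses. Unlike the real property, the sketch does not create the directory.

```python
from pathlib import Path


def store_directory(data_path, exchange, pair, span, data_type="ohlc"):
    """Build a store path (hypothetical layout, for illustration only)."""
    # Documented behaviour: the slash in the pair becomes a hyphen.
    safe_pair = pair.replace("/", "-")
    # Assumed nesting: <data_path>/<exchange>/<pair>/<data_type>[/<span>]
    parts = [exchange, safe_pair, data_type]
    if span is not None:
        parts.append(str(span))
    return Path(data_path).joinpath(*parts)


print(store_directory("/data/crypto", "binance", "BTC/USDT", 3600))
print(store_directory("/data/crypto", "binance", "BTC/USDT", None, "trades"))
```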

existing_periods()

List period labels for all available files.

Returns:
list of str

Sorted list of year strings (['2024', '2025']) for OHLC, or date strings (['2026-05-20', '2026-05-21']) for trades/orderbook.
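The two label formats can be derived from a Unix timestamp as sketched below. The helper `period_label` is hypothetical, and the use of UTC for the year and day boundaries is an assumption; the page does not state which time zone the store uses.

```python
from datetime import datetime, timezone


def period_label(ts, data_type="ohlc"):
    """Return the period label a timestamp would fall into (UTC assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    # OHLC files are labelled by year, trades/orderbook files by calendar day.
    return dt.strftime("%Y") if data_type == "ohlc" else dt.strftime("%Y-%m-%d")


timestamps = [1704067199, 1704067200, 1704153600]
print(sorted({period_label(t) for t in timestamps}))
print(sorted({period_label(t, "trades") for t in timestamps}))
```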

is_period_complete(year)

Return True if the parquet file for year contains all expected rows.

Parameters:
year : int

Calendar year to check (e.g. 2024).

Returns:
bool

False for non-OHLC stores, missing files, or when the row count is below the expected number of candles for that year.
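One plausible way to compute the expected number of candles is to divide the length of the year by the candle interval. The sketch below assumes UTC calendar years and a gapless series; `expected_candles` and `looks_complete` are hypothetical names, and the library's own check may differ.

```python
import calendar


def expected_candles(year, span):
    """Number of candles of `span` seconds that fit in a calendar year."""
    days = 366 if calendar.isleap(year) else 365
    return days * 86400 // span


def looks_complete(row_count, year, span):
    """True when the row count reaches the expected candle count."""
    return row_count >= expected_candles(year, span)


print(expected_candles(2024, 3600))      # hourly candles in a leap year
print(looks_complete(8000, 2024, 3600))  # fewer rows than expected
```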

last_timestamp()

Return the last TS value in the most recent period file.

Returns:
int or None

Unix timestamp of the last row, or None if no data exists.

load(start=None, end=None)

Load and concatenate all period files covering [start, end].

Parameters:
start : int or None, optional

Inclusive lower bound (Unix timestamp). None means no lower bound.

end : int or None, optional

Inclusive upper bound (Unix timestamp). None means no upper bound.

Returns:
pl.DataFrame

Concatenated data, sorted by 'TS', filtered to [start, end]. Empty DataFrame if no files are found.
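The inclusive-bounds semantics can be illustrated without the library. In this sketch plain dictionaries stand in for the rows of the pl.DataFrame, and `filter_rows` is a hypothetical helper, not part of DataStore.

```python
def filter_rows(rows, start=None, end=None):
    """Keep rows with start <= TS <= end, sorted by TS; None disables a bound."""
    kept = [
        row for row in rows
        if (start is None or row["TS"] >= start)
        and (end is None or row["TS"] <= end)
    ]
    return sorted(kept, key=lambda row: row["TS"])


rows = [{"TS": 30}, {"TS": 10}, {"TS": 20}]
print(filter_rows(rows, start=10, end=20))  # both bounds are inclusive
print(filter_rows(rows))                    # no bounds: everything, sorted
```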

missing_intervals(start, end)

Return the list of (start, end) intervals within [start, end] that still need to be downloaded.

For OHLC stores the method inspects existing annual parquet files: complete past years are skipped entirely; incomplete or absent years yield an interval from the last saved timestamp (+ span) to the end of that year. The current calendar year always extends from the last saved row to end.

For trades / orderbook stores (no span) the method falls back to a simple resume: one interval from last_timestamp + span (or start if no data) to end.

Parameters:
start : int

Desired start timestamp (Unix seconds).

end : int

Desired end timestamp (Unix seconds).

Returns:
list of (int, int)

Ordered list of (ivl_start, ivl_end) pairs to download. Empty list means all data is already present.
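The OHLC rule described above can be sketched as follows. This is a simplified reading, not the library's implementation: the function takes the per-year state as arguments instead of inspecting files, all names are hypothetical, year boundaries are assumed to be UTC, "end of that year" is taken as one second before the next 1 January, and clamping each interval to the requested start is an added assumption.

```python
from datetime import datetime, timezone


def year_start(year):
    """Unix timestamp of 1 January 00:00:00 UTC for the given year."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())


def missing_ohlc_intervals(start, end, span, last_ts_by_year,
                           complete_years, current_year):
    """Return (ivl_start, ivl_end) pairs that still need downloading."""
    intervals = []
    first = datetime.fromtimestamp(start, tz=timezone.utc).year
    last = datetime.fromtimestamp(end, tz=timezone.utc).year
    for year in range(first, last + 1):
        # Complete past years are skipped; the current year never is.
        if year < current_year and year in complete_years:
            continue
        if year in last_ts_by_year:
            # Resume one candle after the last saved row.
            ivl_start = last_ts_by_year[year] + span
        else:
            ivl_start = year_start(year)
        ivl_start = max(ivl_start, start)
        ivl_end = min(end, year_start(year + 1) - 1)
        if ivl_start <= ivl_end:
            intervals.append((ivl_start, ivl_end))
    return intervals


day = 86400
print(missing_ohlc_intervals(
    start=year_start(2023),
    end=year_start(2025) + 10 * day,
    span=day,
    last_ts_by_year={2023: year_start(2024) - day,
                     2024: year_start(2024) + 99 * day},
    complete_years={2023},
    current_year=2025,
))
```

In this call 2023 is skipped as complete, 2024 resumes one candle after its last saved row and runs to the end of the year, and 2025 has no file yet, so its interval runs from 1 January to end.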

save(df)

Write df into the appropriate period file(s), merging with existing data.

OHLC data is grouped by year; trades and orderbook by calendar day. Rows are merged on 'TS' (dedup keep='last'), sorted ascending, and written as Parquet.

Parameters:
df : pl.DataFrame

Data to persist. Must contain a 'TS' column (Unix timestamps).
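The merge semantics (deduplicate on 'TS' keeping the last occurrence, sort ascending, group by period) can be illustrated with plain Python. Dictionaries stand in for DataFrame rows, the helpers `merge_rows` and `group_by_year` are hypothetical, UTC year boundaries are assumed, and nothing is written to disk.

```python
from collections import defaultdict
from datetime import datetime, timezone


def merge_rows(existing, new):
    """Merge on 'TS': later rows win on duplicates, result sorted ascending."""
    by_ts = {}
    for row in list(existing) + list(new):
        by_ts[row["TS"]] = row  # a later row overwrites an earlier one
    return [by_ts[ts] for ts in sorted(by_ts)]


def group_by_year(rows):
    """Split rows into per-year groups, as an OHLC store would (UTC assumed)."""
    groups = defaultdict(list)
    for row in rows:
        year = datetime.fromtimestamp(row["TS"], tz=timezone.utc).strftime("%Y")
        groups[year].append(row)
    return dict(groups)


existing = [{"TS": 20, "close": 1.0}, {"TS": 10, "close": 0.5}]
new = [{"TS": 20, "close": 2.0}, {"TS": 30, "close": 3.0}]
print(merge_rows(existing, new))
```

Here the row at TS 20 from `new` replaces the stored one, which is the keep='last' behaviour described above.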