CubedPandas Cube Class

Wraps a Pandas dataframe into a cube to provide convenient multi-dimensional access to the underlying dataframe for easy aggregation, filtering, slicing, reporting, data manipulation and write back. A schema that defines the dimensions and measures of the Cube can either be inferred automatically from the underlying dataframe (default) or defined explicitly.

`settings: CubeSettings` `property`

Returns:

CubeSettings –

The settings of the Cube.

`measures: MeasureCollection` `property`

Returns:

MeasureCollection –

The measures available within or defined for the Cube.

`ambiguities: Ambiguities` `property`

Returns:

Ambiguities –

An Ambiguities object that provides information about ambiguous data types in the underlying dataframe.

`linked_cubes: CubeLinks` `property`

Returns:

CubeLinks –

A list of cubes that are linked to this cube.

`schema: Schema` `property`

Returns:

Schema –

The Schema of the Cube which defines the dimensions and measures of the Cube.

`df: pd.DataFrame` `property`

Returns: The underlying Pandas dataframe of the Cube.

`dimensions: DimensionCollection` `property`

Returns:

DimensionCollection –

The dimensions available through the Cube.

`size_in_bytes: int` `property`

Returns: The size in bytes allocated by the Cube object instance. The memory allocated by the underlying dataframe is not included.

`__init__(df, schema=None, infer_schema=True, exclude=None, read_only=True, ignore_member_key_errors=True, ignore_case=True, ignore_key_errors=True, caching=CachingStrategy.LAZY, caching_threshold=EAGER_CACHING_THRESHOLD, eager_evaluation=True)`

Wraps a Pandas dataframe into a cube to provide convenient multi-dimensional access to the underlying dataframe for easy aggregation, filtering, slicing, reporting, data manipulation and write back.

Parameters:

df (DataFrame) –

The Pandas dataframe to be wrapped into the CubedPandas Cube object.
schema –

(optional) A schema that defines the dimensions and measures of the Cube. If not provided, the schema will be inferred from the dataframe if parameter infer_schema is set to True. For further details please refer to the documentation of the Schema class. Default value is None.
infer_schema (bool, default: True ) –

(optional) If no schema is provided and infer_schema is set to True, a suitable schema will be inferred from the underlying dataframe. All numerical columns will be treated as measures, all other columns as dimensions. If this behaviour is not desired, a schema must be provided. Default value is True.
exclude (str | list | tuple | None, default: None ) –

(optional) Defines the columns that should be excluded from the cube if no schema is provided. An excluded column is ignored during schema inference, is not part of the schema and cannot be accessed through the cube. Default value is None.
read_only (bool, default: True ) –

(optional) Defines whether write backs to the underlying dataframe are permitted. If read_only is set to True, write back attempts will raise a PermissionError. If read_only is set to False, write backs are permitted and will be pushed back to the underlying dataframe. Default value is True.
ignore_case (bool, default: True ) –

(optional) If set to True, the case of member names will be ignored: 'Apple' and 'apple' will be treated as the same member. If set to False, member names are case-sensitive: 'Apple' and 'apple' will be treated as different members. Default value is True.
ignore_key_errors (bool, default: True ) –

(optional) If set to True, key errors for members of dimensions will be ignored and cell values will return 0.0 or None if no matching record exists. If set to False, key errors will be raised as exceptions when accessing cell values for non-existing members. Default value is True.
caching (CachingStrategy, default: LAZY ) –

(optional) The caching strategy to be applied when accessing the cube. The recommended value for almost all use cases is CachingStrategy.LAZY, which caches dimension members on first access. Caching can improve performance, but may also consume more memory. To cache all dimension members eagerly (on initialization of the cube), set this parameter to CachingStrategy.EAGER. Please refer to the documentation of 'CachingStrategy' for more information. Default value is CachingStrategy.LAZY.
caching_threshold (int, default: EAGER_CACHING_THRESHOLD ) –

(optional) The threshold, expressed as a number of members, for EAGER caching only. If the number of distinct members in a dimension is below this threshold, the dimension will be cached eagerly when caching is set to CachingStrategy.EAGER or CachingStrategy.FULL. Above this threshold, the dimension will be cached lazily. Default value is EAGER_CACHING_THRESHOLD, equivalent to 256 unique members per dimension.
eager_evaluation (bool, default: True ) –

(optional) If set to True, the cube will evaluate the context eagerly, i.e. when the context is created. Eager evaluation is recommended for most use cases, as it simplifies debugging and error handling. If set to False, the cube will evaluate the context lazily, i.e. only when the value of a context is accessed/requested.
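The inference rule described for `infer_schema` and `exclude` (numerical columns become measures, all other columns become dimensions, excluded columns are skipped) can be illustrated without the library. The following is a minimal sketch under stated assumptions, not CubedPandas' actual implementation: the helper `infer_schema_sketch`, its return shape and the sample data are hypothetical, and columns are plain Python lists rather than dataframe columns.

```python
from numbers import Number


def infer_schema_sketch(columns, exclude=None):
    """Split columns into measures (numeric) and dimensions (everything else).

    `columns` maps a column name to its list of values. This helper is
    hypothetical and only mirrors the rule described in the parameter list.
    """
    # Accept a single name, a list/tuple of names, or None.
    if exclude is None:
        excluded = set()
    elif isinstance(exclude, str):
        excluded = {exclude}
    else:
        excluded = set(exclude)

    schema = {"dimensions": [], "measures": []}
    for name, values in columns.items():
        if name in excluded:
            continue  # excluded columns never become part of the schema
        # Assumption of this sketch: bool is a subclass of int in Python,
        # but boolean columns are treated as non-numeric here.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        schema["measures" if is_numeric else "dimensions"].append(name)
    return schema


columns = {
    "product": ["A", "B", "C"],
    "region": ["North", "South", "North"],
    "value": [1, 2, 3],
    "id": [10, 11, 12],
}
schema = infer_schema_sketch(columns, exclude="id")
print(schema)
```

Without `exclude`, the numeric `id` column would be classified as a measure, which is the typical reason to exclude a column or to provide an explicit schema.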

Returns:

Cube –

A new Cube object that wraps the dataframe.

Raises:

PermissionError –

If write back is attempted on a read-only Cube.
ValueError –

If the schema is not valid or does not match the dataframe, or if invalid dimension, member, measure or address arguments are provided.

Examples:

>>> import pandas as pd
>>> from cubedpandas import cubed
>>> df = pd.DataFrame({"product": ["A", "B", "C"], "value": [1, 2, 3]})
>>> cdf = cubed(df)
>>> cdf["product:B"]
2
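The effect of `ignore_case` and `ignore_key_errors` on member lookups can be sketched in plain Python. This is an illustration of the documented behaviour only, under stated assumptions: the function `member_total` and the sample rows are hypothetical and do not reflect how CubedPandas resolves members internally.

```python
def member_total(rows, dimension, member, measure,
                 ignore_case=True, ignore_key_errors=True):
    """Sum `measure` over all rows whose `dimension` equals `member`."""
    def norm(text):
        # Case-insensitive comparison when ignore_case is set.
        return text.lower() if ignore_case else text

    matches = [row[measure] for row in rows
               if norm(row[dimension]) == norm(member)]
    if not matches:
        if ignore_key_errors:
            return 0.0  # unknown member: fall back to a neutral value
        raise KeyError(f"unknown member {member!r} in dimension {dimension!r}")
    return sum(matches)


rows = [
    {"product": "Apple", "value": 1},
    {"product": "Banana", "value": 2},
    {"product": "Apple", "value": 3},
]
print(member_total(rows, "product", "apple", "value"))   # matches 'Apple'
print(member_total(rows, "product", "Cherry", "value"))  # missing member
```

With `ignore_key_errors=False`, the second call would raise a `KeyError` instead of returning a neutral value, which is usually easier to debug but requires explicit error handling.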

`__len__()`

Returns:

int –

The number of records in the underlying dataframe of the Cube.

`clear_cache()`

Clears the cache of the Cube for all dimensions.
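To make the interplay of lazy caching, eager caching, the caching threshold and cache clearing more concrete, here is a minimal sketch in plain Python. It assumes a cache that maps each member to the row positions containing it. The class `DimensionCacheSketch` and its methods are hypothetical and are not part of CubedPandas; only the constant value 256 is taken from the parameter description.

```python
EAGER_CACHING_THRESHOLD = 256  # value stated in the parameter description


class DimensionCacheSketch:
    """Caches, per member, the row positions that contain this member."""

    def __init__(self, values, eager=False, threshold=EAGER_CACHING_THRESHOLD):
        self._values = list(values)
        self._cache = {}
        # Eager caching only applies to dimensions below the threshold.
        if eager and len(set(self._values)) < threshold:
            for member in set(self._values):
                self._cache[member] = self._scan(member)

    def _scan(self, member):
        # Full scan of the column; this is the work the cache avoids.
        return [i for i, v in enumerate(self._values) if v == member]

    def rows_for(self, member):
        if member not in self._cache:  # lazy path: fill on first access
            self._cache[member] = self._scan(member)
        return self._cache[member]

    def cached_members(self):
        return set(self._cache)

    def clear_cache(self):
        self._cache.clear()


lazy = DimensionCacheSketch(["A", "B", "A"])
eager = DimensionCacheSketch(["A", "B", "A"], eager=True)
print(lazy.cached_members(), eager.cached_members())

lazy.rows_for("A")             # first access fills the lazy cache
print(lazy.cached_members())

eager.clear_cache()            # drops all cached members
print(eager.cached_members())
```

After clearing, later lookups simply repopulate the cache on demand, so clearing trades memory for a one-time rescan per member.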

`__getattr__(name)`

Dynamically resolves a dimension, measure or member of the cube. This enables more natural access to the cube data using Python dot notation.

If the name is not a valid Python identifier, i.e. it contains special characters or whitespace or starts with a digit, the slicer method needs to be used to resolve the name. For example, if `12 data %` is the name of a column or value in a dataframe, then cube["12 data %"] needs to be used to return the dimension, measure or column.

Parameters:

name –

Name of an existing dimension, member or measure in the cube.

Returns:

Context | CubeContext –

A Cell object that represents the cube data related to the address.

Samples:

>>> cdf = cubed(df)
>>> cdf.Online.Apple.cost
50
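The mechanism behind this kind of dot-notation access is Python's `__getattr__` hook, which is called only when normal attribute lookup fails. The sketch below shows the general technique in plain Python. The class `NodeSketch`, its resolution order and the sample rows are assumptions made for illustration and do not describe how CubedPandas resolves names internally.

```python
class NodeSketch:
    """Resolves names against measure names first, then member values."""

    def __init__(self, rows, measures):
        self._rows = rows
        self._measures = measures

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _rows and _measures
        # are found directly; the guard avoids recursion on private names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __getitem__(self, name):
        if name in self._measures:
            # A measure ends the chain and aggregates the remaining rows.
            return sum(row[name] for row in self._rows)
        # Otherwise treat the name as a member and narrow the rows.
        matching = [row for row in self._rows if name in row.values()]
        if not matching:
            raise KeyError(name)
        return NodeSketch(matching, self._measures)


# Made-up sample data for illustration only.
rows = [
    {"channel": "Online", "product": "Apple", "cost": 50, "12 data %": 7},
    {"channel": "Online", "product": "Banana", "cost": 20, "12 data %": 1},
    {"channel": "Retail", "product": "Apple", "cost": 30, "12 data %": 2},
]
cube = NodeSketch(rows, measures={"cost", "12 data %"})
print(cube.Online.Apple.cost)    # dot notation for valid identifiers
print(cube.Online["12 data %"])  # bracket notation for any other name
```

Names such as `12 data %` cannot be written after a dot in Python source code, which is why bracket access remains necessary for them.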

`__getitem__(address)`

Returns a cell of the cube for a given address.

Parameters:

address –

A valid cube address. Please refer to the documentation for further details.

Returns:

Context –

A Cell object that represents the cube data related to the address.

Raises:

ValueError –

If the address is not valid or cannot be resolved.

`__setitem__(address, value)`

Sets a value for a given address in the cube.

Parameters:

address –

A valid cube address. Please refer to the documentation for further details.
value –

The value to be set for the data represented by the address.

Raises:

PermissionError –

If write back is attempted on a read-only Cube.

`__delitem__(address)`

Deletes the records represented by the given address from the underlying dataframe of the cube.

Parameters:

address –

A valid cube address. Please refer to the documentation for further details.

Raises:

PermissionError –

If write back is attempted on a read-only Cube.
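The read-only guard shared by item assignment and item deletion can be sketched as follows. This is a simplified illustration in plain Python, not the library's implementation: the class `WritableRowsSketch`, the fixed measure name `value` and the `dimension:member` address parsing are assumptions made for this sketch.

```python
class WritableRowsSketch:
    """Guards write operations on a list of row dicts with a read_only flag."""

    def __init__(self, rows, read_only=True):
        self._rows = rows
        self._read_only = read_only

    def _check_writable(self):
        if self._read_only:
            raise PermissionError("write back is not permitted on a read-only cube")

    def _match(self, address):
        # Assumed address format for this sketch: "dimension:member".
        dimension, member = address.split(":", 1)
        return [row for row in self._rows if row[dimension] == member]

    def __getitem__(self, address):
        return sum(row["value"] for row in self._match(address))

    def __setitem__(self, address, value):
        self._check_writable()
        # Simplification: every matching row receives the value.
        for row in self._match(address):
            row["value"] = value

    def __delitem__(self, address):
        self._check_writable()
        dimension, member = address.split(":", 1)
        # Modify the shared list in place so the change is written back.
        self._rows[:] = [row for row in self._rows if row[dimension] != member]


rows = [{"product": "A", "value": 1}, {"product": "B", "value": 2}]

locked = WritableRowsSketch(rows)  # read_only defaults to True
try:
    locked["product:B"] = 5
except PermissionError as err:
    print("rejected:", err)

open_cube = WritableRowsSketch(rows, read_only=False)
open_cube["product:B"] = 5
del open_cube["product:A"]
print(rows)
```

Because both wrappers share the same `rows` list, the changes made through the writable wrapper are visible in the underlying data, which mirrors the idea of pushing write backs to the wrapped dataframe.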

`slice(rows=None, columns=None, config=None)`

Returns a new slice for the cube. A slice represents a table-like view of the data in the cube. Typically, a slice has rows, columns and filters, comparable to an Excel PivotTable. Slices are useful for printing in Jupyter, visual data analysis and reporting purposes. They can be easily 'navigated' by setting and changing rows, columns and filters.

Please refer to the documentation of the Slice class for further details.

Samples:

>>> cdf = cubed(df)
>>> cdf.slice(rows="product", columns="region", filters={"year": 2020})
year: 2020
|        | (all) | North | South |
| (all)  |   550 |   300 |   250 |
| Apple  |   200 |   100 |   100 |
| Banana |   350 |   200 |   150 |
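The aggregation behind such a pivot-style view, including the `(all)` totals, can be reproduced in a few lines of plain Python. The function `pivot_sketch`, the measure name `sales` and the individual records are hypothetical. They are chosen so that the totals match the table above, and the sketch does not use or describe the Slice class itself.

```python
from collections import defaultdict


def pivot_sketch(records, rows, columns, measure, filters=None):
    """Aggregate `measure` into {row_member: {column_member: total}}."""
    filters = filters or {}
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        if any(rec[key] != wanted for key, wanted in filters.items()):
            continue  # record is outside the filter
        # Each record contributes to its own cell and to the (all) totals.
        for r in ("(all)", rec[rows]):
            for c in ("(all)", rec[columns]):
                table[r][c] += rec[measure]
    return {r: dict(cols) for r, cols in table.items()}


# Made-up records for illustration only.
records = [
    {"year": 2020, "product": "Apple", "region": "North", "sales": 100},
    {"year": 2020, "product": "Apple", "region": "South", "sales": 100},
    {"year": 2020, "product": "Banana", "region": "North", "sales": 200},
    {"year": 2020, "product": "Banana", "region": "South", "sales": 150},
    {"year": 2021, "product": "Apple", "region": "North", "sales": 999},
]
table = pivot_sketch(records, rows="product", columns="region",
                     measure="sales", filters={"year": 2020})
for member, cells in table.items():
    print(member, cells)
```

Changing the `rows`, `columns` or `filters` arguments regroups the same records, which is the sense in which a slice can be 'navigated'.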
