CubedPandas Cube Class
Wraps a Pandas dataframe into a cube to provide convenient multi-dimensional access to the underlying dataframe for easy aggregation, filtering, slicing, reporting, data manipulation and write back. A schema that defines the dimensions and measures of the Cube can either be inferred automatically from the underlying dataframe (default) or defined explicitly.
settings: CubeSettings property
Returns:
- CubeSettings – The settings of the Cube.
measures: MeasureCollection property
Returns:
- MeasureCollection – The measures available within or defined for the Cube.
ambiguities: Ambiguities property
Returns:
- Ambiguities – An Ambiguities object that provides information about ambiguous data types in the underlying dataframe.
linked_cubes: CubeLinks property
Returns:
- CubeLinks – The collection of cubes that are linked to this cube.
schema: Schema property
Returns:
- Schema – The Schema of the Cube, which defines the dimensions and measures of the Cube.
df: pd.DataFrame property
Returns: The underlying Pandas dataframe of the Cube.
dimensions: DimensionCollection property
Returns:
- DimensionCollection – The dimensions available through the Cube.
size_in_bytes: int property
Returns: The size in bytes allocated by the Cube object instance. The memory allocated by the underlying dataframe is not included.
__init__(df, schema=None, infer_schema=True, exclude=None, read_only=True, ignore_member_key_errors=True, ignore_case=True, ignore_key_errors=True, caching=CachingStrategy.LAZY, caching_threshold=EAGER_CACHING_THRESHOLD, eager_evaluation=True)
Wraps a Pandas dataframe into a cube to provide convenient multi-dimensional access to the underlying dataframe for easy aggregation, filtering, slicing, reporting, data manipulation and write back.
Parameters:
- df (DataFrame) – The Pandas dataframe to be wrapped into the CubedPandas Cube object.
- schema – (optional) A schema that defines the dimensions and measures of the Cube. If not provided, the schema will be inferred from the dataframe if parameter infer_schema is set to True. For further details please refer to the documentation of the Schema class. Default value is None.
- infer_schema (bool, default: True) – (optional) If no schema is provided and infer_schema is set to True, a suitable schema will be inferred from the underlying dataframe. All numerical columns will be treated as measures, all other columns as dimensions. If this behaviour is not desired, a schema must be provided. Default value is True.
- exclude (str | list | tuple | None, default: None) – (optional) Defines the columns that should be excluded from the cube if no schema is provided. If a column is excluded, it will not be part of the schema and cannot be accessed through the cube. Excluded columns will be ignored during schema inference. Default value is None.
- read_only (bool, default: True) – (optional) Defines whether write backs to the underlying dataframe are permitted. If read_only is set to True, write back attempts will raise a PermissionError. If read_only is set to False, write backs are permitted and will be pushed back to the underlying dataframe. Default value is True.
- ignore_case (bool, default: True) – (optional) If set to True, the case of member names will be ignored; 'Apple' and 'apple' will be treated as the same member. If set to False, member names are case-sensitive; 'Apple' and 'apple' will be treated as different members. Default value is True.
- ignore_key_errors (bool, default: True) – (optional) If set to True, key errors for members of dimensions will be ignored and cell values will return 0.0 or None if no matching record exists. If set to False, key errors will be raised as exceptions when accessing cell values for non-existing members. Default value is True.
- caching (CachingStrategy, default: LAZY) – (optional) A caching strategy to be applied for accessing the cube. The recommended value for almost all use cases is CachingStrategy.LAZY, which caches dimension members on first access. Caching can be beneficial for performance, but may also consume more memory. To cache all dimension members eagerly (on initialization of the cube), set this parameter to CachingStrategy.EAGER. Please refer to the documentation of CachingStrategy for more information. Default value is CachingStrategy.LAZY.
- caching_threshold (int, default: EAGER_CACHING_THRESHOLD) – (optional) The threshold, expressed as a number of members, for EAGER caching only. If the number of distinct members in a dimension is below this threshold, the dimension will be cached eagerly, provided caching is set to CachingStrategy.EAGER or CachingStrategy.FULL. Above this threshold, the dimension will be cached lazily. Default value is EAGER_CACHING_THRESHOLD, equivalent to 256 unique members per dimension.
- eager_evaluation (bool, default: True) – (optional) If set to True, the cube will evaluate the context eagerly, i.e. when the context is created. Eager evaluation is recommended for most use cases, as it simplifies debugging and error handling. If set to False, the cube will evaluate the context lazily, i.e. only when the value of a context is accessed/requested.
Returns:
- A new Cube object that wraps the dataframe.
Raises:
- PermissionError – If write back is attempted on a read-only Cube.
- ValueError – If the schema is not valid or does not match the dataframe, or if invalid dimension, member, measure or address arguments are provided.
Examples:
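The inference rule described for infer_schema and exclude can be illustrated without the library. The following is a minimal pure-Python sketch, not the CubedPandas implementation: the helper `infer_schema_sketch` is hypothetical, a list of dicts stands in for the dataframe, and the choice to treat booleans as non-numeric is an assumption.

```python
from numbers import Number


def infer_schema_sketch(records, exclude=None):
    """Toy stand-in for schema inference (hypothetical helper, not library code).

    Numeric columns become measures, all other columns become dimensions.
    """
    if exclude is None:
        excluded = set()
    elif isinstance(exclude, str):
        excluded = {exclude}
    else:
        excluded = set(exclude)

    dimensions, measures = [], []
    for column in records[0]:
        if column in excluded:
            continue  # excluded columns never reach the schema
        values = [row[column] for row in records]
        # bool is a Number subclass in Python; this sketch assumes it is not a measure.
        numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
        (measures if numeric else dimensions).append(column)
    return {"dimensions": dimensions, "measures": measures}


records = [
    {"channel": "Online", "product": "Apple", "cost": 50, "note": "promo"},
    {"channel": "Retail", "product": "Pear", "cost": 30, "note": "regular"},
]
schema = infer_schema_sketch(records, exclude="note")
print(schema)
```

Here `channel` and `product` end up as dimensions, `cost` as a measure, and `note` is left out entirely, mirroring the statement that excluded columns are not part of the schema.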
__len__()
Returns:
- The number of records in the underlying dataframe of the Cube.
clear_cache()
Clears the cache of the Cube for all dimensions.
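To make the interplay of the caching and caching_threshold parameters with clear_cache() more concrete, here is a rough pure-Python sketch of one dimension's member cache. The class `DimensionCacheSketch` and its attributes are hypothetical and only model the behaviour described above; the actual caching internals of CubedPandas may differ.

```python
class DimensionCacheSketch:
    """Toy member cache for one dimension (hypothetical, not library code).

    Maps each member to the indices of the records that contain it.
    """

    def __init__(self, values, eager=False, threshold=256):
        self._values = list(values)
        self._cache = {}
        distinct = set(self._values)
        # Eager caching is assumed to apply only below the threshold;
        # larger dimensions fall back to lazy caching.
        if eager and len(distinct) < threshold:
            for member in distinct:
                self.rows(member)

    def rows(self, member):
        if member not in self._cache:  # lazy: computed on first access
            self._cache[member] = [i for i, v in enumerate(self._values) if v == member]
        return self._cache[member]

    def clear_cache(self):
        self._cache.clear()

    @property
    def cached_members(self):
        return sorted(self._cache)


products = ["Apple", "Pear", "Apple"]

lazy = DimensionCacheSketch(products)
before_access = lazy.cached_members      # nothing cached yet
apple_rows = lazy.rows("Apple")          # first access fills the cache

eager = DimensionCacheSketch(products, eager=True)
eager_members = eager.cached_members     # all members cached at construction
eager.clear_cache()                      # cache is empty again, data untouched
```

Clearing the cache in this sketch only drops the lookup table; the values themselves are unaffected, and the next access rebuilds the entry.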
__getattr__(name)
Dynamically resolves a dimension, measure or member from the cube. This enables more natural access to the cube data using Python dot notation.
If the name is not a valid Python identifier, i.e. it contains special characters or whitespace or starts with a number, then the slicer method needs to be used to resolve the name. For example, if 12 data % is the name of a column or value in a dataframe, then cube["12 data %"] needs to be used to return the dimension, measure or column.
Parameters:
- name – Name of an existing dimension, member or measure in the cube.
Returns:
- Context | CubeContext – A Cell object that represents the cube data related to the address.
Samples
>>> cdf = cubed(df)
>>> cdf.Online.Apple.cost
50
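The dot-notation access above relies on Python's `__getattr__` hook, which is called only when normal attribute lookup fails. The following toy class is a sketch of that mechanism under simplifying assumptions; `CubeSketch` is hypothetical, uses a list of dicts instead of a dataframe, and does not reflect how CubedPandas resolves names internally.

```python
class CubeSketch:
    """Toy cube that resolves members and measures dynamically (hypothetical)."""

    def __init__(self, records, measures):
        self._records = records
        self._measures = set(measures)

    def __getattr__(self, name):
        # Only reached when regular attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return self[name]

    def __getitem__(self, name):
        if name in self._measures:
            # A measure name returns the aggregated value of the current records.
            return sum(row[name] for row in self._records)
        # Any other name is treated as a member and narrows the records.
        matching = [row for row in self._records if name in row.values()]
        if not matching:
            raise KeyError(name)
        return CubeSketch(matching, self._measures)


records = [
    {"channel": "Online", "product": "Apple", "cost": 50, "12 data %": 10},
    {"channel": "Online", "product": "Pear", "cost": 20, "12 data %": 5},
    {"channel": "Retail", "product": "Apple", "cost": 30, "12 data %": 7},
]
cube = CubeSketch(records, measures=["cost", "12 data %"])

online_apple_cost = cube.Online.Apple.cost   # dot notation for valid identifiers
special_total = cube["12 data %"]            # brackets for names that are not identifiers
```

Names such as `12 data %` cannot be written after a dot, which is why the bracket form is needed for them.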
__getitem__(address)
Returns a cell of the cube for a given address.
Parameters:
- address – A valid cube address. Please refer to the documentation for further details.
Returns:
- Context – A Cell object that represents the cube data related to the address.
Raises:
- ValueError – If the address is not valid or cannot be resolved.
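How the ignore_case and ignore_key_errors settings could influence such a lookup is sketched below in plain Python. The function `cell_value_sketch` is hypothetical, and raising ValueError for an unresolved member is an assumption based on the Raises section above; the library's actual behaviour may differ in detail.

```python
def cell_value_sketch(records, member, measure, ignore_case=True, ignore_key_errors=True):
    """Toy cell lookup: sum `measure` over all records containing `member`.

    Hypothetical helper, not library code.
    """

    def normalize(text):
        return text.casefold() if ignore_case else text

    wanted = normalize(member)
    matching = [
        row
        for row in records
        if any(isinstance(v, str) and normalize(v) == wanted for v in row.values())
    ]
    if not matching:
        if ignore_key_errors:
            return 0.0  # silently report "no data" for unknown members
        raise ValueError(f"Member {member!r} could not be resolved.")
    return sum(row[measure] for row in matching)


records = [
    {"channel": "Online", "product": "Apple", "cost": 50},
    {"channel": "Online", "product": "Pear", "cost": 20},
    {"channel": "Retail", "product": "Apple", "cost": 30},
]

relaxed = cell_value_sketch(records, "apple", "cost")                     # case ignored
strict_case = cell_value_sketch(records, "apple", "cost", ignore_case=False)

try:
    cell_value_sketch(records, "apple", "cost", ignore_case=False, ignore_key_errors=False)
    strict_raised = False
except ValueError:
    strict_raised = True
```

With both settings relaxed, a misspelled member quietly yields 0.0, which is convenient for exploration but can hide typos; the strict combination surfaces them as exceptions.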
__setitem__(address, value)
Sets a value for a given address in the cube.
Parameters:
- address – A valid cube address. Please refer to the documentation for further details.
- value – The value to be set for the data represented by the address.
Raises:
- PermissionError – If write back is attempted on a read-only Cube.
__delitem__(address)
Deletes the records represented by the given address from the underlying dataframe of the cube.
Parameters:
- address – A valid cube address. Please refer to the documentation for further details.
Raises:
- PermissionError – If write back is attempted on a read-only Cube.
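The read_only guard shared by `__setitem__` and `__delitem__` can be pictured with the following toy class. `WritableCubeSketch` is hypothetical; in particular, assigning the same value to every matching record is an assumption made for brevity and is not a statement about how CubedPandas distributes written values.

```python
class WritableCubeSketch:
    """Toy cube with a read-only guard on write back (hypothetical)."""

    def __init__(self, records, read_only=True):
        self._records = records
        self._read_only = read_only

    def _check_writable(self):
        if self._read_only:
            raise PermissionError("Write back is not permitted on a read-only cube.")

    def __setitem__(self, address, value):
        member, measure = address
        self._check_writable()
        for row in self._records:
            if member in row.values():
                row[measure] = value  # assumption: same value for every matching record

    def __delitem__(self, member):
        self._check_writable()
        # In-place update so the caller's underlying data changes as well.
        self._records[:] = [row for row in self._records if member not in row.values()]


records = [
    {"product": "Apple", "cost": 50},
    {"product": "Pear", "cost": 30},
]

locked = WritableCubeSketch(records)  # read_only defaults to True
try:
    locked["Apple", "cost"] = 99
    blocked = False
except PermissionError:
    blocked = True

writable = WritableCubeSketch(records, read_only=False)
writable["Apple", "cost"] = 99
del writable["Pear"]
```

Because the default is read-only, write back has to be enabled deliberately, which protects the wrapped data from accidental modification.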
slice(rows=None, columns=None, config=None)
Returns a new slice for the cube. A slice represents a table-like view of the data in the cube. Typically, a slice has rows, columns and filters, comparable to an Excel PivotTable. Slices are useful for printing in Jupyter, visual data analysis and reporting purposes, and can be easily 'navigated' by setting and changing rows, columns and filters.
Please refer to the documentation of the Slice class for further details.
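As a rough picture of what a rows-by-columns view means, the sketch below builds a small pivot-style table in plain Python. The function `slice_sketch` and its `measure` argument are hypothetical and do not mirror the signature or output of the real slice method; filters and configuration are left out.

```python
from collections import defaultdict


def slice_sketch(records, rows, columns, measure):
    """Toy pivot: aggregate `measure` by one row dimension and one column dimension.

    Hypothetical helper, not library code.
    """
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        table[record[rows]][record[columns]] += record[measure]
    # Convert nested defaultdicts to plain dicts for a stable, printable result.
    return {row_key: dict(cols) for row_key, cols in table.items()}


records = [
    {"channel": "Online", "product": "Apple", "cost": 50},
    {"channel": "Online", "product": "Pear", "cost": 20},
    {"channel": "Retail", "product": "Apple", "cost": 30},
    {"channel": "Online", "product": "Apple", "cost": 5},
]

by_channel = slice_sketch(records, rows="channel", columns="product", measure="cost")
by_product = slice_sketch(records, rows="product", columns="channel", measure="cost")
```

Swapping the rows and columns arguments, as in `by_product`, corresponds to the kind of 'navigation' described above.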