
CubedPandas Schema Class

Defines a multidimensional schema for cell-based access to the data of a Pandas dataframe through a Cube.

The schema defines the dimensions and measures of the cube and can be either inferred from the underlying Pandas dataframe automatically or defined explicitly. The schema can be validated against the Pandas dataframe to ensure the schema is valid for the table.

dimensions: DimensionCollection property

Returns the dimensions of the schema.

measures: MeasureCollection property

Returns the measures of the schema.

__init__(df=None, schema=None, caching=CachingStrategy.LAZY)

Initializes a new schema for a Cube based on a given Pandas dataframe. If no dataframe is provided, the schema needs to be built manually and cannot be validated against a Pandas dataframe.

For building a schema manually, you can either create a new schema from scratch or you can load, extend and modify an existing schema as defined by parameter schema. The parameter schema can either be another Schema object, a Python dictionary containing valid schema information, a json string containing valid schema information or a file name or path to a json file containing valid schema information.

:param df: (optional) the Pandas dataframe to build the schema from or for.

:param schema: (optional) a schema to initialize the Schema with. The parameter schema can either be another Schema object, a Python dictionary containing valid schema information, a json string containing valid schema information or a file name or path to a json file containing valid schema information.

:param caching: The caching strategy to be used for the Cube. Default is CachingStrategy.LAZY. Please refer to the documentation of 'CachingStrategy' for more information.
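The forms accepted by the schema parameter (dictionary, json string, file path) can be pictured as one normalization step. The following is a minimal sketch using only the Python standard library, not CubedPandas code: `normalize_schema_source` is a hypothetical helper, and the `dimensions`/`measures` layout with `column` keys is an assumed schema format used for illustration only.

```python
import json
import os
import tempfile


def normalize_schema_source(source):
    """Hypothetical helper: return schema information as a dict."""
    if isinstance(source, dict):
        return dict(source)  # shallow copy; the caller's dict is left untouched
    if isinstance(source, str):
        try:
            info = json.loads(source)  # first attempt: the string is JSON itself
        except json.JSONDecodeError:
            # second attempt: the string is the path of a JSON file
            if not os.path.isfile(source):
                raise ValueError("neither valid JSON nor an existing file") from None
            with open(source, encoding="utf-8") as handle:
                info = json.load(handle)
        if isinstance(info, dict):
            return info
    raise ValueError("unsupported schema source")


# Assumed schema layout, for illustration only.
example = {
    "dimensions": [{"column": "product"}, {"column": "channel"}],
    "measures": [{"column": "sales"}],
}

from_dict = normalize_schema_source(example)
from_json = normalize_schema_source(json.dumps(example))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "schema.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(example, handle)
    from_file = normalize_schema_source(path)
```

Trying to parse the string as JSON before treating it as a file path mirrors the order described for from_json, where a string that is neither valid JSON nor a reference to a valid file leads to an exception.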

validate(df)

Validates the schema against an existing Pandas dataframe.

If True is returned, the schema is valid for the given Pandas dataframe and can be used to access its data. Otherwise, the schema is not valid and may lead to errors when accessing the data.

:param df: The Pandas dataframe to validate the schema against.

:return: Returns True if the schema is valid for the given Pandas dataframe, otherwise False.
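The checks performed by validate are not spelled out here. One plausible minimal check is that every column named in the schema exists in the table. The sketch below illustrates only that idea with the standard library: `schema_columns_exist` is a hypothetical function, the schema layout is an assumption, and a plain list of column names stands in for the dataframe.

```python
def schema_columns_exist(schema_info, table_columns):
    """Hypothetical check: True if every schema column exists in the table."""
    available = set(table_columns)
    entries = schema_info.get("dimensions", []) + schema_info.get("measures", [])
    return all(entry["column"] in available for entry in entries)


# Assumed schema layout, for illustration only.
schema_info = {
    "dimensions": [{"column": "product"}, {"column": "channel"}],
    "measures": [{"column": "sales"}],
}

# All three schema columns are present.
matching = schema_columns_exist(schema_info, ["product", "channel", "sales"])

# The "channel" column is missing from the table.
mismatching = schema_columns_exist(schema_info, ["product", "sales"])
```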

infer_schema(exclude=None)

Infers a multidimensional schema from the Pandas dataframe of the Schema or another Pandas dataframe by analyzing the columns of the table and their contents.

This process can be time-consuming for large tables. For such cases, it is recommended to infer the schema only from a sample of the records by setting parameter 'sample_records' to True. By default, the schema is inferred from and validated against all records.

The inference process tries to identify the dimensions of the cube and their hierarchies as well as the measures of the cube. If no schema can be inferred, an exception is raised.

By default, string, datetime and boolean columns are assumed to be dimension columns, and numerical columns are assumed to be measures for cube computations. By default, all columns of the Pandas dataframe are used to infer the schema. However, a subset of columns can be specified to infer the schema from. The subset needs to contain at least two columns, one for a single dimension and one for a single measure.

For more complex tables it is possible or even likely that the resulting schema does not match your expectations or requirements. In such cases, you will need to build your schema manually. Please refer to the documentation for further details on how to build a schema manually.

:param exclude: (optional) a list of either column names or ordinal column ids to exclude when inferring the schema.

:return: Returns the inferred schema.
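The default rule described above (non-numerical columns become dimensions, numerical columns become measures) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `infer_schema_info` is a hypothetical function, a plain mapping of column names to value lists stands in for the dataframe, the output layout is an assumed schema format, and hierarchies are not detected.

```python
from datetime import date
from numbers import Number


def infer_schema_info(columns, exclude=None):
    """Hypothetical sketch of the default inference rule."""
    excluded = set(exclude or [])
    dimensions, measures = [], []
    for index, (name, values) in enumerate(columns.items()):
        # Columns can be excluded by name or by ordinal position.
        if name in excluded or index in excluded:
            continue
        # bool is a subclass of int in Python, so it is ruled out explicitly.
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        target = measures if is_numeric else dimensions
        target.append({"column": name})
    if not dimensions or not measures:
        raise ValueError("at least one dimension and one measure are required")
    return {"dimensions": dimensions, "measures": measures}


table = {
    "product": ["A", "B", "C"],
    "online": [True, False, True],
    "date": [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)],
    "sales": [100, 150, 300],
    "cost": [40.0, 65.5, 120.0],
}

by_name = infer_schema_info(table, exclude=["cost"])
by_position = infer_schema_info(table, exclude=[4])
```

Both calls exclude the `cost` column, once by name and once by ordinal position, and yield three dimensions (`product`, `online`, `date`) and one measure (`sales`).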

from_dict(dictionary) classmethod

Creates a new schema from a dictionary containing schema information for a Cube. Please refer to the documentation for further details on valid schema definitions.

:param dictionary: The dictionary containing the schema information.

:return: Returns a new schema object.

:exception: Raises an exception if the schema information is not valid or incomplete.

from_json(json_string) classmethod

Creates a new schema from a json string containing schema information for a Cube. If the json string is not valid and does not refer to a file that contains a valid schema in json format, an exception is raised. Please refer to the documentation for further details on valid schema definitions.

:param json_string: The json string containing the schema information.

:return: Returns a new schema object.

:exception: Raises an exception if the schema information is not valid or incomplete.

to_dict()

Converts the schema into a dictionary containing schema information for a Cube.

:return: Returns a dictionary containing the schema information.

to_json()

Converts the schema into a json string containing schema information for a Cube.

:return: Returns a json string containing the schema information.

save_as_json(file_name)

Saves the schema as a json file.

:param file_name: The name of the file to save the schema to.
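The dictionary, json string and json file representations mirror one another, so schema information should survive a round trip through all three. The sketch below shows that round trip with Python's `json` module only. It does not call CubedPandas, and the schema layout shown is an assumption made for illustration.

```python
import json
import os
import tempfile

# Assumed schema layout, for illustration only.
schema_info = {
    "dimensions": [{"column": "product"}, {"column": "channel"}],
    "measures": [{"column": "sales"}],
}

# Dictionary to json string (the role played by to_json).
json_string = json.dumps(schema_info, indent=2)

with tempfile.TemporaryDirectory() as folder:
    file_name = os.path.join(folder, "schema.json")

    # json string to file (the role played by save_as_json).
    with open(file_name, "w", encoding="utf-8") as handle:
        handle.write(json_string)

    # File back to dictionary, ready to be passed to a from_dict-style constructor.
    with open(file_name, encoding="utf-8") as handle:
        reloaded = json.load(handle)
```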

__len__()

Returns the number of dimensions of the schema.