microframe.core package

The microframe.core package contains the submodules that implement Microframe's core functionality: the MicroFrame class itself, plus the indexing, manipulation, and printing helpers it builds on.

Submodules

Microframe Core Module

The microframe.core.microframe module defines the MicroFrame class, the central data structure of the library.

class microframe.core.microframe.MicroFrame(data: List[List[Any]], dtypes: List[str], columns: List[str] | None = None)

Bases: object

A lightweight and efficient data structure for handling tabular data.

The MicroFrame class provides a compact interface for manipulating and displaying structured data. Its API is modeled on the pandas DataFrame, so it should feel familiar to pandas users, but it is intended to use less memory and run faster on smaller datasets.

Parameters

data (List[List[Any]]): A list of lists, one inner list per row of data.
dtypes (List[str]): A list of data types, one per column.
columns (Optional[List[str]], optional): A list of column names. If None, default column names are generated.

Raises

TypeError: If the provided data, dtypes, or columns are not lists, or if the data is not a list of lists.
ValueError: If there’s a mismatch between the data rows and the provided columns or dtypes.

Attributes

columns (np.ndarray): An array of column names.
values (np.ndarray): A structured NumPy array holding the data.
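To make the relationship between the constructor arguments and the `values` attribute concrete, here is a minimal sketch in plain NumPy of how rows, dtypes, and column names can be validated and packed into a structured array. It assumes the constructor works roughly this way; the helper name `build_structured_array` and the default names `col_0`, `col_1`, ... are hypothetical, since the library's actual defaults are not documented here.

```python
from typing import Any, List, Optional

import numpy as np


def build_structured_array(
    data: List[List[Any]],
    dtypes: List[str],
    columns: Optional[List[str]] = None,
) -> np.ndarray:
    """Hypothetical sketch: validate rows and pack them into a structured array."""
    # Mirror the documented TypeError conditions.
    if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
        raise TypeError("data must be a list of lists")
    if not isinstance(dtypes, list):
        raise TypeError("dtypes must be a list")
    if columns is None:
        # Hypothetical default names; the real library may generate different ones.
        columns = [f"col_{i}" for i in range(len(dtypes))]
    elif not isinstance(columns, list):
        raise TypeError("columns must be a list or None")
    # Mirror the documented ValueError conditions.
    if len(columns) != len(dtypes):
        raise ValueError("columns and dtypes must have the same length")
    if any(len(row) != len(dtypes) for row in data):
        raise ValueError("every row must have exactly one value per column")
    # A structured dtype pairs each column name with its type; rows become tuples.
    structured_dtype = np.dtype(list(zip(columns, dtypes)))
    return np.array([tuple(row) for row in data], dtype=structured_dtype)


values = build_structured_array([[1, "Alice"], [2, "Bob"]], ["int32", "U10"], ["id", "name"])
print(values.dtype.names)  # ('id', 'name')
print(values["name"])      # ['Alice' 'Bob']
```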

Examples

>>> # Simple Initialization
>>> data = [[1, 'Alice'], [2, 'Bob']] # Initialize Data
>>> dtypes = ['int32', 'U10'] # Initialize Data Types
>>> columns = ['id', 'name'] # Initialize Column Names
>>> mframe = MicroFrame(data, dtypes, columns)
>>> mframe.head() # Display up to the first 5 rows
id  name
---------
1   Alice
2   Bob
2 rows x 2 columns

>>> # How to extract a subsection of the data and convert it to
>>> # numpy for training
>>> mframe_slice = mframe.iloc[:, 0] # returns all rows, but just col 0
>>> numpy_array = mframe_slice.to_numpy() # returns mframe_slice as a numpy array
>>> numpy_array
array([[1],
       [2]], dtype=int32)

Methods

change_dtypes(dtypes_dict: dict)

Changes the data types of the columns of the MicroFrame.

This method uses the StructuredArrayManipulator class to change the data types of the columns of the MicroFrame based on the provided mapping.

Parameters: dtypes_dict – A dictionary mapping column names to their new data types.

Example:

>>> mframe.change_dtypes({'column1': 'float64', 'column2': 'int32'})

property count

Returns the number of rows in the MicroFrame.

This property is useful for quickly determining the size of the dataset without having to inspect the underlying numpy array directly.

Returns: The number of rows in the MicroFrame.
Return type: int

Example:

>>> mframe.count

describe()

Generates descriptive statistics summarizing the central tendency and dispersion of the dataset’s numeric columns, excluding NaN values.

For each numeric column it computes the count, mean, standard deviation, minimum, and maximum.

The results are printed to the console in a tabular format.

Statistics computed:

count: The number of non-NaN values.
mean: The mean of the values.
std: The sample standard deviation of the values.
min: The minimum value.
max: The maximum value.

The method prints the summary to the console and does not return a value.

Raises:

TypeError – If columns contain types that cannot be converted to float.
ValueError – If computations encounter issues like an empty column.

Example:

>>> mframe.describe()
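The statistics listed above map directly onto NumPy reductions. The following is a minimal sketch, assuming the data lives in a structured array. The function name `describe_structured` is hypothetical; unlike the documented method it returns a dict instead of printing, and it skips non-numeric fields rather than raising TypeError for them.

```python
import numpy as np


def describe_structured(values: np.ndarray) -> dict:
    """Hypothetical sketch: NaN-aware summary statistics per numeric field."""
    summary = {}
    for name in values.dtype.names:
        # Assumption: non-numeric fields (e.g. strings) are skipped.
        if not np.issubdtype(values.dtype[name], np.number):
            continue
        col = values[name].astype(np.float64)
        finite = col[~np.isnan(col)]  # drop NaN values before computing stats
        if finite.size == 0:
            raise ValueError(f"column {name!r} has no non-NaN values")
        summary[name] = {
            "count": int(finite.size),
            "mean": float(finite.mean()),
            # ddof=1 gives the sample standard deviation; undefined for one value.
            "std": float(finite.std(ddof=1)) if finite.size > 1 else float("nan"),
            "min": float(finite.min()),
            "max": float(finite.max()),
        }
    return summary


data = np.array(
    [(1, 2.0, "a"), (2, np.nan, "b"), (3, 4.0, "c")],
    dtype=[("id", "<i4"), ("score", "<f8"), ("label", "<U5")],
)
stats = describe_structured(data)
print(stats["score"]["count"])  # 2
```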

property dtypes

Returns the data types of the columns in the MicroFrame.

This property exposes the dtype of the underlying structured NumPy array. Because the array is structured, the result is a single numpy.dtype object with one named field per column, not a list of per-column dtypes.

Returns: The data types of the columns.
Return type: numpy.dtype

Example:

>>> mframe.dtypes
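Since the return value is a structured numpy.dtype, individual column types are read from it by field name. This sketch assumes `mframe.dtypes` returns the dtype of the underlying structured array, as described above, and reproduces it with plain NumPy for the data from the class example.

```python
import numpy as np

# Assumption: this mirrors what mframe.dtypes would return for the class example.
values = np.array([(1, "Alice"), (2, "Bob")], dtype=[("id", "int32"), ("name", "U10")])
dtypes = values.dtype

print(dtypes.names)    # ('id', 'name')
print(dtypes["id"])    # int32
print(dtypes["name"])  # <U10
```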

classmethod from_structured_array(data: ndarray, columns: List[str] | None = None)

Factory method to create a MicroFrame instance from a structured NumPy array.

Parameters:

data – A structured NumPy array with named fields.
columns – A list of column names. If None, default column names will be generated.

Returns:

An instance of MicroFrame.
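A structured array already carries everything the regular constructor needs: its rows, its per-field types, and its field names. A minimal sketch of decomposing it into `(data, dtypes, columns)` follows, assuming the factory works along these lines. The helper name `structured_to_constructor_args` is hypothetical, and `columns` is passed through unchanged so that None still triggers default-name generation in the constructor.

```python
from typing import Any, List, Optional, Tuple

import numpy as np


def structured_to_constructor_args(
    data: np.ndarray, columns: Optional[List[str]] = None
) -> Tuple[List[List[Any]], List[str], Optional[List[str]]]:
    """Hypothetical sketch: split a structured array into constructor arguments."""
    if data.dtype.names is None:
        raise TypeError("expected a structured array with named fields")
    names = data.dtype.names
    if columns is not None and len(columns) != len(names):
        raise ValueError("columns must match the number of fields")
    # dtype.str gives a compact type code such as '<i4' or '<U10'.
    dtypes = [data.dtype[name].str for name in names]
    # tolist() turns each structured row into a tuple of Python scalars.
    rows = [list(row) for row in data.tolist()]
    return rows, dtypes, columns


arr = np.array([(1, "Alice"), (2, "Bob")], dtype=[("id", "<i4"), ("name", "<U10")])
rows, dtypes, cols = structured_to_constructor_args(arr, ["id", "name"])
print(rows)  # [[1, 'Alice'], [2, 'Bob']]
```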

head(max_width: int = 80, num_cols: int | None = None, num_rows: int = 5)

Displays the first few rows of the MicroFrame.

This method uses the StructuredDataPrinter class to provide a tabular representation of the first few rows of the MicroFrame, similar to the .head() method in pandas.

Parameters:

max_width – Maximum width of the printed table in characters.
num_cols – Number of columns to display. If None, all columns are displayed.
num_rows – Number of rows to display.

Example:

>>> mframe.head()  # Show first 5 rows
>>> mframe.head(num_rows=10)  # Show first 10 rows

property iloc

Provides integer-location based indexing for selection by position.

This property returns an instance of IlocIndexer, which is specialized for integer-location based indexing. It enables the selection of subsets of the MicroFrame’s data by integer position, similar to the .iloc property in pandas DataFrames.

Selections made through the returned indexer come back as MicroFrame instances, so a subset keeps the structure and methods of the original MicroFrame.

Returns: An instance of IlocIndexer for integer-location based indexing.
Return type: IlocIndexer

Example:

>>> first_row = mframe.iloc[0]  # First row of the MicroFrame
>>> last_row = mframe.iloc[-1] # Last row of the MicroFrame

rename(new_columns)

Renames the columns of the MicroFrame.

This method uses the StructuredArrayManipulator class to rename the columns of the MicroFrame based on the provided mapping.

Parameters: new_columns – A dictionary mapping old column names to new column names.

Example:

>>> mframe.rename({'old_name1': 'new_name1', 'old_name2': 'new_name2'})

property shape

Returns the shape of the MicroFrame as a tuple.

This property mimics the .shape attribute of numpy arrays and pandas DataFrames, providing a familiar interface for users. The shape is returned as a tuple where the first element is the number of rows and the second element is the number of columns.

Returns: The shape of the MicroFrame.
Return type: tuple

Example:

>>> mframe.shape

tail(max_width=80, num_cols=None, num_rows=5)

Displays the last few rows of the MicroFrame.

This method uses the StructuredDataPrinter class to provide a tabular representation of the last few rows of the MicroFrame, similar to the .tail() method in pandas.

Parameters:

max_width – Maximum width of the printed table in characters.
num_cols – Number of columns to display. If None, all columns are displayed.
num_rows – Number of rows to display.

Example:

>>> mframe.tail()
>>> mframe.tail(num_rows=10)
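Before printing, head() and tail() must first pick a slice of rows. The sketch below shows one way to do that on a structured array; the helper names `head_rows` and `tail_rows` are hypothetical. One subtlety is worth handling: `values[-n:]` with `n = 0` returns every row, because `-0 == 0`, so the tail helper treats zero explicitly.

```python
import numpy as np


def head_rows(values: np.ndarray, num_rows: int = 5) -> np.ndarray:
    """Hypothetical helper: the first num_rows rows (fewer if the array is shorter)."""
    return values[:max(num_rows, 0)]


def tail_rows(values: np.ndarray, num_rows: int = 5) -> np.ndarray:
    """Hypothetical helper: the last num_rows rows (fewer if the array is shorter)."""
    if num_rows <= 0:
        # values[-0:] would return every row, so return an empty slice instead.
        return values[:0]
    return values[-num_rows:]


values = np.array([(i, i * 10.0) for i in range(8)], dtype=[("id", "<i4"), ("value", "<f8")])
print(head_rows(values, 3)["id"])  # [0 1 2]
print(tail_rows(values, 3)["id"])  # [5 6 7]
```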

to_numpy()

Converts the MicroFrame to a regular 2D NumPy array (matrix).

This conversion will result in a 2D NumPy array with each column corresponding to a field in the MicroFrame. All fields must be of a type that can be cast to a common dtype.

Returns: A 2D NumPy array representation of the MicroFrame data.
Return type: numpy.ndarray

Example:

>>> numpy_array = mframe.to_numpy()
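NumPy ships a helper for exactly this structured-to-matrix conversion, `numpy.lib.recfunctions.structured_to_unstructured`. Whether MicroFrame uses it internally is not stated here, so treat this as an illustration of the casting behavior described above rather than the library's implementation.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

values = np.array([(1, 0.5), (2, 1.5)], dtype=[("id", "<i4"), ("score", "<f8")])

# All fields are cast to a common dtype (int32 and float64 -> float64)
# and laid out as the columns of a 2-D array.
matrix = rfn.structured_to_unstructured(values)
print(matrix.shape)  # (2, 2)
print(matrix.dtype)  # float64

# Selecting a single field first gives a one-column (n, 1) matrix,
# the same shape as in the iloc/to_numpy example in the class docs.
id_column = rfn.structured_to_unstructured(values[["id"]])
print(id_column.shape)  # (2, 1)
```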

Indexers Module

The indexers submodule provides functionality for indexing data within the Microframe framework.

class microframe.core.indexers.IlocIndexer(values: ndarray, columns: ndarray, return_type: Type[T])

Bases: Generic[T], StructuredArrayIndexer

Provides integer-location based indexing for selection by position.

This indexer is a generic class that returns a subset of the data in a structured array format. It inherits from StructuredArrayIndexer and allows for selection of data by integer-location, similar to .iloc in pandas.

Parameters

values (numpy.ndarray): The NumPy structured array to be indexed.
columns (numpy.ndarray): Column names corresponding to the data in the structured array.
return_type (Type[T]): The type of the object returned by the indexer. Typically this is MicroFrame or a similar class that can be initialized from a structured array.
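The `return_type` parameter lets one generic indexer hand back whatever container the caller chooses. This is a minimal sketch of that pattern with `typing.Generic`; `RowIndexer` and `Table` are hypothetical stand-ins rather than the library's classes, and unlike the real IlocIndexer the sketch only selects rows.

```python
from typing import Callable, Generic, TypeVar

import numpy as np

T = TypeVar("T")


class RowIndexer(Generic[T]):
    """Hypothetical sketch: positional row selection wrapped in a caller-chosen type."""

    def __init__(self, values: np.ndarray, return_type: Callable[[np.ndarray], T]):
        self.values = values
        self.return_type = return_type

    def __getitem__(self, key) -> T:
        if isinstance(key, (int, np.integer)):
            # Wrap a single position in a list so the result stays a 1-row array
            # instead of collapsing to a numpy.void scalar.
            rows = self.values[[key]]
        else:
            rows = self.values[key]
        return self.return_type(rows)


class Table:
    """Hypothetical stand-in for a MicroFrame-like class built from a structured array."""

    def __init__(self, values: np.ndarray):
        self.values = values


values = np.array([(1, "Alice"), (2, "Bob"), (3, "Cara")], dtype=[("id", "<i4"), ("name", "<U10")])
indexer: RowIndexer[Table] = RowIndexer(values, Table)
last = indexer[-1]
print(type(last).__name__, last.values["name"])  # Table ['Cara']
```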

class microframe.core.indexers.StructuredArrayIndexer(values: ndarray, columns: ndarray)

Bases: object

A class for indexing into numpy structured arrays using both row and column indices.

Parameters:

values (numpy.ndarray) – The numpy structured array to be indexed.
columns (numpy.ndarray) – Column names corresponding to the data.
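Indexing a structured array by both row and column can be expressed in plain NumPy. The row part is ordinary positional indexing, and the column part is translated from positions to field names through the `columns` array. The function `select_by_position` below is a hypothetical sketch of that idea, not the class's actual method.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def select_by_position(values: np.ndarray, columns: np.ndarray, rows, cols) -> np.ndarray:
    """Hypothetical sketch: select rows and columns of a structured array by position."""
    # Translate the column selector (int, slice, or list of ints) into field names.
    if isinstance(cols, (int, np.integer)):
        names = [str(columns[cols])]
    elif isinstance(cols, slice):
        names = [str(name) for name in columns[cols]]
    else:
        names = [str(columns[i]) for i in cols]
    # Wrap a single row position so the result stays a 1-D structured array.
    if isinstance(rows, (int, np.integer)):
        rows = [rows]
    # Multi-field indexing returns a view that keeps the original field offsets;
    # repack_fields makes a compact copy containing only the chosen fields.
    return rfn.repack_fields(values[rows][names])


values = np.array(
    [(1, 10.0, "a"), (2, 20.0, "b"), (3, 30.0, "c")],
    dtype=[("id", "<i4"), ("value", "<f8"), ("tag", "<U1")],
)
columns = np.array(["id", "value", "tag"])

subset = select_by_position(values, columns, slice(None), [0, 2])
print(subset.dtype.names)  # ('id', 'tag')
print(select_by_position(values, columns, -1, 1)["value"])  # [30.]
```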

Manipulators Module

The manipulators submodule contains tools to manipulate data structures within Microframe.

exception microframe.core.manipulators.ArrayManipulationError

Bases: Exception

Raised when an error occurs during structured array manipulation.

class microframe.core.manipulators.StructuredArrayManipulator(values: ndarray, columns: ndarray)

Bases: object

A class for manipulating numpy structured arrays.

Parameters:

values (numpy.ndarray) – The numpy structured array to be manipulated.
columns (numpy.ndarray) – Column names corresponding to the data.

change_dtypes(dtypes_dict: dict) → None

Changes the data types of specified columns in the structured array.

Parameters: dtypes_dict (dict) – A dictionary mapping column names to their new data types.
Raises: ArrayManipulationError – If the column doesn’t exist or the type conversion is invalid.
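One way to change field types while honoring the documented error conditions is to rebuild the structured dtype and cast with `astype`. This is a minimal sketch under that assumption. The function `change_dtypes` here returns a new array, whereas the documented method returns None and presumably updates the manipulator's own array, and `ArrayManipulationError` is redefined locally so the block runs on its own.

```python
import numpy as np


class ArrayManipulationError(Exception):
    """Local stand-in for microframe.core.manipulators.ArrayManipulationError."""


def change_dtypes(values: np.ndarray, dtypes_dict: dict) -> np.ndarray:
    """Hypothetical sketch: return a copy of `values` with selected fields recast."""
    names = values.dtype.names
    missing = [name for name in dtypes_dict if name not in names]
    if missing:
        raise ArrayManipulationError(f"unknown columns: {missing}")
    try:
        # Rebuild the structured dtype, keeping field names and order and swapping
        # in the requested type for each column listed in dtypes_dict.
        new_dtype = np.dtype(
            [(name, dtypes_dict.get(name, values.dtype[name])) for name in names]
        )
        return values.astype(new_dtype)
    except (TypeError, ValueError) as exc:
        # Unknown type names raise TypeError; failed value conversions raise ValueError.
        raise ArrayManipulationError(str(exc)) from exc


values = np.array([(1, 0.5), (2, 1.5)], dtype=[("id", "<i4"), ("score", "<f8")])
converted = change_dtypes(values, {"id": "float64"})
print(converted.dtype["id"])  # float64
```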

rename(new_columns: dict) → None

Renames columns in the structured array based on a provided mapping.

Parameters: new_columns (dict) – A dictionary mapping old column names to new column names.
Raises: ArrayManipulationError – If the old column name doesn’t exist or the new column name already exists.
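The two documented failure cases translate directly into checks before the rename. The rename itself can be done with NumPy's `numpy.lib.recfunctions.rename_fields`. This is a sketch assuming that approach; `rename_columns` is a hypothetical name, it returns a new view instead of updating state in place, and the exception class is a local stand-in.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


class ArrayManipulationError(Exception):
    """Local stand-in for microframe.core.manipulators.ArrayManipulationError."""


def rename_columns(values: np.ndarray, new_columns: dict) -> np.ndarray:
    """Hypothetical sketch: rename fields, enforcing the documented error conditions."""
    names = values.dtype.names
    for old, new in new_columns.items():
        if old not in names:
            raise ArrayManipulationError(f"column {old!r} does not exist")
        if new in names:
            raise ArrayManipulationError(f"column {new!r} already exists")
    # rename_fields returns a view of the same data under the new field names.
    return rfn.rename_fields(values, new_columns)


values = np.array([(1, "a"), (2, "b")], dtype=[("old_id", "<i4"), ("tag", "<U1")])
renamed = rename_columns(values, {"old_id": "id"})
print(renamed.dtype.names)  # ('id', 'tag')
```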

to_numpy()

Converts the structured array to a regular 2D NumPy array (matrix).

This conversion will result in a 2D NumPy array with each column corresponding to a field in the structured array. All fields must be of a type that can be cast to a common dtype.

Returns: A 2D NumPy array representation of the structured array.
Return type: numpy.ndarray
Raises: ArrayManipulationError – If the conversion is not possible due to incompatible data types.

Printers Module

The printers submodule provides utilities to print data structures and results in a user-friendly format.

class microframe.core.printers.StructuredDataPrinter(values, columns, max_value_length=20)

Bases: object

A class for displaying numpy structured arrays or lists of tuples in a tabular format.

Parameters:

values (numpy.ndarray or list) – The data to be printed.
columns (list) – Column names corresponding to the data.
max_value_length (int, optional) – Maximum display length for cell values, defaults to 20.

Variables:

values – The data to be printed.
columns – Column names corresponding to the data.
max_value_length – Maximum display length for cell values.

Example:

>>> import numpy as np
>>> # Initialize a numpy structured array for data
>>> data = np.array([(0, 10.0, 'Item 1'), (1, 20.5, 'Item 2'), (2, 30.2, 'Item 3'), (3, 40.8, 'Item 4'),
...                  (4, 50.1, 'Item 5')], dtype=[('id', '<i4'), ('value', '<f4'), ('description', '<U25')])
>>> # Initialize normal numpy array for columns
>>> columns = np.array(['id', 'value', 'description'])
>>> printer = StructuredDataPrinter(data, columns)

structured_print(max_width=80, num_cols=None, num_rows=10, tail=False)

Prints the data in a tabular format with configurable display options.

Parameters:

max_width (int, optional) – Maximum width of the table, defaults to 80.
num_cols (int, optional) – Number of columns to display, defaults to None.
num_rows (int, optional) – Number of rows to display, defaults to 10.
tail (bool, optional) – Whether to display the last rows instead of the first, defaults to False.

Example:

>>> printer.structured_print() # Print the data in the default tabular format.
>>> printer.structured_print(max_width=50) # Print the data with a constrained width
>>> printer.structured_print(num_cols=3) # Print 3 columns of data.
>>> printer.structured_print(num_rows=15) # Print the first 15 rows of data
>>> printer.structured_print(num_rows=5, tail=True) # Print the last 5 rows, useful for checking the end of a large dataset
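As a rough illustration of how the display options can interact, here is a sketch of a width-limited formatter. It truncates each cell to `max_value_length`, pads columns to their widest entry, and drops whole columns once the table would exceed `max_width`. The function `format_table`, the "..." truncation marker, and the column-dropping policy are all assumptions rather than StructuredDataPrinter's documented behavior, and the sketch returns a string instead of printing.

```python
import numpy as np


def format_table(values, columns, max_value_length=20, max_width=80, num_rows=10, tail=False):
    """Hypothetical sketch: render a structured array as a width-limited text table."""
    if tail:
        # Slice from an explicit start index; values[-0:] would return every row.
        rows = values[max(len(values) - num_rows, 0):]
    else:
        rows = values[:num_rows]

    def cell(value) -> str:
        text = str(value)
        # Truncate long cells and mark them with "..." (an assumed convention).
        if len(text) > max_value_length:
            text = text[: max_value_length - 3] + "..."
        return text

    names = [str(c) for c in columns]
    table = [[cell(n) for n in names]] + [[cell(row[n]) for n in names] for row in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(names))]

    # Keep whole columns from left to right while they fit within max_width.
    kept, used = [], 0
    for i, width in enumerate(widths):
        extra = width if not kept else width + 2  # two spaces separate columns
        if kept and used + extra > max_width:
            break
        kept.append(i)
        used += extra

    lines = ["  ".join(r[i].ljust(widths[i]) for i in kept).rstrip() for r in table]
    lines.insert(1, "-" * used)  # rule under the header row
    return "\n".join(lines)


data = np.array(
    [(0, "Item 1"), (1, "A much longer description")],
    dtype=[("id", "<i4"), ("description", "<U30")],
)
print(format_table(data, np.array(["id", "description"]), max_value_length=10))
```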