mlektic.preprocessing package

Submodules

mlektic.preprocessing.dataframes_utils module

mlektic.preprocessing.dataframes_utils.pd_dataset(df: DataFrame, input_columns: List[str], output_column: str, train_fraction: float, shuffle: bool = True, random_seed: int = 42, normalize: bool = False, normalization_type: str = 'standard') → Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]

Prepares train and test datasets from a pandas DataFrame.

Parameters:
  • df (pd.DataFrame) – DataFrame containing the data.

  • input_columns (List[str]) – List of column names to be used as inputs.

  • output_column (str) – Column name to be used as the output/target.

  • train_fraction (float) – Fraction of the rows to use for training; the remaining rows form the test set.

  • shuffle (bool, optional) – Whether to shuffle the data before splitting. Default is True.

  • random_seed (int, optional) – Seed for random number generator. Default is 42.

  • normalize (bool, optional) – Whether to normalize the input data. Default is False.

  • normalization_type (str, optional) – Type of normalization ('standard' or 'minmax'). Default is 'standard'.

Returns:

Tuple of train and test datasets.

Return type:

Tuple[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]

Raises:

ValueError – If normalization_type is not supported.
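The roles of train_fraction, shuffle and random_seed can be illustrated with a small NumPy-only sketch. This is not mlektic's implementation: split_rows is a hypothetical helper, and the ((x_train, y_train), (x_test, y_test)) layout is an assumption read off the documented return type.

```python
import numpy as np


def split_rows(x, y, train_fraction, shuffle=True, random_seed=42):
    """Hypothetical helper: split paired arrays into train and test rows."""
    n_rows = x.shape[0]
    indices = np.arange(n_rows)
    if shuffle:
        # A fixed seed makes the shuffled order reproducible across calls.
        rng = np.random.default_rng(random_seed)
        rng.shuffle(indices)
    # The first train_fraction of the (possibly shuffled) rows go to training.
    n_train = int(n_rows * train_fraction)
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    # Assumed layout: (inputs, targets) for train, then (inputs, targets) for test.
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])


x = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 input columns
y = np.arange(10, dtype=float)                 # 1 target value per row
(x_train, y_train), (x_test, y_test) = split_rows(x, y, train_fraction=0.8)
print(x_train.shape, x_test.shape)
```

Because inputs and targets are indexed with the same row indices, each input row stays paired with its target after shuffling.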

mlektic.preprocessing.dataframes_utils.pl_dataset(df: DataFrame, input_columns: List[str], output_column: str, train_fraction: float, shuffle: bool = True, random_seed: int = 42, normalize: bool = False, normalization_type: str = 'standard') → Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]

Prepares train and test datasets from a polars DataFrame.

Parameters:
  • df (pl.DataFrame) – DataFrame containing the data.

  • input_columns (List[str]) – List of column names to be used as inputs.

  • output_column (str) – Column name to be used as the output/target.

  • train_fraction (float) – Fraction of the rows to use for training; the remaining rows form the test set.

  • shuffle (bool, optional) – Whether to shuffle the data before splitting. Default is True.

  • random_seed (int, optional) – Seed for random number generator. Default is 42.

  • normalize (bool, optional) – Whether to normalize the input data. Default is False.

  • normalization_type (str, optional) – Type of normalization ('standard' or 'minmax'). Default is 'standard'.

Returns:

Tuple of train and test datasets.

Return type:

Tuple[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]

Raises:

ValueError – If normalization_type is not supported.
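The two normalization_type values can be read as the usual textbook scalings, sketched below with NumPy. This is an assumption rather than the library's code: scale_columns is a hypothetical helper, and this page does not state which rows mlektic uses to compute the statistics (training rows only or the whole dataset).

```python
import numpy as np


def scale_columns(x, normalization_type="standard"):
    """Hypothetical helper: scale each column of a 2-D array independently."""
    x = np.asarray(x, dtype=float)
    if normalization_type == "standard":
        # Z-score scaling: each column ends up with zero mean and unit variance.
        return (x - x.mean(axis=0)) / x.std(axis=0)
    if normalization_type == "minmax":
        # Min-max scaling: each column is mapped onto the interval [0, 1].
        col_min = x.min(axis=0)
        return (x - col_min) / (x.max(axis=0) - col_min)
    # Mirrors the documented behaviour of rejecting unsupported types.
    raise ValueError(f"Unsupported normalization_type: {normalization_type!r}")


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
standard = scale_columns(x, "standard")
minmax = scale_columns(x, "minmax")
print(standard)
print(minmax)
```

Both scalings divide by a per-column spread, so a constant column would produce a division by zero in this sketch; real code would need to guard against that case.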

Module contents