mlektic.preprocessing package
Submodules
mlektic.preprocessing.dataframes_utils module
- mlektic.preprocessing.dataframes_utils.pd_dataset(df: DataFrame, input_columns: List[str], output_column: str, train_fraction: float, shuffle: bool = True, random_seed: int = 42, normalize: bool = False, normalization_type: str = 'standard') → Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]
Prepares train and test datasets from a pandas DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
input_columns (List[str]) – List of column names to be used as inputs.
output_column (str) – Column name to be used as the output/target.
train_fraction (float) – Fraction of the data to use for training; the remainder forms the test set.
shuffle (bool, optional) – Whether to shuffle the data before splitting. Default is True.
random_seed (int, optional) – Seed for random number generator. Default is 42.
normalize (bool, optional) – Whether to normalize the input data. Default is False.
normalization_type (str, optional) – Type of normalization (‘standard’ or ‘minmax’). Default is ‘standard’.
- Returns:
A pair of (train, test) datasets, each of which is itself a pair of NumPy arrays.
- Return type:
Tuple[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]
- Raises:
ValueError – If normalization_type is neither ‘standard’ nor ‘minmax’.
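The way train_fraction, shuffle, and random_seed interact can be illustrated with a minimal pure-Python sketch. This is not mlektic's implementation: the helper name `split_rows`, the range check on train_fraction, and the rule of truncating `len(rows) * train_fraction` to an integer are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T],
    train_fraction: float,
    shuffle: bool = True,
    random_seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split rows into (train, test) lists."""
    # Assumed validation rule; the documented function may behave differently.
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")

    indices = list(range(len(rows)))
    if shuffle:
        # A dedicated Random instance keeps the split reproducible
        # for a given seed without touching the global RNG state.
        random.Random(random_seed).shuffle(indices)

    # Assumption: the training size is truncated, not rounded.
    n_train = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


rows = list(range(10))
train, test = split_rows(rows, train_fraction=0.8)
print(len(train), len(test))  # 8 2
```

With shuffle=False the first rows go to the training set in their original order, which matters for ordered data such as time series.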
- mlektic.preprocessing.dataframes_utils.pl_dataset(df: DataFrame, input_columns: List[str], output_column: str, train_fraction: float, shuffle: bool = True, random_seed: int = 42, normalize: bool = False, normalization_type: str = 'standard') → Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]
Prepares train and test datasets from a polars DataFrame.
- Parameters:
df (pl.DataFrame) – DataFrame containing the data.
input_columns (List[str]) – List of column names to be used as inputs.
output_column (str) – Column name to be used as the output/target.
train_fraction (float) – Fraction of the data to use for training; the remainder forms the test set.
shuffle (bool, optional) – Whether to shuffle the data before splitting. Default is True.
random_seed (int, optional) – Seed for random number generator. Default is 42.
normalize (bool, optional) – Whether to normalize the input data. Default is False.
normalization_type (str, optional) – Type of normalization (‘standard’ or ‘minmax’). Default is ‘standard’.
- Returns:
A pair of (train, test) datasets, each of which is itself a pair of NumPy arrays.
- Return type:
Tuple[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]
- Raises:
ValueError – If normalization_type is neither ‘standard’ nor ‘minmax’.
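The two normalization_type values conventionally denote z-score scaling (‘standard’) and rescaling to the range [0, 1] (‘minmax’). The sketch below shows those conventional formulas for a single column, along with the ValueError for an unsupported type. It is an illustration under assumptions, not mlektic's code: the helper name `normalize_column` is hypothetical, the population standard deviation is assumed, and whether the library computes its statistics from the training rows only is not stated in this reference.

```python
from statistics import fmean, pstdev
from typing import List


def normalize_column(values: List[float], normalization_type: str = "standard") -> List[float]:
    """Hypothetical helper: scale one column of numbers."""
    if normalization_type == "standard":
        # z-score: subtract the mean, divide by the standard deviation.
        # Assumption: population standard deviation (divides by n).
        mean = fmean(values)
        std = pstdev(values)
        return [(v - mean) / std for v in values]
    if normalization_type == "minmax":
        # Rescale so the smallest value maps to 0 and the largest to 1.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"Unsupported normalization_type: {normalization_type!r}")


print(normalize_column([1.0, 2.0, 3.0], "minmax"))  # [0.0, 0.5, 1.0]
```

Note that this sketch divides by zero for a constant column; a production routine would need to guard against that case.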