Getting Started with Logistic Regression

With Mlektic, you can perform univariate or multivariate logistic regression (binary or multiclass), minimizing the log loss with any of the optimization methods available in the optimizer_archt module.

Supported optimization methods include:

- 'sgd-standard'
- 'sgd-stochastic'
- 'sgd-mini-batch'
- 'sgd-momentum'
- 'nesterov'
- 'adagrad'
- 'adadelta'
- 'rmsprop'
- 'adam'
- 'adamax'
- 'nadam'
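
For instance, switching to Adam only requires changing the method name passed to optimizer_archt. This is a minimal sketch: the call signature mirrors the 'sgd-standard' call in the full example below, and any Adam-specific hyperparameters are left at their defaults.

from mlektic import methods

# Build an Adam optimizer; 'adam' is one of the supported method names,
# and learning_rate follows the pattern used in the full example below.
optimizer = methods.optimizer_archt('adam', learning_rate=0.1)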

For more details on the optimizer_archt module, please refer to the optimizer_archt documentation.

You can also apply regularization to improve model generalization. The regularizer_archt module supports the following regularization methods:

- 'l1' (default)
- 'l2'
- 'elastic_net'
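
Likewise, choosing a different penalty only changes the method name passed to regularizer_archt. A minimal sketch: the lambda_value keyword is taken from the full example below, and any extra parameters that 'elastic_net' may accept are not shown.

from mlektic import methods

# L2 regularization, using the same lambda_value keyword as the
# full example below.
regularizer = methods.regularizer_archt('l2', lambda_value=0.01)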

To learn more about the regularizer_archt module, please refer to the regularizer_archt documentation.

For example, you can train a logistic regression model with standard gradient descent and L1 regularization using the LogisticRegressionArcht module as follows:

import pandas as pd
import numpy as np
from mlektic.logistic_reg import LogisticRegressionArcht
from mlektic import preprocessing
from mlektic import methods

# Generate random data for a binary classification problem.
np.random.seed(42)
n_samples = 100
feature1 = np.random.rand(n_samples)
feature2 = np.random.rand(n_samples)
target = (3 * feature1 + 5 * feature2 + np.random.randn(n_samples) * 0.5) > 4.0
target = target.astype(np.float32)

# Create pandas dataframe from the data.
df = pd.DataFrame({
    'feature1': feature1,
    'feature2': feature2,
    'target': target
})

# Create train and test sets with an 80/20 split.
train_set, test_set = preprocessing.pd_dataset(df, ['feature1', 'feature2'], 'target', 0.8)

# Define regularizer and optimizer.
regularizer = methods.regularizer_archt('l1', lambda_value=0.01)
optimizer = methods.optimizer_archt('sgd-standard', learning_rate=0.1)

# Configure the model.
log_reg = LogisticRegressionArcht(iterations=1000, optimizer=optimizer, regularizer=regularizer)

# Train the model.
log_reg.train(train_set)

Epoch 100, Loss: 0.5152596235275269, Accuracy: 0.862500011920929
Epoch 200, Loss: 0.4489741921424866, Accuracy: 0.862500011920929
Epoch 300, Loss: 0.4166463613510132, Accuracy: 0.875
Epoch 400, Loss: 0.39809101819992065, Accuracy: 0.887499988079071
Epoch 500, Loss: 0.38631850481033325, Accuracy: 0.887499988079071
Epoch 600, Loss: 0.37834054231643677, Accuracy: 0.887499988079071
Epoch 700, Loss: 0.37267810106277466, Accuracy: 0.875
Epoch 800, Loss: 0.3685190677642822, Accuracy: 0.875
Epoch 900, Loss: 0.36538296937942505, Accuracy: 0.875
Epoch 1000, Loss: 0.362968385219574, Accuracy: 0.875

To learn more about the LogisticRegressionArcht module, please refer to the LogisticRegressionArcht documentation.

The cost evolution can be plotted with the plot_cost function:

from mlektic.plot_utils import plot_cost

cost_history = log_reg.get_cost_history()
plot_cost(cost_history, dim=(7, 5))

[Figure: cost evolution plot]

Different evaluation metrics can be obtained:

categorical_crossentropy = log_reg.eval(test_set, 'categorical_crossentropy')
accuracy = log_reg.eval(test_set, 'accuracy')
precision = log_reg.eval(test_set, 'precision')
recall = log_reg.eval(test_set, 'recall')
f1_score = log_reg.eval(test_set, 'f1_score')
confusion_matrix = log_reg.eval(test_set, 'confusion_matrix')

print(f'Categorical Crossentropy: {categorical_crossentropy}')
print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1 Score: {f1_score}')
print(f'Confusion Matrix: \n{confusion_matrix}')

Categorical Crossentropy: 0.22856256365776062
Accuracy: 0.949999988079071
Precision: 1.0
Recall: 0.9090909361839294
F1 Score: 0.952380895614624
Confusion Matrix:
[[10.  0.]
 [ 1.  9.]]

Print the parameters learned during training:

print("Weights:", log_reg.get_parameters())
print("Intercept:", log_reg.get_intercept())
Weights: [[-1.506539   1.506539 ]
[-3.4472232  3.4472237]]
Intercept: [ 2.4116907 -2.4116907]
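
Note that the model appears to store one weight column and one intercept entry per class, which is why this binary model reports a 2 × 2 weight matrix and a length-2 intercept vector.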

And make predictions:

prob_prediction = log_reg.predict_prob([2.0, 3.0])

print(f'Predicted probability for class 0: {prob_prediction[0][0]}')
print(f'Predicted probability for class 1: {prob_prediction[0][1]}')

Predicted probability for class 0: 3.1259400623540046e-10
Predicted probability for class 1: 1.0

class_prediction = log_reg.predict_class([2.0, 3.0])

print('Predicted class:', class_prediction[0])

Predicted class: 1

Finally, you can save the model parameters in JSON format for future use:

log_reg.save_model('logistic_regression_model.json')
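
To pick the model up again later, you would reload the saved parameters from the JSON file. The snippet below is only a sketch: the load_model method and the no-argument constructor are assumptions about the API, so check the LogisticRegressionArcht documentation for the exact loading interface.

from mlektic.logistic_reg import LogisticRegressionArcht

# Hypothetical loading step: load_model and the no-argument constructor
# are assumed here and may differ in the actual API.
restored_model = LogisticRegressionArcht()
restored_model.load_model('logistic_regression_model.json')

# The restored model should reproduce the earlier prediction.
print(restored_model.predict_class([2.0, 3.0]))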