Logistic Regression Introduction
Intro: (Binary Classification)
Logistic regression is a machine learning classification algorithm that predicts the probability that the dependent variable belongs to a given class. In binary logistic regression, the dependent variable takes one of two values, 1 or 0.
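As a minimal sketch of how such a probability can be produced, the snippet below applies the logistic (sigmoid) function to a linear score and thresholds the result at 0.5. The weights `w`, bias `b`, and input `x` are made-up values for illustration only, not fitted parameters.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, chosen only for illustration.
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])

p = sigmoid(w @ x + b)   # estimated probability that the label is 1
label = int(p >= 0.5)    # predict 1 when the probability is at least 0.5
print(p, label)
```

The 0.5 threshold is a common default rather than a requirement; a different cut-off may suit problems where the two kinds of error have different costs.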
Intro: (Multi-Class)
Multi-class logistic regression handles more than two classes. Three methods can be used for multi-class classification:
- 1 vs. All: separates one class from all the others at a time using binary classification.
Disadvantages: there are many ways to divide the classes, so the predicted labels can differ depending on which group is separated first.
- 1 vs. 1: separates each possible pair of classes and uses a “vote” to decide the predicted label.
- L-class discriminants: L linear functions are considered, one per class.
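The first two strategies in the list can be tried out with scikit-learn's generic wrappers. This is a rough sketch, assuming `OneVsRestClassifier` and `OneVsOneClassifier` around `LogisticRegression` on the Iris data; the `max_iter=1000` setting is an arbitrary choice to help the solver converge, not something this page prescribes.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# 1 vs. All: one binary classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# 1 vs. 1: one binary classifier per pair of classes; prediction by voting.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Number of binary classifiers each strategy trained.
print(len(ovr.estimators_), len(ovo.estimators_))
# Predicted labels for the first three samples under each strategy.
print(ovr.predict(X[:3]), ovo.predict(X[:3]))
```

In general, with K classes, 1 vs. All fits K binary models and 1 vs. 1 fits K(K-1)/2. The two counts happen to coincide for the three Iris classes.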
Dataset
We will use the Iris data set that ships with the sklearn package. The Iris data set consists of 150 data points, each a 4-dimensional vector. For more info
We can load the data using the following commands:
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
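A quick way to check what was loaded is to inspect a few attributes of the returned object. The sketch below only prints the array shape and the feature and class names that `load_iris()` provides.

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris.data.shape)      # (number of samples, number of features)
print(iris.feature_names)   # names of the four measurements
print(iris.target_names)    # class names, indexed by the labels in iris.target
```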
The class labels (iris.target) for the data set are 0, 1, and 2, which correspond to Iris setosa, Iris versicolor, and Iris virginica, respectively. Hence, we reformulate the class labels 0, 1, 2 in iris.target as the vectors (1,0,0), (0,1,0), (0,0,1) as follows:
Y_old = iris.target  # original class labels of 0,1,2
Y_new = np.zeros([len(Y_old), 3])  # store new labels
for i in range(len(Y_old)):
    if Y_old[i] == 0:
        Y_new[i, :] = [1, 0, 0]
    elif Y_old[i] == 1:
        Y_new[i, :] = [0, 1, 0]
    else:
        Y_new[i, :] = [0, 0, 1]
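The same relabelling can also be written without an explicit loop. The sketch below is one possible alternative, not the approach this page relies on; it uses a short stand-in label array (an assumption made so the snippet runs on its own) in place of `iris.target`.

```python
import numpy as np

# Stand-in labels for illustration; in the tutorial this would be iris.target.
Y_old = np.array([0, 1, 2, 1, 0])

# Row k of the 3x3 identity matrix is the vector for class k,
# so indexing the identity matrix by the labels builds every row at once.
Y_new = np.eye(3)[Y_old]
print(Y_new)
```

Each row of `Y_new` contains a single 1 in the column given by the original label, which matches the loop's output for labels 0, 1, and 2.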