Regression
Linear Regression
Linear regression is a model that estimates the linear relationship between one or more independent variables and a dependent variable. These models are typically fitted with a least squares approach, which minimizes the sum of the squared residuals (the differences between fitted and observed values). After fitting a linear regression model, one can supply new data points X to predict y, as in the sketch below.
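As a minimal sketch of this workflow (using scikit-learn and synthetic data, since no specific dataset is assumed here), fitting by ordinary least squares and predicting on new points might look like:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on X plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)

# Fit by ordinary least squares (minimizes the sum of squared residuals)
model = LinearRegression().fit(X, y)

# Supply new data points X to predict y
X_new = np.array([[4.0], [7.5]])
print(model.coef_, model.intercept_)  # estimated slope and intercept
print(model.predict(X_new))           # predicted y values
```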
Logistic Regression
Logistic regression in machine learning is a supervised learning method for problems with a binary outcome. It models the probability of a certain class or event occurring based on one or more independent variables. The output of a logistic regression model is a probability score between 0 and 1, which can be converted into a binary outcome by setting a threshold: if the probability is greater than the threshold, the outcome is predicted as 1; otherwise, it is predicted as 0.
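A minimal sketch of this, again with scikit-learn and synthetic data (the features and labels here are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: class 1 becomes more likely as x grows
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 1, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# predict_proba returns P(class 0) and P(class 1) for each row
proba = clf.predict_proba(np.array([[-2.0], [0.1], [2.5]]))[:, 1]

# Convert the probability scores to binary outcomes with a 0.5 threshold
print(proba, (proba > 0.5).astype(int))
```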
Similarities between Linear and Logistic Regression
- Both are types of regression models used in supervised learning.
- Both models involve estimating the relationship between independent and dependent variables.
- Both models can be used to make predictions based on new data points.
Differences between Linear and Logistic Regression
- Linear regression is used for continuous outcomes, while logistic regression is used for binary outcomes.
- The output of linear regression is continuous, while the output of logistic regression is a probability score between 0 and 1.
- Linear regression uses the least squares approach to fit the model, while logistic regression uses maximum likelihood estimation.
Logistic Regression and the Sigmoid Function
How does logistic regression use the sigmoid function to model the probability of a binary outcome?
Logistic regression begins with a linear combination of the input features and model coefficients. This linear combination is passed through the sigmoid function, also known as the logistic function, which maps it to a probability score between 0 and 1. This probability is then compared against a threshold (0.5, in our model) to make a binary prediction. In our example, a 1 indicates that snow did occur, whereas a 0 indicates that it did not.
Mathematically, the sigmoid function is defined as $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z$ is the linear combination of input features and model coefficients.
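A short sketch of this computation (the coefficients and feature values here are hypothetical, not the notebook's fitted values):

```python
import numpy as np

def sigmoid(z):
    """Map the linear combination z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and features for one observation
beta = np.array([0.8, -1.5])   # model coefficients
intercept = 0.2
x = np.array([1.0, 2.0])       # one observation's features

z = intercept + beta @ x       # linear combination
p = sigmoid(z)                 # probability of class 1 (e.g., snow)
prediction = int(p > 0.5)      # threshold at 0.5, as in the model above
print(p, prediction)
```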
Maximum Likelihood Estimation in Logistic Regression
How is maximum likelihood estimation used in logistic regression?
In logistic regression, maximum likelihood estimation is used to determine the best-fitting parameters (coefficients) of the model. In other words, we seek the parameters that maximize the likelihood of observing the data given the model. Concretely, for binary labels this means maximizing the log-likelihood $\ell(\beta) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, where $p_i = \sigma(z_i)$ is the model's predicted probability for observation $i$.
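As an illustrative sketch (with hypothetical data, not the notebook's fit), the log-likelihood of a candidate coefficient vector can be computed directly, and MLE amounts to preferring the coefficients with the higher value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    """Log-likelihood of coefficients beta for binary labels y."""
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical data; the first column of X is the intercept term
X = np.array([[1.0, -0.5], [1.0, 1.2], [1.0, 2.0], [1.0, -1.3]])
y = np.array([0, 1, 1, 0])

# MLE selects whichever candidate yields the higher log-likelihood
print(log_likelihood(np.array([0.0, 1.0]), X, y))
print(log_likelihood(np.array([0.0, -1.0]), X, y))
```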
Logistic Regression vs. Multinomial Naive Bayes
As mentioned in the notebook, multinomial Naive Bayes is not the ideal model for this problem, given that our data is continuous and approximately normally distributed rather than discrete. Naive Bayes also assumes feature independence, which typically does not hold in our highly correlated, weather-related data. This is reflected in the results: our logistic regression model achieves an accuracy above 95.7%, while multinomial Naive Bayes achieves 86%. Logistic regression aligns well with our dataset's properties, whereas multinomial Naive Bayes, designed for discrete, count-based data, performs less effectively. A comparison along these lines is sketched below.
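A hedged sketch of such a comparison, with synthetic stand-ins for the weather features and snow labels (MultinomialNB requires non-negative inputs, so the continuous features are rescaled, which is itself a sign the model is a poor fit for this kind of data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the weather features and snow labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # continuous, roughly normal features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# MultinomialNB requires non-negative inputs, so scale features to [0, 1];
# clip the test set in case it falls outside the training range
scaler = MinMaxScaler().fit(X_train)
X_train_nb = scaler.transform(X_train)
X_test_nb = np.clip(scaler.transform(X_test), 0.0, 1.0)

log_reg = LogisticRegression().fit(X_train, y_train)
nb = MultinomialNB().fit(X_train_nb, y_train)

print("logistic regression:", accuracy_score(y_test, log_reg.predict(X_test)))
print("multinomial NB:     ", accuracy_score(y_test, nb.predict(X_test_nb)))
```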
A couple of key takeaways: when analyzing the logistic regression coefficients, low temperature is the most heavily weighted variable for determining snow, which intuitively makes sense. Soil moisture at 8 ft also appears to be a strong "predictor" of whether it snows; however, soil moisture is more plausibly a reaction to precipitation than a predictor of it. This prompts further investigation into the soil moisture variables, and into whether they are lagged or need to be lagged to some degree. Lastly, logistic regression outperforms multinomial Naive Bayes across the confusion matrix: more true positives and true negatives, and fewer false positives and false negatives.
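For reference, this kind of coefficient analysis can be done by ranking the fitted coefficients by magnitude; the feature names and values below are hypothetical, not the notebook's, and magnitudes are only comparable if the features were standardized first:

```python
import numpy as np

# Hypothetical feature names and fitted coefficients (illustrative only;
# in practice these would come from clf.coef_[0] of a fitted model)
feature_names = ["low_temp", "soil_moisture_8ft", "humidity", "wind_speed"]
coefs = np.array([-2.1, 1.4, 0.3, -0.1])

# Rank features by absolute coefficient size
for name, c in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1])):
    print(f"{name}: {c:+.2f}")
```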