Skip to content Skip to sidebar Skip to footer

Univariate Regression In Python

Need to run multiple single-factor (univariate) regression models in python between a column in a dataframe and several other columns in the same dataframe - so based on the image

Solution 1:

There are two options you can use here. One is the popular scikit-learn library. It is used as follows

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)  # where X is your feature data and y is your target
reg.score(X, y)  # R^2 value
>>> 0.87
reg.coef_  # slope coeficients
>>> array([1.45, -9.2])
reg.intercept_  # intercept
>>> 6.1723...

There are not many other statistics you can use with scikit.

Another option is statsmodels which offers far richer detail into the statistics of the model

import numpy as np
import statsmodels.api as sm

# generate some synthetic data
nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)

X = sm.add_constant(X)
y = np.dot(X, beta) + e

# fit the model and get a summary of the statistics
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.020e+06
Date:                Mon, 08 Jul 2019   Prob (F-statistic):          2.83e-239
Time:                        02:07:22   Log-Likelihood:                -146.51
No. Observations:                 100   AIC:                             299.0
Df Residuals:                      97   BIC:                             306.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3423      0.313      4.292      0.000       0.722       1.963
x1            -0.0402      0.145     -0.278      0.781      -0.327       0.247
x2            10.0103      0.014    715.745      0.000       9.982      10.038
==============================================================================
Omnibus:                        2.042   Durbin-Watson:                   2.274
Prob(Omnibus):                  0.360   Jarque-Bera (JB):                1.875
Skew:                           0.234   Prob(JB):                        0.392
Kurtosis:                       2.519   Cond. No.                         144.
==============================================================================

You can see that statsmodels offer much more details, such as the AIC, BIC, t-statistics etc.


Post a Comment for "Univariate Regression In Python"