Code Along
We’ll take this part slowly, one step at a time!
Our aim: how well can we predict a penguin’s species from its measured characteristics? To keep things simple (for now), we’ll restrict the task to distinguishing between Adelie and Gentoo penguins.
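Restricting the task to two species turns it into a binary classification problem. Here is a minimal pandas sketch, assuming the data has a `species` column with values such as `'Adelie'` and `'Gentoo'` (as in the Palmer penguins data). The helper name `adelie_vs_gentoo`, the target column `is_gentoo`, and the tiny example frame are hypothetical and exist only to show the filtering step.

```python
import pandas as pd


def adelie_vs_gentoo(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only Adelie and Gentoo rows and add a 0/1 target column.

    Hypothetical helper: assumes a 'species' column as in the Palmer penguins data.
    """
    # Drop every species other than the two we want to distinguish
    subset = df[df["species"].isin(["Adelie", "Gentoo"])].copy()
    # 1 = Gentoo, 0 = Adelie (which class counts as "positive" is an arbitrary choice)
    subset["is_gentoo"] = (subset["species"] == "Gentoo").astype(int)
    return subset


# Made-up example rows, for illustration only
toy = pd.DataFrame({"species": ["Adelie", "Chinstrap", "Gentoo", "Adelie"]})
two = adelie_vs_gentoo(toy)
print(two["species"].tolist())    # ['Adelie', 'Gentoo', 'Adelie']
print(two["is_gentoo"].tolist())  # [0, 1, 0]
```

Using `.copy()` avoids pandas warnings about modifying a slice of the original frame when the new target column is added.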
Loading, setting up
Fit model
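Example code (illustrative only: it uses a Star Wars characters dataset with a Human vs. Not human target rather than the penguins):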
import pandas as pd
import statsmodels.api as sm
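# Load and preprocess the data
starwars = pd.read_csv('path_to_starwars.csv') # Load your data file
# Drop rows missing either predictor; statsmodels' Logit cannot fit rows containing NaN
starwars = starwars.dropna(subset=['height', 'mass'])
starwars['species_human'] = starwars['species'].apply(lambda x: 'Human' if x == 'Human' else 'Not human')
# Categories sort alphabetically, so cat.codes gives Human = 0, Not human = 1
starwars['species_human'] = starwars['species_human'].astype('category').cat.codes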
# Regression model
X_reg = starwars[['height', 'mass']]
y_reg = starwars['species_human']
X_reg = sm.add_constant(X_reg) # Add a constant term for the intercept
reg_model = sm.Logit(y_reg, X_reg).fit()
print(reg_model.summary())
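A logistic regression is linear on the log-odds scale: the predicted probability of the class coded 1 (here "Not human", given the `cat.codes` encoding) is p = 1 / (1 + exp(-(b0 + b1·height + b2·mass))). The sketch below shows that calculation in plain Python. The coefficient values are hypothetical placeholders, not estimates from the fit above; with a real fit you would read them from `reg_model.params`.

```python
import math
from typing import Sequence


def predicted_probability(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Inverse-logit of a linear predictor: the probability of the class coded 1."""
    # Linear predictor on the log-odds scale: b0 + b1*x1 + b2*x2 + ...
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # Map log-odds to a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for (height, mass); NOT estimates from the model above
b0, b_height, b_mass = 2.0, -0.01, 0.0
p = predicted_probability(b0, [b_height, b_mass], [200.0, 80.0])
print(round(p, 3))  # eta = 2.0 - 2.0 + 0.0 = 0, so p = 0.5

# exp(coefficient) is an odds ratio: the factor by which the odds change per one-unit increase
print(round(math.exp(b_height), 4))
```

A positive coefficient pushes the probability of the class coded 1 up as that predictor grows, and a negative one pushes it down.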
Loading, setting up
Engineer features
Specify recipe, model, and workflow
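Example code (illustrative only: it uses a Star Wars characters dataset with a Human vs. Not human target rather than the penguins):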
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
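# Load and preprocess the data
starwars = pd.read_csv('path_to_starwars.csv') # Load your data file
# Drop rows missing either predictor; LogisticRegression raises an error on NaN inputs
starwars = starwars.dropna(subset=['height', 'mass'])
starwars['species_human'] = starwars['species'].apply(lambda x: 'Human' if x == 'Human' else 'Not human')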
# Prepare the features and target
X = starwars[['height', 'mass']]
y = starwars['species_human']
# Specify model and fit
clf = LogisticRegression(max_iter=1000)  # higher iteration cap, since the features are unscaled
clf.fit(X, y)
# Evaluate on the training data (precision, recall, F1, accuracy); optimistic, since these rows were used for fitting
y_pred = clf.predict(X)
print(classification_report(y, y_pred))
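The "recipe, model, and workflow" terms come from R's tidymodels. In scikit-learn, a rough analogue is a `Pipeline`, which chains preprocessing steps (here median imputation and standardization, standing in for feature engineering) with the model so that the same steps run at both fit and predict time. Because evaluating on the training data tends to be optimistic, this sketch also holds out a test set. The data are simulated with NumPy purely so the example runs on its own: the column names mirror the Star Wars example, but the values and the labelling rule are invented, and the step names are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simulated stand-in data (NOT the Star Wars data): two numeric predictors
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "height": rng.normal(170, 30, n),
    "mass": rng.normal(75, 20, n),
})
X.loc[rng.choice(n, 10, replace=False), "mass"] = np.nan  # inject some missing values
# Invented labelling rule, only so there is a signal to learn
y = np.where(X["height"] + rng.normal(0, 10, n) > 170, "Human", "Not human")

# Hold out 25% of rows for a less optimistic performance estimate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# "Recipe" (preprocessing) plus "model", bundled into one "workflow"
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing mass values
    ("scale", StandardScaler()),                   # center and scale each predictor
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)

test_acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"Test accuracy: {test_acc:.2f}")
```

Because imputation and scaling are fitted inside the pipeline, their statistics (medians, means, standard deviations) come from the training rows only, which keeps the held-out evaluation honest.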