55. Sample Coding Midterm.#
This is an old Midterm from a few years ago. This year will be about 16.5% harder.
WARNING!!! If you see this icon on the top of your COLAB session, your work is not saved automatically.
Save your working file in Google Drive so that all changes will be saved as you work. MAKE SURE that your final version is saved to GitHub.
Before you turn this in, make sure everything runs as expected. First, restart the kernel (in the menu, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All). They should run completely without intervention, i.e., DO NOT manually upload any files. Use the wget command to retrieve files as necessary.
55.1. This is a 50 point assignment.#
You may find it useful to go through the notebooks from the course materials when doing these exercises.
If you receive assistance from anyone in the class, it will be considered an ethical violation and referred to the associate dean.
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/rpi-techfundamentals/website_fall_2021/master/site/public/midterm2.csv")
data.head()
| | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | group | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 453.521369 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 151.919191 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | -141.666991 |
| 3 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | -227.245474 |
| 4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -130.752790 |
55.2. (15 points) 1. Predict Group Using Different Sets of IVs#
Predict the group variable from v1-v5 and then from v6-v10 using k Nearest Neighbor and all of the data (i.e., don’t do a train test split) and the default hyperparameters. IGNORE THE target variable for now.
Store the two accuracies in accuracy_v1_5 and accuracy_v6_10.
accuracy_v1_5 =
accuracy_v6_10 =
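As a rough sketch of the workflow this question describes (not the answer for the midterm data), the snippet below fits a k Nearest Neighbor classifier with default hyperparameters on all rows and scores it on those same rows, once per feature set. The toy frame, its column names (a1, a2, b1, b2, label), and the helper knn_accuracy are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the midterm file (not the real values).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 2, size=(200, 4)),
                   columns=["a1", "a2", "b1", "b2"])
toy["label"] = toy["a1"]  # the label depends only on the "a" features


def knn_accuracy(df, features, target):
    """Fit KNN with default hyperparameters on all rows; score on the same rows."""
    model = KNeighborsClassifier()
    model.fit(df[features], df[target])
    return model.score(df[features], df[target])


acc_a = knn_accuracy(toy, ["a1", "a2"], "label")  # informative features
acc_b = knn_accuracy(toy, ["b1", "b2"], "label")  # unrelated features
print(acc_a, acc_b)
```

Because the model is scored on the rows it was trained on, these numbers describe fit to the sample, not performance on unseen data.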
55.3. (10 points) 2. Null model#
What would the accuracy of the null/naive model be? Set it to accuracy_null.
How would you interpret the models for accuracy_v1_5 and accuracy_v6_10 vs. the null model?
#Enter this to 1 decimal place. (i.e., not string)
accuracy_null= 1.1 #included as example.
accuracy_null
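For reference, one common reading of a null/naive classifier is a model that always predicts the most frequent class, so its accuracy is that class's share of the rows. A minimal sketch under that assumption, using hypothetical labels rather than the midterm data:

```python
import pandas as pd

# Hypothetical labels for illustration only (not the midterm data).
labels = pd.Series([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])

# The null model always predicts the most frequent class,
# so its accuracy equals that class's proportion of the rows.
null_acc = labels.value_counts(normalize=True).max()
print(round(null_acc, 1))  # prints 0.6
```

A fitted model is only adding information if its accuracy is clearly above this baseline.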
one_interpretation = """
Answer here.
"""
55.4. (15 points) 3. Perform linear regression using SciKit Learn.#
Perform two regression analyses.
For analysis1, select the independent variables v1-v10 (all v variables) and group. Calculate the r2 (r2_analysis1) for the linear regression with the target variable.
For analysis2, select the independent variables v1-v10 (all v variables) and filter the data to include only rows where group == 1. Calculate the r2 (r2_analysis2) for the linear regression with the target variable.
Print r2_analysis1 and r2_analysis2 to make sure they are set.
#Print r2_analysis1 and r2_analysis2 to make sure they are set.
print(r2_analysis1, r2_analysis2)
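The two-step pattern above (fit on all rows with a grouping column as a predictor, then fit on a filtered subset) can be sketched on synthetic data as follows. The frame, the column names (x1, x2, flag, target), and the helper fit_r2 are assumptions for illustration and do not reproduce the midterm values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not the midterm file): two predictors, a flag, a target.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(size=100),
    "flag": rng.integers(0, 2, size=100),
})
toy["target"] = (3 * toy["x1"] - 2 * toy["x2"] + 5 * toy["flag"]
                 + rng.normal(scale=0.1, size=100))


def fit_r2(df, features, target):
    """Fit ordinary least squares and return R^2 computed on the same rows."""
    model = LinearRegression().fit(df[features], df[target])
    return model.score(df[features], df[target])


# All rows, with the flag included as an extra predictor.
r2_all = fit_r2(toy, ["x1", "x2", "flag"], "target")

# Only rows where flag == 1, using the remaining predictors.
subset = toy[toy["flag"] == 1]
r2_subset = fit_r2(subset, ["x1", "x2"], "target")
print(r2_all, r2_subset)
```

LinearRegression.score returns the coefficient of determination, so no separate metric call is needed here.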
55.5. (10 points) Train Test Split#
Using random_state=99, do a 50/50 train test split of only variables v1-v10 and the target for y. Your split should create the following: train_X, test_X, train_y, test_y.
train_X
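As a hedged illustration of the call shape, the sketch below runs scikit-learn's train_test_split on a small made-up frame. The toy columns (x1, x2, target) are hypothetical; only the output names and random_state=99 come from the instructions above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame (not the midterm data).
toy = pd.DataFrame({"x1": range(10), "x2": range(10, 20), "target": range(20, 30)})

X = toy[["x1", "x2"]]
y = toy["target"]

# test_size=0.5 gives a 50/50 split; a fixed random_state makes it reproducible.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.5, random_state=99
)
print(train_X.shape, test_X.shape)
```

The function returns the pieces in the order train X, test X, train y, test y, so the unpacking order matters.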