Imputing Values

You now have some experience working with missing values and imputing them with common methods. Now it is your turn to put those skills to work so that you can predict for rows even when they contain NaN values.

First, let's import the necessary libraries and reproduce the results from the previous attempt.

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import r2_score, mean_squared_error

import ImputingValues as t

import seaborn as sns

%matplotlib inline



df = pd.read_csv('./survey_results_public.csv')

df.head()



#Only use quant variables and drop any rows with missing values

num_vars = df[['Salary', 'CareerSatisfaction', 'HoursPerWeek', 'JobSatisfaction', 'StackOverflowSatisfaction']]

df_dropna = num_vars.dropna(axis=0)



#Split into explanatory and response variables

X = df_dropna[['CareerSatisfaction', 'HoursPerWeek', 'JobSatisfaction', 'StackOverflowSatisfaction']]

y = df_dropna['Salary']



#Split into train and test

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .30, random_state=42)



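lm_model = LinearRegression(normalize=True) # Instantiate; the normalize argument was removed in scikit-learn 1.2, so use LinearRegression() on newer versions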

lm_model.fit(X_train, y_train) #Fit



#Predict and score the model

y_test_preds = lm_model.predict(X_test)

"The r-squared score for your model was {} on {} values.".format(r2_score(y_test, y_test_preds), len(y_test))

Question 1

1. As you may remember from an earlier analysis, there are many more salaries to predict than the rows used in the code above. One way to start making predictions for those rows is to impute values into the X matrix instead of dropping the rows.

Using the num_vars dataframe, drop the rows where the response (Salary) is missing and store the new dataframe in drop_sal_df. Then impute all the other missing values with the mean of their column and store the result in fill_df.
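As a minimal sketch of the two steps just described, the example below uses a small made-up dataframe rather than the survey data. The names toy_vars, toy_drop_sal and toy_fill and all the values are assumptions for illustration only; this is one possible way to do it with pandas, not necessarily the solution the checks expect.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for num_vars; the values are made up.
toy_vars = pd.DataFrame({
    'Salary': [50000.0, np.nan, 72000.0, 61000.0],
    'HoursPerWeek': [40.0, 45.0, np.nan, 38.0],
    'JobSatisfaction': [7.0, np.nan, 8.0, np.nan],
})

# Step 1: drop only the rows where the response (Salary) is missing.
toy_drop_sal = toy_vars.dropna(subset=['Salary'], axis=0)

# Step 2: fill each remaining NaN with the mean of its own column.
# toy_drop_sal.mean() is a Series indexed by column name, so fillna
# matches each mean to the column it was computed from.
toy_fill = toy_drop_sal.fillna(toy_drop_sal.mean())

print(toy_fill)
```

Note that the column means are computed after the rows with a missing Salary are dropped, so those rows do not influence the imputed values.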

drop_sal_df = #Drop the rows with missing salaries



# test look

drop_sal_df.head()

#Check that you dropped all the rows that have salary missing

t.check_sal_dropped(drop_sal_df)

fill_df = #Fill all missing values with the mean of the column.



# test look

fill_df.head()

#Check that your salary-dropped, mean-imputed dataframe matches the solution

t.check_fill_df(fill_df)

Question 2

2. Using fill_df, predict Salary based on all of the other quantitative variables in the dataset. You can use the template above to assist in fitting your model:

Split the data into explanatory and response variables
Split the data into train and test (using seed of 42 and test_size of .30 as above)
Instantiate your linear model using normalized data
Fit your model on the training data
Predict using the test data
Compute a score for your model fit on all the data, and show how many rows you predicted for

Use the tests to confirm you completed the steps correctly.
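The steps listed above can be sketched end to end on synthetic data. Everything named toy_* here, the column coefficients, and the noise level are assumptions invented for the illustration; in the exercise the input would be fill_df. The sketch also calls LinearRegression() without the normalize argument, since that argument is not available in scikit-learn 1.2 and later.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical synthetic data standing in for fill_df.
rng = np.random.default_rng(0)
n_rows = 200
toy_fill_df = pd.DataFrame({
    'HoursPerWeek': rng.normal(40, 5, n_rows),
    'JobSatisfaction': rng.uniform(0, 10, n_rows),
})
toy_fill_df['Salary'] = (
    1000 * toy_fill_df['HoursPerWeek']
    + 2000 * toy_fill_df['JobSatisfaction']
    + rng.normal(0, 5000, n_rows)
)

# Split into explanatory and response variables
X = toy_fill_df.drop(columns='Salary')
y = toy_fill_df['Salary']

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.30, random_state=42)

# Instantiate and fit on the training data
toy_model = LinearRegression()
toy_model.fit(X_train, y_train)

# Predict on the test data and score
y_test_preds = toy_model.predict(X_test)
rsquared_score = r2_score(y_test, y_test_preds)
length_y_test = len(y_test)

print("The r-squared score for your model was {} on {} values.".format(
    rsquared_score, length_y_test))
```

Because the rows with imputed predictors are kept, length_y_test on the real data should be larger than it was when every row with a missing value was dropped.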

#Split into explanatory and response variables



#Split into train and test



#Predict and score the model



#Rsquared and y_test

rsquared_score = #r2_score

length_y_test = #num in y_test



"The r-squared score for your model was {} on {} values.".format(rsquared_score, length_y_test)

# Pass your r2_score, length of y_test to the below to check against the solution

t.r2_y_test_check(rsquared_score, length_y_test)

This model still isn't great. Let's see if we can't improve it by using some of the other columns in the dataset.
     
 