A Look at the Data

In order to get a better understanding of the data we will be looking at throughout this lesson, let's take a look at some of the characteristics of the dataset.

First, let's read in the data and necessary libraries.

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import ALookAtTheData as t

from IPython import display

%matplotlib inline



df = pd.read_csv('./survey_results_public.csv')

df.head()

 | Respondent | Professional | ProgramHobby | Country | University | EmploymentStatus | FormalEducation | MajorUndergrad | HomeRemote | CompanySize | ... | StackOverflowMakeMoney | Gender | HighestEducationParents | Race | SurveyLong | QuestionsInteresting | QuestionsConfusing | InterestedAnswers | Salary | ExpectedSalary
0 | 1 | Student | Yes, both | United States | No | Not employed, and not looking for work | Secondary school | NaN | NaN | NaN | ... | Strongly disagree | Male | High school | White or of European descent | Strongly disagree | Strongly agree | Disagree | Strongly agree | NaN | NaN
1 | 2 | Student | Yes, both | United Kingdom | Yes, full-time | Employed part-time | Some college/university study without earning ... | Computer science or software engineering | More than half, but not all, the time | 20 to 99 employees | ... | Strongly disagree | Male | A master's degree | White or of European descent | Somewhat agree | Somewhat agree | Disagree | Strongly agree | NaN | 37500.0
2 | 3 | Professional developer | Yes, both | United Kingdom | No | Employed full-time | Bachelor's degree | Computer science or software engineering | Less than half the time, but at least one day ... | 10,000 or more employees | ... | Disagree | Male | A professional degree | White or of European descent | Somewhat agree | Agree | Disagree | Agree | 113750.0 | NaN
3 | 4 | Professional non-developer who sometimes write... | Yes, both | United States | No | Employed full-time | Doctoral degree | A non-computer-focused engineering discipline | Less than half the time, but at least one day ... | 10,000 or more employees | ... | Disagree | Male | A doctoral degree | White or of European descent | Agree | Agree | Somewhat agree | Strongly agree | NaN | NaN
4 | 5 | Professional developer | Yes, I program as a hobby | Switzerland | No | Employed full-time | Master's degree | Computer science or software engineering | Never | 10 to 19 employees | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN

5 rows × 154 columns

As you work through the notebook(s) in this and future parts of this program, you will see a consistent way to test your solutions to ensure they match ours. In every environment, there is a solution file and a test file. Checks for each solution are built into each notebook, but if you get stuck, you can also open the solution notebook to see how we arrived at any of the solutions. Let's take a look at an example.
Question 1

1. Provide the number of rows and columns in this dataset.

# We solved this one for you by providing the number of rows and columns:

# You can see how we are prompted that we solved for the number of rows and cols correctly!



num_rows = df.shape[0] #Provide the number of rows in the dataset

num_cols = df.shape[1] #Provide the number of columns in the dataset



t.check_rows_cols(num_rows, num_cols)

Nice job there are 19102 rows in the dataset!
Nice job there are 154 columns in the dataset!
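
As a small aside, the row and column counts can also be read in one step, because DataFrame.shape is a (rows, columns) tuple. The sketch below uses a tiny made-up DataFrame (the name toy and its values are assumptions for illustration, not the survey data), so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical toy data standing in for survey_results_public.csv
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "Country": ["United States", "United Kingdom", "Switzerland"],
})

# DataFrame.shape is a (rows, columns) tuple, so it can be unpacked directly
num_rows, num_cols = toy.shape

print("rows:", num_rows, "columns:", num_cols)
```

Unpacking in this order makes it harder to swap the two values by accident, which is the mistake the next cell demonstrates.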

# If we made a mistake - a different prompt will appear



flipped_num_rows = df.shape[1] #Provide the number of rows in the dataset

flipped_num_cols = df.shape[0] #Provide the number of columns in the dataset



t.check_rows_cols(flipped_num_rows, flipped_num_cols)

t.rows(df,rows)

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-3bd0d2c0c02b> in <module>()
----> 1 t.rows(df,rows)

NameError: name 't' is not defined


# Columns with more than 75% of their values missing

df.columns[df.isnull().mean() > .75]

#rows = df.shape[0]

#rows





Index(['YearsCodedJobPast', 'WebDeveloperType', 'MobileDeveloperType',
'NonDeveloperType', 'ExCoderReturn', 'ExCoderNotForMe',
'ExCoderBalance', 'ExCoder10Years', 'ExCoderBelonged', 'ExCoderSkills',
'ExCoderWillNotCode', 'ExCoderActive', 'TimeAfterBootcamp',
'ExpectedSalary'],
dtype='object')


# If you want to know more about what the test function is expecting,

# you can read the documentation the same way as any other function



t.check_rows_cols?
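
The trailing ? is IPython syntax, so it only works inside a notebook or IPython shell. As a rough sketch of the plain-Python equivalent, the docstring can be read with the standard library. The function check_shape below is a hypothetical stand-in, not part of the ALookAtTheData test module.

```python
import inspect


def check_shape(num_rows, num_cols):
    """Hypothetical checker: compare a (rows, cols) pair against expected values."""
    return (num_rows, num_cols) == (3, 2)


# inspect.getdoc returns the cleaned-up docstring as a string;
# help(check_shape) would print it together with the signature.
doc = inspect.getdoc(check_shape)
print(doc)
```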

Now that you are familiar with how to test your code - let's have you answer your first question:
Question 2

2. Which columns had no missing values? Provide a set of column names that have no missing values.

no_nulls = #Provide a set of columns with 0 missing values.



display.HTML(t.no_null_cols(no_nulls))

File "<ipython-input-5-f8e7a5bdec79>", line 1
no_nulls = #Provide a set of columns with 0 missing values.
^
SyntaxError: invalid syntax
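
The SyntaxError above appears because the placeholder has nothing on the right-hand side of the assignment yet. One possible way to fill it in is sketched below on a tiny made-up DataFrame (toy and its values are assumptions for illustration). This is only one approach and is not necessarily how the solution notebook does it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Salary has missing values, the other columns do not
toy = pd.DataFrame({
    "Respondent": [1, 2, 3, 4],
    "Country": ["United States", "United Kingdom", "United Kingdom", "Switzerland"],
    "Salary": [np.nan, np.nan, 113750.0, np.nan],
})

# Fraction of missing values per column (the mean of a boolean mask)
missing_frac = toy.isnull().mean()

# Keep the names of columns whose missing fraction is exactly zero
no_nulls = set(missing_frac[missing_frac == 0].index)

print(no_nulls)
```

With the real data, the same two lines would be applied to df instead of toy.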


Question 3

3. Which columns have the most missing values? Provide a set of column names that have more than 75% of their values missing.

most_missing_cols = #Provide a set of columns with more than 75% of the values missing



t.most_missing_cols(most_missing_cols)
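
A minimal sketch of the threshold test, again on made-up data (toy and its values are assumptions, not the survey). It also shows the boundary case: a column with exactly 75% missing does not count as "more than 75%".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with 4 rows
toy = pd.DataFrame({
    "Respondent": [1, 2, 3, 4],                      # 0% missing
    "Salary": [np.nan, np.nan, 113750.0, np.nan],    # exactly 75% missing
    "ExCoderReturn": [np.nan] * 4,                   # 100% missing
})

# Fraction of missing values per column
missing_frac = toy.isnull().mean()

# Strictly greater than 0.75, so Salary (exactly 0.75) is excluded
most_missing_cols = set(missing_frac[missing_frac > 0.75].index)

print(most_missing_cols)
```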

Question 4

4. Provide a pandas series of the different Professional status values in the dataset along with the count of the number of individuals with each status. Store this pandas series in status_vals. If you are correct, you should see a bar chart of the proportion of individuals in each status.

status_vals = #Provide a pandas series of the counts for each Professional status



# The below should be a bar chart of the proportion of individuals in each professional category if your status_vals

# is set up correctly.



(status_vals/df.shape[0]).plot(kind="bar");

plt.title("What kind of developer are you?");

File "<ipython-input-7-5208cc44b1b6>", line 1
status_vals = #Provide a pandas series of the counts for each Professional status
^
SyntaxError: invalid syntax
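
Counting how often each value occurs in a column is what Series.value_counts does. The sketch below uses a small made-up Professional column (an assumption for illustration) and leaves out the plot so that it runs anywhere.

```python
import pandas as pd

# Hypothetical toy data with two professional statuses
toy = pd.DataFrame({
    "Professional": [
        "Student",
        "Professional developer",
        "Professional developer",
        "Student",
        "Professional developer",
    ]
})

# Counts per status, sorted from most to least frequent
status_vals = toy["Professional"].value_counts()

# Dividing by the number of rows turns counts into proportions,
# which is what the bar chart cell plots
status_props = status_vals / toy.shape[0]

print(status_props)
```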


Question 5

5. Provide a pandas series of the different FormalEducation status values in the dataset along with the count of how many individuals received that formal education. Store this pandas series in ed_vals. If you are correct, you should see a bar chart of the proportion of individuals in each status.

ed_vals = #Provide a pandas series of the counts for each FormalEducation status



# The below should be a bar chart of the proportion of individuals in your ed_vals

# if it is set up correctly.



(ed_vals/df.shape[0]).plot(kind="bar");

plt.title("Formal Education");

File "<ipython-input-64-1d5cd852c4ac>", line 1
ed_vals = #Provide a pandas series of the counts for each FormalEducation status
^
SyntaxError: invalid syntax
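
One detail worth knowing when dividing counts by df.shape[0]: value_counts skips missing values by default, so the plotted proportions add up to less than 1 whenever the column has gaps. The sketch below illustrates this on made-up data (toy and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data: one of four respondents gave no answer
toy = pd.DataFrame({
    "FormalEducation": [
        "Bachelor's degree",
        "Master's degree",
        "Bachelor's degree",
        None,
    ]
})

# Default behaviour: the missing answer is not counted
ed_vals = toy["FormalEducation"].value_counts()
ed_props = ed_vals / toy.shape[0]

# dropna=False keeps the missing answers as their own category
ed_vals_all = toy["FormalEducation"].value_counts(dropna=False)

print(ed_props.sum())
print(ed_vals_all.sum())
```

Whether the missing category belongs in the chart is a judgment call; the exercise as written uses the default counts.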


Question 6

6. Provide a pandas series of the different Country values in the dataset along with the count of how many individuals are from each country. Store this pandas series in count_vals. If you are correct, you should see a bar chart of the proportion of individuals in each country.

count_vals = #Provide a pandas series of the counts for each Country



# The below should be a bar chart of the proportion of the top 10 countries for the

# individuals in your count_vals if it is set up correctly.



(count_vals[:10]/df.shape[0]).plot(kind="bar");

plt.title("Country");

File "<ipython-input-40-50467cb692bf>", line 1
count_vals = #Provide a pandas series of the counts for each Country
^
SyntaxError: invalid syntax
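
The slice count_vals[:10] in the plotting cell picks the ten most common countries only because value_counts returns its result sorted in descending order. A minimal sketch on made-up data (toy and its values are assumptions), using head to make the "top N" intent explicit:

```python
import pandas as pd

# Hypothetical toy data with three countries
toy = pd.DataFrame({
    "Country": [
        "United States",
        "United Kingdom",
        "United States",
        "Switzerland",
        "United Kingdom",
        "United States",
    ]
})

# Counts per country, most frequent first
count_vals = toy["Country"].value_counts()

# Top 2 here; the notebook uses the top 10 on the full dataset
top_countries = count_vals.head(2)

print(top_countries / toy.shape[0])
```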


Feel free to explore the dataset further to gain additional familiarity with the columns and rows in the dataset. You will be working pretty closely with this dataset throughout this lesson.
     
 