.LOG

________________________
Machine learning intro lecture
2:15 pm 26/10/2021


Types of supervised learning models:

1) Numeric:
- The thing that you're trying to predict is a number. The model is a "regressor".

2) Categorical:
- The thing you're trying to predict is not numerical (it is a category, or "class"). The model is a "classifier".
- Sometimes, the thing you have to predict has more than 2 classes. This is called a multi-class classification problem.

--------------
Steps to make a machine learning prediction:
1) Load the dataset
2) Summarize the dataset
3) Visualize the dataset
4) Evaluate some algorithms
5) Make some predictions
( There are actually a lot more steps that are noted in future lectures )

After we load the dataset, we DO NOT immediately jump to model building because
a) We don't know what kind of data we're working with
b) More importantly, we NEED to clean the dataset. Like the teacher said, "garbage in, garbage out": if the dataset is not clean, the model will produce inaccurate predictions.

This is why we need to explore and preprocess the dataset first.

--------------
Summarizing and exploring the dataset:

To explore the data, we can use numbers or graphs.
Numerically:
- Using statistics to describe the dataset (mean, min, max, std)
Graphically:
- Using graphs to visualise the data and make it easier to look at.
(histogram, box plot)
- In graphs, there are univariate plots and multivariate plots. Univariate plots show just 1 variable, whereas multivariate plots show more than 1, like a scatter plot. A scatter plot of two variables can be summarized by the correlation coefficient r, which ranges from -1 to 1.
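
To make the correlation value r concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name pearson_r and the sample numbers are made up for illustration and are not from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Made-up values: y rises with x in the first case and falls in the second.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(rising, falling)
```

A value of r close to 1 means the points in the scatter plot rise together along a line, close to -1 means one falls as the other rises, and close to 0 means no linear relationship.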


Code for numeric exploration (dataset is a pandas DataFrame):
- dataset.shape
- This returns a tuple: (number of rows, number of columns)
- dataset.head(x)
- This returns the first x rows of data. Used to take a peek at the data. The method ".tail(x)" returns the last x rows of data
- dataset.describe()
- This gives us a statistical summary (count, mean, std, min, quartiles, max) for the numeric fields only
- dataset.groupby('x')
- This groups the rows by the values in column 'x'. It is normally followed by an aggregation, e.g. dataset.groupby('x').size() counts the rows in each group
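
The calls above can be tried on a tiny table. This is only a sketch: the DataFrame below is hypothetical toy data (made-up column names and values), not the dataset from the lecture notebooks.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
dataset = pd.DataFrame({
    "petal_length": [1.4, 1.5, 1.3, 4.5, 4.7, 4.4],
    "petal_width": [0.2, 0.2, 0.3, 1.5, 1.4, 1.3],
    "species": ["setosa", "setosa", "setosa",
                "versicolor", "versicolor", "versicolor"],
})

n_rows, n_cols = dataset.shape      # (number of rows, number of columns)
first_rows = dataset.head(2)        # peek at the first 2 rows
last_rows = dataset.tail(2)         # peek at the last 2 rows
summary = dataset.describe()        # statistics for the numeric columns only
rows_per_species = dataset.groupby("species").size()  # row count per group

print(n_rows, n_cols)
print(first_rows)
print(summary)
print(rows_per_species)
```

Note that the text column "species" does not appear in the describe() output, and that groupby() on its own only defines the groups; the .size() call is what produces a result.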


Code for graphical exploration:
(needs "import matplotlib.pyplot as plt" and "from pandas.plotting import scatter_matrix")

UNIVARIATE
- dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
- This plots one box plot per numerical field in a 2x2 grid (so layout=(2,2) fits up to 4 numeric fields)
- dataset.hist()
plt.show()
- This plots a histogram for each numerical field

MULTIVARIATE
- scatter_matrix(dataset)
plt.show()
- This plots a scatter plot for each pair of numeric variables. On the diagonal, where a variable would be paired with itself, it draws a histogram of that variable instead.

--------------
Evaluate Algorithms

Steps:
1) Separate out a validation dataset
2) Set up the test harness to use 10-fold cross validation
3) Build 5 different models to predict species from flower measurements
4) Select the best model.
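
Steps 1 and 2 can be sketched in plain Python by working with row indices only. This is an illustration, not the code from the notebooks: the function names, the 150-row size, the 20% validation share and the seed are all assumptions.

```python
import random


def train_validation_split(n_rows, validation_fraction=0.2, seed=7):
    """Shuffle the row indices and hold out a fraction for validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_validation = round(n_rows * validation_fraction)
    return indices[n_validation:], indices[:n_validation]


def k_fold_indices(indices, k=10):
    """Yield (train, test) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed dataset size of 150 rows, for illustration only.
train_idx, val_idx = train_validation_split(150)
splits = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(val_idx), len(splits))
```

Each model would be trained on the "train" part and scored on the "test" part of every fold, and the 10 scores averaged. The held-out validation rows are only used at the end, to check the model that was selected.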

At this point, I'm too lazy to take notes. Just look at the Python notebooks.

________________________
More into Supervised Learning Lecture
2:38 pm 2/11/2021

What is supervised learning?
Supervised Learning is a method of machine learning where the computer learns how to perform a function by looking at labeled training data.

For example, we could determine the value of a house based on the property's location and characteristics.


How does it work?
We train the supervised learning model by showing it data and telling it what the correct output value should be for that data. The machine learning algorithm then uses that data to work out rules that reproduce those results.

For example, we show the model the numbers 2 and 2 and tell it the answer is 4. Then we show it the numbers 3 and 5 and tell it the answer is 8, and so on. Given a series of number pairs and their results, it will start to figure out how to do addition: just by looking at enough examples, it learns that whenever it sees two numbers, the output is their sum. This is achieved through LINEAR REGRESSION. Once the model is trained, we can give it new data and it will give us an estimate of the output value for the new input data.

Linear regression is used for numeric fields only.
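
The addition example can be sketched in plain Python. This assumes a model of the form output = w1*a + w2*b with no intercept and fits the two weights by ordinary least squares, solving the 2x2 normal equations directly. The function names and the extra training pairs beyond (2, 2) and (3, 5) are made up for illustration.

```python
def fit_two_weights(pairs, targets):
    """Least-squares fit of targets ~ w1*a + w2*b (no intercept)."""
    s_aa = sum(a * a for a, b in pairs)
    s_bb = sum(b * b for a, b in pairs)
    s_ab = sum(a * b for a, b in pairs)
    s_ay = sum(a * y for (a, b), y in zip(pairs, targets))
    s_by = sum(b * y for (a, b), y in zip(pairs, targets))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_bb - s_ab * s_ab
    w1 = (s_ay * s_bb - s_ab * s_by) / det
    w2 = (s_aa * s_by - s_ab * s_ay) / det
    return w1, w2


def predict(weights, a, b):
    w1, w2 = weights
    return w1 * a + w2 * b


# Labeled training data: pairs of numbers and the "correct answer" for each.
pairs = [(2, 2), (3, 5), (1, 4), (6, 1)]
targets = [4, 8, 5, 7]

weights = fit_two_weights(pairs, targets)
estimate = predict(weights, 10, 7)  # new input the model has never seen
print(weights, estimate)
```

Both weights come out as 1, which is exactly the rule "add the two numbers", so the estimate for the unseen pair (10, 7) is 17.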

--------------
We also need to measure how accurate our model is. We can do this by quantifying the error with a cost function. A common one is the mean squared error (MSE), which in its standard form is:

MSE = (1/n) * sum over i = 1..n of (y_i - yhat_i)^2

where n is the number of examples, y_i is the correct value for example i, and yhat_i is the model's prediction for it.

The lower the MSE, the closer our predictions are to the correct values.
The higher the MSE, the more wrong our predictions are.
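
As a small sketch, the mean squared error can be computed like this. The function name and the sample numbers are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


actual = [4, 8, 5]
perfect = mean_squared_error(actual, [4, 8, 5])  # every prediction is right
worse = mean_squared_error(actual, [5, 8, 3])    # off by 1, 0 and 2
print(perfect, worse)
```

Perfect predictions give an error of 0. Squaring means large mistakes are punished much more than small ones.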


Another common algorithm, which we can use to find the weights (the numbers the model multiplies each input by), is GRADIENT DESCENT. It is an iterative algorithm that minimizes the cost function to find the best weights. It works by tweaking each of the weights a tiny bit at a time in the direction that reduces the cost.
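
Here is a minimal sketch of gradient descent in plain Python, reusing the "learn addition" idea. It assumes a model output = w1*a + w2*b and the mean squared error as the cost. The learning rate, step count and training pairs are arbitrary choices for illustration.

```python
def gradient_descent(pairs, targets, learning_rate=0.01, steps=1000):
    """Fit targets ~ w1*a + w2*b by repeatedly stepping against the gradient."""
    w1, w2 = 0.0, 0.0  # start with weights that know nothing
    n = len(pairs)
    for _ in range(steps):
        grad_w1 = 0.0
        grad_w2 = 0.0
        for (a, b), y in zip(pairs, targets):
            error = (w1 * a + w2 * b) - y
            # Partial derivatives of the mean squared error w.r.t. each weight.
            grad_w1 += 2 * error * a / n
            grad_w2 += 2 * error * b / n
        # Tweak each weight a tiny bit in the direction that reduces the cost.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2


pairs = [(2, 2), (3, 5), (1, 4), (6, 1), (0, 3)]
targets = [a + b for a, b in pairs]  # the correct answers are the sums

w1, w2 = gradient_descent(pairs, targets)
print(w1, w2)
```

Both weights approach 1, so the model ends up adding its two inputs. If the learning rate is too large the updates overshoot and the cost grows instead of shrinking, which is why the steps are kept small.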

--------------
Linear regression can be used to predict values, percentages, positive/negative, angles, and many more as long as they are numeric in nature.

________________________
Supervised Learning 2
1:56 pm 16/11/2021

Common Python libraries

Python has some rich libraries, like
- NumPy
- An efficient library for arrays and linear algebra functions
- scikit-learn
- A machine learning library that is easy to use and implements many popular machine learning algorithms.
- pandas
- The name comes from "panel data". It loads data sets into a spreadsheet-like table, which makes it easy to work with large data sets exported as CSV files.

In machine learning, we often work with large arrays of data. Each column of data is also known as a vector, a term that comes from the linear algebra roots of machine learning.

--------------
Decision trees

Decision trees are a powerful class of models that can achieve high accuracy in prediction. A decision tree builds a model by partitioning the dataset based on certain rules. It looks a lot like the quick sort function from DSAG: it splits the data into 2 parts and keeps splitting each part until it's done.
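
The "split into 2 parts and keep going" idea can be sketched as a tiny recursive tree builder in plain Python. The notes don't say which rule is used to pick a split, so this sketch assumes Gini impurity, a common choice. The flower measurements are made-up toy values and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels are the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) rule with the lowest weighted impurity."""
    best, best_score = None, gini(labels)  # a split must beat "no split"
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split the data in 2 and recurse on each part, like quick sort."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the rules down the tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


# Made-up (petal length, petal width) measurements for three species.
rows = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
        (4.5, 1.5), (4.7, 1.4), (4.4, 1.3),
        (5.8, 2.2), (6.0, 2.5), (5.6, 2.1)]
labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (4.6, 1.4)))
```

Each internal node stores one rule (a feature and a threshold), and the recursion stops when a part contains only one class or the depth limit is reached.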



--------------
Gradient Boosting
     
 