Notes

Notes - notes.io

Exp 1: Applying Python Data Analytics Libraries for Data Exploration
NumPy
import numpy as np
data = np.array([10, 20, 30, 40, 50])
print("Mean:", np.mean(data))
print("Sum:", np.sum(data))
print("Max:", np.max(data))
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
print(a * 2)
Pandas
import pandas as pd
data = {
'Name' : ['Alice', 'Bob', 'Charlie'],
'Marks' : [25, 30, 35],
}
df = pd.DataFrame(data)
print(df)
print("Average Marks:", df['Marks'].mean())
import pandas as pd
df = pd.read_csv('StudentsPerformance.csv')
print(df.head(2))
print(df.describe())
print(df.tail())
import pandas as pd
df = pd.read_csv('Iris (1).csv')
display(df)
print(df.head())
print(df.describe())
print(df.tail())
1. Create a dataset of students with marks in 3 subjects
2. Use NumPy to calculate total & average
3. Use Pandas to store data and find the top student
4. Use Matplotlib to plot average marks per student
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 1. Read CSV file
df = pd.read_csv('Marks.csv')
print("Original Data:")
print(df)
# 2. NumPy calculations
marks = df[['DVA', 'MLBI', 'NLP']].to_numpy()
df['Total'] = np.sum(marks, axis=1)
df['Average'] = np.mean(marks, axis=1)
print(df[['Student', 'Average']])
# 3. Find top-performing student
top_student = df.loc[df['Total'].idxmax()]
print("\nTop Performing Student:", top_student['Student'])
# 4. Matplotlib Bar Chart
plt.bar(df['Student'], df['Average'])
plt.xlabel('Student')
plt.ylabel('Average Marks')
plt.title('Average Marks per Student')
plt.show()
Sales
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 1. Read CSV file
df = pd.read_csv('sales_data_sample.csv', encoding='latin-1')

print("Original Data:")
print(df.head())

# 2. Calculate Total Sales per Product
product_sales = df.groupby('PRODUCTLINE')['SALES'].sum()

print("\nTotal Sales per Product:")
print(product_sales)

# 3. Find Top Performing Product
top_product = product_sales.idxmax()
top_sales = product_sales.max()

print("\nTop Performing Product:")
print(f"Product: {top_product}")
print(f"Total Sales: {top_sales}")

# 4. Bar Chart of Product Sales
plt.bar(product_sales.index, product_sales.values)
plt.xlabel('Product')
plt.ylabel('Total Sales')
plt.title('Total Sales per Product')
plt.xticks(rotation=45)
plt.show()
Marks
import pandas as pd
df = pd.read_csv('Marks.csv')
print(df)
df.columns = df.columns.str.strip()
df['Total_Marks'] = df['DVA'] + df['MLBI'] + df['NLP']
display(df)
topper_student = df.loc[df['Total_Marks'].idxmax()]
print(f"Topper Student:\n{topper_student}")
# Assuming maximum marks for each subject is 100
max_possible_marks = 3 * 100

topper_percentage = (topper_student['Total_Marks'] / max_possible_marks) * 100
print(f"\nTotal marks percentage for the topper: {topper_percentage:.2f}%")
import numpy as np

# Zeros array
zeros_arr = np.zeros((2, 3))
print("Zeros array:")
print(zeros_arr)

# Ones array
ones_arr = np.ones((2, 3))
print("\nOnes array:")
print(ones_arr)

# Range (arange) array
range_arr = np.arange(1, 10)
print("\nArange array:")
print(range_arr)
Question: Section A: NumPy Assignments
1. Array Creation & Basics
   - Create a NumPy array of integers from 10 to 50.
   - Reshape it into a 5×9 matrix.
   - Find the minimum, maximum, mean, and standard deviation of the array.
2. Indexing & Slicing
   - From a 2D NumPy array, extract:
     - the first row
     - the last column
     - a 3×3 sub-matrix from the center
   - Explain the difference between slicing and fancy indexing.
3. Conditional Operations
   - Generate an array of 100 random integers between 1 and 100.
   - Replace all values greater than 80 with -1.
   - Count how many values were replaced.
4. DataFrame Creation
   - Create a DataFrame with columns: Student_ID, Name, AIDS, DVA, CC.
   - Add a new column Average that stores the average marks.
5. Data Selection & Filtering
   - From the DataFrame:
     - display students who scored more than 80 in Math
     - display students who failed (score < 40) in any subject
# ============================================
# Section A: NumPy & Pandas Assignments
# Google Colab Ready Code
# ============================================

# Import Libraries
import numpy as np
import pandas as pd

# =====================================================
# 1. Array Creation & Basics
# =====================================================

print("===== Array Creation & Basics =====")

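# Create a NumPy array of integers starting at 10.
# Note: 10 to 50 inclusive is only 41 values, which cannot be reshaped to 5x9,
# so the range is extended to 54 to give the 45 elements the reshape needs.
arr = np.arange(10, 55)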

print("\nOriginal Array:")
print(arr)

# Reshape into 5x9 matrix
matrix = arr.reshape(5, 9)

print("\n5x9 Matrix:")
print(matrix)

# Find minimum, maximum, mean, and standard deviation
print("\nMinimum Value:", np.min(matrix))
print("Maximum Value:", np.max(matrix))
print("Mean:", np.mean(matrix))
print("Standard Deviation:", np.std(matrix))

# =====================================================
# 2. Indexing & Slicing
# =====================================================

print("\n\n===== Indexing & Slicing =====")

# Create a sample 2D array
array_2d = np.arange(1, 26).reshape(5, 5)

print("\n2D Array:")
print(array_2d)

# Extract first row
print("\nFirst Row:")
print(array_2d[0])

# Extract last column
print("\nLast Column:")
print(array_2d[:, -1])

# Extract 3x3 center sub-matrix
print("\n3x3 Center Sub-Matrix:")
print(array_2d[1:4, 1:4])

# Difference between slicing and fancy indexing
print("\nDifference Between Slicing and Fancy Indexing:")
print("""
1. Slicing:
- Uses ranges (:)
- Returns a view of the original array
- Example: arr[1:5]

2. Fancy Indexing:
- Uses lists or arrays of indices
- Returns a copy of the data
- Example: arr[[1, 3, 5]]
""")

# =====================================================
# 3. Conditional Operations
# =====================================================

print("\n\n===== Conditional Operations =====")

# Generate 100 random integers between 1 and 100
random_arr = np.random.randint(1, 101, 100)

print("\nOriginal Random Array:")
print(random_arr)

# Count values greater than 80
count_replaced = np.sum(random_arr > 80)

# Replace values greater than 80 with -1
random_arr[random_arr > 80] = -1

print("\nModified Array:")
print(random_arr)

print("\nNumber of Values Replaced:", count_replaced)

# =====================================================
# 4. DataFrame Creation
# =====================================================

print("\n\n===== DataFrame Creation =====")

# Create DataFrame
data = {
"Student_ID": [101, 102, 103, 104, 105],
"Name": ["Amit", "Priya", "Rahul", "Sneha", "Karan"],
"AIDS": [85, 78, 92, 35, 67],
"DVA": [88, 90, 76, 45, 72],
"CC": [91, 84, 89, 30, 70]
}

df = pd.DataFrame(data)

# Add Average column
df["Average"] = df[["AIDS", "DVA", "CC"]].mean(axis=1)

print("\nDataFrame:")
print(df)

# =====================================================
# 5. Data Selection & Filtering
# =====================================================

print("\n\n===== Data Selection & Filtering =====")

# Display students who scored more than 80 in AIDS
print("\nStudents Scoring More Than 80 in AIDS:")
print(df[df["AIDS"] > 80])

# Display students who failed in any subject (score < 40)
print("\nStudents Who Failed in Any Subject:")
failed_students = df[
(df["AIDS"] < 40) |
(df["DVA"] < 40) |
(df["CC"] < 40)
]

print(failed_students)
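
The view-versus-copy distinction in the slicing and fancy indexing explanation of Section A can be checked directly. The following is a minimal sketch; the array and the variable names (base, sliced, fancy) are illustrative and not part of the assignment code.

```python
import numpy as np

base = np.arange(10)            # values 0..9

sliced = base[1:5]              # basic slice: a view onto base's memory
fancy = base[[1, 3, 5]]         # fancy indexing: an independent copy

sliced[0] = 100                 # writes through to base[1]
fancy[0] = -7                   # changes only the copy

print(base[1])                          # 100
print(np.shares_memory(base, sliced))   # True
print(np.shares_memory(base, fancy))    # False
```

Because a slice shares memory with its source, call .copy() on it when the original array must stay unchanged.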

Exp 2: Applying data visualization techniques using Matplotlib and Seaborn in Python.
1. Line Chart
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [50, 30, 45, 25, 40]

plt.plot(x, y)
plt.title("Line Chart Example")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()

2. Bar Chart
import matplotlib.pyplot as plt

names = ["KINJAL", "NAVYA", "SHRIYA", "NIKHIL"]
marks = [65, 88, 76, 90]

plt.bar(names, marks, color='pink', alpha=1.0)
plt.title("Bar Chart Example")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()

3. Histogram
import matplotlib.pyplot as plt
import numpy as np

data = [5, 8, 10, 12, 15, 18, 20, 23, 25, 25, 27, 30, 32, 33, 35, 40, 40, 40, 40, 43, 45, 50, 50, 50]

plt.hist(data, bins=10, edgecolor='black', alpha=0.7)
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
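
To see the numbers behind the bars, the same binning can be computed without drawing anything. This is a small sketch that assumes the histogram uses 10 equal-width bins between the minimum and maximum of the data, which is how np.histogram behaves with bins=10; the loop variable names are illustrative.

```python
import numpy as np

data = [5, 8, 10, 12, 15, 18, 20, 23, 25, 25, 27, 30, 32, 33, 35,
        40, 40, 40, 40, 43, 45, 50, 50, 50]

# counts[i] is the number of values falling in [edges[i], edges[i + 1]);
# the last bin also includes its right edge.
counts, edges = np.histogram(data, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```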

4. Pie Chart
import matplotlib.pyplot as plt

sizes = [40, 30, 20, 10]
labels = ["React js", "Node js", "Express js", "MongoDB"]

plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title("Pie Chart Example")
plt.show()

5. Box Plot
import matplotlib.pyplot as plt

data = [-17, 10, 12, 13, 14, 15, 18, 20, 22, 25, 27, 30, 32, 35, 40, 80]

plt.boxplot(data)
plt.title("Box Plot Example")
plt.show()
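
The two isolated points in this box plot follow from the usual 1.5 × IQR whisker rule, which is Matplotlib's default (whis=1.5). The sketch below recomputes the quartiles and fences with NumPy for the same data; the variable names are illustrative.

```python
import numpy as np

data = np.array([-17, 10, 12, 13, 14, 15, 18, 20, 22, 25, 27, 30, 32, 35, 40, 80])

# Quartiles with linear interpolation (NumPy's default)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)   # [-17  80]
```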

6. Violin Plot
import matplotlib.pyplot as plt

data = [10, 11, 12, 13, 14, 15, 18, 19, 20, 25, 27, 30, 31, 32]

plt.violinplot(data)
plt.title("Violin Plot Example")
plt.show()

7. Scatter Plot
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

iris = pd.read_csv("Iris.csv")

sns.scatterplot(
x="SepalLengthCm",
y="SepalWidthCm",
hue="Species",
data=iris,
palette="RdPu"
)

plt.title("Iris Dataset - Scatter Plot")
plt.show()

8. FacetGrid Histogram Plot
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

g = sns.FacetGrid(iris, col="species", hue="species")
g.map(sns.histplot, "sepal_width")

plt.show()

9. Regression Plot (Sepal Length vs Sepal Width)
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.regplot(x="sepal_length", y="sepal_width", data=iris)

plt.title("Regression Plot: Sepal Length vs. Sepal Width")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Sepal Width (cm)")
plt.show()

10. Regression Plot (Petal Length vs Petal Width)
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.regplot(x="petal_length", y="petal_width", data=iris)

plt.title("Regression Plot: Petal Length vs. Petal Width")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Petal Width (cm)")
plt.show()

11. LM Plot (Petal Length vs Petal Width)
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.lmplot(
x="petal_length",
y="petal_width",
hue="species",
data=iris,
palette="magma"
)

plt.title("LM Plot: Petal Length vs. Petal Width by Species")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Petal Width (cm)")
plt.show()

12. LM Plot (Sepal Length vs Sepal Width)
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.lmplot(
x="sepal_length",
y="sepal_width",
hue="species",
data=iris,
palette="magma"
)

plt.title("LM Plot: Sepal Length vs. Sepal Width by Species")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Sepal Width (cm)")
plt.show()

13. Seaborn Box Plot
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.boxplot(
x="species",
y="sepal_length",
data=iris,
hue="species",
palette="viridis"
)

plt.title("Box Plot of Sepal Length by Species")
plt.xlabel("Species")
plt.ylabel("Sepal Length (cm)")
plt.show()

14. Line Plot
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.lineplot(
x="species",
y="sepal_length",
hue="species",
data=iris,
palette="viridis",
marker='o'
)

plt.title("Line Plot of Mean Sepal Length by Species")
plt.xlabel("Species")
plt.ylabel("Mean Sepal Length (cm)")
plt.show()

15. Seaborn Bar Plot
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

sns.barplot(
x="species",
y="sepal_length",
data=iris,
hue="species",
palette="viridis",
legend=False
)

plt.title("Bar Plot of Mean Sepal Length by Species")
plt.xlabel("Species")
plt.ylabel("Mean Sepal Length (cm)")
plt.show()

Exp 3: Applying data visualization techniques using R Programming
1. Line Chart
i <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

plot(i$SepalLengthCm,
type = "l",
main = "Line Plot of Sepal Length",
xlab = "Observation Number",
ylab = "Sepal Length",
col = "red",
lwd = 2)

2. Scatter Plot
i <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

plot(i$SepalLengthCm, i$SepalWidthCm,
main = "Scatter Plot of Sepal Length vs Sepal Width",
xlab = "Sepal Length",
ylab = "Sepal Width",
col = "blue",
pch = 19)

3. Bar Plot
iris_data <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

print(iris_data)

barplot(table(iris_data$Species),
main = "Count of Iris Species",
col = c("pink", "lightgreen", "lightblue"))

4. Histogram
iris_data <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

names(iris_data)

hist(iris_data$SepalLengthCm,
main = "Histogram of Sepal Length",
xlab = "Sepal Length",
col = "lightblue",
border = "black")

5. Box Plot (Iris Measurements)
iris_data <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

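# Select the four measurement columns by name (assumes this CSV also has a PetalWidthCm column);
# positions 1:4 would pick up any leading Id column instead.
boxplot(iris_data[, c("SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm")],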
main = "Boxplots of Iris Measurements",
col = "lightgreen")

6. Box Plot (Sepal Length by Species)
summary(iris)

boxplot(Sepal.Length ~ Species,
data = iris,
main = "Boxplot of Sepal Length by Species",
xlab = "Species",
ylab = "Sepal Length",
col = c("lightpink", "lightgreen", "lightblue"))

7. Box Plot (Petal Length by Species)
i <- read.csv("C:/Users/kinja/Downloads/Iris.csv")

boxplot(PetalLengthCm ~ Species,
data = i,
main = "Boxplot of Petal Length by Species",
xlab = "Species",
ylab = "Petal Length",
col = c("lightpink", "lightgreen", "lightblue"))

8. Pie Chart
species_count <- table(iris$Species)

pie(species_count,
col = c("pink", "lightgreen", "lightblue"),
main = "Proportion of Iris Species")

9. GG Plot (Violin Plot)
install.packages("ggplot2")
library(ggplot2)

ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin() +
labs(
title = "Violin Plot of Sepal Length by Species",
x = "Species",
y = "Sepal Length"
)

Exp 4: Implement ARIMA model in Python / R
Code1: R Program
install.packages("forecast")
library(forecast)
# Sample time series data
data <- c(200,220,250,270,300,320,350,370,400,420,450)
# Convert to time series object
ts_data <- ts(data)
# Build ARIMA model
model <- arima(ts_data, order = c(1,1,1))
# Forecast next 3 values (the forecast package is already loaded above)
forecast_values <- forecast(model, h = 3)
# Print output
print("Predicted values:")
print(forecast_values)
Code1: Python
# Import libraries
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Sample time series data
data = [200, 220, 250, 270, 300, 320, 350, 370, 400, 420, 450]

# Convert to pandas Series
ts_data = pd.Series(data)

# Build ARIMA model
model = ARIMA(ts_data, order=(1, 1, 1))

# Fit model
model_fit = model.fit()

# Forecast next 3 values
forecast_values = model_fit.forecast(steps=3)

# Print output
print("Predicted values:")
print(forecast_values)
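
In order=(1, 1, 1), the middle value d=1 asks for one round of differencing before the AR and MA terms are fitted. The sketch below shows what that differencing does to the sample series from Code1, using NumPy only; the names diff1 and rebuilt are illustrative.

```python
import numpy as np

data = np.array([200, 220, 250, 270, 300, 320, 350, 370, 400, 420, 450])

# First difference: value at t minus value at t-1 (one element shorter)
diff1 = np.diff(data)
print("First difference:", diff1)   # [20 30 20 30 20 30 20 30 20 30]

# Differencing is reversible: the first value plus a cumulative sum restores the series
rebuilt = np.concatenate(([data[0]], data[0] + np.cumsum(diff1)))
print("Rebuilt series:", rebuilt)
```

The original series trends upward, while the differenced values only alternate between 20 and 30. That removal of trend is the reason for differencing.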

Code2: R Program
# Load dataset
data("Nile")
ts_data <- Nile

# Plot time series
plot(ts_data,
main="Nile River Flow Time Series Data",
xlab="Year",
ylab="Flow")

# Differencing
diff_data <- diff(ts_data)

# Plot differenced data
plot(diff_data,
main="Differenced Data")

# Build ARIMA model
model <- arima(ts_data, order=c(1,1,1))

# Model summary
print(model)

# Forecast next 10 years
forecast <- predict(model, n.ahead=10)

print(forecast)

# Plot forecast
ts.plot(ts_data, forecast$pred,
col=c("blue","red"),
main="ARIMA Forecast")
Code2: Python
# Import libraries
import matplotlib.pyplot as plt
from statsmodels.datasets import nile
from statsmodels.tsa.arima.model import ARIMA

# Load Nile dataset
data = nile.load_pandas().data

# Convert to time series
ts_data = data['volume']

# Plot time series
plt.figure(figsize=(10,5))
plt.plot(data['year'], ts_data)
plt.title("Nile River Flow Time Series Data")
plt.xlabel("Year")
plt.ylabel("Flow")
plt.grid(True)
plt.show()

# Differencing
diff_data = ts_data.diff().dropna()

# Plot differenced data
plt.figure(figsize=(10,5))
plt.plot(diff_data)
plt.title("Differenced Data")
plt.grid(True)
plt.show()

# Build ARIMA model
model = ARIMA(ts_data, order=(1,1,1))

# Fit model
model_fit = model.fit()

# Model summary
print(model_fit.summary())

# Forecast next 10 years
forecast = model_fit.forecast(steps=10)

print("\nForecasted Values:")
print(forecast)

# Plot forecast
plt.figure(figsize=(10,5))

# Original data
plt.plot(ts_data.index, ts_data, color='blue', label='Original Data')

# Forecast data
forecast_index = range(len(ts_data), len(ts_data) + 10)
plt.plot(forecast_index, forecast, color='red', label='Forecast')

plt.title("ARIMA Forecast")
plt.legend()
plt.grid(True)
plt.show()

Exp 5: Implementation of text analysis using classification algorithm
# Install required libraries (run once in Colab)
!pip install nltk scikit-learn pandas

# Import libraries
import pandas as pd
import numpy as np
import re
import nltk

from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Download stopwords
nltk.download('stopwords')

# ===============================
# Step 1: Load Dataset
# ===============================

# Sample dataset (Spam / Not Spam)
data = {
'text': [
"Win money now!!!",
"Hi, how are you?",
"Claim your free prize",
"Let's meet tomorrow",
"Congratulations, you won lottery",
"Are you coming to class?",
"Free entry in contest",
"Call me when you are free"
],
'label': [1, 0, 1, 0, 1, 0, 1, 0] # 1 = Spam, 0 = Not Spam
}

df = pd.DataFrame(data)

print("Dataset:")
print(df.head())

# ===============================
# Step 2: Text Preprocessing
# ===============================

stop_words = set(stopwords.words('english'))

def preprocess(text):
    text = text.lower()  # Lowercase
    text = re.sub(r'[^a-z\s]', '', text)  # Keep only letters and whitespace
    words = text.split()  # Tokenization
    words = [w for w in words if w not in stop_words]  # Remove stopwords
    return " ".join(words)

df['clean_text'] = df['text'].apply(preprocess)

print("\nCleaned Text:")
print(df[['text', 'clean_text']])

# ===============================
# Step 3: Feature Extraction (TF-IDF)
# ===============================

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['clean_text'])
y = df['label']

# ===============================
# Step 4: Train-Test Split
# ===============================

X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)

# ===============================
# Step 5: Apply Classification Models
# ===============================

models = {
"Naive Bayes": MultinomialNB(),
"SVM": SVC(),
"Logistic Regression": LogisticRegression(),
"Decision Tree": DecisionTreeClassifier()
}

for name, model in models.items():
    print("\n==============================")
    print(f"Model: {name}")

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("\nClassification Report:\n", classification_report(y_test, y_pred))
    print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

# ===============================
# Step 6: Test with New Input
# ===============================

sample = ["You have won a free ticket"]
sample_clean = [preprocess(sample[0])]
sample_vector = vectorizer.transform(sample_clean)

prediction = models["Naive Bayes"].predict(sample_vector)

print("\nTest Sentence:", sample[0])
print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
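
The whitespace escape in the cleaning pattern can be checked in isolation. With \s inside the character class, spaces survive and the text can still be split into words. Without it, the spaces are deleted along with the punctuation and the sentence collapses into one token. This is a minimal sketch: STOP_WORDS is a hypothetical hand-picked subset used so the example runs without the NLTK download, and the pattern argument is added only for the comparison.

```python
import re

# Hypothetical stopword subset (assumption); the experiment uses NLTK's English list
STOP_WORDS = {"a", "are", "how", "in", "me", "to", "when", "you", "your"}

def preprocess(text, pattern):
    text = text.lower()                       # lowercase
    text = re.sub(pattern, "", text)          # drop characters matched by the pattern
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

sentence = "Hi, how are you?"
kept_spaces = preprocess(sentence, r"[^a-z\s]")   # letters and whitespace survive
lost_spaces = preprocess(sentence, r"[^a-zs]")    # whitespace is removed as well

print(repr(kept_spaces))   # 'hi'
print(repr(lost_spaces))   # 'hihowareyou'
```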

Exp 6: Implementation of text analysis using clustering algorithms
# Import libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# -------------------------------------------------------
# New Dataset (Travel, Technology, Sports)
# -------------------------------------------------------

reviews = [
"The beach resort had beautiful views and relaxing atmosphere",
"The hotel room was clean and the staff were very friendly",
"The mountain trip was adventurous and the scenery was amazing",

"This smartphone has a powerful processor and excellent camera",
"The laptop performance is fast and the battery lasts long",
"The new software update improved speed and security",

"The football match was exciting and the team played well",
"The cricket tournament had great performances from players",
"The basketball game was intense and the crowd was energetic"
]

# -------------------------------------------------------
# Convert text to TF-IDF features
# -------------------------------------------------------

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(reviews)

features = vectorizer.get_feature_names_out()

print("TF-IDF Feature Names:")
print(features)
print()

# Show TF-IDF matrix
df = pd.DataFrame(X.toarray(), columns=features)
print("TF-IDF Matrix:")
print(df)
print()

# -------------------------------------------------------
# K-Means Clustering
# -------------------------------------------------------

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

labels = kmeans.labels_

print("Cluster Assignment:")
for i in range(len(reviews)):
    print("Review:", reviews[i])
    print("Cluster:", labels[i])
    print()

# -------------------------------------------------------
# Top words per cluster
# -------------------------------------------------------

order_centroids = kmeans.cluster_centers_.argsort()[:, ::-1]

print("Top Words per Cluster:")
for i in range(3):
    print(f"Cluster {i}:")
    for ind in order_centroids[i, :5]:
        print(features[ind])
    print()

# -------------------------------------------------------
# Map cluster to topics
# -------------------------------------------------------

cluster_topics = {}

for i in range(3):
    top_words = [features[ind] for ind in order_centroids[i, :5]]
    print(f"Cluster {i} top words:", top_words)

    if "hotel" in top_words or "beach" in top_words or "mountain" in top_words:
        cluster_topics[i] = "Travel"

    elif "laptop" in top_words or "software" in top_words or "smartphone" in top_words:
        cluster_topics[i] = "Technology"

    elif "football" in top_words or "cricket" in top_words or "basketball" in top_words:
        cluster_topics[i] = "Sports"

    else:
        cluster_topics[i] = "General"

    print()

# -------------------------------------------------------
# Final output with topic names
# -------------------------------------------------------

print("Final Review Topics:")
for i in range(len(reviews)):
    cluster_label = labels[i]
    print("Review:", reviews[i])
    print("Cluster:", cluster_label, "-", cluster_topics[cluster_label])
    print()
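
The "top words per cluster" step relies on argsort()[:, ::-1], which orders each centroid's feature indices from highest to lowest weight. The toy example below uses made-up centroid weights and a three-word vocabulary (both are assumptions for illustration, not output from the K-Means run) to show the mechanics.

```python
import numpy as np

# Hypothetical vocabulary and centroid weights, one row per cluster
features = np.array(["beach", "laptop", "match"])
centers = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.0, 0.4],
])

# argsort sorts ascending; reversing each row gives descending weight order
order = centers.argsort()[:, ::-1]

# Keep the two heaviest words for each cluster
top_words = [features[row[:2]].tolist() for row in order]
print(top_words)   # [['laptop', 'match'], ['beach', 'match']]
```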

Exp 7: Implementation of Simple Linear Regression in Python
# 1. Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# 2. Sample dataset (you can replace this with your own dataset)
# X = independent variable, y = dependent variable
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# 3. Create model
model = LinearRegression()

# 4. Train model
model.fit(X, y)

# 5. Predictions
y_pred = model.predict(X)

# 6. Print results
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)

# 7. Plot graph
plt.scatter(X, y, label="Actual Data")
plt.plot(X, y_pred, label="Regression Line")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Simple Linear Regression")
plt.legend()
plt.show()
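
For a single predictor, the fitted line can also be obtained from the closed-form least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below applies them to the same sample data with NumPy only; its results should agree with model.coef_[0] and model.intercept_ up to floating-point rounding.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Sxy / Sxx: co-variation of x and y divided by the variation of x
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("Slope (m):", slope)          # approximately 0.6
print("Intercept (c):", intercept)  # approximately 2.2
```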

Exp 8: Implementation of Multiple Linear Regression in Python
# ============================
# Multiple Linear Regression (Single Cell)
# ============================

# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# ----------------------------
# Step 1: Create Dataset
# ----------------------------
data = {
'Area': [1000, 1500, 2000, 2500, 3000],
'Bedrooms': [2, 3, 3, 4, 4],
'Age': [10, 8, 5, 3, 1],
'Price': [300000, 400000, 500000, 600000, 650000]
}

df = pd.DataFrame(data)

print("Dataset:\n", df)

# ----------------------------
# Step 2: Define Variables
# ----------------------------
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']
# ----------------------------
# Step 3: Train-Test Split
# ----------------------------
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.4, random_state=42
)
# ----------------------------
# Step 4: Train Model
# ----------------------------
model = LinearRegression()
model.fit(X_train, y_train)

# ----------------------------
# Step 5: Model Parameters
# ----------------------------
print("\nIntercept:", model.intercept_)
print("Coefficients:", model.coef_)

# ----------------------------
# Step 6: Predictions
# ----------------------------
y_pred = model.predict(X_test)

comparison = pd.DataFrame({
'Actual': y_test.values,
'Predicted': y_pred
})
print("\nActual vs Predicted:\n", comparison)

# ----------------------------
# Step 7: Evaluation
# ----------------------------
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nMean Squared Error:", mse)
print("R2 Score:", r2)

# ----------------------------
# Step 8: Manual Implementation
# ----------------------------
X_b = np.c_[np.ones((len(X), 1)), X] # Add bias
y_np = y.values.reshape(-1, 1)

theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_np)

print("\nManual Theta (parameters):\n", theta)
# ----------------------------
# Step 9: Manual Prediction
# ----------------------------
X_new = np.array([[1, 2200, 3, 5]]) # [bias, Area, Bedrooms, Age]
prediction = X_new.dot(theta)

print("\nManual Predicted Price:", prediction[0][0])
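
The manual step solves the normal equation by explicitly inverting X^T X. An alternative is np.linalg.lstsq, which solves the same least-squares problem without forming that inverse. That tends to be more robust when columns are on very different scales, as Area and Bedrooms are here. The sketch below rebuilds the dataset as plain arrays and compares the two; the names theta_ne and theta_ls are illustrative.

```python
import numpy as np

# Same dataset as above: columns are Area, Bedrooms, Age
X = np.array([
    [1000, 2, 10],
    [1500, 3, 8],
    [2000, 3, 5],
    [2500, 4, 3],
    [3000, 4, 1],
], dtype=float)
y = np.array([300000, 400000, 500000, 600000, 650000], dtype=float)

X_b = np.c_[np.ones(len(X)), X]   # prepend the bias column

# Normal equation with an explicit inverse (as in the manual step)
theta_ne = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver: same objective, no explicit inverse
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

x_new = np.array([1, 2200, 3, 5], dtype=float)   # [bias, Area, Bedrooms, Age]
print("Normal equation prediction:", x_new @ theta_ne)
print("lstsq prediction:", x_new @ theta_ls)
```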
