Abstract

Atılım University's Computer, Software and Information Systems Engineering departments use limited past data and predictions from a committee of lecturers to determine course quotas and section numbers. This process places an additional administrative responsibility on the committee members, who must determine the most suitable course structure during each course selection period, and prevents them from spending their time more productively on education and research. To address this problem, an AI-based approach is proposed to optimize the course registration process at the university. The AI system aims to uncover hidden relationships between multiple variables and generate high-accuracy predictions for registered students and associated classroom numbers that align with real-life scenarios.

To achieve this, the AI system will frame the task as multi-class classification, using features such as preparatory school transition rates, semester-based student data, and prerequisite chains. The system will be developed using machine learning or deep learning techniques, with a focus on generalized gradient boosting algorithms such as XGBoost and LightGBM, which are highly optimized for processing sparse data and producing accurate predictions.

This proposed approach can significantly improve the registration experience for students and academicians by reducing friction during the registration process and allowing staff to focus on research and teaching instead of dealing with registration problems. Additionally, the AI system can adapt to changing circumstances, producing increasingly accurate predictions as new data accumulates. In summary, this research aims to improve the educational experience for students and staff while maintaining high quality in university operations by optimizing the course registration process.



I. Introduction

The use of artificial intelligence to tackle problems encountered in daily life has become commonplace. From voice assistants on smartphones to personalized product recommendations on e-commerce websites, AI has become an integral part of our daily routine. It is also being utilized in a wide range of industries, including healthcare, finance, transportation, and education, to improve efficiency, accuracy, and precision in decision-making processes. In light of this rapid development, one objective of this paper is to solve a fundamental problem observed at Atılım University by utilizing recent research in the field of artificial intelligence.

Atılım University's Computer, Software and Information Systems Engineering departments rely on limited data from previous semesters and lecturers' predictions to determine course quotas and section numbers, which places an additional administrative burden on the academicians and prevents them from dedicating more time to education and research. To solve this problem, an AI-based approach is proposed to optimize the course registration process at the university. The AI system aims to reveal hidden relationships between multiple variables, such as transition rates from preparatory schools, historical student data, and prerequisite chains, and to generate highly accurate predictions for registered students and associated classroom numbers that are consistent with real-life scenarios.

Machine learning and deep learning models, together with generalized gradient boosting algorithms such as XGBoost and LightGBM, which have been optimized to process sparse data and provide accurate predictions [1], are studied below by weighing their pros and cons and their applicability to the proposed project.



II. Machine Learning Models

-LightGBM
LightGBM: A Highly Efficient Gradient Boosting Framework
LightGBM is an open-source gradient boosting framework developed by Microsoft that uses tree-based learning algorithms [2]. It is designed to be highly efficient in terms of both training speed and memory usage, making it an attractive option for many applications. In this section, we explore the key features of LightGBM and how it differs from other popular gradient boosting frameworks.
Key Features of LightGBM
Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB)
LightGBM implements a number of key optimizations to improve training speed and memory usage. Gradient-Based One-Side Sampling (GOSS) reduces the number of instances used in gradient computation: it keeps all instances with large gradients and randomly samples from those with small gradients. Exclusive Feature Bundling (EFB) reduces the number of features by bundling mutually exclusive features, i.e. features that rarely take nonzero values simultaneously, into a single feature [2].
Histogram-Based Gradient Boosting
LightGBM uses a histogram-based approach to binning continuous features, which speeds up the computation of gradients and reduces memory usage during training [2].
Leaf-Wise Tree Growth
LightGBM implements a leaf-wise tree growth strategy, which can lead to faster convergence than the level-wise (depth-wise) strategy used in other gradient boosting frameworks. This strategy grows the tree leaf by leaf, choosing at each step the leaf whose split yields the largest loss reduction.
GPU Acceleration
LightGBM can leverage GPU hardware to accelerate training and inference, leading to significant speedups in many cases [3].
How LightGBM Differs from Other Gradient Boosting Frameworks
There are several key ways in which LightGBM differs from other popular gradient boosting frameworks such as XGBoost and CatBoost.
Speed and Efficiency
LightGBM is designed to be highly efficient in terms of both training speed and memory usage. It achieves this by using a number of key optimizations, such as GOSS and EFB, as well as histogram-based binning and leaf-wise tree growth.
Flexibility and Customizability
LightGBM provides a wide range of hyperparameters that can be used to customize the training process to the specific needs of the application. It also supports a variety of objectives and loss functions, including regression, binary classification, and multiclass classification.
GPU Acceleration
LightGBM is designed to be highly parallelizable and can take advantage of GPU hardware to speed up training and inference. This makes it an attractive option for applications that require high throughput and low latency [3].
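As a concrete illustration of the framework's usage (not taken from the cited sources), the following minimal sketch trains a multi-class LightGBM model via its scikit-learn API; the data and the feature meanings suggested in the comments are hypothetical placeholders:

```python
# A minimal multi-class training sketch with LightGBM's scikit-learn API.
# The data and feature semantics are hypothetical placeholders.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))      # e.g. prep-school transition rate, past demand, ...
y = rng.integers(0, 3, size=500)   # e.g. a demand class: low / medium / high

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LGBMClassifier(
    objective="multiclass",
    num_leaves=31,          # leaf-wise growth is capped by leaf count, not depth
    learning_rate=0.1,
    n_estimators=200,
)
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```

Note that `num_leaves`, rather than a maximum depth, is the natural capacity control here because of LightGBM's leaf-wise growth strategy.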


-XGBoost (Extreme Gradient Boosting)

Gradient Boosting

Gradient boosting works by minimizing the loss function of the model. The loss function measures the difference between the predicted values and the actual values of the target variable. Gradient boosting minimizes this loss by sequentially adding weak models, each one fitted to the negative gradient of the loss with respect to the current ensemble's predictions. [4][5]
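In standard notation (a generic formulation, not specific to the cited sources), boosting round m fits a weak learner h_m to the pseudo-residuals and adds it to the ensemble with a learning rate \nu:

F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad h_m \approx -\left[ \frac{\partial L\big(y, F(x)\big)}{\partial F(x)} \right]_{F = F_{m-1}}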

XGBoost

XGBoost is an optimized distributed gradient boosting library that implements gradient boosting in a highly efficient and scalable way. It is designed to work with large datasets and can handle a wide range of machine learning tasks, including classification, regression, and ranking.

The XGBoost algorithm follows a similar process to the gradient boosting algorithm. It starts by training a simple decision tree on the data and then iteratively adds more trees to the model to reduce the loss function. Each tree is trained to correct the errors of the previous tree, and the final model is an ensemble of all the trees. [6]

The XGBoost algorithm uses a different objective function than the gradient boosting algorithm. The objective function in XGBoost is a sum of two parts: the loss function and a regularization term. The loss function measures the difference between the predicted values and the actual values, and the regularization term penalizes the complexity of the model.
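In the notation commonly used for XGBoost, this regularized objective is

\mathrm{Obj} = \sum_i l\big(y_i, \hat{y}_i\big) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2 ,

where T is the number of leaves of a tree f, w its vector of leaf weights, and \gamma and \lambda control the strength of the complexity penalty.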

XGBoost uses a combination of multiple algorithms to improve the accuracy and efficiency of the gradient boosting algorithm. Some of the key features of XGBoost are:

Regularized Learning: XGBoost uses L1 and L2 regularization techniques to avoid overfitting and improve the accuracy of the model. Regularization adds a penalty term to the loss function, which helps to reduce the complexity of the model and prevent it from overfitting the training data.

Tree-based Learning: XGBoost uses decision trees as the weak models in the gradient boosting algorithm. It can handle both regression and classification tasks using decision trees, and it can also handle missing values in the data. [7][8]

Parallel Processing: XGBoost uses parallel processing techniques to speed up the training process. It can run on multiple processors and can handle large datasets efficiently. [9]

Cross-validation: XGBoost implements k-fold cross-validation to evaluate the performance of the model. Cross-validation helps to avoid overfitting and provides a more accurate estimate of the model's performance on new data. [10]
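The following minimal sketch (synthetic placeholder data; the feature semantics in the comments are hypothetical, not drawn from the cited sources) combines the regularization and cross-validation features above using XGBoost's native Python API:

```python
# A minimal sketch of regularized, cross-validated XGBoost training.
# Data and feature meanings are hypothetical placeholders.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))          # e.g. past enrolment, prep-school rate, ...
y = rng.integers(0, 3, size=500)       # e.g. demand class: low / medium / high

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "multi:softprob",     # multi-class classification
    "num_class": 3,
    "max_depth": 4,
    "eta": 0.1,                        # learning rate
    "reg_alpha": 0.1,                  # L1 regularization
    "reg_lambda": 1.0,                 # L2 regularization
}

# 5-fold cross-validation with early stopping on the validation metric
cv_results = xgb.cv(params, dtrain, num_boost_round=200,
                    nfold=5, metrics="mlogloss",
                    early_stopping_rounds=10, seed=42)
print(cv_results.tail(1))
```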




-ARIMA (Autoregressive Integrated Moving Average)

ARIMA, or Autoregressive Integrated Moving Average, is a statistical model used for time series analysis and forecasting. It combines three components: autoregression (AR), differencing (I), and moving average (MA) to create a powerful model for analyzing and predicting time series data. [11]

Time Series Analysis:

Time series analysis is a statistical technique used to analyze time-dependent data. It is widely used in finance, economics, engineering, and many other fields. A time series is a sequence of data points, usually collected at regular intervals over time. Examples of time series data include stock prices, weather patterns, and sales figures.

Autoregression (AR): Autoregression is a statistical method that models the relationship between a variable and its past values. It assumes that the current value of a variable depends on its past values. In other words, autoregression models the correlation between the current value and one or more past values of the same variable. [12]

Moving Average (MA): Moving average is another statistical method used in time series analysis. It models the relationship between a variable and its past errors. In other words, moving average models the correlation between the current value and one or more past errors of the same variable. [12]

Differencing (I): Differencing is a method used to remove trends and seasonality from time series data. It calculates the difference between consecutive observations. Differencing is used to transform a non-stationary time series into a stationary one. A stationary time series has a constant mean and variance over time. [12]

ARIMA Model: ARIMA models combine the autoregressive, moving average, and differencing components to create a powerful model for time series analysis and forecasting. The ARIMA model is denoted by ARIMA(p, d, q), where p is the order of the autoregressive component, d is the order of the differencing component, and q is the order of the moving average component.
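In standard backshift notation (L is the lag operator, \phi_i the autoregressive coefficients, \theta_j the moving average coefficients, and \varepsilon_t white noise), the ARIMA(p, d, q) model can be written as

\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t .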

The ARIMA model is fitted to the time series data using maximum likelihood estimation; for Gaussian errors this is equivalent to minimizing the sum of squared errors between the model predictions and the actual observations.

Once the ARIMA model is fitted to the time series data, it can be used for forecasting future values. The ARIMA model uses the past values of the time series data to make predictions about the future values.
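As a minimal illustration using the statsmodels implementation (the series is synthetic and the order (1, 1, 1) is chosen purely for demonstration):

```python
# A minimal forecasting sketch with statsmodels' ARIMA implementation.
# The series is synthetic; the order (p, d, q) = (1, 1, 1) is illustrative.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
# e.g. total enrolment per semester: a linear trend plus noise
series = pd.Series(100 + np.arange(20) * 3 + rng.normal(0, 5, size=20))

model = ARIMA(series, order=(1, 1, 1))    # AR order p=1, differencing d=1, MA order q=1
fitted = model.fit()                      # maximum likelihood estimation
print(fitted.forecast(steps=4))           # predictions for the next 4 periods
```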


III. Deep Learning Models

-LSTM (Long Short-Term Memory)
LSTM stands for Long Short-Term Memory, and it is a type of neural network often used for processing sequences of data, such as time series or text. What makes LSTM different from other neural networks is that it has a "memory" that allows it to retain important information from earlier in the sequence.
The memory is made up of "gates" that control the flow of information into and out of the LSTM. There are three types of gates: the input gate, the forget gate, and the output gate. Each gate is like a switch that can be turned on or off depending on the input.
The input gate controls whether or not new information should be added to the memory. The forget gate controls whether or not old information should be removed from the memory. And the output gate controls whether or not the memory should be used to make a prediction.
By controlling the flow of information in this way, LSTM is able to learn from sequences of data and make predictions based on that learning. It's a powerful tool for a wide range of applications, from natural language processing to stock market prediction.
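As an illustration of how an LSTM might be applied here, the following minimal Keras sketch maps an 8-step sequence of per-semester features to a next-step prediction; the shapes, data, and task framing are hypothetical placeholders:

```python
# A minimal sequence-to-one model with an LSTM layer in Keras.
# Shapes, data, and the forecasting framing are hypothetical placeholders.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8, 3)).astype("float32")  # (samples, time steps, features)
y = rng.normal(size=(200, 1)).astype("float32")     # next-step target

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 3)),
    tf.keras.layers.LSTM(32),        # gated memory over the 8-step sequence
    tf.keras.layers.Dense(1),        # regression head
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
print(model.predict(X[:1]))
```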

-GRU (Gated Recurrent Units)
Gated Recurrent Units (GRUs) are a type of Recurrent Neural Network (RNN) architecture that was introduced in 2014 by Kyunghyun Cho et al. GRUs are a simpler version of Long Short-Term Memory (LSTM) networks that have fewer parameters and are therefore easier to train.
GRUs were designed to address some of the shortcomings of traditional RNNs, which can have difficulty capturing long-term dependencies in sequential data. GRUs achieve this by using a gating mechanism that selectively updates the hidden state of the network, allowing it to remember or forget information as needed.
The GRU architecture consists of a hidden state, an update gate, and a reset gate. The hidden state plays a role similar to the cell state in an LSTM network and represents the network's memory. The update gate controls how much new information should be added to the memory, and the reset gate controls how much of the existing memory should be discarded when forming the candidate state.
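In the standard formulation (bias terms omitted; \sigma is the logistic sigmoid, \odot element-wise multiplication, z_t the update gate, r_t the reset gate, and \tilde{h}_t the candidate state):

z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),
\tilde{h}_t = \tanh\big(W x_t + U (r_t \odot h_{t-1})\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t .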
During training, the GRU learns the optimal values for its parameters by minimizing a loss function. This is typically done using gradient descent optimization algorithms like stochastic gradient descent (SGD) or Adam.
GRUs have been shown to be effective in a wide range of applications, including speech recognition, machine translation, and image captioning. They are especially useful in situations where there is a need to capture long-term dependencies in sequential data, but training time and computational resources are limited.
Despite their effectiveness, GRUs are not without their limitations. One of the main challenges with GRUs is their sensitivity to initialization and optimization. Improper initialization of the network can lead to vanishing or exploding gradients, which can cause the network to either not learn or to learn slowly.
In conclusion, GRUs are a powerful tool in the field of deep learning and have been shown to be effective in a wide range of applications. However, care must be taken in their initialization and optimization to ensure that they are properly trained and optimized.


-MLP (Multilayer Perceptron)

Multilayer perceptrons (MLPs) are a class of neural networks that are widely used for supervised learning tasks, such as classification and regression. MLPs are designed to learn complex non-linear mappings between inputs and outputs, making them well-suited for tasks that involve pattern recognition, feature extraction, and decision-making.

The basic building block of an MLP is a neuron, which takes one or more inputs, performs a weighted sum of those inputs, and applies an activation function to the result. The output of each neuron is then fed forward to the next layer of neurons, until the final layer produces the network's output.

MLPs can have multiple layers of neurons, hence the name "multilayer perceptron." The first layer of neurons is the input layer, which takes the raw input data and passes it forward to the next layer. The next layers are called the hidden layers, because their outputs are not directly visible to the user. The final layer is the output layer, which produces the network's output.

One of the key advantages of MLPs is their ability to learn complex, non-linear mappings between inputs and outputs. This is accomplished by adjusting the weights of the connections between neurons during training, using an optimization algorithm such as stochastic gradient descent. The goal of training is to minimize a loss function that measures the difference between the network's predicted output and the true output for a given input.
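To make this concrete, the following minimal sketch (synthetic placeholder data) trains a two-hidden-layer MLP with scikit-learn, which fits the connection weights by minimizing log-loss with the Adam variant of stochastic gradient descent:

```python
# A minimal MLP classifier sketch with scikit-learn.
# Data and the three-class target are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = rng.integers(0, 3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Two hidden layers; alpha is an L2 (weight decay) regularization term
clf = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    solver="adam", alpha=1e-4, max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```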

MLPs can also be used for feature extraction, by training the network to recognize useful patterns in the input data. This can be done by adding additional hidden layers to the network, which allow it to learn increasingly complex features. The resulting feature vectors can then be used as input to other machine learning models, such as support vector machines or decision trees.

One limitation of MLPs is that they can be prone to overfitting, which occurs when the network becomes too complex and starts to fit the training data too closely, leading to poor generalization to new data. To address this, various regularization techniques can be used, such as weight decay or dropout.

Overall, MLPs are a powerful tool for supervised learning tasks, and their ability to learn complex, non-linear mappings makes them well-suited for a wide range of applications. With the development of new architectures and training techniques, it is likely that MLPs will continue to be a fundamental building block of machine learning systems in the future.



-RNN (Recurrent Neural Networks)

Recurrent Neural Networks (RNNs) are a class of deep neural networks that are designed to process sequential data, such as time-series data or natural language. Unlike traditional neural networks, which process inputs independently, RNNs use a feedback loop to allow information to persist across time steps, making them well-suited for tasks that involve predicting the future based on past events.

The basic building block of an RNN is a neuron that takes an input and produces an output. In a standard feedforward neural network, the output of each neuron is only dependent on the current input, but in an RNN, the output of each neuron is also dependent on the previous state of the network. This allows the network to maintain a kind of memory, and to use that memory to inform its predictions.

The architecture of an RNN typically involves feeding input sequences into the network one element at a time, and using the output of each time step as an input to the next. At each time step, the network computes a hidden state vector, which represents the current state of the network based on the current input and the previous hidden state. This hidden state vector is then used to compute the output for that time step, as well as the hidden state vector for the next time step.
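For a vanilla (Elman-style) RNN, this update is commonly written as

h_t = \tanh\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big), \qquad y_t = W_{hy} h_t + b_y ,

where the weight matrices W and bias vectors b are learned parameters shared across all time steps.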

One common type of RNN is the Long Short-Term Memory (LSTM) network, which was designed to address the problem of vanishing gradients in standard RNNs. The vanishing gradient problem arises when gradients propagated back through many time steps become very small, which can make it difficult to train the network effectively. LSTMs solve this problem by introducing a gating mechanism that allows the network to selectively remember or forget information over time.

Another popular variant of the RNN is the Gated Recurrent Unit (GRU), which is similar to the LSTM but with fewer parameters. The GRU also uses gating mechanisms to control the flow of information through the network, but it has a simpler structure than the LSTM and is often faster to train.

RNNs have been used successfully in a variety of applications, including speech recognition, natural language processing, and image captioning. One of the key advantages of RNNs is their ability to process sequences of varying lengths, making them well-suited for tasks that involve variable-length inputs or outputs. However, RNNs can be difficult to train effectively, particularly when dealing with long sequences or complex data distributions.

Overall, RNNs are a powerful tool for processing sequential data, and their ability to maintain a memory of past inputs makes them well-suited for a wide range of applications. With the development of new architectures and training techniques, it is likely that RNNs will continue to play an important role in the development of advanced AI systems in the future.
     
 