Calories Predictor using XGBoost

Intro

People are becoming more and more aware of food and fitness. They want to know how many calories are in a chocolate bar, how many they burn in a 30-minute treadmill session, and so on. For this project, I will take parameters such as the duration of exercise, average heart rate (beats per minute), body temperature, height, and weight to predict how many calories a person burns.

Getting data

First we import the necessary libraries, such as NumPy and pandas. The XGBoost regressor is used as the machine learning algorithm. Scikit-learn is a machine-learning library that helps split the data into training and test sets.

For visualization, seaborn and matplotlib are used.

The two files 'exercise.csv' and 'calories.csv' are joined into a single pandas DataFrame.
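
The join can be sketched roughly as follows. The column names (User_ID, Gender, Duration, Calories) and the choice to merge on a shared User_ID key are assumptions for illustration, as is the helper name join_exercise_and_calories; the tiny in-memory tables are made-up stand-ins so the sketch runs without the CSV files.

```python
import pandas as pd


def join_exercise_and_calories(exercise: pd.DataFrame, calories: pd.DataFrame) -> pd.DataFrame:
    """Join the two tables on the (assumed) shared User_ID key."""
    # validate="one_to_one" raises if a user appears more than once in either table.
    return exercise.merge(calories, on="User_ID", how="inner", validate="one_to_one")


# With the real files, the inputs would come from:
#   exercise = pd.read_csv("exercise.csv")
#   calories = pd.read_csv("calories.csv")
# Made-up stand-in rows (not real data) so the example is self-contained:
exercise = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Gender": ["male", "female", "female"],
    "Duration": [29.0, 14.0, 5.0],
})
calories = pd.DataFrame({"User_ID": [3, 1, 2], "Calories": [26.0, 231.0, 66.0]})

data = join_exercise_and_calories(exercise, calories)
print(data)
```

Merging on a key, rather than placing the columns side by side, keeps each calorie value attached to the right person even if the two files list the rows in a different order.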

Analyzing data

The next step is to check the data for null values or other irregularities so that they can be cleaned up. The isnull() function can be used to detect missing values. Luckily, there are no null values present here.

The describe() function reports summary statistics such as the count, mean, standard deviation, and percentiles of each numerical column, which are useful for data exploration.
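
As a minimal sketch of these two checks, the snippet below runs isnull() and describe() on a small made-up table. The column names and values are assumptions for illustration, and one value is deliberately left missing to show what a non-zero null count looks like (the real dataset has none).

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows; one Weight value is deliberately missing.
data = pd.DataFrame({
    "Age": [25, 31, 47, 52],
    "Weight": [70.0, np.nan, 82.0, 64.0],
    "Calories": [120.0, 85.0, 40.0, 200.0],
})

# Number of missing values per column.
null_counts = data.isnull().sum()
print(null_counts)

# Summary statistics (count, mean, std, min, quartiles, max) per numerical column.
summary = data.describe()
print(summary.loc[["count", "mean", "min", "max"]])
```

Note that describe() skips missing values, so the count row is another quick way to spot columns with gaps.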

Data Visualization

Using the seaborn library, we first plot the gender column as a count plot to see the distribution of males versus females. The two groups look reasonably balanced.

Next, using distplot, we look at the distributions of age, weight, and height, which are all contributing factors to the number of calories a person burns.

Here you can see that people aged 20-30 are the most heavily represented, which suggests that this age group is the most likely to keep a fitness routine.

Height is approximately normally distributed, and most people weigh between 60 and 80 kg.

Next, we build a correlation heat map, which visualizes the direction and strength of the relationships between the numerical variables.
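
A heat map of this kind can be sketched as below. The rows are made up and the column names are assumptions; the only point is the mechanics of computing the correlation matrix on the numerical columns and passing it to seaborn's heatmap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in rows (not real data); column names are assumptions.
data = pd.DataFrame({
    "Gender": ["male", "female", "female", "male", "female", "male"],
    "Duration": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "Heart_Rate": [85.0, 90.0, 96.0, 99.0, 104.0, 110.0],
    "Calories": [20.0, 50.0, 85.0, 120.0, 170.0, 230.0],
})

# Pearson correlation between every pair of numerical columns.
# The text column (Gender) is excluded because it has no numerical meaning yet.
corr = data.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="Blues", square=True, ax=ax)
# Call plt.show() or fig.savefig(...) here to view the figure.
```

Each cell holds a value between -1 and 1: values near 1 mean the two variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean there is little linear relationship.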

From the heatmap, we find that as the duration of exercise increases, the person burns more calories.

Heart rate and body temperature are also positively correlated with calories.

Data preprocessing and Creating the ML model

Before we feed the data to the algorithm, we must convert the categorical attribute, gender, into a numerical one. Then we separate the features from the target variable. The target (Y) contains the calories column, and the features (X) are the remaining columns.
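
A minimal sketch of this preprocessing step is shown below. The column names, the labels "male" and "female", and the 0/1 encoding are assumptions. So is dropping a User_ID column from the features, on the grounds that an identifier carries no information about calories.

```python
import pandas as pd

# Made-up stand-in rows (not real data); column names are assumptions.
data = pd.DataFrame({
    "User_ID": [1, 2, 3, 4],
    "Gender": ["male", "female", "female", "male"],
    "Duration": [29.0, 14.0, 5.0, 13.0],
    "Calories": [231.0, 66.0, 26.0, 71.0],
})

# Encode the categorical column as integers (assumed mapping: male -> 0, female -> 1).
data["Gender"] = data["Gender"].map({"male": 0, "female": 1})

# Features: everything except the identifier and the target.
X = data.drop(columns=["User_ID", "Calories"])
# Target: the calories column.
Y = data["Calories"]

print(X)
print(Y)
```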

Now, let us split the data into training and test sets and build the model using the XGBoost regressor.

Evaluation

The model predicts the test values based on what it learned from the training data. Using the mean absolute error (MAE), we compare the predicted values with the actual values. An MAE of 1.5 is pretty good: on average, the predictions are off by only about 1.5 calories.
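
To make the metric concrete, here is a small worked example of mean absolute error using scikit-learn's mean_absolute_error, checked against a manual calculation. The numbers are made up for illustration and are not results from this project.

```python
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted calorie values (not project results).
actual = [100.0, 50.0, 20.0]
predicted = [101.0, 48.0, 20.0]

# MAE is the average of the absolute differences: (1 + 2 + 0) / 3 = 1.0
mae = mean_absolute_error(actual, predicted)

# The same calculation written out by hand.
manual_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(mae, manual_mae)
```

Because MAE is expressed in the same units as the target, it can be read directly as the typical size of the prediction error in calories.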