Analyzing Sales Trends with Python
12.12.2022
Pandas, NumPy, Matplotlib
Intro
In this project, we'll use basic Python libraries such as Pandas and NumPy to find sales trends and answer basic business questions about them. The data contains hundreds of thousands of electronics store purchases broken down by month, cost, product type, purchase address, etc. We'll take this opportunity to showcase some basic EDA skills and save the clean data to a new dataset, which can then be used for further exploration. Finally, we will explore answers to some high-level business questions.
This is a walkthrough notebook of the project.
Loading Data

We start by merging the 12 months of sales data into a single file.
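The merge step could look something like the sketch below. It assumes the monthly files are CSVs with identical columns sitting in one folder; the helper name `merge_monthly_files`, the folder layout and the output path are illustrative assumptions, not necessarily what the notebook used.

```python
from pathlib import Path

import pandas as pd


def merge_monthly_files(folder, output_path):
    """Concatenate every CSV in `folder` and save the result as one CSV.

    Hypothetical helper: the folder layout and file names are assumptions.
    """
    csv_paths = sorted(Path(folder).glob("*.csv"))
    # Read each monthly file and stack the rows on top of each other.
    frames = [pd.read_csv(path) for path in csv_paths]
    merged = pd.concat(frames, ignore_index=True)
    # index=False keeps the row index out of the saved file.
    merged.to_csv(output_path, index=False)
    return merged
```

Writing the merged file to a different folder than the monthly files avoids picking it up again if the function is rerun.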

Cleaning

Let's load this dataset and take a look at its first few rows.
As you can see, the data needs to be cleaned before we can add columns and make graphs. We found a total of 544 NaN values, and these can be dropped.
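Counting and dropping missing values might look like this minimal sketch. The tiny frame and its column names are made up for illustration, and dropping every row that contains a NaN is an assumption about how the notebook handled them.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the merged sales data.
df = pd.DataFrame(
    {
        "Product": ["USB-C Cable", np.nan, "Monitor"],
        "Price Each": ["11.95", np.nan, "149.99"],
    }
)

# Count missing cells across the whole frame.
nan_count = int(df.isna().sum().sum())

# Drop every row that contains at least one missing value.
clean = df.dropna().reset_index(drop=True)
```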

When we tried to extract a month column from the order date, we got an error saying that 'or' is invalid. This suggests a typo or other non-date text in the date field, possibly in the month part. We clean that up as well, and then create the month column and the sales column.
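One way to handle this, sketched under assumptions: keep only the rows whose date string starts with two digits, then read the month from those two characters. The `Order Date` column name, the `MM/DD/YY HH:MM` format and the bad value in the sample data are all assumptions made for illustration.

```python
import pandas as pd

# Made-up rows; the second one stands in for whatever non-date text
# caused the error. Assumes NaN rows have already been dropped.
df = pd.DataFrame(
    {"Order Date": ["04/19/19 08:46", "Order Date", "12/30/19 00:01"]}
)

# Keep only rows whose date string starts with two digits.
is_valid = df["Order Date"].str[:2].str.isdigit()
df = df[is_valid].copy()

# With the assumed format, the first two characters are the month.
df["Month"] = df["Order Date"].str[:2].astype(int)
```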
Here the date, price and quantity ordered columns are not stored in the right data types (datetime for the date, numeric for the other two), so we fix that as well.
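The type fixes could be sketched as follows. The column names, the date format and the definition of sales as quantity times price are assumptions, since none of them is spelled out here.

```python
import pandas as pd

# Made-up rows with everything stored as text, as after reading the CSV.
df = pd.DataFrame(
    {
        "Order Date": ["04/19/19 08:46", "12/30/19 00:01"],
        "Quantity Ordered": ["2", "1"],
        "Price Each": ["11.95", "600.0"],
    }
)

# Convert the text columns to numbers.
df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
df["Price Each"] = pd.to_numeric(df["Price Each"])

# Parse the date strings using the assumed MM/DD/YY HH:MM format.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

# Assumed definition: sales = quantity ordered * unit price.
df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
```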

Visualization
Suppose we want to find the answer to a business question: "What was the best month for sales?"
A Matplotlib bar plot with months on the x-axis and sales on the y-axis can answer this.
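A minimal sketch of such a plot is shown below. The data is made up, and the `Month` and `Sales` column names are assumptions about the cleaned dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cleaned data with the assumed "Month" and "Sales" columns.
df = pd.DataFrame(
    {"Month": [1, 1, 2, 12, 12], "Sales": [10.0, 5.0, 7.5, 20.0, 30.0]}
)

# Total sales per month, ordered by month number.
monthly_sales = df.groupby("Month")["Sales"].sum()

# One bar per month, with sales on the y-axis.
fig, ax = plt.subplots()
ax.bar(monthly_sales.index, monthly_sales.values)
ax.set_xticks(list(monthly_sales.index))
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Total sales per month")
```

Calling `plt.show()` displays the figure in a script, and `fig.savefig(...)` writes it to disk.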


Sales appear to rise until April and then dip, before a big rise in December. This is probably due to Christmas and New Year's preparations and gift buying in December.
Another question could be: "What city had the most sales?" Since the purchase addresses are given, all we have to do is separate the city from each address.
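Separating the city could be sketched like this, assuming addresses follow a "street, city, state ZIP" pattern. The sample addresses and the `Purchase Address` and `Sales` column names are made up for illustration.

```python
import pandas as pd

# Made-up rows in the assumed "street, city, state ZIP" format.
df = pd.DataFrame(
    {
        "Purchase Address": [
            "917 1st St, Dallas, TX 75001",
            "682 Chestnut St, Boston, MA 02215",
            "12 Hill St, Dallas, TX 75001",
        ],
        "Sales": [23.9, 99.99, 600.0],
    }
)

# The city is the second comma-separated piece of the address.
df["City"] = df["Purchase Address"].str.split(",").str[1].str.strip()

# Total sales per city.
city_sales = df.groupby("City")["Sales"].sum()
```

Cities that share a name across different states would be merged by this approach, so the state part may need to be kept as well if that matters.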


Plotting the graph:


The graph comparing sales by city shows that San Francisco had the most sales. Perhaps this is because the city is more technologically advanced, or because people living there earn more money. That may be a question for later (if we get more data).
It is natural to wonder what time of day most sales happen. If the order date is converted into a datetime format, we can extract the hour, minute and second from it and look at how sales are distributed over the day.
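A short sketch of that extraction, assuming an `Order Date` column in `MM/DD/YY HH:MM` format (both the column name and the format are assumptions):

```python
import pandas as pd

# Made-up order timestamps stored as text.
df = pd.DataFrame({"Order Date": ["04/19/19 08:46", "12/30/19 19:05"]})

# Parse the strings into real datetimes.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

# The .dt accessor exposes the individual time components.
df["Hour"] = df["Order Date"].dt.hour
df["Minute"] = df["Order Date"].dt.minute
```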


Plotting the graph:
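The hourly plot could be sketched as a simple line chart. Counting orders per hour is an assumption about what "sales activity" means here, and the data and the `Hour` column name are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up hours of the day, one entry per order.
df = pd.DataFrame({"Hour": [11, 11, 19, 19, 19, 3]})

# Number of orders placed in each hour.
orders_per_hour = df.groupby("Hour").size()

# Line chart of order counts over the 24 hours of the day.
fig, ax = plt.subplots()
ax.plot(orders_per_hour.index, orders_per_hour.values)
ax.set_xticks(range(24))
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of orders")
ax.grid(True)
```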


The graph suggests that sales activity peaks at around 11 AM and again at around 7 or 8 PM. The evening peak is probably because it is after work hours and people have more free time on their hands. So the ideal time to target ads for a product could be around these times.