🔗 GitHub

Description

Shopping malls and big marts now track sales data for individual items to forecast customer demand and adjust their inventory. These data stores, held in a data warehouse, contain a large amount of customer information and item-level detail. Mining this data can reveal anomalies and recurring patterns.

Experience

The main aim was to develop models capable of forecasting future customer demand for the different items available in a specific store. The models were trained on datasets extracted from the data stores in a data warehouse. Different stores use different data warehouse software, such as Amazon Redshift or IBM Db2. With the help of my professor, I was given access to the data warehouses of two different stores.

For each of these two stores:

  • I extracted the datasets relevant to the problem from the data stores in the store's data warehouse.
  • I performed several data preprocessing steps to make the dataset suitable for training machine learning models.
  • I developed several machine learning models that accurately forecast future customer demand for the different items available in the store.
  • The store manager could then manage inventory more efficiently based on the models' predictions.

Challenges Faced

  • This was my first time working with data warehousing software. I went through several tutorials, both online and offline, to get used to it. After that I could carry out data warehouse operations such as extracting the relevant data.
  • The extracted dataset had several problems. For instance, many rows had null values in various attributes, and some columns lacked uniform formatting. I resolved these issues one by one using data preprocessing techniques.
  • Many attributes (columns) in the dataset were categorical. I had to apply data encoding techniques to convert the entries under these attributes into numerical values so that they could be used for training the models.
  • At first, the predictions made by the trained models were not accurate at all. I addressed this with several measures, such as making changes to the dataset and tuning the models' hyperparameters. Together, these measures improved the models' accuracy substantially.
  • At first, I was also unsure how to plot figures such as bar graphs, pie charts, line graphs, and box plots. To overcome this, I went through several online courses, blogs, and YouTube videos, and learned about tools and libraries for plotting clear and accurate figures from the dataset.
  • One of the biggest challenges was time management, because every task mentioned so far had to be performed twice (once for each store).
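The null-value, formatting, and categorical-encoding issues described above can be sketched roughly as follows. This is only an illustration, not the code used in the project: the write-up does not name the libraries or column names involved, so the sketch sticks to the Python standard library, and every column name, value, and helper function in it is hypothetical. Median filling and one-hot encoding are shown as one common choice among several.

```python
from statistics import median

# Hypothetical rows; the column names and values are invented for illustration.
rows = [
    {"item_weight": 9.3, "fat_content": "Low Fat", "outlet_type": "Grocery"},
    {"item_weight": None, "fat_content": "LF", "outlet_type": "Supermarket"},
    {"item_weight": 17.5, "fat_content": "low fat", "outlet_type": "Supermarket"},
    {"item_weight": 12.0, "fat_content": "reg", "outlet_type": "Grocery"},
]

# Hypothetical mapping from inconsistent spellings to one label per category.
FAT_LABELS = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "reg": "Regular",
    "regular": "Regular",
}


def fill_missing_with_median(rows, column):
    """Replace None in a numeric column with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = median(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def normalize_labels(rows, column, mapping):
    """Make spelling and casing uniform within one categorical column."""
    return [
        {**r, column: mapping.get(r[column].strip().lower(), r[column])}
        for r in rows
    ]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded


cleaned = fill_missing_with_median(rows, "item_weight")
cleaned = normalize_labels(cleaned, "fat_content", FAT_LABELS)
for col in ("fat_content", "outlet_type"):
    cleaned = one_hot_encode(cleaned, col)
```

After these steps every row in `cleaned` contains only numeric values, which is the form most model-training code expects. On a real dataset, a data-frame library would usually handle the same steps more compactly.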

Lessons Learnt

  • How to efficiently extract and combine data from the different data stores in a data warehouse, along with several other data warehouse operations. I became familiar with tools such as Amazon Redshift and IBM Db2.
  • How to use different data preprocessing techniques to make a dataset suitable for training machine learning models.
  • How to communicate effectively about any aspect of the project with supervisors (professors, in this case).
  • How to improve the accuracy of the models using different hyperparameter tuning techniques.
  • How to plot suitable graphs to check the correctness of the dataset, observe useful patterns, and present valuable information from the dataset to users.
  • How to interact effectively online with people working in the same domain. This is very useful for getting problems resolved, learning better or more efficient techniques, and expanding knowledge of the field.
  • How to manage time efficiently between the different tasks in a project.
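As a small illustration of the hyperparameter tuning lesson above, the sketch below runs a grid search over one hyperparameter. The write-up does not say which models or tuning methods were used, so this is an assumption-heavy stand-in: simple exponential smoothing replaces the actual machine learning models, the smoothing factor `alpha` is the hyperparameter being tuned, and the sales figures and function names are invented.

```python
# Hypothetical weekly unit sales for one item; the values are invented.
sales = [120, 132, 128, 141, 150, 147, 160, 158, 171, 169, 180, 178]


def one_step_forecasts(series, alpha):
    """Simple exponential smoothing: the forecast for step t uses data up to t-1."""
    level = series[0]
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)  # forecast made before seeing `actual`
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def validation_mae(series, alpha, n_validation):
    """Mean absolute error of the one-step forecasts on the last n_validation points."""
    forecasts = one_step_forecasts(series, alpha)[-n_validation:]
    actuals = series[-n_validation:]
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / n_validation


def grid_search(series, alphas, n_validation=4):
    """Score every candidate alpha and return (best_alpha, scores)."""
    scores = {alpha: validation_mae(series, alpha, n_validation) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


candidate_alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha, scores = grid_search(sales, candidate_alphas)
print(f"best alpha: {best_alpha}, validation MAE: {scores[best_alpha]:.2f}")
```

The same pattern (define candidate values, score each on held-out data, keep the best) carries over to models with several hyperparameters, where the grid becomes the product of the candidate lists.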