An end-to-end data engineering project using Azure tools.
- GitHub: https://github.com/metal0bird/Azure_data_engineering_project
- Stack: SQL, Apache Spark, Azure tools (Data Factory, Data Lake Gen 2, Synapse Analytics, Databricks)
- Link: Insights
- Blogpost: End to End Netflix data analytics and recommendation system project using Microsoft Azure tools
Ideation
This project uses Azure's data engineering suite to extract insights from a large dataset: Netflix's library of movies and shows. The core idea is to use Azure tools to transform the raw data into actionable recommendations for users. An approach like this could feed a suggestion engine and lead to a more personalized and engaging viewing experience.
Building
The project aims to build a robust data pipeline by combining several Azure services. Azure Data Factory can orchestrate the data flow, while Data Lake Gen 2 serves as centralized storage for the Netflix data. Data transformation and analysis can be handled in Azure Databricks, an Apache Spark environment. Finally, Power BI can turn the extracted insights into visualizations that communicate user preferences clearly.
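To make the transformation step more concrete, here is a minimal sketch of the kind of cleaning such a pipeline might perform. It uses plain Python rather than Spark so that it runs anywhere. The column names (`title`, `type`, `listed_in`), the sample rows, and the `clean_titles` helper are assumptions for illustration and are not taken from the project's actual dataset or code.

```python
import csv
import io

# Hypothetical sample standing in for a raw file landed in the data lake.
RAW_CSV = """title,type,listed_in
 Example Show ,TV Show,"Dramas, Thrillers"
Example Movie,Movie,Comedies
,Movie,Dramas
Example Movie,Movie,Comedies
"""


def clean_titles(raw_text):
    """Trim whitespace, drop rows without a title, de-duplicate by title,
    and split the comma-separated genre field into a list."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        title = row["title"].strip()
        if not title or title in seen:
            continue  # skip rows with a missing title and repeated titles
        seen.add(title)
        genres = [g.strip() for g in row["listed_in"].split(",") if g.strip()]
        cleaned.append({"title": title, "type": row["type"].strip(), "genres": genres})
    return cleaned


rows = clean_titles(RAW_CSV)
for r in rows:
    print(r)
```

In a Databricks notebook the same steps would more likely be expressed as Spark DataFrame operations. The plain-Python version only illustrates the logic.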
Learning
Hosting an end-to-end data pipeline using Azure tools, and building a recommendation system from scratch through exploratory data analysis, data wrangling, model building, and evaluation.
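As an illustration of what a from-scratch recommender can look like, the sketch below ranks titles by how much their genre sets overlap, using Jaccard similarity. The source does not say which technique the project uses, so this content-based approach is an assumption, and so are the `CATALOG` data and the `jaccard` and `recommend` helper names.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(catalog, liked_title, k=2):
    """Return the k titles whose genre sets are most similar to liked_title."""
    target = catalog[liked_title]
    scored = [
        (jaccard(target, genres), title)
        for title, genres in catalog.items()
        if title != liked_title
    ]
    # Sort by descending similarity, breaking ties alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical catalog: title -> set of genres.
CATALOG = {
    "Show A": {"Drama", "Thriller"},
    "Show B": {"Drama", "Thriller", "Crime"},
    "Show C": {"Comedy"},
    "Show D": {"Drama"},
}

print(recommend(CATALOG, "Show A"))
```

Evaluating a model like this would normally require held-out viewing or rating data, which this sketch does not include.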