Covid-19 Analysis with BigData Applications – Part 1
Hi again! If you came here after reading the introduction post, this part covers what we run on the EMR cluster, i.e. the ETL job. Across the big data community, the term ETL generally refers to Extract, Transform and Load. For this project, I'm using Apache Spark, which is currently one of the most popular open-source engines for large-scale data processing.

Data Ingestion Setup
Since we'll be running multiple ETL jobs for this project, I'll mostly focus on the first one…
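To make the three ETL stages concrete, here is a toy, single-machine sketch in plain Python. It is not the Spark job from this project. The column names `country`, `date` and `confirmed` and the sample rows are hypothetical, chosen only for illustration. In Spark, the same stages would roughly correspond to reading data with `spark.read`, applying DataFrame transformations, and writing the result with `DataFrame.write`.

```python
import csv
import io
import json
from collections import defaultdict


def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> dict[str, int]:
    """Transform: drop incomplete rows and total confirmed cases per country."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        country = (row.get("country") or "").strip()
        confirmed = (row.get("confirmed") or "").strip()
        if not country or not confirmed:
            continue  # skip records with a missing country or case count
        totals[country] += int(confirmed)
    return dict(totals)


def load(totals: dict[str, int], sink: io.TextIOBase) -> None:
    """Load: write the aggregated result to a sink (here, any text stream)."""
    json.dump(totals, sink, sort_keys=True)


# Hypothetical sample input; the last row is incomplete on purpose.
SAMPLE = """country,date,confirmed
Italy,2020-03-01,100
Italy,2020-03-02,150
Spain,2020-03-01,80
Spain,2020-03-02,
"""

if __name__ == "__main__":
    out = io.StringIO()
    load(transform(extract(SAMPLE)), out)
    print(out.getvalue())  # {"Italy": 250, "Spain": 80}
```

Keeping each stage as a separate function mirrors how an ETL job is usually reasoned about: the transform step can be tested on its own, and the load target (a file, a bucket, a database) can be swapped without touching the parsing or cleaning logic.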