Browsed by Tag: #spark

Covid-19 Analysis with BigData Applications – Part 3

Hi there! If you’ve been following this blog series, we are looking at a BigData project to analyze Covid-19 data. So far, we have covered the overall architecture and the ETL Spark jobs. In this post, let’s look at the scheduler component (a Lambda function) in this workflow. The main reason I’m using Lambda here is its serverless nature and native integration with other AWS services. For example, we could trigger it via a CloudWatch scheduled rule on a regular basis…
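As a rough illustration of what such a scheduler Lambda could look like, here is a minimal sketch that submits a Spark step to a running EMR cluster via boto3. The cluster ID, bucket, and script path are hypothetical placeholders, not values from the actual project:

```python
def build_spark_step(name, script_path):
    # EMR step definition that launches a Spark job through command-runner.jar
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_path],
        },
    }


def handler(event, context):
    # Lambda entry point, invoked by the CloudWatch scheduled rule.
    import boto3  # imported lazily so the module also loads outside AWS

    emr = boto3.client("emr")
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXX",  # hypothetical cluster id
        Steps=[build_spark_step("covid19-etl", "s3://my-bucket/jobs/etl1.py")],
    )
```

The CloudWatch (EventBridge) rule itself would carry a schedule expression such as `rate(1 day)` and target this function's ARN.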

Read More

Covid-19 Analysis with BigData Applications – Part 2

Hi again! In this post, I’ll explain the next two ETL jobs: the first one processes the Twitter data related to Covid-19, and the second one combines the data from the previous two ETL jobs. Since we have already covered the basic EMR concepts earlier, I’ll get straight into what is being done in these tasks. For ETL2, I’m creating a Hive table beforehand because this Twitter data is in a semicolon-delimited format and isn’t easily parsed…
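To see why the delimiter matters, here is a small sketch of parsing semicolon-delimited records in plain Python; the sample rows and column names are made up for illustration, since the actual Twitter schema is behind the full post. The Hive-side equivalent of setting the delimiter is the `ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'` clause in the table DDL:

```python
import csv
import io

# Toy sample standing in for the real Twitter export (actual schema unknown).
raw = "id;user;text\n1;alice;Stay home, stay safe\n2;bob;Wash your hands\n"

# Once told the delimiter, csv splits on ';' and leaves embedded commas alone.
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
header, records = rows[0], rows[1:]
```

Note that the embedded comma in the first tweet survives intact, which is exactly what a default comma-delimited parse would get wrong.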

Read More

Covid-19 Analysis with BigData Applications – Part 1

Hi again! So, if you came here after reading the introduction post, we’ll be talking about the part that runs on the EMR cluster, i.e. the ETL job. Across the BigData community, the term ETL generally refers to Extract, Transform and Load. For this project, I’m using Apache Spark, which happens to be one of the most popular open-source projects currently.

Data Ingestion Setup

Since we’ll be running multiple ETL jobs for this project, I’ll mostly focus on the first one…
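The Extract, Transform, Load pattern itself can be sketched in a few lines; this is a plain-Python illustration with invented sample rows, not the project's actual Spark job, which would use DataFrames and write Parquet to S3:

```python
def extract(lines):
    # Extract: split raw comma-separated case reports into records.
    fields = ["date", "country", "cases"]
    return [dict(zip(fields, line.split(","))) for line in lines]


def transform(records):
    # Transform: cast counts to int and drop rows reporting no cases.
    cleaned = [{**r, "cases": int(r["cases"])} for r in records]
    return [r for r in cleaned if r["cases"] > 0]


def load(records):
    # Load: a real job would persist these (e.g. Parquet on S3); here we return.
    return records


result = load(transform(extract([
    "2020-04-01,US,100",
    "2020-04-01,XX,0",
])))
```

In Spark the same three stages map naturally onto `spark.read`, DataFrame transformations, and `DataFrame.write`.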

Read More