Browsed by
Tag: #emr

Covid-19 Analysis with BigData Applications – Part 2

Hi again! In this post, I’ll explain the next two ETL jobs: the first processes the Twitter data related to Covid-19, and the second combines the data from the previous two ETL jobs. As we have already covered the basic EMR concepts earlier, I’ll get directly into what is being done in these tasks. For ETL2, I’m creating a Hive table beforehand because this Twitter data is in semicolon-delimited format and isn’t easily parsed…

Read More

Covid-19 Analysis with BigData Applications – Part 1

Hi again! So, if you came here after reading the introduction post, we’ll be talking about the part that runs on the EMR cluster, i.e. the ETL job. Across the BigData community, the term ETL refers to Extract, Transform and Load. For this project, I’m using Apache Spark, which happens to be one of the most popular open-source projects currently.

Data Ingestion Setup

Since we’ll be running multiple ETL jobs for this project, I’ll mostly focus on the first one…

Read More

Covid-19 Analysis with BigData Applications – Introduction

Hi there! As we all know, the Covid-19 pandemic has been the prime highlight of this year. It has directly affected our way of living and has also made some serious dents in global socio-economic dynamics. There are lots of people who have been continually working to understand, fight and control this pandemic. So, it is our common responsibility to help this endeavor as much as we can. So, in this blog series, I’d like to…

Read More