Tag: #etl

Covid-19 Analysis with BigData Applications – Part 3


Hi there! If you’ve been following this blog series, you know we are looking at a BigData project to analyze Covid-19 data. So far, we have covered the overall architecture and the ETL Spark jobs. In this post, let’s look at the scheduler component (a Lambda function) in this workflow. The main reason I’m using Lambda here is its serverless nature and native integration with other AWS services. For example, we can trigger it on a regular basis via a CloudWatch scheduled rule…
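To make the scheduler idea concrete, here is a minimal sketch of what such a Lambda handler could look like, assuming it submits the ETL Spark jobs as steps to an already-running EMR cluster. The environment variable `EMR_CLUSTER_ID` and the S3 script paths are hypothetical placeholders, not names taken from the project; `add_job_flow_steps` and `command-runner.jar` are standard EMR mechanisms. The EMR client is injectable so the handler can be exercised without AWS credentials.

```python
import os

# Hypothetical S3 locations for the three ETL scripts (not the project's real paths).
ETL_SCRIPTS = [
    "s3://example-bucket/scripts/etl1.py",
    "s3://example-bucket/scripts/etl2.py",
    "s3://example-bucket/scripts/etl3.py",
]


def build_spark_step(name, script_uri):
    """Return one EMR step definition that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        # If a step fails, cancel the remaining steps and leave the cluster running.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def lambda_handler(event, context, emr_client=None):
    """Entry point for a scheduled CloudWatch (EventBridge) invocation."""
    if emr_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        emr_client = boto3.client("emr")

    cluster_id = os.environ["EMR_CLUSTER_ID"]  # hypothetical env var
    steps = [
        build_spark_step(f"etl{i}", uri)
        for i, uri in enumerate(ETL_SCRIPTS, start=1)
    ]
    # With the default step concurrency of 1, EMR runs the steps in
    # submission order, so ETL3 starts only after ETL1 and ETL2 finish.
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return {"ClusterId": cluster_id, "StepIds": response["StepIds"]}
```

The CloudWatch scheduled rule would then target this function with a schedule expression such as `rate(1 day)` or `cron(0 6 * * ? *)`; the exact schedule used in the project is not stated here.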


Covid-19 Analysis with BigData Applications – Part 2


Hi again! In this post, I’ll explain the next two ETL jobs: the first processes the Twitter data related to Covid-19, and the second combines the data from the previous two ETL jobs. As we have already covered the basic EMR concepts earlier, I’ll get directly into what is being done in these tasks. For ETL2, I’m creating a Hive table beforehand because this Twitter data is in a semicolon-delimited format and isn’t easily parsed…
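As a rough illustration of creating a Hive table up front for semicolon-delimited data, the sketch below builds a `CREATE EXTERNAL TABLE` statement. The table name, column list, and S3 location are hypothetical, since the post’s actual schema is not shown here. The delimiter is written as the octal escape `'\073'` (the character `;`) because the Hive CLI can split statements on a bare `;` even inside quotes.

```python
def build_semicolon_table_ddl(table, columns, location):
    """Build a Hive DDL statement for ';'-delimited text files.

    columns: list of (name, hive_type) pairs; all names here are hypothetical.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        # '\073' is the octal escape for ';', which Hive and Spark SQL both unescape.
        "FIELDS TERMINATED BY '\\073'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(build_semicolon_table_ddl(
        "tweets",  # hypothetical table name
        [("tweet_id", "STRING"), ("created_at", "STRING"), ("text", "STRING")],
        "s3://example-bucket/twitter/",  # hypothetical location
    ))
```

The resulting string could be pasted into the Hive shell or passed to `spark.sql(...)` on a `SparkSession` built with `enableHiveSupport()`. Note that plain `ROW FORMAT DELIMITED` does not honor quoting, so if tweet text itself contains semicolons, a SerDe such as `org.apache.hadoop.hive.serde2.OpenCSVSerde` with `separatorChar` set to `;` may be a better fit.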
