Big Data Workshop is a one-day event for everyone who wants to get to know Big Data and the Hadoop ecosystem. Participants will discover technologies such as Hadoop, Hive, Spark, Kafka, Flink and HBase through a hands-on approach.
During the workshop you will act as a Big Data engineer and analyst working for StreamRock, a fictional company that builds a music streaming app (similar to Spotify). The main goal of your work will be to use Big Data technologies such as Spark and Hive to analyze various datasets about the users and the songs they played. We will process the data in both batch and streaming modes to get data-driven answers to many business questions and to power the product features that StreamRock builds. Every exercise will be executed on a multi-node Hadoop cluster installed in a public cloud service.
The workshop is strongly focused on practical experience. The instructor will also share his or her own experience gained over several years of working with Big Data technologies.
- Description of the StreamRock company, along with the opportunities and challenges that Big Data technologies bring to it.
- Introduction to core Hadoop technologies such as HDFS and YARN.
- Hands-on exercise: Accessing a remote multi-node Hadoop cluster.
- Introduction to Apache Hive.
- Hands-on exercise: Importing structured data into the cluster using HUE.
- Hands-on exercise: Ad-hoc analysis of the structured data with Hive.
- Hands-on exercise: Visualisation of the results using HUE.
- Introduction to Apache Spark, Spark SQL and Spark DataFrames.
- Hands-on exercise: Implementing an ETL job to clean and massage input data using Spark.
- Quick explanation of the Avro and Parquet binary data formats.
- Practical tips for implementing ETL processes, such as process scheduling, schema management and integration with existing systems.
- Hands-on exercise: Implementing ad-hoc queries using Spark SQL and DataFrames.
- Hands-on exercise: Visualisation of the results of Spark queries using the Spark Notebook.
- Real-time data collection with Apache Kafka (presentation and demo).
- Processing real-time streams of data using Apache Flink (presentation and demo).
- Price: $300 per person
- Listing categories: Big Data
- Min/Max participants: 4 min, 12 max
- Education level: None, Beginner, Intermediate
- Duration: 1 day (6 hours + lunch + coffee breaks)
- Languages: English, Polish
- Features: Hands-on exercises included, Slides sent after the workshop, Travel and accommodation booked by the client, Travel possible