Python Spark Intro for Data Scientists

in Big Data

As a data scientist, you need to know how to handle large data sets: how to clean them, analyze them, and draw conclusions from them. Spark is one of the most widely used tools for this work. It is a distributed computation engine that lets you run map-reduce style tasks through a friendly Python (or Scala) API.
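To make the map-reduce idea concrete, here is a minimal single-machine sketch in plain Python. It is not Spark code and is not taken from the talk. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are illustrative assumptions. Roughly speaking, Spark distributes these same steps across a cluster, and PySpark's RDD methods `flatMap`, `map`, and `reduceByKey` play the corresponding roles.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order (hypothetical helper for illustration)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Made-up sample input, only for demonstration.
    sample = ["spark makes big data simple", "big data needs big tools"]
    print(word_count(sample))
```

On a cluster, the map and reduce steps run in parallel on partitions of the data, and the shuffle moves values with the same key to the same worker. That movement is the part a distributed engine handles for you.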

After this talk, you will understand what Spark is and how to start using it. We will cover Spark's architecture and workflow, look at how to use the RDD and DataFrame APIs, and walk through some hands-on examples.


