
Python Spark Challenge

I was recently asked to solve a data science challenge as part of a job application. The task was to write a Spark application that determines the 10 most-rated genres among TV series with more than 10 episodes. The challenge required the solution to be written in Scala using SBT. I later rewrote the solution in Python, which I am more comfortable with; here are my notes on the Python version.