python

Python Spark Platform on AWS

Following on from my post on setting up a platform to get started with data science tools, I have since set up a Jupyter-based platform for programming Python on Spark. On top of using Python libraries (like pandas, NumPy, scikit-learn, etc.) that make data analysis easier, on this platform I can also use Spark to write applications that run on distributed clusters. This setup has the following benefits: It is web based, so I can work on my projects from anywhere as long as I have a web browser with an internet connection. It is set up using lightweight EC2 instance types (t2.

Python Spark Challenge

I was recently asked to solve a data science challenge for a job application. The challenge was to write a Spark application that determines the top 10 most-rated genres among TV series with more than 10 episodes. The challenge required the solution to be written in Scala using the SBT tool. I later rewrote the solution in Python, which I am more comfortable with; here are my notes on the Python solution.

Python Libraries - NumPy

NumPy is the core library for scientific computing in Python. It adds support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. Installation: The easiest way to get NumPy installed is to install one of the Python distributions, like Anaconda, which include all of the key packages for the scientific computing stack. Usage: To start using NumPy, you need to import it.
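
As a rough illustration of that import and of the array support described above, here is a minimal sketch. The alias `np` is the common convention, and the sample values and variable names are made up for this example rather than taken from the original post.

```python
import numpy as np  # "np" is the conventional alias, assumed here

# A small 2-D array (2 rows, 3 columns); the values are illustrative only.
a = np.array([[1, 2, 3], [4, 5, 6]])

# Arrays carry their dimensions with them.
shape = a.shape

# High-level functions operate on whole arrays at once.
col_means = a.mean(axis=0)   # mean of each column
doubled = a * 2              # element-wise multiplication
roots = np.sqrt(a)           # element-wise square root

print(shape)
print(col_means)
print(doubled)
```

Operations like `a * 2` apply to every element without an explicit loop, which is the main reason to reach for NumPy arrays instead of nested Python lists.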

Data Science Getting Started Platform

To get started quickly with data science, I started looking at Python and its powerful set of libraries (like pandas, NumPy, scikit-learn, etc.) that make data analysis easier. I wanted a platform that is accessible over the internet, so I can get to it from any laptop or PC with internet access. I decided to get a minimal Virtual Private Server (VPS) that supports containers, so I can set up a Docker container with all the languages, frameworks, libraries, and tools, and mount a path on the VPS that contains all the projects I am working on, which will be checked in to git.