Getting started with data science in Python
Installation
Use the Anaconda distribution. It makes getting started with data science much easier, because it bundles almost all of the packages you need, so you can start working right away.
$ cd ~/Downloads
$ wget http://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
$ bash Anaconda2-4.1.1-Linux-x86_64.sh
$ source ~/.bashrc
$ conda --version
$ conda update conda
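Once the installer has finished, it can be useful to confirm that the main data science packages are importable. The following is a minimal sketch, not part of the Anaconda tooling: the helper name check_packages and the list of package names are chosen here for illustration, and it assumes a Python 3 interpreter (importlib.util.find_spec is not available in Python 2).

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable.

    Hypothetical helper for illustration; assumes Python 3.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Package list is an assumption: adjust it to whatever you plan to use.
    for name, found in check_packages(["numpy", "pandas", "matplotlib", "seaborn"]).items():
        print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

Any package reported as MISSING can be added with conda install followed by the package name.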
Examples
Make your first Data Frame
#!/usr/bin/env python
import pandas as pd
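df = pd.DataFrame({'A': 1.,
                   'B': pd.Timestamp('20130102'),
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': pd.Series([1, 2, 1, 2], dtype='int32'),
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})
# Sum column 'D' within each category of 'E'. Selecting 'D' before summing
# avoids trying to sum the timestamp and string columns.
print(df.groupby('E')['D'].sum())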
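To see what the grouping step does, it can help to compute several aggregates at once on a smaller frame. This is a sketch that reuses only the 'D' and 'E' columns from the example above; the variable name summary is chosen here, and the observed=True argument assumes pandas 0.23 or later (drop it on older versions).

```python
import pandas as pd

# Same 'D' and 'E' columns as in the DataFrame example above.
df = pd.DataFrame({
    'D': pd.Series([1, 2, 1, 2], dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
})

# Select the column first, then aggregate it per category of 'E'.
# observed=True (assumes pandas >= 0.23) keeps only categories present in the data.
summary = df.groupby('E', observed=True)['D'].agg(['sum', 'mean', 'count'])

print(df.dtypes)   # column types: int32 and category
print(summary)     # one row per category, one column per aggregate
```

Each row of summary corresponds to one category of 'E', so the "test" row aggregates the two rows where D is 1 and the "train" row aggregates the two rows where D is 2.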
Create your first plots
First, install Seaborn (or update it if it is already installed):
$ conda install seaborn
Next, create a plot from an example dataset:
#!/usr/bin/env python
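import matplotlib.pyplot as plt
import seaborn as sns
# Load one of the datasets that ship with seaborn (downloaded on first use)
tips = sns.load_dataset("tips")
print(tips.head())
# Scatter plot with a regression line and marginal distributions
sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg')
# One regression plot per value of "smoker"
sns.lmplot(x="total_bill", y="tip", data=tips, col="smoker")
plt.show()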
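When the script runs on a machine without a display, the plot can be written to an image file instead of being shown. The sketch below makes several assumptions: it uses a small hand-made frame with made-up values rather than the real tips dataset (so it needs no network access), it selects matplotlib's non-interactive "Agg" backend, and the output file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Made-up illustrative values, NOT the real "tips" dataset.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.5, 30.0, 35.5],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.5, 5.0],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
})

# lmplot returns a FacetGrid, which can save itself to an image file.
# ci=None skips the bootstrap confidence band for this tiny sample.
grid = sns.lmplot(x="total_bill", y="tip", data=toy, col="smoker", ci=None)

# Arbitrary output location chosen for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "tips_lmplot.png")
grid.savefig(out_path)
print("saved plot to", out_path)
```

The same savefig call should also work on the object returned by jointplot, since both functions return seaborn grid objects.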