MEVN stack experiment with Jupyter
An experiment to see how far I can go developing an application using only a Jupyter notebook.
Using the Python blockdiag library you can easily create block diagrams, and with Jupyter's nbconvert you can turn the notebook into a presentation.
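A minimal sketch of that workflow, assuming the blockdiag package and its CLI are installed; the diagram definition and file names are illustrative:
import subprocess

# Write a blockdiag definition to disk (the diagram itself is illustrative)
diagram = """
blockdiag {
  browser -> webserver -> database;
  webserver -> cache;
}
"""
with open("pipeline.diag", "w") as f:
    f.write(diagram)

# Render the definition to pipeline.png with the blockdiag CLI
subprocess.run(["blockdiag", "pipeline.diag"], check=True)

# The notebook itself can then be turned into reveal.js slides:
#   jupyter nbconvert --to slides notebook.ipynb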
Create a lagged column in a PySpark DataFrame:
from pyspark.sql.functions import monotonically_increasing_id, lag
from pyspark.sql.window import Window
# Add an ID to be used by the window function (the IDs are
# monotonically increasing, but not necessarily consecutive)
df = df.withColumn('id', monotonically_increasing_id())
# Set the window; ordering without a partitionBy pulls all rows
# into a single partition, so use this on small data only
w = Window.orderBy("id")
# Create the lagged value
value_lag = lag('value').over(w)
# Add the lagged values to a new column
df = df.withColumn('prev_value', value_lag)
This tutorial will go through creating an application using the MEVN stack (MongoDB, Express, Vue.js, Node.js).
A simple trick to select columns from a DataFrame:
# Columns to keep (the list here is illustrative)
DESIRED_COLUMNS = ["id", "value"]
# Create the filter condition: true for columns we do not want
condition = lambda col: col not in DESIRED_COLUMNS
# Drop everything that is not a desired column
filtered_df = df.drop(*filter(condition, df.columns))
Another experiment: using Vue.js, the progressive JavaScript framework, in a Jupyter notebook.
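A minimal sketch of the idea, assuming Vue 2 can be loaded from a CDN inside the notebook; the app and message are illustrative:
from IPython.display import HTML

# Render a small Vue app in the cell output: load Vue from a CDN
# and mount it on a div in the output area
HTML("""
<div id="app">{{ message }}</div>
<script src="https://cdn.jsdelivr.net/npm/vue@2"></script>
<script>
  new Vue({ el: "#app", data: { message: "Hello from Vue in a notebook!" } });
</script>
""")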
In my previous posts I have already shown simple examples of using MapReduce and Spark with PySpark. A missing piece when moving from MapReduce to Spark is the use of Pig scripts. This post shows an example of how to use a Pig script.
This is a short explanation of how to set up a Truffle decentralized app using Docker containers.
Last time I started to experiment with Hadoop and simple scripts using MapReduce and Pig on a Cloudera Docker container. Now let's start playing with Spark, since it is the go-to framework for machine learning on Hadoop.
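As a first step, a minimal sketch that starts a Spark session and builds a toy DataFrame; the app name and data are illustrative:
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; the app name is illustrative
spark = SparkSession.builder.appName("spark-playground").getOrCreate()

# Build a toy DataFrame and inspect it
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()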
This post describes my first experiment with the Cloudera environment, trying out basic MapReduce on a simple dataset.
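For a flavour of what that looks like, a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain Python scripts reading stdin; all file names and paths are illustrative:
# mapper.py -- emit a (word, 1) pair for every word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

# reducer.py -- sum the counts per word (Hadoop sorts mapper output
# by key, so identical words arrive consecutively)
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.strip().split("\t")
    if word != current_word:
        if current_word is not None:
            print(current_word + "\t" + str(count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(current_word + "\t" + str(count))

# Submitted via the streaming jar (path illustrative):
#   hadoop jar hadoop-streaming.jar -input in -output out \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py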