Setting up Spark with MinIO as object storage
To set up one of my data projects, I need (object) storage to save my data. Using Spark, I want to be able to read and write Parquet, CSV and other file formats.
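A minimal sketch of the kind of Spark session this involves, assuming the `hadoop-aws` connector is on the classpath; the MinIO endpoint, credentials and bucket name below are placeholders, not the setup from the post.

```python
from pyspark.sql import SparkSession

# Sketch: a Spark session configured to talk to a local MinIO instance
# through the S3A connector. Endpoint, credentials and bucket are placeholders.
spark = (
    SparkSession.builder
    .appName("minio-example")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")
    .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

# Write a small dataframe to the MinIO bucket as Parquet and read it back.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("s3a://my-bucket/example.parquet")
spark.read.parquet("s3a://my-bucket/example.parquet").show()
```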
My first experiment with Ansible to automate the provisioning of my server.
A simple Terraform deployment of a Lambda function that exports a Looker view to S3.
In this notebook I interact with AWS Glue using `boto3`.
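A minimal sketch of what that interaction can look like; the region and crawler name are placeholders, and credentials are picked up from the usual AWS configuration.

```python
import boto3

# Sketch: talk to AWS Glue from boto3. Region and crawler name are placeholders.
glue = boto3.client("glue", region_name="eu-west-1")

# List the databases in the Glue Data Catalog.
for database in glue.get_databases()["DatabaseList"]:
    print(database["Name"])

# Kick off a (hypothetical) crawler by name.
glue.start_crawler(Name="my-example-crawler")
```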
In this notebook I create a date range with a precision of days and a date range with a precision of a month using `datetime` with `timedelta`. This has helped me automate filtering tasks, where I had to query data for each day in a certain period and write the results to timestamped files.
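A minimal sketch of the idea, with made-up dates and an illustrative output path pattern.

```python
from datetime import date, timedelta

# Daily date range: one entry per day between start and end (inclusive).
start, end = date(2019, 1, 1), date(2019, 3, 15)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Monthly date range: the first day of every month in the same window.
months = []
current = date(start.year, start.month, 1)
while current <= end:
    months.append(current)
    # Move to the first day of the next month.
    next_month = current.month % 12 + 1
    next_year = current.year + current.month // 12
    current = date(next_year, next_month, 1)

# Timestamped output paths, e.g. one file per day (placeholder prefix).
paths = [f"results/{d:%Y-%m-%d}.csv" for d in days]
```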
My attempt to interact with Parquet files on Azure Blob Storage. Reading and writing Pandas dataframes is straightforward, but only the reading part is working with Spark 2.4.0.
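A minimal sketch of the Pandas side, assuming the `azure-storage-blob` (v12) and `pyarrow` packages; the connection string, container and blob names are placeholders.

```python
import io

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Sketch: read and write Parquet on Azure Blob Storage with Pandas.
# Connection string, container and blob names are placeholders.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="my-container", blob="data/example.parquet")

# Write: serialise the dataframe to an in-memory Parquet buffer and upload it.
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
buffer = io.BytesIO()
df.to_parquet(buffer, engine="pyarrow")
blob.upload_blob(buffer.getvalue(), overwrite=True)

# Read: download the blob and load it back into a dataframe.
downloaded = io.BytesIO(blob.download_blob().readall())
print(pd.read_parquet(downloaded, engine="pyarrow"))
```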
This notebook contains a small example that interpolates the values for a sparse dataframe and calculates the difference with a smaller dataframe.
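A minimal sketch of that kind of calculation with made-up numbers, using Pandas' `interpolate`.

```python
import numpy as np
import pandas as pd

# Sketch: a sparse series with gaps, linearly interpolated, then compared
# against a smaller dataframe that only covers part of the index.
sparse = pd.DataFrame({"value": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0]})
filled = sparse.interpolate(method="linear")

# The smaller dataframe only has rows for every other index value.
small = pd.DataFrame({"value": [1.5, 3.5, 5.5]}, index=[0, 2, 4])

# Difference between the interpolated values and the smaller dataframe,
# aligned on the shared index.
diff = filled.loc[small.index, "value"] - small["value"]
print(diff)
```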
A short example of how to interact with S3 from PySpark.
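A minimal sketch, again assuming the `hadoop-aws` connector is available; the bucket and paths are placeholders and credentials come from the environment or an instance profile.

```python
from pyspark.sql import SparkSession

# Sketch: read a CSV file from S3 and write it back as Parquet.
# Bucket and paths are placeholders.
spark = SparkSession.builder.appName("s3-example").getOrCreate()

df = spark.read.csv("s3a://my-bucket/input/data.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("s3a://my-bucket/output/data.parquet")
```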
In this short tutorial I show how I developed my first Glue scripts for the AWS platform.
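A minimal sketch of a Glue PySpark script skeleton; the job argument, catalog database, table and output path are placeholders, not the scripts from the tutorial.

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Sketch: a Glue job skeleton. Database, table and output path are placeholders.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read a table from the Glue Data Catalog and write it to S3 as Parquet.
frame = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table"
)
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="parquet",
)
```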
This tutorial explains how to write a Lambda function in Python, test it locally, deploy it to AWS and test it in the cloud using Amazon's SAM. The `README.md` inside the `cookiecutter` template folder is used as the base of this tutorial.
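A minimal sketch of the kind of handler such a tutorial starts from; the event shape and response are placeholders.

```python
import json


def lambda_handler(event, context):
    """Echo the incoming event back as a JSON response (placeholder logic)."""
    return {
        "statusCode": 200,
        "body": json.dumps({"received": event}),
    }
```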