Serverless Data Engineering: Hands On with AWS Glue, Aurora, and Athena

This post follows up on my recent one, ‘AWS Serverless Analytics: The Promise…’, in which I described the value proposition for serverless analytics. In today’s update, I have a database hosted in Amazon Aurora, which we will crawl and automatically catalog with AWS Glue, load into an S3 data lake using Glue, and…

Python: What is Pandas’ equivalent to a just-slightly complex SQL query?

Our Python journey now takes us into Pandas DataFrames, whose native syntax is very unlike SQL, especially as queries become more analytically complex. We will answer the following question, based on an included public list of employees and their jobs. From a list where one row indicates one employee, how many employee job titles in…

Live Presentation: Lean Data Model Storming For Project Leaders

Data Models are a-changin’! To learn about these changes, please join me Saturday, Oct 15, as I present “Lean Data Model Storming for Data Project Leaders” at the Southland Technology (SoTec) Conference 2016. To view my session abstract, click here. This premier event, underwritten by PMI, AITP, IIBA and QAI, will bring together hundreds of…

Data Preparation Is Easy with Alteryx

Without support from I.T., analysts increasingly need to perform data preparation tasks of varying complexity in order to wrangle data into shape for current analytic needs. Using Alteryx Designer, many such tasks are simple and intuitive. Let’s consider an example. For the completed Alteryx workflow sample published in the Alteryx product documentation, assume that, due to…