July 17, 2020 by Daniel Upton

Amazon QuickSight: Deep Dive

One of my goals in this series on AWS Serverless Analytics has been to demonstrate how Amazon QuickSight allows us to build, share, and secure data visualizations and reports with minimal effort spent managing server hardware, operating systems, or applications. In previous entries, I have explored AWS Glue, S3, Amazon Athena and, at a…

July 6, 2020 by Daniel Upton

Hands On Amazon Athena

Expanding on my recent post on Serverless Data Engineering with AWS Glue: Athena is another AWS managed service, one that lets us query an S3 data lake, connected via the queryable AWS Glue Data Catalog, using the full set of standard SQL, including complex joins, subqueries, string manipulations, and window (aka…

June 29, 2020 (updated July 8, 2020) by Daniel Upton

Serverless Data Engineering: Hands On with AWS Glue, Aurora, and Athena

This post follows up on my recent one, entitled ‘AWS Serverless Analytics: The Promise…’, in which I described the value proposition for serverless analytics. In today’s update, I have a database hosted in Amazon Aurora, which we will crawl and automatically catalog with AWS Glue, load into an S3 data lake using Glue, and…

June 23, 2020 (updated July 16, 2020) by Daniel Upton

Pre-Clinical Biopharmaceutical B&D: Data Modeling Amid Scientific Complexity

The following data model diagram is a reference for the ‘Challenges and Solutions’ entry of the same title, available here. To protect intellectual property, the image is intentionally blurred. It’s not your eyes. (-;

June 19, 2020 (updated August 26, 2020) by Daniel Upton

AWS Serverless Analytics: The Promise

As defined at cloudflare.com, a virtual machine is “software that imitates a complete computer system [my note: an operating system, applications, network interfaces; everything except hardware]. It is isolated from the rest of the machine that hosts it and behaves as if it were the only OS on it…” A container, which does not have…

May 18, 2020 (updated June 12, 2020) by Daniel Upton

PySpark or SparkSQL for Data Wrangling

Apache Spark is established as a strong data processing engine for data workflows that are large or complex enough to benefit from distributed processing across multiple compute nodes. I’ve created this demo on a Spark instance that I spun up effortlessly and free of charge in Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a foundation…

May 5, 2020 (updated June 19, 2020) by Daniel Upton

Python: What is Pandas’ equivalent to a just-slightly complex SQL query?

Our Python journey now takes us into Pandas DataFrames, with a native syntax very unlike SQL, especially as queries become more analytically complex. We will answer the following question, based on an included public list of employees and their jobs. From a list where one row indicates one employee, how many employee job titles in…

May 1, 2020 (updated May 31, 2020) by Daniel Upton

NumPy: Index, Slice, and Aggregate a 2D Array

Python’s NumPy library is fun because it makes multi-dimensional data easy to work with. For simplicity, consider a 2D array (aka matrix). I wrote some code to demonstrate the creation, simple visualization, slicing, and aggregation of data within a matrix, including totals and slice-subtotals. Source Code: It is available on GitHub: NumPy 2D Array…
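As a rough illustration of the creation, slicing, and aggregation mentioned in this excerpt, here is a minimal sketch. It is not the code from the linked post; the array contents and the variable names (`matrix`, `block`, and so on) are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical 3x4 matrix holding the integers 0..11 (not the post's data).
matrix = np.arange(12).reshape(3, 4)

# Indexing and slicing: one row, and a 2x2 block from the first two rows.
first_row = matrix[0, :]
block = matrix[0:2, 1:3]

# Aggregation: grand total, per-row and per-column totals, slice subtotal.
grand_total = matrix.sum()
row_totals = matrix.sum(axis=1)   # one total per row
col_totals = matrix.sum(axis=0)   # one total per column
block_subtotal = block.sum()

print(matrix)
print("Grand total:", grand_total)
print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Block subtotal:", block_subtotal)
```

Passing `axis=1` collapses the columns to give one total per row, while `axis=0` collapses the rows to give one total per column.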

April 29, 2020 (updated June 4, 2020) by Daniel Upton

Python Object-Oriented Programming: Doing Math Just Once Beats Repetition

Although I don’t know whether OOP will be central to our exploration of NumPy, Pandas and other Python libraries for analytics, here is a simple example of what I find useful. I want to be able to perform any one of a set of related x,y matrix expressions, and do so repeatedly without re-specifying…

April 28, 2020 (updated May 18, 2020) by Daniel Upton

Python Moment: Is ‘Never Odd or Even’ a Palindrome?

Quick little geek-out here: Had some initial fun with Python string manipulations in order to detect a palindrome, defined here as a word or phrase (perhaps a very long phrase) spelled the same when reversed as when forward. Had to dig just a bit deeper to accommodate any blank spaces that would otherwise violate the…
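One possible way to sketch the check described in this excerpt is shown below. This is not the code from the post itself: the function name `is_palindrome` is hypothetical, and lowercasing the phrase is an added assumption (the excerpt mentions only blank spaces) so that the capitalized title phrase compares equal in both directions.

```python
def is_palindrome(phrase: str) -> bool:
    """Return True if the phrase reads the same forward and reversed.

    Whitespace is removed before comparing. Lowercasing is an assumption
    added here so that 'Never Odd or Even' is not rejected over letter case.
    """
    normalized = "".join(phrase.split()).lower()
    return normalized == normalized[::-1]


print(is_palindrome("Never Odd or Even"))  # True
print(is_palindrome("Python"))             # False
```

Calling `split()` with no arguments breaks the phrase on any run of whitespace, so joining the pieces drops every blank space before the reversed comparison.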