PySpark or SparkSQL for Data Wrangling

Apache Spark is well established as a data processing engine for workflows that are large and/or complex enough to benefit from distributed processing across multiple computing nodes. I created this demo on a Spark instance I spun up easily and free of charge in Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a technical…

Python: What is Pandas’ equivalent to a just-slightly complex SQL query?

Our Python journey now takes us into Pandas DataFrames, whose native syntax is very unlike SQL, especially as queries become more analytically complex. We will answer the following question, based on an included public list of employees and their jobs. From a list where one row indicates one employee, how many employee job titles in…
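
As a hedged illustration of the kind of SQL-to-Pandas translation the post discusses, here is a minimal sketch that runs one `GROUP BY ... HAVING` query through an in-memory SQLite database and then reproduces it with Pandas method chaining. The six-row employee table, its column names (`department`, `job_title`), and the headcount threshold are hypothetical; they are not the post's included dataset or its actual question.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data: one row per employee (not the post's actual dataset).
employees = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5, 6],
        "department": ["Sales", "Sales", "Sales", "IT", "IT", "HR"],
        "job_title": ["Rep", "Rep", "Manager", "Analyst", "Analyst", "Recruiter"],
    }
)

# SQL version: headcount per (department, job title), keeping titles held by 2+ people.
SQL = """
SELECT department, job_title, COUNT(*) AS headcount
FROM employees
GROUP BY department, job_title
HAVING COUNT(*) >= 2
ORDER BY department, job_title
"""


def run_sql(df: pd.DataFrame) -> pd.DataFrame:
    """Load the DataFrame into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql("employees", conn, index=False)
        return pd.read_sql_query(SQL, conn)
    finally:
        conn.close()


def run_pandas(df: pd.DataFrame) -> pd.DataFrame:
    """The same logic expressed with Pandas method chaining."""
    return (
        df.groupby(["department", "job_title"])    # GROUP BY department, job_title
        .size()                                    # COUNT(*)
        .reset_index(name="headcount")             # turn the group keys back into columns
        .query("headcount >= 2")                   # HAVING COUNT(*) >= 2
        .sort_values(["department", "job_title"])  # ORDER BY department, job_title
        .reset_index(drop=True)
    )


if __name__ == "__main__":
    print(run_sql(employees))
    print(run_pandas(employees))
```

Both functions should return the same two rows (IT/Analyst and Sales/Rep, each with a headcount of 2). Note how the single declarative SQL statement becomes a sequence of chained method calls in Pandas, with `HAVING` mapping to a filter applied after the aggregation.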

Right-brain Data Visualizations: What works? Why?

Let’s set aside technical considerations and just explore a few unorthodox data visuals. Why? Because doing so helps us reward viewers’ eyes and brains. At work, our eyes and brains are often forced to slog through repetitive logic and boring visual symbols that leave us uninspired, but an interesting visual can nudge us in…

Live Presentation: Lean Data Model Storming For Project Leaders

Data Models are a’changin! To learn about these changes, please join me Saturday, Oct 15, as I present “Lean Data Model Storming for Data Project Leaders” at the Southland Technology (SoTec) Conference 2016. To view my session abstract, click here. This premier event, underwritten by PMI, AITP, IIBA and QAI, will bring together hundreds of…

Data Preparation Is Easy with Alteryx

Without support from I.T., analysts increasingly need to perform data preparation tasks of varying complexity to wrangle data into shape for current analytic needs. With Alteryx Designer, many such tasks are simple and intuitive. Let’s consider an example. For the completed Alteryx workflow sample published in Alteryx product documentation, assume that, due to…

Fun with Calculations in Tableau

A recent customer gave me a chance to exercise Tableau features on both Server and Desktop. Although this write-up focuses more on calculation than visualization, it also demonstrates my approach to visualizing complex data with a simple, impossible-to-misunderstand presentation. My customer wanted visualizations that would bring simplicity to a rather complex…