PySpark or SparkSQL for Data Wrangling

Apache Spark is a well-established data processing engine for workflows large and/or complex enough to benefit from distributed processing across multiple compute nodes. I created this demo on a Spark instance I spun up effortlessly and free of charge in the Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a technical… Read More

Python: What is Pandas’ equivalent to a just-slightly complex SQL query?

Our Python journey now takes us into Pandas DataFrames, whose native syntax diverges sharply from SQL, especially as queries grow more analytically complex. We will answer the following question, based on an included public list of employees and their jobs. From a list where one row represents one employee, how many employee job titles in… Read More
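As a hedged sketch of the kind of translation the post explores, a SQL `GROUP BY ... COUNT(*)` over an employee list maps to a pandas `groupby` chain. The `employees` frame and its column names are hypothetical placeholders, since the post's actual dataset is not shown in this excerpt.

```python
import pandas as pd

# Hypothetical employee list: one row per employee.
employees = pd.DataFrame({
    "employee": ["Alice", "Bob", "Cara", "Dan"],
    "job_title": ["Engineer", "Analyst", "Engineer", "Manager"],
})

# SQL equivalent:
#   SELECT job_title, COUNT(*) AS n
#   FROM employees
#   GROUP BY job_title
#   ORDER BY n DESC;
title_counts = (
    employees.groupby("job_title")
    .size()
    .reset_index(name="n")
    .sort_values("n", ascending=False)
)
print(title_counts)
```

Note the shape of the translation: SQL's declarative clauses become a chain of method calls, and `reset_index(name="n")` is needed to turn the grouped result back into a flat table with a named count column.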