May 18, 2020 (updated June 12, 2020) by Daniel Upton

PySpark or SparkSQL for Data Wrangling

Apache Spark is an established data processing engine for workflows that are large or complex enough to benefit from distributed processing across multiple compute nodes. I created this demo on a Spark instance that I spun up effortlessly and free of charge in Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a foundation…

September 22, 2019 (updated September 28, 2019) by Daniel Upton

Right-brain Data Visualizations: What works? Why?

Let’s set aside technical considerations and just explore a few unorthodox data visuals. Why? Because doing so helps us reward viewers’ eyes and brains. At work, our eyes and brains are often forced to slog through repetitive logic and boring visual symbols that leave us uninspired, but an interesting visual can nudge us in…

DecisionLab.Net

Tag: Data
