PySpark or SparkSQL for Data Wrangling

Apache Spark is a well-established data processing engine for workflows that are large or complex enough to benefit from distributed processing across multiple compute nodes. I created this demo on a Spark instance that I spun up effortlessly and free of charge in Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a foundation… Read More

Right-brain Data Visualizations: What works? Why?

Let’s set aside technical considerations and just explore a few unorthodox data visuals. Why? Because doing so helps us reward viewers’ eyes and brains. At work, our eyes and brains are often forced to slog through repetitive logic and boring visual symbols that leave us uninspired, but an interesting visual can nudge us in… Read More