PySpark or SparkSQL for Data Wrangling

Apache Spark is a well-established data processing engine for workflows that are large or complex enough to benefit from distributed processing across multiple computing nodes. I built this demo on a Spark instance I spun up easily and free of charge in Databricks Community Edition. While RDDs (Resilient Distributed Datasets) remain a technical… Read More

Right-brain Data Visualizations: What works? Why?

Let’s set aside technical considerations and just explore a few unorthodox data visuals. Why? Because doing so helps us reward viewers’ eyes and brains. At work, our eyes and brains are often forced to slog through repetitive logic and boring visual symbols that leave us uninspired, but an interesting visual can nudge us in… Read More