Fundamentals of Scalable Data Science
Number of badges issued: 3609
This badge earner has proven a deep understanding of massively parallel data processing on Apache Spark. They have mastered low-level functional programming in Python using the Resilient Distributed Dataset (RDD) API, as well as relational data processing using Apache Spark SQL and the DataFrame API. Earners understand how data processing and machine learning can be parallelized on scale-out clusters, and can compute statistical measures, integrate and transform data, and create advanced visualizations.