AEG

Apache Spark (SPark101)


Description

An introduction to Apache Spark, a powerful distributed data processing framework.

Content
  • Introduction to Apache Spark
  • Apache Spark vs MapReduce-1
  • Apache Spark vs MapReduce-2
  • Spark assignment
  • Resilient distributed datasets-1
  • Resilient distributed datasets-2
  • RDD assignment
  • File Formats Supported in Apache Spark
  • What is a DataFrame
  • How a Spark DataFrame Differs from a pandas DataFrame
  • Different Ways to Create a DataFrame
  • DataFrames assignment
  • Spark Transformations, Actions and Lazy Evaluation
  • Read and Write files using PySpark
  • display() and show() in PySpark
  • "Select" on DataFrames
  • withColumn in PySpark
  • Drop Columns in PySpark
  • Rename Columns in PySpark
  • PySpark Filter vs Where
  • PySpark orderBy() and sort()
  • PySpark GroupBy()
  • Join Strategies
  • Joins in PySpark
  • PySpark Union
  • When-Case in PySpark
  • PySpark Window Functions
  • COLLECT_LIST() and COLLECT_SET() in PySpark
  • PySpark Date & Time Functions
  • Null Handling in PySpark-1
  • Null Handling in PySpark-2
  • Types of Apache Spark tables and views
  • Shared Variables in PySpark
  • How is Apache Spark fault tolerant?
  • Partitioning in Spark
  • Partitioning vs Bucketing
  • Handling Data Skewness in Apache Spark
  • Catalyst Optimizer in Apache Spark
  • Optimizing Delta Tables using Vacuum and Optimize
Completion rules
  • All units must be completed