Transformations and Actions in Spark with Examples

In Apache Spark, transformations and actions are the two types of operations that can be performed on RDDs (Resilient Distributed Datasets) or DataFrames/Datasets.

Transformations are operations on an RDD or DataFrame/Dataset that produce a new RDD or DataFrame/Dataset. Transformations are evaluated lazily: Spark records them in a lineage graph but does not execute them until an action is called. Some examples of transformations include:

  • map: applies a function to each element of an RDD or DataFrame/Dataset
  • filter: filters elements of an RDD or DataFrame/Dataset based on a given condition
  • groupBy: groups elements of an RDD or DataFrame/Dataset based on a given key
  • distinct: removes duplicate elements from an RDD or DataFrame/Dataset
  • flatMap: applies a function to each element of an RDD or DataFrame/Dataset and flattens the results

Actions are operations that return a value to the driver program or produce a side effect, such as writing data to disk. Unlike transformations, actions execute eagerly: calling one triggers the execution of every pending transformation in its lineage. Some examples of actions include:

  • count: returns the number of elements in an RDD or DataFrame/Dataset
  • first: returns the first element of an RDD or DataFrame/Dataset
  • reduce: aggregates elements of an RDD or DataFrame/Dataset using a given function
  • foreach: applies a function to each element of an RDD or DataFrame/Dataset
  • collect: returns all elements of an RDD or DataFrame/Dataset to the driver program

Narrow transformations

A narrow transformation is one in which each partition of the output RDD or DataFrame/Dataset depends on at most one partition of the input. Because every output partition can be computed from a single input partition, Spark can execute narrow transformations entirely locally, with no shuffling of data across the network. Examples include map, filter, flatMap, and union.

Because no shuffle is needed, Spark pipelines consecutive narrow transformations together into a single stage. Even joins can sometimes be made narrow: in a broadcast (map-side) join, the smaller dataset is copied to every executor and the join is performed locally on each partition, avoiding a full shuffle of the larger dataset.

Wide transformations

A wide transformation is one in which each partition of the output can depend on many partitions of the input. Computing it requires a shuffle: rows must be redistributed across the network so that all records sharing a key end up in the same partition. Shuffles are expensive, so wide transformations are the main performance boundary in a Spark job, and each one introduces a new stage in the physical execution plan.

Examples of wide transformations include groupByKey, reduceByKey, distinct, repartition, and shuffle-based joins such as the sort-merge join, where both DataFrames/Datasets are repartitioned on the join key across the network and then joined partition by partition.