Spark

Convert your hive output to Json format

Using CONCAT and string manipulation functions: SELECT CONCAT('{ "column1": "', column1, '", "column2": "', column2, '", "column3": "', column3, '" }') AS json_data FROM your_table; Using a custom UDF (User-Defined Function): If the built-in...
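Hand-rolled CONCAT breaks as soon as a value contains a quote or other character that needs escaping. The same row-to-JSON idea can be sketched outside Hive in plain Python (assuming a row comes back as a dict; the column values below are invented for illustration):

```python
import json

# A row as it might come back from a Hive/Spark query (hypothetical values).
row = {"column1": "alice", "column2": 'say "hi"', "column3": "c3"}

# json.dumps escapes the embedded quotes, which the CONCAT approach does not.
json_data = json.dumps(row)
print(json_data)
```

The embedded double quote in column2 comes out as \" in the JSON string, where the CONCAT version would produce invalid JSON.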

examples of transformations in PySpark

Here are some examples of transformations in PySpark: Map map(func) – Applies the given function to each element of the dataset and returns a new dataset with the results. For example: from pyspark.sql.functions import...
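The element-wise behaviour of map can be sketched without a cluster, using a local list to stand in for a distributed dataset (the doubling lambda is an arbitrary example, not from the post):

```python
# RDD-style map, sketched locally: rdd.map(lambda x: x * 2) applies the
# function to every element and yields a new dataset with the results.
data = [1, 2, 3, 4]

doubled = list(map(lambda x: x * 2, data))
print(doubled)  # [2, 4, 6, 8]
```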

Transformations and Actions in Spark with example

In Apache Spark, transformation and action are two types of operations that can be performed on RDDs (Resilient Distributed Datasets) or DataFrames/Datasets. Transformations are operations that are performed on an RDD or DataFrame/Dataset that...
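The key distinction — transformations are lazy and only build a lineage, while actions force evaluation — can be sketched with a Python generator standing in for an RDD (a local analogy, not the Spark API):

```python
calls = []

def track(x):
    # Record each invocation so we can observe when work actually happens.
    calls.append(x)
    return x * 10

data = [1, 2, 3]

# "Transformation": building the generator runs nothing yet, just as
# rdd.map() only records the operation in the lineage.
pipeline = (track(x) for x in data)
assert calls == []  # no element has been processed so far

# "Action": collect() or count() force the computation; list() does here.
result = list(pipeline)
assert calls == [1, 2, 3]
print(result)  # [10, 20, 30]
```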

Hadoop vs Apache Storm vs Apache Beam

Hadoop, Apache Storm, Apache Beam, and Apache Spark are all open-source big data processing frameworks that are used to process large amounts of data. However, each of these frameworks has its own strengths and...

How do we trigger automated clean-ups in Spark?

To trigger automated clean-ups in Spark, you can use the spark.cleaner.ttl configuration property. This property specifies how long (in seconds) Spark retains metadata such as generated stages and tasks before it is periodically cleaned up. If...
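For example, as a spark-defaults.conf entry (the 3600-second value is an arbitrary illustration; newer Spark versions clean up via the ContextCleaner automatically, and this property may no longer be honoured there):

```properties
# spark-defaults.conf — retain metadata for one hour before periodic cleanup
spark.cleaner.ttl  3600
```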

Apache Spark vs Apache Flink vs Apache Storm

Apache Spark: It is a powerful, in-memory data processing engine that is designed to be fast, easy to use, and flexible. It can handle batch, interactive, and streaming workloads, and can also be used...

outputMode method in spark

The outputMode method specifies how the output of the streaming query should be written to the sink. There are three possible output modes: "append": This mode writes the output of the streaming query to...
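The difference between the three modes can be sketched with a toy model of a sink receiving a running word count trigger by trigger (pure Python, no Spark; the data and names are invented for illustration):

```python
# Running word-count state after each trigger (toy data, no Spark involved).
trigger_results = [
    {"spark": 1},               # state after trigger 1
    {"spark": 2, "flink": 1},   # state after trigger 2
]

append_sink, complete_sink, update_sink = [], [], []
previous = {}
for state in trigger_results:
    # "complete": the sink receives the entire result table every trigger.
    complete_sink.append(dict(state))
    # "append": only rows never emitted before reach the sink — which is
    # why append mode cannot be used when aggregations update old rows.
    append_sink.append({k: v for k, v in state.items() if k not in previous})
    # "update": only rows that changed since the last trigger are written.
    update_sink.append({k: v for k, v in state.items()
                        if previous.get(k) != v})
    previous = state

print(complete_sink)  # [{'spark': 1}, {'spark': 2, 'flink': 1}]
print(append_sink)    # [{'spark': 1}, {'flink': 1}]
print(update_sink)    # [{'spark': 1}, {'spark': 2, 'flink': 1}]
```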