Uncategorized

Exception spark

try {
  // Spark SQL code that might throw an exception
  spark.sql("SELECT * FROM invalid_table")
} catch {
  case e: org.apache.spark.sql.AnalysisException => println("Analysis exception: " + e.getMessage)
  case e: org.apache.spark.sql.catalyst.parser.ParseException => println("Parse exception: "...
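
A minimal sketch of the same pattern, assuming Spark 3.x on the classpath. In Spark the parser exception lives at org.apache.spark.sql.catalyst.parser.ParseException and extends AnalysisException, so a catch block that lists AnalysisException first never reaches the ParseException case; the more specific case goes first here. The object and helper names (SparkSqlErrors, runQuery) are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.parser.ParseException

object SparkSqlErrors {
  // Hypothetical helper: runs a query and reports which kind of failure occurred.
  def runQuery(spark: SparkSession, query: String): String =
    try {
      spark.sql(query).collect() // collect() forces execution so runtime errors surface too
      "ok"
    } catch {
      // ParseException is a subclass of AnalysisException, so match it first.
      case e: ParseException    => "Parse exception: " + e.getMessage
      case e: AnalysisException => "Analysis exception: " + e.getMessage
      case e: Exception         => "Other exception: " + e.getMessage
    }
}
```

Called as SparkSqlErrors.runQuery(spark, "SELECT * FROM invalid_table"), this should take the AnalysisException branch, while a misspelled keyword such as "SELEC 1" should take the ParseException branch.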

spark to oracle

To query an Oracle table using Spark, you need to set up a JDBC connection to the Oracle database. Here’s a step-by-step approach:
Prerequisites:
Oracle JDBC Driver: Ensure the Oracle JDBC driver (ojdbc8.jar) is...
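
As a rough illustration of the JDBC setup, the sketch below builds the option map that Spark's JDBC data source reads. It assumes the Oracle thin driver's service-name URL form; the object name OracleJdbc and all host, table, and credential values are hypothetical placeholders, not details from the original post.

```scala
// Hypothetical helper: builds the options Spark's JDBC data source expects.
object OracleJdbc {
  // Thin-driver URL in service-name form: jdbc:oracle:thin:@//host:port/service
  def url(host: String, port: Int, serviceName: String): String =
    s"jdbc:oracle:thin:@//$host:$port/$serviceName"

  def options(host: String, port: Int, serviceName: String,
              table: String, user: String, password: String): Map[String, String] =
    Map(
      "url"      -> url(host, port, serviceName),
      "dbtable"  -> table,                      // table name, optionally schema-qualified
      "user"     -> user,
      "password" -> password,
      "driver"   -> "oracle.jdbc.OracleDriver"  // class shipped in ojdbc8.jar
    )
}
```

With a SparkSession in scope, the map can be passed as spark.read.format("jdbc").options(OracleJdbc.options(...)).load(). The driver JAR still has to be on the classpath, for example through spark-submit --jars.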

new

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import java.util.regex.Pattern

// Initialize Spark session
val spark = SparkSession.builder()
  .appName("Dynamic SQL Execution with JSON Conversion")
  .enableHiveSupport()
  .getOrCreate()

// Load the table containing SQL queries
val sqlQueriesDF =...

sql to json

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Initialize Spark session
val spark = SparkSession.builder()
  .appName("Convert Columns to JSON with Actual Column Names")
  .enableHiveSupport()
  .getOrCreate()

// Load the table that contains the JSON mappings
val...

spark optimization for big cluster

1. Increase Shuffle Partitions
Given the size of your cluster, you can increase the shuffle partitions significantly to leverage the parallelism.
spark.conf.set("spark.sql.shuffle.partitions", 1500) // Adjust as necessary
2. Increase Executor Memory and Cores
With...
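
One way to arrive at a starting value such as 1500 is to scale with the total number of executor cores. The sketch below assumes the rule of thumb from Spark's tuning guide of roughly 2 to 3 tasks per CPU core. The object and function names and the example cluster shape are hypothetical, not figures from the original post.

```scala
object ShuffleSizing {
  // Hypothetical heuristic: total executor cores times a tasks-per-core factor.
  def suggestedShufflePartitions(executors: Int,
                                 coresPerExecutor: Int,
                                 tasksPerCore: Int = 3): Int = {
    require(executors > 0 && coresPerExecutor > 0 && tasksPerCore > 0,
      "all inputs must be positive")
    executors * coresPerExecutor * tasksPerCore
  }
}
```

For an assumed cluster of 50 executors with 10 cores each, ShuffleSizing.suggestedShufflePartitions(50, 10) gives 1500, which can then be passed to spark.conf.set("spark.sql.shuffle.partitions", ...). Treat the result as a starting point and adjust it after looking at actual shuffle sizes.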

run sqlplus in shell script

#!/bin/bash

# Define database connection details
DB_USER="your_username"
DB_PASS="your_password"
DB_HOST="your_db_host"
DB_SID="your_db_sid"

# Check if enough parameters are provided
if [ "$#" -lt 2 ]; then
  echo "Usage: $0 <source_table> <sql_file1> [sql_file2]"
  exit 1
fi

SOURCE_TABLE=$1...

Dynamic Script to Create Table Using Input table

This script will:
Dynamically determine the configuration file name based on the output Hive table name.
Read the configuration file to get the columns.
Extract the schema from the Hive table.
Generate the Oracle...

Different ways of DATA STORAGE

Various techniques for storing data
Cloud Storage: Widely adopted for its scalability, flexibility, and cost-effectiveness, cloud storage solutions like Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage offer virtually unlimited storage capacity,...

Create and Store output in hive table of JSON data

Using hive

-- Create the target table
CREATE TABLE combined_json_table (
  id INT,
  column1 STRING,
  column2 INT,
  json_data STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Insert...

RabbitMQ

RabbitMQ is an open-source message broker that implements the Advanced Message Queuing Protocol (AMQP). It supports many use cases, including queuing, publish-subscribe, and event-driven architectures. It is developed by...