How to query CSV using pure Spark SQL

I want to get output from the spark-sql CLI, but the data is in a CSV file delimited by "\t". Is there any way to do this using pure SQL?

Something like: spark-sql -e 'select * from csv.`xxx` where xxx=xxx'

Is the only way to create a temp view first and then query that view?



Solution 1:[1]

You can create a table over the existing data, specifying all the options, like this:

create table if not exists table_name
using csv
OPTIONS (header "true", inferSchema "true", path ".....csv");

and then query it in subsequent calls. Alternatively, you can put this statement in a .sql file and pass the file name via the -i option, so it is executed before your select query.
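As a sketch of that second approach (the file name init.sql and the column name are illustrative, not from the original answer):

```shell
# init.sql (illustrative name) holds the CREATE TABLE statement above.
# spark-sql runs the -i file first, then the -e query against the table it created.
spark-sql -i init.sql -e 'select * from table_name where some_column = "some_value"'
```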

Solution 2:[2]

This command works by specifying the "\t" delimiter and creating a table:

spark-sql -e 'create table if not exists benchmark_eraseme
using csv 
OPTIONS (header "true", delimiter "\t", path "/home/ec2-user/demo/customer_benchmark.csv"); 
select * from benchmark_eraseme;'

File:

cat ~/demo/customer_benchmark.csv
id_customer     description
49      client_1
8       Benchmark Gold
6       Benchmark Silver
7       Benchmark Regular

Output:

Time taken: 8.467 seconds
49      client_1
8       Benchmark Gold
6       Benchmark Silver
7       Benchmark Regular
Time taken: 0.548 seconds, Fetched 4 row(s)

I tried using a temporary view, but at least in Spark 3.2.1 it was not possible to create a temporary view directly from a CSV file.
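For reference, Spark's SQL documentation does describe a CREATE TEMPORARY VIEW ... USING syntax for data sources; whether it accepts a CSV path may depend on the Spark version, so this is only a sketch reusing the file path from the example above (the view name benchmark_view is illustrative):

```shell
# Temporary view over the same TSV file; avoids persisting a table in the metastore.
spark-sql -e 'create temporary view benchmark_view
using csv
OPTIONS (header "true", delimiter "\t", path "/home/ec2-user/demo/customer_benchmark.csv");
select * from benchmark_view;'
```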

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Alex Ott
Solution 2: Luis Estrada