Is there a difference between PySpark and SparkSQL? If so, what's the difference?

Long story short, my first task at my new job is to convert files from SparkSQL to PySpark.

However, I'm unable to see many differences outside of syntax. Is SparkSQL an earlier version of PySpark, a component of it, or something different altogether?

And yes, it's my first time using these tools. But I have experience with both Python and SQL, so it doesn't seem to be that difficult a task. I just want a better understanding.

Example of the syntax difference I'm referring to:

    from pyspark.sql import functions as F

    df = (
        spark.read.table("db.table1").alias("a")
        .filter(F.col("a.field1") == 11)
        .join(
            other=spark.read.table("db.table2").alias("b"),
            on="field2",
            how="left",
        )
    )
Versus

    df = spark.sql(
        """
        SELECT b.field1,
               CASE WHEN ...
                    THEN ...
                    ELSE ...
               END AS field2
        FROM db.table1 a
        LEFT JOIN db.table2 b
          ON a.field1 = b.field1
        WHERE a.field1 = {}
        """.format(field1)
    )


Solution 1:[1]

From the documentation: PySpark is the Python interface to Spark, and within it you have access to Spark's components, viz. Spark Core, Spark SQL, Spark Streaming, and Spark MLlib. So Spark SQL is not an earlier version of PySpark; it is one of the components you use through PySpark.
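In practice, both styles in your example live in the same pyspark.sql module: spark.sql() and the DataFrame builder methods each return a pyspark.sql.DataFrame, so they are two front ends to the same engine. A minimal sketch of that equivalence (the table and column names are just the placeholders from your question):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Spark SQL front end: the query is a plain SQL string.
    sql_df = spark.sql("SELECT field1, field2 FROM db.table1 WHERE field1 = 11")

    # DataFrame API front end: the same logic expressed as Python method calls.
    api_df = (
        spark.read.table("db.table1")
        .where(F.col("field1") == 11)
        .select("field1", "field2")
    )

    # Both are pyspark.sql.DataFrame objects and go through the same optimizer;
    # explain() lets you compare the resulting plans.
    sql_df.explain()
    api_df.explain()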

Coming to your assignment, it looks like you've been asked to translate SQL-heavy code (strings passed to spark.sql) into the equivalent, more PySpark-idiomatic DataFrame API calls.
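As a rough illustration of what that translation can look like, here is the SQL from your question rewritten with the DataFrame API. This is only a sketch: the CASE expression is elided in the original, so the F.when(...) conditions and values below are made up, and field1_value stands in for the value you interpolate with .format(field1).

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    field1_value = 11  # placeholder for the value interpolated via .format(field1)

    a = spark.read.table("db.table1").alias("a")
    b = spark.read.table("db.table2").alias("b")

    df = (
        a.join(b, on=F.col("a.field1") == F.col("b.field1"), how="left")
        .where(F.col("a.field1") == field1_value)
        .select(
            F.col("b.field1"),
            # stand-in for the elided CASE WHEN ... THEN ... ELSE ... END
            F.when(F.col("a.field1") > 0, F.lit("then_value"))
            .otherwise(F.lit("else_value"))
            .alias("field2"),
        )
    )

Building the filter and the CASE logic as column expressions like this also avoids formatting values directly into a SQL string, which is one practical reason teams prefer the DataFrame style.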

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: fuzzy-memory