PATINDEX in Spark SQL

I have this statement in SQL:

CASE WHEN AAAA IS NOT NULL THEN AAAA
     ELSE RTRIM(LEFT(BBBB, PATINDEX('%[0-9]%', BBBB) - 1))
     END AS NAME

I need to convert this to Spark SQL. I tried using indexOf, but it doesn't accept the pattern '%[0-9]%'. How do I convert the above statement to Spark SQL? Please help.

Thanks!



Solution 1:[1]

My code to do this in Scala Spark, using a UDF. Edit: this assumes the string needs to be cut at the first occurrence of a number.

import spark.implicits._

// Sample data matching the result shown below.
val df = Seq(
  "SOUTH TEXAS SYNDICATE 454C",
  "SANDERS 34-27 #3TF",
  "K. R. BRACKEN B 3H",
  "ALEXANDER-WESSENDORFF 1 (SA) A5 A 5H",
  "USZYNSKI-FURLOW (SA) B 3H")
  .toDF("name")

df.createOrReplaceTempView("temp")

// Index of the first digit in the string (assumes at least one digit is present).
val getIndexOfFirstNumber = (s: String) => {
  val digits = s.split("\\D+").filter(_.nonEmpty).toList
  s.indexOf(digits.head)
}
spark.udf.register("getIndexOfFirstNumber", getIndexOfFirstNumber)

spark.sql("""
select name, substr(name, 0, getIndexOfFirstNumber(name) - 1) as final_name
from temp
""").show(20, false)

Result:

   +------------------------------------+----------------------+
   |name                                |final_name            |
   +------------------------------------+----------------------+
   |SOUTH TEXAS SYNDICATE 454C          |SOUTH TEXAS SYNDICATE |
   |SANDERS 34-27 #3TF                  |SANDERS               |
   |K. R. BRACKEN B 3H                  |K. R. BRACKEN B       |
   |ALEXANDER-WESSENDORFF 1 (SA) A5 A 5H|ALEXANDER-WESSENDORFF |
   |USZYNSKI-FURLOW (SA) B 3H           |USZYNSKI-FURLOW (SA) B|
   +------------------------------------+----------------------+
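
If you would rather avoid a UDF, a similar result can usually be obtained with built-in functions. Below is a minimal sketch (written in PySpark, though the SQL string itself works unchanged from the Scala API) that uses regexp_extract to take everything before the first digit and rtrim to drop the trailing space, assuming the same temp view as above.

# Alternative without a UDF: regexp_extract keeps the leading run of
# non-digit characters; rtrim removes the space left before the first number.
spark.sql("""
select name,
       rtrim(regexp_extract(name, '^[^0-9]*', 0)) as final_name
from temp
""").show(20, truncate=False)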

Solution 2:[2]

Based on Manish's answer I built this; it is more generic and written in Python. You can use it in Spark SQL as well. The example below matches the string DATE instead of a number pattern.

import re

# Mirrors T-SQL PATINDEX: returns the 1-based position of the first match of
# `pattern` in `s`, or 0 when there is no match or `s` is None.
def PATINDEX(pattern, s):
    if s:
        match = re.search(pattern, s)
        if match:
            return match.start() + 1
        else:
            return 0
    else:
        return 0

spark.udf.register("PATINDEX", PATINDEX)

PATINDEX('DATE', 'a2aDATEs2s')  # returns 4
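
To call the registered function from Spark SQL in the shape of the original statement, something like the sketch below should work. Note that AAAA and BBBB are the columns from the question, some_table is a hypothetical table name, and the pattern uses Python regex syntax ('[0-9]' rather than '%[0-9]%'). Registering with an explicit IntegerType also keeps the return value numeric, since PySpark UDFs default to returning strings.

from pyspark.sql.types import IntegerType

# Re-register with an explicit return type so PATINDEX(...) - 1 stays numeric.
spark.udf.register("PATINDEX", PATINDEX, IntegerType())

# some_table is a placeholder; AAAA and BBBB are the columns from the question.
spark.sql("""
select case when AAAA is not null then AAAA
            else rtrim(substring(BBBB, 1, PATINDEX('[0-9]', BBBB) - 1))
       end as NAME
from some_table
""")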

Solution 3:[3]

You can use the method below to remove leading zeroes in Databricks or Spark SQL.

REPLACE(LTRIM(REPLACE('0000123045','0',' ')),' ','0')

EXPLANATION:

  • The inner REPLACE turns every zero into a space. Example: '    123 45'

  • LTRIM then removes the leading spaces. Example: '123 45'

  • The outer REPLACE turns the remaining spaces back into zeroes. Example: '123045'

Similarly, you can combine the same trick with RTRIM to remove trailing zeroes.
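
For instance, a minimal sketch of the trailing-zero variant (shown here through a PySpark spark.sql call; '1230450000' is just an illustrative value):

# RTRIM variant: strips trailing zeroes while keeping interior zeroes intact.
spark.sql("select replace(rtrim(replace('1230450000', '0', ' ')), ' ', '0') as result").show()
# result: 123045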

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Jose Macedo
Solution 3    noobprogrammer