Tests in Foundry all pass when run one-by-one but many fail when run as part of a suite

I have many tests that all pass when run individually, but 15% of them fail during a full build, when I try to build a data set.

Of the 15% that fail, most of them fail with:

  E                   py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.upper. Trace:
  E                   py4j.Py4JException: Method upper([class java.lang.String]) does not exist
  E                     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
  E                     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
  E                     at py4j.Gateway.invoke(Gateway.java:276)
  E                     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  E                     at py4j.commands.CallCommand.execute(CallCommand.java:79)
  E                     at py4j.GatewayConnection.run(GatewayConnection.java:238)
  E                     at java.base/java.lang.Thread.run(Thread.java:829)

The stack trace leading up to this is:

  ../build/conda/env/lib/python3.6/site-packages/pyspark/sql/functions.py:44: in _
      jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
  ../build/conda/env/lib/python3.6/site-packages/py4j/java_gateway.py:1286: in __call__
      answer, self.gateway_client, self.target_id, self.name)
  ../build/conda/env/lib/python3.6/site-packages/pyspark/sql/utils.py:63: in deco
      return f(*a, **kw)

Note that this snippet is called from code that works just fine in production.

Also, note that when the suite is run on my laptop, all tests pass.

Can somebody please help?



Solution 1:[1]

This could be caused by different Spark versions.

Some older versions of PySpark didn't support calling F.upper(<column name as string>); they only supported F.upper(<pyspark.sql.Column>). See the question "Why pyspark.sql lower function not accept literal col name and length function do?"
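To check whether the build environment and your laptop really differ, one option is to log the versions from inside a test. This is a minimal sketch: the helper name describe_environment is hypothetical, and it only reports what is installed in the interpreter that runs it.

```python
import sys


def describe_environment():
    """Return the Python and PySpark versions visible to this interpreter."""
    info = {"python": sys.version.split()[0]}
    try:
        import pyspark  # imported lazily so the helper also works without PySpark
        info["pyspark"] = pyspark.__version__
    except ImportError:
        info["pyspark"] = None
    return info


def test_print_environment():
    # Run with `pytest -s` so the printed versions show up in the test log.
    print(describe_environment())
```

Running this both locally and in the failing build, then comparing the two outputs, should show whether a version mismatch is a plausible cause.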

In Palantir Foundry Transforms, it's worth making sure your repository is on the latest Transforms versions. You can do this in Code Repositories by selecting 'Upgrade' from the branch you're working in:

[Image: the Upgrade button in Code Repositories]

This will then open a PR against your branch that will upgrade to the latest versions.

You can find more details on upgrading repositories here: https://www.palantir.com/docs/foundry/code-repositories/repository-upgrades


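Alternatively, you could try using F.upper(F.col("my_column")), which passes a Column rather than a string and so should work on both older and newer PySpark versions.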

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1