Category "scala"

Slick using mapped column type in update statement

I am having trouble using Slick's MappedColumnType; the code snippet is as follows: private trait UserTable { self: HasDatabaseConfigProvider[JdbcProfile] =>
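
A minimal sketch of how a MappedColumnType is usually declared so it is visible to queries and update statements alike. The wrapper type, table and column names below are placeholders, not the question's actual code.

```scala
import play.api.db.slick.HasDatabaseConfigProvider
import slick.jdbc.JdbcProfile

// Hypothetical wrapper type used only for illustration.
case class UserId(value: Long)

trait UserTable { self: HasDatabaseConfigProvider[JdbcProfile] =>
  import profile.api._

  // MappedColumnType.base converts between the wrapper and a plain column
  // type in both directions, so the implicit can be used in filters and updates.
  implicit val userIdColumnType: BaseColumnType[UserId] =
    MappedColumnType.base[UserId, Long](_.value, UserId.apply)

  class Users(tag: Tag) extends Table[(UserId, String)](tag, "users") {
    def id   = column[UserId]("id", O.PrimaryKey)
    def name = column[String]("name")
    def *    = (id, name)
  }
  val users = TableQuery[Users]
}
```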

Scala Quill - queries always dynamic

I am always getting dynamic queries in Quill; even with the simplest query I get the dynamic query compiler warning in the log: type DbContext = PostgresAsyncContext[Liter

How to convert scala.Some to scala.mutable.Map?

I am trying to write code to mask nested JSON fields. def maskRecursively(map: mutable.Map[String, Object]): mutable.Map[String, Object] = { val maskColumns = PII
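
A minimal sketch of the usual unwrap-then-cast approach when a generic value may be a Some wrapping a mutable map, as often happens when walking parsed JSON generically. The function name and the fallback are illustrative assumptions.

```scala
import scala.collection.mutable

// Hypothetical helper: unwrap a value that may be Some(map) or already a map.
def toMutableMap(value: Any): mutable.Map[String, Object] = value match {
  case Some(m: mutable.Map[_, _]) =>
    m.asInstanceOf[mutable.Map[String, Object]]   // unwrap the Some, then cast
  case m: mutable.Map[_, _] =>
    m.asInstanceOf[mutable.Map[String, Object]]   // already a map
  case _ =>
    mutable.Map.empty[String, Object]             // nothing usable to convert
}
```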

Ways to make a Maven build faster?

I have a multi-module Java project. Maven takes around 40 seconds to build it. I have tried Maven with multi-threaded builds too, by specifying the -T and -C arg

How to elegantly perform multiple effects in parallel with ZIO

I know that I can use import zio.Task def zip3Par[A, B, C](a: Task[A], b: Task[B], c: Task[C]): Task[(A, B, C)] = a.zipPar(b).zipWithPar(c) { case ((a, b),
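
A shorter sketch, assuming ZIO 2.x: zipPar flattens nested tuples automatically (via Zippable), so the extra zipWithPar and case-deconstruction are not needed. The <&> operator is an alias for zipPar.

```scala
import zio._

// Run all three tasks in parallel; the tuple flattens to (A, B, C) in ZIO 2.
def zip3Par[A, B, C](a: Task[A], b: Task[B], c: Task[C]): Task[(A, B, C)] =
  a.zipPar(b).zipPar(c)
```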

Job aborted due to stage failure: ShuffleMapStage 20 (repartition at data_prep.scala:87) has failed the maximum allowable number of times: 4

I am submitting a Spark job with the following specification (the same program has been used to run data sizes ranging from 50GB to 400GB): /usr/hdp/2.6.0.3-8/

Snowpark connection errors with 0.6.0 jar

I am trying to use Snowpark (0.6.0) via Jupyter notebooks (after installing the Scala Almond kernel). I am using a Windows laptop. Had to change the examples here a b

How to write the contents of a Scala stream to a file?

I have a Scala stream of bytes that I'd like to write to a file. The stream has too much data to buffer all of it in memory. As a first attempt, I created an Inp
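
A minimal sketch, assuming a Stream[Byte] (LazyList on 2.13+) that is not also retained elsewhere, which would defeat laziness. It writes the bytes to disk in chunks as they are produced instead of buffering the whole stream. The path parameter is a placeholder.

```scala
import java.io.{BufferedOutputStream, FileOutputStream}

// Write the stream to a file in fixed-size chunks.
def writeToFile(bytes: Stream[Byte], path: String): Unit = {
  val out = new BufferedOutputStream(new FileOutputStream(path))
  try bytes.grouped(8192).foreach(chunk => out.write(chunk.toArray))
  finally out.close()
}
```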

How to Get the Number of Points in Each Polygon in Scala

I have two data frames. The first contains a list of latitude and longitude points along with an ID number associated with the person who was at those coordina

Spark Scala Split dataframe into equal number of rows

I have a DataFrame and wish to divide it into chunks with an equal number of rows. In other words, I want a list of DataFrames where each one is a disjoint subset of the
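
One possible sketch: assign each row a bucket number and filter once per bucket. The helper name and the "_bucket" column are assumptions, and a global Window with no partition pulls all rows onto one executor, so this suits small-to-medium DataFrames.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Split df into n disjoint DataFrames of (nearly) equal row counts.
def splitIntoChunks(df: DataFrame, n: Int): Seq[DataFrame] = {
  val bucketed = df.withColumn(
    "_bucket",
    (row_number().over(Window.orderBy(monotonically_increasing_id())) - 1) % n
  )
  (0 until n).map(i => bucketed.filter(col("_bucket") === i).drop("_bucket"))
}
```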

How to test a Try[T] with ScalaTest correctly?

I have a method that returns a Try object: def doSomething(p: SomeParam): Try[Something] = { // code } I now want to test this with ScalaTest. Currently I
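
A sketch assuming ScalaTest 3.x, where the TryValues trait gives `.success.value` and `.failure.exception` for asserting on Try results. The doSomething below is a stand-in for the question's own method.

```scala
import scala.util.Try
import org.scalatest.TryValues
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Placeholder for the question's method returning Try[Something].
object Lib {
  def doSomething(p: Int): Try[String] =
    Try { require(p > 0, "p must be positive"); s"value-$p" }
}

class DoSomethingSpec extends AnyFlatSpec with Matchers with TryValues {

  "doSomething" should "succeed with the expected value" in {
    Lib.doSomething(1).success.value shouldBe "value-1"
  }

  it should "fail for bad input" in {
    Lib.doSomething(0).failure.exception shouldBe an[IllegalArgumentException]
  }
}
```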

Spark hangs on union with zero running tasks

I have two RDDs of type RDD[T]. For example: val a: RDD[Integer] = .... val b: RDD[Integer] = ... When I perform val z = a.union(b) println(z) I find the s

Writing an Algebraic Data Type in Scala

In Haskell, I can define a Tree: data Tree a = Empty | Node a (Tree a) (Tree a) How could I write this in Scala? I'm not sure how to keep the type paramete
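
A direct Scala 2 encoding of that Haskell declaration: a sealed trait with a covariant type parameter, so the single Empty object can act as the empty tree for every element type.

```scala
// data Tree a = Empty | Node a (Tree a) (Tree a)
sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

// Usage:
val t: Tree[Int] = Node(1, Node(2, Empty, Empty), Empty)
```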

How to modify values of JsonObject

I want to add a new field to a JsonObject, and the new field's name will be based on the value of another field. To be clear, here is an example of what I want to ach
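
A small sketch assuming circe's JsonObject (the question does not say which JSON library is in use). The field names "type" and the derived "<type>_details" are placeholders.

```scala
import io.circe.{Json, JsonObject}

// Derive the new field's name from another field's string value.
def addDerivedField(obj: JsonObject): JsonObject =
  obj("type").flatMap(_.asString) match {
    case Some(t) => obj.add(s"${t}_details", Json.obj())   // name comes from the other field
    case None    => obj                                    // nothing to derive from
  }
```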

How to efficiently remove duplicate rows in Spark Dataframe, keeping row with highest timestamp

I have a large data set which I am reading from Postgres. It has an ID column, a timestamp column and several other columns which may have been updated. For eac
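
The usual sketch for this: rank rows per ID by timestamp with a window function and keep only the first. "id" and "updated_at" are placeholders for the question's ID and timestamp column names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Keep only the newest row per ID.
def latestPerId(df: DataFrame): DataFrame = {
  val newestFirst = Window.partitionBy("id").orderBy(col("updated_at").desc)
  df.withColumn("_rn", row_number().over(newestFirst))
    .filter(col("_rn") === 1)   // row 1 in each group is the latest version
    .drop("_rn")
}
```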

How to extract values from a key-value map?

I have a column of type map, where the keys and values change. I am trying to extract the value and create a new column. Input: ----------------+ |symbols
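
A sketch assuming a MapType column named "symbols" (a placeholder). map_values (Spark 2.3+) turns the map's values into an array, and element_at (Spark 2.4+) picks the single value when each map holds exactly one entry.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Pull the (single) map value out into its own column.
def extractMapValue(df: DataFrame): DataFrame =
  df.withColumn("value", element_at(map_values(col("symbols")), 1))
```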

sbt-assembly: deduplication found error

I am not sure whether a merge strategy or excluding JARs is the best option here. Any help on how to proceed further with this error would be great! [sameert@pzx
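
A build.sbt sketch, assuming sbt-assembly 1.x: discard the META-INF entries that commonly trigger "deduplicate: different file contents found" and keep the first copy of everything else. The cases should be adjusted to the paths shown in the actual error.

```scala
ThisBuild / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard   // drop conflicting metadata
  case _                        => MergeStrategy.first     // otherwise keep the first copy
}
```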

How to use Spark with large decimal numbers?

My database has numeric values that can be up to 256-bit unsigned integers. However, Spark's DecimalType has a limit of Decimal(38,18). When I try to do calculatio
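
One possible sketch: since DecimalType tops out at 38 digits of precision, carry the 256-bit values as strings and do the arithmetic with BigInt inside a UDF. The column names "a" and "b" and the helper are placeholders.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Arbitrary-precision addition over string-encoded integers.
val addUnbounded = udf((a: String, b: String) => (BigInt(a) + BigInt(b)).toString)

def sumColumns(df: DataFrame): DataFrame =
  df.withColumn("sum", addUnbounded(col("a"), col("b")))
```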

Circe list deserialization with best-attempt and error reporting

I'm using Circe to deserialize JSON containing a list. Sometimes a few items in the JSON list are corrupted, and that causes the entire deserialization to fail.
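
A sketch for Scala 2.13+: decode each array element on its own and collect the failures instead of failing the whole list. The function name is an assumption.

```scala
import io.circe.{Decoder, DecodingFailure, Json}

// Best-effort decoding: failures on the left, successfully decoded items on the right.
def decodeLeniently[A: Decoder](json: Json): (List[DecodingFailure], List[A]) =
  json.asArray
    .getOrElse(Vector.empty)
    .toList
    .map(_.as[A])              // Either[DecodingFailure, A] per element
    .partitionMap(identity)    // split into (errors, values)
```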