Category "scala"

How to elegantly perform multiple effects in parallel with ZIO

I know that I can use import zio.Task def zip3Par[A, B, C](a: Task[A], b: Task[B], c: Task[C]): Task[(A, B, C)] = a.zipPar(b).zipWithPar(c) { case ((a, b),
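
For reference, a completed sketch of that helper (ZIO 1.x Task shown); the pattern match simply flattens the nested tuple that zipPar produces:

    import zio.Task

    // zipPar yields ((a, b), c); the case flattens it to (a, b, c).
    def zip3Par[A, B, C](a: Task[A], b: Task[B], c: Task[C]): Task[(A, B, C)] =
      a.zipPar(b).zipWithPar(c) { case ((a, b), c) => (a, b, c) }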

Job aborted due to stage failure: ShuffleMapStage 20 (repartition at data_prep.scala:87) has failed the maximum allowable number of times: 4

I am submitting a Spark job with the following specification (the same program has been used to run data sizes ranging from 50GB to 400GB): /usr/hdp/2.6.0.3-8/
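
Not a diagnosis, but these are settings commonly tuned when a repartition's shuffle stage keeps failing; the values below are purely illustrative:

    import org.apache.spark.sql.SparkSession

    // Illustrative values only; the right numbers depend on cluster size
    // and on why the shuffle fetches fail in the first place.
    val spark = SparkSession.builder()
      .appName("data_prep")
      .config("spark.sql.shuffle.partitions", "800") // more, smaller shuffle blocks
      .config("spark.shuffle.io.maxRetries", "10")   // default is 3
      .config("spark.shuffle.io.retryWait", "30s")   // default is 5s
      .getOrCreate()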

Snowpark connection errors with 0.6.0 jar

I am trying to use Snowpark (0.6.0) via Jupyter notebooks (after installing the Scala Almond kernel). I am using a Windows laptop. Had to change the examples here a b
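
For comparison, a minimal Snowpark Scala connection sketch; every value in the map is a placeholder:

    import com.snowflake.snowpark.Session

    // All connection parameters below are placeholders.
    val session = Session.builder
      .configs(Map(
        "URL"       -> "https://<account>.snowflakecomputing.com",
        "USER"      -> "<user>",
        "PASSWORD"  -> "<password>",
        "ROLE"      -> "<role>",
        "WAREHOUSE" -> "<warehouse>",
        "DB"        -> "<database>",
        "SCHEMA"    -> "<schema>"
      ))
      .create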

How to write the contents of a Scala stream to a file?

I have a Scala stream of bytes that I'd like to write to a file. The stream has too much data to buffer all of it in memory. As a first attempt, I created an Inp
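
A minimal sketch that consumes the stream incrementally rather than materializing it (Stream on Scala 2.12; LazyList on 2.13+):

    import java.io.{BufferedOutputStream, FileOutputStream}

    // Writes one byte at a time through a buffered stream, so only the
    // buffer (not the whole stream) lives in memory. Note that holding
    // another reference to the stream's head elsewhere would still force
    // full memoization.
    def writeToFile(bytes: Stream[Byte], path: String): Unit = {
      val out = new BufferedOutputStream(new FileOutputStream(path))
      try bytes.foreach(b => out.write(b.toInt))
      finally out.close()
    }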

How to Get the Number of Points in Each Polygon in Scala

I have two data frames. The first contains a list of latitude and longitude points along with an ID number associated with the person who was at those coordina
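
A library-free sketch of the shape of this computation, using a plain ray-casting containment test (in practice a geo library such as Sedona or GeoMesa is more robust); the column names and vertex encoding are all assumptions:

    import org.apache.spark.sql.functions._

    // Even-odd (ray-casting) point-in-polygon test over (x, y) vertices.
    def rayCast(poly: Seq[(Double, Double)], x: Double, y: Double): Boolean = {
      var inside = false
      var j = poly.length - 1
      for (i <- poly.indices) {
        val (xi, yi) = poly(i)
        val (xj, yj) = poly(j)
        if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi)
          inside = !inside
        j = i
      }
      inside
    }

    // Assumes polygons has (polygon_id, vertices: array<array<double>>)
    // and points has (person_id, lon, lat).
    val inPolygon = udf { (lon: Double, lat: Double, vs: Seq[Seq[Double]]) =>
      rayCast(vs.map(v => (v(0), v(1))), lon, lat)
    }

    val counts = points
      .crossJoin(polygons)
      .filter(inPolygon(col("lon"), col("lat"), col("vertices")))
      .groupBy("polygon_id")
      .agg(count("*").as("num_points"))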

Spark Scala Split dataframe into equal number of rows

I have a DataFrame and wish to divide it into chunks with an equal number of rows. In other words, I want a list of DataFrames where each one is a disjoint subset of the
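
One sketch: number the rows once, then slice the numbering into n contiguous, disjoint chunks:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // Window.orderBy with no partition funnels all rows through a single
    // partition, so this suits moderate sizes; df.randomSplit is the
    // cheaper (approximate) alternative for very large data.
    def splitIntoChunks(df: DataFrame, n: Int): Seq[DataFrame] = {
      val numbered =
        df.withColumn("rn", row_number().over(Window.orderBy(monotonically_increasing_id())))
      val chunkSize = math.ceil(numbered.count().toDouble / n).toLong
      (0 until n).map { i =>
        numbered
          .filter(col("rn") > i * chunkSize && col("rn") <= (i + 1) * chunkSize)
          .drop("rn")
      }
    }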

How to test a Try[T] with ScalaTest correctly?

I have a method that returns a Try object: def doSomething(p: SomeParam): Try[Something] = { // code } I now want to test this with ScalaTest. Currently I
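
A sketch using ScalaTest's TryValues mix-in (doSomething here is a stand-in for the method in the question):

    import scala.util.Try
    import org.scalatest.TryValues
    import org.scalatest.flatspec.AnyFlatSpec
    import org.scalatest.matchers.should.Matchers

    class DoSomethingSpec extends AnyFlatSpec with Matchers with TryValues {

      // Stand-in for the real method under test.
      def doSomething(p: Int): Try[String] = Try { require(p > 0); s"ok: $p" }

      "doSomething" should "succeed for valid input" in {
        doSomething(1).success.value shouldBe "ok: 1"
      }

      it should "fail for invalid input" in {
        doSomething(-1).failure.exception shouldBe an[IllegalArgumentException]
      }
    }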

Spark hangs on union with zero running task

I have two RDDs of type RDD[T]. For example: val a: RDD[Integer] = .... val b: RDD[Integer] = ... When I perform val z = a.union(b); println(z) I find the s
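
For what it's worth, union itself is lazy; a minimal sketch showing where evaluation actually happens:

    import org.apache.spark.SparkContext

    val sc = SparkContext.getOrCreate()
    val a  = sc.parallelize(Seq(1, 2, 3))
    val b  = sc.parallelize(Seq(4, 5, 6))

    // union only builds a UnionRDD; println(z) prints its toString without
    // running anything. Work (and therefore any hang) starts at an action.
    val z = a.union(b)
    println(z.count())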

Writing Algebraic Data Type in Scala

In Haskell, I can define a Tree: data Tree a = Empty | Node a (Tree a) (Tree a) How could I write this in Scala? I'm not sure how to keep the type paramete
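
A direct Scala 2 encoding; making the parameter covariant lets a single Empty object stand in for every Tree[A]:

    sealed trait Tree[+A]
    case object Empty extends Tree[Nothing]
    final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

    // Usage: Empty fits anywhere a Tree[Int] is expected, thanks to +A.
    val t: Tree[Int] = Node(1, Node(2, Empty, Empty), Empty)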

How to modify values of JsonObject

I want to add a new field to a JsonObject, and this new field's name will be based on the value of another field. To be clear, this is an example of what I want to ach
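
Assuming circe's JsonObject, a sketch that derives the new field's name from an existing field (the field names and values here are placeholders):

    import io.circe.Json
    import io.circe.parser.parse

    val doc = parse("""{ "type": "user", "id": 42 }""").getOrElse(Json.Null)

    // Read the "type" field, then add a field whose name is derived from it.
    val updated: Option[Json] = for {
      obj <- doc.asObject
      tpe <- obj("type").flatMap(_.asString)
    } yield Json.fromJsonObject(obj.add(s"${tpe}_id", Json.fromInt(42)))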

How to efficiently remove duplicate rows in Spark Dataframe, keeping row with highest timestamp

I have a large data set which I am reading from Postgres. It has an ID column, a timestamp column and several other columns which may have been updated. For eac
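
The usual approach is a window per ID ordered by the timestamp, keeping rank 1. A self-contained sketch; the column names ("id", "updated_at") are assumptions based on the question:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[*]").appName("dedupe").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2024-01-01", "a"), (1, "2024-02-01", "b"), (2, "2024-01-15", "c"))
      .toDF("id", "updated_at", "payload")

    // Keep only the newest row per id.
    val w = Window.partitionBy("id").orderBy(col("updated_at").desc)
    val latest = df
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") === 1)
      .drop("rn")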

How to extract values from key value map?

I have a column of type map, where the keys and values change. I am trying to extract the value and create a new column. Input: ----------------+ |symbols
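
If each map holds a single entry, Spark's map_keys/map_values (2.3+) extract the parts directly; "symbols" is the column named in the question, the rest is assumed, and df is the input DataFrame:

    import org.apache.spark.sql.functions._

    // Pull the first (only) key and value out of the map-typed column.
    val extracted = df
      .withColumn("symbol", map_keys(col("symbols")).getItem(0))
      .withColumn("value",  map_values(col("symbols")).getItem(0))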

sbt-assembly: deduplication found error

I am not sure whether a merge strategy or excluding jars is the best option here. Any help on how to proceed further with this error would be great! [sameert@pzx
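
A common build.sbt starting point (sbt-assembly 1.x syntax; older versions spell it assemblyMergeStrategy in assembly):

    // Adjust the cases to the files your error actually reports;
    // MergeStrategy.first papers over conflicts, so use it deliberately,
    // and always concatenate reference.conf rather than dropping it.
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case "reference.conf"              => MergeStrategy.concat
      case _                             => MergeStrategy.first
    }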

How to use spark with large decimal numbers?

My database has numeric values of up to 256-bit unsigned integers. However, Spark's DecimalType has a limit of Decimal(38,18). When I try to do calculatio
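
One workaround sketch: keep the 256-bit values as strings in the DataFrame and do the arithmetic with java.math.BigInteger inside a UDF (column names are placeholders, df is the input DataFrame):

    import org.apache.spark.sql.functions._

    // BigInteger is arbitrary precision, so 256-bit values fit; the
    // trade-off is losing Spark's native numeric optimizations.
    val addBig = udf { (a: String, b: String) =>
      new java.math.BigInteger(a).add(new java.math.BigInteger(b)).toString
    }

    val withTotal = df.withColumn("total", addBig(col("x"), col("y")))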

Circe list deserialization with best-attempt and error reporting

I'm using Circe to deserialize json containing a list. Sometimes a few items in the json list are corrupted, and that causes the entire deserialization to fail.
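
A sketch of the best-attempt approach: decode each element of the array on its own, partitioning failures from successes instead of failing the whole list:

    import io.circe.{Decoder, DecodingFailure, Json}
    import io.circe.parser.parse

    def decodeBestEffort[A: Decoder](json: Json): (List[DecodingFailure], List[A]) = {
      val results = json.asArray.getOrElse(Vector.empty).toList.map(_.as[A])
      (results.collect { case Left(e) => e }, results.collect { case Right(a) => a })
    }

    // Usage: report the errors, keep the good items.
    val json = parse("""[1, "oops", 3]""").getOrElse(Json.Null)
    val (errors, ints) = decodeBestEffort[Int](json)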

Consume TCP stream and redirect it to another Sink (with Akka Streams)

I am trying to redirect/forward a TCP stream to another Sink with Akka 2.4.3. The program should open a server socket, listen for incoming connections, and then consum
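
A sketch against recent Akka versions (2.4 additionally needs an implicit ActorMaterializer): each connection is handled by a flow that mirrors the incoming bytes into a second Sink via alsoTo:

    import akka.actor.ActorSystem
    import akka.stream.scaladsl.{Flow, Sink, Tcp}
    import akka.util.ByteString

    implicit val system: ActorSystem = ActorSystem("tcp-forward")

    // Placeholder for the "other" sink the data should be redirected to.
    val mirror: Sink[ByteString, _] =
      Sink.foreach(bs => println(s"received ${bs.size} bytes"))

    Tcp().bind("127.0.0.1", 8888).runForeach { conn =>
      // Echo the bytes back while copying each element into `mirror`.
      conn.handleWith(Flow[ByteString].alsoTo(mirror))
    }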

getStrLn in ZIO working with flatMap but not inside for comprehension

val askNameFlatMap: ZIO[Console, IOException, Unit] = putStrLn("What is your Name? ") *> getStrLn.flatMap(name => putStrLn(s"Hello $name")) val askNa
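
The for-comprehension version works as long as the read value is actually bound; a sketch (ZIO 1.x Console):

    import java.io.IOException
    import zio.ZIO
    import zio.console._

    // A common mistake is `_ <- getStrLn` followed by a reference to a
    // `name` that was never bound; bind it with `name <- getStrLn`.
    val askNameFor: ZIO[Console, IOException, Unit] =
      for {
        _    <- putStrLn("What is your Name? ")
        name <- getStrLn
        _    <- putStrLn(s"Hello $name")
      } yield ()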

Apache Flink + CEP - Detect same events

I'd like to detect events that share the same property. Suppose I have a simple case class: case class Record(name: String, value: Int) Suppose there is the
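
One sketch: key the stream by the shared property first, so a simple two-step pattern can only ever pair records with equal name (an iterative condition on the second event is the alternative):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern

    case class Record(name: String, value: Int)

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Keying by name partitions the pattern matching per name.
    val keyed = env
      .fromElements(Record("a", 1), Record("a", 2), Record("b", 3))
      .keyBy(_.name)

    val pattern = Pattern.begin[Record]("first").next("second")
    val matches = CEP.pattern(keyed, pattern)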

Sum a column values based on a condition using spark scala

I have a dataframe like this: JoiKey period Age Amount Jk1 2022-02 2 200 Jk1 2022-02 3 450 Jk2 2022-03 5 500 Jk3 2022-03 0 200 Jk2 2022-02 8 300 Jk3 2022-03 9
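
A sketch of the usual shape with when/otherwise inside an aggregation; the Age condition is a placeholder since the question's condition is cut off, and df is the input DataFrame:

    import org.apache.spark.sql.functions._

    // Sum Amount per (JoiKey, period), counting only rows whose Age
    // satisfies the (placeholder) condition.
    val result = df
      .groupBy("JoiKey", "period")
      .agg(sum(when(col("Age") >= 5, col("Amount")).otherwise(0)).as("total"))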