Category "scala"

Trigger IF Statement only when two Spark dataframe meet the conditions

I have two identical Spark DataFrame. They have the same columns. I am trying to create a IF-Else statement in one line but couldnt find a better way to do it.

Find the maximum value from JSON data in Scala

I am very new to programming in Scala. I am writing a test program to get maximum value from JSON data. I have following code: import scala.io.Source import sc

How do I verbalize the term F[_] in scala/cats-effect

I'm learning the concept of F[_] as a constructor for other types, but how do you pronounce this to another human or say it in your head (for us internal monolo

Cannot connect to Cassandra in spark-shell

I am trying to connect to a remote cassandra cluster in my spark shell using the Spark-cassandra connector. But its throwing some unusual errors. I do the usual

Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated while reading from bigquery in Jupyter lab

I have followed this post pyspark error reading bigquery: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class and followed the resolution

Programmatically add/remove executors to a Spark Session

I'm looking for a reliable way in Spark (v2+) to programmatically adjust the number of executors in a session. I know about dynamic allocation and the ability

What should I use instead deprecated FlinkKafkaConsumer? Scala Flink

I try to get data from Kafka to Flink, I use FlinkKafkaConsumer but Intellij shows me that it is depricated and also ssh console in Google Cloud shows me this e

sbt package is trying to download a package whose path does not exist

These are the contents of my build.sbt file: name := "WordCounter" version := "0.1" scalaVersion := "2.13.1" libraryDependencies ++= Seq( "org.apache.spar

Find the Max value of an Array column and find associated value in another Array with in the dataframe

I have a csv file with below data. Id Subject Marks 1 M,P,C 10,8,6 2 M,P,C 5,7,9 3 M,P,C 6,7,4 I Need to find out Max value in the Marks column for each Id an

spark save simple string to text file

I have a spark job that needs to store the last time it ran to a text file. This has to work both on HDFS but also on local fs (for testing). However it seems

Akka Finite State Machine and how to protocol Behaviors.unhandled?

I have a question about Behaviors.unhandled, I know that Akka sends the unhandled message to the Dead Letter and with the following configuration it also logs i

Improvement of State Machine implemented in scala

I am working on an implementation of a state machine in scala. The original version is written in python, therefore I have a lot of if /else clauses in the co

Json4s not serializing Java classes

I have some scala code that needs to be able to serialize/deserialize some Java classes using Json4s. I am using "org.json4s" %% "json4s-ext" % "4.0.5" and "org

Error building maven project in Intellij : "object apache is not a member of package org"

Whenever I try to run my main program directly in IntelliJ I get this error: Error:(5, 12) object apache is not a member of package org import org.apache.common

Using scalamock: Could not find implicit value for evidence parameter of type error

I am writing unit tests for my spark/scala application. I am using scalamock as well to mock objects, specifically Session / Session Factory. In one of my test

Update DeltaTable properties on S3

I have a DeltaTable at aws S3 location (s3://bucket/myDeltaTable) which has a default table property delta.logRetentionDuration set to 30 days. Is there a way I

How to make sure my DataFrame frees its memory?

I have a Spark/Scala job in which I do this: 1: Compute a big DataFrame df1 + cache it into memory 2: Use df1 to compute dfA 3: Read raw data into df2 (again,

How to run ConfigMapWrapperSuite?

I need to write an integration test and it requires starting a server executable. I want to make location of the server configurable, so that I could set it on

Range queries in Elasticsearch Java API

I have two fields in my ES index: min_duration and max_duration. I want to create a query to find all the documents for input duration such that : min_duration

Installing new Scala SDK 3.x.x on IntelliJ

I need to run Scala 3.x.x from IntelliJ. This is how I am trying to install it: Click File > Project Structure... Click on the "+" button (New Project Libra