I have two identical Spark DataFrames with the same columns. I am trying to create an if-else statement in one line but couldn't find a better way to do it.
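The usual one-line if/else over DataFrame columns is `when(...).otherwise(...)`. A minimal sketch, with hypothetical column names ("amount", "flag") since the question's columns aren't shown:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("when-otherwise").getOrCreate()
import spark.implicits._

// Hypothetical data: "amount" drives the condition, "flag" is the derived column
val df = Seq(("a", 150), ("b", 40)).toDF("id", "amount")

// The whole if/else fits in one expression: when(condition, value).otherwise(value)
val result = df.withColumn("flag", when(col("amount") > lit(100), lit("high")).otherwise(lit("low")))
result.show()
```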
I am very new to programming in Scala. I am writing a test program to get the maximum value from JSON data. I have the following code: import scala.io.Source import sc
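Since the code in the question is cut off, here is only a minimal sketch of one way to read JSON and take a maximum, assuming the json4s-native library is on the classpath and a hypothetical data.json shaped like `[{"value": 3}, {"value": 7}]`:

```scala
import scala.io.Source
import org.json4s._
import org.json4s.native.JsonMethods.parse

object MaxFromJson extends App {
  // Hypothetical shape of each element in the JSON array
  case class Record(value: Double)
  implicit val formats: Formats = DefaultFormats

  // Read the whole file, parse it, and bind the array to a list of case classes
  val raw = Source.fromFile("data.json").mkString
  val records = parse(raw).extract[List[Record]]

  println(s"max value = ${records.map(_.value).max}")
}
```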
I'm learning the concept of F[_] as a constructor for other types, but how do you pronounce this to another human or say it in your head (for our internal monologue
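In speech F[_] is commonly read as "F of underscore" or simply described as "a type constructor F" (a higher-kinded type). A small sketch of what the notation stands for, purely for context:

```scala
// F[_] is a type constructor parameter: F alone is not a type, but F[Int] or F[String] is.
// List, Option and Future all have this shape, so any of them can be plugged in for F.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// An instance where F[_] is fixed to Option
val optionFunctor: Functor[Option] = new Functor[Option] {
  def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
}
```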
I am trying to connect to a remote Cassandra cluster from my spark-shell using the Spark Cassandra Connector, but it's throwing some unusual errors. I do the usual
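A minimal sketch of pointing the connector at a remote cluster, assuming spark-shell was started with the spark-cassandra-connector package on the classpath; the host, keyspace and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// e.g. spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1
// (connector version is illustrative; it must match the Spark and Scala versions in use)
val spark = SparkSession.builder()
  .appName("cassandra-test")
  .config("spark.cassandra.connection.host", "10.0.0.5")   // remote contact point (placeholder)
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table")) // placeholder names
  .load()

df.show()
```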
I have followed this post, "pyspark error reading bigquery: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class", and applied the resolution
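The `Logging$class` ClassNotFoundException is typically a Scala binary-version mismatch (for example a connector built for Scala 2.11 running against Spark compiled with 2.12), since the `$class` trait encoding only exists before Scala 2.12. A sketch in Scala of the equivalent BigQuery read once a matching connector artifact is used; the package version and table reference are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// e.g. --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.32.2
// (the _2.12 suffix must match the Scala version of the Spark build)
val spark = SparkSession.builder().appName("bigquery-read").getOrCreate()

val df = spark.read
  .format("bigquery")
  .option("table", "my-project.my_dataset.my_table") // placeholder table reference
  .load()

df.show()
```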
I'm looking for a reliable way in Spark (v2+) to programmatically adjust the number of executors in a session. I know about dynamic allocation and the ability
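Besides dynamic allocation, SparkContext exposes a developer API for requesting and releasing executors at runtime; whether it takes effect depends on the cluster manager and on how dynamic allocation is configured. A minimal sketch, with illustrative numbers and executor id:

```scala
val sc = spark.sparkContext   // assumes an existing SparkSession named `spark`

// Ask the cluster manager for a total of 10 executors (no locality preferences)
sc.requestTotalExecutors(10, 0, Map.empty[String, Int])

// Or adjust incrementally
sc.requestExecutors(2)        // request 2 additional executors
sc.killExecutors(Seq("1"))    // release the executor with id "1" (illustrative)
```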
I am trying to get data from Kafka into Flink. I use FlinkKafkaConsumer, but IntelliJ shows me that it is deprecated, and the SSH console in Google Cloud also shows me this error
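The deprecation points at the newer KafkaSource API, which replaces FlinkKafkaConsumer in recent Flink versions. A minimal sketch of wiring it up; the broker address, topic and group id are placeholders:

```scala
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// KafkaSource is the non-deprecated replacement for FlinkKafkaConsumer
val source = KafkaSource.builder[String]()
  .setBootstrapServers("localhost:9092")          // placeholder broker address
  .setTopics("my-topic")                          // placeholder topic
  .setGroupId("my-group")                         // placeholder consumer group
  .setStartingOffsets(OffsetsInitializer.earliest())
  .setValueOnlyDeserializer(new SimpleStringSchema())
  .build()

val stream = env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka-source")
stream.print()
env.execute("kafka-to-flink")
```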
These are the contents of my build.sbt file: name := "WordCounter" version := "0.1" scalaVersion := "2.13.1" libraryDependencies ++= Seq( "org.apache.spar
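With this combination the likely culprit is the Scala version: Spark releases before 3.2.0 were published only for Scala 2.11/2.12, so scalaVersion 2.13.1 cannot resolve those artifacts. A sketch of a matching build.sbt (versions are illustrative):

```scala
name := "WordCounter"
version := "0.1"
// must match the Scala version the chosen Spark artifacts were built for
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.3",
  "org.apache.spark" %% "spark-sql"  % "3.1.3"
)
```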
I have a csv file with the data below:
Id  Subject  Marks
1   M,P,C    10,8,6
2   M,P,C    5,7,9
3   M,P,C    6,7,4
I need to find the max value in the Marks column for each Id and
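One way to do this in Spark is to split both columns, zip them position-wise, explode to one row per (subject, mark), and keep the top mark per Id with a window function. A sketch that recreates the sample data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("max-mark").getOrCreate()
import spark.implicits._

// Recreate the question's data: both columns hold comma-separated, position-aligned values
val df = Seq(
  (1, "M,P,C", "10,8,6"),
  (2, "M,P,C", "5,7,9"),
  (3, "M,P,C", "6,7,4")
).toDF("Id", "Subject", "Marks")

// Split both columns, zip them by position, and explode into one row per subject/mark pair
val exploded = df
  .withColumn("subjects", split(col("Subject"), ","))
  .withColumn("marks", split(col("Marks"), ","))
  .withColumn("pair", explode(arrays_zip(col("subjects"), col("marks"))))
  .select(col("Id"), col("pair.subjects").as("Subject"), col("pair.marks").cast("int").as("Mark"))

// Keep the row with the highest mark per Id (its Subject comes along with it)
val w = Window.partitionBy("Id").orderBy(col("Mark").desc)
val result = exploded.withColumn("rn", row_number().over(w)).filter(col("rn") === 1).drop("rn")
result.show()
```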
I have a Spark job that needs to store the last time it ran in a text file. This has to work both on HDFS and on the local fs (for testing). However, it seems
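One approach that covers both cases is the Hadoop FileSystem API, which resolves the filesystem from the path's URI, so the same code handles hdfs:// and file:// (or plain local) paths. A minimal sketch; the paths and timestamp format are placeholders:

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def writeLastRun(pathStr: String, timestamp: String, conf: Configuration): Unit = {
  val path = new Path(pathStr)
  val fs: FileSystem = path.getFileSystem(conf)   // picks HDFS or the local fs from the URI
  val out = fs.create(path, true)                 // overwrite any previous run marker
  try out.write(timestamp.getBytes(StandardCharsets.UTF_8))
  finally out.close()
}

// writeLastRun("hdfs:///jobs/last_run.txt", "2024-01-01T00:00:00Z", spark.sparkContext.hadoopConfiguration)
// writeLastRun("file:///tmp/last_run.txt",  "2024-01-01T00:00:00Z", new Configuration())
```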
I have a question about Behaviors.unhandled. I know that Akka sends the unhandled message to dead letters and, with the following configuration, it also logs it
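For reference, returning Behaviors.unhandled from a typed behavior is what publishes the message as akka.actor.UnhandledMessage on the event stream (which is why it shows up in dead letters); the logging usually comes from the `akka.actor.debug.unhandled = on` setting, if that is the configuration the question refers to. A minimal sketch of a behavior that deliberately leaves one message unhandled:

```scala
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors

sealed trait Command
case object Ping  extends Command
case object Other extends Command

// Behaviors.unhandled publishes the message as akka.actor.UnhandledMessage on the event stream
val behavior: Behavior[Command] = Behaviors.receiveMessage[Command] {
  case Ping  => Behaviors.same
  case Other => Behaviors.unhandled
}
```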
I am working on an implementation of a state machine in Scala. The original version is written in Python, so I have a lot of if/else clauses in the code
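In Scala the idiomatic replacement for long if/else chains in a state machine is pattern matching over sealed state and event types. A small sketch with hypothetical states and events:

```scala
sealed trait State
case object Idle    extends State
case object Running extends State
case object Done    extends State

sealed trait Event
case object Start  extends Event
case object Finish extends Event

// One match expression replaces the nested if/else chains; sealed traits let the compiler
// warn about missing combinations
def transition(state: State, event: Event): State = (state, event) match {
  case (Idle, Start)     => Running
  case (Running, Finish) => Done
  case (current, _)      => current   // any other combination leaves the state unchanged
}
```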
I have some Scala code that needs to be able to serialize/deserialize some Java classes using Json4s. I am using "org.json4s" %% "json4s-ext" % "4.0.5" and "org
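For Java classes that json4s cannot handle reflectively, one option is a CustomSerializer (json4s-ext also ships ready-made serializer sets, e.g. for Joda-Time). A sketch using java.time.Instant purely as an illustration, with the json4s-native backend:

```scala
import org.json4s._
import org.json4s.native.Serialization

// A hand-written (de)serializer for a Java class, here java.time.Instant as an example
object InstantSerializer extends CustomSerializer[java.time.Instant](_ => (
  { case JString(s) => java.time.Instant.parse(s) },
  { case i: java.time.Instant => JString(i.toString) }
))

implicit val formats: Formats = DefaultFormats + InstantSerializer

case class Deployment(name: String, at: java.time.Instant)

val json = Serialization.write(Deployment("deploy", java.time.Instant.now()))
val back = Serialization.read[Deployment](json)
```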
Whenever I try to run my main program directly in IntelliJ I get this error: Error:(5, 12) object apache is not a member of package org import org.apache.common
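That error usually means the org.apache.* library is not on the module's classpath. If the project is sbt-based, one sketch of declaring the dependency (assuming the missing import is Apache Commons Lang; swap in whichever artifact the code actually uses), after which IntelliJ picks it up on a project reload:

```scala
// build.sbt
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.12.0"  // illustrative artifact and version
```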
I am writing unit tests for my Spark/Scala application. I am also using ScalaMock to mock objects, specifically a Session / SessionFactory. In one of my tests
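A minimal sketch of mocking a factory/session pair with ScalaMock and ScalaTest; the Session and SessionFactory traits below are hypothetical stand-ins for the question's real types:

```scala
import org.scalamock.scalatest.MockFactory
import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical stand-ins for the real Session / SessionFactory
trait Session        { def run(query: String): Int }
trait SessionFactory { def open(): Session }

class RepositorySpec extends AnyFlatSpec with MockFactory {

  "the code under test" should "use the session produced by the factory" in {
    val session = mock[Session]
    val factory = mock[SessionFactory]

    // Set expectations: the factory hands out the mocked session, which runs one query
    (factory.open _).expects().returning(session)
    (session.run _).expects("select 1").returning(1)

    // In a real test this call would be made by the production code that takes the factory
    val result = factory.open().run("select 1")
    assert(result == 1)
  }
}
```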
I have a DeltaTable at an AWS S3 location (s3://bucket/myDeltaTable) which has the default table property delta.logRetentionDuration set to 30 days. Is there a way I
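Assuming the goal is to change that property on the existing table, Delta Lake allows altering it through SQL against the table path; the new interval below is illustrative:

```scala
// Run from a SparkSession (`spark`) that has the Delta Lake extensions enabled
spark.sql("""
  ALTER TABLE delta.`s3://bucket/myDeltaTable`
  SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")
```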
I have a Spark/Scala job in which I do this:
1. Compute a big DataFrame df1 and cache it in memory
2. Use df1 to compute dfA
3. Read raw data into df2 (again,
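Since the rest of the description is cut off, here is only a sketch of the caching pattern for the steps listed, with placeholder paths and transformations:

```scala
// Step 1: compute the expensive DataFrame once and keep it in memory
val df1 = spark.read.parquet("s3://bucket/raw1").cache()   // placeholder source

// Step 2: derive dfA from the cached df1 (placeholder aggregation)
val dfA = df1.groupBy("key").count()

// Step 3: a fresh read of raw data into df2
val df2 = spark.read.parquet("s3://bucket/raw2")           // placeholder source

// Once df1 is no longer needed, release the executor memory it holds
df1.unpersist()
```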
I need to write an integration test, and it requires starting a server executable. I want to make the location of the server configurable so that I could set it on
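One way to make the server location configurable is to read it from a JVM system property (e.g. passed as -Dserver.path=/opt/my-server) with an environment-variable fallback, then launch the executable with scala.sys.process; the property name and arguments are placeholders:

```scala
import scala.sys.process._

val serverPath: String = sys.props.get("server.path")   // e.g. -Dserver.path=/opt/my-server
  .orElse(sys.env.get("SERVER_PATH"))                    // or an environment variable
  .getOrElse(sys.error("server location not configured"))

// Start the executable before the tests and keep the handle so it can be stopped afterwards
val serverProcess: Process = Process(Seq(serverPath, "--port", "8080")).run()

// ... run the integration tests against the server ...

serverProcess.destroy()
```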
I have two fields in my ES index: min_duration and max_duration. I want to create a query to find all the documents for an input duration such that: min_duration <= duration <= max_duration
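That condition maps to a bool query with two range filters: a document matches when its min_duration is at most the input and its max_duration is at least the input. A sketch of the request body, written here as a Scala string with a placeholder input of 42:

```scala
val inputDuration = 42   // placeholder input value

val queryJson =
  s"""{
     |  "query": {
     |    "bool": {
     |      "filter": [
     |        { "range": { "min_duration": { "lte": $inputDuration } } },
     |        { "range": { "max_duration": { "gte": $inputDuration } } }
     |      ]
     |    }
     |  }
     |}""".stripMargin
```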
I need to run Scala 3.x.x from IntelliJ. This is how I am trying to install it:
1. Click File > Project Structure...
2. Click on the "+" button (New Project Library
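An alternative to registering the Scala 3 library by hand in Project Structure is to let IntelliJ import an sbt project that already declares Scala 3; a minimal build.sbt sketch (project name and version are illustrative):

```scala
// build.sbt
name := "scala3-playground"
scalaVersion := "3.3.1"
```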