Category "scala"

How to define a Monad for a function type?

I am trying Cats for the first time, using Scala 3, and am implementing a set of parser combinators for self-pedagogy; however, I am stuck on the
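
A minimal sketch of where that question usually lands, assuming the goal is a Monad instance for the function type R => A (Cats on Scala 3; the instance mirrors the Reader monad and is written out here only to show the shape):

    import cats.Monad
    import scala.annotation.tailrec

    // Monad for R => A: pure ignores the input, flatMap threads it through.
    given functionMonad[R]: Monad[[A] =>> (R => A)] with
      def pure[A](a: A): R => A = _ => a
      def flatMap[A, B](fa: R => A)(f: A => R => B): R => B =
        r => f(fa(r))(r)
      def tailRecM[A, B](a: A)(f: A => R => Either[A, B]): R => B =
        r =>
          @tailrec def loop(current: A): B = f(current)(r) match
            case Left(next) => loop(next)
            case Right(b)   => b
          loop(a)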

Is it possible to stream data from Beam (Scio) to an S3 bucket?

Currently, I'm working on a project that extracts data from a BigQuery table using Scio in Scala. I'm able to extract and ingest the data into Elasticsearch, b
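
A minimal sketch of the write side, assuming Beam's AWS filesystem module (beam-sdks-java-io-amazon-web-services) is on the classpath so that s3:// paths resolve; the bucket name and input are placeholders:

    import com.spotify.scio._

    object ToS3 {
      def main(cmdlineArgs: Array[String]): Unit = {
        val (sc, args) = ContextAndArgs(cmdlineArgs)
        // any SCollection[String] works here; a text read keeps the sketch small
        sc.textFile(args("input"))
          .saveAsTextFile("s3://my-bucket/output")
        sc.run().waitUntilFinish()
        ()
      }
    }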

Connection between Kafka and Spark: Failed to find data source: kafka

I am trying to link Kafka and Spark by reading data from one topic and trying to print the content of this topic into a DataFrame, but when doing the connect
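
That error almost always means the Kafka SQL connector jar is not on the classpath; a minimal sketch, assuming Spark 3.x and an sbt build (version, broker, and topic name are placeholders):

    // build.sbt
    libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.3.2"

    // reading the topic into a streaming DataFrame
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-demo").master("local[*]").getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")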

How to get Unit Test counts in SonarQube for a Scala SBT build

Note: We are executing this as part of a CI build in TeamCity.
Step 1: Getting coverage details: addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1")
Step 2: S
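
A sketch of the sbt side, building on the same sbt-scoverage plugin from Step 1; the Scala version in the report path and the scanner property are assumptions based on scoverage's default layout:

    // project/plugins.sbt
    addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1")

    // CI (TeamCity) step: instrument, run the tests, and emit the XML report
    //   sbt clean coverage test coverageReport
    // the report typically lands at target/scala-2.12/scoverage-report/scoverage.xml
    // and is handed to the scanner via sonar.scala.coverage.reportPaths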

Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$

Hi, I am trying to run Spark on my local laptop. I created a Maven project in IntelliJ IDEA, and in my main class I have one line like the one below, and when I try to run the projec
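
That IllegalAccessError is the usual symptom of running Spark (pre-3.3) on Java 17, where StorageUtils reaches into the internal sun.nio.ch package. A sketch of the common workaround, shown here as sbt settings as an assumption (in IntelliJ or Maven the same flag goes into the run configuration's VM options); running on Java 8 or 11 avoids it entirely:

    // build.sbt sketch
    fork := true
    javaOptions += "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"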

Run scalafmtCheck in sbt assembly

I would like to run scalafmtCheck in sbt assembly. I tried to add: (compile in Compile) := ((compile in Compile) dependsOn scalafmtCheck).value and I got this e
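
A minimal build.sbt sketch using sbt 1.x slash syntax; the error in the question often comes from the task not being scoped to a configuration. This assumes sbt-scalafmt and sbt-assembly are already on the plugin classpath:

    // fail compilation, and therefore assembly, if the code is not formatted
    Compile / compile := ((Compile / compile) dependsOn (Compile / scalafmtCheck)).value
    assembly := (assembly dependsOn (Compile / scalafmtCheck)).value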

Type inference in ZIO giving Any in for comprehension

So I have written a method to count the number of lines in a file in ZIO.

    def lines(file: String): Task[Long] = {
      def countLines(reader: BufferedReader): Ta
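
A minimal sketch of one way to keep the types pinned (ZIO 2.x API assumed, and the bracket combinator swapped in for the helper shown above): when every step is an explicit Task, the surrounding composition no longer widens to Any:

    import zio._
    import java.io.{BufferedReader, FileReader}

    // count lines with an explicit resource bracket
    def lines(file: String): Task[Long] =
      ZIO.acquireReleaseWith(
        ZIO.attempt(new BufferedReader(new FileReader(file)))
      )(reader => ZIO.succeed(reader.close())) { reader =>
        ZIO.attempt {
          Iterator.continually(reader.readLine()).takeWhile(_ != null).size.toLong
        }
      }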

How to execute Scala tests programmatically

I'm looking for a way to execute Scala tests (implemented in munit, but it could also be ScalaTest) programmatically. I want to perform more or less what sbt te
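
A minimal sketch of the ScalaTest route (munit ships its own runner; the suite here is a placeholder): every ScalaTest suite exposes execute(), which runs it and prints results to stdout without sbt:

    import org.scalatest.funsuite.AnyFunSuite

    class ExampleSpec extends AnyFunSuite {
      test("addition") { assert(1 + 1 == 2) }
    }

    object RunTests {
      def main(args: Array[String]): Unit =
        (new ExampleSpec).execute()
    }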

Does AWS Glue support positional arguments?

How to capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key value pairs?
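
For contrast, a sketch of the key/value route the question mentions (Glue Scala job; the parameter names are placeholders). Glue hands the job its arguments as --key value pairs inside sysArgs, so the raw array can be indexed positionally, but nothing guarantees their order:

    import com.amazonaws.services.glue.util.GlueArgParser

    object GlueApp {
      def main(sysArgs: Array[String]): Unit = {
        // resolved by name (the supported route)
        val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME", "input_path").toArray)
        println(args("input_path"))
        // raw array; positional access is possible but fragile
        println(sysArgs.mkString(" "))
      }
    }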

Scala Spark partitionBy and get current partition name

I'm using Scala Spark and have a DataFrame:

    Source | Column1 | Column2
    A      | ...     | ...
    B      | ...     | ...
    B      | ...     | ...
    C      | ...
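
A minimal sketch (df and its Source column come from the question; the rest is an assumption): after repartitioning on Source, spark_partition_id() exposes which physical partition each row landed in. If "partition name" means the partitionBy directory value used on write, that is simply the Source value itself.

    import org.apache.spark.sql.functions.{col, spark_partition_id}

    val withPartition = df
      .repartition(col("Source"))
      .withColumn("partition_id", spark_partition_id())
    withPartition.show()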

Efficient way to parse a file with different JSON schemas in Spark

I am trying to find the best way to parse a JSON file with an inconsistent schema (though the schema for each record type is known and consistent) in Spark, in order to sp
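
A minimal sketch of one common approach (the file name, type discriminator, and per-type schema are assumptions): read the file once as raw text, split by the discriminator, then parse each subset with its own schema via from_json:

    import org.apache.spark.sql.functions.{col, from_json, get_json_object}
    import org.apache.spark.sql.types._

    val raw = spark.read.textFile("data.json").toDF("value")

    val schemaA = StructType(Seq(
      StructField("type", StringType),
      StructField("id", LongType),
      StructField("name", StringType)
    ))

    val typeA = raw
      .filter(get_json_object(col("value"), "$.type") === "A")
      .select(from_json(col("value"), schemaA).as("parsed"))
      .select("parsed.*")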

On the road to understanding F-Bounded Polymorphism

Before even getting to F-bounded polymorphism, there is a construction that underpins it that I already have a hard time understanding. trait Container[A] trait
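
A minimal sketch of where that construction is headed once Container gains an F-bound: the type parameter is constrained to be the extending type itself, so methods can return the concrete subtype rather than the bare trait:

    trait Container[A <: Container[A]] { self: A =>
      def touched: A = this   // returns the concrete subtype, not Container[A]
    }

    final case class Box(label: String) extends Container[Box]

    val b: Box = Box("x").touched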

Spark RDD: Find the single row that has the highest count and for that row report the month, count and hashtag name. Output using println

[Spark RDD] Find the single row that has the highest count and for that row report the month, count and hashtag name. Print the result to the terminal output us
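
A minimal sketch, assuming the data has already been reduced to (month, count, hashtag) tuples (the sample values are placeholders): max with an Ordering on the count picks the single highest row, and println reports it:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("top-hashtag").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val rows = sc.parallelize(Seq(("2024-01", 42L, "#scala"), ("2024-02", 97L, "#spark")))

    val top = rows.max()(Ordering.by[(String, Long, String), Long](_._2))
    println(s"month=${top._1}, count=${top._2}, hashtag=${top._3}")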

Random sampling based on one column after groupBy

I have a Spark table which contains 400+ million records/rows. I used spark.table to convert it into a DF. The DF looks like the one below: id pub_date
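
A minimal sketch (the id and pub_date columns come from the question; the sample size is an assumption): rank the rows of each id in random order with a window, then keep the first n per group:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, rand, row_number}

    val n = 3
    val w = Window.partitionBy("id").orderBy(rand(42))

    val sampled = df
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") <= n)
      .drop("rn")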

I am trying to set up Spark locally but am getting an error

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(

Spark: load CSV file in jar from resources folder

I am trying to create a Spark application in Scala that reads a .csv file located in the src/main/resources directory and saves it to the local HDFS
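
A minimal sketch of one workaround (the resource name is a placeholder): spark.read.csv cannot open a path inside a jar, so copy the classpath resource to a temporary local file first and read that:

    import org.apache.spark.sql.SparkSession
    import java.nio.file.{Files, StandardCopyOption}

    object CsvFromResources {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("csv-from-resources").master("local[*]").getOrCreate()

        // copy the bundled resource to a temp file Spark can open
        val in  = getClass.getResourceAsStream("/data.csv")
        val tmp = Files.createTempFile("data", ".csv")
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)

        val df = spark.read.option("header", "true").csv(tmp.toString)
        df.show()
      }
    }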

Azure Storage Account file details in a table in Databricks

I am loading data via pipelines into an ADLS Gen2 container. Now I want to create a table with details of when the pipeline started running and when it completed. l
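
A sketch of one way to capture part of that, assuming a Databricks notebook (so dbutils and spark are in scope) and a placeholder container path: list the files the pipeline landed and persist their metadata as a table; newer runtimes also expose a modification time on each FileInfo:

    val files = dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/landing/")
      .map(f => (f.path, f.name, f.size))

    import spark.implicits._
    val filesDf = files.toDF("path", "name", "size_bytes")
    filesDf.write.mode("overwrite").saveAsTable("pipeline_file_details")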

Word count using map reduce on Seq[String]

I have a Seq which contains randomly generated words. I want to calculate the occurrence count of each word using map reduce. Now, I have been able to map the w
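
A minimal sketch of the reduce side (the sample Seq is a placeholder): map each word to a (word, 1) pair, group by the word, and sum the counts:

    val words = Seq("spark", "scala", "spark", "cats")

    val counts: Map[String, Int] =
      words
        .map(w => (w, 1))
        .groupBy(_._1)
        .map { case (word, pairs) => word -> pairs.map(_._2).sum }

    // or, in one pass since Scala 2.13:
    // words.groupMapReduce(identity)(_ => 1)(_ + _)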

java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/scala/StreamExecutionEnvironment

package com.knoldus
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.win
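
A NoClassDefFoundError on StreamExecutionEnvironment usually means the Flink streaming Scala module is missing at runtime, for example because it is marked "provided" while running from the IDE. A build.sbt sketch, with the Flink version as an assumption:

    libraryDependencies ++= Seq(
      "org.apache.flink" %% "flink-streaming-scala" % "1.14.4",
      "org.apache.flink" %% "flink-clients"         % "1.14.4"
    )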

How to use countDistinct with a window function in Spark/Scala?

I need to use a window function that is partitioned by 2 columns and do a distinct count on the 3rd column, returning that as the 4th column. I can do count without any is
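
A minimal sketch (column names are assumptions): Spark rejects countDistinct over a window, so the usual workaround is size(collect_set(...)) for an exact distinct count, or approx_count_distinct over the same window when an approximation is acceptable:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, collect_set, size}

    val w = Window.partitionBy("col1", "col2")

    val result = df.withColumn("distinct_col3", size(collect_set(col("col3")).over(w)))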