I am trying out Cats for the first time, with Scala 3, and am implementing a set of parser combinators as a self-teaching exercise; however, I am stuck on the
Currently, I'm working on a project which extracts data from a BigQuery table using Scio in Scala. I'm able to extract and ingest the data into ElasticSearch, b
I am trying to connect Kafka and Spark by reading data from one topic and trying to print the contents of that topic as a DataFrame, but by doing connect
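A minimal sketch of that Kafka-to-DataFrame connection with Spark Structured Streaming; the broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToConsole {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-console")
      .master("local[*]")
      .getOrCreate()

    // Read the topic as a streaming DataFrame; key and value arrive as binary.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "my-topic")                     // assumed topic
      .option("startingOffsets", "earliest")
      .load()

    // Cast the payload to strings and print each micro-batch to the console.
    val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```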
Note: we are executing this as part of a CI build in TeamCity. Step 1: getting coverage details: addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1"). Step 2: S
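For reference, a sketch of how those steps usually fit together with sbt-scoverage 1.6.x; the threshold value is only an example:

```scala
// project/plugins.sbt
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1")

// build.sbt -- fail the build when statement coverage drops below the threshold
coverageMinimum := 80          // threshold chosen here only as an illustration
coverageFailOnMinimum := true
```

The TeamCity sbt step would then typically run `sbt clean coverage test coverageReport`.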
Hi, I am trying to run Spark on my local laptop. I created a Maven project in IntelliJ IDEA, and in my main class I have one line like the one below; when I try to run the projec
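A minimal sketch of such a main class that runs Spark fully in-process, assuming the spark-sql dependency is on the compile (not provided) classpath when launching from the IDE:

```scala
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Run Spark inside the JVM started by the IDE; no cluster or spark-submit needed.
    val spark = SparkSession.builder()
      .appName("local-test")
      .master("local[*]")   // use all local cores
      .getOrCreate()

    spark.range(10).show()  // quick smoke test
    spark.stop()
  }
}
```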
I would like to run scalafmtCheck as part of sbt assembly. I tried to add: (compile in Compile) := ((compile in Compile) dependsOn scalafmtCheck).value, and I got that e
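A sketch of the same wiring in current sbt slash syntax, assuming sbt-scalafmt and sbt-assembly are both enabled; the error above is cut off, but an unscoped scalafmtCheck reference is a common cause of this kind of failure:

```scala
// build.sbt: make compilation (and therefore assembly) depend on the format check
Compile / compile := (Compile / compile)
  .dependsOn(Compile / scalafmtCheck)
  .value

// or hook the check directly onto the assembly task
assembly := assembly
  .dependsOn(Compile / scalafmtCheck, Test / scalafmtCheck)
  .value
```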
So I have written a method to count the number of lines in a file in ZIO.
def lines(file: String): Task[Long] = {
  def countLines(reader: BufferedReader): Ta
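The snippet above is cut off, but here is a self-contained sketch of the same idea, assuming ZIO 2, that closes the reader even if counting fails:

```scala
import zio._
import java.io.{BufferedReader, FileReader}

// A sketch: acquire the reader, count lines, and release it via acquireReleaseWith.
def lines(file: String): Task[Long] =
  ZIO.acquireReleaseWith(
    ZIO.attemptBlocking(new BufferedReader(new FileReader(file)))  // acquire
  )(reader => ZIO.succeed(reader.close())) { reader =>             // release, then use
    ZIO.attemptBlocking {
      Iterator.continually(reader.readLine()).takeWhile(_ != null).size.toLong
    }
  }
```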
I'm looking for a way to execute Scala tests (implemented in munit, but it could also be ScalaTest) programmatically. I want to do more or less what sbt te
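One option for the ScalaTest side is its programmatic Runner; the runpath below is an assumption and needs to match the build's test-classes output directory:

```scala
import org.scalatest.tools.Runner

object RunTests {
  def main(args: Array[String]): Unit = {
    // Discover and run suites on the given runpath, roughly what `sbt test` does.
    val allPassed = Runner.run(Array(
      "-R", "target/scala-2.13/test-classes",  // assumed runpath
      "-o"                                     // report to standard out
    ))
    if (!allPassed) sys.exit(1)
  }
}
```

For a single known suite, `(new MySuite).execute()` also runs it and prints the results to standard out.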
How to capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key value pairs?
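A sketch of what positional access can look like in a Scala Glue script; note that Glue still injects its own --key value pairs into the command line passed to main, so the positions include those (the flag name below is just an illustration):

```scala
object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    // Inspect the raw argument array positionally instead of via getResolvedOptions.
    sysArgs.zipWithIndex.foreach { case (arg, i) => println(s"arg($i) = $arg") }

    // Example: take the value that immediately follows a known flag.
    val idx     = sysArgs.indexOf("--JOB_NAME")
    val jobName = if (idx >= 0 && idx + 1 < sysArgs.length) sysArgs(idx + 1) else "unknown"
    println(s"JOB_NAME = $jobName")
  }
}
```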
I'm using Scala Spark and have a DataFrame:
Source | Column1 | Column2
A      | ...     | ...
B      | ...     | ...
B      | ...     | ...
C      | ...
I am trying to find the best way to parse a JSON file with an inconsistent schema (though the schema for any given type is known and consistent) in Spark, in order to sp
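One hedged approach is to read every record as raw text, route rows on a discriminator field, and apply the known per-type schema with from_json; the field names and schema below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, get_json_object}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("mixed-json").master("local[*]").getOrCreate()

// Read each line as plain text so a mismatched schema cannot silently drop fields.
val raw = spark.read.text("data/events.json")   // assumed path

// Hypothetical schema for records of type "A"; the real ones come from the known types.
val typeASchema = StructType(Seq(
  StructField("type", StringType),
  StructField("id",   LongType),
  StructField("name", StringType)))

// Keep only type-A rows, then parse them with the matching schema.
val typeA = raw
  .filter(get_json_object(col("value"), "$.type") === "A")
  .select(from_json(col("value"), typeASchema).as("parsed"))
  .select("parsed.*")

typeA.show()
```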
Before even getting to F-bounded polymorphism, there is a construction that underpins it that I already have a hard time understanding. trait Container[A] trait
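For contrast, a small sketch of a plain generic trait next to the F-bounded form it leads up to; the Box type is only an illustration:

```scala
// Plain generic trait: the element type A is unrelated to the implementing type.
trait Container[A] {
  def put(a: A): Container[A]
}

// F-bounded form: the type parameter is constrained to the trait applied to itself,
// and the self-type lets methods return the concrete implementing type.
trait FBounded[A <: FBounded[A]] { self: A =>
  def touch(): A = self
}

final case class Box(label: String) extends FBounded[Box]

// touch() now returns Box rather than FBounded[Box].
val b: Box = Box("x").touch()
```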
[Spark RDD] Find the single row that has the highest count and for that row report the month, count and hashtag name. Print the result to the terminal output us
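A sketch under the assumption that the RDD holds (month, hashtag, count) tuples; taking the max with an Ordering on the count avoids collecting the whole RDD:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("top-hashtag").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Hypothetical (month, hashtag, count) rows standing in for the real data.
val counts = sc.parallelize(Seq(
  ("2024-01", "#scala", 42L),
  ("2024-02", "#spark", 97L),
  ("2024-02", "#flink", 15L)))

// Single row with the highest count, computed on the executors.
val (month, hashtag, count) = counts.max()(Ordering.by(_._3))
println(s"month=$month count=$count hashtag=$hashtag")
```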
I have a Spark table which contains 400+ million records/rows. I used spark.table to convert it into a DataFrame. The DF looks like this below: id pub_date
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(
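That startup message points at sc.setLogLevel; a minimal way to apply it right after the session is created:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("quiet").master("local[*]").getOrCreate()

// Silence Spark's INFO/WARN chatter for this application only.
spark.sparkContext.setLogLevel("ERROR")   // or "WARN", "INFO", "DEBUG"
```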
I am trying to create a Spark application in Scala that reads a .csv file located in the src/main/resources directory and saves it to the local HDFS
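A sketch assuming the file is bundled as src/main/resources/data.csv and that a local HDFS namenode listens on hdfs://localhost:9000; both names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object CsvToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-hdfs")
      .master("local[*]")
      .getOrCreate()

    // Resolve the bundled file from the classpath (works when run from the IDE;
    // inside a packaged jar the resource would need to be read as a stream instead).
    val input = getClass.getResource("/data.csv").getPath

    val df = spark.read
      .option("header", "true")
      .csv(input)

    // Write to the local HDFS instance; host, port, and target path are assumptions.
    df.write.mode("overwrite").csv("hdfs://localhost:9000/user/me/data_csv")

    spark.stop()
  }
}
```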
I am loading data via pipelines into an ADLS Gen2 container. Now I want to create a table that records when the pipeline started running and when it completed. l
I have a Seq that contains randomly generated words. I want to calculate the occurrence count of each word using map-reduce. So far, I have been able to map the w
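A sketch of the map/reduce shape on a plain Seq, assuming Scala 2.13+ where groupMapReduce is available:

```scala
val words: Seq[String] = Seq("spark", "scala", "spark", "flink", "scala", "spark")

// Map each word to a count of 1, then reduce the counts per key.
val counts: Map[String, Int] =
  words
    .map(w => w -> 1)
    .groupMapReduce(_._1)(_._2)(_ + _)

println(counts)   // e.g. Map(spark -> 3, scala -> 2, flink -> 1)
```

On older Scala versions, `words.groupBy(identity).mapValues(_.size)` gives the same result.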
package com.knoldus

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.win
I need to use a window function that is partitioned by 2 columns and do a distinct count on the 3rd column, adding that as the 4th column. I can do count without any is
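countDistinct is not supported over a window, so the usual workaround is size(collect_set(...)); the column names below are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, collect_set, size}

val spark = SparkSession.builder().appName("window-distinct").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data: partition by (col1, col2), distinct-count col3.
val df = Seq(
  ("a", "x", 1), ("a", "x", 1), ("a", "x", 2), ("b", "y", 3)
).toDF("col1", "col2", "col3")

val w = Window.partitionBy("col1", "col2")

// collect_set keeps only distinct values within the window; size counts them.
df.withColumn("col3_distinct", size(collect_set(col("col3")).over(w))).show()
```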