I'm looking for a way to execute Scala tests (implemented in munit, but it could also be ScalaTest) programmatically. I want to perform more or less what sbt test does…
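A minimal sketch of one route, assuming the suites are munit suites (munit runs its suites through a bundled JUnit runner, so JUnit's programmatic entry point can drive them; MySuite is a hypothetical suite name):

    import org.junit.runner.JUnitCore

    object RunTests {
      def main(args: Array[String]): Unit = {
        // munit suites are annotated @RunWith(classOf[munit.MUnitRunner]),
        // so JUnitCore can execute them outside of sbt
        val result = JUnitCore.runClasses(classOf[MySuite]) // hypothetical suite
        println(s"ran=${result.getRunCount} failed=${result.getFailureCount}")
      }
    }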
How do I capture a Glue job's arguments by position, rather than using the getResolvedOptions function and passing the arguments as key-value pairs?
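One hedged sketch: a Glue Scala job's entry point receives the raw argument array, so positional access is plain indexing; the caveat is that Glue injects its own flags (such as --JOB_NAME) into that array, so the positions below are assumptions:

    object GlueApp {
      def main(sysArgs: Array[String]): Unit = {
        // Everything arrives as one flat array of "--key value" tokens;
        // indexing works only if the argument order is fixed and known
        println(sysArgs.mkString(" | "))
        val firstValue = sysArgs(1) // assumed position of the first value
      }
    }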
I'm using Scala Spark and have a DataFrame:

    Source | Column1 | Column2
    A      | ...     | ...
    B      | ...     | ...
    B      | ...     | ...
    C      | ...     | …
I am trying to find the best way to parse a JSON file with an inconsistent schema (but the schema of any given type is known and consistent) in Spark, in order to sp…
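A sketch of one common approach, assuming each record carries a discriminator field (here called type; the field name, path and schemas are hypothetical): read the file as plain text, tag rows by type, then apply the known per-type schema with from_json:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val raw = spark.read.text("data.json") // spark: an existing SparkSession

    // Extract the discriminator without committing to a full schema yet
    val tagged = raw.withColumn("kind", get_json_object(col("value"), "$.type"))

    // Known, consistent schema for one of the record types
    val schemaA = StructType(Seq(
      StructField("type", StringType),
      StructField("payload", StringType)
    ))

    val parsedA = tagged
      .filter(col("kind") === "A")
      .select(from_json(col("value"), schemaA).as("rec"))
      .select("rec.*")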
Before even getting to F-bounded polymorphism, there is a construction that underpins it that I already have a hard time understanding:

    trait Container[A]
    trait …
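For reference, a minimal sketch of the F-bounded form itself, with illustrative names: the type parameter is bounded by the very trait it parameterizes, which lets methods return the concrete subtype rather than the base trait:

    trait Container[A <: Container[A]] {
      def refresh: A // A is the concrete subtype, not Container
    }

    case class Box(items: List[Int]) extends Container[Box] {
      def refresh: Box = copy(items = Nil)
    }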
[Spark RDD] Find the single row that has the highest count and, for that row, report the month, count and hashtag name. Print the result to the terminal output using…
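A sketch, assuming an RDD of (month, hashtag, count) tuples (the tuple shape and sc are assumptions):

    val rows = sc.parallelize(Seq(
      ("2020-01", "#spark", 3L),
      ("2020-02", "#scala", 7L)
    ))

    // Single pass: keep whichever tuple carries the larger count
    val top = rows.reduce((a, b) => if (a._3 >= b._3) a else b)
    println(s"month=${top._1} count=${top._3} hashtag=${top._2}")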
I have a Spark table which contains 400+ million records/rows. I used spark.table to convert it into a DF. The DF looks like the one below:

    id   pub_date
    …
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
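The message itself points at the knob; for example, to silence everything below errors:

    sc.setLogLevel("ERROR") // also: "WARN", "INFO", "DEBUG", ...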
I am trying to create a Spark application, running on Scala, that reads a .csv file located in the src/main/resources directory and saves it on the local HDFS…
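A sketch with hypothetical paths; one caveat worth hedging: once the application is packaged, src/main/resources lives inside the jar, so Spark needs a real filesystem path (or the resource copied out first):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("csv-to-hdfs")
      .master("local[*]")
      .getOrCreate()

    // Resolves while running from the project directory; inside a jar this path won't exist
    val df = spark.read.option("header", "true").csv("src/main/resources/data.csv")

    df.write.mode("overwrite").csv("hdfs://localhost:9000/user/me/data") // HDFS URI is an assumption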
I am loading data via pipelines into an ADLS Gen2 container. Now I want to create a table that records when each pipeline run started and when it completed…
I have a Seq which contains randomly generated words. I want to calculate the occurrence count of each word using map-reduce. Now, I have been able to map the words…
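A minimal map/reduce-style sketch on a plain Seq (on Scala 2.13+ the same thing collapses to words.groupMapReduce(identity)(_ => 1)(_ + _)):

    val words = Seq("alpha", "beta", "alpha", "gamma", "beta", "alpha")

    // Map phase: each word becomes a (word, 1) pair
    // Reduce phase: sum the 1s per distinct word
    val counts = words
      .map(w => (w, 1))
      .groupBy { case (w, _) => w }
      .map { case (w, pairs) => w -> pairs.map(_._2).sum }

    // counts: Map(alpha -> 3, beta -> 2, gamma -> 1)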
    package com.knoldus

    import org.apache.flink.api.java.utils.ParameterTool
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.win…
I need to use a window function that is partitioned by 2 columns and does a distinct count on the 3rd column, added as the 4th column. I can do a plain count without any issues…
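Spark rejects count(distinct ...) over a window; the usual workaround is size(collect_set(...)), sketched here with hypothetical column names on an existing DataFrame df:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val w = Window.partitionBy("col1", "col2")

    // collect_set gathers the distinct values per partition;
    // taking the set's size is equivalent to a distinct count
    val out = df.withColumn("distinct_col3", size(collect_set(col("col3")).over(w)))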
With Scala 2.11 and spark-streaming-kafka-0-8_2.11 I could do:

    import org.apache.spark.streaming.kafka.KafkaCluster

    val params = Map[String, Object](
      "bootstr…
I have a pipe-delimited file and I need to strip the first two rows off of it. So I read it into an RDD, exclude the first two rows, and make it into a data frame…
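A sketch of that flow, with a hypothetical path and an assumed 3-column layout; zipWithIndex pins down the physical line order so the first two lines can be dropped safely:

    val raw = sc.textFile("data.psv")

    // Drop the first two physical lines by index
    val body = raw.zipWithIndex()
      .filter { case (_, idx) => idx >= 2 }
      .map { case (line, _) => line }

    import spark.implicits._
    val df = body
      .map(_.split('|'))
      .map(a => (a(0), a(1), a(2))) // 3 columns assumed
      .toDF("c1", "c2", "c3")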
I am trying to upgrade from Jackson 2.5 to 2.10. The deserialization code below was working for me before, but post-upgrade the solution is failing with the following error…
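Without the error text this is only a guess, but the most common 2.10 break in deserialization code is polymorphic default typing: enableDefaultTyping was deprecated in favor of activateDefaultTyping, which now demands an explicit PolymorphicTypeValidator. A sketch (the package prefix is hypothetical):

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.databind.jsontype.BasicPolymorphicTypeValidator

    // Allowlist which classes may be materialized from type ids in the JSON
    val ptv = BasicPolymorphicTypeValidator.builder()
      .allowIfSubType("com.myapp.model.")
      .build()

    val mapper = new ObjectMapper()
    mapper.activateDefaultTyping(ptv, ObjectMapper.DefaultTyping.NON_FINAL)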
I'm trying to use sbt-avro in a Scala project to generate Scala classes from an Avro schema. Here is the project structure:

    multi-tenant/
      build.sbt
      proj…
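For reference, a minimal wiring sketch (plugin coordinates and version are assumptions); one hedge worth stating: sbt-avro generates Java classes from .avsc files, while sbt-avrohugger is the plugin that emits Scala case classes:

    // project/plugins.sbt
    addSbtPlugin("com.github.sbt" % "sbt-avro" % "3.4.3") // version is an assumption

    // build.sbt: enable generation on the module that owns the schemas
    lazy val root = (project in file("."))
      .enablePlugins(SbtAvro)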
I'm trying to instantiate a SparkContext inside an SBT console, using the following Scala commands:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
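For comparison, the usual local-mode construction (master and app name are arbitrary choices):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("sbt-console")
    val sc = new SparkContext(conf)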
I have JUnit tests in my Scala sbt project. I know that I can generate HTML reports for ScalaTest with:

    testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, …
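For the JUnit side, one hedged pointer: sbt already writes JUnit-style XML for each run to target/test-reports, and HTML can be rendered from that XML with an external reporter. The ScalaTest HTML reporter itself is wired up like this (output directory is an assumption; ScalaTest 3.1+ additionally needs the flexmark dependency on the test classpath):

    // build.sbt
    testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-h", "target/html-report")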