Category "apache-flink"

How to prevent Flink job from getting cancelled by itself

Environment My Flink Job runs on a standalone cluster, session mode. Version is 1.13 (https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/res

Flink taskmanager metrics disappear after starting a job

I have a Standalone cluster (with jobmanager and taskmanager on same machine) on 1.14.4 and I'm testing the migration to 1.15.0 But I keep losing the taskmanage

How to skip kafka history data in flink job if certain lag is encountered?

Sometimes we encounter lag in kafka consumer due to some external issues. Flink job will always consume kafka history (delayed data) with exactly-once semantics

How to skip kafka history data in flink job if certain lag is encountered?

Sometimes we encounter lag in kafka consumer due to some external issues. Flink job will always consume kafka history (delayed data) with exactly-once semantics

Inconsistent results when joining multiple tables in Flink

We've 4 CDC sources defined of which we need to combine the data into one result table. We're creating a table for each source using the SQL API, eg: "CREATE TA

Flink Python Datastream API Kafka Producer Sink Serializaion

I'm trying to read data from one kafka topic and writing to another after making some processing. I'm able to read data and process it when i try to write it to

Where is my main method runs when using in yarn-cluster and detached mode

I am new to flink and reading Flink 1.8 source code(https://github.com/apache/flink/tree/release-1.8) to understand how flink works with YARN. I know there are

No ExecutorFactory found to execute the application in Flink 1.11.1

first of all I have read this post about the same issue and tried to follow the same solution that works for him (create a new quickstart with mvn and migrate t

Is there an example of PyFlink SQL unit testing in a self-contained repo?

Is there an example of a self-contained repository showing how to perform SQL unit testing of PyFlink (specifically 1.13.x if possible)? There is a related SO q

if we cancel the job with savepoint, job got cancelled and savepoint was failure how to restore this job now

I have deployed flink job in application mode using native kubernetes deployment and stopping job along with savepoint (I'm using rest api command for that) but

java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/scala/StreamExecutionEnvironment

package com.knoldus import org.apache.flink.api.java.utils.ParameterTool import org.apache.flink.streaming.api.scala._ import org.apache.flink.streaming.api.win

Flink TaskManager not reconnecting to the new Jobmanager

I have configured Flink in HA mode as mentioned here: I wanted to test the fault tolerance, hence I did the following: Setup Flink cluster with 2 JobManagers

how to filter None in Row Type Pyflink

I'm using Pyflink and was wondering if there is more Generic way to filter None value or handle none json format. # main(): try: data_stream = data_

How to drain the window after a Flink join using coGroup()?

I'd like to join data coming in from two Kafka topics ("left" and "right"). Matching records are to be joined using an ID, but if a "left" or a "right" record i

Flink checkpoint not replaying the kafka events which were in process during the savepoint/checkpoint

I want to test end-to-end exactly once processing in flink. My job is: Kafka-source -> mapper1 -> mapper-2 -> kafka-sink I had put a Thread.sleep(100

Apache Flink - writing stream to S3 error - null uri host

I have a Flink data pipeline that transforms the log file downloaded from S3 and write back in parquet file format to another S3 bucket. I have configured the S

Cannot access Flink dashboard localhost:8081 on windows

I follow the first steps to install Flink. I can start the cluster without any problem $ start-cluster.sh Starting cluster. Starting standalonesession daemon on

Integration testing flink job

I've written a small flink application. I has some input, and enriches it with data from an external source. It's an RichAsyncFunction and within the open metho

What should I use instead deprecated FlinkKafkaConsumer? Scala Flink

I try to get data from Kafka to Flink, I use FlinkKafkaConsumer but Intellij shows me that it is depricated and also ssh console in Google Cloud shows me this e

Windowing on mapped stream is giving different results

Use Case : Map the input stream of sometype to another type and sink. Also apply the aggragator on the mapped stream to count the number of events of each event