Category "apache-flink"

Recalculate historical data using Apache Beam

I have an Apache Beam streaming project that calculates data and writes it to the database, what is the best way to reprocess all historical records after a bug

Get nested fields from Kafka message using Apache Flink SQL

I'm trying to create a source table using Apache Flink 1.11 where I can get access to nested properties in a JSON message. I can pluck values off root properti

Why Apache flink in not running on Windows?

I am a newbie and I am using apache flink for the first time. I have downloaded flink-1.14.4-bin-scala_2.12 version in windows, I have installed cygwin to run

The RemoteEnvironment cannot be used when submitting a program through a client, or running in a TestEnvironment context

I was trying to execute the apache-beam word count having Kafka as input and output. But on submitting the jar to the flink cluster, this error came - The Remot

Pyflink jdbc sink

I am trying to make use of Pyflink's JdbcSink to connect to Oracle's ADB instance. I can find examples of JdbcSink using java in Flink's official documentation.

How to prevent Flink job from getting cancelled by itself

Environment My Flink Job runs on a standalone cluster, session mode. Version is 1.13 (https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/res

Flink taskmanager metrics disappear after starting a job

I have a Standalone cluster (with jobmanager and taskmanager on same machine) on 1.14.4 and I'm testing the migration to 1.15.0 But I keep losing the taskmanage

How to skip kafka history data in flink job if certain lag is encountered?

Sometimes we encounter lag in kafka consumer due to some external issues. Flink job will always consume kafka history (delayed data) with exactly-once semantics

How to skip kafka history data in flink job if certain lag is encountered?

Sometimes we encounter lag in kafka consumer due to some external issues. Flink job will always consume kafka history (delayed data) with exactly-once semantics

Inconsistent results when joining multiple tables in Flink

We've 4 CDC sources defined of which we need to combine the data into one result table. We're creating a table for each source using the SQL API, eg: "CREATE TA

Flink Python Datastream API Kafka Producer Sink Serializaion

I'm trying to read data from one kafka topic and writing to another after making some processing. I'm able to read data and process it when i try to write it to

Where is my main method runs when using in yarn-cluster and detached mode

I am new to flink and reading Flink 1.8 source code(https://github.com/apache/flink/tree/release-1.8) to understand how flink works with YARN. I know there are

No ExecutorFactory found to execute the application in Flink 1.11.1

first of all I have read this post about the same issue and tried to follow the same solution that works for him (create a new quickstart with mvn and migrate t

Is there an example of PyFlink SQL unit testing in a self-contained repo?

Is there an example of a self-contained repository showing how to perform SQL unit testing of PyFlink (specifically 1.13.x if possible)? There is a related SO q

if we cancel the job with savepoint, job got cancelled and savepoint was failure how to restore this job now

I have deployed flink job in application mode using native kubernetes deployment and stopping job along with savepoint (I'm using rest api command for that) but

java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/scala/StreamExecutionEnvironment

package com.knoldus import org.apache.flink.api.java.utils.ParameterTool import org.apache.flink.streaming.api.scala._ import org.apache.flink.streaming.api.win

Flink TaskManager not reconnecting to the new Jobmanager

I have configured Flink in HA mode as mentioned here: I wanted to test the fault tolerance, hence I did the following: Setup Flink cluster with 2 JobManagers

how to filter None in Row Type Pyflink

I'm using Pyflink and was wondering if there is more Generic way to filter None value or handle none json format. # main(): try: data_stream = data_

How to drain the window after a Flink join using coGroup()?

I'd like to join data coming in from two Kafka topics ("left" and "right"). Matching records are to be joined using an ID, but if a "left" or a "right" record i

Flink checkpoint not replaying the kafka events which were in process during the savepoint/checkpoint

I want to test end-to-end exactly once processing in flink. My job is: Kafka-source -> mapper1 -> mapper-2 -> kafka-sink I had put a Thread.sleep(100