Can two instances of the same Spark Streaming job conflict with each other?

I want to run the same Java Spark Streaming job (10-second micro-batches) as two instances (sparkStr1 and sparkStr2).

Both instances consume the same Kafka topic (30,000 records per second), process the data, and store it in a dedicated Elasticsearch cluster (sparkStr1 writes to ES1 and sparkStr2 writes to ES2).
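To make the setup concrete, here is a minimal sketch of the per-instance settings and the batch volume they imply. The host names and consumer group names are hypothetical placeholders, and using a separate Kafka consumer group per instance is an assumption (so that each instance receives every record); only the 30,000 records per second and the 10-second batch interval come from the setup described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StreamSetup {
    // Figures taken from the setup described above.
    static final long RECORDS_PER_SECOND = 30_000L;
    static final long BATCH_SECONDS = 10L;

    /** Number of records a single instance has to process and index per micro-batch. */
    static long recordsPerBatch() {
        return RECORDS_PER_SECOND * BATCH_SECONDS;
    }

    /**
     * Settings for one instance. The group name and host are hypothetical;
     * "group.id" is the Kafka consumer group key and "es.nodes" is the
     * elasticsearch-hadoop key for the target cluster.
     */
    static Map<String, String> settingsFor(String instance, String esHost) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Assumption: one consumer group per instance, so both read the full topic.
        conf.put("group.id", instance + "-group");
        // Each instance writes to its own Elasticsearch cluster.
        conf.put("es.nodes", esHost);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println("sparkStr1: " + settingsFor("sparkStr1", "es1.example.internal"));
        System.out.println("sparkStr2: " + settingsFor("sparkStr2", "es2.example.internal"));
        System.out.println("records per batch, per instance: " + recordsPerBatch());
        System.out.println("records read from Kafka per batch, both instances: " + 2 * recordsPerBatch());
    }
}
```

Under these assumptions each instance has 10 seconds to process and index 300,000 records, and the topic is read twice per batch interval when both instances are running.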

  • When only the first instance is running, everything is fine.
  • When both are running, the first one slows down (and the second one is not very fast either). At first I assumed the Kafka topic was the cause.

I re-ran the second instance without the Elasticsearch write (saveToEs) to test this, and everything ran fine again.

I wondered whether it could be a network issue, but that does not seem to be the case: I checked the inbound and outbound load on the network interfaces and it looked normal.

I don't understand what is going on or where I should look next. Do you have any ideas?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
