0

Doc says this mode can cause bugs, but it does not tell me the rule of using this mode, in what case it will cause bugs? Let's say I have a job,

  1. source: kafka (byte[] data),
  2. flat-map: parse byte[] to Google Protobuf object 'foo', create a Tuple2<>(foo.id, foo), and return this tuple2
  3. keyby and process: for each id, put the first foo into ValueState, update ValueState if there are multiple object with same id. Emit the first foo(updated) after 10 seconds.

In this case, is it OK to turn on 'object reuse mode'?

1 Answer 1

10

For the pipeline you have described, yes, object reuse can be safely enabled.

Object reuse only applies to situations where data is forwarded between operator instances within the same task -- so in your case, between the source and flatmap. The keyBy forces ser/de and a network shuffle, so object reuse cannot be used between the flatmap and process function. But object reuse would probably also apply between the process function and sink (which I assume is present).

With object reuse enabled, is it NOT safe to

  • remember input object references across function calls or
  • modify input objects

If you avoid those two points, you may safely

  • modify an output object and emit it again

By the way, it would be preferable to implement your deserialization in a DeserializationSchema or KafkaDeserializationSchema, rather than in a flatmap, in which case object reuse would be irrelevant for that part of your pipeline.

Sign up to request clarification or add additional context in comments.

2 Comments

"Object reuse only applies to situations where data is forwarded between operator instances within the same task" - Could you please share the source of this in the documentation? I can't find any useful resources/documentation around this.
I'm saying that based on my personal knowledge of how the Flink runtime works. Data sent from one task to another is always serialized, and then either communicated via shared memory (if the destination is in the same JVM), or send via TCP. Object reuse is only relevant when serde isn't involved.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.