An empty partition can block data ingestion from Kafka/Kinesis #7600

jihoonson · 2019-05-06T19:25:11Z

Affected Version

All versions since 0.10

Description

When the supervisor is restarted or reset, it first looks up the valid offsets to resume reading from in metadata storage. If there are no offsets in metadata storage, it fetches the latest or earliest offset for each partition from Kafka/Kinesis. This is implemented in generateStartingSequencesForPartitionGroup():

  private ImmutableMap<PartitionIdType, OrderedSequenceNumber<SequenceOffsetType>> generateStartingSequencesForPartitionGroup(
      int groupId
  )
  {
    ImmutableMap.Builder<PartitionIdType, OrderedSequenceNumber<SequenceOffsetType>> builder = ImmutableMap.builder();
    for (Entry<PartitionIdType, SequenceOffsetType> entry : partitionGroups.get(groupId).entrySet()) {
      PartitionIdType partition = entry.getKey();
      SequenceOffsetType sequence = entry.getValue();

      if (!getNotSetMarker().equals(sequence)) {
        // if we are given a startingOffset (set by a previous task group which is pending completion) then use it
        if (!isEndOfShard(sequence)) {
          builder.put(partition, makeSequenceNumber(sequence, useExclusiveStartSequenceNumberForNonFirstSequence()));
        }
      } else {
        // if we don't have a startingOffset (first run or we had some previous failures and reset the sequences) then
        // get the sequence from metadata storage (if available) or Kafka/Kinesis (otherwise)
        OrderedSequenceNumber<SequenceOffsetType> offsetFromStorage = getOffsetFromStorageForPartition(partition);

        if (offsetFromStorage != null) {
          builder.put(partition, offsetFromStorage);
        }
      }
    }
    return builder.build();
  }
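The snippet above does not include getOffsetFromStorageForPartition(). Going only by the description of the lookup order (metadata storage first, then the stream), it can be sketched as follows. This is not Druid's code: MetadataStore, StreamClient, and startingOffset() are hypothetical names, partitions and offsets are simplified to int and long, and the useEarliest flag is assumed to stand in for the supervisor's earliest/latest choice (useEarliestOffset in the Kafka ioConfig).

```java
import java.util.Optional;

public class OffsetResolution {
  // Hypothetical stand-in for the supervisor's metadata-storage lookup.
  public interface MetadataStore {
    Optional<Long> committedOffset(int partition);
  }

  // Hypothetical stand-in for a Kafka/Kinesis client. Either call may throw an
  // unchecked exception, e.g. when the request times out.
  public interface StreamClient {
    long earliestOffset(int partition);

    long latestOffset(int partition);
  }

  // Prefer the offset committed in metadata storage; only if none exists, ask the
  // stream for the earliest or latest offset of this partition.
  public static long startingOffset(
      int partition, MetadataStore store, StreamClient client, boolean useEarliest) {
    Optional<Long> stored = store.committedOffset(partition);
    if (stored.isPresent()) {
      return stored.get();
    }
    return useEarliest ? client.earliestOffset(partition) : client.latestOffset(partition);
  }
}
```

Only the fallback to the stream can fail because of the stream itself, so a partition with a committed offset never reaches the network call in this sketch.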

If the supervisor hits an exception while fetching an offset from Kafka/Kinesis (e.g., a timeout because a partition is empty), the exception propagates out of this method immediately and handling of the runNotice fails. This can block all data ingestion until every partition has received some data.
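A minimal model of why a single bad partition blocks the whole group (hypothetical names, not Druid code): offsets are collected partition by partition, and an exception from any one fetch escapes the loop, so the caller gets no offsets at all. The fetch is passed in as an IntToLongFunction, and the test uses an unchecked IllegalStateException to stand in for a timeout.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToLongFunction;

public class FailFastStartingOffsets {
  // Same shape as the loop above: every partition of the group is resolved in
  // turn, and the map is only returned once all of them have succeeded.
  public static Map<Integer, Long> build(List<Integer> partitions, IntToLongFunction fetchOffset) {
    Map<Integer, Long> offsets = new LinkedHashMap<>();
    for (int partition : partitions) {
      // If fetchOffset throws (say, a timeout on an empty partition), the exception
      // propagates out of build() and the offsets already collected are discarded.
      offsets.put(partition, fetchOffset.applyAsLong(partition));
    }
    return offsets;
  }
}
```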

I think the supervisor should be able to skip partitions whose offsets it fails to fetch and continue ingesting from the other partitions.
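One way to implement that proposal, sketched under assumptions (hypothetical names; this is not the change made in #7648): catch the failure per partition, log it, record the partition as skipped, and return offsets for the partitions that did resolve.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToLongFunction;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TolerantStartingOffsets {
  private static final Logger LOG = Logger.getLogger(TolerantStartingOffsets.class.getName());

  // Result of one pass: offsets for partitions that resolved, plus the partitions
  // that failed and should be retried later.
  public static final class Result {
    public final Map<Integer, Long> offsets = new LinkedHashMap<>();
    public final List<Integer> skipped = new ArrayList<>();
  }

  public static Result build(List<Integer> partitions, IntToLongFunction fetchOffset) {
    Result result = new Result();
    for (int partition : partitions) {
      try {
        result.offsets.put(partition, fetchOffset.applyAsLong(partition));
      } catch (RuntimeException e) {
        // Log and skip this partition instead of failing the whole group; a
        // (hypothetical) caller can retry the skipped partitions on its next run.
        LOG.log(Level.WARNING, "Could not fetch starting offset for partition " + partition, e);
        result.skipped.add(partition);
      }
    }
    return result;
  }
}
```

Catching every RuntimeException is deliberately broad for a sketch; a real fix would probably want to distinguish transient errors such as timeouts from genuine bugs, and needs some way to pick the skipped partitions up again once they have data.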

shivtools · 2019-05-13T06:17:13Z

Hey @jihoonson, thanks for creating this issue! I've put up #7648; let me know if this is what you had in mind.

jihoonson added the Bug and Area - Streaming Ingestion labels on May 6, 2019

shivtools mentioned this issue on May 13, 2019: Handle exceptions while fetching offsets #7648 (Closed)

jon-wei mentioned this issue on Jan 6, 2020: Kinesis Indexing throwing exception #9133 (Closed)
