
Batch processing with Iterator pattern

 Greetings!

While reviewing a pull request, I came across a complex-looking method more than 60 lines long. Its purpose was to fetch data from a legacy database in batches.

Source code - Iterator Pattern 

In brief, the idea was this:


while there_are_more_records
  fetch next batch
  do any processing
  handle exceptions
  calculate the next batch
  decide whether another iteration is needed


Although the line count was later reduced by extracting private methods, this approach has multiple problems:
  • The method does several things at once.
  • It looks too complex.
  • It is difficult to unit test.
  • Any unit tests written for it will not be very meaningful.
  • It is hard to maintain.
  • It is error prone.
  • The logic cannot be reused.
Unfortunately, people tend to regard complex code like this as good work, while better, simpler code gets treated as easy work.
Anyway, we can do better with an Iterator. Java's collections framework uses this pattern heavily, yet most people never apply it themselves because they don't realize they can write their own iterators.

Meet the Iterator

(from wikipedia)

In object-oriented programming, the iterator pattern is a design pattern in which an iterator is used to traverse a container and access the container's elements. The iterator pattern decouples algorithms from containers; in some cases, algorithms are necessarily container-specific and thus cannot be decoupled.

In short, we delegate the iteration logic to its own class. The main class does not know how to traverse the data set, and it doesn't have to!

An iterator provides the next element in the data set, along with a method to check whether there is more data to process.
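To make those two operations concrete, here is a minimal sketch of a hand-written iterator built on the standard java.util.Iterator and java.lang.Iterable interfaces. NumberRange and NumberRangeDemo are hypothetical names invented for this illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical container: the integers in [start, end).
class NumberRange implements Iterable<Integer> {

  private final int start;
  private final int end;

  NumberRange(int start, int end) {
    this.start = start;
    this.end = end;
  }

  @Override
  public Iterator<Integer> iterator() {
    // The traversal state lives in the iterator, not in the caller.
    return new Iterator<Integer>() {
      private int current = start;

      @Override
      public boolean hasNext() {
        return current < end;
      }

      @Override
      public Integer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current++;
      }
    };
  }
}

class NumberRangeDemo {

  // Collects every element without knowing how the source walks its data.
  static List<Integer> collect(Iterable<Integer> source) {
    List<Integer> result = new ArrayList<>();
    for (Integer value : source) {
      result.add(value);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(collect(new NumberRange(3, 6))); // prints [3, 4, 5]
  }
}
```

The caller only ever sees hasNext() and next() (here hidden behind the for-each loop), so the way the range is walked can change without touching collect().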

Now, you may wonder how this helps in our case. I'm just going to borrow the idea of the iterator and move the batching logic into its own class.



While I'm not able to share my original code (due to policies), I'll build an example that passes ids to fetch data.

package com.slmanju.iterator;

import java.util.List;

public interface BatchIterator<T> {

  // True while at least one more batch remains.
  boolean hasNext();

  // Returns the next batch and advances past it.
  List<T> nextBatch();

}

package com.slmanju.iterator;

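import java.util.List;
import java.util.NoSuchElementException;

public class IdIterator<T> implements BatchIterator<T> {

  private static final int BATCH_LIMIT = 5;
  private final List<T> ids;
  private final int total;
  private int currentIndex; // start of the next batch; defaults to 0

  public IdIterator(List<T> ids) {
    this.ids = ids;
    this.total = ids.size();
  }

  @Override
  public boolean hasNext() {
    return currentIndex < total;
  }

  @Override
  public List<T> nextBatch() {
    if (!hasNext()) {
      throw new NoSuchElementException("no more batches");
    }
    // The last batch may be shorter than BATCH_LIMIT.
    int next = Math.min(currentIndex + BATCH_LIMIT, total);
    // subList returns a view backed by ids, not a copy.
    List<T> idSlice = ids.subList(currentIndex, next);
    currentIndex = next;
    return idSlice;
  }

}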

package com.slmanju.iterator;

public interface Batchable<T> {

  BatchIterator<T> batchIterator();

}

package com.slmanju.iterator;

import java.util.List;

public class BatchId<T> implements Batchable<T> {

  private List<T> ids;

  public BatchId(List<T> ids) {
    this.ids = ids;
  }

  @Override
  public BatchIterator<T> batchIterator() {
    return new IdIterator<>(ids);
  }

}

package com.slmanju.iterator;

import java.util.List;

@FunctionalInterface
public interface BatchFunction<T, R> {

  List<R> apply(List<T> nextBatch);

}

package com.slmanju.iterator;

import java.util.List;

public interface BatchFetcher<T, R> {

  List<R> fetchAll(BatchFunction<T, R> batchFunction);

}

package com.slmanju.iterator;

import java.util.ArrayList;
import java.util.List;

public class IdBatchFetcher<T, R> implements BatchFetcher<T, R> {

  // Depend on the Batchable abstraction rather than the concrete BatchId.
  private final Batchable<T> batchable;

  public IdBatchFetcher(List<T> ids) {
    this.batchable = new BatchId<>(ids);
  }

  @Override
  public List<R> fetchAll(BatchFunction<T, R> batchFunction) {
    List<R> result = new ArrayList<>();
    BatchIterator<T> batchIterator = batchable.batchIterator();

    // The loop only coordinates: slicing lives in the iterator,
    // fetching lives in the supplied function.
    while (batchIterator.hasNext()) {
      List<T> nextBatch = batchIterator.nextBatch();
      List<R> nextBatchResult = batchFunction.apply(nextBatch);
      result.addAll(nextBatchResult);
    }

    return result;
  }

}

package com.slmanju.iterator;

import java.util.List;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.toList;

public class IteratorApplication {

  public static void main(String[] args) {
    // In-memory stand-in for the data store: 25 pets with ids 0..24.
    List<Pet> pets = IntStream.range(0, 25)
        .mapToObj(value -> new Pet(value, "Pet-" + value))
        .collect(toList());
    // Ids to look up: 3..20, which gives batches of 5, 5, 5 and 3.
    List<Integer> ids = IntStream.range(3, 21).boxed().collect(toList());

    BatchFetcher<Integer, Pet> petBatchFetcher = new IdBatchFetcher<>(ids);

    // The lambda is called once per batch and plays the role of the query.
    List<Pet> collectedPets = petBatchFetcher.fetchAll(nextBatch -> {
      System.out.println("fetching batch from data store " + nextBatch);
      return pets.stream().filter(pet -> nextBatch.contains(pet.id)).collect(toList());
    });

    collectedPets.forEach(pet -> System.out.println(pet.name));
  }

  private static class Pet {
    public final Integer id;
    public final String name;

    public Pet(Integer id, String name) {
      this.id = id;
      this.name = name;
    }
  }

}
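One benefit claimed earlier is testability: once the slicing logic sits in its own class, it can be checked without any database or fetch function. The sketch below illustrates that with a compact stand-in for IdIterator. SliceIterator and SliceIteratorCheck are hypothetical names, and the constructor-supplied batch size is an assumption made here so that small batches are easy to verify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Compact stand-in for IdIterator, with a configurable batch size (assumption).
class SliceIterator<T> {

  private final List<T> items;
  private final int batchSize;
  private int currentIndex;

  SliceIterator(List<T> items, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.items = items;
    this.batchSize = batchSize;
  }

  boolean hasNext() {
    return currentIndex < items.size();
  }

  List<T> nextBatch() {
    // The last slice may be shorter than batchSize.
    int next = Math.min(currentIndex + batchSize, items.size());
    List<T> slice = items.subList(currentIndex, next);
    currentIndex = next;
    return slice;
  }
}

class SliceIteratorCheck {

  // Drains the iterator and returns the batches in the order produced.
  static <T> List<List<T>> allBatches(List<T> items, int batchSize) {
    SliceIterator<T> iterator = new SliceIterator<>(items, batchSize);
    List<List<T>> batches = new ArrayList<>();
    while (iterator.hasNext()) {
      batches.add(iterator.nextBatch());
    }
    return batches;
  }

  public static void main(String[] args) {
    // prints [[1, 2, 3], [4, 5, 6], [7]]
    System.out.println(allBatches(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3));
  }
}
```

The boundary cases (an empty list, a final partial batch) can be asserted on directly, which is hard to do when the same arithmetic is buried inside a long fetch loop.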



Happy coding :)