more doc cleanup CuckooFilter
bdupras committed Jan 24, 2016
1 parent dc876d7 commit 6e1cda5
Showing 5 changed files with 165 additions and 109 deletions.
8 changes: 2 additions & 6 deletions TODO.md
@@ -2,19 +2,15 @@ Guava-Probably: TODO List
=======================================================

=1.0
* removeAll(Collection)
* removeAll(Filter)
* @throws UnsupportedOperationException - true up to java.util.Set
* Full interface tests on CuckooFilter and BloomFilter
* ?? check out MultiSet interface for semantics


=Beyond 1.0
* double-check CuckooFilter.MIN_FPP value - calculation seems wrong

==CI
* commit/push to release SNAPSHOT, major, minor, patch :: maven central && javadocs

== Features
* MultiSet interface operations (count, set counts)
* CuckooFilter impl increase max capacity (separate even/odd tables?)
* Primitive interface API (to avoid object alloc)
* Direct hash fn invocation (to avoid object alloc)
249 changes: 154 additions & 95 deletions src/main/java/com/duprasville/guava/probably/CuckooFilter.java
@@ -40,92 +40,58 @@
import static com.google.common.math.DoubleMath.log2;
import static com.google.common.math.LongMath.divide;
import static java.lang.Math.ceil;
import static java.lang.Math.pow;
import static java.math.RoundingMode.CEILING;
import static java.math.RoundingMode.HALF_DOWN;

/**
* A Cuckoo filter for instances of {@code T}. A Cuckoo filter offers an approximate containment
* test with one-sided error: if it claims that an element is contained in it, this might be in
* error, but if it claims that an element is <i>not</i> contained in it, then this is definitely
* true. <p/> <p>The false positive probability ({@code FPP}) of a cuckoo filter is defined as the
* probability that {@link #contains(Object)} will erroneously return {@code true} for an object
* that has not actually been added to the {@link CuckooFilter}. <p/> <p>Cuckoo filters are
* serializable. They also support a more compact serial representation via the {@link
* #writeTo(OutputStream)} and {@link #readFrom(InputStream, Funnel)} methods. Both serialized forms
* will continue to be supported by future versions of this library. However, serial forms generated
* by newer versions of the code may not be readable by older versions of the code (e.g., a
* serialized cuckoo filter generated today may <i>not</i> be readable by a binary that was compiled
* 6 months ago). <p/> ref: <i>Cuckoo Filter: Practically Better Than Bloom</i> Bin Fan, David G.
* Andersen, Michael Kaminsky†, Michael D. Mitzenmacher‡ Carnegie Mellon University, †Intel Labs,
* ‡Harvard University https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf
* A Cuckoo filter that implements the {@link ProbabilisticFilter} interface.<p/>
*
* @param <T> the type of objects that the {@link CuckooFilter} accepts
* <blockquote>"Cuckoo filters can replace Bloom filters for approximate set membership tests.
* Cuckoo filters support adding and removing items dynamically while achieving even higher
* performance than Bloom filters. For applications that store many items and target moderately low
* false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom
* filters. Cuckoo filters outperform previous data structures that extend Bloom filters to support
* deletions substantially in both time and space." - Fan et al.</blockquote>
*
* Cuckoo filters offer constant time performance for the basic operations {@link #add(Object)},
* {@link #remove(Object)}, {@link #contains(Object)} and {@link #size()}. <p/>
*
* This class does not permit {@code null} elements. <p/>
*
* Cuckoo filters implement the {@link Serializable} interface. They also support a more compact
* serial representation via the {@link #writeTo(OutputStream)} and {@link #readFrom(InputStream,
* Funnel)} methods. Both serialized forms will continue to be supported by future versions of this
* library. However, serial forms generated by newer versions of the code may not be readable by
* older versions of the code (e.g., a serialized cuckoo filter generated today may <i>not</i> be
* readable by a binary that was compiled 6 months ago). <p/>
*
* <i>ref: <a href="https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf">Cuckoo Filter:
* Practically Better Than Bloom</a></i> Bin Fan, David G. Andersen, Michael Kaminsky†, Michael D.
* Mitzenmacher‡ Carnegie Mellon University, †Intel Labs, ‡Harvard University
*
* @param <E> the type of elements that this filter accepts
* @author Brian Dupras
* @author Alex Beal
* @see ProbabilisticFilter
*/
@Beta
public final class CuckooFilter<T> implements ProbabilisticFilter<T>, Serializable {
public final class CuckooFilter<E> implements ProbabilisticFilter<E>, Serializable {
static final int MAX_ENTRIES_PER_BUCKET = 8;
static final int MIN_ENTRIES_PER_BUCKET = 2;

/**
* Minimum false positive probability supported, 8.67E-19.
*/
public static double MIN_FPP = 2.0D * 8 / Math.pow(2, Long.SIZE); // 8 is max entries per bucket
public static double MIN_FPP = 2.0D * MAX_ENTRIES_PER_BUCKET / pow(2, Long.SIZE);

/**
* Maximum false positive probability supported, 0.99.
*/
public static double MAX_FPP = 0.99D;
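The MIN_FPP expression mirrors the bound from the cuckoo filter paper: a lookup compares against at most 2b fingerprints of f bits each, so the false positive probability is at most about 2b/2^f. The sketch below reproduces that arithmetic, assuming (as the use of Long.SIZE suggests) that fingerprints are capped at 64 bits; FppBounds and fppUpperBound are hypothetical names, not part of CuckooFilter.

```java
public final class FppBounds {
  static final int MAX_ENTRIES_PER_BUCKET = 8;

  // Hypothetical helper: approximate upper bound 2b / 2^f on the false positive
  // probability when a lookup compares against 2 buckets of b fingerprints of f bits.
  static double fppUpperBound(int entriesPerBucket, int bitsPerEntry) {
    return 2.0D * entriesPerBucket / Math.pow(2, bitsPerEntry);
  }

  // Same expression as MIN_FPP in the diff: b = 8 entries, f = Long.SIZE = 64 bits.
  static final double MIN_FPP = fppUpperBound(MAX_ENTRIES_PER_BUCKET, Long.SIZE);

  public static void main(String[] args) {
    System.out.println(MIN_FPP);             // 2^-60, about 8.67E-19
    System.out.println(fppUpperBound(4, 8)); // 8 / 256 = 0.03125
  }
}
```

With b = 8 and f = 64 the bound is 16/2^64 = 2^-60, which rounds to the 8.67E-19 quoted in the javadoc.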

/**
* Combines this cuckoo filter with another cuckoo filter by performing multiset sum of the
* underlying data. The mutations happen to <b>this</b> instance. Callers must ensure the cuckoo
* filters are appropriately sized to avoid saturating them. The behavior of this operation is
* undefined if the specified filter is modified while the operation is in progress.
*
* @param f The cuckoo filter to combine this cuckoo filter with. It is not mutated.
* @return {@code true} if the filters are successfully summed.
* @throws IllegalArgumentException if {@link #isCompatible(ProbabilisticFilter)}{@code == false}
*/
public boolean addAll(ProbabilisticFilter<T> f) {
checkNotNull(f);
checkArgument(this != f, "Cannot combine a " + this.getClass().getSimpleName() +
" with itself.");
checkCompatibility(f, "combine");
return this.strategy.addAll(this.table, ((CuckooFilter) f).table);
}

private void checkCompatibility(ProbabilisticFilter<T> f, String verb) {
checkArgument(f instanceof CuckooFilter, "Cannot" + verb + " a " +
this.getClass().getSimpleName() + " with a " + f.getClass().getSimpleName());
checkArgument(this.isCompatible(f), "Cannot" + verb + " incompatible filters. " +
this.getClass().getSimpleName() + " instances must have equivalent funnels; the same " +
"strategy; and the same number of buckets, entries per bucket, and bits per entry.");
}

/**
* Adds all of the elements in the specified collection to the filter. Some elements of {@code c}
* may have been added to the filter even when {@code false} is returned. In this case, the caller
* may {@link #remove(Object)} the additions by comparing the filter {@link #size()} before and
* after the invocation, knowing that additions from {@code c} occurred in {@code c}'s iteration
* order. The behavior of this operation is undefined if the specified collection is modified
* while the operation is in progress.
*
* @return {@code true} if all elements of the collection were successfully added
*/
public boolean addAll(Collection<? extends T> c) {
for (T e : c) {
if (!add(e)) {
return false;
}
}
return true;
}

public void clear() {
table.clear();
}

private final CuckooTable table;
private final Funnel<? super T> funnel;
private final Funnel<? super E> funnel;
private final Strategy strategy;
private final long capacity;
private final double fpp;
@@ -134,7 +100,7 @@ public void clear() {
* Creates a CuckooFilter.
*/
private CuckooFilter(
CuckooTable table, Funnel<? super T> funnel, Strategy strategy, long capacity, double fpp) {
CuckooTable table, Funnel<? super E> funnel, Strategy strategy, long capacity, double fpp) {
this.capacity = capacity;
this.fpp = fpp;
this.table = checkNotNull(table);
@@ -143,35 +109,59 @@ private CuckooFilter(
}

/**
* Creates a new {@link CuckooFilter} that's a copy of this instance. The new instance is equal to
* Returns a new {@link CuckooFilter} that's a copy of this instance. The new instance is equal to
* this instance but shares no mutable state.
*/
@CheckReturnValue
public CuckooFilter<T> copy() {
return new CuckooFilter<T>(table.copy(), funnel, strategy, capacity, fpp);
public CuckooFilter<E> copy() {
return new CuckooFilter<E>(table.copy(), funnel, strategy, capacity, fpp);
}

/**
* Returns {@code true} if the object <i>might</i> have been added to this Cuckoo filter, {@code
* Returns {@code true} if this cuckoo filter <i>might</i> contain the specified element, {@code
* false} if this is <i>definitely</i> not the case.
*
* @throws NullPointerException if the specified element is null and this filter does not permit
* null elements
*/
@CheckReturnValue
public boolean contains(T e) {
public boolean contains(E e) {
checkNotNull(e);
return strategy.contains(e, funnel, table);
}

/**
* Returns {@code true} if all elements of the given collection <i>might</i> have been added to
* this Cuckoo filter, {@code false} if this is <i>definitely</i> not the case.
* Returns {@code true} if this cuckoo filter <i>might</i> contain all elements of the given
* collection, {@code false} if this is <i>definitely</i> not the case.
*
* @param c collection containing elements to be checked for containment in this filter
* @return {@code true} if this filter <i>might</i> contain all elements of the specified
* collection
* @throws NullPointerException if the specified collection contains one or more null elements, or
* if the specified collection is null
* @see #contains(Object)
*/
public boolean containsAll(Collection<? extends T> c) {
for (T o : c) {
if (!contains(o)) return false;
public boolean containsAll(Collection<? extends E> c) {
checkNotNull(c);
for (E e : c) {
checkNotNull(e);
if (!contains(e)) return false;
}
return true;
}

public boolean containsAll(ProbabilisticFilter<T> f) {
/**
* Returns {@code true} if this cuckoo filter <i>might</i> contain all elements contained in the
* specified filter, {@code false} if this is <i>definitely</i> not the case.
*
* @param f filter containing elements to be checked for probable containment in this filter
* @return {@code true} if this filter <i>might</i> contain all elements contained in the
* specified filter, {@code false} if this is <i>definitely</i> not the case
* @throws NullPointerException if the specified filter is null
* @throws IllegalArgumentException if {@link #isCompatible(ProbabilisticFilter)} {@code == false}
* given {@code f}
*/
public boolean containsAll(ProbabilisticFilter<E> f) {
checkNotNull(f);
if (this == f) {
return true;
@@ -181,19 +171,79 @@ public boolean containsAll(ProbabilisticFilter<T> f) {
}

/**
* Adds an object into this {@link CuckooFilter}. Ensures that subsequent invocations of {@link
* #contains(Object)} with the same object will always return {@code true}.
* @return {@code true} if {@code e} was successfully added to the filter, {@code false} if this
* is <i>definitely</i> not the case. A return value of {@code true} ensures that {@link
* #contains(Object)} given {@code e} will also return {@code true}.
* @throws UnsupportedOperationException if the {@link #add(Object)} operation is not supported by
* this filter
* @throws ClassCastException if the class of the specified element prevents it from
* being added to this filter
* @throws NullPointerException if the specified element is null and this filter does not
* permit null elements
* @throws IllegalArgumentException if some property of the specified element prevents it
* from being added to this filter
*/


/**
* Adds the specified element to this cuckoo filter.
*
* @return true if {@code e} has been successfully added to the filter. false if {@code e} was not
* added to the filter, as would be the case when the filter gets saturated. This may occur even
* if actualInsertions < capacity. e.g. If {@code e} has already been added 2*b times to the
* filter, a subsequent attempt will fail.
* @param e element to be added to this filter
* @return {@code true} if {@code e} was successfully added to the filter, {@code false} if it
* <i>definitely</i> was not, as happens when the filter becomes saturated. This may occur even
* if {@link #size()} {@code <} {@link #capacity()}; e.g., if {@code e} has already been added
* {@code 2*b} times to the filter, a subsequent call to {@link #add(Object)} will return
* {@code false}.
*/
@CheckReturnValue
public boolean add(T e) {
public boolean add(E e) {
return strategy.add(e, funnel, table);
}

/**
* Combines {@code this} cuckoo filter with another compatible cuckoo filter by performing
* multiset sum of the underlying data. The mutations happen to {@code this} instance. Callers
* must ensure the cuckoo filters are appropriately sized to avoid running out of space. The
* behavior of this operation is undefined if the specified filter is modified while the operation
* is in progress.
*
* @param f cuckoo filter to be combined into {@code this} filter. {@code f} is not mutated.
* @return {@code true} if the operation was successful, {@code false} otherwise.
* @throws NullPointerException if the specified filter is null
* @throws IllegalArgumentException if {@link #isCompatible(ProbabilisticFilter)}{@code == false}
*/
@CheckReturnValue
public boolean addAll(ProbabilisticFilter<E> f) {
checkNotNull(f);
checkArgument(this != f, "Cannot combine a " + this.getClass().getSimpleName() +
" with itself.");
checkCompatibility(f, "combine");
return this.strategy.addAll(this.table, ((CuckooFilter) f).table);
}
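addAll(ProbabilisticFilter) delegates to Strategy.addAll, whose body is not part of this diff. One way to picture a multiset sum of two compatible tables is sketched below under simplifying assumptions: tables are modeled as int[][] (rows are buckets, 0 marks an empty slot), compatibility is reduced to equal dimensions (the real isCompatible also compares funnels, strategy and bits per entry), and fingerprints are copied only into the same bucket index, with no cuckoo relocation. Because compatible filters hash identically, a fingerprint that is valid in bucket i of one table is valid in bucket i of the other. TableMerge, compatible and addAll are hypothetical names.

```java
import java.util.Arrays;

public final class TableMerge {
  // Hypothetical stand-in for isCompatible(): same number of buckets and entries per bucket.
  static boolean compatible(int[][] a, int[][] b) {
    return a.length == b.length && a[0].length == b[0].length;
  }

  // Multiset sum: copy every non-empty slot of `other` into the same bucket of `into`.
  // Returns false as soon as a bucket overflows; `into` may then be partially updated,
  // which is why callers must size the filters appropriately. `other` is never mutated.
  static boolean addAll(int[][] into, int[][] other) {
    if (!compatible(into, other)) throw new IllegalArgumentException("incompatible tables");
    for (int bucket = 0; bucket < other.length; bucket++) {
      for (int fp : other[bucket]) {
        if (fp == 0) continue; // empty slot
        if (!insert(into[bucket], fp)) return false;
      }
    }
    return true;
  }

  // Places fp into the first empty slot of the bucket; false if the bucket is full.
  static boolean insert(int[] bucket, int fp) {
    for (int slot = 0; slot < bucket.length; slot++) {
      if (bucket[slot] == 0) {
        bucket[slot] = fp;
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[][] a = {{7, 0}, {0, 0}};
    int[][] b = {{7, 0}, {3, 5}};
    System.out.println(addAll(a, b) + " " + Arrays.deepToString(a)); // true [[7, 7], [3, 5]]
  }
}
```

Duplicate fingerprints (the two 7s) are kept rather than collapsed, which is what distinguishes a multiset sum from a set union.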

/**
* Adds all of the elements in the specified collection to the filter. Some elements of {@code c}
* may have been added to the filter even when {@code false} is returned. In this case, the caller
* may {@link #remove(Object)} the additions by comparing the filter {@link #size()} before and
* after the invocation, knowing that additions from {@code c} occurred in {@code c}'s iteration
* order. The behavior of this operation is undefined if the specified collection is modified
* while the operation is in progress.
*
* @return {@code true} if all elements of the collection were successfully added
*/
public boolean addAll(Collection<? extends E> c) {
for (E e : c) {
if (!add(e)) {
return false;
}
}
return true;
}
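The addAll(Collection) javadoc describes how a caller can undo a partial addition: compare size() before and after, then remove that many leading elements of c in iteration order. A minimal sketch of that recipe follows, assuming a hypothetical CountingFilter interface with just add, remove and size, and a toy BoundedBag that rejects additions past a fixed capacity in place of a saturated cuckoo filter.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class AddAllRollback {
  // Hypothetical minimal view of a filter: only the operations the javadoc relies on.
  interface CountingFilter<E> {
    boolean add(E e);
    boolean remove(E e);
    long size();
  }

  // Toy filter that refuses additions beyond a fixed capacity (stands in for saturation).
  static final class BoundedBag<E> implements CountingFilter<E> {
    private final Map<E, Integer> counts = new HashMap<>();
    private final long capacity;
    private long size;

    BoundedBag(long capacity) { this.capacity = capacity; }

    public boolean add(E e) {
      if (size == capacity) return false;
      counts.merge(e, 1, Integer::sum);
      size++;
      return true;
    }

    public boolean remove(E e) {
      Integer n = counts.get(e);
      if (n == null) return false;
      if (n == 1) counts.remove(e); else counts.put(e, n - 1);
      size--;
      return true;
    }

    public long size() { return size; }
  }

  // Adds in iteration order and stops at the first failure, like addAll(Collection);
  // on failure the size delta says how many leading elements of c were added,
  // and exactly those are removed again.
  static <E> boolean addAllOrUndo(CountingFilter<E> f, List<? extends E> c) {
    long before = f.size();
    for (E e : c) {
      if (!f.add(e)) {
        long added = f.size() - before;
        for (int i = 0; i < added; i++) {
          f.remove(c.get(i));
        }
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    BoundedBag<String> bag = new BoundedBag<>(3);
    System.out.println(addAllOrUndo(bag, List.of("a", "b", "c", "d")) + " " + bag.size()); // false 0
    System.out.println(addAllOrUndo(bag, List.of("a", "b")) + " " + bag.size());           // true 2
  }
}
```

Only elements that were actually added get removed, which keeps the rollback clear of the remove javadoc's warning about removing elements that were never added.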

public void clear() {
table.clear();
}

/**
* Removes {@code e} from this {@link CuckooFilter}. {@code e} must have been previously added to
* the filter. Removing an {@code e} that hasn't been added to the filter may put the filter in an
@@ -207,7 +257,7 @@ public boolean add(T e) {
* @return true if {@code e} was successfully removed from the filter.
*/
@CheckReturnValue
public boolean remove(T e) {
public boolean remove(E e) {
return strategy.remove(e, funnel, table);
}
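The add(Object) javadoc says an element already added 2*b times cannot be added again even below capacity, and the remove(Object) javadoc warns against removing an element that was never added. Both follow from the filter storing only short fingerprints in two candidate buckets. The toy partial-key cuckoo filter below illustrates this as a sketch, not the library's Strategy: it assumes String elements, 8-bit fingerprints, a power-of-two bucket count, an ad hoc mixing hash, a nonzero alternate-bucket mask so the two candidates always differ, and rollback of relocations when insertion gives up. ToyCuckoo and all of its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Random;

// Toy partial-key cuckoo filter for Strings (hypothetical; not the library's Strategy).
public final class ToyCuckoo {
  static final int MAX_KICKS = 50;
  final int[][] table;  // table[bucket][slot]; 0 marks an empty slot
  final int buckets;    // assumed to be a power of two, >= 2
  final Random rnd = new Random(42);

  ToyCuckoo(int buckets, int entriesPerBucket) {
    this.buckets = buckets;
    this.table = new int[buckets][entriesPerBucket];
  }

  static int mix(String s) {
    int h = s.hashCode() * 0x9E3779B9;
    return h ^ (h >>> 15);
  }

  // 8-bit fingerprint taken from the top of the hash; 0 is reserved for "empty".
  static int fingerprint(String s) {
    int fp = mix(s) >>> 24;
    return fp == 0 ? 1 : fp;
  }

  int index(String s) { return mix(s) & (buckets - 1); }

  // XOR with a nonzero mask that depends only on fp, so alt(alt(i, fp), fp) == i
  // and the two candidate buckets are always distinct.
  int alt(int i, int fp) {
    return i ^ (1 + Integer.remainderUnsigned(fp * 0x5BD1E995, buckets - 1));
  }

  boolean insertInto(int bucket, int fp) {
    int[] b = table[bucket];
    for (int s = 0; s < b.length; s++) {
      if (b[s] == 0) { b[s] = fp; return true; }
    }
    return false;
  }

  boolean add(String s) {
    int fp = fingerprint(s), i = index(s);
    if (insertInto(i, fp) || insertInto(alt(i, fp), fp)) return true;
    // Both candidate buckets are full: relocate fingerprints, logging every swap.
    ArrayDeque<int[]> undo = new ArrayDeque<>();
    int cur = fp;
    for (int k = 0; k < MAX_KICKS; k++) {
      int slot = rnd.nextInt(table[i].length);
      int victim = table[i][slot];
      undo.push(new int[] {i, slot, victim});
      table[i][slot] = cur;
      cur = victim;
      i = alt(i, cur);  // the victim's other candidate bucket
      if (insertInto(i, cur)) return true;
    }
    // Saturated: undo the swaps, newest first, so no stored fingerprint is lost.
    while (!undo.isEmpty()) {
      int[] u = undo.pop();
      table[u[0]][u[1]] = u[2];
    }
    return false;
  }

  boolean contains(String s) {
    int fp = fingerprint(s), i = index(s);
    return find(i, fp) >= 0 || find(alt(i, fp), fp) >= 0;
  }

  // Deletes one copy of s's fingerprint from either candidate bucket.
  boolean remove(String s) {
    int fp = fingerprint(s), i = index(s);
    for (int bucket : new int[] {i, alt(i, fp)}) {
      int slot = find(bucket, fp);
      if (slot >= 0) { table[bucket][slot] = 0; return true; }
    }
    return false;
  }

  // Slot holding fp in the bucket, or -1.
  int find(int bucket, int fp) {
    for (int s = 0; s < table[bucket].length; s++) if (table[bucket][s] == fp) return s;
    return -1;
  }

  // Searches for a string other than x with x's fingerprint and candidate buckets.
  static String collidingWith(ToyCuckoo f, String x) {
    int fp = fingerprint(x), i = f.index(x);
    for (int n = 0; n < 1_000_000; n++) {
      String y = "y" + n;
      int j = f.index(y);
      if (fingerprint(y) == fp && (j == i || j == f.alt(i, fp))) return y;
    }
    return null;
  }
}
```

On new ToyCuckoo(16, 2), four calls to add("x") return true and a fifth returns false, because both candidate buckets already hold x's fingerprint twice. On a fresh filter holding only "x", calling remove on the string returned by collidingWith returns true even though that string was never added, after which contains("x") returns false: the false negative the remove javadoc warns about.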

@@ -224,16 +274,16 @@ public boolean remove(T e) {
* @return true if all elements of {@code c} were successfully removed from the filter.
*/
@CheckReturnValue
public boolean removeAll(Collection<? extends T> c) {
for (T e : c) {
public boolean removeAll(Collection<? extends E> c) {
for (E e : c) {
if (!remove(e)) {
return false;
}
}
return true;
}

public boolean removeAll(ProbabilisticFilter<T> f) {
public boolean removeAll(ProbabilisticFilter<E> f) {
checkNotNull(f);
if (this == f) {
clear();
@@ -290,7 +340,7 @@ public double currentFpp() {
* @param f The filter to check for compatibility.
* @return {@code true} if {@code f} is compatible with {@code this} filter.
*/
public boolean isCompatible(ProbabilisticFilter<T> f) {
public boolean isCompatible(ProbabilisticFilter<E> f) {
checkNotNull(f);

return (this != f)
@@ -475,11 +525,11 @@ public static <T> CuckooFilter<T> create(Funnel<? super T> funnel, long capacity
static int optimalEntriesPerBucket(double e) {
checkArgument(e > 0.0D, "e must be > 0.0");
if (e <= 0.00001) {
return 8;
return MAX_ENTRIES_PER_BUCKET;
} else if (e <= 0.002) {
return 4;
return MAX_ENTRIES_PER_BUCKET / 2;
} else {
return 2;
return MIN_ENTRIES_PER_BUCKET;
}
}
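optimalEntriesPerBucket picks larger buckets for smaller target error rates, and the commit replaces its literals with the MIN/MAX constants. The sketch below copies those thresholds and pairs them with a fingerprint size derived from the paper's bound 2b/2^f <= e, i.e. f = ceil(log2(2b/e)). That fingerprint-size rule is an assumption for illustration only, since the diff does not show how CuckooFilter sizes its entries; BucketSizing and bitsPerEntry are hypothetical names.

```java
public final class BucketSizing {
  static final int MAX_ENTRIES_PER_BUCKET = 8;
  static final int MIN_ENTRIES_PER_BUCKET = 2;

  // Same thresholds as optimalEntriesPerBucket in the diff.
  static int optimalEntriesPerBucket(double e) {
    if (!(e > 0.0D)) throw new IllegalArgumentException("e must be > 0.0");
    if (e <= 0.00001) {
      return MAX_ENTRIES_PER_BUCKET;
    } else if (e <= 0.002) {
      return MAX_ENTRIES_PER_BUCKET / 2;
    } else {
      return MIN_ENTRIES_PER_BUCKET;
    }
  }

  // Hypothetical: smallest f with 2b / 2^f <= e, from the bound in Fan et al.
  static int bitsPerEntry(double e, int b) {
    return (int) Math.ceil(Math.log(2.0D * b / e) / Math.log(2));
  }

  public static void main(String[] args) {
    for (double e : new double[] {0.01, 0.001, 0.000001}) {
      int b = optimalEntriesPerBucket(e);
      System.out.println("e=" + e + " b=" + b + " f=" + bitsPerEntry(e, b));
    }
  }
}
```

For these three targets the sketch yields (b, f) = (2, 9), (4, 13) and (8, 24): a stricter error rate buys both larger buckets and longer fingerprints.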

@@ -540,7 +590,7 @@ static long evenCeil(long n) {
}

private Object writeReplace() {
return new SerialForm<T>(this);
return new SerialForm<E>(this);
}

private static class SerialForm<T> implements Serializable {
@@ -703,4 +753,13 @@ public String toString() {
", size=" + size() +
'}';
}

private void checkCompatibility(ProbabilisticFilter<E> f, String verb) {
checkArgument(f instanceof CuckooFilter, "Cannot " + verb + " a " +
this.getClass().getSimpleName() + " with a " + f.getClass().getSimpleName());
checkArgument(this.isCompatible(f), "Cannot " + verb + " incompatible filters. " +
this.getClass().getSimpleName() + " instances must have equivalent funnels; the same " +
"strategy; and the same number of buckets, entries per bucket, and bits per entry.");
}

}
@@ -42,7 +42,7 @@ public boolean contains(T e) {

/**
* Puts an object into the underlying {@code com.google.common.hash.BloomFilter}. Ensures that
* subsequent invocations of {@link #contains(T)} with the same object will always return {@code
* subsequent invocations of {@link #contains(Object)} with the same object will always return {@code
* true}.
*
* @return true if the bloom filter's bits changed as a result of this operation. If the bits