A Java image processing library for perceptual hashing and near-duplicate image detection research.

apaz-cli/Image-Hashing-Tools

Image-Hashing-Tools

A general-purpose framework for finding near-duplicate images. It provides an image library for fast pixel comparisons, along with extensible implementations of image hashing algorithms and tools for attacking image hashes, including simulating JPEG compression.

Supported Colorspaces (the entire project is a WIP; finished items are marked with ✓)

Greyscale ✓

RGB (Red, Green, Blue) ✓

RGBA (RGB with alpha/transparency channel) ✓

YCbCr (Luminance, Chrominance toward blue, Chrominance toward red)

Supported Hash Algorithms

Average Hash (aHash) ✓

Difference Hash (dHash) ✓

Perceptual Hash (pHash) ✓

Block Mean Value Hash (blockHash)

RGB Histogram Hash

Machine-Learned Hash (WIP; will require external dependencies)

Each of these hashing algorithms has its own tradeoffs in computation time, robustness, and suitability for identifying different sorts of images.

Average Hash is extremely fast and its hashes are small, but it is not particularly robust.
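To make the idea concrete, here is a minimal sketch of one common aHash formulation: shrink the image to 8x8, take the mean, and set one bit per cell that is brighter than the mean. It is not this library's implementation. The class name `AverageHashSketch`, the block-averaging `shrink` helper, and the use of a plain `int[][]` of greyscale intensities (0 to 255) instead of the library's image types are all assumptions made for illustration.

```java
public class AverageHashSketch {

    /** Shrinks a greyscale image to w x h by averaging the source pixels in each cell. */
    public static double[][] shrink(int[][] grey, int w, int h) {
        int srcH = grey.length, srcW = grey[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            int y0 = y * srcH / h, y1 = Math.max(y0 + 1, (y + 1) * srcH / h);
            for (int x = 0; x < w; x++) {
                int x0 = x * srcW / w, x1 = Math.max(x0 + 1, (x + 1) * srcW / w);
                double sum = 0;
                for (int sy = y0; sy < y1; sy++)
                    for (int sx = x0; sx < x1; sx++)
                        sum += grey[sy][sx];
                out[y][x] = sum / ((y1 - y0) * (x1 - x0));
            }
        }
        return out;
    }

    /** 64-bit hash: bit (y * 8 + x) is set when cell (x, y) is brighter than the mean. */
    public static long averageHash(int[][] grey) {
        double[][] small = shrink(grey, 8, 8);
        double mean = 0;
        for (double[] row : small)
            for (double v : row)
                mean += v;
        mean /= 64.0;

        long hash = 0L;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                if (small[y][x] > mean)
                    hash |= 1L << (y * 8 + x);
        return hash;
    }
}
```

Because every bit depends on a single global mean, a change in overall brightness distribution can flip many bits at once, which is one way to see why the hash is fast but not very robust.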

Difference Hash is only slightly slower than aHash and still not extremely robust, but it generates far fewer false positives.
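A rough sketch of the usual dHash idea follows: compare each pixel with its right-hand neighbour and record whether brightness increases. It assumes the caller has already reduced the image to 8 rows by 9 columns of greyscale values; the class name `DifferenceHashSketch` and the `double[][]` input are hypothetical and do not reflect this library's API.

```java
public class DifferenceHashSketch {

    /**
     * Computes a 64-bit difference hash from an image already reduced to
     * 8 rows by 9 columns of greyscale intensities.
     */
    public static long differenceHash(double[][] small) {
        if (small.length != 8 || small[0].length != 9)
            throw new IllegalArgumentException("expected an 8x9 (rows x columns) matrix");
        long hash = 0L;
        int bit = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++, bit++) {
                // Set the bit when brightness increases from left to right.
                if (small[y][x + 1] > small[y][x])
                    hash |= 1L << bit;
            }
        }
        return hash;
    }
}
```

Since only the sign of each local gradient is stored, adding a constant to every pixel leaves the hash unchanged, which is part of why dHash tends to produce fewer false positives than a threshold against a global mean.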

RGB Histogram Hash is perfectly robust against flips, rotations, resizing, and some other transforms, which makes it stand out; many other algorithms can't do that. However, the hashes take up a lot of space, and it fails completely against any sort of recoloring.
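The reason a histogram survives flips is that it records only how many pixels have each colour, not where they are. The sketch below illustrates that property with a coarse 64-bucket histogram. It is not the algorithm from the cited paper or this library's implementation; the class `HistogramSketch`, the bucket layout, and the packed `0xRRGGBB` pixel arrays are assumptions chosen for illustration.

```java
public class HistogramSketch {

    /** Fraction of pixels per coarse colour bucket: 4 levels per channel, 64 buckets total. */
    public static double[] histogram(int[][] rgb) {
        double[] bins = new double[64];
        int total = 0;
        for (int[] row : rgb) {
            for (int p : row) {
                int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                bins[(r >> 6) * 16 + (g >> 6) * 4 + (b >> 6)]++;
                total++;
            }
        }
        // Normalise by pixel count so that the image size does not dominate the result.
        for (int i = 0; i < bins.length; i++)
            bins[i] /= total;
        return bins;
    }

    /** Mirrors the image left to right. */
    public static int[][] flipHorizontal(int[][] rgb) {
        int[][] out = new int[rgb.length][];
        for (int y = 0; y < rgb.length; y++) {
            int w = rgb[y].length;
            out[y] = new int[w];
            for (int x = 0; x < w; x++)
                out[y][x] = rgb[y][w - 1 - x];
        }
        return out;
    }
}
```

Flipping only moves pixels around, so `histogram(img)` and `histogram(flipHorizontal(img))` are identical. Recoloring moves pixels into different buckets, which is why this family of hashes breaks under it.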

pHash has been shown to be extremely robust for real photographs, but it takes longer to compute than the other hashes and may be less exact for the hard pixel borders in digital illustrations.
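For comparison with the simpler hashes, here is a sketch of one common DCT-based pHash formulation: take the low-frequency 8x8 corner of a 2D DCT and threshold it against the median. It assumes the image has already been reduced to a 32x32 greyscale matrix. The class name `PerceptualHashSketch`, the choice of median over mean, and the decision to skip the DC term are assumptions; the library's `PerceptualHash` may differ in any of these.

```java
import java.util.Arrays;

public class PerceptualHashSketch {
    static final int N = 32; // side length of the reduced input image
    static final int K = 8;  // side length of the low-frequency block that is kept

    /** Orthonormal DCT-II scaling factor. */
    static double scale(int u) {
        return u == 0 ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
    }

    /** Hashes a 32x32 greyscale matrix into 63 bits; bit 0 (the DC term) is always clear. */
    public static long perceptualHash(double[][] img) {
        // Compute only the K x K lowest-frequency DCT coefficients.
        double[] coeff = new double[K * K];
        for (int u = 0; u < K; u++) {
            for (int v = 0; v < K; v++) {
                double sum = 0;
                for (int y = 0; y < N; y++)
                    for (int x = 0; x < N; x++)
                        sum += img[y][x]
                                * Math.cos((2 * y + 1) * u * Math.PI / (2.0 * N))
                                * Math.cos((2 * x + 1) * v * Math.PI / (2.0 * N));
                coeff[u * K + v] = scale(u) * scale(v) * sum;
            }
        }

        // Median of the 63 AC coefficients; the DC term at index 0 is skipped
        // because it only encodes overall brightness.
        double[] sorted = Arrays.copyOfRange(coeff, 1, coeff.length);
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];

        long hash = 0L;
        for (int i = 1; i < coeff.length; i++)
            if (coeff[i] > median)
                hash |= 1L << i;
        return hash;
    }
}
```

The nested loops make the extra cost visible: each of the 64 coefficients sums over all 1024 pixels, whereas aHash and dHash touch each reduced pixel once.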

I suggest that you learn more about these algorithms and choose the one that's best for your use case. Papers are cited below.

Soon I'm going to begin work on a machine-learning, model-based hash. The idea is that the model simultaneously learns to compress and decompress images to and from a very small latent space, and to ensure that an image's latent representation, interpreted as a vector, lies close in Euclidean space to those of similar images. I'll post updates as work is completed.
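One way such a combined objective could be written is sketched below. This is only an illustration of the idea, not the planned design: the encoder $E$, decoder $D$, weight $\lambda$, margin $m$, and the similar and dissimilar examples $x^{+}$ and $x^{-}$ are all assumed symbols. The margin term for dissimilar images is also an assumption, added because pulling similar images together without it would allow every image to collapse to the same latent vector.

```latex
$$
\mathcal{L}(x, x^{+}, x^{-}) =
\underbrace{\lVert x - D(E(x)) \rVert_2^2}_{\text{reconstruction}}
+ \lambda \Bigl(
\underbrace{\lVert E(x) - E(x^{+}) \rVert_2^2}_{\text{pull similar images together}}
+ \underbrace{\max\bigl(0,\; m - \lVert E(x) - E(x^{-}) \rVert_2\bigr)^2}_{\text{push dissimilar images apart}}
\Bigr)
$$
```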

Supported Operations

JPEG Compression Simulation

Flip Vertical/Horizontal ✓

Random Noise ✓

Gaussian Noise ✓

Subimage Insertion ✓

Gaussian Blur ✓

Sharpen

Laplace of Gaussian Edge Detection
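Several of these operations are convolutions with a small kernel. As an illustration, the sketch below builds a normalized Gaussian kernel of the kind a blur would convolve with. How `KernelFactory.gaussianBlurKernel` interprets its arguments is not documented here, so treating the second parameter as the standard deviation is an assumption, as is the class name `GaussianKernelSketch`.

```java
public class GaussianKernelSketch {

    /** Builds a size x size Gaussian kernel whose entries sum to 1. */
    public static float[][] gaussianKernel(int size, float sigma) {
        if (size <= 0 || size % 2 == 0)
            throw new IllegalArgumentException("size must be odd and positive");
        if (sigma <= 0)
            throw new IllegalArgumentException("sigma must be positive");

        double[][] weights = new double[size][size];
        int c = size / 2; // index of the kernel centre
        double sum = 0;
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                double dx = x - c, dy = y - c;
                weights[y][x] = Math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
                sum += weights[y][x];
            }
        }

        // Normalise so that blurring does not change overall brightness.
        float[][] kernel = new float[size][size];
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                kernel[y][x] = (float) (weights[y][x] / sum);
        return kernel;
    }
}
```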

Example Usage

Create images and check if they match robustly:

// Requires java.io.IOException and java.net.URL, plus the library's own classes.
IImage<?> img1 = null, img2 = null;
try {
	// Load the image from a file or URL.
	// Try copy/pasting all sorts of image files/links here.
	img1 = new RGBImage(new URL("https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png"));
	img2 = img1
			// 1.1x width, 1.2x height
			.rescaleBilinear(1.1f, 1.2f)
			// Kernel side length, blur intensity
			.convolveWith(KernelFactory.gaussianBlurKernel(7, 5f))
			// Mean, standard deviation
			.apply(new GaussianNoiseAttack(3f, 7f))
			.toGreyscale();
} catch (IOException e) {
	System.err.println("Failed to load the image.");
	e.printStackTrace();
	System.exit(1);
}

IHashAlgorithm h = new PerceptualHash();
boolean matches = h.matches(img1, img2);

// The images should match, even after everything done to one of them above.
System.out.println(matches ? "MATCH" : "FAILED TO MATCH");

// Display the images.
ImageUtils.showImage(img1);
ImageUtils.showImage(img2);
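How `matches` decides is not described here. A common approach for bit-string hashes, shown below as an assumption and not as this library's behavior, is to count the differing bits (the Hamming distance) and accept the pair when the count is under a threshold. The class `HammingMatchSketch` and the `maxDifferingBits` parameter are hypothetical.

```java
public class HammingMatchSketch {

    /** Number of bit positions in which two 64-bit hashes differ. */
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Treats two hashes as a match when at most maxDifferingBits bits differ. */
    public static boolean matches(long a, long b, int maxDifferingBits) {
        return hammingDistance(a, b) <= maxDifferingBits;
    }
}
```

A looser threshold tolerates heavier edits such as the blur and noise above, at the cost of more false positives, so the right value depends on the hash algorithm and the image collection.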

Inspiration, Citations, And Cool Papers to Read

RGBHistogramHash: "Image Hashing Based on Color Histogram" by Bian Yang, Fan Gu and Xiamu Niu

BlockHash: "Block Mean Value Based Image Perceptual Hashing" by Bian Yang, Fan Gu and Xiamu Niu

AHash/PHash: Hacker Factor, PHash.org

DHash: Hacker Factor
