This Java library is designed to identify and remove duplicates from collections of images. It can help you efficiently manage your photo gallery or image stock. The library also provides robust extensibility options, allowing you to implement custom image comparison algorithms, advanced caching solutions, and more.
In addition, it provides convenient tools to manage and organize your files, making it a versatile solution for various image management needs.
The application requires:
- Java: 21 (requires virtual threads, introduced in JDK 21; see Java 21 Virtual Threads)
- Maven (building from source)
- Git (building from source)
Supported picture types:
- all that are supported by the ImageIO class
- all that are supported by TwelveMonkeys (used packages: bmp, tiff, jpeg, core).
Runtime dependencies:
- SLF4J: 2.0.13
- logback: 1.5.6
- TwelveMonkeys: 3.11.0
- JTransforms: 3.1
- Caffeine: 3.1.8
Testing dependencies:
- JUnit: 5.8.2
- Mockito: 5.11.0
- Byte-Buddy: 1.14.10
Easy way:
To add this library to your project, use jitpack.io:
- Add the JitPack repository to your project:
Maven (pom.xml)
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
- Add the dependency:
Maven (pom.xml)
<dependency>
    <groupId>com.github.maksik997</groupId>
    <artifactId>PictureComparer</artifactId>
    <version>[Version]</version>
</dependency>
Replace [Version] with a valid version (e.g., 0.7.0).
Note: In the future, this library will likely be published to the Maven Central Repository. For now, JitPack is the recommended way to integrate it into your project.
Building from source:
- Clone the Git repository:
git clone https://github.com/maksik997/PictureComparer.git
cd PictureComparer
- Build the project using Maven:
mvn clean package
The built .jar file will be available in the target directory.
To ensure that AdaptiveCache, which leverages the Caffeine caching library, can effectively manage memory, it's important to allocate enough memory to the JVM heap.
This section provides guidelines on how to adjust the JVM memory settings.
The JVM heap size determines how much memory Java is allowed to use for objects and data structures, including the cache. By allocating sufficient heap memory, you can ensure that the AdaptiveCache has enough room to function efficiently.
To control the heap size, you can use the following JVM options when starting your Java application:
- -Xmx: Sets the maximum heap size.
- -Xms: Sets the initial heap size.
For example, to start with 512 MB of heap and allow it to grow to a maximum of 2 GB, use
java -Xms512m -Xmx2g -jar your-application.jar
This command sets the initial heap size to 512 MB (-Xms512m) and the maximum heap size to 2 GB (-Xmx2g).
The AdaptiveCache can utilize up to 60% of the available JVM heap for caching, so you should estimate the expected cache size based on the -Xmx value you set.
For example, if you set -Xmx2g (2 GB), up to 60% of that can be used by the cache:
60% of 2 GB = 1.2 GB.
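The arithmetic above can be checked against the heap limit the JVM actually reports. The following is a minimal sketch, not part of the library: the class name CacheBudget and the method budgetBytes are hypothetical, and the 60% figure is simply taken from the text above. It only uses the standard Runtime.maxMemory() call.

```java
/** Sketch: estimate the cache budget implied by the 60% rule described above. */
final class CacheBudget {

    /** Share of the heap that the text above says the cache may use (an assumption taken from the prose). */
    static final int CACHE_PERCENT = 60;

    /** Returns the given percentage of maxHeapBytes, rounded down. */
    static long budgetBytes(long maxHeapBytes, int percent) {
        if (maxHeapBytes < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("heap size must be >= 0 and percent within 0..100");
        }
        // Split into quotient and remainder so the multiplication cannot overflow for huge heaps.
        return (maxHeapBytes / 100) * percent + (maxHeapBytes % 100) * percent / 100;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use, which -Xmx controls.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long budget = budgetBytes(maxHeap, CACHE_PERCENT);
        System.out.printf("Max heap: %d MiB, cache budget (60%%): %d MiB%n",
                maxHeap / (1024 * 1024), budget / (1024 * 1024));
    }
}
```

Running it with different -Xmx values is a quick way to confirm that the option was picked up before tuning anything else.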
Once you've configured the cache size, it's important to monitor the performance of the cache and the JVM memory usage. You can track memory usage with tools like:
- JVM Monitoring Tools: Tools such as VisualVM or JConsole allow you to monitor memory usage and garbage collection within the JVM.
- Caffeine's Cache Metrics: Caffeine provides internal metrics that you can use to observe cache hits, misses, and eviction rates.
Those metrics are available via the AdaptiveCache.getInstance().monitor(1) method in the AdaptiveCache class.
- For optimal performance, aim to fit all of your data into memory. If this isn't feasible, monitor cache evictions and adjust the heap size or cache policy accordingly.
- Keep the cache as big as possible.
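Besides VisualVM or JConsole, heap usage can also be read from inside the application. The sketch below uses only the standard java.lang.management API; the class name HeapMonitor and the helper usedPercent are hypothetical, and it does not touch AdaptiveCache or Caffeine's own statistics.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Sketch: report current heap usage relative to the configured maximum. */
final class HeapMonitor {

    /** Returns used heap as a percentage of the maximum, or -1 if the maximum is undefined. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no maximum is defined
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("Heap used: %d MiB, committed: %d MiB, used/max: %.1f%%%n",
                heap.getUsed() / (1024 * 1024),
                heap.getCommitted() / (1024 * 1024),
                usedPercent(heap.getUsed(), heap.getMax()));
    }
}
```

If the used/max ratio stays high while the cache reports frequent evictions, that is a hint to raise -Xmx as described above.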
The library uses SLF4J for logging, which allows you to configure different logging frameworks to suit your needs. To get started with logging, follow the instructions below:
You can use different logging frameworks. Some popular options are:
- Logback (bundled with the library, so a configuration file is required).
- Log4j2
- java.util.logging
To use a specific framework, include its corresponding dependency in your project.
Each logging framework requires a configuration file.
Create a file named logback.xml in your classpath (typically in the src/main/resources directory) with the following content:
<configuration>
    <appender name="Console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} - %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="info">
        <appender-ref ref="Console" />
    </root>
</configuration>
This configuration will log messages to the console, with a pattern that includes the timestamp, log level, logger name, and message.
If you prefer to use Log4j2, create a log4j2.xml configuration file in your classpath with a similar structure:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} - %-5level %logger{36} - %msg%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>
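For the third option listed above, java.util.logging, a comparable console setup can be done in code. This is a rough sketch under assumptions: it presumes the SLF4J binding for java.util.logging (slf4j-jdk14) is the one on your classpath, and the names JulConsoleSetup and PatternLikeFormatter are hypothetical. The output only approximates the patterns shown above.

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch: console logging with java.util.logging, formatted like the XML patterns above. */
final class JulConsoleSetup {

    /** Produces "timestamp - LEVEL logger - message" lines. */
    static final class PatternLikeFormatter extends Formatter {
        private static final DateTimeFormatter TIMESTAMP =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.systemDefault());

        @Override
        public String format(LogRecord record) {
            return String.format("%s - %-5s %s - %s%n",
                    TIMESTAMP.format(record.getInstant()),
                    record.getLevel().getName(),
                    record.getLoggerName(),
                    formatMessage(record));
        }
    }

    /** Replaces the root logger's handlers with a single console handler at INFO level. */
    static void configure() {
        Logger root = Logger.getLogger("");
        for (Handler handler : root.getHandlers()) {
            root.removeHandler(handler);
        }
        // Note: ConsoleHandler writes to System.err, unlike the SYSTEM_OUT target used above.
        ConsoleHandler console = new ConsoleHandler();
        console.setFormatter(new PatternLikeFormatter());
        console.setLevel(Level.INFO);
        root.addHandler(console);
        root.setLevel(Level.INFO);
    }

    public static void main(String[] args) {
        configure();
        Logger.getLogger("example.Demo").info("Logging configured");
    }
}
```

Call configure() once at startup, before the library emits its first log message.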
Usage of this library is quite straightforward.
For starters, attach the .jar file to your project (or use, e.g., jitpack.io).
And then:
- Create instances of the FileOperator and Processor classes.
import pl.magzik.Processor;
import pl.magzik.algorithms.PerceptualHash;
import pl.magzik.algorithms.PixelByPixel;
import pl.magzik.grouping.CRC32Grouper;
import pl.magzik.io.FileOperator;
import pl.magzik.predicates.ImageFilePredicate;

import java.util.List;

public class MyClass {
    public FileOperator fileOperator;
    public Processor processor;

    public MyClass() { // Example class
        fileOperator = new FileOperator(new ImageFilePredicate(), Integer.MAX_VALUE);
        processor = new Processor(
            new CRC32Grouper(),
            List.of(new PerceptualHash(), new PixelByPixel())
        );
    }
}
- Prepare your image collection:
import java.io.File;
import java.util.List;

public static void main(String[] args) {
    MyClass mc = new MyClass();
    List<File> files = List.of(new File("path/to/my/gallery"));
    files = mc.fileOperator.load(files);
}
- And process your image collection using the Processor class:
import java.io.File;
import java.util.Map;
import java.util.Set;

public static void main(String[] args) {
    // ...
    Map<File, Set<File>> duplicates = mc.processor.process(files);
}
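Once you have the result map, a typical next step is to list the files that could be removed. The sketch below works on a hand-built sample map rather than on real library output. It assumes that each key is the file to keep and each value holds its duplicates, which this README does not confirm. DuplicateReport and redundantFiles are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: summarize a Map<File, Set<File>> shaped like the result of process(files). */
final class DuplicateReport {

    /** Collects every file listed as a duplicate, sorted by path; the keys are left out. */
    static List<File> redundantFiles(Map<File, Set<File>> duplicates) {
        List<File> redundant = new ArrayList<>();
        for (Set<File> copies : duplicates.values()) {
            redundant.addAll(copies);
        }
        Collections.sort(redundant); // File is Comparable and orders by pathname
        return redundant;
    }

    public static void main(String[] args) {
        // Hand-built sample standing in for the map returned by the processor.
        Map<File, Set<File>> duplicates = Map.of(
                new File("a.jpg"), Set.of(new File("a_copy.jpg"), new File("a_copy2.jpg")),
                new File("b.png"), Set.of(new File("b_copy.png")));
        duplicates.forEach((original, copies) ->
                System.out.println(original + " has " + copies.size() + " duplicate(s)"));
        System.out.println("Candidates for removal: " + redundantFiles(duplicates));
    }
}
```

Review the candidate list before deleting anything, since which file in each group counts as the one to keep is an assumption here.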
Visit the GitHub Wiki for detailed documentation on installation, configuration, and usage examples.
- Please report any bugs you encounter.
- If you have ideas for enhancements, please let me know. I welcome suggestions to improve this library.
We welcome contributions! If you'd like to contribute, please fork the repository, create a new branch, and submit a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for details.