This proof of concept shows how Spring Boot, Kafka and iText can be used together to execute long-running tasks asynchronously behind a REST API. The implemented service checks whether PDF files contain IBANs that are suspected of being used for money laundering. The design makes it possible to add further checks for PDF files later.
Kafka runs with Docker Compose, which Spring Boot starts through its built-in Docker Compose support. A working Docker setup is therefore required to start the project. Java 21 and Maven are also required.
- Clone the repo
git clone https://github.com/murygin/malware-scanner.git
- Compile
./mvnw clean compile
- Run
./mvnw spring-boot:run
The blacklist of suspicious IBANs is configured with the property iban.check.blacklist in the file src/main/resources/application.properties. The IBANs are separated by commas.
iban.check.blacklist=BG18RZBB91550123456789,FO9264600123456789,GB33BUKB20201555555555
The API provides an endpoint for starting the check of a PDF file and an endpoint for loading the result. If the service is started with ./mvnw spring-boot:run, the base URL is http://localhost:8080.
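The iban.check.blacklist value shown above is a single comma-separated string. As a minimal plain-Java sketch (BlacklistParser is a hypothetical helper, not a class from the project), this is one way to turn it into a set that can be queried quickly. The whitespace and case normalization is an assumption added for robustness; the project description does not say whether entries are normalized.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper that turns the iban.check.blacklist value into a set of IBANs. */
class BlacklistParser {

    /** Splits on commas, normalizes each entry and drops empty ones. */
    static Set<String> parse(String propertyValue) {
        if (propertyValue == null || propertyValue.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(propertyValue.split(","))
                .map(BlacklistParser::normalize)
                .filter(iban -> !iban.isEmpty())
                .collect(Collectors.toUnmodifiableSet()); // duplicates collapse to one entry
    }

    /** Assumption: removes all whitespace (IBANs are often printed in groups of four) and upper-cases. */
    static String normalize(String iban) {
        return iban.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }
}
```

For example, parsing the property value above yields a set with three entries, and a lower-cased copy of one of them with surrounding spaces still matches after normalize.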
POST /check/files
Starts the check of a PDF file. PDFs are checked asynchronously, so the response does not contain the result. Instead, it confirms that the check was created and contains the ID of the check. The Location response header contains the URL for loading the result.
Request:
{
"url": "http://localhost:9090/pdf-with-iban.pdf",
"file-type": "pdf"
}
Response:
- Status: 202 Accepted
- Header: Location: /check/files/b3a5896f-387b-4363-a631-cfbf467db1ce
{
"state": "CREATED",
"results": [],
"id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
GET /check/files/<UUID>
Loads the result of a PDF file check. Because the PDFs are checked asynchronously, the returned state depends on the progress of the check: CREATED if the check has not started yet, RUNNING while it is in progress, and FINISHED, together with the result, once it is complete.
Response:
- Status: 200 OK
{
"state": "FINISHED",
"results": [
{
"state": "SUSPICIOUS",
"name": "money-laundering",
"details": "Unique IBANs: 111, suspicious IBANs: 2"
}
],
"id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
The Spring Boot REST controller o.d.m.rest.MalwareScannerController contains the methods that are executed when the endpoints are called. A PDF file check is triggered via the endpoint POST /check/files, which invokes the controller's create method. The controller is only a facade and forwards all calls to o.d.m.service.CheckJobService.
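Because POST /check/files only confirms that a check was created, a client has to follow the Location header and query GET /check/files/<UUID> repeatedly until the state is FINISHED. The sketch below shows that flow from the client's side with the JDK's java.net.http API. CheckFilesClient, the polling loop and its parameters are hypothetical; only the paths, the base URL http://localhost:8080 and the request body come from the API examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.function.Supplier;

/**
 * Hypothetical client-side helper for the two endpoints. It only builds requests
 * and polls a caller-supplied state source; sending the requests and parsing
 * the JSON responses are left to the caller.
 */
class CheckFilesClient {

    /** Builds the request body from the API example; quotes and backslashes in the URL are escaped. */
    static String requestBody(String pdfUrl) {
        String escaped = pdfUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\": \"" + escaped + "\", \"file-type\": \"pdf\"}";
    }

    /** Creates the POST /check/files request that starts an asynchronous check. */
    static HttpRequest startCheckRequest(URI baseUrl, String pdfUrl) {
        return HttpRequest.newBuilder(baseUrl.resolve("/check/files"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(pdfUrl)))
                .build();
    }

    /** Resolves the relative Location header (/check/files/<UUID>) against the base URL. */
    static URI resultUri(URI baseUrl, String locationHeader) {
        return baseUrl.resolve(locationHeader);
    }

    /**
     * Polls until the state is FINISHED or maxAttempts is used up.
     * fetchState stands in for a GET on the result URI that returns the "state" field.
     */
    static boolean awaitFinished(Supplier<String> fetchState, int maxAttempts, Duration delay)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("FINISHED".equals(fetchState.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay.toMillis()); // CREATED or RUNNING: wait before asking again
            }
        }
        return false;
    }
}
```

To actually send the request, a caller could use HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the Location value with response.headers().firstValue("Location"). Polling with a fixed delay is the simplest option; maxAttempts bounds how long the client waits.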
When a new check is requested, the controller calls the createCheckJob method of the CheckJobService. This method creates an o.d.m.model.CheckJob with the status CREATED, saves it in the database and then sends an o.d.m.model.CheckEvent to the event streaming platform Kafka. The check itself is not started directly; it is only triggered by this Kafka event. The advantage is that the caller of the REST endpoint is not blocked waiting for the check to complete but receives a response immediately.
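The key point of createCheckJob is that it only records the job and publishes an event; it never runs the check itself. The following plain-Java model illustrates that hand-off under stated assumptions: a ConcurrentHashMap stands in for the database, a BlockingQueue stands in for the Kafka topic, and the class, record and field names are hypothetical rather than the project's actual types.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of the createCheckJob hand-off (map = database, queue = Kafka topic). */
class CheckJobServiceSketch {

    enum State { CREATED, RUNNING, FINISHED }

    record CheckJob(UUID id, String url, State state) {}

    record CheckEvent(UUID jobId, String url) {}

    final Map<UUID, CheckJob> repository = new ConcurrentHashMap<>();
    final BlockingQueue<CheckEvent> topic = new LinkedBlockingQueue<>();

    /** Saves a CREATED job, publishes an event and returns at once without running the check. */
    CheckJob createCheckJob(String url) {
        CheckJob job = new CheckJob(UUID.randomUUID(), url, State.CREATED);
        repository.put(job.id(), job);          // persist first, so the job can be loaded by ID
        topic.add(new CheckEvent(job.id(), url)); // then hand the work to the consumer side
        return job;
    }
}
```

Whoever consumes the queue performs the check, so the REST thread can return the CREATED job straight away. In the project, the database and Kafka take over these two roles.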
The CheckEvents are consumed by o.d.m.kafka.KafkaTopicListener. After receiving an event, the KafkaTopicListener sets the status of the CheckJob to RUNNING and starts the check by calling the checkPDFFile method of o.d.m.service.IBANCheckService.
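The KafkaTopicListener reacts to two kinds of events: CheckEvents, which start a check, and the CheckResultEvents described in the next paragraph, which complete the job. Since the project targets Java 21, a sealed interface with a pattern-matching switch is one natural way to model this dispatch. The sketch below is a hypothetical model, not the project's listener: the event types are simplified records, the job state lives in plain maps, and the check service is passed in as a Function. Instead of publishing the CheckResultEvent to Kafka, onEvent returns it to the caller.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical model of the listener's state handling for both event types. */
class TopicListenerSketch {

    enum State { CREATED, RUNNING, FINISHED }

    sealed interface Event permits CheckEvent, CheckResultEvent {}

    record CheckEvent(UUID jobId, String url) implements Event {}

    record CheckResultEvent(UUID jobId, String details) implements Event {}

    final Map<UUID, State> states = new ConcurrentHashMap<>();
    final Map<UUID, String> results = new ConcurrentHashMap<>();
    private final Function<CheckEvent, CheckResultEvent> checkService;

    TopicListenerSketch(Function<CheckEvent, CheckResultEvent> checkService) {
        this.checkService = checkService;
    }

    /** Dispatches on the event type; returns the follow-up event to publish, if any. */
    Optional<CheckResultEvent> onEvent(Event event) {
        return switch (event) {
            case CheckEvent check -> {
                states.put(check.jobId(), State.RUNNING);      // the job is now being processed
                yield Optional.of(checkService.apply(check));  // run the check synchronously
            }
            case CheckResultEvent result -> {
                results.put(result.jobId(), result.details()); // store the result on the job
                states.put(result.jobId(), State.FINISHED);
                yield Optional.empty();
            }
        };
    }
}
```

Because Event is sealed, the compiler checks that the switch covers every event type, so adding a new check handler's event later forces this dispatch to be updated.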
The checkPDFFile method of the IBANCheckService first loads the PDF file from the URL and finds all IBANs in it. It then checks whether any of the IBANs found are on the blacklist of IBANs suspected of being used for money laundering. To search the PDF file, an instance of o.d.m.service.IBANFinder is created; after its run method has been called, the IBANFinder holds all IBANs found in a Set. The iText library is used to read the PDF files.
After the check in the IBANCheckService is completed, an o.d.m.model.CheckResultEvent is sent to Kafka. This event is also consumed by the KafkaTopicListener, which takes the check result from the event, saves it in the CheckJob and sets the status of the job to FINISHED. The client can now load the result via the REST endpoint GET /check/files/<UUID>.
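The IBAN search and blacklist comparison can be illustrated on plain text. The sketch below assumes the text has already been extracted from the PDF (the project uses iText for the PDF handling) and uses a deliberately simplified, hypothetical regex: two letters, two check digits and 11 to 30 alphanumeric characters without spaces, giving the IBAN length range of 15 to 34 characters. It does not validate check digits and will miss IBANs printed in groups of four. The details string mimics the format shown in the GET response example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the IBAN search and blacklist check on already extracted text. */
class IbanCheckSketch {

    // Simplified assumption: country code, check digits, 11-30 alphanumerics, no spaces.
    private static final Pattern IBAN = Pattern.compile("\\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\\b");

    /** Collects the unique IBAN-like tokens in order of first appearance. */
    static Set<String> findIbans(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = IBAN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    /** Returns the found IBANs that are also on the blacklist. */
    static Set<String> suspicious(Set<String> ibans, Set<String> blacklist) {
        Set<String> hits = new LinkedHashSet<>(ibans);
        hits.retainAll(blacklist);
        return hits;
    }

    /** Formats a summary in the style of the "details" field of the API example. */
    static String details(Set<String> ibans, Set<String> suspicious) {
        return "Unique IBANs: " + ibans.size() + ", suspicious IBANs: " + suspicious.size();
    }
}
```

Collecting the matches in a Set mirrors the IBANFinder, so an IBAN that appears several times in the document is counted only once under "Unique IBANs".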
- The service should only be usable by authenticated clients.
- Loading arbitrary external resources at runtime is a major security risk. The PDF URL sent by the client must not be trusted and must be validated before it is processed. Data should only be loaded from approved hosts.
- The API should be documented with SpringDoc, OpenAPI and Swagger.
- Test coverage should be improved. Integration tests should be added for the controller endpoints and for the processing of Kafka events.
- A load test needs to be written to test how the system performs when many requests have to be processed simultaneously.
- Other check handlers can be added that consume Kafka events and check, for example, whether an IBAN actually exists.
- Exception handling should be improved for the case where an invalid request body is sent to the POST /check/files endpoint.
- Exception handling should be improved for errors that occur while file checks are executed.
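For the roadmap item about untrusted PDF URLs, a host allowlist is one straightforward approach. The following is a minimal sketch with a hypothetical PdfUrlValidator class: it accepts only http and https URLs whose host appears in a configured set, and it rejects malformed URLs and URLs that carry user info.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical allowlist check for client-supplied PDF URLs. */
class PdfUrlValidator {

    private final Set<String> allowedHosts; // expected in lower case

    PdfUrlValidator(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null || uri.getUserInfo() != null) {
                return false; // relative URLs, file: URIs and user@host tricks are rejected
            }
            boolean http = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
            return http && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed URLs are never fetched
        }
    }
}
```

Validating the string is not sufficient on its own if the HTTP client follows redirects, because a redirect can lead to a host outside the allowlist; redirects would need to be disabled or each target checked again.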
The articles in this section provide more background on the frameworks and systems used in this application.
Kafka
- Apache Kafka Quickstart
- Run Kafka Streams Demo Application
- Is a Key Required as Part of Sending Messages to Kafka?
- What should I use as the key for my Kafka message?
API Design
Spring Boot
- Docker Compose Support in Spring Boot 3.1
- Getting started with Spring Boot 3, Kafka over docker with docker-compose.yaml
- Building REST services with Spring
- Spring Boot With H2 Database
- Getting started with unit testing in spring boot
IBAN
- Register of countries using the IBAN standard
- IBAN Validation API V4 Documentation
- IBAN Validation and Calculation - openiban
- Global IBAN regex
Daniel Murygin - linkedin.com/in/murygin
Project Link: https://github.com/murygin/malware-scanner