This is a Java web crawler that crawls a given URL and returns the URLs it visited, together with all the links found on each of them. You can find the released version of this application here: http://web-crawl-env.eba-upp2ihyt.eu-west-2.elasticbeanstalk.com/
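For a sense of what the crawler does per page, the sketch below fetches one URL and collects the links found on it. This is a minimal illustration only: it assumes Jsoup for HTML parsing, and the class and method names are hypothetical rather than taken from the project's actual code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical sketch: download one page and collect the links found on it.
public class LinkExtractor {

  public List<String> extractLinks(String url) throws IOException {
    Document page = Jsoup.connect(url).get(); // fetch and parse the page
    List<String> links = new ArrayList<>();
    for (Element anchor : page.select("a[href]")) { // every anchor tag with an href
      links.add(anchor.absUrl("href")); // resolve relative URLs against the page URL
    }
    return links;
  }
}
```

A full crawl would repeat this step for each discovered link, typically keeping a set of already-visited URLs to avoid loops.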
The project is built with Spring and Java 17 and uses JUnit for testing. It follows the Google Java Style Guide, enforced via the Spotless plugin, and provides a code coverage report via JaCoCo.
For CI/CD it uses CircleCI, it is deployed to AWS Elastic Beanstalk, and the application is packaged with Docker. It is also integrated with Snyk for scanning for security vulnerabilities.
The CI/CD includes the following steps:
- Check code style
- Run tests
- Create a code coverage report and publish it to Codecov
- Deploy the application to AWS Elastic Beanstalk
You can find the full list of tech in the Tech & Tools Documentation.
- Clone the project on your local machine.
$ git clone https://github.com/apavlidi/WebCrawler.git
- Navigate to the project folder and install the dependencies with the following command.
$ mvn install
- Run the application locally (the application can be accessed at localhost:8080).
$ mvn spring-boot:run
You can also run the application using Docker:
$ docker build -t app .
$ docker run -p 8080:8080 app
You can run the tests with $ mvn test.
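As a point of reference, a unit test in this setup would look something like the sketch below (JUnit 5, reusing the hypothetical LinkExtractor from the sketch above; it does not mirror the project's actual test suite):

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;
import org.junit.jupiter.api.Test;

// Illustrative example only; a real test would stub the network instead of hitting a live site.
class LinkExtractorTest {

  @Test
  void extractsAbsoluteLinksFromAPage() throws Exception {
    List<String> links = new LinkExtractor().extractLinks("https://example.com");
    // absUrl("href") resolves relative links, so every result should be absolute
    assertTrue(links.stream().allMatch(link -> link.startsWith("http")));
  }
}
```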
You can produce a code coverage report with the JaCoCo plugin: $ mvn jacoco:report. The code coverage report is also published to Codecov.
You can format the code with the Spotless plugin: $ mvn spotless:apply. Spotless has been configured to apply the Google Java style.
Web-Crawl documentation is available here. The API is also exposed via OpenAPI and is accessible at /v3/api-docs.
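For example, once the application is running locally you can fetch the OpenAPI description with:
$ curl http://localhost:8080/v3/api-docs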
Web-Crawl project kanban is available here.