Apache StormCrawler 3.1.0 (Incubating)
Disclaimer
Apache StormCrawler is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Release Summary
This is our second release since joining the Apache Incubator as a podling. It includes the new Playwright module, which can be used to scrape dynamic content.
What's Changed
- send email if CI build fails by @pjfanning in #1217
- Fixes #1214 - "Update Release Docs with Feedback from 3.0 RC2 Vote" by @rzo1 in #1218
- Fix #1223 - Remove declareOutputFields from Solr StatusUpdaterBolt by @mvolikas in #1224
- Apache StormCrawler 3.0 (Incubating) by @rzo1 in #1225
- Fix #1226 "Add FileSpout TestCase for Custom Metadata Injections" by @rzo1 in #1227
- 1024 Playwright protocol implementation, fixes #1024 by @jnioche in #1228
- Fix #1230: Set sitemap key before outlink processing by @mvolikas in #1231
- #1220 - Add disclaimer for binary test artifacts by @rzo1 in #1234
- #1221 - Switch Source to tar.gz by @rzo1 in #1233
- #1215 - Update RAT exclusions. Fixes licenses by @rzo1 in #1235
- #1236 - Fix Typos in StormCrawler by @rzo1 in #1237
- #1222 - Fix Release Docs by @rzo1 in #1232
- #1238 - Avoid use of star imports by @rzo1 in #1239
- Fix #1244 "Migrate to JUnit 5" by @rzo1 in #1245
- Fix #1216 - Add RAT Exclusion File for standalone RAT by @rzo1 in #1243
- #1248 - Use pre-compiled patterns for mime type matching in TikaParser by @rzo1 in #1249
- #1251 - Update to Storm 2.6.3 by @rzo1 in #1252
- #626: Add routing field in metadata - Solr StatusUpdaterBolt by @mvolikas in #1242
- #851 Merge branch 851 into main by @mvolikas in #1256
- #1259 - Enable Dependabot by @rzo1 in #1260
- #1261 - Automatically generate THIRD-PARTY.txt via GitHub Action by @rzo1 in #1262
- #1257 - Update to Storm 2.6.4 by @rzo1 in #1258
- #1162 - Replace Coveralls with JaCoCo by @sigee in #1255
- Bump testcontainers.version from 1.19.7 to 1.20.1 by @dependabot in #1277
- Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.5.0 to 3.10.0 by @dependabot in #1267
- Bump actions/setup-java from 3 to 4 by @dependabot in #1264
- Bump actions/checkout from 3 to 4 by @dependabot in #1265
- Bump org.jsoup:jsoup from 1.17.2 to 1.18.1 by @dependabot in #1271
- Regenerated License file after dependency upgrades by @github-actions in #1280
- Bump tika.version from 2.9.1 to 2.9.2 by @dependabot in #1269
- Bump com.ibm.icu:icu4j from 74.2 to 75.1 by @dependabot in #1272
- Bump org.apache.maven.plugins:maven-enforcer-plugin from 3.4.1 to 3.5.0 by @dependabot in #1289
- Bump org.apache.maven.plugins:maven-jar-plugin from 3.3.0 to 3.4.2 by @dependabot in #1288
- Bump org.apache.maven.plugins:maven-compiler-plugin from 3.11.0 to 3.13.0 by @dependabot in #1285
- Bump org.apache.rat:apache-rat-plugin from 0.15 to 0.16.1 by @dependabot in #1283
- Bump org.apache:apache from 31 to 33 by @dependabot in #1275
- Bump junit.version from 5.10.2 to 5.11.0 by @dependabot in #1278
- Bump org.apache.solr:solr-solrj from 9.5.0 to 9.6.1 by @dependabot in #1281
- Bump org.apache.maven.archetype:archetype-packaging from 2.4 to 3.2.1 by @dependabot in #1287
- Bump org.mockito:mockito-core from 5.10.0 to 5.13.0 by @dependabot in #1279
- Bump com.microsoft.playwright:playwright from 1.43.0 to 1.46.0 by @dependabot in #1268
- Bump selenium.version from 4.18.1 to 4.24.0 by @dependabot in #1266
- Bump log4j2.version from 2.23.0 to 2.24.0 by @dependabot in #1284
- Regenerated License file after dependency upgrades by @github-actions in #1282
- Fix #1290 "Add close/cleanup method to ParseFilters" by @rzo1 in #1291
- Bump opensearch.version from 2.12.0 to 2.16.0 by @dependabot in #1276
- Regenerated License file after dependency upgrades by @github-actions in #1292
- Aligned version of OpenSearch in test with recent upgrade to 2.16 by @jnioche in #1293
- Bump actions/cache from 3 to 4 by @dependabot in #1263
- Revert "Bump log4j2.version from 2.23.0 to 2.24.0" by @rzo1 in #1294
- #1295 - Add workflow to publish SNAPSHOTS to repository.a.o by @rzo1 in #1296
- Regenerated License file after dependency upgrades by @github-actions in #1297
New Contributors
- @sigee made their first contribution in #1255
- @github-actions made their first contribution in #1280
Full Changelog: stormcrawler-3.0...stormcrawler-3.1.0