Text Extraction API for SilverStripe CMS (mostly used with 'fulltextsearch' module)

creative-commoners/silverstripe-textextraction

This branch is 1 commit ahead of, 20 commits behind silverstripe/silverstripe-textextraction:4.

Latest commit

d736e5b · Jan 30, 2024

Text extraction module

Provides a text extraction API for file content. It hooks into different extractor engines, depending on which engines are available and on the format of the file being parsed. The output is always a string containing the file's content.

Via the FileTextExtractable extension, this logic can be used to cache the extracted content on a DataObject subclass (usually File).

The module supports text extraction on the following file formats:

  • HTML (built-in)
  • PDF (with XPDF or Solr)
  • Microsoft Word, Excel, PowerPoint (Solr)
  • OpenOffice (Solr)
  • CSV (Solr)
  • RTF (Solr)
  • EPUB (Solr)
  • Many others (Tika)
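
The idea of choosing an extractor by availability and file format can be pictured roughly as follows. This is a self-contained sketch only, not the module's real classes: `SketchExtractor`, `HtmlSketchExtractor`, `PdfSketchExtractor` and `pickExtractor()` are hypothetical names invented for illustration, and the tag stripping is a simplified stand-in for whatever the built-in HTML extractor actually does.

```php
<?php
// Hypothetical sketch: none of these names belong to the module's API.

interface SketchExtractor
{
    /** Whether the engine (binary, service, ...) can be used right now. */
    public function isAvailable(): bool;

    /** Whether this engine handles files with the given extension. */
    public function supportsExtension(string $extension): bool;

    /** Returns the extracted text; the result is always a string. */
    public function getContent(string $raw): string;
}

final class HtmlSketchExtractor implements SketchExtractor
{
    public function isAvailable(): bool
    {
        return true; // plain PHP, no external dependency assumed
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array(strtolower($extension), ['html', 'htm'], true);
    }

    public function getContent(string $raw): string
    {
        // Replace tags with spaces, decode entities, collapse whitespace.
        $text = preg_replace('/<[^>]+>/', ' ', $raw) ?? '';
        $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
        return trim(preg_replace('/\s+/', ' ', $text) ?? '');
    }
}

final class PdfSketchExtractor implements SketchExtractor
{
    public function isAvailable(): bool
    {
        return false; // stands in for an engine that is not installed
    }

    public function supportsExtension(string $extension): bool
    {
        return strtolower($extension) === 'pdf';
    }

    public function getContent(string $raw): string
    {
        throw new RuntimeException('PDF engine is not available in this sketch');
    }
}

/** Returns the first extractor that is available and supports the file. */
function pickExtractor(array $extractors, string $filename): ?SketchExtractor
{
    $extension = pathinfo($filename, PATHINFO_EXTENSION);
    foreach ($extractors as $extractor) {
        if ($extractor->isAvailable() && $extractor->supportsExtension($extension)) {
            return $extractor;
        }
    }
    return null;
}

$extractors = [new PdfSketchExtractor(), new HtmlSketchExtractor()];
$extractor = pickExtractor($extractors, 'about-us.html');
$text = $extractor ? $extractor->getContent('<h1>About</h1><p>Fish &amp; chips</p>') : '';
echo $text, PHP_EOL; // prints "About Fish & chips"
```

In this sketch, a file whose only matching engine is unavailable (for example `report.pdf` above) yields `null`, so the caller decides how to handle unsupported files.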

Installation

composer require silverstripe/textextraction

Documentation

Bugtracker

Bugs are tracked in the issues section of this repository. Before submitting an issue, please read over the existing issues to ensure yours is unique.

If the issue does look like a new bug:

  • Create a new issue
  • Describe the steps required to reproduce your issue, and the expected outcome. Unit tests, screenshots and screencasts can help here.
  • Describe your environment in as much detail as possible: Silverstripe version, browser, PHP version, operating system, and any installed Silverstripe modules.

Please report security issues to security@silverstripe.org directly. Please don't file security issues in the bugtracker.

Development and contribution

If you would like to contribute to the module, please raise a pull request and discuss it with the module maintainers.
