- Support for URL parsing
- Initial testing code
- Benchmarking code
- Improvements in OpenAI prompt
- Conversion of PDFs to images before parsing with OpenAI models
- AUTO parse mode
- Switch from multithreading to multiprocessing
- Support for structured parsing of HTML pages
- Support for recursive URL parsing in websites and PDFs
- URL extraction regex
- Bug in document appending logic
- Bug caused by split PDFs being in the same directory as the source PDF
- Improved pdfplumber parsing to format markdown and detect hyperlinks
- Support for parsing .csv, .txt, .html, and .docx files
- Support for parsing links to documents during recursive HTML parsing
- Colab example notebook
- Support for bold and italic formatting in PDFPlumber
- Support for Llama 3.2 models through HuggingFace and Together AI
- Improved PDFPlumber table parsing
- PDFPlumber text detection bug
- Retry and error handling for LLM_PARSE
- Remove together Python client dependency and use REST API calls instead