A Streamlit app that segments documents into sections. The web app also runs OCR on the text sections and image analysis on the image sections of each page. OCR is performed with the tesseract-ocr package; image analysis uses the llama-3.2-11b-vision model.
This app uses the YOLOv10x model for document segmentation, annotating sections of a document such as text fields, formulae, pictures, and list items. The model uses pretrained weights, which may be downloaded using this Colab notebook.
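The flow described above (segment the page, then send text sections to OCR and picture sections to image analysis) can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Section` class, the `route_sections` function, the label sets, and the `min_confidence` threshold are all hypothetical names and values, and the class labels produced by the real weights may differ. The OCR and image-analysis steps are passed in as plain callables so the sketch runs without any model installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

# Assumed label groups; the real class names depend on the pretrained weights.
TEXT_LABELS = {"text", "title", "list-item", "caption"}
IMAGE_LABELS = {"picture"}


@dataclass
class Section:
    """One detected region of a page (hypothetical container)."""
    label: str
    box: Box
    confidence: float


def route_sections(
    sections: Iterable[Section],
    ocr: Callable[[Box], str],
    describe: Callable[[Box], str],
    min_confidence: float = 0.25,  # assumed threshold, not taken from the app
) -> List[Dict]:
    """Send text sections to `ocr` and picture sections to `describe`.

    Sections are processed top to bottom, then left to right.
    Low-confidence detections and unrecognised labels are skipped.
    """
    results = []
    ordered = sorted(sections, key=lambda s: (s.box[1], s.box[0]))
    for section in ordered:
        if section.confidence < min_confidence:
            continue
        label = section.label.lower()
        if label in TEXT_LABELS:
            kind, content = "ocr", ocr(section.box)
        elif label in IMAGE_LABELS:
            kind, content = "image", describe(section.box)
        else:
            continue
        results.append(
            {"label": section.label, "box": section.box,
             "kind": kind, "content": content}
        )
    return results


if __name__ == "__main__":
    # Stub callables stand in for the OCR and vision-model calls.
    demo = [
        Section("Picture", (0, 200, 100, 300), 0.90),
        Section("Text", (0, 0, 100, 50), 0.80),
    ]
    for row in route_sections(demo, ocr=lambda b: "<ocr text>",
                              describe=lambda b: "<image description>"):
        print(row["kind"], row["box"], row["content"])
```

In a real pipeline, `ocr` would presumably crop the page image to the box and pass the crop to something like `pytesseract.image_to_string`, while `describe` would send the crop to the vision model. Both are left as injected callables here because the app's exact wiring is not shown.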
Link to the deployed Streamlit web application.