Error: Spaces make a new line #1171
only for when we're testing thumbnail generation on the command line
Hi. I can't raise an issue, so I'm posting here. I've run into several problems parsing tables in lattice and stream modes. Would it be possible to add a parameter such as a row or column tolerance, like in Camelot? Thanks. |
Whatever Keycloak is, it's running on port 8080, blocking Tabula from using
that port. Quit Keycloak, then restart Tabula, then Tabula should work.
Jeremy B. Merrill
…On Fri, Aug 13, 2021, 8:13 PM Nathan Harris ***@***.***> wrote:
Hi, I can't raise an issue so I am posting here.
Tabula opens to Keycloak (I don't even know what that is)
[image: image]
<https://user-images.githubusercontent.com/16217396/129428482-15fa5c1b-5c16-4bf1-aae3-27d620a6e24d.png>
|
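For anyone hitting the same port conflict, here is a minimal sketch of how to check whether something is already listening on port 8080 before starting Tabula. The function name `port_in_use` is made up for this example, and it assumes the service is bound on the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means another program is listening on that port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is the port mentioned in the reply above.
    print("port 8080 in use:", port_in_use(8080))
```

If this reports that the port is taken, quit the other program (Keycloak in this case) and restart Tabula, as suggested above.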
Hi guys, sorry for posting this here, but there is nowhere else to do it. Maybe you should enable Discussions on this repo so that Tabula users can help each other.
My post is about line breaks inside table cells, like this (this is a screenshot of Tabula output):
Tabula inserts those line breaks into the CSV, which creates chaos when importing into Excel: every line break starts a new row, breaks quoted text strings, and so on. Maybe there is a use case for keeping line breaks inside a cell, but a "remove line breaks inside cells" flag would be a great feature.
Meanwhile, for my fellow users who are battling this problem: after much searching I found a solution that works with both Notepad++ and Sublime, a regex find and replace (taken from a very helpful post on StackOverflow):
_Use Notepad++ regex Find-and-Replace: Find what:_
Replace with:
(_There is a single space after $1) Repeatedly click "Replace All" until no more matches are found._ This works. |
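As an alternative to the editor-based find and replace, here is a rough Python sketch that does the same cleanup on an exported CSV using only the standard library. This is not a Tabula feature; the function name `flatten_cells` and the sample input are made up for illustration, and it assumes the CSV is properly quoted so that the `csv` module can tell in-cell line breaks from row endings.

```python
import csv
import io
import re


def flatten_cells(csv_text: str) -> str:
    """Replace line breaks inside CSV cells with a single space."""
    # newline="" keeps embedded \r\n sequences intact for the csv module.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Collapse each in-cell line break (and surrounding whitespace) to one space.
        writer.writerow([re.sub(r"\s*[\r\n]+\s*", " ", cell) for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = 'a,"Hello\nworld",b\n'  # hypothetical export with a break inside a cell
    print(flatten_cells(sample), end="")
```

Because the file is parsed as CSV rather than as plain text, only line breaks inside quoted cells are touched and the row boundaries stay as they were.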
In Excel they are entered as . |
Hey! I can't raise an issue, so I'm writing here. Could you please tell me what to do about a "Java Heap Error" when my PDF is only 50 KB and has only 1 page? I've already tried reinstalling both Java and Tabula multiple times, restarted the whole computer, cleared the "Local/Temp" folder, cleared ports and changed 8080 to 9999, but nothing helps. OS: Windows 10. The PDF is 100% valid: I opened it this morning without any trouble, but then it suddenly stopped working. Since then Tabula returns "Sorry, your file upload could not be processed. Please double-check that the file you uploaded is a valid PDF file and try again", and I cannot see any of my previous files; it just shows "First time using Tabula? Welcome!" I will be very grateful for any help 🙏 The error is the following: |
Howdy. I don't have any affiliation with this project; I just follow it on GitHub. Can you share your PDF? I can try it out on my computer and see if I can figure out what is going on. |
Thank you! Here |
So I was able to extract the tables. Here is the result: I'm running on a Mac. It's weird, but I cannot figure out which version of Java I have installed: `java --version` says it cannot find it, but it has to be installed somewhere because Tabula is running :). I don't want to muck around with Java and end up breaking it. Hopefully this extraction works for you. |
Thank you so much for your help! Wish you all the best things in the world and all the blessings! ☀️ |
Great tool and thank you for your great job! |
Great tool! We use it to parse PDF files that appear to be the same format to us humans, but drive Tabula nuts in either stream or lattice mode. Culprits include formatting characters used by the PDF creators (different people, I assume). We find spaces, tabs and other unprintable characters in the CSV output. The files themselves always present as header, trailer and intermediate details, with page headers and trailers in the details. May I suggest that, since we humans recognize the format, allowing us to specify vertical columns (groups) and horizontal rows (elements) might remove the reliance on recognizing said groups and elements automatically. Were I a seasoned open-source developer who had time on his hands (I'm neither), I'd look at the code and see where and how this might work. |
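Until something like that exists, the stray tabs and unprintable characters can be stripped from the CSV cells after export. The following is only a sketch of one way to do it in Python; `clean_cell` is a made-up helper, and it assumes that dropping every Unicode control or format character is acceptable for your data.

```python
import unicodedata


def clean_cell(cell: str) -> str:
    """Drop unprintable characters and normalize whitespace in one cell."""
    kept = []
    for ch in cell:
        if ch.isspace():
            # Tabs, non-breaking spaces and similar all become a plain space.
            kept.append(" ")
        elif unicodedata.category(ch).startswith("C"):
            # Categories Cc, Cf, etc. are control/format characters: skip them.
            continue
        else:
            kept.append(ch)
    # Collapse runs of spaces and trim both ends.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(clean_cell("Total\t\u200b 12\x00.50 "))
```

Applying this to every cell before comparing files should make exports that look identical to a human compare as identical too.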
In some documents whose first row contains text with spaces, the CSV output contains a newline where the space should be. For example, if the first row contains a, "Hello world", and b, and I use Tabula to capture that row, I get this CSV file:
I expect the CSV file to be: