Trying to test run and train #1
Comments
I think it's a Tika parser problem. I did not want to use Tika because it means interfacing with Java, but sadly the other methods require a lot of dependencies. You could try restarting your Tika server, or maybe upgrading your Python to 3.7. Also, Tika requires an Internet connection (unfortunately), so it is possible that you were not able to connect to Apache Tika.
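Until the Tika server itself is sorted out, one way to keep a single bad PDF from killing the whole run is to catch the decode error around the parse call. This is only a rough sketch, not the project's code: `safe_pdf_text` is a hypothetical helper name, and it assumes `parser` exposes `from_file(filename)` and returns a dict with a `"content"` key, as `tika.parser` does.

```python
import json


def safe_pdf_text(parser, filename):
    """Return the extracted text, or None if the parser reply was not valid JSON.

    Hypothetical helper: `parser` is assumed to behave like tika.parser,
    i.e. from_file(filename) returns a dict with a "content" key.
    """
    try:
        raw = parser.from_file(filename)
    except json.JSONDecodeError as exc:
        # The traceback in this issue ends in this exception after a
        # "Tika server returned status: 500" warning.
        print(f"Tika could not parse {filename!r}: {exc}")
        return None
    # "content" may be missing or None when no text was extracted.
    return (raw or {}).get("content")
```

A caller would then skip the file when the result is `None` rather than crash, for example `text = safe_pdf_text(parser, path)` followed by an `if text is None: continue` inside the loop over resumes.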
I am getting the following error during testing and training with PDF files:
python3 main.py --type fixed "./src/data/test/Dong Xing_Catherine Zhang_Equity Research Intern.pdf" --model_name model
Loading nlp tools...
Loading pdf parser...
2019-06-13 12:32:38,162 [MainThread ] [WARNI] Tika server returned status: 500
Traceback (most recent call last):
File "main.py", line 101, in <module>
r.test(path_to_resume, infoExtractor)
File "/media/Shared/resume_Rat/Resume-Rater-master/src/model.py", line 568, in test
doc, _ = loadDocumentIntoSpacy(filename, self.parser, self.nlp)
File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 162, in loadDocumentIntoSpacy
new_text = getPDFText(f, parser)
File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 144, in getPDFText
raw = parser.from_file(filename)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 40, in from_file
return _parse(jsonOutput)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 77, in _parse
realJson = json.loads(jsonOutput[1])
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
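For anyone reading the traceback: the final message is what `json.loads` reports whenever the string it receives does not begin with a JSON value. The small sketch below reproduces it with the standard library only. It assumes the body returned alongside the 500 status was empty or not JSON, which the log does not actually show; `try_decode` is a made-up name for illustration.

```python
import json


def try_decode(body):
    """Decode a response body as JSON, returning the error text on failure."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)


# An empty body (assumed here) fails at the very first character.
print(try_decode(""))
# prints: Expecting value: line 1 column 1 (char 0)
```

So the JSON error is a symptom; the thing to investigate is why the Tika server answered with status 500 for this file.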