Open
Description
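I am using this together with redmine_full_text_search, but text cannot be extracted from some files.
.doc and .docx files hang with no response, and for a PDF (created with iTextSharp) the number of extracted characters is 0.
The result was the same when I tried through the web interface.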
Environment
Windows 10 Pro
Docker Desktop 2.1.0.5
Chrome, Postman
I have attached the logs and the test files.
テスト.zip(.doc)
chupa-text_1 | I, [2019-11-20T00:41:17.398233 #884] INFO -- : [adfdc7bf-d67a-44b3-a231-54a0d7c90c46] Started POST "/extraction" for 172.18.0.1 at 2019-11-20 00:41:17 +0000
chupa-text_1 | I, [2019-11-20T00:41:17.398717 #884] INFO -- : [adfdc7bf-d67a-44b3-a231-54a0d7c90c46] Processing by ExtractionsController#create as HTML
chupa-text_1 | I, [2019-11-20T00:41:17.398791 #884] INFO -- : [adfdc7bf-d67a-44b3-a231-54a0d7c90c46] Parameters: {"utf8"=>"?", "authenticity_token"=>"YwYVY5tO8ctUX8VDNRUUI0lX0bsARlic2G1VXP3b5p/XYHXGM7G86NfmSSLlqFqR2dDluXRniCmAd4/Lupyf2w==", "extraction"=>{"data"=>#<ActionDispatch::Http::UploadedFile:0x0000561c562a78f8 @tempfile=#<Tempfile:/tmp/RackMultipart20191120-884-1xdugq7.doc>, @original_filename="テスト.doc", @content_type="application/octet-stream", @headers="Content-Disposition: form-data; name=\"extraction[data]\"; filename=\"\xE3\x83\x86\xE3\x82\xB9\xE3\x83\x88.doc\"\r\nContent-Type: application/octet-stream\r\n">, "uri"=>""}, "commit"=>"Extract"}
chupa-text_1 | D, [2019-11-20T00:41:17.399542 #884] DEBUG -- : [adfdc7bf-d67a-44b3-a231-54a0d7c90c46] [extractor][extract][target] <file:///home/chupa-text/chupa-text-http-server/%E3%83%86%E3%82%B9%E3%83%88.doc>:<application/octet-stream>
chupa-text_1 | D, [2019-11-20T00:41:17.399666 #884] DEBUG -- : [adfdc7bf-d67a-44b3-a231-54a0d7c90c46] [extractor][extract][decomposer] ChupaText::Decomposers::AbiWord
chupa-text_1 | I, [2019-11-20T00:42:34.819710 #1029] INFO -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] Started POST "/extraction" for 172.18.0.1 at 2019-11-20 00:42:34 +0000
chupa-text_1 | I, [2019-11-20T00:42:34.820171 #1029] INFO -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] Processing by ExtractionsController#create as HTML
chupa-text_1 | I, [2019-11-20T00:42:34.820254 #1029] INFO -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] Parameters: {"utf8"=>"?", "authenticity_token"=>"rXDDP1fSuMEpE/7S2jLGPoAKGmOFb/cGl4Lh3SCgSmoZFqOa/y314qqqcrMKj4iMEI0uYfFOJ7PPmDtKZ+czLg==", "extraction"=>{"data"=>#<ActionDispatch::Http::UploadedFile:0x0000561c55757de0 @tempfile=#<Tempfile:/tmp/RackMultipart20191120-1029-1sealwx.docx>, @original_filename="テスト.docx", @content_type="application/octet-stream", @headers="Content-Disposition: form-data; name=\"extraction[data]\"; filename=\"\xE3\x83\x86\xE3\x82\xB9\xE3\x83\x88.docx\"\r\nContent-Type: application/octet-stream\r\n">, "uri"=>""}, "commit"=>"Extract"}
chupa-text_1 | D, [2019-11-20T00:42:34.821392 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [decomposer][libreoffice][word][command][found] libreoffice
chupa-text_1 | D, [2019-11-20T00:42:34.821526 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [decomposer][libreoffice][powerpoint][command][found] libreoffice
chupa-text_1 | D, [2019-11-20T00:42:34.821656 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [decomposer][libreoffice][excel][command][found] libreoffice
chupa-text_1 | D, [2019-11-20T00:42:34.821804 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [decomposer][abiword][command][found] abiword
chupa-text_1 | D, [2019-11-20T00:42:34.823010 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [extractor][extract][target] <file:///home/chupa-text/chupa-text-http-server/%E3%83%86%E3%82%B9%E3%83%88.docx>:<application/octet-stream>
chupa-text_1 | D, [2019-11-20T00:42:34.823188 #1029] DEBUG -- : [56d5d011-962f-4a61-9dfa-1cc5133c9711] [extractor][extract][decomposer] ChupaText::Decomposers::AbiWord
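In both log excerpts above, the last line for each request ID is the selection of `ChupaText::Decomposers::AbiWord`, and nothing follows it. The following is a minimal sketch, not part of chupa-text, for checking a longer log the same way. It groups lines by request ID and reports which decomposer was selected and whether a completion line was seen. It assumes that a request which finishes is logged with Rails' usual `Completed ...` line; `summarize_requests` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re

# Request ID in square brackets (UUID), followed by the log message.
LINE_RE = re.compile(r"\[(?P<rid>[0-9a-f]{8}-[0-9a-f-]{27})\]\s+(?P<msg>.*)$")

# Two lines copied from the .doc log above.
SAMPLE = [
    'chupa-text_1 | I, [2019-11-20T00:41:17.398233 #884] INFO -- : '
    '[adfdc7bf-d67a-44b3-a231-54a0d7c90c46] Started POST "/extraction" '
    'for 172.18.0.1 at 2019-11-20 00:41:17 +0000',
    'chupa-text_1 | D, [2019-11-20T00:41:17.399666 #884] DEBUG -- : '
    '[adfdc7bf-d67a-44b3-a231-54a0d7c90c46] [extractor][extract][decomposer] '
    'ChupaText::Decomposers::AbiWord',
]


def summarize_requests(log_lines):
    """Return {request_id: {"decomposer": str or None, "completed": bool}}."""
    requests = {}
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a request-tagged line
        info = requests.setdefault(
            match["rid"], {"decomposer": None, "completed": False}
        )
        msg = match["msg"]
        if msg.startswith("[extractor][extract][decomposer]"):
            # The decomposer class name is the last token on the line.
            info["decomposer"] = msg.split()[-1]
        elif msg.startswith("Completed "):
            # Assumption: finished requests log a "Completed ..." line.
            info["completed"] = True
    return requests


if __name__ == "__main__":
    for rid, info in summarize_requests(SAMPLE).items():
        state = "completed" if info["completed"] else "no completion logged"
        print(rid, info["decomposer"], state)
```

A request that shows a decomposer but no completion line is a candidate for the hang described here. It could also simply be a request that was still running when the log was captured.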
テスト1.pdf
PDF created with iTextSharp