This was tested with Python's multiprocessing library.
I checked the code: everything hangs at the Goose().extract(...) call (marked in the code below),
not at join() or run(), etc.
This example is specific, but the problem is general. Has anyone gotten multiprocessing to work with
goose's extraction, period? If so, a brief explanation of how, or a tiny code snippet, would be great. Thanks.
Or at least an explanation of why my code below is incorrect would be great!
Sorry if I'm doing something blatantly wrong. This is my first time
doing multiprocessing in Python.
from multiprocessing import Process, Queue
from multiprocessing import cpu_count as num_cores
import pickle
import codecs
from goose import Goose

class Processor(Process):
    def __init__(self, queue, html):
        super(Processor, self).__init__()
        self.queue = queue
        self.html = html

    def ret(self):
        g = Goose()
        article = g.extract(raw_html=self.html) # THE CODE HANGS HERE
        pickle.dump(article, codecs.open(str(id(article))+'.txt', 'wb'))
        return str(id(article))

    def run(self):
        self.queue.put(self.ret())

processes = []
if __name__ == '__main__':
    for i in range(0, num_cores()):
        q = Queue()
        html = ...
        p = Processor(q, html)
        processes.append((p, q))
        print 'appending', (p, q)
        p.start()
    for val in processes:
        val[0].join()
        id_ = val[1].get()
        article = pickle.load( codecs.open(id_+'.txt', 'rb'))
        print article.cleaned_text
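One way to narrow this down is to run the same process-and-queue pattern with goose taken out. The sketch below is only that, a sketch: `fake_extract`, `worker`, and `run_all` are hypothetical names, and `fake_extract` is a regex stand-in for `Goose().extract(...)`, not goose's behavior. It passes plain strings through the queue and reads the queue before calling join(), since the multiprocessing documentation warns that joining a process which still has items buffered on a queue can deadlock. If this version runs cleanly, the hang is more likely inside the extraction call than in the multiprocessing plumbing.

```python
from multiprocessing import Process, Queue
import re


def fake_extract(raw_html):
    # Hypothetical stand-in for Goose().extract(raw_html=...).cleaned_text.
    # It only strips tags with a regex so the sketch runs without goose.
    return re.sub(r"<[^>]+>", "", raw_html).strip()


def worker(queue, raw_html):
    # Send back a plain string instead of pickling an Article to disk.
    queue.put(fake_extract(raw_html))


def run_all(pages):
    queue = Queue()
    procs = [Process(target=worker, args=(queue, page)) for page in pages]
    for p in procs:
        p.start()
    # Read one result per process before join(), so no child is left
    # blocked while flushing its queue buffer.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_all(["<p>one</p>", "<p>two</p>"]))
```

Swapping the body of `fake_extract` for the real goose call, one process at a time, would then show whether the extraction itself is what blocks in a child process.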