US Census Block plugin - add better recovery after failure #50

Open
jefffriesen opened this issue Sep 13, 2017 · 0 comments

I'm wondering if there is a way to recover better from failures. I have about 35K unique lat/lon points that I'm looking up census blocks for. At 1 second per request, that's roughly 10 hours. After about 7 hours I got the failure below. The script didn't recover, and the data collected up to that point was lost. Is there a more graceful way for the code to recover, or at least keep what was already done?

[01:53:45] [INFO] [dku.utils]  - 17651 - processing: (40.0309365,-105.2930896)
[01:53:46] [INFO] [dku.utils]  - 17652 - processing: (40.0309393,-105.2643413)
[01:55:43] [INFO] [dku.utils]  - *************** Recipe code failed **************
[01:55:43] [INFO] [dku.utils]  - Begin Python stack
[01:55:43] [INFO] [dku.utils]  - Traceback (most recent call last):
[01:55:43] [INFO] [dku.utils]  -   File "/Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/python-exec-wrapper.py", line 3, in <module>
[01:55:43] [INFO] [dku.utils]  -     execfile(sys.argv[1])
[01:55:43] [INFO] [dku.utils]  -   File "/Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/script.py", line 68, in <module>
[01:55:43] [INFO] [dku.utils]  -     'showall': 'true'
[01:55:43] [INFO] [dku.utils]  -   File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/api.py", line 70, in get
[01:55:43] [INFO] [dku.utils]  -     return request('get', url, params=params, **kwargs)
[01:55:43] [INFO] [dku.utils]  -   File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/api.py", line 56, in request
[01:55:43] [INFO] [dku.utils]  -     return session.request(method=method, url=url, **kwargs)
[01:55:43] [INFO] [dku.utils]  -   File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/sessions.py", line 488, in request
[01:55:43] [INFO] [dku.utils]  -     resp = self.send(prep, **send_kwargs)
[01:55:43] [INFO] [dku.utils]  -   File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/sessions.py", line 609, in send
[01:55:43] [INFO] [dku.utils]  -     r = adapter.send(request, **kwargs)
[01:55:43] [INFO] [dku.utils]  -   File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/adapters.py", line 499, in send
[01:55:43] [INFO] [dku.utils]  -     raise ReadTimeout(e, request=request)
[01:55:43] [INFO] [dku.utils]  - ReadTimeout: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
[01:55:43] [INFO] [dku.utils]  - End Python stack
[01:55:43] [INFO] [com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner] - Error file found, trying to throw it: /Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/error.json
[01:55:43] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService]  - Wait session error: null
org.eclipse.jetty.io.EofException
	at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:464)
	at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at com.dataiku.dip.input.stream.InputStreamLineReader.readLine(InputStreamLineReader.java:30)
	at com.dataiku.dip.input.formats.csv.RFC4180CSVParser.next(RFC4180CSVParser.java:21)
	at com.dataiku.dip.dataflow.streaming.DatasetWriter.appendFromCSVStream(DatasetWriter.java:139)
	at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushData(DatasetWritingService.java:255)
	at com.dataiku.dip.dataflow.kernel.slave.KernelSession.pushData(KernelSession.java:237)
	at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:199)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:462)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
	at java.lang.Thread.run(Thread.java:745)
[01:55:43] [INFO] [dku.flow.activity] - Run thread failed for activity compute_census_blocks_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:304)
	at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:79)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
[01:55:43] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService]  - Push data error during streaming:null
org.eclipse.jetty.io.EofException
	at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:464)
	at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at com.dataiku.dip.input.stream.InputStreamLineReader.readLine(InputStreamLineReader.java:30)
	at com.dataiku.dip.input.formats.csv.RFC4180CSVParser.next(RFC4180CSVParser.java:21)
	at com.dataiku.dip.dataflow.streaming.DatasetWriter.appendFromCSVStream(DatasetWriter.java:139)
	at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushData(DatasetWritingService.java:255)
	at com.dataiku.dip.dataflow.kernel.slave.KernelSession.pushData(KernelSession.java:237)
	at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:199)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:462)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
	at java.lang.Thread.run(Thread.java:745)
[01:55:43] [DEBUG] [dku.jobs]  - Command /tintercom/datasets/push-data processed in 26269806ms
[01:55:43] [DEBUG] [dku.jobs]  - Command /tintercom/datasets/wait-write-session processed in 26269807ms
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - activity is finished
[01:55:43] [ERROR] [dku.flow.activity] running compute_census_blocks_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:304)
	at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:79)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Executing default post-activity lifecycle hook
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Removing samples for BOULDERCOUNTYSOURCETRANSFORMS.census_blocks
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Done post-activity tasks
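
To make the request concrete, here is a minimal sketch of the kind of recovery I have in mind. It is not the plugin's code: the function names (`load_done`, `fetch_with_retry`, `lookup_all`), the CSV checkpoint file, and the `fetch` callable are all hypothetical, and it is written for Python 3 even though the log above shows the recipe running under Python 2. The idea is to retry a failed lookup a few times with backoff, and to append each result to disk as soon as it arrives so that a rerun skips points that were already processed.

```python
import csv
import os
import time


def load_done(checkpoint_path):
    """Return {(lat, lon): block} for rows saved by an earlier run."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, newline="") as f:
            for row in csv.reader(f):
                if len(row) == 3:  # ignore malformed rows
                    done[(row[0], row[1])] = row[2]
    return done


def fetch_with_retry(fetch, lat, lon, max_retries=3, backoff=1.0,
                     sleep=time.sleep):
    """Call fetch(lat, lon), retrying with exponential backoff on failure.

    `fetch` is a hypothetical callable standing in for the HTTP lookup.
    Catching Exception is deliberately broad for this sketch; real code
    would probably narrow it to the HTTP library's own exception types.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(lat, lon)
        except Exception:
            if attempt == max_retries:
                raise  # give up; results saved so far stay on disk
            sleep(backoff * 2 ** attempt)


def lookup_all(points, fetch, checkpoint_path, **retry_kwargs):
    """Look up every point, writing each result to disk immediately."""
    done = load_done(checkpoint_path)
    with open(checkpoint_path, "a", newline="") as f:
        writer = csv.writer(f)
        for lat, lon in points:
            key = (str(lat), str(lon))
            if key in done:
                continue  # already fetched in a previous run
            block = fetch_with_retry(fetch, lat, lon, **retry_kwargs)
            writer.writerow([key[0], key[1], block])
            f.flush()  # hand the row to the OS before the next request
            done[key] = block
    return done
```

The traceback also reports `read timeout=None`, which suggests the `requests.get` call was made without a `timeout` argument. Passing an explicit `timeout` to `requests.get` inside whatever `fetch` wraps would let a stalled request fail fast and be retried instead of hanging.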