BSONLoader has trouble with highly nested Mongo documents #113
… schema retrieval.
Thanks for the patch. You're probably right that working up the class hierarchy is the correct fix. My biggest concern with this patch is the lack of an associated test. Could you attach a unit test for this? Testing is less than ideal for the Pig code right now, but I'm trying to fix that, too.
I added a simple unit test.
Hi @gminerbo
Hello, Unfortunately, I've changed jobs and lost access to this data. It would On Tue, Apr 28, 2015 at 4:06 PM, Luke Lovett [email protected]
I can be of assistance here. If you don't have the LoadMetadata interface on the BSONLoader (it is on the MongoLoader!), you get a pretty generic "Projected field not found" error when trying to operate on the data:
Pig Stack Trace
ERROR 1025: org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
Perhaps there's a way to sideload/reload the schema, but it has already been loaded once, so that shouldn't be necessary!
Additionally, there's the ClassCastException I mentioned elsewhere in the source code, which occurs when BasicDB[List/Object] is used instead of BasicBSON[List/Object]. I don't have a stack trace for that right now, but there seems to be nothing special or different about our BSON dumps.
@dggrj Thanks for the additional info here. Do you have a Pig script that can reproduce the issue? It would also be useful to have the stack trace of the ClassCastException to pinpoint the location of the problem. Thanks! |
I'm sorry to say that I don't think I can get things back to the state where I hit the CCE to capture a stack trace right now. I can tell you that I definitely saw it at line 136/7, inside the TUPLE conversion, and I suspect that if it hadn't hit there first, it would have hit inside BAG as well. I will provide a variant of the schema of the BSON data we're loading, along with the script, and point out where the different parts errored as best I can.
I encountered problems using the BSONLoader to read highly nested Mongo documents.
For example,
The changes contained herein resolved the problem for me. The presence of BasicDBObject and BasicDBList rather than BasicBSONObject and BasicBSONList seems like a mistake to me, yet my confidence is low, as I would have expected this to have been discovered by now.
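To illustrate the kind of failure described above, here is a minimal, self-contained sketch. It does not use the real driver or the BSONLoader code: the four nested classes are hypothetical stand-ins that only mirror the driver's inheritance (BasicDBObject extends BasicBSONObject, BasicDBList extends BasicBSONList), and `NestedBsonSketch`, `sample`, `depth`, and `narrowCastFails` are names made up for this example. The assumption is that a decoded BSON file yields the parent types, so a cast to the subclass fails while a check against the parent type works for both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;

class NestedBsonSketch {
    // Hypothetical stand-ins that only mirror the driver's class hierarchy.
    static class BasicBSONObject extends LinkedHashMap<String, Object> {}
    static class BasicDBObject extends BasicBSONObject {}
    static class BasicBSONList extends ArrayList<Object> {}
    static class BasicDBList extends BasicBSONList {}

    // Builds a nested document out of the parent types only:
    // { "items": [ { "x": 1 } ] }
    static BasicBSONObject sample() {
        BasicBSONObject inner = new BasicBSONObject();
        inner.put("x", 1);
        BasicBSONList list = new BasicBSONList();
        list.add(inner);
        BasicBSONObject outer = new BasicBSONObject();
        outer.put("items", list);
        return outer;
    }

    // Narrow handling: casting to the subclass rejects a plain BasicBSONObject.
    static boolean narrowCastFails(Object value) {
        try {
            BasicDBObject ignored = (BasicDBObject) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Wider handling: checking against the parent types accepts both the
    // parent classes and their BasicDB* subclasses at every nesting level.
    static int depth(Object value) {
        int max = 0;
        if (value instanceof BasicBSONObject) {
            for (Object child : ((BasicBSONObject) value).values()) {
                max = Math.max(max, depth(child));
            }
            return 1 + max;
        }
        if (value instanceof BasicBSONList) {
            for (Object child : (BasicBSONList) value) {
                max = Math.max(max, depth(child));
            }
            return 1 + max;
        }
        return 0; // scalar value
    }

    public static void main(String[] args) {
        BasicBSONObject doc = sample();
        System.out.println("narrow cast fails: " + narrowCastFails(doc));
        System.out.println("nesting depth: " + depth(doc));
    }
}
```

With the stand-ins above, the narrow cast fails on the sample document, while the parent-type check walks all three nesting levels.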
Adding the LoadMetadata interface is just a convenience, but it was required due to the other processing I needed to do in my Pig script.
I can try to create a test case (bson file) for this, but it will take some time, as I cannot give away my data.