Problematic branches cause a crash #19
Comments
hey, thanks for reporting the crash! Short answer: this is because of non-forward-compatible I/O changes in ROOT 6.12. VK
I've run into this same problem and I'd like to contribute a fix. I know Spark and Scala, but I'm new to ROOT. @vkhristenko - could you point me at some clues to get me started on fixing this?
the problematic part is this
I personally do not know the logic on the ROOT side (w.r.t. I do not know how
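To make the failure mode concrete: ROOT's streamer format prefixes a serialized object with a 4-byte version word; when the byte-count bit (ROOT's `kByteCountMask`, `0x40000000`) is set, the low 30 bits carry the object's length, and a reader that does not understand the class (here, the new `TIOFeatures` member from ROOT 6.12) can still jump over it. When that bit is absent, the length is unknown and root4j's `skipObject` has no choice but to throw. A minimal Python sketch of that skip logic (not the actual root4j code, and simplified to a single version word):

```python
import struct

# ROOT's kByteCountMask: set on the version word when a byte count
# precedes the serialized object.
KBYTECOUNTMASK = 0x40000000


def skip_object(buf, pos):
    """Skip one serialized object, mimicking RootInputStream.skipObject.

    Reads a big-endian 4-byte word at pos; if the byte-count bit is
    set, the low 30 bits give the object's length and we can jump over
    it. Otherwise the length is unknown and, like root4j, we can only
    raise.
    """
    (word,) = struct.unpack_from(">I", buf, pos)
    if word & KBYTECOUNTMASK:
        length = word & ~KBYTECOUNTMASK
        return pos + 4 + length
    raise IOError("Cannot skip object with no length")


# An 8-byte payload with the byte-count bit set skips cleanly;
# the same payload without it reproduces the exception in the trace.
skippable = struct.pack(">I", KBYTECOUNTMASK | 8) + b"\x00" * 8
```

This suggests why the "dummy read" for `fIOBits` is not enough on its own: the generated skip only works when the writer emitted a byte count for the unknown member.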
Dear All,
When I try to load a root file (that you can find here [1]) I get the following error [2].
I have never had problems before using the command below:
df = spark.read.format("org.dianahep.sparkroot").load("file.root")
We noted that some branches (those starting with Jet_btagSF, for instance) are visible in the TBrowser, but not in PyROOT.
I assume some mistake was made when creating those branches.
Is there any plan to support skipping 'problematic' branches when loading a ROOT file, instead of crashing?
Cheers,
Luca
[1] https://www.dropbox.com/s/8yzbdvs4rbaiyf7/6214A145-5711-E811-997E-0CC47A78A42C_Friend.root?dl=0
[2]
Map(path -> /data/taohuang/HHNtuple_20180418_DYestimation/DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/6214A145-5711-E811-997E-0CC47A78A42C_Friend.root)
Warnng: Generating dummy read for fIOBits
Traceback (most recent call last):
File "trainDY.py", line 32, in <module>
df = spark.read.format("org.dianahep.sparkroot").load("/data/taohuang/HHNtuple_20180418_DYestimation/DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/6214A145-5711-E811-997E-0CC47A78A42C_Friend.root")
File "/home/demarley/anaconda2/lib/python2.7/site-packages/pyspark/sql/readwriter.py", line 166, in load
return self._df(self._jreader.load(path))
File "/home/demarley/anaconda2/lib/python2.7/site-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/demarley/anaconda2/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/home/demarley/anaconda2/lib/python2.7/site-packages/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o41.load.
: java.io.IOException: Cannot skip object with no length
at org.dianahep.root4j.core.RootInputStream.skipObject(RootInputStream.java:596)
at org.dianahep.root4j.core.RootHDFSInputStream.skipObject(RootHDFSInputStream.java:387)
at org.dianahep.root4j.proxy.ROOT.TIOFeatures.readMembers()
at org.dianahep.root4j.core.AbstractRootObject.read(AbstractRootObject.java:52)
at org.dianahep.root4j.core.RootInputStream.readObject(RootInputStream.java:466)
at org.dianahep.root4j.core.RootHDFSInputStream.readObject(RootHDFSInputStream.java:222)
at org.dianahep.root4j.proxy.TTree.readMembers()
at org.dianahep.root4j.core.AbstractRootObject.read(AbstractRootObject.java:52)
at org.dianahep.root4j.proxy.TKey.getObject(:57)
at org.dianahep.sparkroot.core.package$$anonfun$findTree$1.apply(ast.scala:1177)
at org.dianahep.sparkroot.core.package$$anonfun$findTree$1.apply(ast.scala:1166)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at org.dianahep.sparkroot.core.package$.findTree(ast.scala:1166)
at org.dianahep.sparkroot.package$RootTableScan.<init>(sparkroot.scala:97)
at org.dianahep.sparkroot.DefaultSource.createRelation(sparkroot.scala:146)
at org.dianahep.sparkroot.DefaultSource.createRelation(sparkroot.scala:143)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
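The "skip problematic branches" option Luca asks about does not exist in spark-root today; the load aborts on the first branch it cannot deserialize. A minimal Python sketch of what such behavior could look like, where `read_branch` and the branch names are hypothetical stand-ins for the real deserializer and file contents:

```python
def read_branches(branch_names, read_branch, skip_bad=True):
    """Read each branch, collecting failures instead of aborting.

    read_branch is any callable that returns the decoded branch data
    or raises IOError for an unreadable branch (a hypothetical
    stand-in for the real deserializer).
    """
    good, bad = {}, []
    for name in branch_names:
        try:
            good[name] = read_branch(name)
        except IOError:
            if not skip_bad:
                raise
            bad.append(name)
    return good, bad


# Toy deserializer: branches written with the newer streamer fail,
# mirroring the Jet_btagSF branches in this report.
def toy_reader(name):
    if name.startswith("Jet_btagSF"):
        raise IOError("Cannot skip object with no length")
    return [0.0]


good, bad = read_branches(["Jet_pt", "Jet_btagSF_cferr1"], toy_reader)
```

With `skip_bad=True` the caller gets the readable branches plus a list of the ones that were dropped, so the problem is reported without crashing the whole load; `skip_bad=False` preserves today's fail-fast behavior.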
The text was updated successfully, but these errors were encountered: