Can't Get Past Generating VocabIDs #1

Open
chanelm opened this issue Oct 12, 2011 · 1 comment
Comments

chanelm commented Oct 12, 2011

I was testing out this project, and whenever I get to generating the vocab IDs (right after the first Hadoop M/R job), it always throws this error:

BigFatLM.Sentences 100000
BigFatLM.Tokens 1003830
BigFatLM.Types 63106
Finished: BigFatLM -- Make Vocabulary ID's
Merging unigram count files: BigFatLM -- Make Vocabulary ID's
Copying HDFS file hdfs://rhl095.in.escapemg.com:54310/user/search/tmp/BigFatLM5204421244738246067 to /tmp/BigFatLM956014149429852912.vocab1
Oct 11, 2011 9:22:50 PM bigfat.hadoop.SortUtils sortInPlace
INFO: Running external sort: sort -n -r -k2 -t -o /tmp/BigFatLM956014149429852912.vocab1 /tmp/BigFatLM956014149429852912.vocab1
Assigning vocab IDs for: BigFatLM -- Make Vocabulary ID's
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1937)
at bigfat.step1.VocabIteration.run(VocabIteration.java:79)
at bigfat.BigFatLM.run(BigFatLM.java:115)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at bigfat.BigFatLM.main(BigFatLM.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
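
For anyone reading the trace: a `StringIndexOutOfBoundsException` with index -1 coming out of `String.substring` is commonly what happens when the result of `indexOf` is passed straight to `substring` and the delimiter is not in the line. The sketch below only illustrates that pattern. It assumes vocab lines of the form `word<TAB>count`, which is a guess; the class and method names are hypothetical and this is not the actual code in `VocabIteration.java`.

```java
import java.util.Optional;

public class VocabLineSketch {

    // Hypothetical parser: assumes each line looks like "word<TAB>count".
    // If the line has no tab, indexOf returns -1 and substring(0, -1) throws
    // StringIndexOutOfBoundsException.
    static String wordUnchecked(String line) {
        int tab = line.indexOf('\t');
        return line.substring(0, tab);
    }

    // Defensive variant: reports a missing delimiter instead of throwing.
    static Optional<String> wordChecked(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return Optional.empty();
        }
        return Optional.of(line.substring(0, tab));
    }

    public static void main(String[] args) {
        // Tab-delimited line: the word is recovered.
        System.out.println(wordChecked("the\t5021"));
        // Space-delimited line: no tab, so nothing is returned.
        System.out.println(wordChecked("the 5021"));
        try {
            wordUnchecked("the 5021");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If something like this is the cause, looking at the first few lines of the sorted `.vocab1` file in `/tmp` should show whether the delimiter the code expects is actually there.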

jhclark (Owner) commented Oct 12, 2011

Could you say a bit more about this error? A few questions:

  • What OS are you running?
  • How big is your cluster?
  • How much data are you testing on?
