SIGSEGV in JVM Runtime #30

Open
stevenybw opened this issue May 5, 2021 · 9 comments

@stevenybw

Configuration

  • Operating system: Ubuntu 16.04.6 LTS
  • Kernel: 4.4.0-135-generic
  • UCX: UCX Release v1.9.0 configured with ./contrib/configure-release --with-java
  • Java: Oracle JDK 11.0.8
  • Spark: Apache Spark 3.0.1

Spark launch commandline

spark-shell --master yarn --name ExploreSparkUCX --deploy-mode client \
  --num-executors 32 --conf spark.dynamicAllocation.maxExecutors=32 \
  --executor-cores 7 --executor-memory 22g --driver-memory 22g \
  --conf spark.eventLog.enabled='true' \
  --conf spark.eventLog.dir='/user/spark/applicationHistory' \
  --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
  --conf spark.driver.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.executor.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.shuffle.manager='org.apache.spark.shuffle.UcxShuffleManager' \
  --conf spark.shuffle.sort.io.plugin.class='org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO'

Scala application:

sc.textFile("Dataset/some-44gb-text-file").flatMap(_.split(' ')).map(x => (x, 1L)).reduceByKey(_+_, 224).count

Phenomena

In the first stage, 447 of the 448 tasks finish. The Java runtime is then terminated by SIGSEGV, as follows:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f551e278b50, pid=3253764, tid=3257270
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.8+10) (build 11.0.8+10-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.8+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xd06b50]  ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
[thread 3254166 also had an error]
[thread 3254607 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to ~/core.3253764)
#
# An error report file with more information is saved as:
# ~/hs_err_pid3253764.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

The hs_err_pid3253764.log contains:

Current thread (0x00007f51f0095000):  JavaThread "task-result-getter-2" daemon [_thread_in_vm, id=3257270, stack(0x00007f51a5cfb000,0x00007f51a5dfc000)]

Stack: [0x00007f51a5cfb000,0x00007f51a5dfc000],  sp=0x00007f51a5df8fd0,  free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd06b50]  ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
V  [libjvm.so+0x891c7d]  java_lang_invoke_ResolvedMethodName::find_resolved_method(methodHandle const&, Thread*)+0x1d
V  [libjvm.so+0xaacdec]  CallInfo::set_resolved_method_name(Thread*)+0x6c
V  [libjvm.so+0xbe4062]  MethodHandles::resolve_MemberName(Handle, Klass*, bool, Thread*)+0x802
V  [libjvm.so+0xbe41ea]  MHN_resolve_Mem+0x12a
J 772  java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b961af [0x00007f5502b960c0+0x00000000000000ef]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x889559]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V  [libjvm.so+0x888285]  JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x115
V  [libjvm.so+0xdd1009]  SystemDictionary::find_dynamic_call_site_invoker(Klass*, int, Handle, Symbol*, Symbol*, Handle*, Handle*, Thread*)+0x459
V  [libjvm.so+0xab507f]  LinkResolver::resolve_dynamic_call(CallInfo&, int, Handle, Symbol*, Symbol*, Klass*, Thread*)+0x4f
V  [libjvm.so+0xab5434]  LinkResolver::resolve_invokedynamic(CallInfo&, constantPoolHandle const&, int, Thread*)+0x2c4
V  [libjvm.so+0xab93d6]  LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, Thread*)+0x3c6
V  [libjvm.so+0x87f698]  InterpreterRuntime::resolve_invokedynamic(JavaThread*)+0x168
V  [libjvm.so+0x87f9dd]  InterpreterRuntime::resolve_from_cache(JavaThread*, Bytecodes::Code)+0x15d
j  org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j  org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j  java.lang.Thread.run()V+11 java.base@11.0.8
v  ~StubRoutines::call_stub
V  [libjvm.so+0x889559]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V  [libjvm.so+0x88750d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x9335ec]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xe0f0aa]  JavaThread::thread_main_inner()+0x1fa
V  [libjvm.so+0xe0f411]  JavaThread::run()+0x351
V  [libjvm.so+0xe0acaa]  Thread::call_run()+0x13a
V  [libjvm.so+0xc5293e]  thread_native_entry(Thread*)+0xee

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 772  java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b96136 [0x00007f5502b960c0+0x0000000000000076]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v  ~StubRoutines::call_stub
j  org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j  org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j  java.lang.Thread.run()V+11 java.base@11.0.8
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Register to memory mapping:

RAX=0x00007f55184749b8 points into unknown readable memory: 70 64 06 f0 51 7f 00 00
RBX={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
RCX=0x7fa5f0a12000d16c is an unknown value
RDX=0x0000000000000005 is an unknown value
RSP=0x00007f51a5df8fd0 is pointing into the stack for thread: 0x00007f51f0095000
RBP=0x00007f51a5df9020 is pointing into the stack for thread: 0x00007f51f0095000
RSI=0x0000000000000005 is an unknown value
RDI=0x00007f5518474950 points into unknown readable memory: ef 03 00 00 00 00 00 00
R8 ={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R9 =0x0000000000000005 is an unknown value
R10=0x0000000000000065 is an unknown value
R11=0x000001fd47c00cb2 is an unknown value
R12=0x0000000000000005 is an unknown value
R13=0x00000000606ce22e is an unknown value
R14={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R15=0x7fa5f0a12000d16c is an unknown value

@petro-rudenko self-assigned this May 6, 2021

@petro-rudenko
Member

Can you please try with export UCX_ERROR_SIGNALS= on the driver and --conf spark.executorEnv.UCX_ERROR_SIGNALS= in the Spark conf?
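
As a minimal sketch of how this suggestion might be applied: the empty variable is exported in the shell that starts the driver, and Spark's spark.executorEnv.<NAME> setting forwards the same empty value to each executor. The helper name ucx_signal_args is hypothetical, and it only prints the extra arguments instead of launching anything. The comment about what an empty list does in UCX is an assumption to verify against the UCX documentation for your version.

```sh
# Driver side: export an empty signal list so that UCX does not install its
# own error-signal handlers (assumption about UCX behaviour; please verify).
export UCX_ERROR_SIGNALS=

# Hypothetical helper: prints the extra spark-shell arguments, one per line,
# that pass the same empty value to every executor's environment.
ucx_signal_args() {
  printf '%s\n' '--conf' 'spark.executorEnv.UCX_ERROR_SIGNALS='
}

ucx_signal_args
```

The printed arguments could then be appended to the original launch command, for example `spark-shell $(ucx_signal_args) ...` followed by the remaining options.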

@stevenybw
Author

After adding export UCX_ERROR_SIGNALS= and export SPARK_UCX_HOME=$HOME/sparkucx/target, together with the Spark configuration, the problem still exists. The JVM terminates with SIGSEGV and the following stack trace:

---------------  T H R E A D  ---------------

Current thread (0x00007fc2fc02e000):  GCTaskThread "GC Thread#27" [stack: 0x00007fc2cd2b9000,0x00007fc2cd3b9000] [id=261425]

Stack: [0x00007fc2cd2b9000,0x00007fc2cd3b9000],  sp=0x00007fc2cd3b7b70,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x5dbe40]  ClassLoaderDataGraph::roots_cld_do(CLDClosure*, CLDClosure*)+0x20
V  [libjvm.so+0x7d2a15]  G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0x65
V  [libjvm.so+0x7d319e]  G1RootProcessor::evacuate_roots(G1ParScanThreadState*, unsigned int)+0x9e
V  [libjvm.so+0x78285c]  G1ParTask::work(unsigned int)+0xec
V  [libjvm.so+0xea176d]  GangWorker::loop()+0x4d
V  [libjvm.so+0xe0acaa]  Thread::call_run()+0x13a
V  [libjvm.so+0xc5293e]  thread_native_entry(Thread*)+0xee


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00000000000001d0

@stevenybw
Author

We constructed a smaller dataset (11 GB, about 4x smaller); with it the result is correct and there is no SIGSEGV. However, every time we run with the larger dataset (44 GB), the problem reproduces reliably. This suggests that a buffer is overflowing somewhere.

@petro-rudenko
Member

Can you please send the hs_err.log file with the whole stack trace?
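
The full log is what is being asked for here. For comparing crashes across runs, though, a short summary can also help. The sketch below pulls the crashing thread, the siginfo line, and the top native frame out of one HotSpot error log. The helper name hs_err_summary is hypothetical, and it assumes the log layout shown in the excerpts in this thread.

```sh
# Hypothetical helper: summarise one HotSpot hs_err_pid<pid>.log file.
# Prints three lines: the crashing thread, the siginfo line, and the first
# native frame (the line directly after the "Native frames:" header).
hs_err_summary() {
  log=$1
  awk '/^Current thread/ { print; exit }' "$log"
  awk '/^siginfo:/ { print; exit }' "$log"
  awk '/^Native frames:/ { getline; print; exit }' "$log"
}

# Example use (the path is a placeholder):
#   hs_err_summary path/to/hs_err_pid261284.log
```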

@stevenybw
Author

hs_err_pid261284.log

@petro-rudenko
Member

Does it work with the same parameters and not using ucx? Does it work with -XX:+UseParallelGC?

@stevenybw
Author

Does it work with the same parameters and not using ucx?

Yes. I've verified that just now, without sparkucx it is fine.

Does it work with -XX:+UseParallelGC?

No. Adding "-XX:+UseParallelGC" to both the driver and the executors does not help; the problem still exists.
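
For reference, one way such a JVM flag could be passed to both sides is through Spark's spark.driver.extraJavaOptions and spark.executor.extraJavaOptions settings on the launch command line. This is a sketch only: the helper name jvm_flag_args is hypothetical, and the exact mechanism used in the experiment above is not stated.

```sh
# Hypothetical helper: prints --conf arguments, one per line, that add a
# single JVM flag to both the driver and the executors.
jvm_flag_args() {
  flag=$1
  printf '%s\n' \
    '--conf' "spark.driver.extraJavaOptions=$flag" \
    '--conf' "spark.executor.extraJavaOptions=$flag"
}

jvm_flag_args '-XX:+UseParallelGC'
```

In client mode the driver option has to be supplied at launch time (on the command line or in the properties file), because the driver JVM is already running by the time application code could set it.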

@petro-rudenko
Member

Does it happen at the beginning of the job, at map phase or reduce phase or at the end? Do you see something in dmesg? SparkUCX memory-maps the file, so it requires a lot of virtual memory. Can you try with `--conf spark.shuffle.ucx.memory.useOdp=true`?

@stevenybw
Author

Does it happen at the beginning of the job, at map phase or reduce phase or at the end?

Some tasks complete. For example, 447 of the 448 tasks in the map stage succeed but the last one fails. The crash can happen in either phase: sometimes the map phase, sometimes the reduce phase.

Do you see something in dmesg?

Rarely, we observe the following on the driver side, triggered by SparkUCX:

[Tue May 11 22:16:34 2021] ib_umem_get: failed to get user pages, nr_pages=512
[Tue May 11 22:16:34 2021] mlx5_0:mr_umem_get:713:(pid 3268577): umem get failed (-512)
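
Since these messages appear only rarely, a small filter makes it easier to check each host after a failed run. The sketch below reads kernel log text on stdin and keeps only lines that match the two messages quoted above. The helper name rdma_reg_failures is hypothetical, and the patterns are taken from those two lines only.

```sh
# Hypothetical helper: reads kernel log text on stdin and prints only the
# lines that look like the memory-registration failures quoted above.
rdma_reg_failures() {
  grep -E 'ib_umem_get: failed to get user pages|umem get failed'
}

# Example use on a live host (reading dmesg may require root):
#   dmesg -T | rdma_reg_failures
```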

Can you try with `--conf spark.shuffle.ucx.memory.useOdp=true`?

The problem still exists.

Update

We found an interesting phenomenon: when the number of reducers is increased from 224 to 448, the word count produces the correct result. Moreover, the 224-reducer configuration always ends with SIGSEGV, while the 448-reducer configuration always produces the correct result. Going from 224 to 448 partitions reduces the size of the message sent from each mapper to each reducer, so we suspect that a fixed-size buffer is overflowing somewhere. We hope this information is useful.
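
To put rough numbers on the block-size argument, the sketch below computes the average size of one mapper-to-reducer block for both reducer counts. It rests on loud assumptions: the shuffle volume is taken to be the full 44 GB input (read as GiB) spread evenly over all blocks, and the 448 map tasks are taken from the first stage reported above. Because reduceByKey combines on the map side, the real blocks are likely smaller, so these are upper-bound estimates. The helper name avg_block_bytes is hypothetical.

```sh
# Hypothetical helper: average bytes per (mapper, reducer) shuffle block,
# assuming total_bytes of shuffle data is spread evenly over all blocks.
avg_block_bytes() {
  total_bytes=$1
  mappers=$2
  reducers=$3
  echo $(( total_bytes / (mappers * reducers) ))
}

# 44 GB input, read as GiB (assumption); 448 map tasks as in the first stage.
input_bytes=$(( 44 * 1024 * 1024 * 1024 ))
echo "224 reducers: $(avg_block_bytes "$input_bytes" 448 224) bytes per block"
echo "448 reducers: $(avg_block_bytes "$input_bytes" 448 448) bytes per block"
```

Under these assumptions, doubling the reducer count halves the average block, from about 460 KiB to about 230 KiB. That is consistent with the fixed-size-buffer guess, but it does not prove it.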
