
Vader performance from Cisco #1

Open
thananon opened this issue Jul 29, 2016 · 3 comments

@thananon (Member) commented Jul 29, 2016

This is a thread to discuss performance results from our machine.

16 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
128 GB RAM
Red Hat Enterprise Linux Server release 6.5 (Santiago)
gcc (GCC) 5.3.0

All tests ran on a single node.

(OUTDATED) - Newer results are available in the post below

Single-threaded performance: 5-7% degradation
We see a slight performance degradation for smaller message sizes. This is probably because the atomics macros we use still perform atomic operations even in the single-threaded case. @hjelmn's pull request open-mpi/ompi#1911 will probably solve it.

Edit: The PR was merged recently; newer results are in a post below.
Edit: The master build tested in these graphs is at git hash open-mpi/ompi@9807a6d.
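
To illustrate the suspected overhead: this is only a sketch, not the actual OPAL macros, and the names thread_add32 / using_threads are hypothetical stand-ins, but a pattern along these lines (guard the atomic read-modify-write behind a runtime "are we threaded?" check so the single-threaded path pays only a plain load/store) is presumably what open-mpi/ompi#1911 is getting at:

    /*
     * Illustrative sketch only -- NOT the actual OPAL code or macro names.
     * The idea: pay for an atomic read-modify-write only when the library
     * was really asked for thread support, otherwise do a plain increment.
     */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* hypothetical stand-in for a runtime "using threads?" flag */
    static bool using_threads = false;

    static inline int32_t thread_add32(_Atomic int32_t *addr, int32_t delta)
    {
        if (using_threads) {
            /* multi-threaded: full atomic add (locked instruction) */
            return atomic_fetch_add(addr, delta) + delta;
        }
        /* single-threaded: ordinary load/store, no locked instruction */
        int32_t newval = atomic_load_explicit(addr, memory_order_relaxed) + delta;
        atomic_store_explicit(addr, newval, memory_order_relaxed);
        return newval;
    }

    int main(void)
    {
        _Atomic int32_t counter = 0;
        thread_add32(&counter, 1);   /* cheap path while single-threaded */
        using_threads = true;
        thread_add32(&counter, 1);   /* atomic path once threads are in use */
        printf("counter = %d\n", atomic_load(&counter));
        return 0;
    }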

[figure: vader_singlethread]

Multithreaded performance: big improvement
Command line: mpirun -np 2 -mca pml ob1 -mca btl vader,self --bind-to socket ./mr_th_nb -S -t x -s size

We see some performance gain over 1.10.3. While I'm not sure why we see this gain even though we're binding the threads to the socket, it does look promising.

When I increase the thread count to 8, 1.10.3 can't run to completion at all, but 2.0.0 and master still can.

@artpol84 Please check my command-line args and see if I'm doing this correctly.
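
For reference, here is a rough sketch of the kind of multi-threaded message-rate loop I mean. This is not the actual mr_th_nb source; NTHREADS, NITERS and MSG_SIZE are hard-coded stand-ins for what the real benchmark takes via -t and -s, and the -S/-b options are not modeled.

    /*
     * Minimal multi-threaded message-rate sketch (NOT the real mr_th_nb).
     * Rank 0's threads post nonblocking sends, rank 1's threads post the
     * matching receives; each thread uses its own tag so matches stay
     * within a thread pair.
     * Build: mpicc -std=c11 -pthread mr_sketch.c -o mr_sketch
     * Run:   mpirun -np 2 -mca pml ob1 -mca btl vader,self ./mr_sketch
     */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS 2      /* stand-in for -t */
    #define NITERS   1000
    #define MSG_SIZE 1024   /* bytes; stand-in for -s */

    static int rank;

    static void *worker(void *arg)
    {
        int tid = (int)(intptr_t)arg;
        char *buf = malloc((size_t)NITERS * MSG_SIZE);
        MPI_Request reqs[NITERS];

        double t0 = MPI_Wtime();
        for (int i = 0; i < NITERS; i++) {
            char *msg = buf + (size_t)i * MSG_SIZE;
            if (0 == rank) {
                MPI_Isend(msg, MSG_SIZE, MPI_CHAR, 1, tid, MPI_COMM_WORLD, &reqs[i]);
            } else {
                MPI_Irecv(msg, MSG_SIZE, MPI_CHAR, 0, tid, MPI_COMM_WORLD, &reqs[i]);
            }
        }
        MPI_Waitall(NITERS, reqs, MPI_STATUSES_IGNORE);
        double t1 = MPI_Wtime();

        if (0 == rank) {
            printf("thread %d: %.0f msgs/s\n", tid, NITERS / (t1 - t0));
        }
        free(buf);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        pthread_t tids[NTHREADS];
        for (int t = 0; t < NTHREADS; t++)
            pthread_create(&tids[t], NULL, worker, (void *)(intptr_t)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tids[t], NULL);

        MPI_Finalize();
        return 0;
    }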

[figure: vader_2t]
[figure: vader_4t]

My lousy run script:

#!/bin/bash
# Sweeps message sizes 2..2^20 for each MPI build and appends the
# output to <version>.<threads>t.result. The thread count is passed
# as the first argument ($1).

OPT_DIR=$HOME/opt
declare -a imp=("1.10.3" "2.0.0" "master")

for MPI in "${imp[@]}"
do
    # Point the $OPT_DIR/mpi symlink at the build under test
    rm -f "$OPT_DIR/mpi"
    ln -s "$OPT_DIR/ompi/$MPI/fast" "$OPT_DIR/mpi"

    echo "Created new MPI symlink for $OPT_DIR/ompi/$MPI/fast"

    # Rebuild the benchmark against the selected MPI
    make clean > /dev/null
    make > /dev/null
    echo "Recompiled the benchmark, start testing ..."
    let "pow = 1"

    # Message sizes: 2, 4, 8, ..., 2^20 bytes
    for i in {1..20}
    do
        let "pow *= 2"
        mpirun -np 2 -mca btl vader,self -mca pml ob1 --bind-to socket ./mr_th_nb -s $pow -S -t $1 >> "$MPI.$1t.result"
    done
done
@thananon (Member, Author) commented Jul 29, 2016

These are the results after I rebased to the current master (open-mpi/ompi@59d6537).

Single-threaded case: looks better 😄
Master and 1.10.3 seem to be on par now. Thank you @hjelmn.

[figure: new_vader_single]

Multithreaded case
I realized that I missed -b for fine binding in my first post. I added the flag and retested. The results look almost the same as the earlier graphs.
Command: mpirun -np 2 -mca btl vader,self -mca pml ob1 --bind-to socket ./mr_th_nb -s $pow -S -t $1 -b
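
(For context: as far as I understand it, "fine binding" here means pinning each benchmark thread to its own core. Below is a minimal sketch of what that could look like with the GNU pthread_setaffinity_np() extension; the real -b implementation inside mr_th_nb may well differ.)

    /*
     * Sketch of per-thread core pinning (an assumption about what the
     * benchmark's -b flag does -- not taken from mr_th_nb).
     */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to a single core. */
    static void bind_self_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
            fprintf(stderr, "could not bind thread to core %d\n", core);
        }
    }

    static void *worker(void *arg)
    {
        int tid = (int)(long)arg;
        bind_self_to_core(tid);   /* e.g. thread 0 -> core 0, thread 1 -> core 1 */
        /* ... benchmark work would go here ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }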

2 threads

[figure: vader_2t_finebinding]

4 threads

[figure: vader_4t_finebinding]

@abouteiller (Member) commented:
Can't see the figures (404)

@thananon (Member, Author) commented:
@abouteiller You should be able to see them now. Sorry, I restructured the files in the repo and forgot that I had hotlinked them here.

@thananon changed the title from "Performance from Cisco" to "Vader performance from Cisco" on Aug 3, 2016