
Vader performance from Cisco #1

Open
thananon opened this issue Jul 29, 2016 · 3 comments

@thananon (Member) commented Jul 29, 2016

This is a thread to discuss performance results from our machine.

16 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
128 GB RAM
Red Hat Enterprise Linux Server release 6.5 (Santiago)
gcc (GCC) 5.3.0

All tests ran on a single node.

(OUTDATED) - Newer results are available in the post below

Single-threaded performance: 5-7% degradation
We see a slight performance degradation for smaller message sizes. This is probably because the atomics macros we use still perform atomic operations even in the single-threaded case. @hjelmn's pull request open-mpi/ompi#1911 will probably solve it.

Edit: The PR was merged recently; newer results are in a post below.
Edit: The master build tested in these graphs is at git hash open-mpi/ompi@9807a6d.
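
To illustrate the suspected overhead: this is only a sketch, not the actual OPAL macros, and the names thread_add32 / using_threads are hypothetical stand-ins, but a pattern along these lines (guard the atomic read-modify-write behind a runtime "are we threaded?" check so the single-threaded path pays only a plain load/store) is presumably what open-mpi/ompi#1911 is getting at:

    /*
     * Illustrative sketch only -- NOT the actual OPAL code or macro names.
     * The idea: pay for an atomic read-modify-write only when the library
     * was really asked for thread support, otherwise do a plain increment.
     */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* hypothetical stand-in for a runtime "using threads?" flag */
    static bool using_threads = false;

    static inline int32_t thread_add32(_Atomic int32_t *addr, int32_t delta)
    {
        if (using_threads) {
            /* multi-threaded: full atomic add (locked instruction) */
            return atomic_fetch_add(addr, delta) + delta;
        }
        /* single-threaded: ordinary load/store, no locked instruction */
        int32_t newval = atomic_load_explicit(addr, memory_order_relaxed) + delta;
        atomic_store_explicit(addr, newval, memory_order_relaxed);
        return newval;
    }

    int main(void)
    {
        _Atomic int32_t counter = 0;
        thread_add32(&counter, 1);   /* cheap path while single-threaded */
        using_threads = true;
        thread_add32(&counter, 1);   /* atomic path once threads are in use */
        printf("counter = %d\n", atomic_load(&counter));
        return 0;
    }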

[figure: vader_singlethread]

Multithreaded performance: big improvement
Command line: mpirun -np 2 -mca pml ob1 -mca btl vader,self --bind-to socket ./mr_th_nb -S -t x -s size

We see some performance gain over 1.10.3. While I'm not sure why we see this gain even though we're binding the threads to the socket, it does look promising.

When I increase the thread count to 8, 1.10.3 can't run to completion at all, but 2.0.0 and master still can.

@artpol84 Please check my command-line args and see if I'm doing this correctly.
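
For reference, here is a rough sketch of the kind of multi-threaded message-rate loop I mean. This is not the actual mr_th_nb source; NTHREADS, NITERS and MSG_SIZE are hard-coded stand-ins for what the real benchmark takes via -t and -s, and the -S/-b options are not modeled.

    /*
     * Minimal multi-threaded message-rate sketch (NOT the real mr_th_nb).
     * Rank 0's threads post nonblocking sends, rank 1's threads post the
     * matching receives; each thread uses its own tag so matches stay
     * within a thread pair.
     * Build: mpicc -std=c11 -pthread mr_sketch.c -o mr_sketch
     * Run:   mpirun -np 2 -mca pml ob1 -mca btl vader,self ./mr_sketch
     */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS 2      /* stand-in for -t */
    #define NITERS   1000
    #define MSG_SIZE 1024   /* bytes; stand-in for -s */

    static int rank;

    static void *worker(void *arg)
    {
        int tid = (int)(intptr_t)arg;
        char *buf = malloc((size_t)NITERS * MSG_SIZE);
        MPI_Request reqs[NITERS];

        double t0 = MPI_Wtime();
        for (int i = 0; i < NITERS; i++) {
            char *msg = buf + (size_t)i * MSG_SIZE;
            if (0 == rank) {
                MPI_Isend(msg, MSG_SIZE, MPI_CHAR, 1, tid, MPI_COMM_WORLD, &reqs[i]);
            } else {
                MPI_Irecv(msg, MSG_SIZE, MPI_CHAR, 0, tid, MPI_COMM_WORLD, &reqs[i]);
            }
        }
        MPI_Waitall(NITERS, reqs, MPI_STATUSES_IGNORE);
        double t1 = MPI_Wtime();

        if (0 == rank) {
            printf("thread %d: %.0f msgs/s\n", tid, NITERS / (t1 - t0));
        }
        free(buf);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        pthread_t tids[NTHREADS];
        for (int t = 0; t < NTHREADS; t++)
            pthread_create(&tids[t], NULL, worker, (void *)(intptr_t)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tids[t], NULL);

        MPI_Finalize();
        return 0;
    }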

[figure: vader_2t]
[figure: vader_4t]

My lousy run script:

#!/bin/bash
# Sweeps message sizes 2..2^20 for each MPI build and appends the
# output to <version>.<threads>t.result. The thread count is passed
# as the first argument ($1).

OPT_DIR=$HOME/opt
declare -a imp=("1.10.3" "2.0.0" "master")

for MPI in "${imp[@]}"
do
    # Point the $OPT_DIR/mpi symlink at the build under test
    rm -f "$OPT_DIR/mpi"
    ln -s "$OPT_DIR/ompi/$MPI/fast" "$OPT_DIR/mpi"

    echo "Created new MPI symlink for $OPT_DIR/ompi/$MPI/fast"

    # Rebuild the benchmark against the selected MPI
    make clean > /dev/null
    make > /dev/null
    echo "Recompiled the benchmark, start testing ..."
    let "pow = 1"

    # Message sizes: 2, 4, 8, ..., 2^20 bytes
    for i in {1..20}
    do
        let "pow *= 2"
        mpirun -np 2 -mca btl vader,self -mca pml ob1 --bind-to socket ./mr_th_nb -s $pow -S -t $1 >> "$MPI.$1t.result"
    done
done
@thananon (Member, Author) commented Jul 29, 2016

These are the results after I rebased to the current master (open-mpi/ompi@59d6537).

Single-threaded case: looks better 😄
Master and 1.10.3 seem to be on par now. Thank you @hjelmn.

[figure: new_vader_single]

Multithreaded case
I realized that I missed -b for fine binding in my first post. I added the flag and retested. The results look almost the same as the earlier graphs.
Command: mpirun -np 2 -mca btl vader,self -mca pml ob1 --bind-to socket ./mr_th_nb -s $pow -S -t $1 -b
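
(For context: as far as I understand it, "fine binding" here means pinning each benchmark thread to its own core. Below is a minimal sketch of what that could look like with the GNU pthread_setaffinity_np() extension; the real -b implementation inside mr_th_nb may well differ.)

    /*
     * Sketch of per-thread core pinning (an assumption about what the
     * benchmark's -b flag does -- not taken from mr_th_nb).
     */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to a single core. */
    static void bind_self_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
            fprintf(stderr, "could not bind thread to core %d\n", core);
        }
    }

    static void *worker(void *arg)
    {
        int tid = (int)(long)arg;
        bind_self_to_core(tid);   /* e.g. thread 0 -> core 0, thread 1 -> core 1 */
        /* ... benchmark work would go here ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }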

2 threads

[figure: vader_2t_finebinding]

4 threads

[figure: vader_4t_finebinding]

@abouteiller (Member) commented:
Can't see the figures (404)

@thananon (Member, Author) commented:
@abouteiller You should be able to see them now. Sorry, I restructured the files in the repo and forgot that I had hotlinked them here.

@thananon changed the title from "Performance from Cisco" to "Vader performance from Cisco" on Aug 3, 2016