SYNOPSIS

       mpiexec [OPTION]... executable [args]...
       mpiexec [OPTION]... -config configfile
       mpiexec -server



DESCRIPTION

       Mpiexec  is  a replacement program for the script mpirun, which is part
       of the mpich package.  It is used to initialize  a  parallel  job  from
       within  a  pbs  batch or interactive environment.  It further generates
       the environment variables and configuration files necessary to initialize
       a parallel program for the appropriate MPI message-passing library.

       Mpiexec uses the task manager library,  tm(3B),  of  PBS(1B)  to  spawn
       copies  of  the executable on all the nodes in a pbs allocation.  It is
       almost functionally equivalent to

             rsh node "cd $cwd; exec executable arguments",

       using the current working directory from where mpiexec is invoked,  and
       the shell specified in the environment, or from the password file.

       The  standard  input of the mpiexec process is forwarded to task number
       zero in the parallel job, allowing for use of the construct

             mpiexec mycode < inputfile

       This behavior can be modified using the -nostdin  or  -allstdin  flags.
       Standard output and error are also forwarded to mpiexec, allowing redi-
       rection of the outputs of all processes.  This can be turned off  using
       -nostdout  so that the standard output and error streams go through the
       normal PBS mechanisms, to the batch job output files, or to your termi-
       nal  in  the case of an interactive job.  See qsub(1) for more informa-
       tion.



OPTIONS

       All options may be introduced with either a single dash or a double
       dash, as is common in GNU utilities.  Options may be shortened
       as long as they remain unambiguous.  Options that require arguments may
       appear as separate words in the argument list, or they may be separated
       from the option by an equals sign.


       -n numproc
             Use only the specified number of processes.  The default is to
             use all processors provided in the PBS allocation.

       -verbose
             Talk more about what mpiexec is doing.

       -nostdout
             Do not forward the standard output and error streams of each
             process back to the mpiexec process.  Output on these streams
             will go through the normal PBS mechanisms instead, to wit: files
             of the form job.ojobid and job.ejobid for batch jobs, and
             directly to the controlling terminal for interactive jobs.

       -comm type
             Specify  the  communication  library used by your code.  Each MPI
             library has different mechanisms for starting all  the  processes
             of a parallel job, thus you must specify to mpiexec which library
             you use so that it can set up the environment  of  the  processes
             correctly.   The  argument type must be one of:  mpich-gm, mpich-
             mx, mpich-p4, mpich-ib, mpich-rai, mpich2-pmi, lam,  shmem,  emp,
             none;  although  the code may not have been compiled with support
             for some of those.  If this argument is  not  specified,  mpiexec
             will  look  for the environment variable MPIEXEC_COMM which could
             specify one of those arguments.  If this fails,  the  compiled-in
             default communication library is chosen.

       -mpich-p4-shmem

       -mpich-p4-no-shmem
             The  MPICH/P4  library may be configured either to support shared
             memory within a multiprocessor node or not.  It is necessary that
             mpiexec  know in which way the library was configured to success-
             fully start jobs.  While this is generally chosen at compile time
             using  the  --disable-p4-shmem  configure flag, it is possible to
             choose explicitly at runtime with one of these flags.

       -pernode (SMP only)
             Allocate only one process per compute node.  On SMP nodes, only
             one processor per node will be given a task.  This flag is used
             to implement multilevel parallelism, with MPI between nodes and
             threads within a node, assuming the code is set up to do that.

       -npernode nprocs (SMP only)
             Allocate no more than nprocs processes per compute node.  This is
             a generalization of the -pernode flag that can be used to  place,
             for example, two tasks on each 4-way SMP.

       -nolocal (not MPICH/P4)
             Do  not  run  any  MPI processes on the local compute node.  In a
             batch job, one of the machines allocated to run  a  parallel  job
             will  run  the batch script and thus invoke mpiexec.  Normally it
             participates in running the parallel application, but this
             option  disables  that  for special situations where that node is
             needed for other processing.

       -transform-hostname sed_expression
             Use an alternate hostname for message passing.  Processes will be
             spawned using a separate hostname for their message-passing
             communications.  This is necessary if you use, say, one Ethernet
             card for PBS hostnames and another Ethernet card for message
             passing.  This option is expected to be used only by power users
             at sites with complex network setups.

       -gige This option is deprecated, but still accepted and synonymous with
             the preferred option -transform-hostname=s/node/gige/.

       -tv, -totalview
             Debug using totalview.  The process on node zero attempts to open
             an X window to  $DISPLAY,  and  all  processes  are  attached  by
             totalview threads.  See totalview(1) for more information.

       -kill If  any  one  of the processes dies, wait a little, then kill all
             the other processes in the parallel job.   Your  message  passing
             library should handle this for you in most circumstances.

       -config configfile
             Process  executable and arguments are specified in the given con-
             figuration file.  This flag permits the use of heterogeneous jobs
             using multiple executables, architectures, and command line argu-
             ments.  No executable is given on the command line when using the
             -config  flag.   If  configfile is "-", then the configuration is
             read from standard input.  In this  case  the  flag  -nostdin  is
             mandatory,  as it is not possible to separate the contents of the
             configuration file from process input.

       -version
             Display the mpiexec version number and configure arguments.
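
       As a rough illustration of the -transform-hostname option, the sketch
       below previews the names a sed expression would produce.  It assumes
       the expression is applied to each PBS hostname much as sed(1) applies
       it to a line of text.  The helper name preview_transform and the
       sample hostnames are hypothetical and not part of mpiexec.

```shell
# Hypothetical helper: show what a sed expression does to a list of hostnames.
# Usage: preview_transform SED_EXPRESSION HOST...
preview_transform() {
    sed_expr=$1
    shift
    # Feed one hostname per line through sed, as a stand-in for what
    # mpiexec is assumed to do with each PBS hostname.
    printf '%s\n' "$@" | sed -e "$sed_expr"
}

# The expression documented as equivalent to the deprecated -gige flag.
preview_transform 's/node/gige/' node03 node04
```

       Checking an expression this way before a run can catch a pattern that
       leaves some hostnames unchanged.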



MPI LIBRARY OPTIONS

       Different MPI libraries may support tuning options which can change
       their behavior or performance.  Mpiexec does not explicitly support
       these, but it does pass along the environment variables used to set
       them.  For example, MPICH/GM has an option to set the maximum size
       for "eager" (as opposed to rendezvous) messages.  In sh or bash, this
       can be set with:

             GMPI_EAGER=16384 mpiexec mycode

       or in csh or tcsh:

             setenv GMPI_EAGER 16384
             mpiexec mycode

       Other   options  can  be  found  in  the  MPI  documentation,  such  as
       GMPI_SHMEM, GMPI_RECV, P4_SOCKBUFSIZE and P4_GLOBMEMSIZE.

       Although not an MPI library implementation,  the  "none"  communication
       device can be handy for running many copies of the same serial program.
       Programs spawned with this device are  provided  an  extra  environment
       variable, MPIEXEC_RANK, which they can use to generate a unique identi-
       fier in the context of the pseudo-parallel job.
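
       A serial script started with the "none" device might use MPIEXEC_RANK
       to pick its own share of the work.  The following is only a sketch:
       the helper name chunk_for_rank, the chunk.N file naming, and the
       fallback to rank 0 when the variable is unset are all assumptions
       made for illustration.

```shell
# Hypothetical helper: map this copy's rank to a private input file name.
chunk_for_rank() {
    # MPIEXEC_RANK is provided by mpiexec for the "none" device; default
    # to 0 (an assumption) so the script can also be tried by hand.
    rank=${MPIEXEC_RANK:-0}
    printf 'chunk.%s\n' "$rank"
}

# Each copy would then process only its own file.
echo "this copy would read $(chunk_for_rank)"
```

       Such a script could be launched with something like
       "mpiexec -comm none ./process-chunk.sh", where the script name is
       again hypothetical.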

       A node specification is a space-separated list of hostnames.  Each
       element in the list is interpreted as a case-insensitive standard shell
       wildcard pattern (see glob(7) and fnmatch(3)) and may therefore match
       multiple hostnames.  It is not an error to specify nodes in the
       nodespec that are not actually part of the PBS allocation.  This allows
       a single generic configuration file to be used in multiple situations.

       Config file example
             node03 node04 node1* : myexe -s 4
             -n 5 : otherexe -f 2 -large

             If  processors  are available on the nodes, run the code myexe on
             node03, node04, and any machine with a hostname matching  node1*.
             Pick  up  to five other nodes on which to run otherexe, depending
             on availability and any -n arguments.

       Note that each node listed in a node specification is chosen only  once
       to run a given process.  If using multiprocessor nodes, and you do want
       to run two or more copies of the code on a given node, list  that  node
       twice  in the line, or duplicate the config file entry.  Also note that
       node-anonymous specifications (e.g., -n 6) may choose other  processors
       on a node that already has processes assigned; use the -pernode flag on
       the command line if you want node-exclusive behavior.

       There is no way to run  more  than  one  process  per  processor  using
       mpiexec.  You must explicitly spawn threads in your code if you wish to
       do this.  The presence of a -n argument on the command line limits  the
       total  number  of processors available to the configuration file selec-
       tion process, just as the flag -pernode limits the available nodes.

       It is not an error if some lines in the configuration file cannot be
       satisfied  with the available nodes.  If, however, a -n <numproc> argu-
       ment requests more than can be satisfied, or if no tasks could be allo-
       cated, an error is reported.

       Finally,  the  order  of lines in the configuration file is the same as
       the order of tasks in the MPI sense when the process is started.   Com-
       ments  starting  with  '#'  to the end of the line are ignored anywhere
       they appear in the configuration file.
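
       The pieces above can be combined into a commented configuration file
       generated from a batch script.  This is a sketch only; the helper name
       write_config is hypothetical, and the node names and executables are
       the ones used in the example earlier in this section.

```shell
# Hypothetical helper: write a two-line mpiexec configuration file.
# Usage: write_config FILE
write_config() {
    cat > "$1" <<'EOF'
# First tasks: myexe on the named nodes, if they are in the allocation.
node03 node04 node1* : myexe -s 4
# Later tasks: up to five copies of otherexe on other available nodes.
-n 5 : otherexe -f 2 -large
EOF
}

write_config my.config
# Inside a PBS job, the file would then be used as:
#   mpiexec -config my.config
```

       Because line order determines task order, myexe tasks receive the
       lowest task numbers in this layout.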



CONCURRENT MPIEXEC

       You can invoke mpiexec multiple times in the same batch job, one
       after the other, sequentially.  But you can also run multiple mpiexecs
       in the same batch job concurrently.  In a 10-node PBS  allocation,  for
       example:

             mpiexec -n 5 a.out args1 < input1 > output1 &
             mpiexec -n 5 a.out args2 < input2 > output2 &
             wait
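
       A bare "wait" discards the exit status of each background job.  If a
       batch script needs to know whether either run failed, one possible
       variation records each PID and waits on it individually.  The function
       name run_two_jobs and the MPIEXEC override variable are hypothetical;
       the override exists only so the sketch can be tried outside PBS.

```shell
# Sketch: run two mpiexec jobs concurrently and report a combined status.
run_two_jobs() {
    # MPIEXEC is a hypothetical override for dry runs; default is mpiexec.
    launcher=${MPIEXEC:-mpiexec}

    # Start both jobs in the background and remember their PIDs.
    $launcher -n 5 a.out args1 > output1 &
    pid1=$!
    $launcher -n 5 a.out args2 > output2 &
    pid2=$!

    # "wait PID" returns the exit status of that particular job.
    rc=0
    wait "$pid1" || rc=1
    wait "$pid2" || rc=1
    return "$rc"
}
```

       Calling run_two_jobs from the batch script then yields zero only when
       both mpiexec invocations returned zero.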

       Finally,  since  only  one mpiexec can be the master at a time, if your
       code setup requires that mpiexec exit to get a result, you can start  a
       "dummy" mpiexec first in your batch job:

             mpiexec -server

       It  runs no tasks itself but handles the connections of other transient
       mpiexec clients.  It will shut down cleanly when the batch job exits or
       you  may  kill  the  server  explicitly.   If the server is killed with
       SIGTERM (or HUP or INT), it will exit with a status of  zero  if  there
       were  no  clients  connected  at the time.  If there were still clients
       using the server, the server will kill all their tasks, disconnect from
       the clients, and exit with status 1.

       If  you  are  using mpich/p4, be aware that limitations in the mpich/p4
       library restrict all task zeros to be on the same node as  the  mpiexec
       process  itself,  hence  concurrency  is severely limited.  You can use
       -pernode to permit one concurrent job for each CPU in the node, though.



EXAMPLES

       mpiexec a.out
             Run the executable a.out as a parallel MPI code on each processor
             allocated by PBS.

       mpiexec -n 2 a.out -b 4
             Run the code with arguments -b 4 on only two processors.

       mpiexec -pernode -conf my.config
             Run only one process on each node, using the nodes  and  executa-
             bles listed in the configuration file my.config.

       mpiexec mycode >out 2>err
             Using a sh-compatible shell, send the standard output of all
             processes to the file out, and the standard error to err.

       mpiexec mycode >& output
             Using a csh-compatible shell, combine  the  standard  output  and
             error streams of all processes to the file output.

       mpiexec mycode | sort > output
             Sort  the output of the processes.  Standard error will appear as
             the standard error of the mpiexec process.

       mpiexec -comm none -pernode mkdir /tmp/my-temp-dir
             Run the standard unix command mkdir on each of the SMP  nodes  in
             your PBS node allocation for this job.

       mpiexec -comm mpich-p4 mycode-p4
             Run  a  code  compiled  using  MPICH/P4,  even though your system
             administrator has chosen MPICH/GM as a default.


       To  specify  a default communication library, the variable MPIEXEC_COMM
       may be set to one of the accepted values for -comm as documented above.
       The  command-line  argument takes precedence over the environment vari-
       able, and if neither is set, the compiled-in default is used.
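
       The precedence just described can be mimicked in a wrapper script, for
       instance to log which library a run will use.  This is a sketch, not
       mpiexec's own logic: the function name effective_comm is hypothetical,
       and COMPILED_DEFAULT is a stand-in value, since the real default is
       fixed when mpiexec is built.

```shell
# Stand-in for the compiled-in default (an assumption for illustration).
COMPILED_DEFAULT=mpich-gm

# Hypothetical helper mirroring the documented precedence:
# command-line value, then MPIEXEC_COMM, then the compiled-in default.
effective_comm() {
    cli_value=${1:-}
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"
    elif [ -n "${MPIEXEC_COMM:-}" ]; then
        printf '%s\n' "$MPIEXEC_COMM"
    else
        printf '%s\n' "$COMPILED_DEFAULT"
    fi
}

MPIEXEC_COMM=lam
effective_comm mpich-p4   # the command-line value wins
effective_comm            # falls back to the environment variable
```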

       Note that mpiexec passes along all variables in the environment it
       was given, but PBS will not copy your entire environment for batch jobs
       at job submission time unless you invoke qsub with the -V argument.



DIAGNOSTICS

       mpiexec: Warning: tasks <tasknum>,... exited with status <exitval>.
             One  or  more  of the tasks in the parallel process exited with a
             non-zero exit status.  This is the value a program returns to its
             environment  when  it  finishes,  either with "return exitval" or
             "exit(exitval)", or in FORTRAN, "STOP exitval".  Tradition  holds
             that a program which terminates correctly should return zero, and
             hence mpiexec warns if it sees otherwise.  Due to race conditions
             inherent  in  the  TM interface, sometimes mpiexec will report an
             exit value of zero even though it was actually otherwise.

       mpiexec: Warning: task <tasknum> died with signal <signum>
             One of the tasks in the parallel process exited due to receipt of
             an  uncaught signal.  The symbolic names of signal numbers can be
             listed with "kill -l".  Common ones are SIGSEGV (11)  and  SIGBUS
             (7),  both  of which generally indicate a program error.  Others,
             SIGINT (2), SIGKILL (9), and SIGTERM (15),  may  occur  when  the
             task is killed or interrupted externally.
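
       When scanning job output for the second warning, a small helper can
       turn the signal number into its symbolic name.  The sketch below
       assumes the message format shown above and relies on the standard
       "kill -l number" form; the function name explain_signal is
       hypothetical.

```shell
# Hypothetical helper: print the symbolic name of the signal mentioned
# at the end of a "died with signal <signum>" warning line.
explain_signal() {
    # Strip everything up to and including " signal ", leaving the number.
    signum=${1##* signal }
    # "kill -l N" prints the name without the SIG prefix, so add it back.
    printf 'SIG%s\n' "$(kill -l "$signum")"
}

explain_signal "mpiexec: Warning: task 3 died with signal 11"
```

       Signal numbers other than the common ones listed above can differ
       between operating systems, so the lookup is best run on the system
       that produced the warning.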



ERRORS

       tm: not connected
             A  fatal  error  occurred  in  communications between the mpiexec
             process and the local pbs_mom.  This might occur due to  bugs  in
             pbs_mom, and is not recoverable.

       mpiexec:  Error:  PBS_JOBID  not  set in environment.  Code must be run
       from a PBS script, perhaps interactively using "qsub -I".
             It is not possible to run mpiexec unless you are within a PBS
             environment, created by either a batch or an interactive PBS job.
             See the man page for qsub on how to submit a job.
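
       A script that may also be started by hand can check for the variable
       itself and fail with a clearer hint.  This is a minimal sketch; the
       function name require_pbs and the wording of its message are
       hypothetical.

```shell
# Hypothetical guard for scripts that may be started outside PBS.
require_pbs() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "not inside a PBS job; submit with qsub or use qsub -I" >&2
        return 1
    fi
}

# Only attempt the parallel run when the guard passes.
if require_pbs; then
    echo "PBS job $PBS_JOBID detected; mpiexec can be started here"
fi
```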



EXIT VALUE

       Mpiexec  returns  to  its environment the exit status of process number
       zero in a parallel task.  With this,  scripts  which  use  mpiexec  can
       access  the  return value of the parallel program.  If task zero exited
       with a signal, as opposed to naturally with  STOP  or  exit(),  mpiexec
       returns 256 + signum, where signum is the signal that killed task zero.
       This is a convention inherited from PBS.
