This package works only under UNIX.  (Cygwin is an option on Windows,
but you would then have to port the package yourself.)  The most recent
version of this package can always be found at
ftp://ftp.ccs.neu.edu/pub/people/gene/pargapmpi/.
Hopefully, you will also find it among the share packages of the GAP
distribution.  It has been tested on Linux (ELF), Solaris 2.6 and OSF 1 (alpha).
To unpack:
  cd .../gap4b5/pkg/
  gunzip pargapmpi-beta.tar.gz
  tar xvf pargapmpi-beta.tar

It can be installed as described for a generic GAP share package:
  cd .../pkg/pargapmpi
  ./configure ../..
  make

Verify that .../pkg/pargapmpi/bin/pargapmpi.sh
   and .../pkg/pargapmpi/bin/procgroup have correct paths.
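
The path check for procgroup can be partly automated.  The following is a
minimal sketch, not part of the package.  It assumes that procgroup follows
the MPICH p4 layout (a first line `local 0', then one line per slave of the
form `<HOSTNAME> <COUNT> <BINARY>'); the function name
check_procgroup_paths is hypothetical.  It tests only paths visible on the
local machine, so a binary that exists only on a remote host is reported
as missing.

```shell
# check_procgroup_paths FILE
# Prints "ok PATH" or "MISSING PATH" for each binary named in FILE and
# returns 1 if any binary is missing or not executable.
# Assumed line layout: "<HOSTNAME> <COUNT> <BINARY>".  Blank lines,
# lines starting with "#", and lines without a third field (such as
# "local 0") are skipped.
check_procgroup_paths() {
    missing=0
    while read -r host count binary rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        [ -n "$binary" ] || continue
        if [ -x "$binary" ]; then
            echo "ok $binary"
        else
            echo "MISSING $binary"
            missing=1
        fi
    done < "$1"
    return $missing
}
```

For example, `check_procgroup_paths .../pkg/pargapmpi/bin/procgroup' prints
one line per slave entry and returns a nonzero status if any binary is missing.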

If you have trouble installing it, please see the next section of this
file.  Otherwise, try it out:

  cd .../pkg/pargapmpi/bin
  ./pargapmpi.sh
[ALTERNATIVE:  PATH/pargapmpi.sh -p4pg PATH/procgroup]

gap>  # This assumes your procgroup file includes two slave processes.
gap>  PingSlave(1);
gap>  SendMsg( "Print(3+4)" );
gap>  SendMsg( "Print(3+4,\"\\n\")" );
gap>  SendMsg( "3+4", 2);
gap>  RecvMsg( 2 );
gap>  FlushAllMsgs();
gap>  SendRecvMsg( "Exec(\"pwd\")" );
gap>  SendMsg("for i in [1..1000000] do for j in [1..1000000] do od; od");
gap>  SendMsg("Print(\"WAKE UP\\n\")");
gap>  ProbeMsgNonBlocking();
gap>  ParReset();
gap>  FlushAllMsgs();
gap>  SendRecvMsg( "a:=45; 3+4", 1 );
gap>  SendMsg( "a", 2 );   # Note "a" defined only on slave 1, not slave 2
gap>  RecvMsg( 2 );
gap>  SendMsg( "a", 1 );
gap>  RecvMsg( 1 );
gap>  myfnc := function() return 42; end;
gap>  BroadcastMsg( PrintToString( "myfnc := ", myfnc ) );
gap>  SendRecvMsg( "myfnc()", 1 );
gap>  FlushAllMsgs();
gap>  squares := ParList( [1..100], x->x^2 );
gap>  MSexample();
gap>  BroadcastMsg( PrintToString("MSList := ", MSList) );
gap>  ParTrace := false;
gap>  BroadcastMsg( PrintToString("MSList := ", MSList) );
gap>  ParEval( "MSList( [10..20], x->x^2 )" );
gap>  ParRead( "/home/gene/.gaprc" );


The ParGAP/MPI share package was designed and written by Gene Cooperman,
College of Computer Science, Northeastern University, Boston, MA, U.S.A.

If you use ParGAP/MPI to solve a problem then please send a short email
to `gene@ccs.neu.edu' about it, and reference the ParGAP/MPI package
as follows:

\begintt
\bibitem[Coo99]{Coo99}
      Cooperman, Gene,
      {\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1,
      College of Computer Science, Northeastern University, 1999,
      \verb|http://www.ccs.neu.edu/home/gene/pargapmpi.html|.
\endtt

%======================================================================

0.& Check `.../gap4b5/src/gap.c'  for the following:
\begintt
        #ifdef GAPMPI
            /* GAPMPI module                                               */
            InitInfoGapmpi,
        #endif
\endtt
     An interim version of GAP was missing the line `InitInfoGapmpi'.
       If you're missing it, add it and re-compile.  You can find a
       version of `gap.c' in `.../gap4b5/pkg/pargapmpi/src/gap.c-4b5' or
       `gap.c-4.0' that has the three required
       `\#ifdef GAPMPI  ...  \#endif' blocks.

1.& Do you have enough swap space to support multiple GAP processes?
    A simple way to check this is with the UNIX command, `top'.
    The Linux version of `top' sorts by memory usage if you type `M'.

2.& `make' tries to automatically create
     `pkg/pargapmpi/bin/pargapmpi.sh', and copy the
     parameters\penalty-1000 from `<GAP_ROOT>/bin/gap.sh'.
     <GAP_ROOT> was specified when you executed `./configure
     <GAP_ROOT>' to install ParGAP/MPI.
     This can be error-prone if your site has an unusual setup.
     If you execute `<GAP_ROOT>/bin/gap.sh', does GAP come up?
     If so, compare it with pargapmpi.sh and check that the
     settings in `.../pkg/pargapmpi/bin/pargapmpi.sh' are correct.

3.& Did pargapmpi find your procgroup?
     [ It looks in the current directory, or for:
\begintt
          ... -p4pg PATH/procgroup
\endtt
   &    on the command line. ]

4.& Were the remote slave processes able to start up?  If so, could they
       connect back to the master?
       To test connectivity problems,
       try manually starting a remote slave by executing a line in the
       script.  Try a simple `rsh remote_hostname' to see if the issue
       is with security.

5.& If the previous step failed due to security issues, such as requesting
       a password, you have several options.  `man rshd' tells you the
       security model at your site (or possibly `man ssh' if you use that).
       Then read~"Problems with Passwords (Getting Around Security)".

6.& Is the procgroup file in your current directory set correctly?
     Test it.  If you are calling it on a remote host, manually type:
\begintt
       rsh <HOSTNAME> <BINARY>
\endtt
   & where <HOSTNAME> and <BINARY> appear exactly as in procgroup.
     For example:  `rsh denali.ccs.neu.edu /usr/local/gap4b5/bin/pargapmpi.sh'.
     In some cases, `exec' is used to save process overhead.  Also try:
\begintt
       rsh <HOSTNAME> exec <BINARY>
\endtt
   & If you plan to call it on localhost, try just:   <BINARY>

   & Note that if not all the slave processes succeed in connecting
       to the master, then Parallel GAP writes out a file,
       `/tmp/pargapmpi--rsh.\$\$', where \$\$ is replaced by the
       process id of ParGAP/MPI.

7.& Is `pargapmpi' listed in `.../pkg/ALLPKG'?
     [ It's needed to autostart slaves.]

8.& Inside Parallel GAP, has MPI been successfully initialized?
     Try:  `MPI_Initialized();'

9.& A remote (slave) pargapmpi process starts in your home directory
     and tries to cd to a directory of the same name as your local directory.
     Check your assumptions about the remote machine.  Try:
       `SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
       SendRecvMsg("UNIX_Getpid()");'

10.& If the connection dies at random, after some period of time,
       you can experiment with SO_KEEPALIVE and variants.  (man setsockopt)
       This option periodically sends *null messages* so that the remote
       machine does not think that the originating machine is dead.
       However, if the remote machine fails to reply, the socket is
       declared broken and local processes writing to it receive a SIGPIPE
       signal, even though there might have been only a temporary lapse in
       connectivity.
       ssh specifies `KeepAlive yes' by default, but setting `KeepAlive no'
       might get you through some transient lapses in connectivity due
       to high congestion.
       You may also want to experiment with:  `setenv RSH "rsh -n"'

11.& Read the documentation for further possible problems.
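
The manual test in step 6 can be scripted so that the commands are taken
directly from procgroup rather than retyped.  The following is a minimal
sketch, not part of ParGAP/MPI.  It assumes that procgroup follows the
MPICH p4 layout (a first line `local 0', then one line per slave of the
form `<HOSTNAME> <COUNT> <BINARY>'); the function name print_rsh_tests is
hypothetical.  It only prints the commands.  Run them by hand, so that
password prompts and error messages remain visible.

```shell
# print_rsh_tests FILE
# For each slave line in FILE, print the commands that step 6 suggests
# typing by hand.  Entries for localhost are printed as the bare binary.
# Assumed line layout: "<HOSTNAME> <COUNT> <BINARY>".  Blank lines,
# lines starting with "#", and the "local N" line are skipped.
print_rsh_tests() {
    while read -r host count binary rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*|local) continue ;;
        esac
        [ -n "$binary" ] || continue
        if [ "$host" = "localhost" ]; then
            echo "$binary"
        else
            echo "rsh $host $binary"
            echo "rsh $host exec $binary"
        fi
    done < "$1"
}
```

If one of the printed commands asks for a password or fails to start GAP,
the problem lies with the remote login setup or with the path in procgroup,
not with ParGAP/MPI itself.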

%======================================================================

Note that this package modifies the GAP src and bin files, and creates
a new GAP kernel.  This new GAP kernel can be shared by traditional users
of the old, sequential GAP kernel, and by those doing parallel processing.

The new GAP kernel behaves identically to the old GAP kernel when
invoked through the gap.sh script or the bin/@GAParch@/gap binary.
The new GAPMPI variables will appear to the end user _ONLY_ if the
GAP binary was invoked as pargapmpi, a symbolic link to the actual
GAP binary.  The script, pargapmpi.sh, does this.

So, in a multi-user environment, traditional users can continue to use
gap.sh without noticing any difference.  Only an invocation as pargapmpi.sh
will add the new features.
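
The idea of one binary behaving differently depending on the name it was
invoked under can be illustrated with a small shell sketch.  This is not
the kernel's actual code (the real check happens inside the GAP binary);
the function mode_for_name and the mode strings are hypothetical and serve
only to show the mechanism.

```shell
# mode_for_name NAME
# Prints "parallel" if the final path component of NAME is "pargapmpi",
# and "sequential" for any other name.
mode_for_name() {
    case "$(basename "$1")" in
        pargapmpi) echo "parallel" ;;
        *)         echo "sequential" ;;
    esac
}
```

Inside a script, `mode_for_name "$0"' would report the mode for the name
under which that script was started, including a name given by a symbolic link.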

Comments and contributions to a GAPMPI user library, or any other type
of assistance, are gratefully accepted.
							Gene Cooperman
							gene@ccs.neu.edu
