From aa1f5565e8205a0cbb261f18509fdefb34d84943 Mon Sep 17 00:00:00 2001
From: Lucie Bakels <lucie.bakels@univie.ac.at>
Date: Thu, 18 Jul 2024 07:59:41 +0000
Subject: [PATCH] updated READMEs

---
 README.md          |   2 +-
 README_PARALLEL.md | 201 ---------------------------------------------
 2 files changed, 1 insertion(+), 202 deletions(-)
 delete mode 100644 README_PARALLEL.md

diff --git a/README.md b/README.md
index a50b282f..0bf4c755 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Other references:
 
 * This repository contains versions of the Lagrangian model FLEXPART
 * Development versions
-* Issues on the FLEXPART model, [tickets](./-/issues)/[mail](mailto:flexpart-support.img-wien@univie.ac.at)
+* Issues on the FLEXPART model, [tickets](https://gitlab.phaidra.org/flexpart/flexpart/-/issues)/[mail](mailto:flexpart-support.img-wien@univie.ac.at)
 * Feature requests for future versions
 
 ## Getting started with Flexpart
diff --git a/README_PARALLEL.md b/README_PARALLEL.md
deleted file mode 100644
index a92d21aa..00000000
--- a/README_PARALLEL.md
+++ /dev/null
@@ -1,201 +0,0 @@
-
-FLEXPART VERSION 10.0 beta (MPI)
-
-Description
------------
-
-  This branch contains both the standard (serial) FLEXPART and a parallel
-  version (implemented with MPI). The latter is under development, so not
-  every FLEXPART option is implemented yet.
-
-  MPI related subroutines and variables are in file mpi_mod.f90.
-
-  Most of the source files are identical/shared between the serial and
-  parallel versions. Those that depend on the MPI module have '_mpi'
-  appended to their names, e.g. 'timemanager_mpi.f90'.
-
-
-Installation
-------------
-
-  An MPI library must be installed on the target platform, either as a
-  system library or compiled from source.
-
-  So far, we have tested the following freely available implementations:
-  mpich2  -- versions 3.0.1, 3.0.4, 3.1, 3.1.3
-  OpenMPI -- version 1.8.3
-
-  Based on testing so far, OpenMPI is recommended.
-
-  Compiling the parallel version (executable: FP_ecmwf_MPI) is done by
-
-    'make [-j] ecmwf-mpi'
-
-  The makefile has its dependencies resolved, so 'make -j' will compile
-  and link in parallel.
-
-  The included makefile must be edited to match the target platform 
-  (location of system libraries, compiler etc.).
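-
-  As an illustration (the module name below is a site-specific example and
-  not part of FLEXPART), a build on a cluster that provides OpenMPI through
-  environment modules could look like:
-
-    'module load openmpi'
-    'make -j ecmwf-mpi'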
-
-
-Usage
------
-
-  Running the parallel version with MPI is done with the "mpirun" command
-  (some MPI implementations use an "mpiexec" command instead). The
-  simplest case is:
-
-    'mpirun -n [number] ./FP_ecmwf_MPI'
-
-  where 'number' is the number of processes to launch. Depending on the
-  target platform, useful options for process-to-processor binding
-  can be specified (for performance reasons), e.g.,
-
-    'mpirun --bind-to l3cache -n [number] ./FP_ecmwf_MPI'
-
-
-Implementation
---------------
-
-  The current parallel model is based on distributing particles equally
-  among the running processes. In the code, variables like 'maxpart' and
-  'numpart' are complemented by variables 'maxpart_mpi' and 'numpart_mpi',
-  which hold the run-time determined number of particles per process, i.e.,
-  maxpart_mpi = maxpart/np, where np is the number of processes. The
-  variable 'numpart' is still used in the code, but redefined to mean
-  'number of particles per MPI process'.
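-
-  As a minimal sketch (illustrative only, not the actual mpi_mod.f90 code;
-  'np' and 'ierr' are local helper variables here), the per-process limit
-  follows from the size of the MPI communicator:
-
-    call MPI_Comm_size(MPI_COMM_WORLD, np, ierr)  ! np = number of processes
-    maxpart_mpi = maxpart/np                      ! particle limit per process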
-
-  The root MPI process writes concentrations to file, following an MPI
-  communication step in which each process sends its contribution to root,
-  where the individual contributions are summed.
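-
-  A minimal sketch of such a summation (illustrative only; 'grid_local' and
-  'grid_total' are placeholder names, not the variables used in FLEXPART)
-  could use MPI_Reduce:
-
-    ! sum the local gridded contributions of all processes onto root (rank 0)
-    call MPI_Reduce(grid_local, grid_total, size(grid_local), &
-                    MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)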
-
-  In the parallel version one can choose to set aside a process dedicated
-  to reading and distributing meteorological data ("windfields"). This
-  process will thus not participate in the calculation of trajectories.
-  This might not be the optimal choice when running with very few
-  processes. As an example, running with a total number of processes np=4
-  and using one of these processes for reading windfields will normally
-  be faster than running with np=3 and no dedicated 'reader' process.
-  But it is also possible that the program will run even faster if the
-  4th process participates in the calculation of particle trajectories
-  instead. This will largely depend on the problem size (total number of
-  particles in the simulation, resolution of grids, etc.) and the hardware
-  being used (disk speed/buffering, memory bandwidth, etc.).
-
-  To control this behavior, edit the parameter 'read_grp_min' in file
-  mpi_mod.f90. This sets the minimum total number of processes at which
-  one will be set aside for reading the fields. Experimentation is
-  required to find the optimum value. On typical NILU machines
-  (austre.nilu.no, dmz-proc01.nilu.no) with 24-32 cores, a value of 6-8
-  seems to be a good choice.
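-
-  For illustration only (the exact declaration in mpi_mod.f90 may differ),
-  the setting is a single integer in that file:
-
-    integer :: read_grp_min = 6   ! use a dedicated reader when np >= 6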
-
-  An experimental feature, which is an extension of the functionality
-  described above, is to hold 3 fields in memory instead of the usual 2.
-  Here, the transfer of fields from the "reader" process to the "particle"
-  processes is done on the vacant field index while the "particle"
-  processes are calculating trajectories. To use this feature, set
-  'lmp_sync=.false.' in file mpi_mod.f90 and set numwfmem=3 in file
-  par_mod.f90. At the moment, this method does not seem to produce faster
-  running code (about the same as the "2-fields" version).
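-
-  Shown schematically (the exact declarations in mpi_mod.f90 and
-  par_mod.f90 may differ), the two settings are:
-
-    lmp_sync = .false.   ! in mpi_mod.f90: asynchronous field transfer
-    numwfmem = 3         ! in par_mod.f90: keep 3 wind fields in memory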
-  
-
-Performance efficiency considerations
--------------------------------------
-
-  A couple of reference runs have been set up to measure the performance
-  of the MPI version (as well as to check for errors in the implementation).
-  They are as follows:
-   
-  Reference run 1 (REF1):
-    * Forward modelling (24h) of I2-131, variable number of particles
-    * Two release locations 
-    * 360x720 global grid, no nested grid
-    * Species file modified to include (unrealistic) values for
-      scavenging/deposition
- 
-
-  As the parallelization is based on particles, it follows that if
-  FLEXPART-MPI is run with no (or just a few) particles, no performance
-  improvement is possible. In this case, most processing time is spent
-  in the 'getfields' routine.
-
-  A) Running without dedicated reader process
-  ----------------------------------------
-  Running REF1 with 100M particles on 16 processes (NILU machine
-  'dmz-proc04') yields a speedup close to 8 (ca. 50% efficiency).
-
-  Running REF1 with 10M particles on 8 processes (NILU machine
-  'dmz-proc04') yields a speedup close to 3 (ca. 40% efficiency). Running
-  with 16 processes gives only a marginal further improvement (speedup
-  ca. 3.5) because of the 'getfields' bottleneck.
-
-  Running REF1 with 1M particles: here 'getfields' consumes ca. 70% of the
-  CPU time. Running with 4 processes gives a speedup of ca. 1.5. Running
-  with more processes does not help much here.
-
-  B) Running with dedicated reader process
-  ----------------------------------------
-
-  Running REF1 with 40M particles on 16 processes (NILU machine
-  'dmz-proc04') yields a speedup above 10 (ca. 63% efficiency).
-
-  :TODO: more to come...
-
-
-Advice  
-------
-  From the tests referred to above, the following advice can be given:
-
-    * Do not run with too many processes; the speedup levels off once the
-      'getfields' routine becomes the bottleneck.
-    * Do not use the parallel version when running with very few particles.
-      
-
-What is implemented in the MPI version
---------------------------------------
-
-  The following should work (these have been through initial testing):
-
-    * Forward runs
-    * OH fields
-    * Radioactive decay
-    * Particle splitting
-    * Dumping particle positions to file
-    * ECMWF data
-    * Wet/dry deposition
-    * Nested grid output
-    * NetCDF output
-    * Namelist input/output
-    * Domain-filling trajectory calculations
-    * Nested wind fields
-
-  Implemented but untested:
-
-    * Backward runs (but not initial_cond_output.f90)
-
-  The following will most probably not work (untested/under development):
-
-    * Calculation/output of fluxes 
-
-  The following will definitely NOT work yet:
-
-    * Subroutine partoutput_short (MQUASILAG = 1) will not dump particles
-      correctly at the moment
-    * Reading particle positions from file (the tools to implement this
-      are available in mpi_mod.f90 so it will be possible soon)
-
-
-  Please keep in mind that running the serial version (FP_ecmwf_gfortran)
-  should yield results identical to those of the parallel version
-  (FP_ecmwf_MPI) run with a single process, i.e. "mpirun -n 1 FP_ecmwf_MPI".
-  If not, this indicates a bug.
-  
-  When running with multiple processes, statistical differences are expected
-  in the results.
-
-Contact
--------
-
-  If you have questions, or wish to work with the parallel version, please 
-  contact Espen Sollum (eso@nilu.no). Please report any errors/anomalies!
-- 
GitLab