Technical aspects of running Zonation
Zonation v4 runs on both Windows (e.g. Windows XP and Windows 7) and Linux (tested only on Ubuntu 12.04 and 14.04 LTS) operating systems[3]. Typical hardware requirements vary by application, and they can and should be anticipated in advance.
As a rule of thumb, computer resources should not be an issue for smaller applications, up to a combination of a few hundred features and a landscape grid of one million elements. Such an application should run on practically any present-day off-the-shelf laptop with at least 4 GB of RAM (random-access memory). The largest problems that, to our knowledge, have been run with Zonation are studies with tens of thousands of features and/or landscape grid sizes in the hundreds of millions of elements. While exact computational requirements cannot be estimated, the following guidance usually applies.
First, memory needs are determined by the combination of the (effective) landscape grid size, the number of features in the analysis, the fill factor of the landscape grid and some analytical settings. The size (product of dimensions) of the landscape grid should not exceed a billion elements. We will illustrate resource estimation with an example.
RAM requirements. A full grid of information takes 8 bytes per grid cell, meaning that a 100 million element grid requires 800 MB (megabytes) of RAM. A 10 million element grid would need 80 MB, and so on. If a computer has 128 GB of RAM (gigabytes; 1 GB = 1000 MB), a problem with more than 100 such layers can be analysed. It is possible, however, that the landscape grid is not full of information (an issue called the fill factor). For example, assume that a narrow island like New Zealand is bounded by a box of 2000 x 4000 elements, implying a grid size of 8 million elements. However, if the analysis concerns only the terrestrial areas of New Zealand, these areas cover only a small fraction of the grid (say 10%), and the rest is sea that should be coded as NoData. In this case the fill factor would be 10%, and ten times more layers could be included in the analysis than if the grids were full of information. Furthermore, Zonation only stores information where features have occurrences, which further reduces memory demands. Some analysis options add to RAM requirements. For example, the use of matrix connectivity temporarily doubles the memory usage for the layers to which the transform is applied, possibly causing problems in runs that would otherwise work. You can check the memory usage of the Zonation process from the Windows Task Manager or its equivalent in Linux. Under no circumstances should the computer's RAM be fully used up: if this happens, computations slow down so much that runs will effectively never finish.
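The arithmetic above can be turned into a small calculator. The following is a minimal sketch, not part of Zonation: the function names are hypothetical, the 8 bytes per cell and the fill factor come from the text, and the default `headroom` of 0.8 is an assumption reflecting the advice never to use up all RAM. It ignores the further savings from sparse storage and the temporary doubling caused by options such as matrix connectivity, so treat the result as a rough guide only.

```python
BYTES_PER_CELL = 8  # from the text: a full grid takes 8 bytes per cell


def estimate_ram_gb(rows, cols, n_layers, fill_factor=1.0):
    """Rough RAM estimate in GB (1 GB = 1e9 bytes) for n_layers grids."""
    if not 0.0 <= fill_factor <= 1.0:
        raise ValueError("fill_factor must be between 0 and 1")
    cells_with_data = rows * cols * fill_factor
    return cells_with_data * BYTES_PER_CELL * n_layers / 1e9


def max_layers(rows, cols, ram_gb, fill_factor=1.0, headroom=0.8):
    """Number of layers that fit when using at most `headroom` of the RAM.

    headroom=0.8 is an assumed safety margin, not a Zonation setting.
    """
    bytes_per_layer = rows * cols * fill_factor * BYTES_PER_CELL
    return int(ram_gb * 1e9 * headroom / bytes_per_layer)
```

For the New Zealand example, `estimate_ram_gb(2000, 4000, n_layers=1, fill_factor=0.1)` gives about 0.0064 GB, i.e. 6.4 MB per layer, compared with 64 MB per layer for a full grid.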
Computation time. Computation time increases linearly with the number of features and more than quadratically with landscape size. In our experience, computation times start to become a problem when the landscape has tens of millions of elements (grid cells) of information and hundreds of layers. When the memory usage of the Zonation process reaches tens of GB, runs start taking hours. When memory usage is hundreds of GB, computation times grow to days even on comparatively high-powered desktop workstations or small servers. There are several general issues to note.
- If the landscape is big (many tens of millions of elements) the acceleration factor of analysis (warp factor) can and should in most cases be raised to 5000 or 10000.
- Server computers frequently have replicated memory buses, which allows more than one Zonation process to be run in the time of one.
- Having many cores on the computer does not help for a single run, because it is the memory bus speed that is the bottleneck.
- Running a Zonation project frequently involves many runs, during which the analysis setup is developed, data errors are corrected or data are updated. This is not an issue when single runs take seconds, minutes or hours: a large number of development runs can then be done interactively or automatically overnight or over a weekend. When individual runs take more than a day, problems start to emerge, as computation times accumulate and runs do not complete overnight. When single runs take on the order of a week, one has to consider very carefully which runs can be afforded. Also, rerunning everything after a data update will take many weeks, which is not a desirable situation. Therefore, if single runs seem to take weeks or months, one should probably aggregate the landscape to a lower resolution: accumulating 2x2 blocks of grid cells into bigger grid cells will reduce memory needs to ~30% and the computation time by more than an order of magnitude.
- The vmat-feature can be used to save the state of Zonation after initial data transforms (connectivity, condition, etc.) have been completed. This reduces the time needed to initialize Zonation into minutes even with large problems, thereby saving days of time with large problems that require loading and processing thousands of very large grids. Please see the manual for details.
- Zonation has not been developed to run distributed on a cluster, so access to a computational cluster does not help. A typical cluster consists of a large number of nodes with relatively low-performance central processing units (CPUs), and Zonation will only ever use one core. Nowadays many service providers, such as Amazon or Google, offer cloud-based virtual machines with reasonable CPU and RAM capabilities. These providers usually charge based on time used, so this is becoming an increasingly affordable option. The most straightforward approach, provided that funding is available, is to get a sufficiently powerful desktop workstation.
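The 2x2 aggregation suggested in the list above can be illustrated as follows. This is a minimal pure-Python sketch on a list-of-lists grid, with a hypothetical function name; in practice the aggregation would be done with GIS or raster software before the data are given to Zonation. It assumes that cell values are amounts that can meaningfully be summed (for other kinds of data, such as probabilities, a mean may be more appropriate) and that NoData is marked with a single sentinel value.

```python
def aggregate_2x2(grid, nodata=None):
    """Sum each 2x2 block of `grid` into one coarser cell.

    NoData cells are skipped; a block made up entirely of NoData stays
    NoData. Odd trailing rows or columns form partial blocks.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    coarse = []
    for r in range(0, n_rows, 2):
        coarse_row = []
        for c in range(0, n_cols, 2):
            # Collect the data values of the (up to) four cells in the block.
            block = [
                grid[rr][cc]
                for rr in range(r, min(r + 2, n_rows))
                for cc in range(c, min(c + 2, n_cols))
                if grid[rr][cc] != nodata
            ]
            coarse_row.append(sum(block) if block else nodata)
        coarse.append(coarse_row)
    return coarse
```

Each aggregation step reduces the number of grid cells to roughly a quarter, which is where the savings in memory and computation time come from.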
Storage requirements. Hard drive space is not usually a problem, but again it can be anticipated. Considerable space is needed when the landscape grid is very large, as output files (e.g. the priority rank raster) can then be gigabytes in size. For large analyses, avoid using ASCII grids for inputs or outputs, as these take much more space than packed binary formats such as compressed GeoTIFF files. One should also be careful with the "save transformed features" setting, as it might cause one large grid to be output per feature, thereby requiring even terabytes of free hard drive space. In general, check output file sizes after the first Zonation runs and make sure they are not excessive in comparison to the hard drive space available. This way you can verify that you have space for the variants you develop during your Zonation analysis.
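The check on output sizes can be automated with the Python standard library. The sketch below is only an illustration: the function names are hypothetical, the output folder is whatever folder you directed Zonation to write to, and the `safety_margin` of 0.9 is an assumed allowance for other uses of the drive.

```python
import os
import shutil


def directory_size_bytes(path):
    """Total size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def variants_that_fit(output_dir, safety_margin=0.9):
    """How many more runs with similar output size fit on the same drive."""
    used = directory_size_bytes(output_dir)
    if used == 0:
        raise ValueError("no output files found in " + output_dir)
    free = shutil.disk_usage(output_dir).free
    return int(free * safety_margin // used)
```

Running `variants_that_fit` on the output folder of a first run gives a rough idea of how many development variants can be stored before the drive fills up.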
Box 3. Common pitfalls in setting up and running Zonation.
While technical execution of Zonation analyses is not the focus of this document, here are a few pointers to things that relatively commonly go wrong in Zonation setups. Check these out first if an analysis does not seem to work as expected.
- Parameter name spelled wrong in the run settings file (i.e. the .dat file, see Zonation manual section 3.3.2.3). This causes the option specified by the parameter to not be used in the analysis without producing a warning. The run_info.txt file is a standard Zonation output, showing information that may help you verify which options were used and which were not.
- Parameter listed under the wrong section (section headers are in square brackets, e.g. [Settings]) in the run settings file. Note that the section headers are also case-sensitive. A misspelled section header will cause all the options defined under that section to be silently ignored.
- Zonation stops right at the beginning and gives an error message, but you can't see what is wrong. If the error message says something about a file name, you have probably misspelled either a file or a folder name. Check for odd characters or letters that are easily confused (1, i, l, o, O, 0, etc.). Alternatively, check that folder names work correctly in absolute or relative paths: you might need to add or fix the path to the folder where the files are. If you are using relative paths (“../../” etc.), make sure you are referring to the correct level in the folder hierarchy.
- Parameter for dispersal kernel wrong by orders of magnitude. If, for example, the unit of measurement in an input raster file is meters but connectivity parameters are entered per kilometer, the scale of the connectivity response is wrong by three orders of magnitude (see e.g. Zonation manual section 3.3.2.2). If you switch on a connectivity option and either (i) no change occurs in the priority ranking, or (ii) you get highly aggregated or circular patterns, you might wish to check your connectivity settings or parameters.
- One or more input features contain no data at all, which may lead to odd errors. You should never use a feature layer that has no data, i.e. a layer with only zeroes and missing data in it. Zonation may notice this in some but not all contexts. You can check for it by searching the run_info.txt file for features that have a distribution sum of zero. In the section “Loading feature (e.g., species) data layers”, look for lines with the pattern
  Loaded biodiversity feature file #_N, file_path, non-missing cells: x, their sum: y
  and pick out the feature files for which y (the distribution sum) has the value 0.
- Priority rank maps display geometric patterns that intuitively do not appear to be associated with your biodiversity feature distribution data. Check the input files, as chances are that an error was made when they were prepared. Dubious geometric patterns such as lines or squares might also appear in analyses that have large areas with low variation in the input features, such as an analysis with only a small number of presence/absence features.
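The search for empty feature layers described in the list above can be scripted. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the regular expression is modelled directly on the line pattern quoted in the text, so it may need adjusting if the exact wording or spacing in your run info file differs.

```python
import re

# Modelled on the pattern quoted in the text; the exact layout of the
# real file is an assumption and may need adjusting.
PATTERN = re.compile(
    r"Loaded biodiversity feature file\s+(?P<id>[^,]+),\s*(?P<path>.+),"
    r"\s*non-missing cells:\s*(?P<cells>[^,]+),\s*their sum:\s*(?P<sum>\S+)"
)


def zero_sum_features(run_info_text):
    """Return the file paths of features whose distribution sum is zero."""
    empty = []
    for line in run_info_text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        try:
            total = float(match.group("sum"))
        except ValueError:
            continue  # sum field not numeric; skip rather than guess
        if total == 0.0:
            empty.append(match.group("path").strip())
    return empty
```

To use it, read the run info file into a string (for example with `open(path).read()`) and pass the string to `zero_sum_features`; any paths returned are layers that should be removed from the analysis or corrected.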
In general, when a new set of biodiversity feature layers, connectivity, costs, or any other options are entered into an analysis (or switched on), one would always expect a sensible change in the priority rank map. The change is not always large, but it should be there. If no change is visible, the new input or option has most likely not taken effect. You may, for example, have edited a different settings or feature list file than the one you thought you were editing.
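Because misspelled parameter names and section headers are ignored without a warning, a simple pre-flight check of the run settings file can catch them before a run. The sketch below is hypothetical and not a Zonation tool. It assumes the settings file consists of `[Section]` headers and `name = value` lines, that lines starting with `#` can be skipped, and that you supply the list of valid names yourself from the Zonation manual; the names used in the usage example are placeholders for that list.

```python
def check_settings(text, known):
    """Return (line_number, message) pairs for suspicious lines.

    `known` maps each exact, case-sensitive section name to the set of
    parameter names allowed in that section (to be taken from the manual).
    """
    problems = []
    section = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section not in known:
                problems.append((number, "unknown section [%s]" % section))
            continue
        name = line.split("=", 1)[0].strip()
        if section is None:
            problems.append((number, "parameter '%s' outside any section" % name))
        elif section in known and name not in known[section]:
            problems.append(
                (number, "unknown parameter '%s' in [%s]" % (name, section))
            )
    return problems
```

For example, with `known = {"Settings": {"removal rule", "warp factor"}}`, a file containing the line `warp facter = 100` under `[Settings]` is reported as an unknown parameter, and a header written as `[settings]` is reported as an unknown section.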
3. Everything else being equal, Zonation (version 4.0) will run approximately 1.8 times faster on Linux than on Windows using the same hardware.