Data management

All the generally accepted best practices of data management hold when designing, implementing and executing a Zonation analysis. In fact, there are several reasons why you should probably spend more time than usual thinking about how you organize, process, store and document your data.

Iterative pre-processing. It is often not clear at the early stages of designing and building the model of spatial prioritization how the data available should be translated into low-level factors needed to address the high-level objectives of prioritization. Therefore, the data pre-processing involved in building the model of spatial prioritization is typically iterative and experimental in nature, producing potentially large quantities of intermediate files and final input data.

Large amounts of input data. For anything but the simplest and most localized analyses, the resolution and number of input features will add up to a large number of bytes that need to be stored on your hard drive. Most analyses will not end up consuming terabytes of disk space, but there is usually enough data involved to justify some advance planning.

Documentation. Especially when you are using Zonation to support operative decision-making, you will have to be able to show clearly the whole pathway of how the input data are processed in accordance with the model of spatial prioritization and how the results relate to the inputs. This task becomes excruciatingly difficult unless the pre- and post-processing steps and the data provenance are well documented.

Repeatability. If you are interested in developing a conservation prioritization process rather than executing a once-off analysis, you will most probably want to repeat the analysis at some point in the future. Systematic and well-documented data management will increase the repeatability of your analyses.
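To make the point about input data volume concrete, the following is a rough back-of-the-envelope sketch in Python. It assumes that every input feature is a raster of the same dimensions and that each cell is stored uncompressed as a 32-bit float (4 bytes). The function name and the example dimensions are hypothetical, and intermediate files and compression are ignored, so treat the result as an order-of-magnitude guide only.

```python
def estimate_raster_bytes(n_rows, n_cols, n_features, bytes_per_cell=4):
    """Return the uncompressed size in bytes of a stack of equally sized rasters.

    bytes_per_cell=4 assumes 32-bit floating point cells (an assumption,
    adjust it to match the data type you actually store).
    """
    return n_rows * n_cols * n_features * bytes_per_cell


# Hypothetical example: 5000 x 5000 cells and 200 input features.
size_bytes = estimate_raster_bytes(5000, 5000, 200)
print(f"Approximate uncompressed size: {size_bytes / 1024**3:.1f} GiB")
```

Because pre-processing is iterative, the intermediate files can easily multiply this figure, so it is worth leaving generous headroom when planning storage.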

Unfortunately, we cannot offer any silver bullets for how exactly you should go about planning for data management, but especially for larger projects it is worth considering the usual stages of data management planning:

Information about data and data formats. What types of data are included? How will the data be acquired? How will the data be processed? What formats are used? How are version control and back-ups handled?
Metadata on the content and formats. What metadata are collected? Are specific standards used? How will the metadata be collected and stored?
Policies of access, sharing and re-use. Are there any obligations (e.g. legal, or from data owners and funders) related to data usage? How will the data be shared? What are the intended future uses of the data, and who are the intended users? How will the data be cited?
Long-term storage and data management. Which data are stored in the long term? Which persons or organizations should be contacted with inquiries about the data?
Budget. What resources (human and monetary) are available for data preparation, management, documentation and preservation? How will the costs be distributed?
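As one possible way to put the metadata and provenance questions above into practice, the sketch below writes a small JSON "sidecar" file next to each data file. It is a minimal illustration only, not a metadata standard and not part of Zonation: the field names, the `.meta.json` suffix and the function names are all assumptions made for this example, and only the Python standard library is used.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(data_path, source, processing_steps, license_note):
    """Assemble a small metadata record for one data file.

    All field names are hypothetical; adapt them to whatever
    metadata standard your project adopts.
    """
    data_path = Path(data_path)
    return {
        "file": data_path.name,
        "sha256": file_sha256(data_path),  # detects later changes to the file
        "size_bytes": data_path.stat().st_size,
        "source": source,  # where the data came from
        "processing_steps": list(processing_steps),  # how it was derived
        "license": license_note,  # obligations on use and sharing
        "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def write_sidecar(data_path, record):
    """Write the record next to the data file as <file name>.meta.json."""
    sidecar = Path(str(data_path) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `build_record` and `write_sidecar` at the end of each pre-processing step leaves a trail that someone else can follow later, and keeping the sidecar files under version control alongside the processing scripts would cover part of the version control question as well.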

For many projects using Zonation, coming up with a full data management plan may seem like overkill, but even thinking about these issues is useful. A decent data management plan, together with an emphasis on proper documentation of the various processing and analysis stages, can also mitigate the effects of turnover in the team responsible for the implementation. Here we refer to the all too common situation in which the main responsibility for executing the data processing, and often the Zonation analyses as well, is given to only a few people or a single person. Without a supporting data management plan and metadata documentation, the capability to repeat and develop the analyses in the future is compromised should the key personnel move to other positions, potentially in other organizations.