Post-Project Data Preservation

Your research data is valuable. It’s therefore worth devoting some time to thinking about what you might preserve after the end of your project, and how you will do this. Planning for data preservation should start at as early a stage as possible, to enable the appropriate steps to be taken.

Why Preserve your Data?

The overriding reason for preserving data is that it is an important resource in its own right – and one which should not be abandoned once a project concludes. Researchers invest significant time and effort in collecting, collating, cleansing, and structuring data, and it is appropriate for this to be recognised.

Creation of a representative and well documented dataset is part of good research practice, providing a foundation for analysis and continued use. Preserving data allows the conclusions reached in the course of a research project (featured in journal articles, books, theses, conference presentations, and other outputs) to be supported or validated, and helps to make research reproducible.

It is very rare for the full value of a dataset to be mined in the course of a single project. Active curation transforms data stored for short-term use into preserved data with a future: it ensures researchers will continue to be able to access and make use of it long after a project has finished. Preserving data allows further potential to be tapped in the future by the original creators or others.

Where appropriate, widening access to data gives researchers the opportunity to increase the visibility and impact of their work. If datasets can be cited, this helps the creators of the datasets get proper credit for their work.

Making data available for reuse by others is covered in more detail in the Sharing data section.

There are regulatory requirements to preserve some kinds of data (for example, information about patients from some medical studies) for a minimum period after the research concludes. Funding bodies, universities, and other institutions also recognise the value of data preservation and may consequently have policies covering this area.

Many funding bodies now require that data be preserved for a specified period (often between three and ten years) after the end of the project and made available for reuse where appropriate. Certain funders may also require you to use a specific repository for storing your data. You can find out about the policies of different funders in the Funder Requirements section.

Costs

Thought also needs to be given at an early stage to the costs of preserving data, so that these can be included in the funding application. For example, it may be necessary to budget for additional time and effort to prepare data for preservation, and some data archives levy a charge for deposits.

Most funding bodies will cover reasonable costs, as long as these are incurred during the lifetime of the grant. You can check with your funder what support is available.

Deciding which Data to Preserve

The creator of a dataset is usually best placed to decide what needs to be preserved. This will be based on a combination of:

Knowledge of and insight into the data
Consent or licensing agreements applying to the project
Funders’ and research institutions’ requirements for data management and preservation

If a project is jointly funded, there may be multiple sets of expectations which need to be met.

The absolute minimum will be preservation of the data that underpins the results or conclusions presented in the project’s other research outputs. However, many projects will produce additional data which is also well worth preserving.

Selection for long-term preservation should be based on:

What is needed to validate research outputs
Ethical, legal, or other regulatory reasons to retain or destroy data
How difficult or costly it would be to reproduce the data
Value for future reuse

When considering the potential for reuse, it’s worth thinking outside the confines of the original research. Some data may be of interest to researchers in other disciplines, or to members of the general public, or it may be of use for educational or training purposes.

Preparing Data for Preservation

The process of curating data for preservation involves several key concerns: ensuring the data remains usable for as long as possible, meeting regulatory requirements, and facilitating appropriate reuse. The last of these is dealt with more fully in the Sharing data section.

Data can only continue to be useful if it is possible to access it and then interpret it properly. This requires appropriate technical choices of file formats and good documentation that clearly describes data.

Describing Data

To ensure data remains comprehensible and to reduce the risk of misunderstanding, it is important that everything is properly labelled and documented. A preserved dataset may be accessed many years after its initial creation, when memories of how it was developed or put together have faded: documentation can then be invaluable to serve as a users’ guide and to provide context.

Documentation should aim to cover:

Information about when, where, and by whom the dataset was created, and for what purpose
A description of the dataset
Details of methods used
Details of what has been done to the data – for example, has it been cleansed, edited, restructured, or otherwise manipulated, and if so, how?
Explanations of any acronyms, coding, or jargon
Units of measurement
Annotation of any anomalies (or apparent anomalies) where the reason for these is known
Any other notes which will help aid proper interpretation
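One way to keep track of these points as the project goes along is to hold them in a simple structured record and generate a plain-text README from it. The following is only a minimal sketch: the field names are hypothetical labels that mirror the checklist above, not a formal metadata standard, and a chosen data archive may well require its own schema.

```python
# Hypothetical field names mirroring the documentation checklist above.
# These are illustrative assumptions, not a recognised metadata standard.
CHECKLIST_FIELDS = [
    "created_when", "created_where", "created_by", "purpose",
    "description", "methods", "processing", "glossary",
    "units", "anomalies", "notes",
]


def _is_blank(value):
    """Treat missing values and whitespace-only strings as undocumented."""
    return value is None or not str(value).strip()


def missing_fields(record):
    """Return the checklist fields that are absent or blank in the record."""
    return [name for name in CHECKLIST_FIELDS if _is_blank(record.get(name))]


def render_readme(title, record):
    """Render the record as a plain-text README, marking gaps explicitly."""
    lines = [title, "=" * len(title), ""]
    for name in CHECKLIST_FIELDS:
        value = record.get(name)
        text = "[NOT YET DOCUMENTED]" if _is_blank(value) else str(value).strip()
        lines.append(f"{name}: {text}")
    return "\n".join(lines) + "\n"
```

Calling missing_fields on the record at intervals during the project gives a quick list of what still needs writing up, and render_readme produces a file that can sit alongside the data.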

It is helpful to use informative file names, and for data to be structured in a way that makes it as easy as possible to navigate.

Datasets may sometimes need to be tidied or otherwise edited as part of the process of creating a preservation dataset. However, it is usually quicker and easier to document data as one goes along, rather than attempting to fill in all the gaps at the end of the project. Documentation written during a project to describe methodology, project progress, and other aspects of research activity can often be put to new use.

Redacting or Anonymising Datasets for Preservation

As a general principle, it is good to preserve as much data as possible. However, there are situations in which not all data can be kept. This may be for practical reasons (for example, because the quantity of data is such that it is not feasible to store it all), or because the data contains sensitive or confidential information which needs to be deleted after a certain point. Key considerations include:

Honouring any commitments made to research participants (e.g. on consent forms)
Compliance with data protection legislation
Institutional expectations around ethical practice

For example, GDPR specifies that personal data should not be retained for longer than necessary. Researchers may thus sometimes opt to remove personal identifiers from a dataset at the end of the project, so that an anonymised version of the data may be preserved. It should be noted, however, that deleting obvious identifiers (names, email addresses, and so on) may not be sufficient to fully anonymise a dataset: it may still be possible to deduce someone’s identity by combining other pieces of information (a postcode and a rare medical condition, for example). Additionally, some types of data, such as video recordings, are very difficult to anonymise adequately. Data creators will need to consider what can realistically be achieved without significantly reducing the value of the dataset, and then plan a suitable preservation strategy in light of this.
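The risk of identifying someone from a combination of details can be illustrated with a simple count of how many records share each combination of indirect identifiers. This is a sketch in the style of a k-anonymity check, a technique not named in this guidance, and the column names and rows are invented for illustration. Passing such a check does not show that a dataset is anonymous; it only highlights the most obvious risks.

```python
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=2):
    """Return combinations of indirect identifiers shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Invented example rows: names have already been removed.
rows = [
    {"postcode": "SA1", "condition": "asthma"},
    {"postcode": "SA1", "condition": "asthma"},
    {"postcode": "SA31", "condition": "rare condition X"},
]

print(risky_combinations(rows, ["postcode", "condition"]))
# prints {('SA31', 'rare condition X'): 1}
```

The single record with a unique postcode and condition stands out even though no name is present, which is the situation described above.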

The questions of which data should be preserved and of which data should be shared with a view to reuse need to be considered separately. Data which needs to be preserved but is not suitable for sharing can be stored in a secure archive. In some cases, it may be appropriate to have multiple versions of a dataset: for example, an anonymised one which can be shared openly, and one retaining more personal information to which access is restricted. Making data available for reuse by others is covered in more detail in the Sharing data section.

Options for Preserving your Data

One of the best ways of preserving research data for the long term is to deposit a copy in a specialist data archive, also referred to as a data repository. A data archive is a place to securely hold digital research materials (data), along with documentation that helps explain what they are and how to use them (metadata). The application of consistent archiving policies, preservation techniques, and discovery tools further increases the long-term availability and usefulness of the data.

A data archive is for stable (completed) versions of the data: it is not a research workspace, or a place for storing data that is still actively being worked on. This means that data is most often deposited towards the end of a research project.

Data archives are designed to store data in an actively curated environment for a significant period, and to disseminate details of that data. They therefore offer significant benefits over attempting to host a preservation dataset on personal or departmental drives: in particular, they relieve the individual researcher of the responsibility of making sure the data remains available, and instead allow this to be handled by a body specialising in the curation of data.

For data which is suitable for reuse, data archives are also one of the best ways of ensuring that it is made available to as wide an audience as possible. Funders and journal publishers often encourage or mandate their use, since archives allow data to be linked to from publications.

UWTSD policy requires researchers to deposit their research data in a suitable subject-based research data archive where available, except where doing so would breach Intellectual Property Rights (IPR) or commercial, ethical, confidentiality, or other obligations, including those under the UK GDPR. To locate an archive related to your discipline, search the Re3data registry of research data repositories or the catalogue available at FAIRsharing.

If a suitable archive is not available, you can archive data to the University Repository.

The deposit should be made upon completion of the research or upon the publication of results, whichever is sooner.

Research data offered to a repository will be supported by a Data Management Plan. The plan should specify the measures taken to comply with the UK GDPR and the Data Protection Act 2018, and should name a contact responsible for any queries about the data being deposited.

If one of the outputs of a research project is a website, it can sometimes be appropriate to host a copy of the data there. However, while this may be an effective way of sharing the data with a wider public, it is not advisable to rely on this as the sole method of preserving data for the long term. Maintaining a website after a project concludes presents a number of challenges, and it is hard to predict how long a project website will remain viable. If at all possible, an additional copy of the data should therefore be deposited in a data archive.

Compared to archival preservation, relying on cloud storage or local storage means you will have to take far more responsibility for preservation and curation to ensure continued accessibility of content over time. Your data will also be significantly less discoverable, and is unlikely to be assigned a DOI. It is therefore strongly recommended that researchers consider depositing a copy of their data in a data archive.

Data Destruction

While there are certain circumstances in which data needs to be deleted, this should never be assumed to be the default option: it should only be done where there are compelling reasons.

If data does need to be deleted, it is essential that this is done properly, with particular concern for confidentiality and security. Everyday deletion practices (for example, moving files to the Recycle Bin and then emptying it) are not sufficient. The UWTSD Records Management Policy should be observed in this respect.

Care must be taken to ensure that all copies of the relevant dataset (including any backup copies) have been identified and dealt with appropriately. Documenting the steps taken may be helpful in case of any future queries.
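As an aid to identifying copies, files can be compared by content rather than by name. The sketch below is an illustration only, with hypothetical function names: it lists files under the given folders whose contents are byte-for-byte identical to a reference file. It will not find copies that have been edited, compressed, or exported to another format, it cannot see locations that are not searched (such as offline backups), and it does not delete anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_copies(reference_file, search_roots):
    """Return files under search_roots with the same content as reference_file."""
    reference = Path(reference_file).resolve()
    target = sha256_of(reference)
    matches = []
    for root in search_roots:
        for candidate in Path(root).rglob("*"):
            # Skip directories and the reference file itself.
            if not candidate.is_file() or candidate.resolve() == reference:
                continue
            if sha256_of(candidate) == target:
                matches.append(candidate)
    return sorted(matches)
```

The list returned by find_copies could be saved alongside a note of the date and the action taken for each file, as a record of the steps followed.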

If you are Leaving the University

If you are leaving UWTSD, it is your responsibility to make arrangements with your Institute or professional service regarding where your data will be stored, and who will have access to it after you leave the University. You may be required to leave a copy of the data in the care of the University for an appropriate period, to ensure legal or other regulatory compliance, or to meet any funder or other contractual requirements.