There are many decisions to make about managing your data before you even start creating/collecting it, including choosing hardware and software, and addressing issues with intellectual property rights and ethics. Decisions made at the beginning will affect how you can access, use, or preserve your data in the future.
Research data can exist in many forms, dependent on research area / discipline, including:
(list adapted from Leeds University)
In planning a research project, it’s important that you consider which file formats you will use to store your data. In some cases, this will be dictated by the software you’re using or the conventions of your discipline, but in other cases you may have to make a choice between several options:
Popular formats such as those produced by Microsoft Office products (e.g. Word documents or Excel spreadsheets) are likely to have reasonable longevity, but be aware that they are proprietary (owned by someone) and so will not necessarily exist forever or remain easily readable. You may be better off storing important information in open, non-proprietary formats – for example, PDF/A rather than Microsoft Word, CSV rather than Excel, TIFF rather than Photoshop files, or as XML rather than a database.
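As a minimal sketch of the "CSV rather than Excel" suggestion, the following uses only Python's standard library to write tabular data as plain CSV text and read it back. The function name `rows_to_csv` and the example column names and values are hypothetical, chosen purely for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example data, for illustration only.
observations = [
    {"sample_id": "S001", "date": "2010-08-16", "ph": "7.2"},
    {"sample_id": "S002", "date": "2010-08-17", "ph": "6.9"},
]
csv_text = rows_to_csv(observations, ["sample_id", "date", "ph"])

# Reading the text back shows that the values survive a round trip
# without any proprietary software.
restored = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that CSV stores values only. Formulas, formatting and multiple worksheets are not preserved, so anything of that kind needs to be recorded in your documentation.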
Some image formats are better for particular purposes than others. For example, TIFFs preserve digital image information well, but users cannot view them with internet browsers and they take up a lot of computer storage space. Taking this into consideration, TIFF image files make suitable master copies for archival purposes, particularly if the image content is important. For smaller images which are to be used for web delivery and for embedding in documents, the JPEG format is suitable. JPEGs use 'lossy' compression, which keeps the files from being too large. Each time a JPEG image is edited and re-saved, it is compressed again and loses some of its information, so over time the image quality degrades. For this reason, JPEGs are not considered suitable for archival purposes.
The link below directs to the Digital Preservation Coalition's Handbook, which provides useful information on all aspects of digital preservation.
Once you create, gather, or start manipulating data and files, they can quickly become disorganised. To save time and prevent errors later on, you and your colleagues should decide how you will name and structure files and folders. Including documentation (or 'metadata') will allow you to add context to your data so that you and others can understand it in the short, medium, and long term. Good metadata should be both computer- and human-readable.
Agreeing on a logical and consistent naming convention at the beginning of your project will make it easier to find and correctly identify your files, prevent version control problems when working on files collaboratively, and generally prevent errors in research. Organising your files carefully will save you time and frustration and prevent duplication or errors by helping you and your colleagues find what you need when you need it.
It is useful if your department/project agrees on the following elements of a file name:
Very few documents are drafted by one person in one sitting. More often there will be several people involved in the process and it will occur over an extended period of time. Without proper controls this can quickly lead to confusion as to which version is the most recent. Here is a suggestion of one way to avoid this happening:
Use a 'revision' numbering system. Any major changes to a file can be indicated by whole numbers; for example, v1 would be the first version and v2 the second version. Minor changes can be indicated by increasing the decimal figure; for example, v1.01 indicates that a minor change has been made to the first version, and v3.01 that a minor change has been made to the third version.
When draft documents are sent out for amendment, they should carry additional information on their return to identify the individual who has made the amendments. Example: a file with the name 20100816_dataman_v1_sj indicates that a colleague (sj) has made amendments to the first version on 16 August 2010. The lead author would then add those amendments to version v1 and rename the file following the revision numbering system.
Include a 'version control table' in each important document, noting changes and their dates alongside the appropriate version number of the document. If helpful, you can include the file names themselves along with (or instead of) the version number.
Agree who will finalise documents, marking them as 'final.'
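The naming and revision scheme suggested above can be sketched in a few lines of Python. This is an illustration only: the exact pattern (date, short label, version, optional initials, all separated by underscores) is inferred from the single example 20100816_dataman_v1_sj, and the function names `build_name`, `parse_name` and `next_version` are hypothetical. Your project may agree different elements.

```python
import re
from datetime import date, datetime

# Assumed pattern: YYYYMMDD_label_vMAJOR[.MINOR][_initials]
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<label>[a-z0-9]+)_v(?P<major>\d+)"
    r"(?:\.(?P<minor>\d{2}))?(?:_(?P<initials>[a-z]+))?$"
)


def build_name(day, label, major, minor=0, initials=None):
    """Build a file name such as 20100816_dataman_v1_sj (no extension)."""
    version = f"v{major}" if minor == 0 else f"v{major}.{minor:02d}"
    parts = [day.strftime("%Y%m%d"), label, version]
    if initials:
        parts.append(initials)
    return "_".join(parts)


def parse_name(name):
    """Split a file name back into its agreed elements."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised file name: {name!r}")
    fields = match.groupdict()
    return {
        "date": datetime.strptime(fields["date"], "%Y%m%d").date(),
        "label": fields["label"],
        "major": int(fields["major"]),
        "minor": int(fields["minor"] or 0),
        "initials": fields["initials"],
    }


def next_version(major, minor, major_change):
    """Major changes bump the whole number; minor changes bump the decimal."""
    return (major + 1, 0) if major_change else (major, minor + 1)


returned_draft = build_name(date(2010, 8, 16), "dataman", 1, initials="sj")
parsed = parse_name(returned_draft)
# The lead author merges the colleague's amendments as a minor revision.
major, minor = next_version(parsed["major"], parsed["minor"], major_change=False)
merged_draft = build_name(date(2010, 8, 17), "dataman", major, minor)
```

Checking names against one agreed pattern makes it easy to spot files that have drifted from the convention before they cause version confusion.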
Many researchers fear that by sharing their data they will lose their competitive edge, that others will misinterpret or misuse their data, or that their research methods will be open to scrutiny. However, there are also benefits to be gained through sharing your data. For example, it:
If you plan for data sharing from the beginning of your project, you can decide on a method of providing access that you are comfortable with.
Issues of intellectual property rights, commercial potential or of privacy can all affect whether you can or should share your data.
Sensitive and confidential data can, however, often be shared ethically if informed consent for data sharing has been given, subjects' identities are anonymised (if needed) or consideration is given to access restrictions.
These measures should be planned from the beginning of your research to ensure that you are not limiting future opportunities to share your data.
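As one hedged illustration of preparing records for sharing, the sketch below removes direct identifiers and replaces a participant ID with a keyed hash. Strictly speaking this is pseudonymisation rather than full anonymisation: indirect identifiers (such as rare job titles or small geographic areas) can still allow re-identification, so it does not replace the guidance referred to in this section. The field names, the `pseudonymise` function and the 12-character pseudonym length are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical column names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymise(record, secret_key, id_field="participant_id"):
    """Return a copy of `record` without direct identifiers.

    The original ID is replaced by a keyed hash (HMAC-SHA256), so the same
    participant always maps to the same pseudonym, but the mapping cannot
    be rebuilt without the secret key.
    """
    digest = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != id_field
    }
    cleaned["pseudonym"] = digest[:12]
    return cleaned


# Hypothetical record and key, for illustration only.
raw = {
    "participant_id": 42,
    "name": "A. Example",
    "email": "a@example.org",
    "age_band": "30-39",
}
shared = pseudonymise(raw, secret_key=b"keep-this-key-private")
```

The secret key should be stored separately from the shared data and with restricted access, since anyone holding it can link pseudonyms back to the original IDs.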
The UK Data Archive has an excellent guide on consent, confidentiality and ethics as part of their Managing and Sharing Data guide, and they provide brief guidance and tool recommendations for Anonymisation.
Please note: The University does not authorise or approve the use of DropBox. It should never be used for confidential, personal or sensitive data.
Digital Services recommend Microsoft Teams as a secure space for data(set) and document storage, and as an online location to interact with colleagues. For example, a Teams site can help you:
Contact Digital Services for advice on using the service to safely collaborate with research partners.
Your data are only truly open if other people can access them, understand them and reuse them. It is recognised good research practice by researchers and institutions to manage and retain data, fulfilling any legal requirements that may exist following the conclusion of research projects. This requires active preservation to ensure that the files continue to be readable over the long term, making this an important feature of the research data lifecycle. You should ensure that the repository you choose has active preservation procedures for digital data curation. The Digital Curation Centre highlight data preservation as a key aspect to consider when planning a new research project, particularly with data that are unique and irreplaceable if destroyed or lost. Without the ability to refer to verifiable data, your research may not be judged as sound.
There are numerous Digital Repositories and data centres with varying content types (e.g. articles, data sets, images, etc) and disciplinary foci. The majority of them share data openly with the public, or the research community.
OpenDOAR (Directory of Open Access Repositories) maintains an online list of open access digital repositories, and has a content search tool.
re3data.org is the Registry of Research Data Repositories, providing a global registry of data repositories from different academic disciplines, and its use is particularly recommended in the European Commission’s “Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020”.
Online stores of discipline or subject-specific data ('data centres') abound, but there is currently no definitive list of these.
Some examples of popular data centres include:
At Coventry University, we have the option of utilising the institutional repository for the storage of data and datasets. A short demonstration video 'Uploading Datasets to the Pure Repository' shows the process of adding data to the repository. On validation, deposited datasets will be automatically archived for long-term storage by Preservica, which is connected to the repository. View the Adding a Dataset to Pure tab for the Repository's Terms of Use and Take Down Notice.
Raise the impact of your research. Digital repositories allow you to make data easily accessible to more people than ever before. The more people who can use your data, the more public good it can do and the more it can do to enrich your field of research. Open online access makes new collaborations and uses of data possible. In some areas (e.g. archaeological excavation data), the data are often unique and many researchers feel a moral obligation to make them available to others (and, of course, to ensure their long-term preservation).
Raise your research profile. The more other researchers cite your data, the more they will know and admire your work. As the trend toward online open access rises, the prestige associated with data citations is growing. In addition, making some data available can increase the credibility of your analyses.
Keep your data safe and readable in the long term. Many researchers hold on to an old computer from a decade or two ago because it is the only way to access their old files, created in formats that are now obsolete. Once these computers break, the files are essentially lost. Many repositories store and back up your treasured research products and will, if appropriate file formats are used, attempt to move the data into new file formats as the original formats become obsolete. So long as the repository exists, your materials will remain readable and usable.
Your funder may require it. This is more and more common. You can find a summary of funders’ open access requirements using the SHERPA/JULIET database. Even if your funder does not require that you deposit your data, a plan to deposit your data may strengthen your bid.
If I published my paper/data in a peer-reviewed journal, can I still deposit it in an open digital repository?
This depends on the journal (especially for papers), but the majority do allow it. Contact your journal for more information, or find summary information on journals’ copyright policies using the SHERPA/RoMEO database.
You can also ask your repository support team for help with this. Coventry University's Research and Scholarly Publications Team is happy to help you find an answer, regardless of your target repository.
Choosing what to keep and what can be disposed of or deleted is always going to involve a subjective judgement, as nobody knows exactly what information is going to be wanted in the future.
All we can do is think the matter through carefully, abide by the policies we need to (e.g. from funders) and document decisions made and the reasons for them. It won’t be a perfect process, but should at least be a sensible one.
There are some good reasons why selection is worth doing:
The following questions, based on material devised by the Digital Curation Centre, can help you decide what you should keep and what can be deleted:
Once you've sorted through your files and asked these questions you then need to:
Follow these instructions to upload your data to Coventry University's institutional repository.
1. Navigate to [link to be shared when the activity is in progress]
2. Log in with username and password.
3. To add a new dataset, click on the green 'Add new' button in the top right-hand corner.
4. Select 'Dataset' from the submission options.
5. Start adding metadata to the record, including the title of the data and a description. If the data have been collected over a set period of time, add the dates in.
6. Add the names of the people involved in the data collection / creation to the record, including the role that they played in the research activity and the location in which they work.
7. Include the Organisational Unit by which the dataset is managed (this is usually the same as the location in which the researcher works).
8. Under 'Data availability', add the publisher(s) and if available, the Digital Object Identifier.
9. Upload the dataset and associated files in the 'Electronic data' box, together with any useful information to accompany the files and the date that the dataset and files were made available. Be sure to include the correct reuse license as required by the funder / university.
10. Select the access options to the dataset - this will depend on the funder's and/or the University's open data requirements so it is important to refer to their documentation.
11. Add contact details so that others can request further information about the dataset, if required.
12. Include other useful information, if it is appropriate to the research (temporal coverage, geo location, legal/ethical).
13. Ensure the Visibility is set to 'Public - No restriction' and save the record for validation. It will then be checked by the Research and Scholarly Publications Team to ensure that the metadata details fulfil funder / university requirements.
To ensure that you understand your own data and that others may find, use and properly cite your data, it helps to add 'documentation' or 'metadata' (data about data) to the documents and datasets you create. This encompasses all the information necessary to interpret, understand and use a given dataset or set of documents.
It is good practice to begin to document your data at the very beginning of your research project and continue to add information as the project progresses. Include procedures for documentation in your data planning. There are a number of ways you can add documentation to your data:
Information about a file or dataset can be included within the data or document itself. For digital data sets, this means that the documentation can sit in separate files (for example text files) or be integrated into the data file(s), as a header or at specified locations in the file. Examples include:
This is information in separate files that accompanies data in order to provide context, explanation, or instructions on confidentiality and data use or reuse. Examples include:
The addition of a README file to a metadata record can supplement information relating to data. A template file is available via the Adding a Dataset to Pure tab.
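To illustrate what a README might contain, here is a minimal sketch that renders a handful of descriptive fields as plain text. The headings and values are hypothetical examples and are not taken from the template mentioned above, which may use different fields. The function name `render_readme` is likewise an assumption.

```python
def render_readme(fields):
    """Render a mapping of heading -> single-line text as a plain-text README."""
    lines = [f"{heading}: {text}" for heading, text in fields.items()]
    return "\n".join(lines) + "\n"


# Hypothetical headings and values, for illustration only.
readme_fields = {
    "Title": "Example dataset title",
    "Creators": "Name of each person involved and their role",
    "Date collected": "Start date to end date",
    "Files": "List of file names with a one-line description of each",
    "Methods": "How the data were created or collected",
    "Licence": "Reuse licence required by the funder / university",
    "Contact": "Who to ask for further information",
}
readme_text = render_readme(readme_fields)
```

The resulting text can be saved as a plain .txt file alongside the data so that it remains readable without specialist software.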
This is structured information which can be used to identify and locate the data that meet the user's requirements via a web browser or web based catalogue. Catalogue metadata is usually structured according to an international standard and associated with the data by repositories or data centres when materials are deposited with them. Examples are:
You may be creating and collecting entirely new data for your project, but you can often draw on a wealth of data already available to complement or enrich your own research. Given the proper attention to Intellectual Property Rights (IPR), Data Protection and ethics, you may be able to process existing raw data to create entirely new research outputs.
There are many reasons why you may wish to reuse data:
When reusing existing data, you must ensure that you use the data in the way that the data owner has specified. Authors and creators of outputs often use Creative Commons licenses to grant clear, standardised permissions for the reuse of their works. Those interested in reusing the data must then follow the license conditions.
The following table visually explains Creative Commons Licenses and their permissions.
The following links will help you locate data repositories that may be of use to you.
Intellectual Property Rights (IPR) (e.g. copyright, patents, etc) affect the way both you and others can use your research outputs.
Failure to clarify rights at the start of the research process can lead to unexpected limitations to:
It can also cause you legal trouble.
Further information on IPR can be found on the University's IPR webpages
Frequently Asked Questions
Are research data or data derivatives protected by copyright law?
Copyright law sometimes protects data and other research products (provided that you share them with the proper copyright statement or end-user agreement), but it depends on the nature of your data or files.
The University has a Copyright webpage, which provides information and contacts for who to consult on copyright questions in various situations (e.g. research grants and funding, commercialisation and intellectual property, etc).
A seminar was held in 2011 (hosted by CRASSH and the Incremental Project). Andrew Charlesworth (Centre for IT & Law, University of Bristol) gave a presentation that addressed some copyright issues:
'Intellectual Property Rights and Research Data - Focus on copyright' [32 mins 6 secs].
He also participated in a short interview on the same subject [2 mins 34 secs].
What are my intellectual property rights with regard to research data at Coventry University?
This depends on whether you are a student, post-doc, PI/project director, your relationship with the University, your role in the project, and your agreement with other parties (funders, study participants, corporate partners, etc). Advice can be sought via the University's webpages and by contacting colleagues listed below, under the 'Who can help me with Copyright and IPR?' question.
Can I use materials that I find online?
It depends on how those materials are licensed. IPR is usually in play, even if you don't see a "©" or 'all rights reserved' notice. When in doubt, contact the University Copyright Officer (contact information in FAQ below) for advice, or ask the website administrator or publisher who distributed the content for permission directly.
The Web2Rights project has produced a useful IPR & Legal Issues Toolkit for the web.
How can I make it easier for others to re-use the materials that I produce?
One relatively simple way to make it easier for others to re-use tools, data, or other content that you produce is to add a Creative Commons license.
For example, ‘By-Attribution, Non-Commercial’ is a common Creative Commons license. When you mark your file, image, or information with this, it means that anyone can use your information in any way they like, so long as they attribute it to you and don’t use it for commercial purposes. Creative Commons licenses are often used for materials released online, but you can also include these in printed materials if you don't have a publisher who owns the rights. For additional information and Creative Commons license options, visit www.creativecommons.org.
To license something with a Creative Commons license, you don't need to file any paperwork -- just publish (in print or on the web) your materials along with a notification that you are using a particular license.
IMPORTANT NOTE: Creative Commons licenses are 'irrevocable', so don't add a Creative Commons license unless you are sure that (1) you have the right to publish this information, and (2) you won't want to revoke it later on for any reason.
Who can help me with Copyright and IPR?
For information on Copyright, please contact:
Phil Brabban
University Librarian and Group Director of Learning Resources, LIB
Telephone: 024 7688 7519
E-mail: p.brabban@coventry.ac.uk
For general questions on IPR or to discuss Intellectual Property Disclosure Forms, contact:
Mandy Tipple
Business Development Support Office
Mobile: 07974 98 4387
E-mail: ipr@coventry.ac.uk
Brian More
Director of IP Services
Mobile: 07974 98 4928
E-mail: ipr@coventry.ac.uk
For questions touching on commercialisation, contact:
Tim Francis
IPR Commercialisation Manager
Mobile: 07557 42 5047
E-mail: ipr@coventry.ac.uk
What rights do other people have to request my work - i.e. Freedom of Information Act (FOI)?
The Freedom of Information Act of 2000 (FOIA) gives all members of the public the right to request any information produced with public money, but there are some exemptions.
For information about FOI at Coventry, see the CU FOI page.
Further Reading
Web2Rights IPR & Legal Issues Toolkit: Information on intellectual property rights pertaining to Web 2.0 internet resources.
Alex Ball has created a presentation for the Digital Curation Centre/University of Bath on Derestricting Datasets: How to License Research Data
As members of a publicly-funded university, you may receive requests for information under the Freedom of Information Act 2000 (FOI) or Environmental Information Regulations 2004 (EIR).
Deadline
Once the University has received an information request, it has 20 working days to respond to an FOI request and up to 40 for an EIR request. FOI and EIR include a number of exemptions and exceptions, respectively, against disclosure. This is because the legislation recognises that not all official information ought to be disclosed, for example where information is confidential, sensitive or personal. If you are unsure about disclosure, consult the University's FOI officer at foia@coventry.ac.uk.
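As a rough worked example of the deadline arithmetic, the sketch below counts forward a number of working days from the date a request is received. It makes two simplifying assumptions: the count starts on the first working day after receipt, and only weekends are skipped, so bank holidays are ignored. The function name `add_working_days` is hypothetical. For an actual request, confirm the deadline with the University's FOI officer.

```python
from datetime import date, timedelta


def add_working_days(received, working_days):
    """Return the date `working_days` weekdays after `received`.

    Assumption: weekends are skipped, bank holidays are NOT accounted for.
    """
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


# Hypothetical request received on Monday 3 June 2024.
foi_deadline = add_working_days(date(2024, 6, 3), 20)  # 2024-07-01
eir_deadline = add_working_days(date(2024, 6, 3), 40)  # 2024-07-29
```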
Article Processing Charge (APC) - Fee which may be payable to the publisher to publish via the gold open access route. When an article is published in a traditional subscription journal, the author pays an APC to make their individual article freely available from the journal website, without restriction or charge to the reader.
CC-BY Licence - Creative Commons Attribution Licence. This is the most liberal of the CC licences. As long as the original author(s) receive attribution, this allows anyone to copy, distribute or transmit the research, adapt the research and make commercial use of the research. RCUK requires that this licence be used if the gold open access route is selected. The Wellcome Trust encourages its use, and will cover the costs of any APC where an article is published under this licence.
Corresponding Author - The author responsible for manuscript correction, correspondence during submission, handling of revisions and re-submission of the revised manuscript. On acceptance of the manuscript, the corresponding author is responsible for co-ordinating any application for payment of a Gold Open Access Article Processing Charge (APC).
Creative Commons Licences - Creative Commons licences can be used in open access publishing to help authors retain copyright while allowing others to copy, distribute, and make use of their work. There are several different Creative Commons licences, which allow different types of re-use. See the Creative Commons website.
CURVE Open - CURVE Open is the University's repository for educational resources and open access items other than research publications. The aim of this open access institutional repository is to showcase University research and teaching, increasing accessibility to, and raising the visibility of, our authors' work.
Open Access - Open access is the practice of providing free, unlimited online access to scholarly works and research outputs in a digital format, with limited restrictions on re-use. A key driver behind OA has been to make publicly-funded research accessible to tax-payers.
FL320, Lanchester Library
Coventry University
Frederick Lanchester Building
Gosford Street
Coventry, United Kingdom
CV1 5DD
Open Access and Pure Deposits - oa.lib@coventry.ac.uk
Research Data Management - rdm.lib@coventry.ac.uk
024 7765 7568
We hold regular drop-ins at all of our Research Centres. To find the next one, please see the full calendar of our drop-ins.