Creating Your Data

There are many decisions to make about managing your data before you even start creating/collecting it, including choosing hardware and software, and addressing issues with intellectual property rights and ethics. Decisions made at the beginning will affect how you can access, use, or preserve your data in the future.

Research data can exist in many forms, depending on the research area or discipline, including:

Documents (text, Word), spreadsheets
Laboratory notebooks, field notebooks, diaries
Questionnaires, transcripts, codebooks
Audiotapes, videotapes
Photographs, films
Test responses
Results from experiments
Slides, artifacts, specimens, samples
Artistic works, including dance, music performances, recordings, sculpture, painting, design
Collection of digital objects acquired and generated during the process of research
Database contents (video, audio, text, images)
Models, algorithms, scripts
Contents of an application (input, output, log files for analysis software, simulation software, schemas)
Methodologies, workflows
Standard operating procedures and protocols

(list adapted from Leeds University)

Choosing Formats

In planning a research project, it’s important that you consider which file formats you will use to store your data. In some cases, this will be dictated by the software you’re using or the conventions of your discipline, but in other cases you may have to choose between several options. When choosing, consider:

what software and formats you or colleagues have used in past projects.
any discipline-specific norms (and any peer support that comes with them).
what formats will be easiest to share with colleagues for future projects.
what formats are at risk of obsolescence, because of new versions or their dependence on particular software.
what formats it will be possible to open and read in the future.
what formats will be easiest to annotate with metadata so that you and others can interpret them days, months, or years in the future.

What formats are best for preserving files in the long term?

Popular formats such as those produced by Microsoft Office products (e.g. Word documents or Excel spreadsheets) are likely to have reasonable longevity, but be aware that they are proprietary (owned by someone) and so will not necessarily exist forever or remain easily readable. You may be better off storing important information in open, non-proprietary formats – for example, PDF/A rather than Microsoft Word, CSV rather than Excel, TIFF rather than Photoshop files, or as XML rather than a database.
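
As a minimal sketch of the CSV option mentioned above, the following Python example writes a small table to CSV text and reads it back, using only the standard library. The column names and values are invented for illustration and are not taken from any real dataset.

```python
import csv
import io

# Hypothetical example data; column names and values are assumptions for illustration.
rows = [
    {"sample_id": "S001", "collected": "2010-08-16", "mass_g": "12.5"},
    {"sample_id": "S002", "collected": "2010-08-17", "mass_g": "11.9"},
]


def to_csv_text(records):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (every value comes back as a string)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(rows)
restored = from_csv_text(csv_text)
```

Because CSV is plain text, it can be opened in any text editor, but it does not store formulas, cell formatting, or multiple sheets. If those matter, keep the original spreadsheet alongside the CSV copy.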

What image format should I use?

Some image formats are better for particular purposes than others. For example, TIFFs preserve digital image information well, but most internet browsers cannot display them and they take up a lot of computer storage space. Taking this into consideration, TIFF image files make suitable master copies for archival purposes, particularly if the image content is important. For smaller images which are to be used for web delivery and for embedding in documents, JPEG format is suitable. JPEGs use 'lossy' compression, which keeps the files from being too large. Each time a JPEG image is edited and saved again, it is recompressed and loses some of its information, so over repeated saves the image quality degrades. For this reason, JPEGs are not considered suitable for archival purposes.

The link below leads to the Digital Preservation Coalition's Handbook, which provides useful information on all aspects of digital preservation.

Digital Preservation Handbook - File formats and standards
A chapter on file formats and standards, from the Digital Preservation Handbook, by the Digital Preservation Coalition.
Citation: Digital Preservation Handbook, 2nd Edition, http://handbook.dpconline.org/ Digital Preservation Coalition © 2015 licensed under the Open Government Licence v3.0.

Organising your data

Once you create, gather, or start manipulating data and files, they can quickly become disorganised. To save time and prevent errors later on, you and your colleagues should decide how you will name and structure files and folders. Including documentation (or 'metadata') will allow you to add context to your data so that you and others can understand it in the short, medium, and long term. Good metadata should be both computer- and human-readable.
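
One way to make metadata both computer- and human-readable is to store it as a small, indented JSON file next to the data. The sketch below is illustrative only: the field names and values are assumptions, not a formal metadata standard, and your discipline may already have a schema you should use instead.

```python
import json

# Hypothetical metadata record; field names and values are assumptions for illustration.
metadata = {
    "title": "Interview transcripts, pilot study",
    "creator": "Research team (placeholder)",
    "date_created": "2010-08-16",
    "description": "Transcripts of ten pilot interviews.",
    "file_format": "plain text",
    "related_files": ["2010-08-16_interview_001.txt"],
}


def metadata_to_json(record):
    """Render the record as indented JSON so people can read it and software can parse it."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


metadata_text = metadata_to_json(metadata)
```

The resulting text could be saved as a sidecar file (for example, a file with the same name as the dataset plus a `.json` extension) so that the context travels with the data.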

Naming and Organising Files

Agreeing on a logical and consistent naming convention at the beginning of your project will make it easier to find and correctly identify your files, prevent version control problems when working on files collaboratively, and generally prevent errors in research. Organising your files carefully will save you time and frustration and prevent duplication or errors by helping you and your colleagues find what you need when you need it.

Use folders ‐ group files within folders so information on a particular topic is located in one place.
Adhere to existing procedures ‐ check for established approaches in your team or department which you can adopt.
Name folders appropriately ‐ name folders after the areas of work to which they relate and not after individual researchers or students. This avoids confusion in shared workspaces if a member of staff leaves, and makes the file system easier for new staff or subsequent projects to navigate.
Be consistent ‐ once you have decided on a naming scheme for your folders, stick to it. If you can, agree on the scheme at the outset of your research project.
Structure folders hierarchically ‐ start with a limited number of folders for the broader topics, and then create more specific folders within these.
Separate ongoing and completed work ‐ as you amass folders and files, separate your old documents from those you are currently working on. Try to keep your ‘My Documents’ folder for files you're actively working on and, every month or so, move the files you're no longer working on to a different folder or location, such as a folder on your desktop, a special archive folder or an external hard drive.
Back up ‐ ensure that your files, whether they are on your local drive or on a network drive, are backed up.
Review records ‐ assess materials regularly or at the end of a project to ensure files aren’t kept needlessly. Put a reminder in your calendar so you don't forget!
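
The advice above on hierarchical folders can be sketched as a short Python script that creates an agreed folder skeleton at the start of a project. The folder names here are hypothetical; substitute whatever structure your team or department has agreed on.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: a few broad folders, with more specific folders inside each.
FOLDER_LAYOUT = {
    "data": ["raw", "processed"],
    "documentation": ["protocols", "consent_forms"],
    "outputs": ["figures", "reports"],
    "archive": [],  # completed work is moved here, away from ongoing work
}


def create_project_folders(root, layout=FOLDER_LAYOUT):
    """Create the folder hierarchy under root and return every folder path."""
    root = Path(root)
    created = []
    for top_level, subfolders in layout.items():
        top_path = root / top_level
        top_path.mkdir(parents=True, exist_ok=True)
        created.append(top_path)
        for name in subfolders:
            sub_path = top_path / name
            sub_path.mkdir(exist_ok=True)
            created.append(sub_path)
    return created


# Usage: build the structure in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    paths = create_project_folders(Path(tmp) / "project_x")
    relative = sorted(p.relative_to(tmp).as_posix() for p in paths)
```

Running the same script for each new project gives every project the same layout, which makes it easier for new colleagues to find their way around.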

What do I need to consider when creating a file name?

It is useful if your department/project agrees on the following elements of a file name:

Vocabulary – choose a standard vocabulary for file names, so that everyone uses a common language.
Punctuation – decide on conventions for if and when to use punctuation symbols, capitals, hyphens and spaces.
Dates – agree on a logical use of dates so that they display chronologically, i.e. YYYY-MM-DD.
Order – confirm which element should go first, so that files on the same theme are listed together and can therefore be found easily.
Numbers – specify the number of digits that will be used in numbering so that files are listed numerically, e.g. 01, 02 or 001, 002.
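
To show how these elements combine, here is a minimal Python sketch that builds file names from a date, a term from an agreed vocabulary, and a zero-padded number. The vocabulary, the underscore separator, and the choice to put the date first are all assumptions for illustration; your project may agree on a different order.

```python
from datetime import date

# Hypothetical controlled vocabulary; a real project would agree on its own terms.
ALLOWED_TOPICS = {"interview", "survey", "fieldnotes"}


def build_file_name(created, topic, sequence, extension, digits=3):
    """Build a name such as 2010-08-16_interview_002.txt from its agreed elements."""
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"unknown topic: {topic!r}")
    date_part = created.isoformat()            # always YYYY-MM-DD
    number_part = str(sequence).zfill(digits)  # fixed width, e.g. 002
    return f"{date_part}_{topic}_{number_part}.{extension}"


names = [
    build_file_name(date(2010, 8, 16), "interview", 10, "txt"),
    build_file_name(date(2009, 12, 1), "interview", 2, "txt"),
    build_file_name(date(2010, 8, 16), "interview", 2, "txt"),
]

# A plain alphabetical sort now also sorts by date, then by number.
ordered = sorted(names)
```

Because the year comes first and the numbers have a fixed width, an ordinary alphabetical listing in a file browser shows the files in chronological and numerical order.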

How should I name my files, so that I know which document is the most recent version?

Very few documents are drafted by one person in one sitting. More often there will be several people involved in the process and it will occur over an extended period of time. Without proper controls this can quickly lead to confusion as to which version is the most recent. Here is a suggestion of one way to avoid this happening:

Use a 'revision' numbering system. Any major changes to a file can be indicated by whole numbers: for example, v1 would be the first version and v2 the second version. Minor changes can be indicated by increasing the decimal figure: for example, v1.01 indicates that a minor change has been made to the first version, and v3.01 that a minor change has been made to the third version.

When draft documents are sent out for amendment, they should carry additional information when they are returned, to identify the individual who has made the amendments. Example: a file with the name 20100816_dataman_v1_sj indicates that a colleague (sj) has made amendments to the first version on 16 August 2010. The lead author would then add those amendments to version v1 and rename the file following the revision numbering system.

Include a 'version control table' in each important document, noting changes and their dates alongside the appropriate version number of the document. If helpful, you can include the file names themselves along with (or instead of) the version number.

Agree who will finalise documents, marking them as 'final.'
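
The revision numbering described above can be checked and applied with a short script. The sketch below parses names written like the example 20100816_dataman_v1_sj and works out the next version label. The exact pattern (an eight-digit date, a lowercase title, a two-digit minor number, optional lowercase initials) is an assumption based on that single example, so adjust it to match your own convention.

```python
import re

# Assumed pattern, based on names like "20100816_dataman_v1_sj" or "20100816_dataman_v1.01".
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<title>[a-z0-9]+)"
    r"_v(?P<major>\d+)(?:\.(?P<minor>\d{2}))?"
    r"(?:_(?P<editor>[a-z]+))?$"
)


def parse_name(stem):
    """Split a file name (without extension) into date, title, version and editor."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"name does not follow the convention: {stem!r}")
    parts = match.groupdict()
    return {
        "date": parts["date"],
        "title": parts["title"],
        "major": int(parts["major"]),
        "minor": int(parts["minor"] or 0),
        "editor": parts["editor"],  # None when no initials are present
    }


def next_version(major, minor, major_change):
    """Whole numbers mark major changes; the decimal part marks minor ones."""
    return (major + 1, 0) if major_change else (major, minor + 1)


def format_version(major, minor):
    """Render v1, v2, ... for major versions and v1.01, v1.02, ... for minor ones."""
    return f"v{major}" if minor == 0 else f"v{major}.{minor:02d}"


returned = parse_name("20100816_dataman_v1_sj")
minor_label = format_version(*next_version(returned["major"], returned["minor"], False))
major_label = format_version(*next_version(returned["major"], returned["minor"], True))
```

Here the returned draft is recognised as version 1 with amendments by sj, and the lead author can choose between the next minor label and the next major label when renaming the merged file.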

What are the benefits of sharing my data?

Many researchers fear that by sharing their data they will lose their competitive edge, that others will misinterpret or misuse their data or that their research methods will be open to scrutiny. However, there are also benefits to be gained through sharing your data. For example, it:

allows independent validation of results
increases the impact and visibility of research
makes best use of investment by avoiding replication
leads to new collaborations and partnerships
advances research when datasets are combined in new and innovative ways

If you plan for data sharing from the beginning of your project, you can decide on a method of providing access that you are comfortable with.

Are there occasions when I shouldn’t share my data?

Issues of intellectual property rights, commercial potential or of privacy can all affect whether you can or should share your data.

Sensitive and confidential data can, however, often be shared ethically if informed consent for data sharing has been given, subjects' identities are anonymised (if needed) or consideration is given to access restrictions.

These measures should be planned from the beginning of your research to ensure that you are not limiting future opportunities to share your data.

The UK Data Archive has an excellent guide on consent, confidentiality and ethics as part of their Managing and Sharing Data guide, and they provide brief guidance and tool recommendations for Anonymisation.
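
As a rough illustration of one early step towards anonymisation, the Python sketch below replaces participant names with sequential codes. This is an assumption-laden example rather than a recommended tool: the field names and sample records are invented, and replacing names alone is pseudonymisation, not full anonymisation, because other details in the data may still identify someone. Follow the guidance above and your ethics approval for real data.

```python
def assign_participant_codes(names, prefix="P", digits=3):
    """Map each distinct name to a sequential code, in order of first appearance."""
    codes = {}
    for name in names:
        if name not in codes:
            codes[name] = f"{prefix}{len(codes) + 1:0{digits}d}"
    return codes


def replace_identifiers(records, codes, field="name"):
    """Return copies of the records with the identifying field swapped for its code."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy["participant"] = codes[copy.pop(field)]
        cleaned.append(copy)
    return cleaned


# Invented sample records for illustration only.
records = [
    {"name": "Alex Example", "response": "agree"},
    {"name": "Sam Sample", "response": "disagree"},
    {"name": "Alex Example", "response": "neutral"},
]

key_table = assign_participant_codes(r["name"] for r in records)
shareable = replace_identifiers(records, key_table)
```

The key table that links names to codes is itself confidential. It should be stored separately from the shareable records, with restricted access, or destroyed if re-identification will never be needed.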

Sharing data with collaborators

Please note: The University does not authorise or approve the use of Dropbox. It should never be used for confidential, personal or sensitive data.

Digital Services recommend Microsoft Teams as a secure space for data(set) and document storage, and as an online location to interact with colleagues. For example, a Teams site can help you:

Coordinate projects, calendars and schedules.
Discuss ideas, and review documents or proposals.
Store and share research data with international collaborators.

Contact Digital Services for advice on using the service to safely collaborate with research partners.

Eduroam
Eduroam (education roaming) provides university members with secure wireless internet access at other participating institutions worldwide (including institutions in Europe, Asia-Pacific, and North America).
UK Data Service Impact Case Studies
Real examples of impact achieved through sharing data in a range of sectors

Contact Us

📍 Where to find us:

Lanchester Library
Coventry University
Frederick Lanchester Building
Gosford Street
Coventry, United Kingdom
CV1 5DD

📞 Phone:

024 7765 7568

✉️ Email:

oa.lib@coventry.ac.uk
rdm.lib@coventry.ac.uk
