What is data documentation?
Data documentation provides the contextual information needed to discover, understand, access and reuse research data. Without this information it may be impossible for future users, including yourself, to understand your data.
Why document data?
Data documentation is essential for the reproducibility and replication of research findings and the re-analysis of data. Ensuring that data are adequately documented and described supports research transparency and facilitates data sharing and reuse. Documenting your data also minimises the risk of your data being misused or misinterpreted.
When should I document my data?
You should begin documenting your data at the start of your project and continue adding information as you go. It is much easier to record information as the project progresses than to try to remember later what you have done.
What information should I include?
Data can be described at different levels:
Project-level documentation
Project-level documentation provides information about the aims of the study, what the research questions were, methods of data collection, instruments used, how the data were processed, who collected the data and when, and how the data can be accessed.
File-level documentation
File-level documentation provides descriptions of the contents of a folder or dataset including details of data types, file formats used, and relations between files contained in the folder or dataset. A README.txt file is a form of documentation commonly used for this purpose.
Variable-level documentation
Variable-level documentation provides definitions and explanations of variables, values, units of measurement, missing values and any other codes or abbreviations used. This information can be embedded within a data file, documented separately in a data dictionary or codebook, or included in a README file.
Source: MANTRA (Research Data Management Training): Documentation and metadata
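As a minimal sketch of variable-level documentation kept in a machine-checkable form, the Python below stores a small data dictionary (definition, type, unit, permitted values and missing-value code for each variable) and uses it to flag values in a CSV file that the dictionary does not account for. The variable names (`age`, `smoker`), the `-99` missing-value code, the dictionary layout and the `check_row` function are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable, recording its definition,
# data type, unit of measurement, permitted values and missing-value code.
DATA_DICTIONARY = {
    "age": {
        "definition": "Age of participant at interview",
        "type": int,
        "unit": "years",
        "range": (18, 99),
        "missing": "-99",
    },
    "smoker": {
        "definition": "Current smoking status",
        "type": int,
        "unit": None,
        "codes": {"0": "No", "1": "Yes"},
        "missing": "-99",
    },
}


def check_row(row: dict) -> list[str]:
    """Return the problems found when checking one data row against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in row:
            problems.append(f"{name}: variable missing from file")
            continue
        raw = row[name]
        if raw == spec["missing"]:
            continue  # a documented missing-value code is not an error
        if "codes" in spec:
            # Categorical variable: every value must be a documented code.
            if raw not in spec["codes"]:
                problems.append(f"{name}: undocumented code {raw!r}")
        elif "range" in spec:
            # Numeric variable: value must parse and fall within the documented range.
            try:
                value = spec["type"](raw)
            except ValueError:
                problems.append(f"{name}: {raw!r} is not a valid {spec['type'].__name__}")
                continue
            low, high = spec["range"]
            if not low <= value <= high:
                problems.append(f"{name}: {value} outside {low}-{high} {spec['unit']}")
    return problems


# Usage with a small in-memory CSV (values invented for illustration).
sample = io.StringIO("age,smoker\n34,1\n-99,0\n150,2\n")
for line_no, row in enumerate(csv.DictReader(sample), start=2):
    for problem in check_row(row):
        print(f"line {line_no}: {problem}")
```

Keeping the dictionary in a structured form like this is only one option; a codebook written as a spreadsheet or text document serves the same documentary purpose.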
How should I document my data?
Examples of documents that can be used to describe your data (and software) include:
- Research/laboratory notebooks: Notebooks (physical or digital) provide a structured environment to organize your research data, notes, and observations. Further information can be found on our web page: Electronic laboratory Notebooks (ELN).
- README files: Simple text files which can be used to provide information about the organization and content of your data files. Cornell University have published a Guide to writing "readme" style metadata which includes a README file template that you can download and use to document your data.
- Data dictionaries/codebooks: A data dictionary is a structured document detailing variable names, data types, units of measurement, and permitted values. The Center for Open Science provide guidance on How to make a data dictionary.
- Code comments: Human-readable explanatory notes or annotations added to code to provide additional information about the code's functionality, purpose, and logic.
- Other supporting documentation: Documents generated during the research project which might provide additional context to enhance data intelligibility and reuse (e.g. protocols, survey tools, blank consent forms, participant information sheets).
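A README skeleton can also be started programmatically. The Python sketch below assumes a folder of data files on local disk and lists every file with its format so that descriptions can then be filled in by hand. The `readme_skeleton` function and its headings are hypothetical and are not taken from the Cornell template; for a full set of recommended sections, use the published template.

```python
from pathlib import Path


def readme_skeleton(folder: str, title: str) -> str:
    """Return a plain-text README skeleton listing every file under `folder`.

    The headings are hypothetical placeholders, not a published template.
    """
    root = Path(folder)
    lines = [f"Title: {title}", "", "Files in this folder:"]
    # rglob("*") walks sub-folders too; sorting gives a stable, reproducible listing.
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.name != "README.txt")
    for path in files:
        relative = path.relative_to(root).as_posix()
        # Use the file extension as a rough indication of file format.
        fmt = path.suffix.lstrip(".").upper() or "no extension"
        lines.append(f"- {relative} (format: {fmt}): DESCRIBE CONTENTS")
    lines += ["", "Relationships between files: DESCRIBE", ""]
    return "\n".join(lines)
```

For example, `print(readme_skeleton("my_project/data", "My dataset"))` (a hypothetical path) prints a draft that can be saved as README.txt and completed by hand.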
Additional information on how to document and describe research software can be found on our web page: Making research software open and shareable.
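As a small illustration of code comments, the hypothetical Python function below uses a docstring and inline comments to record its purpose, the unit of its output and the reasoning behind an analysis decision. The function name and the 100 ms exclusion threshold are invented for the example and do not reflect any particular study.

```python
def clean_reaction_times(times_ms: list[float], min_ms: float = 100.0) -> list[float]:
    """Convert reaction times from milliseconds to seconds, dropping implausible trials.

    Purpose: produce an analysis-ready reaction-time variable (unit: seconds).
    Assumption (hypothetical, for illustration): responses faster than
    `min_ms` milliseconds are treated as anticipatory and excluded.
    """
    cleaned = []
    for t in times_ms:
        # Exclude anticipatory responses. The threshold is an analysis decision,
        # so it is exposed as a documented parameter rather than hidden in the code.
        if t < min_ms:
            continue
        # Convert to seconds so the output matches the unit stated in the docstring.
        cleaned.append(t / 1000.0)
    return cleaned
```

Comments like these explain why the code does something, not just what it does, which is the context a future reader is most likely to need.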
What is metadata?
Metadata is commonly defined as ‘information about data’. The terms metadata and documentation are sometimes used interchangeably, but metadata also refers more specifically to information that is structured and machine-readable. Some research communities use domain-specific metadata standards.
You can find examples of disciplinary metadata standards and other community standards (e.g. ontologies, vocabularies) on these websites:
- Metadata Standards Catalog – A directory of metadata standards for research data
- FAIRsharing.org – A registry of standards, databases and other resources
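To illustrate what structured, machine-readable metadata looks like, the Python sketch below serialises a record as JSON using element names borrowed from the Dublin Core element set (title, creator, date, description, format, rights). The `metadata_record` function, its list of required elements and all the values are hypothetical, and the output is not a formally conformant Dublin Core record; in practice, use the standard adopted by your discipline or repository.

```python
import json

# Hypothetical choice of required elements; names follow Dublin Core elements.
REQUIRED = ("title", "creator", "date", "description", "format", "rights")


def metadata_record(**fields: str) -> str:
    """Serialise a metadata record as JSON, refusing records that lack a required element."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata elements: {', '.join(missing)}")
    return json.dumps(fields, indent=2, sort_keys=True)


text = metadata_record(
    title="Example household survey, wave 1",
    creator="Example Research Group",
    date="2024-01-15",
    description="Anonymised responses to a household survey.",
    format="text/csv",
    rights="CC BY 4.0",
)
# Because the record is structured, software can read individual fields back.
print(json.loads(text)["format"])  # text/csv
```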