Choosing file formats

As a general rule of thumb, while you are actively working with a dataset you should use whichever file format best suits the way you work. In most cases, this will be dictated by the software that you prefer to use. If you have some flexibility, perhaps because your software supports several formats or you are writing your own software, consider using one of the archive-suitable formats described below.

When you have finished working with a particular dataset, you should convert it to a more stable, standard format for archiving. It is increasingly common to find old files that are now completely unreadable, simply because the software that created them is no longer available.

Ideally, your archival format should be at least one of:

readable using free tools (ideally plain text): so it can be accessed without a potentially expensive license
a well-documented standard: so a wide variety of software is available to access it
a de facto standard in your research area: so the majority of researchers you share it with can be expected to have access to the right software

If possible, try to choose a format that allows you to describe and document the data directly within the file.
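
As one illustration of documenting data within the file itself, the sketch below writes a small table as plain-text CSV with a block of descriptive `# key: value` lines at the top, then reads it back. It uses only the Python standard library. The function names, the metadata keys, and the `#` comment convention are assumptions made for this example: comment lines are not part of the CSV specification, so other software may need to be told to skip them, and formats such as NetCDF or HDF5 provide proper built-in support for metadata.

```python
import csv
import io


def write_documented_csv(stream, metadata, header, rows):
    """Write rows as CSV, preceded by '# key: value' description lines.

    The '#' prefix is a convention assumed for this sketch, not part of CSV.
    Metadata values must not contain line breaks.
    """
    for key, value in metadata.items():
        stream.write(f"# {key}: {value}\n")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def read_documented_csv(stream):
    """Return (metadata, header, rows) from a file written as above."""
    lines = list(stream)
    metadata = {}
    start = 0
    # Only the leading block of '#' lines is treated as documentation.
    while start < len(lines) and lines[start].startswith("#"):
        key, _, value = lines[start][1:].partition(":")
        metadata[key.strip()] = value.strip()
        start += 1
    reader = csv.reader(lines[start:])
    header = next(reader)
    return metadata, header, list(reader)


if __name__ == "__main__":
    # An in-memory buffer stands in for a real file opened with open(..., newline="").
    buffer = io.StringIO()
    write_documented_csv(
        buffer,
        {"title": "Example temperature readings", "units": "degrees Celsius"},
        ["site", "temperature"],
        [["A", 12.5], ["B", 14.0]],
    )
    print(buffer.getvalue())
```

Because the description travels inside the same plain-text file as the data, it remains readable in any text editor even if the script that wrote it is lost.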

Examples of file formats
Category	Formats	Comments
Text	Plain text, HTML, Rich Text Format, Markdown/RST/Textile/etc.
	PDF/A	Only use for scans or if page layout is critical
Tabular/numeric	Comma-/Tab-Separated Values, XML	Human-readable with just a text editor
	NetCDF, HDF5, FITS	Particularly good for complex or hierarchical data structures, and embedding metadata
Images	TIFF, PNG, JPEG2000	Avoid GIF and standard JPEG
Movies	MP4, Ogg Video	Prefer open codecs wherever possible
Sound	FLAC, Ogg Audio	Prefer open codecs wherever possible
See more examples from the UK Data Service