Preferred/accepted data formats for submission to the repository

Type of data Preferred formats Other accepted formats
Quantitative tabular data, databases
  • Tab-, comma- or column-delimited text file (*.csv, *.tab, *.txt) with an additional setup file containing the data definitions
  • Self-describing formats such as JSON, or structured text or mark-up files with metadata, such as XML (*.xml)
  • DDI-XML file
  • OpenDocument Spreadsheet

  • MS Access (*.mdb, *.accdb)
  • MS Excel (*.xls, *.xlsx)
  • SPSS (*.por, *.sav)
  • Stata (*.dta)
  • SAS (*.sas, *.sas7bdat)
  • Syntax files (*.sps)
  • dBase (*.dbf, *.ods)
  • Column binary format
Geospatial data
  • Formats supported by good open-source software libraries such as GDAL, OGR and GeoTools

  • ESRI Shapefile (*.shp, *.shx, *.dbf, *.sbn)
  • Georeferenced TIFF (*.tif, *.tfw)
  • GIS attribute table
  • MapInfo interchange format (*.mif) for vector data
Qualitative text data
  • PDF/A (*.pdf)
  • eXtensible Mark-up Language (XML) with a DTD or schema definition (*.xml)
  • Rich Text Format (*.rtf)
  • Text file [Unicode, UTF-8] (*.txt)
  • Hypertext Mark-up Language (HTML, HTMLbook)

  • MS Word (*.doc, *.docx)
  • OpenDocument Text (*.odt)
  • WordPerfect (*.wpd, *.cwp, *.vwp)
  • HTML (*.htm, *.html)
Image data
  • TIFF version 6, uncompressed files (*.tif)
  • Portable Document Format (PDF): archive formats only (PDF/A-1, PDF/A-2, PDF/A-3)
  • JPEG (*.jpeg, *.jpg)
  • TIFF, other versions (*.tif, *.tiff)
  • JPEG 2000 (*.jp2)

  • Older PDF files, but not older than version 5
Digital audio data
  • Free Lossless Audio Codec FLAC (*.flac)
  • MPEG-1 Audio Layer 3 (*.mp3), only for spoken word
  • Audio Interchange File Format AIFF (*.aif)
  • Waveform Audio WAV (*.wav, *.ogg)
Digital video data
  • MPEG-4 High Profile (*.mp4)
  • Motion JPEG 2000 (*.mj2)
  • JPEG 2000 (*.jp2)
Documentation and scripts
  • Rich Text Format (*.rtf)
  • OpenDocument Text (*.odt)
  • HTML (*.htm, *.html)
  • Plain text (*.txt)
  • Portable Document Format (PDF): archive format PDF/A only

  • MS Word (*.doc, *.docx)
  • MS Excel (*.xls, *.xlsx)
  • XML marked-up text with corresponding DTD or schema
  • Older PDF files, but not older than version 5

The choice of suitable file formats is an important criterion for ensuring that data remain interpretable and usable over time.
The repository recommends using preferred formats whenever possible. Other formats are accepted, but their readability and usability are particularly threatened by changes in hardware and software environments. The SADAR team cannot guarantee the reusability of data that are not delivered in the preferred formats. After consultation with the data producers, specific data formats can be converted into preferred formats. This list is not comprehensive and can be extended with additional formats, likewise after consultation with the data producers.
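Before submission, it can help to sort the files of a dataset by format category so that conversion candidates stand out. The following is a minimal sketch, not a tool provided by the repository: the names `PREFERRED`, `OTHER_ACCEPTED`, `classify` and `report` are hypothetical, and the extension sets are a partial excerpt from the tabular-data and text rows of the table above, not the complete list.

```python
from pathlib import Path

# Partial, illustrative excerpt of the format table (tabular and text rows only).
# Assumption: file extensions are a good enough proxy for the actual format.
PREFERRED = {".csv", ".tab", ".txt", ".xml", ".rtf"}
OTHER_ACCEPTED = {".xls", ".xlsx", ".mdb", ".accdb", ".sav", ".por", ".dta", ".doc", ".docx"}


def classify(filename: str) -> str:
    """Return 'preferred', 'other accepted' or 'not listed' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in OTHER_ACCEPTED:
        return "other accepted"
    return "not listed"


def report(filenames):
    """Group file names by category, preserving their input order."""
    groups = {"preferred": [], "other accepted": [], "not listed": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

Files reported as "other accepted" or "not listed" are the ones worth discussing with the repository team, since conversion into a preferred format may be possible.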

Regardless of the data format, research datasets should be submitted to the repository in an organized manner and with sufficient documentation so that third parties can interpret and reuse them.
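One way to meet this for tabular data is to write the data as a delimited text file together with a separate file holding the data definitions, as the table above suggests. The sketch below uses only the Python standard library. The function name `write_dataset`, the `_definitions.json` file naming and the structure of the definitions are assumptions made for illustration; the repository does not prescribe this layout.

```python
import csv
import json
from pathlib import Path


def write_dataset(out_dir, name, rows, definitions):
    """Write rows to <name>.csv and the variable definitions to <name>_definitions.json.

    `definitions` maps each column name to a dict describing it (label, unit, ...).
    `rows` is a list of dicts keyed by column name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(definitions)

    # Refuse to write data that contains columns without a definition.
    for row in rows:
        undocumented = set(row) - set(columns)
        if undocumented:
            raise ValueError(f"undocumented columns: {sorted(undocumented)}")

    data_path = out / f"{name}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    defs_path = out / f"{name}_definitions.json"
    defs_path.write_text(
        json.dumps(definitions, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, defs_path
```

A call such as `write_dataset("deposit", "survey", [{"id": 1, "age": 34}], {"id": {"label": "Respondent identifier"}, "age": {"label": "Age at interview", "unit": "years"}})` would produce `survey.csv` and `survey_definitions.json` in the `deposit` directory. Deriving the CSV header from the definitions keeps the data file and its documentation in step.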

Administration: SADAR Repository, University and State Library of Saxony-Anhalt. Version 1.2, as of January 2024