Digital Archive

This is a follow-up blog post to Digital Data in Environmental Archaeology 1: Preservation.

This is a short account of the reasons why I think that environmental archaeology data should be stored and disseminated as open data (i.e. data that is freely available in accessible formats, usually digital, under licences that allow it to be re-used). I’ve provided an outline of some of the methods that I have used below.

Research is a process that builds on the results of the past, and in the case of environmental archaeology it can often be a useful process to incorporate results from many different sites into one larger dataset, and to analyse this to see if new patterns and insights emerge. To move the study of environmental archaeology forward, I think it is important to ensure that results are stored and disseminated in a way that allows other researchers to re-use data.

Making data accessible

If a researcher wants to re-use archaeobotanical data from one of my reports, no doubt they could re-type all the information that is available in printed formats or in PDFs. But it would be much better if the data was made available digitally. Much of the raw data in environmental archaeology (certainly in archaeobotany) is prepared in spreadsheets and I have spreadsheets that date back to 1998. How long will I be able to access these using more modern software packages? And is it realistic to expect me to convert and update the files each time there is a new iteration of spreadsheet software?

Fortunately, many software packages have some built-in backwards compatibility. The best way to ensure that the data in my spreadsheets (and in databases) is readable into the future is actually to convert it into a very old format, a .csv file. Comma Separated Value files (.csv) provide a very simple means of structuring data. CSV is a de facto standard for saving tabular data and it is supported by a huge number of applications. This means that if you save your tabular data as a .csv file, most programmes will be able to access the data (and the more accessible your data, the more likely it is to be preserved into the future).

For more details on .csv formats, see http://data.okfn.org/doc/csv

How to convert your spreadsheet to a .csv file

The easiest way to save your data in .csv format is to open your preferred spreadsheet application, click on “Save as” and scroll down the list of options until you find .csv. This file should contain all your basic data, organised simply and clearly (leave pie-charts out). It should be kept as the preservation copy of your data.
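For anyone who would rather script this step than use “Save as”, the following is a minimal Python sketch that uses only the standard csv module. It assumes the table is already available as a header row plus a list of data rows. The function names are hypothetical and chosen for illustration, not taken from any particular tool.

```python
import csv
from pathlib import Path


def save_preservation_csv(header, rows, path):
    """Write a header row and data rows to a UTF-8 .csv file."""
    # newline="" lets the csv module control line endings itself.
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read the file back as a list of rows (each row is a list of strings)."""
    with Path(path).open("r", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Calling save_preservation_csv(["sample", "taxon", "count"], rows, "plant_remains.csv") would write one line per row. Reading the file back with load_csv returns every value as text, which is a reminder that a .csv file stores plain values only, with no formulas, charts or formatting.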

N.B. Preserving text files is different. Save your report as a .pdf, as this is a relatively stable and supported format. For added accessibility it is a good idea to save text as .txt files (go to “Save as” and select the .txt option). This will preserve the text but won’t preserve any added graphs and images, and it won’t preserve formatting.

Licensing your data so that it is available for re-use

Open data is distributed so that it can be re-used. This usually means publishing your data under an open licence, such as one of the Creative Commons licences. These are licences that provide an extension to copyright, allowing you to give permission in advance for people to re-use your material, and allowing you to stipulate the conditions under which this re-use can take place. Creative Commons offer several different ways for you to share your material, from a completely open licence (CC0) to more restrictive licences that stipulate that the material must be cited as your original content (CC BY).

For more details about Creative Commons licences, see http://creativecommons.org/licenses/.

How to assign an open licence to your work

If you use repositories such as Zenodo or Figshare the service asks you to assign a licence to your material as part of the upload process. Alternatively, you can download the appropriate text and HTML code for each licence from the Creative Commons website (http://creativecommons.org/choose/).
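If you keep your own copies of a dataset outside a repository, one option is to store a short plain-text licence notice alongside the .csv file. The sketch below is an illustration only: the wording of the notice is an assumption and is not official Creative Commons text, and the function name is hypothetical. Creative Commons publishes the licence deeds themselves; the example URL points to the CC BY 4.0 deed.

```python
from pathlib import Path


def write_licence_notice(folder, title, author, year, licence_name, licence_url):
    """Write a README.txt next to a dataset stating who made it and its licence."""
    # The notice wording is illustrative, not an official licence text.
    notice = (
        f"{title}\n"
        f"Copyright {year} {author}\n"
        f"This dataset is made available under the {licence_name} licence.\n"
        f"Licence details: {licence_url}\n"
    )
    target = Path(folder) / "README.txt"
    target.write_text(notice, encoding="utf-8")
    return target
```

For example, write_licence_notice("derrybane", "Plant remains data from Derrybane 2", "Penny Johnston", 2014, "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/") would create derrybane/README.txt, provided the folder already exists.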

About the author

Penny Johnston is an archaeobotanist with an interest in digital data and preservation. She has her own blog (http://archbotarchive.blogspot.ie/) about her digital archiving practices/experiments, but this, like the archive, has languished somewhat over the past year or so because of time constraints. However, there is lots of information there about archaeobotanical remains from Cork, and these are all disseminated online in accessible and open formats, using Creative Commons licences.

This is a short blog post about preserving digital data, from the perspective of an environmental archaeologist. There is a follow-up post about open data in environmental archaeology. All of this comes from my personal experience of creating, sharing and trying to preserve digital data.

This post was originally published on 30 September 2015. It was updated on 5 October 2015 to include links to a follow-up post.

Preserving data

For many years I worked on archaeobotanical material from Irish excavations. I identified and counted seeds, and presented my results in a table at the end of a technical report. The results were usually prepared so that they could be presented as appendices in excavation reports, and they were supposed to be printed and read as hard copy reports. Even when the reports were digital, they were usually a digital version that mimicked the paper report, e.g. a PDF, with the look and the format of the printed page preserved.

(N.B. This is not the best way to preserve data! That’s because it makes it difficult for others to re-use or manipulate the results. For details about how to make environmental archaeology data open, see Digital Data in Environmental Archaeology 2.)

Over the years I have moved house and changed jobs and, in the meantime, methods of storage of digital data changed (all my backups for my work in 2002 were on floppy disc). I lost the digital versions of a few reports, and some files became corrupted. This is why paper is still the preferred preservation medium for lots of different data types.

“Born-digital data are in most danger of being lost to future generations” (O’Carroll and Webb, 2012, 8).

I started to worry about preserving my digital data. I began to adopt a preservation policy that involved the principle of LOCKSS (Lots Of Copies Keeps Stuff Safe).
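The general idea of keeping several copies and checking that none of them has become corrupted can be sketched in a few lines of Python. This is not the LOCKSS software itself, only an illustration of the principle using the standard library. The function names and the choice of SHA-256 checksums are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    original = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / Path(source).name
        shutil.copy2(source, target)
        if sha256_of(target) != original:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies


def find_damaged_copies(reference_checksum, copies):
    """Return the copies whose checksum no longer matches the reference."""
    return [path for path in copies if sha256_of(path) != reference_checksum]
```

Recording the checksum when the copies are made, and re-running find_damaged_copies from time to time, gives a simple way to notice a corrupted file while an intact copy still exists elsewhere.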

Using repositories

One way to make multiple copies of your data is to disseminate it online. But even when you upload a report or a dataset online you can’t ensure that the platform that you upload to will continue hosting your data forever. This is a problem across research institutions, and it has led to a call for the development of reliable repositories (with the resources to sustain data in the long term) and a system of Persistent Identifiers (PIDs) or handles.

The easiest way to assign a PID to your dataset is to upload it to a trusted repository. These will keep multiple copies of your data on their servers. There are a handful of trusted repositories for archaeological data, and a review of these is available on the website of the meta journal, Journal of Open Archaeology Data.

I have used both Figshare and Zenodo to upload my data (these are both trusted repositories that offer free services). The repositories assign a PID to the files, and this also means that it is easy for someone else to reference your work and acknowledge your contribution, as the repository generates a citation for the data (for example, one of my datasets that has been uploaded to Figshare is cited as: Johnston, Penny (2014). Plant remains data from Derrybane 2, Tipperary Ireland. Figshare http://dx.doi.org/10.6084/m9.figshare.1080723).
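To show how the pieces of such a citation fit together, here is a small Python sketch that rebuilds the citation quoted above from its parts. The function name and the field layout are assumptions made for illustration. Repositories generate their own citation strings and may format them differently.

```python
def format_citation(author, year, title, repository, doi,
                    resolver="http://dx.doi.org/"):
    """Build a citation string; the DOI is appended to a resolver address."""
    return f"{author} ({year}). {title}. {repository} {resolver}{doi}"


citation = format_citation(
    author="Johnston, Penny",
    year=2014,
    title="Plant remains data from Derrybane 2, Tipperary Ireland",
    repository="Figshare",
    doi="10.6084/m9.figshare.1080723",
)
print(citation)
```

The persistent part is the identifier itself (10.6084/m9.figshare.1080723). The resolver address in front of it simply turns the identifier into a link that redirects to wherever the dataset is currently hosted.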

For more information on PIDs, see http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/persistent-identifiers

Using these services not only provides me with a step towards digital preservation, but it also means that it is much easier for me to share my data with other researchers. Making data open and accessible so that others can re-use it is the topic of my next blog post.

References

O’Carroll, A., & Webb, S. (2012). Digital archiving in Ireland: national survey of the humanities and social sciences. National University of Ireland Maynooth. (See http://dri.ie/digital-archiving-in-ireland-2012.pdf).

About the author

Penny Johnston is a PhD candidate in UCC. She is interested in digital preservation, and has started to archive her back catalogue of archaeobotany reports online, so that others can re-use her work without having to ask her permission.

EAI

Promoting the Practice of Environmental Archaeology in Ireland

Tag Archives: Digital Archive

Digital Data in Environmental Archaeology 2 – Open Data

Digital Data in Environmental Archaeology 1: Preservation
