This document describes how Figshare stores content and how it can push files to Arkivum storage for preservation. This is done to ensure the ownership and long-term preservation of files for Figshare’s Institutional Partners, even in exceptional situations such as service outages or Figshare ceasing to exist. An optional workflow, which evicts files from the Figshare storage, is also described.
Files uploaded to Figshare follow a mostly linear workflow with the following steps:
A preview of the file is generated and stored on a separate Figshare storage instance.
The file is copied from the temporary storage to its final storage location, which can be supplied either by Figshare (e.g., Amazon Simple Storage Service) or by the institution. For the purposes of this document, we will refer to this as the Figshare final storage, regardless of the actual technical implementation.
The file is mirrored to one or more third-party storage solutions. For the purposes of this document, we will assume that Arkivum is the chosen solution and that this step is mandatory, even though it is optional in the generic Figshare implementation.
All of these steps are carried out by a separate Figshare system that runs asynchronous tasks. This system is monitored and logged, and failed operations are retried to ensure that the workflow completes. The file's entry in Figshare’s database also records its current state in the workflow.
The steps above are summarised in Fig. 1.
Fig. 1: The Figshare file lifecycle
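The retry and state-tracking behaviour described above can be illustrated with a minimal sketch. It is not Figshare's actual implementation: the `FileRecord` class, the `run_workflow` function, the state names, and the `max_attempts` limit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FileRecord:
    """Hypothetical stand-in for the file entry in the database."""
    file_id: int
    state: str = "uploaded"          # assumed initial state name
    log: List[str] = field(default_factory=list)


Step = Tuple[str, Callable[[FileRecord], None]]


def run_workflow(record: FileRecord, steps: List[Step], max_attempts: int = 3) -> bool:
    """Run each (target_state, action) step in order, retrying failures.

    Returns True if every step completed, False if a step exhausted
    its attempts. The record keeps the last state that was reached.
    """
    for target_state, action in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                action(record)
            except Exception as exc:  # log the failure and retry
                record.log.append(f"{target_state}: attempt {attempt} failed: {exc}")
                continue
            record.state = target_state
            record.log.append(f"{target_state}: ok")
            break
        else:
            # All attempts failed; stop so the step can be retried later.
            return False
    return True
```

Because the record stores the last state reached, a later run could resume from that point rather than starting over, which is one plausible reason to keep the workflow state in the database.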
When an Arkivum integration is used, a number of extra operations need to be carried out to ensure that it functions correctly. Namely, Figshare needs to perform the following Arkivum REST API calls:
Once these steps have completed successfully, Arkivum can ensure the availability and integrity of the files, independent of the state of the file on Figshare’s final storage.
Please note: there may be once-off fees for setting up an eviction process.
Once a file is fully replicated by Arkivum (green status), it can be evicted from both the local appliance and the Figshare final storage, if required. One use case for this is storing more data on the Figshare storage than is actually contracted.
Such files can be evicted using a least recently used (LRU) strategy. This works as follows:
When choosing the halting condition, the impact on end users needs to be considered. Namely, if a file is not present on the Figshare final storage when a user requests it for download from an item page, the following operations need to be carried out in the background before the file can actually be downloaded:
Once these are completed, the file can be downloaded immediately, and it is marked with the current timestamp to ensure that the LRU algorithm functions correctly. The process can take from a few minutes up to a few hours, depending on the size of the file(s) and on other factors such as source internet speed and network conditions.
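As a rough illustration of LRU eviction with a halting condition, consider the sketch below. It is a generic example rather than Figshare's algorithm: the `StoredFile` fields, the function names, and the choice of a storage target in bytes as the halting condition are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredFile:
    """Hypothetical per-file bookkeeping record."""
    file_id: int
    size_bytes: int
    last_access: float            # timestamp of the most recent download
    replicated: bool              # True once fully replicated by Arkivum
    on_final_storage: bool = True


def evict_lru(files: Dict[int, StoredFile], target_bytes: int) -> List[int]:
    """Evict least recently used files until usage is at or below target_bytes.

    Only fully replicated files are candidates, so a copy always
    remains in the preservation storage.
    """
    used = sum(f.size_bytes for f in files.values() if f.on_final_storage)
    candidates = sorted(
        (f for f in files.values() if f.on_final_storage and f.replicated),
        key=lambda f: f.last_access,  # oldest access first
    )
    evicted: List[int] = []
    for f in candidates:
        if used <= target_bytes:      # assumed halting condition
            break
        f.on_final_storage = False
        used -= f.size_bytes
        evicted.append(f.file_id)
    return evicted


def mark_downloaded(f: StoredFile, now: float) -> None:
    """Record that a file was restored and downloaded at time `now`."""
    f.on_final_storage = True
    f.last_access = now
```

A lower storage target evicts more files and therefore makes it more likely that a user will have to wait for a retrieval, which is the trade-off to weigh when choosing the halting condition.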
If a user requests a file which is not present in the Figshare storage, the following message will be displayed: "The file/s for this item are located in archival storage. The download will take some time to retrieve, please check back later." The user can close the browser window and check later whether the file(s) are available for download.
The workflow above applies when the institution uses Figshare storage (usually Amazon Simple Storage Service) or its own local storage infrastructure as the final storage. It does not apply when Arkivum serves as both the final repository storage and the preservation solution. If Arkivum is also used as Figshare storage, the second step in the retrieval workflow above no longer applies: after the file is copied from the Arkivum datacenter to the local appliance, Figshare redirects any download request to the appliance directly.
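The two download scenarios can be summarised in a small decision sketch. The `Action` enum, the `route_download` function, and its boolean parameters are hypothetical names introduced for illustration; only the displayed message is taken from this document.

```python
from enum import Enum


class Action(Enum):
    SERVE_FROM_FINAL_STORAGE = "serve"
    REDIRECT_TO_APPLIANCE = "redirect"
    SHOW_ARCHIVAL_MESSAGE = "wait"


ARCHIVAL_MESSAGE = (
    "The file/s for this item are located in archival storage. "
    "The download will take some time to retrieve, please check back later."
)


def route_download(
    on_final_storage: bool,
    on_appliance: bool,
    arkivum_is_final_storage: bool,
) -> Action:
    """Decide how to answer a download request (illustrative only)."""
    if arkivum_is_final_storage:
        # Arkivum is also the repository storage: once the file is on
        # the local appliance, requests are redirected there directly.
        if on_appliance:
            return Action.REDIRECT_TO_APPLIANCE
        return Action.SHOW_ARCHIVAL_MESSAGE
    # Figshare or institutional storage: serve the file if present,
    # otherwise ask the user to check back while it is retrieved.
    if on_final_storage:
        return Action.SERVE_FROM_FINAL_STORAGE
    return Action.SHOW_ARCHIVAL_MESSAGE
```

In both configurations, a request for a file that is not yet available locally results in the archival message being shown while retrieval runs in the background.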
In order to implement any of the workflows above, the institution must provide Figshare with the following:
The points above can be agreed on during the Figshare for Institutions implementation process or at any point after it is complete, preferably after the storage decisions are finalised.