
Migrate records with administrative batch management

Institutions can use the batch management tool to migrate repository items into their Figshare repository. Items can be linked to user accounts and uploaded to a specific group. Mapping and formatting metadata from an old system to the Figshare system is usually the most time-consuming part. This help page offers some general guidance on that process; every institution will have its own migration needs.

IMPORTANT NOTE: Please thoroughly test the metadata that you intend to publish with batch management on your stage environment, so that you do not need to make changes after initial publication. Before publishing or updating items with batch management, make sure you know which metadata changes cause item versioning. Figshare does not support removal of versions. Admin users can unpublish individual items (to update and republish them without creating a new version), but versions cannot be removed in batch. If you unexpectedly create multiple versions via batch management, our Support Team will charge a fee to help remove them in batch, and this work will not be treated as urgent.

Getting Started

Migrating to different groups

User accounts and author linking

Adding funding information

Adding related materials

Adding files

Add existing DOIs or Handles

A possible migration workflow

Getting Started

At this point you should have your repository set up with the groups and custom metadata fields you will need for the migration. We advise using your stage instance to create dummy draft items in each group that you will be migrating into. In these dummy items, fill in any custom metadata fields, add categories, keywords, and funding, and create some examples of embargoed files. You can then use the batch management tool to download the metadata from these dummy items and use it as a template for uploading.

Three other pieces of advice: 

  1. Ask your Implementation Manager or send a support ticket to enable uploading of existing DOIs/Handles on both Stage and Production
  2. Ask your Implementation Manager or send a support ticket to temporarily reduce the wait time between batch uploads while you are conducting tests and migration (by default this is set to 60 minutes)
  3. Carefully check the formatting in your CSV file before uploading, especially date formats, as some spreadsheet programs change these automatically
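One way to automate the date check is a small pre-flight script run on the CSV just before upload. This is a minimal Python sketch, assuming dates should be YYYY-MM-DD (as stated later in this article); the function name and the column names you pass in are hypothetical, so take the real date column names from the template CSV you downloaded.

```python
import csv
import io
import re
from datetime import date

# Strict YYYY-MM-DD shape; spreadsheet programs often re-save dates as 3/7/2021 etc.
ISO_DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")

def find_bad_dates(csv_text, date_columns):
    """Return (row_number, column, value) for each non-empty date cell that is
    not a real calendar date in YYYY-MM-DD form. Column names are supplied by
    the caller (hypothetical here; use the ones in your template CSV)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        for column in date_columns:
            value = (row.get(column) or "").strip()
            if not value:
                continue  # empty cells are not checked here
            ok = bool(ISO_DATE.match(value))
            if ok:
                try:
                    date.fromisoformat(value)  # rejects impossible dates like 2021-02-30
                except ValueError:
                    ok = False
            if not ok:
                problems.append((row_number, column, value))
    return problems
```

To check a file saved from Excel as 'CSV UTF-8 (comma delimited)', read it with `open(path, encoding="utf-8-sig", newline="")` so the byte-order mark Excel adds is stripped before parsing, then pass the text to `find_bad_dates`.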

Migrating to different groups

You may want to upload items group by group, using a separate CSV file for each group, rather than migrating items to several groups from one CSV file. This makes it easier to check that the metadata in each CSV is formatted correctly and that each group's custom fields are filled out properly.

You can get a list of your group ids from this endpoint: https://docs.figshare.com/#private_institution_groups_list (Make sure you create an API token from a top-level administrator account and paste it into the field at the top left of that API documentation page.)

Put the group id in the group_id column in the CSV and items will automatically be associated with that group. 
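If you prefer a script to the API documentation page, the Python sketch below fetches the group list and stamps a group_id onto rows before they are written to CSV. It assumes the endpoint behind private_institution_groups_list is GET https://api.figshare.com/v2/account/institution/groups and that the token is sent as an `Authorization: token <YOUR_TOKEN>` header; check both against the API documentation, and note that your stage instance may use a different API host. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: production Figshare API v2 base URL; stage may differ.
API_BASE = "https://api.figshare.com/v2"

def fetch_institution_groups(token, base=API_BASE):
    """GET the institution's groups using a top-level admin token (assumed endpoint)."""
    req = urllib.request.Request(
        f"{base}/account/institution/groups",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def group_ids_by_name(groups):
    """Map each group's name to its id (assumes 'id' and 'name' keys)."""
    return {g["name"]: g["id"] for g in groups}

def stamp_group_id(rows, group_id):
    """Return copies of CSV row dicts with the group_id column filled in."""
    return [{**row, "group_id": str(group_id)} for row in rows]
```

For example, `stamp_group_id(rows, group_ids_by_name(groups)["Chemistry"])` would fill the column for a hypothetical group named Chemistry.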

User accounts and author linking

TL;DR: use account_id to put records in an author’s repository account so they can edit the record. Use user_id to associate a specific author with a record whether or not they own the record.

In Figshare, a researcher can have two ids. If a researcher has an account in your repository (whether created by SSO, HRfeed, or manually), they have an account id. If a researcher is listed as an author on a repository item, whether or not they have an account, they have a user id. You can use the account id to give a researcher edit access to migrated items. A researcher with an account also has a profile, but note that the profile page uses the person’s user id, since that is how they are associated with records whether or not they ‘own’ them. Use the user id when uploading metadata to make sure items show up in researcher profiles. This also improves reporting, because it reduces duplicated author names across repository items.

Migrating items into researcher accounts

If you want researchers to have edit access to an item, you need to put their account id in the account_id column in the CSV. If you do not add an account_id, the item will be uploaded to the administrator account that is running the batch upload. 

You will need to create the researcher accounts before migration in order to get the account_id (to provide edit access) and the user_id (to link them as an author if needed). The best way to do this is to create the accounts manually through the API and add the SSO id in the “institution_user_id” field. You can also create accounts through an HRfeed or, once your repository is launched, ask researchers to log in through SSO, which automatically creates their account. You can then retrieve the account_id and user_id as needed.
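Creating accounts through the API can be scripted from a list of researchers. The Python sketch below is an assumption-laden illustration: it assumes the account-creation endpoint is POST https://api.figshare.com/v2/account/institution/accounts and that the request body accepts `email`, `first_name`, `last_name` and `institution_user_id`. Confirm the endpoint and field names in the API documentation before running it; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: production Figshare API v2 base URL; stage may differ.
API_BASE = "https://api.figshare.com/v2"

def account_payload(email, first_name, last_name, sso_id):
    """Body for one institutional account (field names assumed from the API docs)."""
    return {
        "email": email,
        "first_name": first_name,
        "last_name": last_name,
        # The SSO id, so the account matches the researcher's later SSO login.
        "institution_user_id": sso_id,
    }

def create_account(token, payload, base=API_BASE):
    """POST one account to the assumed endpoint and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base}/account/institution/accounts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Running this against your stage instance first lets you confirm that the created accounts pick up the SSO id correctly before doing the same in Production.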

You can see the account_ids either in the User Report or from this API endpoint: https://docs.figshare.com/#private_institution_accounts_list.
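Instead of the User Report, a script can page through the accounts endpoint and build a lookup from each researcher's SSO id to both of their ids. This Python sketch assumes the endpoint is GET https://api.figshare.com/v2/account/institution/accounts, that it accepts `page` and `page_size` query parameters, and that each returned account has `id`, `user_id` and `institution_user_id` keys. All of these are assumptions to verify against the API documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: production Figshare API v2 base URL; stage may differ.
API_BASE = "https://api.figshare.com/v2"

def fetch_all_accounts(token, base=API_BASE, page_size=100):
    """Page through the (assumed) accounts endpoint until a short page is returned."""
    accounts, page = [], 1
    while True:
        query = urllib.parse.urlencode({"page": page, "page_size": page_size})
        req = urllib.request.Request(
            f"{base}/account/institution/accounts?{query}",
            headers={"Authorization": f"token {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        accounts.extend(batch)
        if len(batch) < page_size:
            return accounts
        page += 1

def ids_by_sso_id(accounts):
    """Map institution_user_id -> {account_id, user_id}; skips accounts without an SSO id."""
    return {
        a["institution_user_id"]: {"account_id": a["id"], "user_id": a["user_id"]}
        for a in accounts
        if a.get("institution_user_id")
    }
```

The account_id from this lookup goes in the account_id column; the user_id goes in the authors column.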

Connecting migrated items to author accounts and profiles

You could upload author names as ‘first name’ and ‘last name’ for each item, but this is not recommended for existing authors. Each first/last name combination receives its own database entry (creating many duplicates), which makes reporting more difficult than it needs to be.

To connect an item with an existing author account, add the author to the item’s row in the CSV using their user_id. You can see the user_ids either in the User Report or from this API endpoint: https://docs.figshare.com/#private_institution_accounts_list. Add the authors in the ‘authors’ column in this format:

[{"id": 1438453}, {"id": 1438451}, {"id": 701402}, {"id": 1438455}]

If you add any other data (like “first name”) after the id value, it will be ignored. Adding authors this way automatically connects the item to the author information stored in the database, including ORCID and CRIS/RIM id. The item will show up in each author’s profile if they have an account in any Figshare-powered repository.
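If you are generating the CSV with a script rather than a spreadsheet, the authors cell can be built directly from a list of user_ids. This is a minimal Python sketch; the function name is hypothetical. It keeps author order and drops repeated ids, so an author listed twice in the legacy record is only linked once.

```python
import json

def authors_cell(user_ids):
    """Format user_ids for the 'authors' column, e.g. [{"id": 1438453}, {"id": 1438451}].
    Accepts ints or numeric strings; duplicates are removed, order is kept."""
    unique = list(dict.fromkeys(int(uid) for uid in user_ids))
    return json.dumps([{"id": uid} for uid in unique])
```

When the value is written with Python's `csv` module, the writer quotes the cell and doubles the inner quotation marks (for example `"[{""id"": 1438453}]"`), which is the standard CSV escaping a spreadsheet program would also use.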

Important Notes:

If your repository will be integrated with a CRIS/RIM system like Symplectic Elements, you will need to add authors using the user_id so that the item can be harvested into the CRIS/RIM system properly.

If you want an item in your repository to show up in a user account in another Figshare repository, like figshare.com or another university’s repository, you will need to get that author’s user id from their profile page. The user id is the number at the end of the profile URL; for example, for https://figshare.com/authors/_/473204 the user id is 473204.
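When you have many external profile URLs to process, the user id can be pulled out of each one mechanically. This Python sketch assumes only what is stated above, that the id is the last path segment of the profile URL; the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

def user_id_from_profile_url(url):
    """Return the trailing numeric path segment of a Figshare profile URL as an int.
    Raises ValueError if the URL does not end in a number."""
    last = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    if not re.fullmatch(r"[0-9]+", last):
        raise ValueError(f"no numeric user id at the end of {url!r}")
    return int(last)
```

The resulting number can then be used in the authors column in the same {"id": …} format as local user_ids.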

Adding funding information

You can add funding information as free text by including the grant name in the funding column using this format:

[{"title": "My grant 1"},{"title": "My grant 2"}]

You can also link grants from large funders to the corresponding grant record in the Dimensions database. At this time, this is a two-step process. First, find the id for the grant in the Figshare system using this API endpoint: https://docs.figshare.com/#private_funding_search (you’ll need to add your API token to the field in the top left). Then, if you find the grant, add its id to the funding column like this:

[{"id": 9621728},{"id": 3058082}]

The two grant items for those ids are pictured below.
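Both funding formats can be generated with a few lines of code, and the grant search can be scripted too. In this Python sketch the formatting helpers follow the two formats shown above exactly. The search helper is an assumption: it assumes the endpoint behind private_funding_search is POST https://api.figshare.com/v2/account/funding/search with a `search_for` field in the JSON body, so verify that before relying on it. All helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: production Figshare API v2 base URL; stage may differ.
API_BASE = "https://api.figshare.com/v2"

def funding_titles_cell(titles):
    """Free-text grants: [{"title": "My grant 1"}, {"title": "My grant 2"}]."""
    return json.dumps([{"title": t} for t in titles])

def funding_ids_cell(grant_ids):
    """Linked grants: [{"id": 9621728}, {"id": 3058082}]."""
    return json.dumps([{"id": int(g)} for g in grant_ids])

def search_funding(token, term, base=API_BASE):
    """Search for grants by text (assumed endpoint and body key); returns parsed JSON."""
    req = urllib.request.Request(
        f"{base}/account/funding/search",
        data=json.dumps({"search_for": term}).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Reviewing the search results by hand before copying ids into the funding column is advisable, since a free-text search can return several similar grants.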

Adding related materials

Links to related materials, like a published paper, dataset, or different version of a paper, are added in the related_materials column. As with authors and funding, the content needs to be formatted as JSON. This is an example:

[{"identifier": "10.1038/s41550-020-1208-y", "title": "The ecological impact of high-performance computing in astrophysics", "identifier_type": "DOI", "relation": "IsSupplementTo", "is_linkout": 1}]

The ‘identifier_type’ field takes its values from DataCite’s relatedIdentifierType list, and the ‘relation’ field takes its values from DataCite’s relationType list. The ‘is_linkout’ field determines whether the linked title shows up in a call-out box on the record page; it can take the value 0 (zero) or 1 (one). Up to five links can be shown as call-out boxes.
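A script can build the related_materials cell from structured data and catch records that ask for more call-out boxes than can be shown. In this Python sketch the keys match the example above; refusing more than five linkouts is our own pre-flight choice, not documented Figshare upload behaviour, and the helper names are hypothetical. The identifier_type and relation values are passed through unchecked, so they must still come from the DataCite lists.

```python
import json

MAX_LINKOUTS = 5  # call-out boxes shown per record, per this article

def related_material(identifier, title, identifier_type, relation, linkout=False):
    """One related-material entry; is_linkout is encoded as 1 or 0."""
    return {
        "identifier": identifier,
        "title": title,
        "identifier_type": identifier_type,  # e.g. "DOI" (DataCite relatedIdentifierType)
        "relation": relation,                # e.g. "IsSupplementTo" (DataCite relationType)
        "is_linkout": 1 if linkout else 0,
    }

def related_materials_cell(materials):
    """Serialize entries for the related_materials column, refusing >5 linkouts."""
    linkouts = sum(m["is_linkout"] for m in materials)
    if linkouts > MAX_LINKOUTS:
        raise ValueError(f"{linkouts} call-out links requested; at most {MAX_LINKOUTS} are shown")
    return json.dumps(materials)
```

For the paper in the example above, `related_materials_cell([related_material("10.1038/s41550-020-1208-y", "The ecological impact of high-performance computing in astrophysics", "DOI", "IsSupplementTo", linkout=True)])` reproduces the example string exactly.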

Adding files

Files need to be accessible to the Figshare system, whether from a web server, another service like Dropbox, or an FTP location. Add a column called ‘files’ to the CSV and add the file URLs like this:

["https://journals.aqs.org/pdf/10.1103","ftp://mirror.easyname.at/ubuntu-releases/robots.txt"]

This would upload two files, each from a different location, into one item. Notice that the second file is coming from an FTP server. Ideally, the files are already publicly available through your legacy repository and you can use the URLs from there.
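Before an upload, a quick check that every file location is an absolute web or FTP URL can catch local paths that slipped in from a legacy export. This Python sketch assumes http, https and ftp are the schemes Figshare can fetch from, based on the sources listed above (Dropbox links use https); it does not check that the files actually exist. The helper name is hypothetical.

```python
import json
from urllib.parse import urlparse

# Assumption: schemes the Figshare system can fetch from (web server, Dropbox, FTP).
ALLOWED_SCHEMES = {"http", "https", "ftp"}

def files_cell(urls):
    """Validate file locations and serialize them as a JSON list for the 'files' column."""
    urls = list(urls)
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
            raise ValueError(f"not a fetchable URL: {url!r}")
    return json.dumps(urls)
```

The output uses a space after each comma, which is equivalent JSON to the example above.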

Add existing DOIs or Handles

You may want to upload items that already have a DOI or a Handle. Add the DOI or Handle in the appropriate column without the resolver URL (for example, 10.1636/P10-15.1 rather than https://doi.org/10.1636/P10-15.1).
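Legacy exports often store identifiers as full resolver links. This Python sketch strips the common prefixes (doi.org, dx.doi.org, hdl.handle.net, and the doi:/hdl: shorthands). It does not cover a local Handle server at your own institution, which would need its own prefix added; the helper name is hypothetical.

```python
import re

# Common resolver prefixes; a local Handle server would need its own entry.
_RESOLVER_PREFIX = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|https?://hdl\.handle\.net/|doi:|hdl:)",
    re.IGNORECASE,
)

def bare_identifier(value):
    """Return the DOI or Handle without any resolver URL or scheme prefix."""
    return _RESOLVER_PREFIX.sub("", value.strip())
```

Values that are already bare, such as 10.1636/P10-15.1, pass through unchanged.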

A possible migration workflow

Every institution will have unique migration needs, so treat the following only as an example. This workflow assumes that the records will be migrated into one administrator account for editing, but that the items are linked to existing researcher user_ids.

  1. Create custom metadata fields and create dummy records in your stage instance.
  2. Set up researcher accounts manually or through HRfeed or SSO login.
  3. Use batch management to download the metadata from your stage instance. The resulting CSV is the template for upload. Note that the private link is ignored when uploading.
  4. Create a mapping for your current metadata to the metadata fields in your new Figshare repository. You may need to combine some fields or split some fields.
  5. Set up a way to format the metadata to match the CSV downloaded via batch management. For example, use a spreadsheet program or script.
  6. Format the author names for upload: either add the user_ids or split the names into first and last names with the correct formatting (the concatenation feature in spreadsheet programs can help with this). You may need to do this in a separate file and then add the formatted author info to the ‘authors’ column.
  7. If you are also adding funding ids, add the formatted values to the ‘funding’ column.
  8. Transfer your metadata using your mapping into a new CSV.
  9. Add a column titled ‘files’ and add the public URL or FTP location for the file(s) belonging to each metadata record. Follow the formatting requirements!
  10. Split your metadata file into separate CSVs by group. This way you can double check group specific custom metadata fields, categories, etc. 
    • Fill in group id in each group CSV under the group_id column
  11. Test migration in Stage using a subset of each group’s records (10 to 100 records)
    Important: Before uploading the metadata CSV file:
    • Check all date formatting before saving the CSV, as your spreadsheet program may change the formats. All dates should be YYYY-MM-DD.
    • If you have special characters in your content, save the file as UTF-8. For example, in Excel choose the option 'CSV UTF-8 (comma delimited)'
  12. Check the records and make any adjustments. Retest if needed.
    • It might be helpful to use this Jupyter Notebook to delete a list of records from an admin account to start with a clean slate each time
  13. Migrate records to Production
    • Going group by group, upload the CSV file and publish the records
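Step 10 (splitting the metadata by group) can be scripted once the group_id column is filled in for every row of your master CSV. This Python sketch writes one CSV per group, repeating the header in each. The output file naming is a hypothetical choice, and the files are written as UTF-8 so special characters survive.

```python
import csv
import io
from collections import defaultdict

def split_by_group(csv_text, group_column="group_id"):
    """Return {group_id: csv_text}, one CSV per group with the header repeated.
    Raises ValueError for any row whose group_id cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    buckets = defaultdict(list)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        gid = (row.get(group_column) or "").strip()
        if not gid:
            raise ValueError(f"row {row_number} has no {group_column}")
        buckets[gid].append(row)
    out = {}
    for gid, rows in buckets.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        out[gid] = buf.getvalue()
    return out

def write_group_files(csv_text, prefix="group_"):
    """Write each group's CSV as UTF-8, e.g. group_101.csv (hypothetical naming)."""
    for gid, text in split_by_group(csv_text).items():
        with open(f"{prefix}{gid}.csv", "w", encoding="utf-8", newline="") as f:
            f.write(text)
```

Keeping one file per group makes it easier to spot-check each group's custom fields and categories before the Stage test in step 11.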
