Status
This document has been SUPERSEDED by Loading data into ReDBox via Alerts.
Introduction
In some cases you may have collection descriptions in RIF-CS format that you'd like to "drag back" into ReDBox. One reason you might need this is that your organisation manually created RIF-CS collections in Research Data Australia. This How-to describes how to load the collection metadata into ReDBox. Importantly, this how-to focusses on a one-off process and assumes that you won't need to do this on an ongoing basis. If you need a permanent RIF-CS import, you can use this process or you may wish to consider the OAI-PMH harvester plugin; this document does not cover that last option.
There are a few considerations that you'll need to concern yourself with when planning an import:
The system described here can be adapted to Mint for importing Service, Activity and Party records. However, be mindful that these records often have a source of truth (such as an HR system) and it may be better to add identifiers to those records than to bring them in via an import. Naturally, the number of records that need importing will help you determine how much time you should spend trying to automate the process. One final note: this How-to is all about IMPORTING and not UPDATING. If you want to have another system update an existing ReDBox record then you're up for a lot more work.
Background reading
Key files and folders
Procedure
Before you start this procedure, please make sure you have ReDBox running.
1. Obtain the RIF-CS metadata
You'll need to get the metadata either via the ANDS ORCA system or from an OAI-PMH feed. The first option is easiest (thanks to Grant from Flinders for providing the details below).
Assumptions:
Procedure:
2. Break up the RIF-CS file
The downloaded file will contain several collection records, so you'll need to break it up into one collection per file. The Python script below should do this for you (but you'll want to check it first). You'll also notice that the script breaks out the different RIF-CS object types, which is handy if you have a mix in the XML file. Don't forget that you only want to import COLLECTIONS into ReDBox. A utility script for splitting up the file is provided at https://github.com/redbox-mint-contrib/config-samples/blob/master/util/rifsplit.py:
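The rifsplit.py script itself is not reproduced here. As a rough illustration of the splitting step only, the following is a minimal sketch and not the contents of that script. It assumes the export is a standard RIF-CS registryObjects document in the usual RIF-CS namespace; the function name split_rifcs and the numbering used in the output filenames are hypothetical choices made for this example.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: the export uses the standard RIF-CS namespace.
RIF_NS = "http://ands.org.au/standards/rif-cs/registryObjects"
OBJECT_TYPES = ("collection", "party", "activity", "service")


def split_rifcs(xml_data, out_dir="output"):
    """Write one file per registryObject and return the paths written."""
    # Serialise RIF-CS elements without a namespace prefix.
    ET.register_namespace("", RIF_NS)
    root = ET.fromstring(xml_data)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    records = root.iter("{%s}registryObject" % RIF_NS)
    for index, obj in enumerate(records, start=1):
        # The object type is taken from whichever typed child is present.
        kind = "unknown"
        for name in OBJECT_TYPES:
            if obj.find("{%s}%s" % (RIF_NS, name)) is not None:
                kind = name
                break
        # Wrap the single record in its own registryObjects envelope.
        wrapper = ET.Element("{%s}registryObjects" % RIF_NS)
        wrapper.append(obj)
        # Hypothetical naming: object type prefix plus a running number.
        path = os.path.join(out_dir, "%s_%04d.xml" % (kind, index))
        ET.ElementTree(wrapper).write(
            path, encoding="utf-8", xml_declaration=True
        )
        written.append(path)
    return written
```

Under these assumptions, reading the export in binary mode and passing the bytes to split_rifcs would leave one file per record in the output directory, each prefixed with its object type. Check a few of the resulting files by hand before importing them.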
Once you've run the script across your XML export you should have a series of files in the "output/" directory. Each filename is prefixed with the object type and you only need those with "collection_".
3. Configure the alerts system
ReDBox contains an alerts system for handling incoming metadata that's not coming from something like the web forms or another harvest plugin. In a nutshell:
IMPORTANT NOTE: The filename used for the import file is IMPORTANT and must be UNIQUE. This means that if you (lazily) try to use "import.xml" for every file you import, it will work once and then just disappoint you on a repeating basis. Why? Well, the filename is used to create an identifier in ReDBox, and you'll also be unable to keep a record of processed files if they all have the same name.
The alerts system is configured in the home/system-config.json file. The following two subsections are of interest here:
The houseKeeping section configures the housekeeping jobs: basically it wakes up at certain times and runs a set of tasks. In my config, I have the alerts-poll job running every 1 minute, which is useful for development but very silly in production.
The alerts system config points to a directory in which to find incoming alerts (imports) and maps file extensions for XML files. Why do this? Well, XML files can use different schemas. Whilst we could write a script that digs into the file and handles the schemas, this is far more complex than just saying "If you give me a .rif I'll handle it one way and a .xml another". This also means that imports coming from different systems using different schemas just need a different extension, an entry added to xmlMaps, and handling code in alerts.py.
The config above points to rifXmlMap.json for a mapping file. A sample of this file is given below:
There are a few sections to this file:
Once you have configured the system and are ready to go, you need to add your collection files to home/alerts and wait for housekeeping to pick them up. Once the ingest has been run, you'll see results in the failed, processed and success subfolders as well as information in the log file.
Notes
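One practical point concerns the unique-filename requirement when copying files into home/alerts. The following is a minimal sketch of one way to handle it, under stated assumptions. The function name stage_collections, the timestamp suffix, and the extension argument are all hypothetical; the extension you pass must match whatever you have mapped in xmlMaps.

```python
import glob
import os
import shutil
import time


def stage_collections(src_dir, alerts_dir, extension):
    """Copy collection_* files into alerts_dir under fresh, unique names."""
    os.makedirs(alerts_dir, exist_ok=True)
    # A timestamp keeps names distinct between separate import runs.
    stamp = time.strftime("%Y%m%d%H%M%S")
    staged = []
    pattern = os.path.join(src_dir, "collection_*")
    for src in sorted(glob.glob(pattern)):
        base = os.path.splitext(os.path.basename(src))[0]
        dest = os.path.join(alerts_dir, "%s_%s%s" % (base, stamp, extension))
        # Refuse to overwrite a file that is still waiting to be ingested.
        if os.path.exists(dest):
            raise FileExistsError(dest)
        shutil.copy(src, dest)
        staged.append(dest)
    return staged
```

Only files whose names start with "collection_" are copied, so party, activity and service files left over from the split are ignored. Keeping the list of staged paths that the function returns gives you a simple record to compare against the failed, processed and success subfolders after housekeeping has run.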
Finishing up
Once your record has been ingested (and you've checked the log to make sure there were no errors) you should be able to log in to your ReDBox system and see the newly imported record in the ALERTS section. If the record looks OK, you can check the green tick to accept the record into the normal workflow. From here you can pass the work on to those who review metadata.