Bill Thompson spoke at the Open Data Cities Conference in April, and a group of us started wondering if we could have a field trip to the BBC Archives to find out more. Alex, along with Bill and Honor, organised this for us, and yesterday a group of (mostly) Brighton dwellers headed off to the BBC Archive centre at Perivale for the afternoon.
We arrived, met Bill and, after making a cup of tea, sat down to an introduction from him. Amongst the things mentioned (my notes are a bit sketchy) were:
- that there are a number of different Archives, some of which are specialist (e.g. documents). This made me realise that the internal newspaper, Ariel, probably has an archive somewhere, and therefore a reference to me is in there somewhere (I was in a November 1972 issue)
- a digital item has no physical presence, which makes it harder to notice when it goes missing - e.g. on a shelf of videos a gap is noticeable, whilst in a filing system it's a bit harder to spot - so thought needs to be given to spotting the gaps
- Bill highlighted three items from The Space. Two of these (John Peel's Record Archive and Faber: 60 years in 60 poems) I'd interacted with and one (Tweet Music: The Listening Machine) I hadn't (but now have).
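One simple way to "spot the gaps" in a digital store is to compare what the catalogue says should exist against what is actually present. The snippet below is only a minimal sketch of that idea; the function name `find_gaps` and the `tape-NNN` identifiers are made up for illustration and are not anything the BBC described.

```python
def find_gaps(expected_ids, present_ids):
    """Return catalogue IDs that have no matching stored item, sorted."""
    # A set difference is the digital equivalent of looking for a gap on the shelf.
    return sorted(set(expected_ids) - set(present_ids))


# Hypothetical catalogue and store contents, purely for illustration.
catalogue = ["tape-001", "tape-002", "tape-003", "tape-004"]
on_disk = ["tape-001", "tape-002", "tape-004"]

print(find_gaps(catalogue, on_disk))  # ['tape-003']
```

The check only works if the catalogue itself is trustworthy, which is part of why the point about thinking ahead matters.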
Then followed a tour of the building, stopping by the film vault (cold and dry), the video vault (warmer and not so dry) and the vinyl vault (normal temperature). The vinyl vault (pictured below) had such a wonderfully rich smell, which instantly took me back to Hull's Central Library in the 80s, when they had a music library that also stored vinyl records in plastic covers - lovely!
We also saw some film transferring/checking going on and watched a Steenbeck editing table being used to transfer 35mm film into a digital format. All fascinating.
I made a few notes as we were wandering; they are even harder to read than the earlier set:
- archives of film/video are made to LTO tapes now - there are lots of machines to read these, they write to a known standard and should last for a long time. D3 was a format that the BBC used at one point, but only they and Channel 9 (Australia) used it for broadcast, which means that machines/spares are hard to come by and so this format is being format-shifted currently
- format-shifting, e.g. moving D3 to LTO, is expensive and time consuming. Adoption of new kit/storage solutions needs to be thought about carefully, so that technological advances don't force more format-shifting than is necessary
- 53TB of data is generated per day (if they're running full 24 hour shifts to capacity) by just one room of people/machines
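To put the 53TB figure in perspective, here is a rough back-of-envelope calculation. It assumes decimal terabytes and a full 24-hour day; the 1.5TB tape capacity is the native capacity of an LTO-5 cartridge, used here purely as an assumption, since my notes don't record which LTO generation is in use.

```python
import math

SECONDS_PER_DAY = 86_400


def sustained_rate_mb_per_s(tb_per_day):
    """Average write rate in MB/s needed to absorb tb_per_day (decimal units)."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


def tapes_per_day(tb_per_day, tape_capacity_tb):
    """Whole tapes needed per day, ignoring compression and overheads."""
    return math.ceil(tb_per_day / tape_capacity_tb)


print(f"{sustained_rate_mb_per_s(53):.0f} MB/s")  # 613 MB/s
print(tapes_per_day(53, 1.5))                     # 36 (assumed 1.5TB per tape)
```

So one room running flat out would need to sustain roughly 600MB every second, around the clock.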
After our tour, we returned to the main room, made more tea, and had an overview by Bill on the Genome project (scanned Radio Times since 1923) - my key notes were:
- 4.9 million transmissions/programme records have been created
- 8.5 million (non-unique) contributor records have been created
- 120,000 articles have been scanned
- discussion is ongoing over the licence for this data
- data has been cleansed by hand, redacting data based on a duty of care (removing names/addresses that could cause inconvenience) and making judgement calls on terminology (what was acceptable in 1950s Britain may not be acceptable now)
We then sat down and had a bit of a round table conversation about Archiving (or should that be collecting?) as it relates to purely digital technologies and to other types of data.
The notes I made included:
- the Internet Archive's Wayback Machine is a good thing, but it isn't rich in what it stores, often being little more than the homepage
- The Space has been created with Amazon storage/hosting in mind so that the entire virtual environment can be cloned/recreated
- however, any site that depends on an external service (e.g. Twitter content, external user generated content, etc.) will be difficult to archive in its entirety, as will any that allows user customisation
- there is a need for a digital archiving strategy, and this needs to be taught in schools today - maybe as part of a digital literacy policy. By getting children used to tagging, evaluating data stores, etc., the next generation will have an incredible range of skills that can be built upon
- Mydex exists today as a personal data store. Other companies will come along and offer this kind of service
And that was that: an excellent afternoon at the BBC archives, well worth the trip, with yet more food for thought and questions to ponder.