These FAQ questions about audiovisual preservation were written by Richard Wright for PrestoCentre in 2013 and updated in 2019. They provide an introduction to the wealth of information from the Presto Projects and PrestoCentre, now part of the AVA_Net Library. They are meant to give brief answers to the basic questions.
Do you have analogue content sitting on shelves? (Specifically, content in audio and video formats — film is always special and will be covered separately.) Then you aren’t preserving that content, you are waiting to preserve it. It will HAVE to be digitised, because all analogue audio and video formats are obsolete. With obsolescence come the twin handmaidens of destruction: 1- risk: when you do want to digitise, you may not be able to find the equipment or the knowledge needed to play back your old tapes. They will also have aged further, bringing more playback problems and outright failures. 2- cost: digitisation may seem expensive, but it has never been cheaper than it was in 2010 to 2012. Costs were dropping from 2000 to 2012 or so (we would like to think the Presto projects influenced that), but now (2013) the difficulties with getting equipment, spares, repairs and skilled operators have started to hit. Costs are now rising — service providers have been saying so, publicly, on the AMIA email list. If you are waiting, the first question to ask is: exactly what are you waiting for? You are probably waiting for one (or more) of these five enablers:
- knowledge of what to do
- knowledge of who can do the work
- permission
- staff and equipment
- funding
But the time for waiting is over. Here’s how to get the five enablers:
- knowledge of what to do — PrestoCentre will tell you. Just read the answer to the FAQ “I want to preserve audiovisual content. How do I get started?” All the other FAQ answers also provide knowledge of what to do, how to do it and what it costs.
- knowledge of who can do the work — if you have under 1000 items of any kind of material, the advice (from IASA and from PrestoCentre) is to use a service provider: this is a company which has the equipment and knowledge, and because they are dedicated to the work they can set up an efficient preservation factory workflow. The PrestoCentre has a registry of service providers, and is working on an evaluation system.
- permission — there are two main sorts: legal (copyright) and institutional (your boss)
- legal: generally you do not need permission to make preservation copies, and new rules on orphan works make it legal to make preservation copies of anything which no longer has a clear copyright owner. There are problems with commercial music and cinema films, but are you mandated to preserve them, or do you just hold what amount to access copies? Access copies don’t need digitisation — it could be better and cheaper to simply buy digital replacements.
- institutional: the first thing you will be asked is: “what’s the business case?”. Read the answer to the FAQ “I need a business case for preservation. How do I do that?”
- staff and equipment — finding these for old formats will just get harder and harder. There are organisations in every country for radio and TV amateurs, and they may be able to provide volunteers. As the pool of skilled staff and equipment shrinks, your solution will be to use a service provider.
- funding — these are hard times, but putting it off will only make it harder. Use the business case, believe in it, and believe in the value of your collection and how its value can increase through greater access. Ultimately getting funding for digitisation will be like any other fund-raising operation. There are people with fund-raising skills — you may already have them in your organisation. You just need to focus them on the urgent issue of digitisation of analogue content.
Digitise now or lament forever
I want to preserve my audiovisual content. How do I get started? The basic steps for planning, funding, and carrying out a preservation project are:
- make a map of the collection, and sort contents into groups according to type and condition;
- arrange the groups in priority order according to the urgency of their preservation needs (triage);
- work out what needs to be done for each group: a preservation strategy; and
- state exactly how the work will be done: who does it and by when: a preservation plan.
These steps are explained in short PrestoCentre Tutorials, beginning with Mapping your AV Collection on mapping and triage. The map need not be complicated. The tutorial shows the BBC archive’s collection of film in a simple table of five rows (one for each group) and five columns (for type, age, storage history, genre or value and finally condition). The tutorials also cover a strategy for the whole collection: what it’s for, who it serves, what it could become if digitised. Why? It’s because of the need for funding, and the need to justify funding. It’s also because mapping and triage shouldn’t be just about physical condition. The value of items in the collection is significant, and a useful map also shows (again, in broad groups, not individual items) differences in value — which contribute to the value dimension of the triage. The tutorial Making a preservation strategy is a guide to considering an overall collection strategy and using it to make a straightforward preservation strategy. This is again just a table, and for BBC archive film it is again 5 by 5: the five groups from the map, with columns for type, condition, action needed, timescale and whether the work will be done in-house or outsourced. A preservation strategy is firmly based on preserving — and extending — what the collection does, not just on preserving the media. Preserving value (and creating access) are far more important than ‘copying tapes’, and stand a much better chance of attracting funding. Finally, from the preservation strategy a detailed roadmap of work can be set out, costed, funded and carried out: the preservation plan. A preservation strategy should be relatively permanent, but plans can be short term. If longer commitments cannot be obtained, preservation plans may be annual — though there are definite economies of scale in funding blocks of work that may take several years to carry out.
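As a minimal illustration, the map and triage described above can be sketched as a small data structure. The groups, fields and urgency scores below are invented for illustration, not taken from the BBC example:

```python
# A collection "map": broad groups, not individual items.
# Columns follow the tutorial's pattern (type, age, condition, value);
# the groups and scores here are purely illustrative.
groups = [
    {"type": "2-inch video tape", "age": "1960s-70s", "condition": "poor", "value": "high", "urgency": 3},
    {"type": "1-inch video tape", "age": "1980s", "condition": "fair", "value": "medium", "urgency": 2},
    {"type": "Betacam SP", "age": "1990s", "condition": "good", "value": "medium", "urgency": 1},
]

# Triage: sort the groups by urgency of preservation need, most urgent first.
triage = sorted(groups, key=lambda g: g["urgency"], reverse=True)

for g in triage:
    print(f'{g["type"]}: priority {g["urgency"]} ({g["condition"]} condition, {g["value"]} value)')
```

The point of keeping the map this small is that it stays maintainable: five rows can be argued over in a meeting, five thousand cannot.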
The tutorial Planning your preservation project covers this final stage, with again an example from the BBC film collection. This time the plan is a 5 by 6 table, with headings for Type of material, Preservation Action, Service Provider, Batching, Outcome and Quality Control. Batching is the amount of work that will be undertaken at one time — in this case monthly shipments to the three external contractors. The outcome could also be called outputs or deliverables: the tangible products of the work. Quality control is only summarised in the table — the final project will need a detailed quality specification which could run to several pages. There are two areas that are outside the scope of a digitisation project but are vital to the overall preservation of a collection:
- Conservation: everything that happens (over years and decades) between preservation actions such as tape format migrations, or before digitisation. Tutorial: Conservation of analogue AV content
- Restoration: changing what an archive holds to reduce the effects of damage and deterioration. Most techniques are now digital, so most restoration activity is now an optional process following digitisation. Tutorial: Restoration of AV material
Finally, if the definitions of preservation, digitisation, conservation and restoration are all overlapping and fuzzy, there is a tutorial which sets out clear definitions and differences: Tutorial: Introduction to Preservation. You have now reached the end of the one-page answer to How do I get started? Each of the answers is just a page, and each gives references to further information. So get started!
It is a rare institution which has a standing budget for everything needed for preservation. The shortfall is usually of two sorts: 1) maintenance: all content needs continuous attention, including air conditioning and dehumidification for content on shelves, media checking and repair for analogue content, file and media (storage) checking for digital content, and updates of the catalogue (metadata) whenever content is used or modified. There may be a standing budget, but it rarely covers things that go wrong: environmental systems fail, storage space is inadequate, formats have become obsolete. 2) interventions or preservation actions: the major steps that are needed to keep a collection usable. The major actions are migrations (transfers) from old media to new media, digitisation to move from analogue to digital content, and migrations from one digital storage method to a new one. The good news is that digitisation is likely to be the most expensive and time-consuming of all preservation actions — and after that everything gets easier and cheaper because the processes involved can be highly automated. A tape robot can migrate from old data tape to new with no human interaction beyond supplying the overall instructions, inserting new blank tapes and taking out old tapes. Transfers happen at intervals measured in years or even decades, so it is the rare institution that has a standing budget to pay for this work — hence the FAQ about a business case. Consider a car that has a major problem. It is not enough to just say “OK, we’ll get it fixed”. It may not be worth fixing; it may be more economical to replace it. There may be options for fixing it, with associated costs and risks. There are certainly options for replacing it: purchase, lease — or in some cities there is the option of taking out a membership in a group ‘car club’ scheme — a kind of cloud service, really, where you ‘pay as you use’. Or do you need a car at all?
Or do you need a lorry or a bus or a motorcycle? The point of a business case (finance case) is not to simply ask for money. The real point is to answer all the questions about what is really needed, and why it is needed — and then what the benefits, costs and risks are for the various options — and then ask for money. PrestoCentre has a 100-page report from PrestoPRIME on “Audiovisual preservation strategies, data models and value-chains”. The report is by people who have looked into audiovisual preservation issues and options for 20 years. Here is a basic procedure for building a business case: 1) start with the collection strategy. Everything begins with what a collection is good for, and what it could become if transformed by a preservation action such as digitisation. The PrestoCentre Tutorial Making a preservation strategy covers a collection strategy as well. There is more information on what digitisation can achieve in the Tutorial Why digitise AV material? 2) justify the investment: the collection strategy sets out the value of the collection. A preservation action can do two things: preserve value and extend value. At this point there are two broad paths: heritage value and economic value. We live in times where anything that can’t be reduced to a monetary value is ignored, because “it’s all about the bottom line”. This view has to be resisted by all public and heritage institutions, because there is another well-defined and quantifiable value: public value. Economists have dealt with public value for two centuries, and it is invoked for everything that the market cannot deliver — from clean air to honest judges. It is very important to use ‘preservation of the public value’ of a collection as an argument, and not to be ashamed or intimidated by not being able to put a price on that value.
For more information on public value, we suggest section 1.1.1 of the full 100-page report “Audiovisual preservation strategies, data models and value-chains” which describes how the BBC justified its investment in digitisation based on preserving the public value of everything the BBC had done (which sits in the archive if it sits anywhere). The whole issue of public value was presented, in a broadcast archive context, in the PrestoSpace annual audiovisual preservation progress report for 2007 – Deliverable D22.8, Annual Report on Preservation Issues for European Audiovisual Collections (2007) – pp27-35. The main way to extend the value of a collection is to extend the access. This area – creating access – is where one can also create interest and support, because digitisation does genuinely revolutionise the potential for access. Much more information on access was made available by the EUScreen project, including their 2014 report On-line publication of audiovisual heritage in Europe. 3) set out the options, as you would for a car that has a fault. At this point we move beyond what can be answered in one page, but one simple statement can be made: for analogue audio and video there is only one option: digitise now! The equipment is disappearing fast — it is literally now or never. 4) get costs of the viable options. See the separate FAQ How much will it cost? 5) if not already part of point 2), also list the benefits of each option — and (very important) set out the risks. If you repair the car, something else could go wrong (a risk); a new car may have a three year guarantee (a benefit, or avoidance of a risk). Working out which option is best is a balance of costs, risks and benefits — making sure that maintaining public value and creating access are included as major benefits where appropriate. That’s all for a one-page answer. A wealth of support and experience is available through the AVA_Net library — just search for Business models. 
My advice: don’t put it off, don’t be daunted, start now!
The question could be bounced back: how long do you want to preserve something? Clearly it will cost more to keep something for 1000 years than for 10 years. The first major surprise in dealing seriously with long-term costs is that ‘forever costs’ can be calculated, and they are not infinite. How can that be? If something costs so much per year, then for an infinite number of years it costs an infinite amount, right? Yes, but. First of all, if the costs decrease year on year, then the sum is finite — a basic fact of mathematics that seems mysterious, but it is no more mysterious than the fact that the series 1 + 1/2 + 1/4 + 1/8 + 1/16 + … can be extended indefinitely and the sum still never reaches 2 — because at each stage the amount being added is half the amount needed to reach 2. The second reason is related, but has to do with increase rather than decrease in value. Money, in our economies, generally grows. A lump of money can be put somewhere where it earns interest. If the principal isn’t touched, that earning carries on indefinitely: a money pump. So even for a fixed cost per year (rather than a shrinking one), the calculation of the forever cost comes down to the calculation of the amount of money needed to be put aside to earn enough interest (enough net present value) to pay the annual costs, forever. This answer may already sound complex, but without these complications we’d have to fall back on: “well, to keep something a long time will cost a lot of money”, which isn’t insightful. Preservation is everything needed to preserve access, to follow the UNESCO and CCAAA definition (1). That covers a lot of ground. In order to be more specific, the PrestoSpace terminology divides all this into two areas:
- actions that happen all the time, or very frequently. Keeping the air conditioning running is an example. These actions could also be called maintenance, to stress the idea that they have to happen, and can’t (safely) be put off.
- one-off or infrequent preservation actions, but still necessary to the existence of the items.
The first group of actions could be called conservation — and indeed such actions were called conservation in the Presto series of projects. The only problem is that in other archive and museum fields conservation can be used for interventions: taking a book up to ‘the conservation workshop’ to give it a new binding, or taking an oil painting to have the canvas rehung and reframed. So this answer will stick with maintenance. Digitisation is a major example of a preservation action, and costs for digitisation are the subject of a separate answer in this series. [see the FAQ on digitisation costs] Maintenance would cover areas familiar to every librarian or archivist: the building, the shelves, environmental controls, cleaning and general upkeep, and associated staff costs. How much does all that cost? As a book on a shelf has a lot in common (so far as maintenance costs are concerned) with a tape or film on a shelf, we can turn to data from the general library world. In the UK there was a study of shelf costs compared to digital storage: LIFE-SHARE. For shelf costs they in turn relied on a paper published in 2010 in the USA: On the Cost of Keeping a Book (2). We now come back to net present value, because that is how Courant and Nielsen were able to deal with long-term costs such as replacing a library building after 40 years. Their result gave an annual cost per book of between US $1 and $4, depending upon whether it was the more expensive open-access shelves, or the high-density ‘stacks’ that can’t be browsed and are harder to access, but make for cheaper storage. Audiovisual archives could be expected to be closer to the $1 figure, because of their use of high-density shelving. These annual costs may seem high, but they are in accord with figures from major archives associated with the PrestoCentre.
If your situation doesn’t require covering the full cost of the building, your effective costs could be as much as 75% less (because 60% to 75% of the costs in the Courant and Nielsen study were for the building itself). The second major surprise is that digital is now cheaper. This is clearly the case in the book world, as shown by Courant and Nielsen, by the JISC-British Library LIFE studies and by LIFE-SHARE. A scanned book is typically a few gigabytes, and organisations such as the Hathi Trust will store it from $0.19 per year [2019 price], depending upon scan resolution and whether the scans are monochrome or colour. Store it how, with what guarantees and what indemnities if it all goes wrong? And what happens if they go bust, or a supplier of a key component goes bust? These are the real concerns in digital preservation: definition of a service that really meets the needs, covers the risks, and sets out all the information where all parties can see and understand it. The issue of defining what we mean by ‘digital preservation’ and ‘storage as a service’ is the area of real concern — not the current price of LTO tapes or hard drives. Organisations doing their own digital storage also need to define what they mean and define the service they want to deliver to themselves. It may be even more vital for such an organisation, because they may be novices. There is a wealth of PrestoCentre Resources in the AVA_Net library. A detailed paper is Audiovisual preservation strategies, data models and value-chains.
(1) Edmondson, Ray. “Audiovisual archiving: philosophy and principles”, UNESCO, 2004; third edition 2016.
(2) Paul N. Courant and Matthew “Buzzy” Nielsen, “On the Cost of Keeping a Book”, in The Idea of Order: Transforming Research Collections for 21st Century Scholarship, Council on Library and Information Resources (2010).
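The ‘forever cost’ arithmetic described at the start of this answer can be made concrete. For a fixed annual cost, the usual perpetuity formula gives the lump sum whose interest pays that cost indefinitely; the figures below are illustrative, not real prices:

```python
def forever_cost(annual_cost, net_interest_rate):
    """Endowment needed so that interest alone pays a fixed annual cost
    forever (a perpetuity: the principal is never touched)."""
    return annual_cost / net_interest_rate

# e.g. $2 per item per year at 4% net interest (illustrative figures)
print(forever_cost(2.0, 0.04))  # 50.0: a one-off $50 per item covers it forever

# The declining-cost case: a series like 1 + 1/2 + 1/4 + ... stays below 2,
# so costs that halve year on year sum to a finite total
total = sum(1 / 2**n for n in range(50))
print(round(total, 6))  # 2.0 (approached, never exceeded)
```

The sensitivity to the interest rate is the practical catch: halving the assumed net rate doubles the required endowment, which is why ‘forever’ figures vary so much between institutions.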
There are people who use the word preservation interchangeably with digitisation. This is a mistake. Preservation never ends. Eternal vigilance is the price of preservation, to paraphrase. Digitisation is, to avoid confusion, one of a range of preservation actions (one-off interventions) in the course of the existence of something being preserved. So this answer is for the narrow question about the cost of converting analogue audio or video into a digital form. There are various ways to run a digitisation project, and there are various ways to measure cost. All this leads to confusion. In the interest of clarity, the following answer will try to make simple statements. Almost any of these could be qualified with a range of ‘yes, but’ considerations of varying degrees of importance, relevance and complexity. These statements come from a twenty-year background in audiovisual digitisation, so they should be worth something, providing the reader also remembers two things:
- the answers are simplifications;
- things change; some things will get cheaper, others more expensive. This answer is written in February 2013.
Major factors affecting cost of a digitisation project include:
- what kind of audio, video or film is being digitised: cassettes are cheaper than open-reel; gramophone recordings take a lot of manual handling; film digitisation takes very expensive equipment. As gross simplifications, audio costs at least €20 to €50 per hour for open-reel tapes (with no problems); video costs twice as much, and film costs ten times as much. [2019: film digitisation has gotten cheaper; now perhaps ‘only’ five times the price of video digitisation.]
- condition of the material: problem material takes more operator time, to reset things and try again. The cost can be limited (at the price of a higher failure rate) by limiting the time spent trying to play problem material. One common method is to limit operator time to twice the duration of the actual material. This will limit the time spent on each item, but says nothing about the overall time. One reason most projects of 1000 items or more begin with a test run or pilot sample is to estimate the percentage of problem material. Condition matters! See bottom of page.
- amount of material: this is rather obvious, but it’s not strictly linear, as there are economies of scale in larger projects (over 1000 items).
- workflow: this factor relates to amount of material, because an efficient workflow doesn’t make much difference on small projects. On large projects (over 1000 items) it is quite possible to save 50%, and some projects have quoted savings of 70%, on the base price per item.
- quality of the work: cheap equipment, untrained operators, no calibration or checking of equipment, cheapest digital storage for the result — can all reduce the cost, but if the digitisation is not of preservation quality then the project is a waste of money, and may destroy, forever, the chance of doing a proper digitisation. It will be hard to get funding to ‘do it right the second time’.
- checking the quality: some projects quote 30% of the total cost dedicated to quality control. That is at the high end, but somewhere between 15 and 30% extra has to be allocated for quality checking.
- metadata: the database or catalogue for a collection has to be updated — or in the worst case there is no documentation and something has to be created as part of the project. Bar codes and automation can cut the cost of logging new digital items into a database, but only for quite large projects where it is worth creating software to connect to an existing database and automatically update it for completed items. At least 10% should be added for metadata. If there is no catalogue and part of the project is to make one, the cost is probably double: the person doing the transfer is not a cataloguer so you need two people. Even if the person can do both jobs, they can’t be done at the same time (it has been tried; it can’t be done).
- method of measuring costs: the cost that matters (for getting a project funded) is the cost an institution has to pay. If computing resources, transport or cataloguing are done by existing staff and systems, they may be invisible for the purposes of the project. In the limit, if there is a technical person already on staff who can do a few items per day, digitisation may be seen as free. It is, formally. The only problem is that the workflow for such projects is usually as inefficient as possible (no division of labour, because there is only one person doing everything) and so the project will proceed very slowly. As with any project, the throughput (items per year, basically) has to be assessed, and compared with the requirements of the project. Anything that will take more than five years is risky, because nobody knows the availability of equipment and operators even just five years from now. The situation in video is desperate, audio is getting desperate, and film has a whole range of difficulties. The ‘free’ digitisation could be the road to ruin, if it leaves material undone five years from now.
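As a back-of-envelope illustration, the per-hour rates and overhead percentages quoted above can be combined into a rough estimator. The function and figures are illustrative; a real quote from a service provider will differ:

```python
def estimate_cost(hours, rate_per_hour, qc_fraction=0.20, metadata_fraction=0.10):
    """Rough digitisation budget: base transfer cost plus quality-control
    (15-30% extra) and metadata (at least 10%) overheads."""
    base = hours * rate_per_hour
    return base * (1 + qc_fraction + metadata_fraction)

# 500 hours of open-reel audio at EUR 30/hour, with 20% QC and 10% metadata:
print(round(estimate_cost(500, 30)))  # 19500
```

Note what the sketch leaves out: the condition of the material (see below), workflow savings on large projects, and any cataloguing done from scratch, each of which can move the total by tens of percent.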
The costs so far are for making a digital file from analogue content. Where does that file go? This used to be daunting but digital storage costs have come down by about a factor of 100 in the last decade, to the point where 1000 hours of high quality audio (24 bit, 96 kHz) can be stored on €150 of hard drives (for four terabytes) — and so three copies are under €500. [2019: €80 for 4TB, €250 for three copies.] Video will take ten times more storage (for lossless compressed standard definition video) and 25 times more for uncompressed. High definition video is currently a jungle of formats, but lossy compression at an ‘archive quality’ of 400 megabits/sec translates into five hours per terabyte of storage. If more than roughly ten terabytes of storage are needed (30 with double backups) — then LTO datatape is the preferred storage medium. There is an overhead of a few hundred Euros to buy a tape drive, but after that the cost of datatape is less than the cost of hard drives, the reliability is considerably better and the energy cost of datatapes on shelves is of course much less than the cost of spinning discs (though discs on shelves also consume no energy). [2019: many public institutions are joining forces for shared storage services. ‘The cloud’, which simply means using Internet to access rented storage, is now the future of storage: letting somebody else worry about hardware, backups, redundancy and maintenance — providing it is somebody you can trust because of a demonstrable track record, transparent costs and guaranteed security.] Condition Matters! Rule of thumb: problem material takes four times as long as problem-free material, leading to these results:
- 10% problems means 30% more work;
- 20% means 60% more: a $10 000 job will instead cost $16 000;
- 30% means 90% more: nearly doubling the cost of the whole project;
- 40% means 120% more and so on.
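That rule of thumb can be written down directly: if a fraction p of the material is problem material taking four times as long, the total work is (1 - p) + 4p = 1 + 3p times the problem-free estimate, which reproduces the figures above:

```python
def condition_multiplier(problem_fraction):
    """Work multiplier when problem items take 4x as long as clean items."""
    p = problem_fraction
    return (1 - p) * 1 + p * 4  # simplifies to 1 + 3p

for p in (0.10, 0.20, 0.30, 0.40):
    print(f"{p:.0%} problems -> {condition_multiplier(p) - 1:.0%} more work")

# The $10 000 job with 20% problem material:
print(round(10_000 * condition_multiplier(0.20)))  # 16000
```

This is also why a pilot sample matters: it is the only reliable way to estimate p before committing to a budget.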
This answer assumes that the full question is: I have content on shelves; I want to put it into files and put those files on mass storage of some sort; what standards should I follow? This answer started out covering four areas where people might ask “what is the standard, or at least recognised best practice?”
- the digitisation itself: sampling rate, quantisation, the complexities of colour video, the scanning of film;
- the encoding of the results of the digitisation, because there are many options;
- the file format to hold the resultant audio and video; again, several options; and
- digital audiovisual media: the problem of material that is already digital, but is NOT in a file format. This covers everything from audio CDs to the latest forms of digital videotape — a 30-year span of digital media of many types, with a surprising amount of complexity.
To keep answers to one page and in an effort to provide clarity, there are now four FAQs, one for each of the above areas. The following answer is for the question: what standards do I follow for the archive quality digitisation of analogue audio and video, and of film (real film)?
Audio: this is the easy one. There is good documentation and there is a strong consensus of opinion. The hard part is actually getting the best possible playback in order to do the best to meet the standard! Archive preservation standard for audio:
- Quantisation: “24 bit” = 24 bits in the number representing each sample.
- Sampling rate: 48 kHz minimum; 96 kHz is recommended.
- Don’t compress, at all
- Save as BWF, the Broadcast Wave version of the WAV file
Source of the standard: IASA Technical Committee, Guidelines on the Production and Preservation of Digital Audio Objects, ed. by Kevin Bradley. Second edition 2009. (= Standards, Recommended Practices and Strategies, IASA-TC 04). Quibbles:
- 24-bits: you’ll be lucky to get 19 to 21 bits of true data, but for simplicity (and to have a multiple of 8) people refer to 24-bit quantisation
- 48 kHz minimum: the CD standard, 44.1 kHz, is tolerated; it has practical advantages when 44.1 kHz is a working format for an archive or institution
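Those figures translate directly into storage requirements. A minimal sketch, assuming uncompressed stereo PCM (the BWF header itself is negligible):

```python
def bwf_bytes_per_hour(sample_rate_hz=96_000, bits=24, channels=2):
    """Uncompressed PCM payload per hour of audio."""
    bytes_per_second = sample_rate_hz * (bits // 8) * channels
    return bytes_per_second * 3600

print(bwf_bytes_per_hour() / 1e9)            # 2.0736 GB/hour at 24-bit/96 kHz stereo
print(bwf_bytes_per_hour(44_100, 16) / 1e9)  # 0.63504 GB/hour for CD-quality
```

At roughly 2 GB per hour, even a large audio collection fits comfortably on modern storage, which is one reason the ‘don’t compress’ rule costs so little.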
A useful practical document on playback and digitisation from the British Library Sound Archive is the Endangered Archives Programme Guidelines for the preservation of sound recordings
Video: there may be arguments about what to do to preserve video, but there are two clear standards for how video should be coded as a digital signal. 1) SD: Video from the 1950s to the 1990s was mainly standard definition = SD, which meant 525 lines in the US (and many other countries, particularly those with 60 Hz electricity systems), and 625 lines in the UK (and countries using 50 Hz electricity). The SD standard was agreed in 1980 and has been the basis for digital video equipment ever since. It is ITU-R Recommendation BT.601. “Rec 601” video is what comes out of the digital connectors on professional video cameras and videotape equipment, and is the standard signal networked across television production. Or was, because since the 1990s video has been changing from SD to high definition = HD. Rec 601 has variants, but for digitising analogue SD content the version to use is 10-bit data, 4:2:2 allocation of samples to the three components of colour video. Any professional ‘capture card’ will deliver this standard. It has a full bit rate of 270 megabits per second (Mb/s), but that is tied to a real-time signal which has ‘blanking intervals’ which were needed by cathode-ray television sets. For files, blanking no longer applies so the data rate can be chopped to 200 Mb/s. Analogue video in archives will be SD, and should be digitised to Rec 601. All the arguments about analogue video have to do with encoding and file formats, answered in the FAQ “How should audio and video be encoded for preservation?” and the FAQ “What file format(s) should I use?” There are also complexities about material which is ‘digital but not in files’, such as digital videotape (DV, Digibeta, IMX and more). How to deal with that material is in the answer to the FAQ How do I preserve digital media? 2) HD: There is NO analogue high definition video, so any archive digitisation of analogue video should follow the SD standard, Rec 601.
But there is an HD equivalent to Rec 601: ITU-R Recommendation BT.709. There is a lot of complexity regarding HD, which is one reason why dealing with digital media has its own FAQ. All HD is digital, therefore it is either on digital media (e.g. HD-CAM SR tape) or is already in files on mass storage.
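The 270 and 200 Mb/s figures above can be checked from first principles. Rec 601 4:2:2 samples luma at 13.5 MHz and each chroma component at 6.75 MHz; for the 625-line system the active picture is 720 luma (plus 2 × 360 chroma) samples on each of 576 lines, at 25 frames per second:

```python
# Full serial rate: (13.5 MHz luma + 2 x 6.75 MHz chroma) x 10 bits/sample
full_rate_mbps = (13.5e6 + 2 * 6.75e6) * 10 / 1e6
print(full_rate_mbps)  # 270.0

# Active picture only (blanking stripped, as stored in files):
# (720 + 2*360) samples/line x 576 lines x 25 frames/s x 10 bits
active_rate_mbps = (720 + 2 * 360) * 576 * 25 * 10 / 1e6
print(active_rate_mbps)  # 207.36 -> close to the "chopped to 200 Mb/s" figure
```

The small gap between 207 and 200 Mb/s comes from whether vertical blanking lines are kept; either way, files at around 200 Mb/s carry the full Rec 601 active picture.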
Film: there are many complexities about film: a range of gauges and image formats, dozens and dozens of particular types of film, at least half a dozen possible versions of ‘the same thing’ ranging from shooting negatives through cut negative to interpositive, internegative and various kinds of prints. There is no standard for the digitisation of film for preservation, because film is many things, unlike audio. Whether film needs digitisation for preservation is hotly debated. Certain positions are clear:
- film that is damaged will have to be digitised to be restored, because all the powerful restoration processes are digital;
- film that is suffering colour fade or vinegar syndrome can be kept below freezing to slow the chemical reactions, which is not preservation so much as putting off the inevitable digitisation. The expense of sub-zero storage just steals from a digitisation budget, and the time spent in storage just increases the cost of that inevitable digitisation;
- film in broadcast archives will never be used without digitisation. Older broadcasters have a lot of film: the Presto 2002 survey of 10 major European broadcasters found that 1/3 of their television archives were on film, not videotape.
There are technical guidelines and examples of good practice. When the Dutch national audiovisual archive NISV started a major project of film digitisation, they looked at many options and have contributed their findings to the PrestoCentre: Film Scanning Considerations — which also has a six-page digest.
The requirements in broadcasting for film scanning were investigated by the European Broadcasting Union, in two reports also available from PrestoCentre: (1) Preservation and reuse of film material for television; (2) Archiving: experiences with telecine transfer of film to digital formats.
The underlying issue of what image information is on a negative and what scanning technology is needed to recover that information was studied by the ITU standards body in 2001-2002. Their work and other highly technical studies of all the components of the ‘optical chain’ between object and film, between film and viewer and between film and scanner are reviewed in a report from the scanner company DFT: What Digital Resolution is Needed to Scan Motion Picture Film: 4K, or Higher?. However there is evidence that for ‘technically perfect’ still-image negatives, there is information that can be gained from an 8k scan instead of a 4k scan: Understanding Image Sharpness.
One way to deal with complexity is to ignore it. Here is a simple answer to a complex issue; the plan is for the complexities (and rebuttals!) to be dealt with in more detailed documents that will be referenced here as they are produced.
Standard practice for digitisation of film at archive preservation quality:
- 16mm film: scanning at 2k with 10-bit quantisation using a log scale (the NISV approach)
- 35mm film: scanning at 4k with 14-bit linear quantisation recommended (the DFT approach). For exceptionally high-value and ‘technically perfect’ items, scan at 8k.
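To get a feel for the storage these practices imply, here is a rough calculation. The scan resolutions (2048×1556 for 2K, 4096×3112 for 4K full-aperture) and the 90-minute running time are illustrative assumptions, not part of the recommendation:

```python
# Rough storage arithmetic for the film-scanning practice above.
def frame_megabytes(width, height, bits_per_channel, channels=3):
    """Uncompressed size of one scanned RGB frame, in megabytes."""
    return width * height * channels * bits_per_channel / 8 / 1e6

mb_2k = frame_megabytes(2048, 1556, 10)   # 16mm practice: 2K, 10-bit
mb_4k = frame_megabytes(4096, 3112, 14)   # 35mm practice: 4K, 14-bit

# A 90-minute film at 24 frames per second:
frames = 90 * 60 * 24
print(round(mb_2k, 1), "MB/frame;", round(mb_2k * frames / 1e6, 2), "TB/film")
print(round(mb_4k, 1), "MB/frame;", round(mb_4k * frames / 1e6, 2), "TB/film")
```

Even without any file-format overhead, a single 4K feature comes out in the multi-terabyte range, which is why the datacine output format and compression choices matter so much.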
The PrestoCentre recommendation for all analogue content, including film: digitise now; it will only get harder and more expensive in the future.
There are two related answers: this one about encoding, and a separate one about file formats. The reason is: they are two separate issues, though not separate enough! The overlap causes a lot of confusion. Audio from a microphone and video from a camera are signals which can be represented by a continuous line. The line is proportional to sound pressure for audio, and to light intensity for monochrome (black and white) video. Colour video is actually three ‘separate lines’. The variation in sound and in light is continuous (analogue), but the variation can be coded digitally by sampling: a number representing ‘the height of the line’ is calculated so many times per second. The result is a sequence of numbers, and that is the simplest form of digital encoding of an original analogue continuous signal. Why complicate it then? For video, the answer is that colour video is already complicated, being essentially three parallel phenomena being represented by numbers (red, green, blue; equivalently luminance plus two colour dimensions to represent a colour wheel). Some decision has to be made about how to put the three sets of numbers together, and that decision is part of the encoding. A video signal is a raster: so many numbers per row, so many rows per image. Another complication is interlacing: doubling the number of images per second by sending half the information (the odd lines) and then sending the other half (the even lines). In video, a sequence of numbers has to be correctly interpreted to be divided properly into rows and images, so that colour information and the shuffling of interlacing can all be decoded. The other reason for complicating the encoding is to squeeze the data. Sampling audio at professional rates (24 bit samples, 96k samples per second) produces 4.6 megabits per sec (Mb/s) — and sampling video at Rec.
601 (see FAQ on standards) produces 270 Mb/s — which can be cut to 200 Mb/s if storing the data in a file (because the zeroes during the blanking intervals can be stripped out). These data rates are a real challenge: for capture, for moving between devices over networks, for broadcasting. Consequently technology was developed to reduce the data rate while keeping as much as possible of the information. The inherent predictability of the information (the fact that a dark area is more likely to be next to more of the same than to something very different) can be measured. The parameters of the measurement can be kept instead of the data, and a saving can be made. Older readers will remember when it was standard to ‘zip’ files to make the most of floppy discs that only held a few hundred kilobytes. It is still standard to compress audio and video. There are many ways to do this, and each is a different kind of encoding. Kinds of compression: the zip compression used for general computer files was completely reversible: it just saved space, and didn’t throw any information away. After uncompressing, the result would be bit-for-bit identical with the original. That is lossless compression, and it can also be applied to audio and video. The problem is, it doesn’t save much space: usually the result of lossless compression is 1/3 the size of the original (3:1 compression) or maybe 1/4 at best (4:1 compression). At a sacrifice of information, huge compression factors can be achieved. We use compressed audio every time we use a mobile phone, and the coding is roughly 10 kilobits/sec, a 70:1 compression ratio compared to CD quality, and 300:1 compression compared to the full archive standard of 24-bit samples, 96k samples per second. The video seen on the Internet is typically compressed by factors ranging from 200:1 to over 1000:1. All of this reduction in data rate has a cost: information is lost, quality is reduced.
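The reversibility of lossless compression can be demonstrated with the same family of algorithms used by ‘zip’ (Python’s zlib here). Note that real audio and video are far less predictable than this toy data, which is why lossless ratios for audiovisual material stay near 3:1 or 4:1:

```python
import os
import zlib

# Lossless compression, as with 'zip': reversible, bit-for-bit.
# Repetitive data (like a dark area of an image) compresses very well.
predictable = bytes([0, 0, 0, 1]) * 25_000          # 100 kB, highly repetitive
packed = zlib.compress(predictable, 9)

assert zlib.decompress(packed) == predictable       # nothing was thrown away
print(f"repetitive data: {len(predictable) / len(packed):.0f}:1")

# Data with no predictability to exploit barely compresses at all.
random_data = os.urandom(100_000)
print(f"random data: {len(random_data) / len(zlib.compress(random_data, 9)):.2f}:1")
```

The round-trip assertion is the defining property of lossless compression; lossy codecs like MP3 or MPEG video trade that property away in exchange for the huge ratios quoted above.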
For archiving, where a basic principle is to maintain quality, it is not good practice to introduce lossy compression. Encoding and file types: It became standard practice to develop a different file type for each encoding, which is where the overlap started. Real Audio (from the company Real), Windows Media Audio, jpeg images, mpeg video and so on were in files with extensions ‘ra’, ‘wma’, ‘jpg’, ‘mpg’ and so on. As encoding schemes proliferated, that approach (one file type per encoding method) was heading for nonsense, so gradually file types developed that could hold multiple kinds of encoded data — with metadata inside the file to self-identify the coding. A simplification — but also a complication, because it was no longer obvious what encoding was actually being used. An application could read the metadata, but a person could only see the file name, and file types like AVI and MOV (and even WAV) can hold many kinds of encoding — as well as holding both audio and video and subtitles and possibly even time code. File formats got so complicated (powerful, the developers would say) that people started to call them wrappers, to emphasize that the file could hold many things: video, multiple channels of audio, subtitles, time code, other metadata. Recommended file formats are given in the answer to “What file format(s) should I use?” Now that the basics of encoding, coding types vs file types and compression have been covered, here are the recommendations for encoding of audio, video and film: Encoding for audio: just use the sequence of numbers from the digitisation, with no further encoding or compression of any sort. This encoding is sometimes referred to as linear PCM. The standard for digitisation of audio is in the FAQ on standards: 24 bit, 48 or 96 kHz. The file format for audio is WAV, and the Broadcast version of WAV (usable by all applications that can use standard WAV) is recommended for its extra metadata. See the FAQ on file formats.
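As a minimal illustration of linear PCM in a WAV file at the archive standard (24-bit, 96 kHz), using Python’s standard wave module. The standard library writes plain WAV only, so the Broadcast WAV metadata chunk would need other tools; the file name is illustrative:

```python
import wave

# Write one second of 24-bit, 96 kHz stereo linear PCM (silence) to a WAV file.
with wave.open("master.wav", "wb") as w:
    w.setnchannels(2)          # stereo
    w.setsampwidth(3)          # 3 bytes = 24 bits per sample
    w.setframerate(96_000)     # 96k samples per second
    w.writeframes(b"\x00" * 3 * 2 * 96_000)   # 96,000 frames of 6 bytes each

# Any application (or automated checker) can read the parameters straight back:
with wave.open("master.wav", "rb") as w:
    assert w.getsampwidth() * 8 == 24
    assert w.getframerate() == 96_000
    print(w.getnframes(), "frames")   # 96000 frames
```

Because linear PCM is just the sequence of sample numbers, the self-describing header is all an application needs to interpret the data — there is no codec to obtain or decode.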
If you are taking digital audio from a carrier (CD, DAT, minidisc or even the sound from a videotape) and putting it in a file in an archive, see the FAQ on digital media. You should clone the original — if you can — but there are other complexities. Other encodings: for delivery and access you may want a compressed version, such as MP3. Whether this is made at the time of digitisation, or made on-demand at time of use, depends upon each particular installation. Small collections with all the audio online uncompressed can make MP3s on demand. If your master uncompressed audio is kept offline on data tape, then make the MP3 at time of digitisation and put that online. Encoding for video: there are three cases: 1- original media is analogue, so either code uncompressed or lossless compressed. If using lossless compression, which one? The answer is: it doesn’t matter, regarding principles, so use whatever is most practical. But — proprietary methods should always be avoided. There is a wealth of information on the advantages and pitfalls of all common encodings available from the US Library of Congress. If you decide to change to another lossless encoding, that doesn’t matter, as the files can be converted automatically and painlessly. Probably the most common lossless compression for video is now JPEG2000, used by the Library of Congress and many others, and also used in the Digital Cinema standard. JPEG2000 is now widely used as an encoding for still images. Document scanning produces still images, generally using JPEG2000 — so this format is widely used and understood in the library world. See http://www.dlib.org/dlib/july08/buonora/… 2- original media is digital (but NOT a file): see the FAQ on preserving digital media. 3- born-digital files (the following also applies to born-digital audio and digital cinema files). Why is there any problem? A file comes in, you keep it.
Actually, there are two main problem areas: 1) you also want to produce a standard version; the original could be an oddball format, or your whole approach to digital archiving may rely upon making an archive format; 2) you need to keep the metadata, not just the video. Your archive won’t know anything about the technical attributes of the video, or about the descriptive metadata (if there is any), unless you can pull out and interpret that metadata, and then put it into the catalogue of the digital repository or archive. Recommendations for born-digital files coming into a digital collection (repository, archive): 1- Keep the original encoding in its original wrapper; this is always possible, so should always be done. 2- Make a standard format if desired. The quality won’t improve and it may take up much more space, but it might be a key step to the overall operation of a digital collection. So the master could be the original (as in 1) or could be the standard version (as in 2). 3- Make delivery and access versions as needed. It may also be pragmatic to make a mezzanine version; this is not the master copy, but it is the high-quality copy from which access copies in lower quality are made, in a way that is computationally efficient. Encoding for film: (the following only covers the images, not sound on film) Scanning of film can be done by two kinds of equipment: telecine and datacine. The output of telecine equipment is a video signal, which is NOT what is needed for archiving. Video has only 500 to 600 scan lines, and even HD video is only 1080, only suitable for preservation digitisation of reduced-quality 16mm content, such as the telerecordings (film made from a video input) in broadcast archives. The output of an older telecine machine will be an analogue video signal which has to be digitised to be digitally preserved. This case is technically equivalent to digitisation of any other analogue video, covered in the previous section.
Generally, film digitisation for preservation will use a datacine machine. The output is a file (or a whole lot of files): typically the results of scanning are available in the DPX format (with one file per frame, and a folder for the whole scanned film). The data in the DPX should be uncompressed images, and this is what should be saved. The other common archival film encoding is lossless JPEG2000. So, two main options: 1) encoding: uncompressed; file format: DPX 2) encoding: lossless compressed JPEG2000; file format: many options, see FAQ on file formats. There may be many other output possibilities from a datacine machine, and it is tempting to use one which fits with existing processes. This is almost certainly a mistake, from the archiving and preservation viewpoint. Production in broadcasting will probably be based on a video format, which will be a big drop in resolution from a 4K or even a 2K scan, will probably be lossy-compressed and could even introduce all the mess of interlacing. Most broadcast production uses a video encoding using MPEG-2 at 25 or 50 Mb/s for SD video, and MPEG-4 encoding at 100 to 400 Mb/s for HD. These are lossy compression encodings, and so should never be used for holding the master version of the results of film scanning. Digital cinema relies on various versions of the JPEG2000 encoding format, with the distribution version being considerably compressed (lossy compression). With production (or customers of a footage library, or researchers at an archive) using, for their general work, an encoding that is very different from the master version, there needs to be a way to satisfy both the archive requirement (save the full data out of the scanner) and the business requirement (deliver to users something they can use).
The solution is multiple versions of the material — which is nothing new as film archives have always had a range of versions: master negative; prints from the master; intermediate negative (interneg) made from the master and used for making prints, to preserve the master; prints from the interneg; prints from prints and possibly several more. There are options for efficient production of encodings that suit the users. The new versions can be made at the same time as the original scan, and held along with the DPX or JPEG2000 in the archive. Alternatively, new versions can be made on demand. Another option is more technical. It may take a lot of time and computing to make new versions from the uncompressed DPX, so if a lot of this work has to be done it can be computationally efficient to make a high-level intermediate file format (kept with the DPX in the archive) and produce more compressed versions from that. This is the mezzanine format approach to digital archive management. In digital cinema, the EDCine project recommended a master version in lossless JPEG2000, but that version is not computationally efficient for making lossy versions. So the archive would also hold a high-level lossy JPEG2000 mezzanine version. Any desired compression level can then be made from the mezzanine version in an efficient manner — as long as the final result is still a form of JPEG2000, and not some entirely different encoding family (such as MPEG). There is a lot of complexity around the digital distribution of cinema productions. The industry standard is the Digital Cinema Package (DCP). It has lossy-compressed JPEG2000-encoded files for the images (using the MXF wrapper) and has separate MXF files for the sound, plus extensive digital rights protection to prevent any but the intended use. Any PrestoCentre information on DCPs will have to await a further series of FAQs.
This answer is one of four answers to related FAQs:
- What standard(s) should I follow?
- How should audio and video be encoded for preservation?
- What file format(s) should I use?
- How do I preserve digital media, like CD, DVD, DAT and all the different kinds of digital videotape?
The encoding answer covers a lot of ground, from compression to how to archive born-digital content. While a great deal can be said about what makes one format preferable to another, and about the general risks and characteristics of files, we don’t have to say any of that here, for two reasons:
- this answer is about what PrestoCentre recommends, not about the rationale;
- there is extensive information about the preservation considerations (the sustainability) of all common (and many uncommon) encodings and file types under the general category of formats on the US Library of Congress preservation website.
Therefore this answer should be short! Audio: The file format for audio is WAVE (.wav), and the Broadcast version of WAVE is recommended for its extra metadata. Broadcast WAVE also has the extension (.wav) but is commonly referred to as BWF. All BWF files should be usable by all applications that can use standard WAVE files. The metadata for broadcast wave files is supported by all standard professional audio edit software, and by all service providers who deal in professional formats. The latest and recommended version of BWF is version 2. Version 1 fixed a problem with large files, and version 2 adds the metadata for ‘loudness’ standardised within broadcasting (ITU-R BS.1770, EBU R128). Video: There are four main options:
- MXF which is a professional (and maybe professional only) standard within both broadcasting and digital cinema. It is also used by the Library of Congress and by many other institutions which have used the robotic SAMMA system for digitisation of analogue video cassettes.
- FFV1 is an open-source, licence-free form of coding using lossless compression, applied to individual frames. It is currently undergoing development by the Internet Engineering Task Force (IETF) to improve its specification and support tools. https://datatracker.ietf.org/wg/cellar/charter/ FFV1 is the ‘new kid on the block’ but is gaining strong support in the heritage, university and general non-commercial world, particularly with improved specification and support tools as a result of the IETF work. Commonly FFV1 encoding is wrapped in the Matroska wrapper file format, as it is also open source and licence-free.
- MOV is the Quicktime wrapper, associated with Apple computers and also with the MPEG-4 standard. It can hold uncompressed video as well as a wide range of compressed encodings, and it supports time code.
- AVI is the wrapper from the Microsoft camp, developed in the early 1990s. It supports uncompressed video and a range of compressed encodings. It does not support time code. In the US it has been used by NARA, the National Archives and Records Administration — with no problems, because their analogue originals did not contain timecode.
PrestoCentre recommends MXF and MOV. The reasons for not recommending AVI are given by the Library of Congress, which quotes a Wikipedia article, Audio Video Interleave, listing areas (aspect ratio coding, time code, variable frame rate, MPEG-4 encoding) where AVI does not support the full range of digital video archiving requirements. While most people using JPEG2000 for video will put it in an MXF wrapper (so that audio, timecode and metadata can all be in the one wrapper), for still images, where JPEG2000 is very widely used, in particular to hold the results of book and document scanning, it is common for the resultant JPEG2000 encoded image data to be held in a JPEG2000 file format. Film: While modern scanning equipment will produce many output formats, the requirement of ‘saving the best available’ dictates taking an uncompressed image output from the scanner. That can be saved in various wrappers, but in practice two are generally used:
- DPX is standard in digital cinema production and in digital restoration as the working format, but it is perfectly usable as an archive format. Audio and video are separate, and indeed each image is in a separate file, but the DPX standard includes metadata to bundle the whole collection of files in a meaningful way, understood by professional-level edit systems.
- MXF is the standard for distribution in digital cinema, wrapping JPEG2000 encodings. There are various kinds of JPEG2000 (J2K). The EDCine project recommended lossless J2K for the archive master, a slightly compressed lossy J2K encoding as a mezzanine, and then the distribution digital cinema package DCP could readily be produced, as well as more heavily compressed versions for other uses. MXF is the choice when there is a need to keep a single file as ‘the object’, rather than a collection of files as is the case with DPX.
DCP is the format (again, a bundle of files, not a single file) defined by the Digital Cinema Initiative DCI as the distribution format for digital cinema. It is now reaching archives, and so the requirement to ‘store the original artefact’ can be seen as requiring making the DCP an archive element — the ‘archive original’ if not the ‘archive master’. Unfortunately the digital rights management lock-ups associated with DCP distribution create archive problems that are really just emerging in 2013. Problems with DCP (and their solutions) may need to be a separate FAQ. If the PrestoCentre recommended file formats don’t work for your situation, you can do two things:
- tell us why, so we can make our information more comprehensive. There should be a comment box on this page.
- consider alternatives not listed above. The Library of Congress has a comprehensive list of formats for ‘moving images’, covering codecs and wrappers.
This is an answer to one of four related FAQs:
- What standard(s) should I follow?
- How should audio and video be encoded for preservation?
- What file format(s) should I use?
- How do I preserve digital media, like CD, DVD, DAT and all the different kinds of digital videotape?
The first three answers concentrated on digitisation of analogue content, and getting it into the file-based world sitting on mass storage in IT systems. However there is a lot of content which is technically digital (the audio and video are represented by numbers, not analogue signals) but sitting on shelves, and so also needs to be brought into the file-based world. To avoid confusion, moving content off digital media and into files should NOT be called digitisation. The term, which originated with moving music tracks off audio CDs, is ripping, and that is the term used here. The problem: it isn’t only analogue carriers that face obsolescence, degradation, damage and loss of playback equipment and expertise. There is also a range of dedicated digital media, beginning with the audio CD in the early 1980s and continuing right through to the latest Blu-Ray disc. The media basically include:
- Audio: CD disc, DAT tape
- Video: DVD disc, Blu-Ray disc, a range of digital videotapes (D1, D2, D3, D5, DV, DVC, DVC-PRO (D7), Digital-S (D9), DigiBeta, mini-DV, HD-CAM, HD-CAM SR, DVC-PRO HD, HDV)
- Film: cinema “film” is now distributed digitally as DCP files with all sorts of rights protection, on similarly locked-up hard drives. If they can be cracked open, the sound and images are already in files. The thing that makes dealing with DCP similar to the above video and audio formats is the fact that a DCP doesn’t easily allow access to the files.
The good news: the starting point is already digital, so the ingest (or migration or transfer) process should be capable of more automation (and lower cost) than for working with analogue originals. The bad news: it gets complicated. See Digital Tape Preservation Strategy: Preserving Data or Video? by David Rice and Chris Lacinak, December 2, 2009. The basic complication is that there is no way to know what the bits are on the actual tape (there can be similar problems on CD and DVD, depending upon the playback device). For tape playback equipment (such as a mini-DV camera in playback mode) there is built-in correction of read errors, and the correction gets more sophisticated with more professional playback devices. But — often these devices have two outputs: one for a digital video signal that is as corrected as the device can make it, and an additional ‘digital data’ output that can have extra information to show where correction has been applied, making it easier to know what is going on, and what data really was on the original media. So: here’s a two-headed monster. Two digital versions coming off one playback device — at the same time. Which is best for archiving? The obvious answer is to save the digital data version because it has more information — but that simple answer ignores the complications of a full workflow for a transfer. There may be quality analysis software that runs on a reconstructed Rec 601 video signal, for instance. As a generality: save the data version. Viewing and any further analysis should then be done from that data version. It is only legacy equipment that was originally designed for digitisation that has any real problems with a workflow based on digital data instead of digital video.
Principles: A) Save the original bits (a basic principle). B) Make an “archive preservation standard” version, if you want one master digital format for your archive; and/or make something useful: a version that runs in all the applications you need (editor, player). If you can’t make one version that suits all your purposes, you may need to also make a ‘mezzanine’ version: a single starting point for the production of any other formats that you need. Files sit on mass storage; moving between coding types and file types can be automated. So it is not usually a major problem to have to change formats, or to produce a useful version on demand rather than at time of digitisation. The only caveat is to always go upward in quality when archiving: making an uncompressed master from DV doesn’t lose anything (except space) — but making a DV master from an uncompressed digital original would be definitely wrong (because DV is encoded at an 8:1 lossy compression). Practice: Audio: ripping the data straight from audio CDs is standard, because usually the CDs are played in a computer CD drive rather than in a dedicated device like a CD Walkman. The Walkman will have circuits to keep the audio from skipping while you jog. The computer doesn’t expect you to be computing while you jog, so no de-skipping circuits — and no problem about what to save, because there is only the ‘digital data’ version. DAT tapes are different. These can only be played in dedicated equipment which may have extensive processing to keep the signal steady despite read errors. The AES-EBU output on professional equipment is thus a digital audio output, not raw data — but it will be the only output! High-end professional equipment (eg the Sony 7030) has a separate output to indicate uncorrected errors. There is a range of error types, which gets us into ‘known unknowns’ and ‘unknown unknowns’ territory which is beyond this one-pager.
Minidisc could be ripped to get ‘the bits on the disc’ — but that takes special software, and more special software to play the result (because minidisc encoding isn’t any standard file encoding recognised by any standard playout software). So the PrestoPRIME recommendation is to archive minidisc content by capturing the uncompressed digital output of a minidisc player. This is the S/PDIF output that uses an optical cable. Even portable players generally have an optical output, which can be captured with USB sound cards costing as little as US$20. So in this case the recommendation is to capture the ‘digital audio’, not the ‘digital data’. Video: we’ve said ‘save the original bits’ but there will still be two cases. For the DV format, the original bits make a DV file that can be easily worked with by most video applications — so the DV file is ‘something sensible and useful’. But if you have anything else in the collection besides DV tapes, DV is not a good choice for an overall ‘archive preservation standard’ version. Many digital storage media dedicated to video can be cloned (an exact copy can be made on a computer): DVDs and any other optical media, memory cards, hard drives and the entire family of DV-related videotape. The clone (the original bits) should be saved. However these formats use a wide range of encoding methods, many are proprietary and all use lossy compression. The best route to a ‘master format’, if one is needed, will depend on the particular type of video on the particular type of carrier — technical advice will be needed! We hope PrestoCentre can develop a simple roadmap, but the terrain is complicated! For many other formats (eg Digibeta, HD-CAM SR) the original bits simply cannot be accessed. There will be a Rec 601 or Rec 709 digital output, and that should be saved. HD formats are just beginning to hit the archive, and there are real problems.
As with DVDs, there is a range of native formats (from tapes, memory cards and hard drives) that all should be cloned (where possible), and all will probably pose problems in the future. Because of the data sizes, saving HD as uncompressed is a hard case to sell in 2013. HD-CAM SR has a native data rate of 400 Mb/s, but many broadcasters are pushing the archive to save at the production format of 100 Mb/s. This violates two basic archive principles: save the original; don’t reduce the quality. The solution to archiving HD will have to be the subject of a future FAQ, when a solution emerges. Film: the basic issue is unlocking a protected DCP (Digital Cinema Package), but once unlocked, the content is compressed in a lossy fashion. The workflow for acquisition of digital cinema content for true archiving and preservation (not just for access copies) needs to find a way to acquire the original digital materials, not just the DCP version. Further information on DCP packages, their problems and solutions, will have to be the subject of a separate FAQ.
Many discussions about storage concentrate on devices: the pros and cons of hard drives, or hard drives on shelves, or data tape of various formats — and companies are still advertising optical media for archive storage. Since these answers are meant to be short — the short answer about storage devices is that they all have problems. Asking which kind of storage to use is the wrong question. The issue is: how to manage content — on whatever kind of storage. Every option has risks, and the key is active storage management, which is another form of maintenance requirement. The essence of preserving any collection, analogue or digital, on shelves or ‘in the cloud’, is a continuous programme of maintenance. Preservation has to happen every day. Here are some basic principles: 1) two copies is the absolute minimum requirement for protection against risk. Even the slightest risk will eventually result in loss of content if there is only one copy. 2) checking is necessary on a regular basis, to make sure you still have (at least) two good copies. 3) checking requires fixity information; a sizeable collection isn’t checked by manually opening files and watching them and listening to them. Checking needs to be automated, and relies on a calculation that ensures simply that ‘no bits have changed’ since the file was originally checked by some more intelligent process. Given a commitment to multiple copies and regular checking, there are still decisions about cost and performance. If material has to have fast access then disc storage or a tape robot is needed. If an archive can get by with access within hours rather than seconds, data tape can be kept on shelves. The PrestoPRIME project gathered statistics from many large studies of storage devices, and came up with a clear finding: data tape is cheaper and more reliable than disc drives.
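The fixity principle above can be sketched in a few lines of Python using standard checksums. The file names here are illustrative; a real system would store the digests in a database and re-run the comparison on a schedule:

```python
import hashlib

# Fixity checking in miniature: a checksum recorded at ingest lets an
# automated process confirm later that 'no bits have changed'.
def sha256_of(path, chunk=1024 * 1024):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Two copies of the same content must produce the same digest;
# a single flipped bit in either copy would change it completely.
with open("copy_a.bin", "wb") as f:
    f.write(b"archive master payload" * 1000)
with open("copy_b.bin", "wb") as f:
    f.write(b"archive master payload" * 1000)

assert sha256_of("copy_a.bin") == sha256_of("copy_b.bin")
print("both copies verified")
```

Streaming in chunks keeps memory use constant, which matters when the files being checked are multi-gigabyte video masters rather than toy payloads.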
[Threats to data integrity from use of large-scale management environments, p37] PrestoPRIME also produced an online Storage Planning Tool that will show how often content needs to be checked in order to reach a desired quality assurance level. This tool will deal with complex storage strategies mixing discs and tape (or mixing any storage method that has known performance statistics, including cloud services if they have known and verifiable reliability statistics), allowing ‘what if’ calculations to estimate the cheapest way to achieve a set level of reliability. The approach is statistical: it needs to know the failure rates for a given technology. From the failure rate, the frequency of checking and the number of copies, the cost and effectiveness of a strategy can be calculated. The result will be the number of files damaged or completely lost over a period of time (such as 20 years). There is no point at all in asking for a guarantee of no loss — that can’t even be calculated. If there is any risk at all, then there is a finite probability of loss. The whole statistical approach to quality assurance, in archives as anywhere else, is the ‘number of nines’ in the probability that content will NOT be lost: it could be ‘4 nines’ = 99.99% safe, or ‘7 nines’ = 99.99999% safe, or even ‘9 nines’ = 99.9999999% safe. Storage strategies and costs can be calculated to achieve a required ‘number of nines’ — but there is no way to apply a statistical approach to achieving an assurance of 100%. “I don’t ever want to lose anything” is a statement of an ideal or an aspiration. “I must have ‘7 nines’ of risk protection” is a statement that an engineer can work with, to design a storage and checking strategy that meets ‘7 nines’.
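A toy version of the statistical approach: assuming independent copy failures and complete repair at each check, content is lost in a period only if every copy fails, so the ‘number of nines’ grows with the number of copies. The 1% failure figure is purely illustrative, not a real device statistic:

```python
import math

# Deliberately simplified loss model: independent copy failures, one
# check-and-repair cycle per period. With per-copy failure probability p
# and n copies, content is lost in a period only if every copy fails.
def nines(p_fail_per_copy, copies):
    """'Number of nines' of protection for one checking period."""
    p_loss = p_fail_per_copy ** copies
    return -math.log10(p_loss)

# e.g. a carrier with a 1% chance of failing between checks:
for n in (1, 2, 3):
    print(f"{n} copies: {nines(0.01, n):.0f} nines")
```

Real planning tools (such as the PrestoPRIME one described above) also model correlated failures, checking frequency and repair time, which is why they need verifiable failure statistics rather than a single assumed figure.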