The scanned articles have been processed with Optical Character
Recognition (OCR) software to make the resulting PDF files searchable.
The scanner is a Canon CanoScan 9900 F, and the resolution varies between 400
and 600 dpi depending on the quality of the original. The smaller the text,
the higher the resolution used, to help the OCR software identify the text
with a minimum of errors.
The OCR program is Adobe Acrobat Capture. Unlike most OCR software, it can
preserve the scanned image as is, instead of converting it to an entirely
new file. The searchable text is a layer behind the image, as it were. To
simplify matters for the software and at the same time keep the size of the
files down, the scanned pages have been converted to black and white (instead
of grayscale). An exception has been made for pages that contain illustrations
or photographs. In such cases the text has been converted to black and white,
but we have retained the grayscale for the page as a whole. This explains
why some of the PDFs (those with images) are much larger than others.
One of the advantages of keeping the original scanned picture is that
OCR-related errors are less critical, since they are not visible on the
screen.
We are well aware that errors occur in the underlying texts. Only words that
the OCR program has flagged as "suspects" have been corrected. Further
proofreading would have been too time-consuming. This, of course, makes
searchability less than 100%, so we ask you to keep that in mind as you work
with these files. Needless to say, we will appreciate any corrections.
Lena Forsgren