Module 5, Cont: Metadata (again)

Posted by arlekeno on August 21, 2012

Starting with METADATA (an overview) Dr Warwick Cathro NLA

My definition is that “an element of metadata describes an information resource, or helps provide access to an information resource”.

I love this line: “Information stored in the “META” field of an HTML Web page is metadata, associated with the information resource by being embedded within it. The indexing data held by Web crawlers is also metadata (though not very good metadata) – linked to the information resource through the URL.” Just coz it’s a potshot at web crawlers 😛

Good def of precision and recall:

At library school, we learnt to measure information retrieval in terms of recall and precision. If we miss a lot of relevant information, we have poor recall. If we get flooded by a lot of irrelevant information, we have poor precision. In certain circumstances (such as searches for patents) very high recall is essential. However, in most circumstances, searchers would be content with a small number of relevant documents, and would be willing to scan through a few dozen citations to identify them. Recall and precision factors of 10-20% are often acceptable for most purposes.
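A rough sketch of the arithmetic in Python, to make the two measures concrete. The numbers are made up for illustration and are not from the paper: a search returns 50 citations, 8 of them are relevant, and the collection holds 40 relevant documents in total.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of document ids the search returned
    relevant:  set of document ids that are actually relevant
    """
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: ids 0..49 were retrieved, ids 42..81 are relevant,
# so only ids 42..49 (8 documents) are both retrieved and relevant.
retrieved = set(range(50))
relevant = set(range(42, 82))

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.0%} recall={recall:.0%}")
# prints: precision=16% recall=20%
```

So a searcher scanning 50 citations to find 8 good ones is sitting right in the 10-20% range the quote calls acceptable.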

Then talks a bit about how web pages work, pretty interesting stuff.

I can’t understand the Warwick Framework at all!

The Minimalist/structuralist issue:

OK, in short: people who want it simple, universal and easy to use vs. those who want it more complicated and harder to use but more accurate.

Qualifiers which may (or may not) improve use of Dublin Core.

In this context, it is necessary to explain that there are three kinds of qualifiers. One kind, known as TYPE, refines the meaning of the field. Thus, “personal” and “corporate” are TYPEs which, if present, narrow the meaning of the CREATOR field. (This is usually expressed in so-called dot notation, as “Creator.Personal” or “Creator.Corporate”). Another kind of qualifier, known as SCHEME, explains the meaning of the data contained in the field. For example, “LCSH” is a SCHEME which helps to interpret the content of the SUBJECT field. These qualifiers – SCHEME and TYPE, along with a third one which denotes the language of the content of the field – are known collectively as the “Canberra Qualifiers”.
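One way to picture the three qualifiers is as extra slots on each field. The sketch below is only an illustration: the class name, field names and the `parse_element` helper are assumptions for this example, not part of Dublin Core or any library, and the sample values are invented.

```python
from typing import NamedTuple, Optional


class QualifiedElement(NamedTuple):
    element: str                    # the Dublin Core field, e.g. "Creator"
    qualifier_type: Optional[str]   # TYPE: narrows the meaning of the field
    scheme: Optional[str]           # SCHEME: how to interpret the value
    lang: Optional[str]             # language of the field content
    value: str


def parse_element(name, value, scheme=None, lang=None):
    """Split a dot-notation name such as "Creator.Personal" into element and TYPE."""
    element, _, qualifier_type = name.partition(".")
    return QualifiedElement(element, qualifier_type or None, scheme, lang, value)


# Illustrative values only.
creator = parse_element("Creator.Personal", "Cathro, Warwick")
subject = parse_element("Subject", "Metadata", scheme="LCSH", lang="en")

print(creator.element, creator.qualifier_type)  # Creator Personal
print(subject.element, subject.scheme)          # Subject LCSH
```

An unqualified field simply leaves the optional slots empty, which is more or less the minimalist position.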

Specific structural proposals

In my view, probably ten of the fifteen Dublin Core elements could use unqualified free text as their default value, with a SCHEME being an optional addition. Something like five elements appear to require either a SCHEME or an authorised list of values as the default standard. One of these, RESOURCE IDENTIFIER, we have already discussed. The other four which probably require some structure in their default mode are DATE, RESOURCE TYPE, LANGUAGE and COVERAGE. The use of free text words in these five elements will probably fail to deliver satisfactory search precision. For example, a RESOURCE TYPE can be expressed in many different ways (article, paper, contribution, etc.) and without a controlled


As the quantity of information on the World Wide Web multiplies rapidly, it will become increasingly difficult to retrieve information, with reasonable precision and recall, using the major search and harvesting engines. The use of metadata, combined with the use of improved harvesting processes, has the potential to improve retrieval of these information resources.

Activity: Visit Metadata and use the find facility in your web browser (e.g., in Internet Explorer use Ctrl+F) to search on the word Warwick. The results from this search lead you to three kinds of occurrences of the word Warwick.

  • What are they?
  • Were the results of this search very precise?
You might also like to use the find facility to search on other web pages which contain a lot of text, and identify the kinds of issues which arise.

Well: the name of the author, the name of the university and the name of the framework. I think it is pretty precise (provided you spell the name right). And it was not too long a doc.

Going on with the notes. The problem of homonyms is the one I didn’t think about. The synonyms and alternate spellings, though, seemed pretty obvious after all the work with subject headings and alternating between spell checkers. Anyway, full-text searches are not 100% reliable.
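A small Python sketch of the same Ctrl+F experiment. The sample text is made up for the example (it is not the actual page), and `find_in_context` is a hypothetical helper, not a library function.

```python
import re

# Made-up sample text with three different senses of "Warwick".
text = (
    "Dr Warwick Cathro gave the overview. "
    "The Warwick Framework gets its own section. "
    "One author works at the University of Warwick."
)


def find_in_context(text, word, width=15):
    """Return each occurrence of `word` with a little surrounding text."""
    snippets = []
    for match in re.finditer(re.escape(word), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets


# One string, three meanings: a person, a framework and a university.
for snippet in find_in_context(text, "Warwick"):
    print(repr(snippet))

# A misspelling finds nothing at all.
print(find_in_context(text, "Warwik"))
```

The search itself cannot tell the three senses apart; only the surrounding words do, which is the homonym problem in miniature.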

Anyway, many of these problems can be dealt with by searching surrogates (abstracts, or catalogue records). If you’ve someone to make


Searching on keywords from the title: cheap and easy, but not standardized, and the subject is not always clear from the title. E.g. how do I know A Bridge Too Far is about WW2? But useful if you know the title-ish.

I just did the Activity: searching for keywords in the title worked pretty well. Occasionally a few odd ones turn up, but mostly it worked.

Finally, controlled vocabs (e.g. LCSH): good results, but prescriptive and they take a lot of time to make.

On to Chapter 8 of Hider: Alphabetical subject access mechanisms:

PG 134. Advantages and Disadvantages of Controlled Vocabulary indexing, derived indexing and free indexing languages.

PG 135. Definitions for exhaustivity, specificity, coextensive entry, relevance-recall-precision, and pre-coordinate and post-coordinate. VERY IMPORTANT!!!

What the hell is a scope note?


Exercise 3:



Indexing Note
May subdiv. geog.
Specific Example Note
See also names of individual abbeys*, e.g. Westminster Abbey.
Broader Term
Church architecture
Church history
Religious communities
Related Term


Abominable snowman

Used For
Broader Term
Unexplained phenomena
Related Term

Aboriginal children

Audiovisual aids

Specific Example Note
Use names of subjects with the subdivision Audiovisual aids, e.g. Social sciences – Audiovisual aids.


Specific Example Note
Use types of substances* and names of chemicals* with the subdivision Analysis, e.g. Food – Analysis.

OK, I think I have it: Analysis can only be used as a subdivision, Abduction is non-approved, but Abbeys can be used.

From the SCIS overview on scope notes:

SN (Scope Notes)
Scope Notes are included in the list where needed to explain the meaning of the heading for the purpose of using it in the catalogue. The differences between similar headings or the limits of the application of the heading are often explained.

IN (Indexing Notes)
Indexing Notes provide instructions to the cataloguer on the use of the heading, for example allowing subdivision following the examples in the model headings or allowing subdivision geographically.

SEN (Specific Example Notes)
As it is not practical to include all possible headings in a list of this type, several examples of headings, which have been constructed according to instructions given in a Specific Example Note, are included throughout the list. A decision has been made not to include these headings as narrower terms under the main heading but to include them instead as examples.

Exercise 4
Use scope notes at headings to assist you in determining the subject heading for each of these topics.
a. Animated films featuring animals
b. Handwriting as an expression of the writer’s character
c. Native plants of Australia
d. Teacher education
e. Designing gardens
f. A collection of myths and legends
g. Films made by children.

a) Film Animation – animals
b) Graphology
c) Native plants – Australia
d) Teachers – Training
e) Landscape gardening
f) Folklore
g) Children as film makers

3.1.2 UF and USE references
The two symbols UF (Used For) and USE provide the user with as many access points as possible to an allowed heading, alternative terms or similes. They may be represented in the list as non-allowed terms that direct the user to the allowed term. The UF and USE symbols facilitate this access.

BT (Broader Term)
The symbol BT provides the user with allowed headings in the list which are more general in concept than the main heading. They identify the broader context(s) of the main heading.

NT (Narrower Term)
The symbol NT provides the user with allowed headings that identify a more specific facet of the main heading.

RT (Related Term)
The symbol RT provides the user with allowed headings that are associated with the main heading in some way other than hierarchically.
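The references above can be pictured as a small lookup structure. This is only a sketch in Python, not how SCIS stores its list: the "Yeti" entry is a hypothetical non-allowed term invented for the example, and the NT link is inferred as the reverse of the BT shown in Exercise 3.

```python
# Allowed headings and their references.
THESAURUS = {
    "Abominable snowman": {
        "UF": ["Yeti"],                   # hypothetical non-allowed term
        "BT": ["Unexplained phenomena"],  # from the Exercise 3 entry
        "NT": [],
        "RT": [],
    },
    "Unexplained phenomena": {
        "UF": [],
        "BT": [],
        "NT": ["Abominable snowman"],     # inferred reverse of the BT above
        "RT": [],
    },
}

# USE is the mirror image of UF: each non-allowed term points to its heading.
USE = {
    term: heading
    for heading, entry in THESAURUS.items()
    for term in entry["UF"]
}


def resolve(term):
    """Return the allowed heading for a term, following USE if needed."""
    if term in THESAURUS:
        return term
    if term in USE:
        return USE[term]
    raise KeyError(f"{term!r} is not in the list")


def broader_terms(term):
    """Return the BT headings for a term (after resolving USE)."""
    return THESAURUS[resolve(term)]["BT"]


print(resolve("Yeti"))        # Abominable snowman
print(broader_terms("Yeti"))  # ['Unexplained phenomena']
```

Building USE from UF keeps the two directions in step, so a non-allowed term can never point at a heading that does not list it.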


