Metadata

1. Introduction: Metadata was traditionally used in the card catalogues of libraries. As information has become increasingly digital, metadata are also used to describe digital data, using metadata standards specific to a particular discipline. The term “metadata” was coined in 1968 by Philip Bagley in his book “Extension of Programming Language Concepts”, where he clearly uses the term in the ISO 11179 “traditional” sense of structural metadata, i.e. data about the containers of data, rather than in the alternative sense of “content about individual instances of data content”, or metacontent, the type of data usually found in library catalogues.

Metadata shares many characteristics with the cataloguing that takes place in libraries, museums and archives. For example, a metadata record for a book may contain its title, author, publisher, year of publication, ISBN, and a short summary of its contents.

 

2. Definition: Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. It is often called data about data or information about information. Metadata summarizes basic information about an object, which makes finding and working with particular instances of that object easier. It can provide information about one or more aspects of the object, such as how it was created, its purpose, its time of creation, its author, its location, and the standards used in creating it.

The American Library Association (ALA) Committee on Cataloging: Description and Access (CC:DA) presented formal working definitions for three terms: metadata, metadata schema, and interoperability.

Metadata are structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities.

A metadata schema provides a formal structure designed to identify the knowledge structure of a given discipline and to link that structure to the information of the discipline through the creation of an information system that will assist the identification, discovery, and use of information within that discipline.

Interoperability is the ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system.

NISO's "Understanding Metadata" (2004) defines metadata as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource".

 

3. Types of Metadata: There are three main types of metadata:

a) Descriptive Metadata: Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, author, keywords, abstract and so on.

b) Structural Metadata: Structural metadata is about the design and specification of data structures or data about the containers of data. Structural metadata indicates how compound objects are put together. For example, how pages are ordered to form chapters.

c) Administrative Metadata: Administrative metadata provides information to help manage a resource, such as when and how it was created, its file type and other technical information, and who can access it. There are several subsets of administrative metadata; two that are sometimes listed as separate metadata types are:

i) Rights management metadata, which deals with intellectual property rights,

ii) Preservation metadata, which contains information needed to archive and preserve a resource.
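To see the three types side by side, the sketch below groups the metadata for a hypothetical digitized book into descriptive, structural, and administrative sections, and uses the structural part to recover the page order. The field names, values, and the `reading_order` helper are illustrative assumptions, not part of any standard; schemas such as METS and PREMIS, described later, define their own element names.

```python
# A hypothetical record for a digitized book, grouped by metadata type.
# Field names are illustrative only; they do not follow any particular standard.
record = {
    "descriptive": {
        "title": "An Example Book",
        "creator": "Doe, Jane",
        "subject": ["Metadata", "Cataloguing"],
    },
    "structural": {
        # Page image files in reading order, grouped into chapters.
        "chapters": [
            {"label": "Chapter 1", "pages": ["p001.tif", "p002.tif"]},
            {"label": "Chapter 2", "pages": ["p003.tif", "p004.tif"]},
        ],
    },
    "administrative": {
        "created": "2020-01-15",
        "file_format": "image/tiff",
        "rights": {"license": "All rights reserved", "access": "on-site only"},
        "preservation": {"checksum_algorithm": "SHA-256"},
    },
}


def reading_order(rec):
    """Flatten the structural metadata into the page sequence a viewer would show."""
    return [page for chapter in rec["structural"]["chapters"] for page in chapter["pages"]]


print(reading_order(record))
# ['p001.tif', 'p002.tif', 'p003.tif', 'p004.tif']
```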

4. Uses of Metadata: The following are some of the uses of metadata:

a) Systematic Arrangement: Metadata helps bring similar resources together and distinguish dissimilar ones, and thus helps organize electronic resources by audience or topic.

b) Discovering Relevant Information: The main purpose of metadata is to facilitate the discovery of relevant information (resource discovery) by allowing resources to be found by relevant criteria.

c) Continuous Identification of Information: Metadata helps identify resources by providing additional information about the data, such as an ISBN, a Persistent URL (PURL), or a Digital Object Identifier (DOI).

d) Facilitating Interoperability: Using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly. Examples include cross-system search using the Z39.50 protocol and metadata harvesting using the OAI protocol (OAI-PMH).

e) Archiving and Preservation: Metadata is key to ensuring that resources will survive and continue to be accessible into the future.
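A crosswalk between schemes, as mentioned under interoperability above, is essentially a mapping table from the elements of one scheme to those of another. The sketch below maps a handful of MARC tags to Dublin Core elements. The mapping is a deliberately simplified assumption for illustration, not the Library of Congress's published MARC-to-Dublin Core crosswalk, and the sample record values are invented.

```python
# Simplified, illustrative mapping from a few MARC tags to Dublin Core elements.
# This is NOT the official Library of Congress crosswalk, which is far more detailed.
MARC_TO_DC = {
    "020": "identifier",  # ISBN
    "100": "creator",     # main entry, personal name
    "245": "title",       # title statement
    "650": "subject",     # subject added entry, topical term
}


def crosswalk(marc_fields):
    """Convert (tag, value) pairs into a Dublin Core dict of value lists.

    Tags without a mapping are skipped rather than guessed.
    """
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


marc = [
    ("020", "9780000000000"),  # placeholder, not a real ISBN
    ("100", "Doe, Jane"),
    ("245", "An example book"),
    ("650", "Metadata"),
    ("650", "Cataloguing"),
    ("999", "local data"),     # local field with no Dublin Core equivalent here
]
print(crosswalk(marc))
# {'identifier': ['9780000000000'], 'creator': ['Doe, Jane'], 'title': ['An example book'], 'subject': ['Metadata', 'Cataloguing']}
```

This sketch drops unmapped tags (here the local 999 field) instead of forcing them into a Dublin Core element; a real crosswalk has to decide such cases explicitly.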

 

5. Areas of Use of Metadata in Library and Information Science: Metadata is particularly useful for images, audio, and video, where information about the content (such as transcripts of conversations and text descriptions of scenes) is not directly understandable by a computer but efficient search is desirable. In the library environment, metadata is used in the following areas:

a) Uses in Cataloguing: Libraries have long used metadata in cataloguing. Library catalogues used 3x5 inch cards to display a book’s title, author, subject matter, and a brief synopsis, along with an abbreviated alphanumeric identification system (a call number) that indicated the physical location of the book on the library’s shelves. Such data helps classify, aggregate, identify, and locate a particular book.

b) Uses in Integrated Library Management System: Libraries use metadata in the catalogue records for books, periodicals, DVDs, images, audio, and video held in an Integrated Library Management System. These records are stored using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items or areas they seek, and to describe the items in question.

c) Uses in Digital Libraries / Institutional Repositories: More recent and specialized uses of library metadata include digital libraries and institutional repositories, where metadata handling is an integral part of the repository software.

 

6. Metadata Standard: Metadata standards are requirements intended to establish a common understanding of the meaning, or semantics, of data, to ensure correct and proper use and interpretation of the data by its owners and users. There are literally hundreds of metadata schemas to choose from, and the number is growing rapidly as different communities seek to meet the specific needs of their members. Each metadata schema usually has three main characteristics: a limited number of elements, a name for each element, and a defined meaning for each element. The following are some of the metadata standards used in libraries:

a) MAchine Readable Cataloging (MARC): MARC is a standard for the representation and communication of bibliographic and related information in machine-readable form. It was developed by Henriette Avram at the US Library of Congress during the 1960s to create records that can be used by computers, and to share those records among libraries.

b) Dublin Core: The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). Because the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The Dublin Core metadata terms (https://www.dublincore.org) are a set of vocabulary terms that can be used to describe resources for the purposes of discovery. The Dublin Core elements provide a small, fundamental group of text elements through which most resources can be described and catalogued. They can describe physical resources such as books, digital materials such as video, sound, image, or text files, and composite media such as web pages. Metadata records based on Dublin Core are intended for cross-domain information resource description and have become standard in the fields of library science and computer science. Implementations of Dublin Core typically use XML and are often based on the Resource Description Framework (RDF).

The original Dublin Core Metadata Element Set consists of 15 elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights. This fifteen-element set achieved wide dissemination as part of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and has been ratified as IETF RFC 5013, ANSI/NISO Standard Z39.85-2007, and ISO Standard 15836:2009.

The Dublin Core Metadata Registry (https://www.dublincore.org/groups/languages/registry/) is a metadata registry, that is, a database in which metadata are stored and managed. A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled way. The Dublin Core Metadata Registry is designed to promote the discovery and reuse of properties, classes, and other types of metadata terms. It provides an up-to-date source of authoritative information about Dublin Core® Metadata Initiative (DCMI) metadata terms and related vocabularies. The registry aids in the discovery of terms and their definitions and shows relationships between terms.

The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI). The Dublin Core Metadata Initiative (DCMI) is an open organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models.
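Because every element of the fifteen-element set listed above is optional and repeatable, a simple Dublin Core record is easy to serialize. The sketch below writes a dictionary of values as an `oai_dc` XML record of the kind exchanged over OAI-PMH, using Python's standard `xml.etree.ElementTree`. The two namespace URIs are the standard Dublin Core and OAI-PMH ones; the `to_oai_dc` function name and the sample values are hypothetical, and a production record would normally also carry an `xsi:schemaLocation` attribute, which is omitted here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(fields):
    """Serialize {element: [values]} as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:  # a repeated element is written once per value
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_oai_dc({"title": ["An Example Book"], "creator": ["Doe, Jane"], "date": ["2020"]}))
```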

c) Metadata Object Description Schema (MODS): MODS is an XML-based bibliographic description schema developed by the United States Library of Congress Network Development and Standards Office. MODS was designed as a compromise between the complexity of the MARC format used by libraries and the extreme simplicity of Dublin Core metadata.

d) Metadata Encoding and Transmission Standard (METS): The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
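METS records structural metadata in a `structMap`, whose nested `div` elements point (through `fptr` elements) to files listed in a `fileSec`. The sketch below builds that skeleton for a book whose page images are grouped into chapters, the "pages ordered to form chapters" case given earlier for structural metadata, and then reads the page order back out. It uses only the standard library; the helper names, file names, and TYPE/LABEL values are assumptions for illustration, and a real METS document would also contain a header plus descriptive and administrative sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(chapters):
    """chapters: list of (label, [page file names]) in reading order."""
    root = ET.Element(m("mets"))
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    struct_map = ET.SubElement(root, m("structMap"), TYPE="logical")
    book = ET.SubElement(struct_map, m("div"), TYPE="book")
    page_no = 0
    for chapter_no, (label, pages) in enumerate(chapters, start=1):
        chapter = ET.SubElement(book, m("div"), TYPE="chapter", LABEL=label, ORDER=str(chapter_no))
        for page in pages:
            page_no += 1
            file_id = f"FILE{page_no:04d}"
            # Inventory the file once in the fileSec...
            file_el = ET.SubElement(file_grp, m("file"), ID=file_id)
            ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": page})
            # ...and place it in the structure through a file pointer.
            page_div = ET.SubElement(chapter, m("div"), TYPE="page", ORDER=str(page_no))
            ET.SubElement(page_div, m("fptr"), FILEID=file_id)
    return root


def page_order(root):
    """Resolve the structMap's file pointers to file locations, in document order."""
    href = {f.get("ID"): f.find(m("FLocat")).get(f"{{{XLINK_NS}}}href") for f in root.iter(m("file"))}
    return [href[ptr.get("FILEID")] for ptr in root.iter(m("fptr"))]


mets = build_mets([("Chapter 1", ["p001.tif", "p002.tif"]), ("Chapter 2", ["p003.tif"])])
print(page_order(mets))  # ['p001.tif', 'p002.tif', 'p003.tif']
```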

e) XML Organic Bibliographic Information Schema (XOBIS): XOBIS is an XML schema for modeling MARC data. XOBIS primarily concerns information objects and their relationships. An information object (represented by a name, an identifier, optional variant name(s) and perhaps descriptive text) is best characterized by its set of formal relationships to other information objects. Such named relationships may carry attributes for type, strength, duration, etc.

f) Online Information Exchange (ONIX): ONIX is an international standard for representing and communicating book industry product information in electronic form. ONIX is an XML-based standard for rich book metadata, providing a consistent way for publishers, retailers and their supply chain partners to communicate rich information about their products.

g) ISO standard Digital Object Identifier (DOI): A digital object identifier (DOI) is a character string (a “digital identifier”) used to uniquely identify a digital object, such as an electronic document. Digital Object Identifier provides a system for the identification and hence management of information (content) on digital networks, providing persistence and semantic interoperability.
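The structure of a DOI is simple enough to check in code. A DOI consists of a prefix that begins with the directory indicator "10.", a "/", and a suffix chosen by the registrant, and the https://doi.org/ proxy resolves it to the object's current location. The sketch below splits a DOI and builds a resolver link; the function names and the list of accepted URL forms are assumptions, and the checks are much looser than the full DOI syntax rules.

```python
from urllib.parse import quote

# Forms in which a DOI is commonly written before the bare identifier (an assumed list).
KNOWN_FORMS = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(doi):
    """Split a DOI into (prefix, suffix).

    The first "/" separates prefix from suffix; the suffix may contain more slashes.
    """
    doi = doi.strip()
    for form in KNOWN_FORMS:
        if doi.lower().startswith(form):
            doi = doi[len(form):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Build a link through the doi.org proxy, percent-encoding unsafe suffix characters."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(split_doi("doi:10.1000/182"))      # ('10.1000', '182')
print(resolver_url("10.1000/182"))       # https://doi.org/10.1000/182
```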

h) ISO/IEC 11179: ISO/IEC 11179 (formally known as the ISO/IEC 11179 Metadata Registry (MDR) standard) is an international standard for representing metadata for an organization in a metadata registry. The standard describes the metadata and activities needed to manage data elements in a registry, so as to create a common understanding of data across organizational elements and between organizations. By its own account, ISO/IEC 11179 is concerned with metadata in the traditional sense.
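ISO/IEC 11179 separates what a data element means from how its values are represented. In a simplified reading of the standard's model, a data element concept pairs an object class (such as "Person") with a property (such as "Birth Date"), and a data element adds a value domain that fixes the representation. The sketch below models that split together with a toy registry; the class names, registration identifiers, and the `find_by_concept` lookup are illustrative assumptions, not the standard's normative metamodel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # the thing being described, e.g. "Person"
    property_name: str  # the characteristic of it, e.g. "Birth Date"


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: str   # how values are represented, e.g. "YYYY-MM-DD"


class Registry:
    """A toy metadata registry keyed by registration identifier."""

    def __init__(self):
        self._items = {}

    def register(self, identifier, element):
        if identifier in self._items:
            raise KeyError(f"already registered: {identifier}")
        self._items[identifier] = element

    def find_by_concept(self, concept):
        """Identifiers of all registered data elements that share one meaning."""
        return sorted(i for i, e in self._items.items() if e.concept == concept)


birth_date = DataElementConcept("Person", "Birth Date")
registry = Registry()
registry.register("ORG-A:0001", DataElement(birth_date, "YYYY-MM-DD"))
registry.register("ORG-B:0042", DataElement(birth_date, "DD/MM/YYYY"))
print(registry.find_by_concept(birth_date))  # ['ORG-A:0001', 'ORG-B:0042']
```

Two organizations that record birth dates in different formats can see from the registry that their data elements share a concept, which is the kind of common understanding the standard aims for.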

i) Resource Description Framework (RDF): RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.
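At its core, RDF describes a resource as a set of subject-predicate-object statements (triples). The sketch below writes a Dublin Core description as triples in the line-based N-Triples serialization, escaping literals by hand to avoid a third-party dependency. The subject URI http://example.org/book/1 and the function names are assumptions for illustration; in practice a library such as rdflib would usually handle serialization.

```python
DC = "http://purl.org/dc/elements/1.1/"


def literal(text):
    """Quote a plain string literal, escaping the characters N-Triples forbids raw."""
    escaped = (text.replace("\\", "\\\\")  # backslash first, so later escapes survive
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(subject, properties):
    """Emit one N-Triples line per (Dublin Core element, value) pair."""
    return "\n".join(
        f"<{subject}> <{DC}{element}> {literal(value)} ."
        for element, value in properties
    )


print(to_ntriples("http://example.org/book/1",
                  [("title", "An Example Book"), ("creator", "Doe, Jane")]))
# <http://example.org/book/1> <http://purl.org/dc/elements/1.1/title> "An Example Book" .
# <http://example.org/book/1> <http://purl.org/dc/elements/1.1/creator> "Doe, Jane" .
```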

j) PREservation Metadata: Implementation Strategies (PREMIS): PREservation Metadata: Implementation Strategies (PREMIS) was an international working group concerned with developing metadata for use in digital preservation. In 2003 the Online Computer Library Center (OCLC) and Research Libraries Group (RLG) established the PREMIS working group, which consisted of a multi-national roster of more than thirty representatives from the cultural, government, and private sectors, in order to define implementable, core preservation metadata, with guidelines/recommendations for management and use.
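One routine piece of preservation metadata is fixity information: a checksum recorded when an object enters an archive and recomputed later to show that its bits have not changed. The sketch below records and verifies such a value with Python's `hashlib`. The dictionary keys are modeled loosely on PREMIS semantic units (messageDigestAlgorithm, messageDigest), but this is not a PREMIS-conformant record, and storing the `hashlib` algorithm name ("sha256") rather than a controlled-vocabulary label such as "SHA-256" is an assumption made for simplicity.

```python
import hashlib


def make_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute a checksum for an object's bytes and record it as fixity metadata."""
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": hashlib.new(algorithm, data).hexdigest(),
    }


def check_fixity(data: bytes, fixity: dict) -> bool:
    """Recompute the checksum with the recorded algorithm and compare it."""
    digest = hashlib.new(fixity["messageDigestAlgorithm"], data).hexdigest()
    return digest == fixity["messageDigest"]


original = b"hello"
record = make_fixity(original)
print(record["messageDigest"])
print(check_fixity(original, record), check_fixity(b"hellO", record))  # True False
```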

k) ISO 2709: MARC 21 defines a standard for the markup of bibliographic data. ISO 2709 defines how the marked-up record is formatted so that it can be read by computer programs and transferred among computers. ISO 2709 is usually referred to as the MARC communications format.
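The sketch below writes and reads a record in the ISO 2709 layout: a 24-character leader (record length in positions 0-4, base address of data in positions 12-16), a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), then the fields, each ending with a field terminator (hex 1E), and finally a record terminator (hex 1D). Subfields within a data field begin with the delimiter hex 1F. The leader values other than the two numbers ("nam a22" and " a 4500") are placeholder MARC 21 choices, lengths are counted in bytes of UTF-8 text, and the function names are hypothetical; production code would normally rely on an established MARC library.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = "\x1f"


def data_field(indicators, subfields):
    """Two indicator characters, then delimiter + code + value for each subfield."""
    return indicators + "".join(SUBFIELD_DELIMITER + code + value for code, value in subfields)


def build_record(fields):
    """Serialize [(tag, field_text), ...] as ISO 2709 bytes."""
    directory, data = b"", b""
    for tag, text in fields:
        body = text.encode("utf-8") + FIELD_TERMINATOR
        # Directory entry: tag (3) + field length (4) + start offset within the data (5).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def parse_record(raw):
    """Read [(tag, field_text), ...] back from ISO 2709 bytes."""
    base_address = int(raw[12:17])
    directory = raw[24:base_address - 1]  # exclude the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base_address + start:base_address + start + length]
        fields.append((tag, body[:-1].decode("utf-8")))  # drop the field terminator
    return fields


fields = [
    ("001", "rec0001"),  # control field: no indicators or subfields
    ("245", data_field("10", [("a", "An example book /"), ("c", "Jane Doe.")])),
]
raw = build_record(fields)
assert parse_record(raw) == fields
```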

l) Anglo-American Cataloguing Rules: The Anglo-American Cataloguing Rules (AACR) are an international cataloguing code first published in 1967. AACR2 stands for the Anglo-American Cataloguing Rules, Second Edition.

m) International Standard Bibliographic Description (ISBD): ISBD is a set of rules produced by the International Federation of Library Associations and Institutions (IFLA) to create a bibliographic description in a standard, human-readable form, especially for use in a bibliography or a library catalog.
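ISBD's prescribed punctuation is mechanical enough to sketch in code. The function below assembles the title and statement of responsibility area, an optional edition area, and the publication area, separating areas with full stop, space, dash, space. It covers only a small, simplified subset of ISBD, types the dash as two hyphens, and its name and parameters are assumptions; the first example uses this book's own citation details.

```python
def isbd_description(title, other_title=None, responsibility=None,
                     edition=None, place=None, publisher=None, date=None):
    """Assemble a simplified ISBD description from a few areas."""
    # Title area: title proper, " : " other title information, " / " statement of responsibility.
    area1 = title
    if other_title:
        area1 += " : " + other_title
    if responsibility:
        area1 += " / " + responsibility
    areas = [area1]
    if edition:
        areas.append(edition)  # edition area
    # Publication area: place " : " publisher ", " date.
    publication = place or ""
    if publisher:
        publication += (" : " if publication else "") + publisher
    if date:
        publication += (", " if publication else "") + date
    if publication:
        areas.append(publication)
    # Areas are joined by ". -- "; a full stop already ending an area is not doubled.
    return ". -- ".join(area.rstrip(".") for area in areas) + "."


print(isbd_description("A comprehensive book on library and information science",
                       responsibility="Badan Barman", place="Guwahati",
                       publisher="New Publications", date="2020"))
# A comprehensive book on library and information science / Badan Barman. -- Guwahati : New Publications, 2020.
```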

7. Metadata Harvesting: Metadata Harvesting is an automated, regular process of collecting metadata descriptions from different sources to create useful aggregations of metadata and related services. OAI-PMH is a popular protocol for metadata harvesting.

a) Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): The Open Archives Initiative (OAI) is an attempt to build a low-barrier interoperability framework for archives, or institutional repositories, containing digital content. It allows service providers to harvest metadata from data providers. The harvested metadata is then used to provide value-added services.

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed by the Open Archives Initiative. It is used to harvest (or collect) the metadata descriptions of the records in an archive so that services can be built using metadata from many archives. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations. The OAI Protocol has been widely adopted by many digital libraries, institutional repositories, and digital archives.

A number of software systems support OAI-PMH, including GNU EPrints from the University of Southampton and DSpace from MIT. Commercial search engines have also used OAI-PMH to acquire more resources. Google accepted OAI-PMH as part of its Sitemap Protocol and used it to harvest information from the National Library of Australia Digital Object Repository. In 2004, Yahoo! acquired content from OAIster (University of Michigan) that was obtained through metadata harvesting with OAI-PMH. The mod_oai project uses OAI-PMH to expose content accessible from Apache web servers to web crawlers. A number of large archives support the protocol, including arXiv and the CERN Document Server.
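A harvester's main loop, following the protocol described above, is short. OAI-PMH requests are ordinary HTTP GET requests with a `verb` argument; `ListRecords` with `metadataPrefix=oai_dc` asks for Dublin Core records, and when a repository returns its results in pages, each incomplete page carries a `resumptionToken` that must be sent back, on its own with the verb, to get the next page. The sketch below follows that flow using only the standard library. The base URL is hypothetical, the HTTP call is passed in as a `fetch` function so the loop can be tested offline, and error responses such as `noRecordsMatch` are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def harvest(base_url, fetch):
    """Yield every <record> element, following resumption tokens across pages.

    fetch(url) must return the response body (bytes or str).
    """
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        token = token_el.text if token_el is not None else None
        if not token:  # a missing or empty token marks the last page
            return


# Live use against a real repository might look like:
#   from urllib.request import urlopen
#   records = harvest("https://example.org/oai", lambda url: urlopen(url).read())
```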

 

8. Creation of Metadata: Metadata can be created by automated information processing, by using templates, or by manual work. It can be created and collected at the point of creation of a resource or at the point of publication. There are many tools for this, and the number continues to grow. Such tools can be standalone or part of a software package, usually with a backend database or repository to store and retrieve the metadata records. Elementary metadata captured by computers can include when an object was created, who created it, when it was last updated, its file size, and its file extension.
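Much of this elementary metadata can be read straight from the file system. The sketch below uses Python's `pathlib` to collect a file's name, extension, size, and last-modification time; the function and key names are assumptions. Creation time and creator are left out because how, and whether, they are recorded varies by operating system and file system.

```python
from datetime import datetime, timezone
from pathlib import Path


def elementary_metadata(path):
    """Return basic metadata that the file system records automatically."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "extension": p.suffix.lstrip(".").lower(),
        "size_bytes": info.st_size,
        # Modification time as an ISO 8601 timestamp in UTC.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Calling `elementary_metadata("report.pdf")` on a hypothetical file would return a dictionary with those four keys, ready to be merged into a fuller, manually completed record.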

a) Automated Means: The TEI website (https://tei-c.org/) provides a list of links to software for creating, managing, and processing TEI (Text Encoding Initiative) documents in SGML or XML.

b) Using Templates and Other Tools

i) DC Dot: With DC-dot’s Dublin Core metadata editor (http://www.ukoln.ac.uk/metadata/dcdot/), one can submit any web page’s URL, get a suggested metadata record in XHTML format, and then use the template to edit the record. The service retrieves the web page and automatically generates Dublin Core metadata, either as HTML tags or as RDF/XML, suitable for embedding in the HEAD section of the page. Different output formats are available.

ii) DCmeta: DCmeta (http://www.dstc.edu.au/RDU/MetaWeb/generic_tool.html) was developed by Tasmania Online. It is based on the SuperNoteTab text editor and can be customized.

iii) HotMeta: HotMeta (http://www.dstc.edu.au/Research/Projects/hotmeta/) is a software package that includes a metadata editor, a repository, and a search engine.

c) Manual Creation: Manual creation tends to be more accurate, allowing the user to input any information they feel is relevant or needed to help describe the file.
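The DC-dot approach described above, fetching a page and suggesting Dublin Core tags for its HEAD section, can be approximated for a few elements. The sketch below is not DC-dot's own code: it takes HTML that has already been fetched, pulls out the `<title>` with Python's standard `html.parser`, and emits tags using the "DC." meta-name convention for embedding Dublin Core in HTML. The class and function names are hypothetical, and a fuller generator could also inspect existing meta tags, headings, and HTTP headers.

```python
from html import escape
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collect the text of the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def suggest_dc_tags(html_text, url):
    """Suggest Dublin Core <meta> tags for a page's HEAD section."""
    parser = TitleGrabber()
    parser.feed(html_text)
    suggestions = {
        "DC.title": parser.title.strip(),
        "DC.identifier": url,
        "DC.format": "text/html",
    }
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, content in suggestions.items():
        if content:  # skip elements that could not be filled in
            lines.append(f'<meta name="{name}" content="{escape(content)}">')
    return "\n".join(lines)


page = "<html><head><title>Metadata &amp; Libraries</title></head><body></body></html>"
print(suggest_dc_tags(page, "http://example.org/page"))
```

As in DC-dot, the output is only a suggestion: a cataloguer would still review it and add elements such as Creator or Subject by hand.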

 

9. Metadata Issues: The three most common issues with metadata are:

a) Metadata are Mostly Written Incorrectly: Incorrectly written metadata cannot be read back in the way it was intended. If you do not use the right tools to describe your files correctly, it does not matter how meticulous you are: the metadata will be ineffective, and other systems will be unable to correctly import and display your asset’s information.

b) Too Little Metadata Is Provided About an Asset: Because of a lack of resources (time, workforce, funds), a lack of software with built-in automated cataloguing functions, or a lack of convenient tools to help users quickly describe an asset’s content, metadata is often not used properly to describe an asset. Assets embedded with little or no information can make searches difficult and tedious.

c) Metadata May Lead to Loss of Assets: Metadata can make it easier for others to use your data illegally. There is also a potential for lost assets, which can become invisible to searches.

 

10. Conclusion: Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. The primary purpose of metadata is to facilitate the discovery of electronic resources. Metadata gives information seekers a more refined search than is possible with conventional search engines that do not use metadata. Metadata harvesting is an automated, regular process of collecting metadata descriptions from different sources to create useful aggregations of metadata and related services.


How to Cite this Article?

APA Citation, 7th Ed.:  Barman, B. (2020). A comprehensive book on Library and Information Science. New Publications.

Chicago 16th Ed.:  Barman, Badan. A Comprehensive Book on Library and Information Science. Guwahati: New Publications, 2020.

MLA Citation 8th Ed:  Barman, Badan. A Comprehensive Book on Library and Information Science. New Publications, 2020.
