Semantic Interoperability and Communities of Practice
Norm Friesen norm.friesen@ualberta.ca; February 5, 2002

Abstract

The vision of reusable digital learning resources or objects, made accessible through coordinated repository architectures and metadata technologies, has been gaining considerable attention within education and training communities.  However, the pivotal role of metadata in this vision --and in more general conceptions of the semantic Web-- raises important and longstanding issues about classification, description and meaning.  These issues are of special importance in indexing educational resources, where questions of their application and relevance to particular learning contexts often supersede more conventional forms of access such as author, title or date.  This paper will look at the exceptional role that metadata plays on the Web --namely as an intervention of human intentionality and meaning in a context that is otherwise dominated by syntax, protocol, string matches and search algorithms.  It will look at questions of semantic interoperability from the perspective of communities and practice, as these are relevant to the Web in general and to its educational application in particular.

Metadata as Meaning

Access to or discoverability of Web-based resources has typically been facilitated through the use of search services or engines such as AltaVista or Hotbot.  In the simplest terms, these services make Webpages and Websites discoverable by finding matches between the character-combinations or "strings" entered by the searcher, and those occurring somewhere in the textual contents of Web documents.  The problems that this technology presents to users in general and educators in particular are both familiar and manifold:  tens or hundreds of thousands of "matching documents" are retrieved in response to almost any search string; educationally appropriate resources are difficult to find and evaluate; and multimedia or interactive content is not directly searchable.  The inadequacy of this search technique springs, in part, from the fact that it ultimately works only with character combinations, matching those typed as searches against those occurring in Web pages.  These search services have no way of understanding or registering the significance of these character combinations or the potential purpose or value of the resources they identify.  In other words, these services only recognize the formal properties or the appearance of words, seeing them simply as "formal squiggles" (Dreyfus, 2001).
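
The limitation described above can be illustrated with a minimal sketch of string-matching retrieval.  The document texts, their identifiers and the function name `string_search` are hypothetical and invented for illustration; the sketch shows only the principle of matching characters against characters, not how any particular search service is built.

```python
# Hypothetical mini-collection; the texts are invented for illustration.
DOCUMENTS = {
    "doc1": "The river bank eroded after the spring flood.",
    "doc2": "The bank raised interest rates on student loans.",
    "doc3": "A lesson plan on erosion for grade six science.",
}


def string_search(query, documents):
    """Return ids of documents whose text contains the query string.

    The match is purely formal: characters are compared with characters,
    and nothing is known about what the words mean.
    """
    needle = query.lower()
    return sorted(
        doc_id for doc_id, text in documents.items() if needle in text.lower()
    )


hits = string_search("bank", DOCUMENTS)
print(hits)
```

In this sketch a search for "bank" retrieves the first two documents regardless of the sense of the word, while a search for "erosion" finds only the third and misses the first, which is about erosion but happens to use the string "eroded."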

What has been widely suggested as a solution to these problems is to turn attention to the actual meanings of the words in Web documents, and to focus on the purpose or significance of the pages or resources themselves.  Attempts to capture these meanings have become the raison d'être of Web-based descriptive metadata.  "If there is a solution to the problem of resource discovery on the Web, it must surely be based on a distributed metadata catalog model" (Gill, 2001; p.7).

In this sense, metadata would function in a manner similar to a card or record in a library catalogue, providing controlled and structured descriptions through searchable "access points" such as title, author, date, location, description and subject.  But unlike library catalogue records, these metadata records are expected to provide information on the potential educational application of resources.  In further contradistinction to the card catalogue example, a metadata record could either be located separately from the resource it describes, or be embedded or packaged with it.  Also, many visualize this metadata as being distributed across the Web, rather than collected in a single catalogue.  Others extend this vision even further to include a number of other distributed mechanisms that involve logically rigorous ways of describing the relations between these resource descriptions.

What is important to emphasize at this point is that the use of metadata inserts a layer of human intervention and interpretation into Web-based search and retrieval processes.  This is a layer where words are emphatically not just understood as "formal squiggles" that match other formal character strings, but as actual bearers of meaning and significance.  When searching metadata --whether it is distributed across the Web or collected in a conventional library catalogue-- documents and other resources are seen as relevant to a given search not because of the letter or word combinations they contain.  Instead, their value and purpose are assessed only according to the way they are represented and interpreted in the metadata that describes them.  In this new vision of the Web, a resource would be determined as relevant to a specific subject or category not as a direct result of its contents, but as a direct result of the way a metadata creator has understood its relevance.
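
A minimal sketch may make this contrast concrete.  The record structure, the resource identifiers, the sample contents and the function name `metadata_search` are all assumptions made for illustration; they do not reproduce any particular metadata specification.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """A catalogue-style description kept apart from the resource itself."""
    title: str
    author: str
    subjects: list = field(default_factory=list)


# Hypothetical resource contents and indexer-assigned descriptions.
CONTENTS = {
    "res1": "Click and drag the slider to change the water flow.",
    "res2": "Erosion is discussed at length in chapter four.",
}
RECORDS = {
    "res1": MetadataRecord("Stream Table Simulation", "A. Indexer", ["erosion"]),
    "res2": MetadataRecord("Geology Textbook Index", "B. Indexer", ["geology"]),
}


def metadata_search(subject, records):
    """Return ids of resources an indexer has described with the subject."""
    return sorted(rid for rid, rec in records.items() if subject in rec.subjects)


# Relevance as judged by the metadata creator.
by_metadata = metadata_search("erosion", RECORDS)
# Relevance as judged by a string match on the contents.
by_content = sorted(
    rid for rid, text in CONTENTS.items() if "erosion" in text.lower()
)
print(by_metadata, by_content)
```

Here the simulation is retrieved for "erosion" only because an indexer judged it relevant, even though the word never occurs in its contents, while the string match retrieves a different resource altogether.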

The "Aboutness" of Metadata

The shift in emphasis implied in this application of metadata can be understood in terms of a shift from data manipulation and processing to the creation, interpretation and assessment of information or knowledge.  Data, information and knowledge are often conceived of as forming a hierarchy, where each successive layer is differentiated from the last through a process of interpretation and mediation.  Merriam-Webster defines data as "information in numerical form that can be digitally transmitted or processed" (2001) --in other words, as pure, un-interpreted fact, perception, signal or message.  Information, as characterized by management guru Peter Drucker, is data that is "endowed with relevance or purpose" (1988; p. 4).  Information, in other words, can be said to form the contents of the data signal or message.  Knowledge, finally, is defined in terms that associate it even more closely with human understanding, intention and purpose:  as 1) "the fact or condition of knowing something with familiarity gained through experience or association," 2) "acquaintance with or understanding of a science, art, or technique," or 3) "the fact or condition of being aware of something" (Merriam-Webster, 2001).

In this context, to characterize an interpretation of the meaning or purpose of a digital resource as "metadata" seems misleading.  For in order to be "about" something --or to deserve the prefix "meta"-- data needs to be endowed with purpose and relevance.  To acquire relevance or "aboutness," raw data needs to be transformed into interpreted information or knowledge --on their own, the 1's and 0's of a digital description (or any other digital resource) are not "about" anything in particular.  In this sense, metadata as data that has significance or is "about something" is a contradiction in terms.  Only by clearly indicating and understanding how metadata is to function as a complex description of resource meanings, purposes and contexts, will it be possible to realize the potential of specifications, profiles and technologies developed for metadata.

Example: Learning Resource Types

An example of the complexity of metadata creation as an interpretive undertaking is provided by elements in educational metadata schemes dealing with various "types" or "genres" of educational resources.  In the IMS metadata model, this is represented by element 5.2 Educational.LearningResourceType, which has the following "best practice" vocabulary:  "Exercise, Simulation, Questionnaire, Diagram, Figure, Graph, Index, Slide, Table, Narrative Text, Exam, Experiment, ProblemStatement, SelfAssesment" (IMS, 2001).  In the case of this vocabulary, as well as the alternatives put forward by GEM, EdNA, and MERLOT, two different sorts of categorization are brought together or conflated.  On the one hand, it includes terms that describe the formal properties of a resource --like "slide", "table", or "narrative text."  On the other hand, it also includes values that speak to the pedagogical application of a resource --as is the case with terms like "exercise", "self assessment" or "exam."  As a further complication, it would be easy to conceive of a table or narrative text as making up a significant part of an exercise or exam, or as serving an entirely different pedagogical purpose.  This, in turn, raises the further problem of specifying how a user will actually learn from a particular resource.  The educational "genre" or activity associated with any learning asset or object is highly context-dependent, and is very difficult to anticipate in advance.
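
The conflation can be made visible by sorting the vocabulary into the two kinds of category named above.  In the following sketch only the six terms that the text itself classifies are assigned to a facet; the facet labels and the function name `facet_of` are assumptions, and the remaining terms are deliberately left unclassified rather than guessed at.

```python
# The IMS "best practice" vocabulary as quoted in the text (spelling as given).
IMS_VOCABULARY = [
    "Exercise", "Simulation", "Questionnaire", "Diagram", "Figure", "Graph",
    "Index", "Slide", "Table", "Narrative Text", "Exam", "Experiment",
    "ProblemStatement", "SelfAssesment",
]

# Only the terms the text itself classifies are assigned a facet here.
FORMAL = {"Slide", "Table", "Narrative Text"}
PEDAGOGICAL = {"Exercise", "SelfAssesment", "Exam"}


def facet_of(term):
    """Say whether a term describes form, pedagogical use, or neither."""
    if term in FORMAL:
        return "formal"
    if term in PEDAGOGICAL:
        return "pedagogical"
    return "unclassified"


# Group the single flat vocabulary by facet.
grouped = {"formal": [], "pedagogical": [], "unclassified": []}
for term in IMS_VOCABULARY:
    grouped[facet_of(term)].append(term)

# One resource can need a value from each facet: a table used as an exam.
resource = {"form": "Table", "use": "Exam"}
print(grouped)
```

A single-valued element forces the indexer to choose between "Table" and "Exam" for such a resource, which is precisely the difficulty the vocabulary raises.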

Language: Sign System or Social Process?

Contrary to what the word "metadata" might imply, the use of terms to describe a learning resource type --or any other aspect of a resource-- is necessarily a reflection of the interpretation and judgment of the indexer.  Peter Drucker's earlier definition of the term "information," quoted now at greater length, further indicates that this interpretation and judgment does not occur in isolation:

Information is data endowed with relevance and purpose.  Converting data into information thus requires knowledge.  And knowledge, by definition, is specialized. (In fact, truly knowledgeable people tend toward overspecialization, whatever their field, precisely because there is always so much to know.) (Drucker, 1988; p. 4)

It is significant that Drucker mentions specialization or expertise as being constitutive of knowledge:  For it is specialization or expertise that locates the individual knower in a particular context, and often, in specific and concrete practices and communities.  One could extend this argument even further by saying that the conversion of data to information or knowledge that happens in the interpretive process of metadata creation implies a number of commonplace social and human realities:  It entails personal involvement in and commitment to specific practices, and participation in a community of those with similar or complementary understandings.

All of this seems to suggest that the significance of words and descriptions in metadata may not be so much a matter of clear and unambiguous definition --as one might be led to believe from the highly technical orientation of many metadata specifications.  Instead, it is more a matter of doing, acting, and belonging.  Establishing the meaning of words is, then, not so much a matter of definitional or analytical rigor, but of simply doing and of using words.  This is especially the case for resources with an educational purpose, where the potential significance and application of a learning object is very much dependent on a context of action and practice.

Significantly, this general position is one that is frequently articulated by experts in the emergent field of organizational ethnography.  For example, in his book Communities of Practice: Learning, Meaning and Identity, Etienne Wenger emphasizes that meanings arise through practice and engagement with everyday concerns:
 

This focus on meaningfulness is… not primarily on the technicalities of "meaning."  It is not on meaning as it sits locked up in dictionaries.  It is not just on meaning as a relation between a sign and a reference….  Practice is about meaning as an experience of everyday life. (Wenger, 1999; pp. 51-52; emphasis in original)

Wenger underscores the fact that everyday life is above all social and participatory.  He describes this "social participation" as follows:

[It] refers not just to local events of engagement in certain activities with certain people, but to a more encompassing process of being active participants in the practices of social communities and constructing identities in relation to these communities…. Such participation shapes not only what we do, but also who we are and how we interpret what we do. (Wenger, 1999; p. 4)

The meaning of any set of terms, and the significance and utility of any taxonomy, according to Wenger, can be evaluated only in the context of a community whose members are involved in similar activities and share similar values.  Wenger calls this process the "negotiation of meaning:"  The production of meanings "that extend, redirect, dismiss, reinterpret, modify or confirm… the histories of meanings of which they are a part." (Wenger, 1999; p. 53)

Z39.50: A Lesson in Interoperability

In short, semantic interoperability is tied directly to communities of practice, and to the negotiation of meaning that occurs within them.  The Z39.50 protocol represents an attempt to achieve semantic (and other) interoperability both within and between such communities.  This particular protocol, now under development for more than 20 years, defines query and retrieval functionality for searching across multiple databases from a single point of access.  William Moen, co-author of the "Bath Profile" for Z39.50, suggests that interoperability is something that varies with the degree of commonality between communities:  "… the degree of interoperability between information systems may be dependent on the distance between communities whose information systems attempt to interact" (Moen, 2001).  Moen goes on to explain that

Within a community or domain, relative homogeneity reduces interoperability challenges.  Heterogeneity increases as one moves outside of a focal community/domain, and interoperability is likely [to be] more costly and difficult to achieve (Moen, 2001).

Moen also outlines a number of levels of community commonality or homogeneity, as well as a number of relationships that can exist between communities.  Following the example of a diagram provided by Moen specifically for the Z39.50 protocol, the relationships and identities of a number of communities involved in education and research might be schematized in the following way:

According to Moen, communities fall into three categories: Focal, Extended and Extra.  Although not explicit in the diagram, focal communities must be understood as being constituted through internal intra-community relationships.  As Moen suggests, focal communities generally have a high degree of homogeneity, with clearly defined interests, memberships and common understandings.  If two or more of these focal communities are able to identify common interests and values, they can establish an inter-community relationship.  Together, these related communities constitute an "extended community" --one that possesses less homogeneity than a focal community, but that still shares some ways of understanding and defining meaning.  Once they have negotiated common interests and understandings, it is possible for two extended communities to enter into their own inter-community relationship.  Finally, when an individual community is brought into relation with an extended community, that individual community is known as an "extra community."

As the above diagram shows, the focal communities constituted by schools, technical colleges and universities would together form an extended community for specifically educational metadata.  As such, this extended community would be able to form a relationship with the extended "cultural heritage" community (constituted, in turn, by the related focal communities of museums and archives).  Either of these extended communities would then be able to form relationships with the "extra community" constituted by the geospatial focal group.
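
The groupings just described can also be written down as a small data structure.  The following is a sketch only: the community names come from the paragraph above, but the dictionary layout, the relationship labels and the function names `extended_of` and `relation` are illustrative assumptions and are not taken from Moen.

```python
# Groupings taken from the paragraph above; layout and labels are assumed.
EXTENDED = {
    "education": {"schools", "technical colleges", "universities"},
    "cultural heritage": {"museums", "archives"},
}
EXTRA = {"geospatial"}


def extended_of(focal):
    """Return the name of the extended community a focal community is in."""
    for name, members in EXTENDED.items():
        if focal in members:
            return name
    return None


def relation(a, b):
    """Describe how two focal communities are related in this scheme."""
    for community in (a, b):
        if community not in EXTRA and extended_of(community) is None:
            raise ValueError(f"unknown community: {community}")
    if a == b:
        return "intra-community"
    if a in EXTRA or b in EXTRA:
        return "extended-to-extra"
    if extended_of(a) == extended_of(b):
        return "inter-community, same extended community"
    return "inter-community, between extended communities"


print(relation("schools", "universities"))
print(relation("schools", "museums"))
print(relation("archives", "geospatial"))
```

Read alongside Moen's remarks, each step down this list of relationships corresponds to a greater distance between communities, and so to interoperability that is likely to be more costly and difficult to achieve.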

Intra-community relationships that constitute focal communities and inter-community relationships that can be formed between communities are quite different in nature.  The two types of relationship are formed through different processes and entail the use of different technologies.  They are also likely to realize different levels of interoperability.

Communities and Technical Mechanisms

Relationships that constitute a focal community in and of itself can be formed through relatively direct agreement on semantics and other forms of interoperability.  Members in these communities can rely on an existing set of shared meanings and can make use of established community mechanisms (governing bodies, conferences, and special interest groups, for example) to define the semantics for information interchange.  The key technological form that helps define these semantics is the XML DTD (Document Type Definition) or more recently, a more flexible means of XML document type definition known as the "schema".  These largely syntactic frameworks are provided by the XML standard for the reliable exchange of customized documents.
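
What it means for such a framework to be largely syntactic can be suggested with a short sketch.  It does not perform DTD or schema validation (the parser in Python's standard library is non-validating); it is a stand-in that checks a record against a hypothetical element set of the kind a focal community might agree upon.  The element names, the sample record and the function name `missing_elements` are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical element set agreed within one focal community.
REQUIRED_ELEMENTS = ("title", "author", "learningresourcetype")

# Hypothetical record; the element names and values are invented.
RECORD_XML = """
<record>
  <title>Stream Table Simulation</title>
  <author>A. Indexer</author>
  <learningresourcetype>Simulation</learningresourcetype>
</record>
"""


def missing_elements(xml_text, required=REQUIRED_ELEMENTS):
    """Return the required child elements absent from a record.

    This is a structural check only: it confirms that an element is
    present, not that sender and receiver understand its value alike.
    """
    root = ET.fromstring(xml_text.strip())
    present = {child.tag for child in root}
    return [name for name in required if name not in present]


print(missing_elements(RECORD_XML))
print(missing_elements("<record><title>Untitled</title></record>"))
```

The first record passes the check, yet nothing in the check establishes what "Simulation" means to the community that wrote it; that understanding rests on the shared meanings described above.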

Inter-community relationships --bridging separate focal, extended or extra communities-- are more difficult to develop and define.  Members cannot count on shared sets of meanings and practices.  There are a limited number of governing bodies, conferences and other organizational mechanisms that straddle multiple communities and that would be ready to facilitate the creation of such relationships.  In addition, the meanings encoded in a DTD or schema for use within one community are defined only implicitly.  This can make their translation into other communities and contexts very difficult (Heflin & Hendler, 2000).
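
The difficulty of translation can be sketched with a simple term mapping, or crosswalk, between two communities.  The source terms are drawn from the IMS vocabulary quoted earlier, but the target terms, the mapping itself and the function names `translate` and `reverse_candidates` are hypothetical and stand for no actual agreement between communities.

```python
# Hypothetical crosswalk from one community's resource types to another's.
CROSSWALK = {
    "Exam": "Assessment",
    "SelfAssesment": "Assessment",
    "Exercise": "Activity",
}


def translate(term, crosswalk=CROSSWALK):
    """Return the other community's term, or None if none has been agreed."""
    return crosswalk.get(term)


def reverse_candidates(target, crosswalk=CROSSWALK):
    """Return every source term that could lie behind a translated term."""
    return sorted(src for src, dst in crosswalk.items() if dst == target)


print(translate("Exam"))
print(translate("Simulation"))
print(reverse_candidates("Assessment"))
```

Two kinds of loss appear even in this small case: a term such as "Simulation" has no counterpart until one is negotiated, and the distinction between an exam and a self-assessment disappears once both are rendered as "Assessment."  Neither problem is resolved by the syntax of the mapping.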

The development of inter-community interoperability, as a result, should emphasize matters of community identity and goals, and semantic negotiation and explication.  However, the solutions actually suggested for the creation of such relationships --XML transformations, ebXML, or RDF and ontology structures-- all emphasize formalization, syntax and data processing.  They pay scant or no attention to issues such as the negotiation of meanings and identities.  For example, notions of "degrees" of interoperability or community homogeneity (conceptions that have proven so important to the limited success of Z39.50) seem to find no mention in the literature advocating these novel interoperability technologies.

Conclusion

The goal of increased interoperability both within and between communities will clearly not be achieved through further formalization and abstraction.  What will bring this goal closer is increased negotiation within, but especially between, communities.  Techniques for accomplishing this, such as "domain analysis" (Nielsen, 2001) and the identification of "boundary objects" (Bowker & Star, 1999; pp. 196-198), have already been developed and used with some success in other fields.  If interoperability is to be established between educational communities or across a semantic Web of disparate resources, it seems likely that organizational and descriptive rather than technical supports are the key ingredient that is currently lacking.

References

ARIADNE Alliance of Remote Instructional Authoring and Distribution Networks of Europe (1999). ARIADNE Educational Metadata Recommendation. [Web Page]. URL http://ariadne.unil.ch/Metadata/

Blair, D.C. (1990). Language and Representation in Information Retrieval. Amsterdam: Elsevier.

Bowker, G.C. and Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences. Cambridge, Mass: MIT Press.

Dreyfus, H. (2001). On the Internet. London: Routledge.

Drucker, P. F. (1988). The Coming of the New Organization. Harvard Business Review. January-February. pp. 4-11.

EdNA Education Network Australia. (2000). EdNA Metadata Elements. [Web Page]. URL http://standards.edna.edu.au/metadata/elements.html

EdNA Education Network Australia. (2001). About Us. [Web Page]. URL http://www.edna.edu.au/aboutus/aboutsite.html

Fensel, D., et al. (2000). OIL in a Nutshell. [Web Page]. URL http://www.cs.vu.nl/~ontoknow/oil/downl/oilnutshell.pdf

Furnas, G., Landauer, T.K., Gomez, L.M., and Dumais, S.T. (1987). The vocabulary problem in human-system communication. Communications of the ACM. (30). pp. 964-71.

GEM Gateway to Educational Materials. (2000). GEM Resource Type Controlled Vocabulary. [Web Page]. URL http://www.geminfo.org/Workbench/Metadata/Vocab_Type.html

GEM Gateway to Educational Materials. (2001). About GEM. [Web Page]. URL http://www.geminfo.org/networker.html

Gill, T. (2001). Metadata and the World Wide Web. [Web Page]. URL http://www.getty.edu/research/institute/standards/intrometadata/pdf/gill.pdf

Heflin, J. and Hendler, J. (2000). Semantic Interoperability on the Web. In Proceedings of Extreme Markup Languages 2000. Graphic Communications Association. pp. 111-120. [Web Page]. URL http://www.cs.umd.edu/projects/plus/SHOE/pubs/extreme2000.pdf

IMS Global Learning Consortium. (2000). IMS Learning Resource Meta-data Best Practices and Implementation Guide. [Web Page]. URL http://www.imsproject.com/metadata/mdbestv1p1.html

IMS Global Learning Consortium. (2001). IMS Learning Resource Meta-data Information Model. [Web Page]. URL http://www.imsproject.org/metadata/ims_md_infov1p2.html

Lynch, C. A. (1997). The Z39.50 Information Retrieval Standard. Part I: A Strategic View of Its Past, Present and Future. D-Lib Magazine. April. [Web Page]. URL http://www.dlib.org/dlib/april97/04lynch.html

Miller, P. (2000). Interoperability: What is it and Why should I want it? ARIADNE 24. June 2000. [Web Page]. URL http://www.ariadne.ac.uk/issue24/interoperability/

Moen, W.E. (2001). Mapping the interoperability landscape for networked information retrieval.  In Proceedings of First ACM/IEEE-CS Joint Conference on Digital Libraries, Roanoke, VA, June 24-28, 2001. pp. 50-52.  [Web Page]. URL http://www.unt.edu/wmoen/publications/MapInteropJCDLFinal.pdf

Norrick, N.R. (1998). Lecture Semantics. [Web Page]. URL http://www.uni-saarland.de/fak4/norrick/lectsem.htm

Notess, G.R. Search Engines by Search Features. [Web Page]. URL http://www.searchengineshowdown.com/features/byfeature.shtml

Song, D.W., Wong, K.F., Bruza, P.D., and Cheng, C.H. Towards a Commonsense Aboutness Theory for Information Retrieval Modelling. [Web Page]. URL http://www.dstc.edu.au/Research/Projects/Infoeco/publications/aboutness-sci00.pdf

Todd, R. J. (1992). Academic Indexing: What's it all About? The Indexer. 18 (2). pp. 101-104.

Varela, F., Thompson, E., & Rosch, E. (1991). The Embodied Mind; Cognitive Science and Human Experience. Cambridge, Mass: MIT Press.

XML.com (2001). DTD Repositories. [Web Page]. URL http://www.xml.com/pub/rg/DTD_Repositories