Friday, June 29, 2012

Analogous Terms

A while back I blogged on Univocal, Equivocal, and Analogous terms.  Thomas Aquinas wrote about these in Summa Theologica, so it is not really a new topic.  That said, I think that how we deal with these three classes of term in information management is a fairly murky area, and requires the development of practical guidance. 

Let’s start with analogous terms.  What is an analogy?  The Free Dictionary provides the following definition:

similarity in some respects between things that are otherwise dissimilar.
[http://www.thefreedictionary.com/analogy, accessed 2012-06-29]  

This is a reasonable start, but I think that we need to understand a lot more about analogous terms in information management.  I would suggest that an analogous term is:

a term used to signify a concept that uses all or part of a term which signifies a second concept that is better understood.  There are supposed to be attributes in common between the two concepts (often only one or a very few), or some other similarity (maybe an emotional response to the term). 

An analogous term I particularly dislike in information management is “Data Owner”.  For IT staff this signifies a person who has some unspecified responsibility for the data in question.  The word “Owner” is clearly an analogous term, because the IT staff have no intention of referring to holding legal title – which is the true meaning of “Owner”.  An individual who is a Data Owner should be able to take the data and sell it to anyone they liked.  Of course, anyone trying that in a major enterprise would be promptly fired, sued, and have criminal charges preferred against them.  It would be no defense to say that somebody in IT told them they were a Data Owner.

So what concept does the term “Data Owner” signify?  I cannot find out what it is, but I know for sure that for IT folks at least it does not mean “an individual who holds legal title to the data”.  For the moment, let us label this new concept “Concept D” (which may or may not exist). 

What I think is going on is that IT folks have a vague idea about Concept D and that Concept D has an analogous relationship with the concept signified by the term “Owner”.  I suggest, based on years of dealing with this, that the IT folks think of one shared attribute between Concept D and “Owner” – and that attribute is “responsibility”.

An owner of some item of property will generally assume responsibility for it.  For instance, when it comes to my house I pay the mortgage, mow the lawn, take out the trash, shovel the snow, and so on.  I could enumerate a very long list of tasks that I undertake because I own my house.  All of these tasks could be generalized into the attribute of “responsibility”. 

My guess is that IT folks have no clue what the specific tasks are that a “Data Owner” is obliged to perform for the data they “own”.  However, it is nevertheless certain that all these unknown tasks could be generalized into the attribute of “responsibility”.  The logic here can be expressed as:

1.      Every Owner has responsibility for an item they own
2.      Individual X has responsibility for a specific set of data
3.      Therefore, Individual X is an Owner

This syllogism is of course an invalid argument since it contains 4 terms (rather than 3).  The problem is that the terms “an item they own” and “a specific set of data” are distinct.  As we have seen, there is no real ownership of data.  Hence, we have the 4-term fallacy.

Let us focus now on the term “responsibility” which is the core shared attribute in the analogical relationship.

IT cannot really tell anyone they are responsible for a particular set of data, as nobody in IT is empowered to assign such responsibility.  Furthermore, as noted above, IT cannot give a specific list of the tasks associated with being responsible for a set of data.  Using the term “Data Owner” gets IT out of these difficulties – unless the “Data Owner” refuses to accept the term, or asks IT what responsibilities are involved in being a Data Owner.

The abstraction of specific tasks into the term “responsibility” is an interesting one.  It shows how an abstract term can be used to suggest that someone should know (or does know) the specifics covered by the abstract term in the context under consideration.  This is good fodder for a future blog post.

So let us return to Concept D.  What exactly is it that IT is signifying by the term “Data Owner”?  It is quite possible that some IT staff do not know, in which case it is a null concept.  However, other IT staff may simply be seeking somebody who is an actual user of the data, and who can explain the data to them.  This is a good deal more mundane than being a “Data Owner”.

One last point is that new concepts emerge frequently in information management.  It seems natural to use analogical terms to signify them because the concepts themselves are not yet properly understood.  I have no problem with this, but what I have a problem with is giving the impression that the concept exists and is fully understood.  The use of an analogical term brings with it the concreteness of the second concept which the term signifies.  This gives the impression that something is fully known when it is not.  I have experienced this many times over my career in information management.

So what can we learn from this?  I suggest the following:

1.      In information management be on the lookout for terms that are clearly analogical.  They suggest that there is either a null concept or a poorly understood concept.

2.      Identify the attributes that are in common between the concept to which the analogical term is now being applied and the concept which it originally signified.

3.      Determine if the attributes that are supposedly shared between the two concepts are abstracted.  If so, this indicates that a level of specificity is missing and further suggests that the new concept is poorly understood – or unintelligible.

4.      If an analogical term is used to signify a poorly understood concept, be honest and say this.  Try to define the concept, and what its boundaries are.

5.      Try to isolate any emotive or suggestive notions associated with the analogical term and determine what their impact is.

Sunday, June 17, 2012

Rethinking Common vs. Technical Terms in Definitions and Some New Definition Rules


After I posted the blog on "Common vs. Technical Terms in Definitions and Some New Definition Rules" I was contacted by Suzanne DalBon, who had a different view on this topic.  Suzanne kindly put it that my view might apply in certain situations, but that handing primacy to common terms might cause chaos in other situations.  

The first point that Suzanne raised is that the emphasis on common terms in definitions will lead to inconsistency.  There are simply too many common terms (words and expressions) to choose from.

A major issue in collaborative environments where any kind of content is produced is the need to achieve consistency.  Consistency in outputs is needed for users (readers) of the content to be able to reliably use it.  Such consistency is remarkably difficult to achieve.  Heavy editorial control is one way, but such control can poison any collaborative environment where contributors are providing their time and effort on a voluntary basis.  Of course, definitions in enterprise environments are very likely to be developed in just such a collaborative environment.  The editor of a commercial encyclopedia can be expected to whip his contributors into providing a consistent format, and has the power of the purse at his disposal.  Further, his contributors would have been chosen in the first place not simply for their substantive ("domain") knowledge, but also for their literary tradecraft.

Any directive to use common terms as much as possible will result in inconsistent content development.  Not only is it a question of having many different synonyms of common terms, but there is also the problem of subtle differences when common terms are used.  English is especially prone to this.  For instance, English terms derived from Anglo-Saxon seem masculine, while English terms derived from Latin seem feminine.  Dr Samuel Johnson gave the example of "hearty welcome" having exactly the same roots as "cordial reception".  Think for a moment about the different images these conjure up for you - even if you are not a native English speaker.     

Suzanne's suggested approach is to use more general technical terms in definitions wherever possible.   I had been thinking about descriptive definitions in my original blog post.  In essential definitions, more general terms are used because the formula for producing the definition is Definition = Superordinate Genus + Specific Difference.  However, essential definitions are comparatively rare.  Nevertheless, as pointed out in previous blog posts here, concepts are always found in concept systems.  Therefore, we should describe a concept in terms of the closest concepts to it.  These will be the ones that have a direct relationship to it.  For a technical term, it is unlikely that a common term will signify a concept that is directly related to the concept signified by the term being defined.  So a common term will be just too general to be precise.  Even so, we should still avoid using technical terms that are more detailed than the term being defined (e.g. the parts of a whole, or the subspecies of a species).  Of course, this may be unavoidable in some cases, so we cannot make it a hard and fast rule.

Suzanne also noted that if we have to define a concept using more general technical terms, this is a good way of surfacing technical terms that have not yet been recognized as requiring a definition.  It is a very good way of validating the completeness of a special vocabulary.  

So to summarize our revised rules for using terms in a definition:

  1. Use more general terms found in the same special vocabulary, whose concepts have a direct relationship to the concept being defined.  Inform the editor if any term is not yet defined.
  2. If no appropriate term is available in (1), then use even more general terms within the same special vocabulary as the term to be defined.  Inform the editor if any term is not yet defined.
  3. If no appropriate term is available in (2), then use a more specific term within the same special vocabulary as the term to be defined.  Inform the editor if any term is not yet defined.
  4. If no appropriate term is available in (3), then use a common term.  However, editorial control may be needed to rationalize the use of common terms across definitions.

I suppose that these rules presuppose the existence of an editor.  But that is a topic for future blog posts.
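The four revised rules can be read as an ordered preference among candidate terms.  The following is only a minimal sketch of that reading, in Python.  The names Tier, Candidate and choose_terms are my own assumptions, as are the example terms in the usage note; nothing here is a feature of any particular tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Preference order taken from the four revised rules (lower is preferred)."""
    DIRECT_GENERAL = 1   # rule 1: more general term with a direct relationship
    FURTHER_GENERAL = 2  # rule 2: even more general term, same special vocabulary
    MORE_SPECIFIC = 3    # rule 3: more specific term, same special vocabulary
    COMMON = 4           # rule 4: common term


@dataclass(frozen=True)
class Candidate:
    term: str
    tier: Tier
    is_defined: bool = True  # False if the vocabulary has no definition yet


def choose_terms(candidates: list[Candidate]) -> tuple[list[Candidate], list[str]]:
    """Return the candidates from the best available tier, plus terms to report to the editor."""
    if not candidates:
        return [], []
    best = min(c.tier for c in candidates)
    chosen = [c for c in candidates if c.tier == best]
    # Rules 1 to 3 ask that undefined vocabulary terms be reported to the editor.
    to_flag = [c.term for c in chosen if not c.is_defined and c.tier != Tier.COMMON]
    return chosen, to_flag
```

For example, given a hypothetical more specific term "tranche" that is not yet defined and a common term "collection", choose_terms would select "tranche" under rule 3 and return it in the list to report to the editor.  The sketch says nothing about how an analyst decides which tier a term belongs to; that judgment remains manual.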

Monday, June 11, 2012

Dangers of Automated Hyperlinking in Definitions

In my previous post I noted the example of the definition of “Mortgage-Backed Securities” in Prof. Campbell Harvey’s Hypertextual Finance Glossary.  The definition is:

Securities backed by a pool of mortgage loans  [http://www.duke.edu/~charvey/Classes/wpg/glossary.htm]


The term “pool” is hyperlinked in this definition, and the definition of “pool” is:

In capital budgeting, the concept that investment projects are financed out of a pool of bonds, preferred stock, and common stock, and a weighted-average cost of capital must be used to calculate investment returns. In insurance, a group of insurers who share premiums and losses in order to spread risk. In investments, the combination of funds for the benefit of a common project, or a group of investors who use their combined influence to manipulate prices.

This definition of “pool” does not fit the use of the term “pool” as it appears in the definition of “Mortgage-Backed Securities”.  I would argue that “pool” in this context should be preliminarily defined as:

a set of mortgages with common characteristics that act as collateral for debt instruments 

This is quite different to Prof. Harvey’s definition, so how did there come to be such a difference? 

I wonder if what we are looking at here is the automated hyperlinking of terms in definitions.  I do not know that this is happening in Prof. Harvey’s Glossary, but I have seen it as a feature of other semantic tools.  It is very convenient to type in a definition and have all the terms within it hyperlinked if they occur elsewhere in the vocabulary that is being constructed.  If this is not done, the analyst has to go and create the hyperlinks manually – a process that is very time-consuming.

But while automated hyperlinking is a great feature, it needs to be controlled.  I think the best way to do this is to have a cross reference report that provides a side by side comparison of a definition with the definitions of the terms used in it.  We can imagine the definition of a term on the left hand side of a page, with the hyperlinked terms highlighted.  On the right hand side of the page we could have the definitions of all the terms highlighted on the left hand side.  The analyst can then check whether the way in which each term is used in the definition on the left is consistent with the definition of that term on the right.

Simply relying on automated hyperlinking without this kind of check would seem to be inviting trouble.
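A cross reference report of this kind could be sketched roughly as follows.  This is a minimal illustration in Python, assuming the vocabulary is held as a simple mapping of term to definition.  The function names and the sample vocabulary are my own assumptions, and the "pool" entry is only an excerpt of the definition quoted above.  The sketch lists the linked definitions beneath the original definition rather than laying them out side by side.

```python
import re

# Sample data: the "pool" entry is an excerpt of the glossary definition quoted above.
SAMPLE_VOCABULARY = {
    "mortgage-backed securities": "Securities backed by a pool of mortgage loans",
    "pool": "In insurance, a group of insurers who share premiums and losses "
            "in order to spread risk.",
}


def terms_used(definition: str, vocabulary: dict[str, str], exclude: str) -> list[str]:
    """Return the vocabulary terms that occur as whole words in a definition."""
    found = []
    for term in vocabulary:
        if term == exclude:
            continue  # a term is not cross-referenced against itself
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, definition, flags=re.IGNORECASE):
            found.append(term)
    return found


def cross_reference_report(term: str, vocabulary: dict[str, str]) -> str:
    """Build a plain-text report: the definition, then the definition of each term it uses."""
    definition = vocabulary[term]
    lines = [f"{term}: {definition}", ""]
    for used in terms_used(definition, vocabulary, exclude=term):
        lines.append(f"  uses '{used}': {vocabulary[used]}")
    return "\n".join(lines)
```

Calling cross_reference_report("mortgage-backed securities", SAMPLE_VOCABULARY) places the insurance sense of "pool" directly under the definition that uses it, which is exactly where an analyst would notice the mismatch.  The matching is deliberately naive (whole-word, case-insensitive), so plurals and multi-sense entries would need more care in a real tool.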

A further check would be to ensure that the terms used in a definition do not link to definitions that are outside the vocabulary under consideration.  For instance, here is a definition of “pool” from http://oxforddictionaries.com/definition/pool--2?region=us

a supply of vehicles or goods available for use when needed

This is definitely not the concept being identified by “pool” in Prof. Harvey’s Glossary.  However, in a broad semantics repository it is quite possible that a wide range of vocabularies might be present, and a term in a definition might get hyperlinked to a definition in an entirely different vocabulary – a term which is signifying a concept that is quite alien to the vocabulary which contains the original definition.   
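The vocabulary check described above could be sketched as follows.  Again this is only a minimal illustration in Python, assuming that each hyperlink records which vocabulary its target definition belongs to.  The Link structure, the function name and the vocabulary labels in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    term: str               # the term as it appears in the definition
    target_vocabulary: str  # the vocabulary that holds the linked definition


def out_of_scope_links(links: list[Link], allowed_vocabularies: set[str]) -> list[Link]:
    """Return the links whose target definition lies outside the allowed vocabularies."""
    return [link for link in links if link.target_vocabulary not in allowed_vocabularies]
```

For a definition in a finance glossary, a link such as Link("pool", "general-english") would be returned for review if only {"finance-glossary"} is allowed.  Passing a set rather than a single vocabulary leaves room for deliberately permitting links into a related, more general vocabulary.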

Thursday, June 7, 2012

Common vs. Technical Terms in Definitions and Some New Definition Rules


A good deal of work in dealing with definitions in information management is done by analysts, and when I have been doing analytical work I have been struck by the need to capture Technical Terms.  Particular business areas always seem to have their own technical jargon, as does all of IT.  However, in capturing the concepts that lie behind these terms there is always a challenge about what terms to use in their definitions.

It is an old rule that high quality definitions should not use terms as obscure as the term being defined.  This is negative advice - telling us what not to do.   But what should we do?  I suppose that the best approach would be to use Common Terms.  A Common Term is one used in everyday discourse, and for which there is a well-known definition.   I agree that it is a noble goal to use only Common Terms in a definition of a Technical Term, and we should make every effort to do this.

However, can I really define something like "Mortgage-Backed Security" only by using Common Terms?  Campbell Harvey's Hypertextual Glossary defines "Mortgage-backed Securities" as:

Securities backed by a pool of mortgage loans [http://www.duke.edu/~charvey/Classes/wpg/bfglosm.htm]

But in this case, "pool" is not a Common Term, meaning a body of water or a swimming pool.  It is actually a Technical Term which is further defined by Prof. Harvey as:

In capital budgeting, the concept that investment projects are financed out of a pool of bonds, preferred stock, and common stock, and a weighted-average cost of capital must be used to calculate investment returns. In insurance, a group of insurers who share premiums and losses in order to spread risk. In investments, the combination of funds for the benefit of a common project, or a group of investors who use their combined influence to manipulate prices.

Having developed quite a lot of securitization software in my time I would not define "pool"  that way, but as:

a set of mortgages with common characteristics that act as collateral for debt instruments 

That definition could probably stand improvement too, but let us get back to our main point.  "Pool" seems to be a Common Term but is really a Technical Term.  Yet we have no general way of recognizing that it is a Technical Term.  To be fair, in Prof. Harvey's Glossary we can infer it is a Technical Term because it is hyperlinked to the above definition (which is inadequate to define "pool" in the context of "Mortgage-Backed Security").

Prof. Harvey's definition of "Mortgage-backed Securities" also refers to "mortgage loans".  You could argue that this is a Common Term, but you could also argue that it is a Technical Term in the very broad area of finance, which is a much broader area than Mortgage Securitization.  This is interesting as it implies that there are vocabularies for specialized areas which are subspecies of less specialized areas.  It would seem to be helpful to use Technical Terms from a more general specialized area in definitions of terms that exist within a more specialized area.  After all, the more general specialized areas should be more widely known, so more people will understand these concepts.  But if somebody does not understand a term from a more general specialized area it should be easy for them to find and understand its definition.

From this discussion we can derive some rules of what terms to use in a definition:

  1. Always try to use Common Terms in definitions
  2. If a Technical Term has to be used, try to use a Technical Term from a vocabulary that covers a more general area than the area to which the concept being defined belongs
  3. Always try to use a Technical Term from the most general area above the area to which the concept being defined belongs 
  4. Only as a last resort should a definition contain a Technical Term that is specific to the area to which the concept being defined belongs
  5. Do not use Technical Terms from a more specialized area than that to which the concept being defined belongs
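To apply these rules, we need to know where a term's area sits relative to the area of the concept being defined.  The following is a minimal sketch in Python of that classification, assuming the areas are arranged in a simple parent hierarchy.  The hierarchy, the area names (including "pool servicing") and the function names are hypothetical and purely for illustration.

```python
from typing import Optional

# Hypothetical hierarchy of subject areas: each area maps to its more general parent.
PARENT_OF: dict[str, Optional[str]] = {
    "finance": None,
    "mortgage securitization": "finance",
    "pool servicing": "mortgage securitization",
}


def ancestors(area: str) -> list[str]:
    """Return the areas above `area`, nearest first and most general last."""
    chain = []
    parent = PARENT_OF.get(area)
    while parent is not None:
        chain.append(parent)
        parent = PARENT_OF.get(parent)
    return chain


def classify(term_area: Optional[str], concept_area: str) -> str:
    """Say which rule applies to a term; term_area is None for a Common Term."""
    if term_area is None:
        return "common term (rule 1)"
    above = ancestors(concept_area)
    if term_area in above:
        if term_area == above[-1]:
            return "most general area (rule 3)"
        return "more general area (rule 2)"
    if term_area == concept_area:
        return "same area, last resort (rule 4)"
    if concept_area in ancestors(term_area):
        return "more specialized area, avoid (rule 5)"
    return "unrelated area"
```

With this hierarchy, classify("finance", "mortgage securitization") reports rule 3, while classify("pool servicing", "mortgage securitization") reports rule 5.  A term from an area that is neither above nor below the concept's area is reported as unrelated, a case the rules do not address.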

This is interesting as it implies we will always need generalization hierarchies to do definitions well if we adopt the above rules.  Of course, the Tree of Porphyry has been around for many centuries to support the old formula of Definition = Superordinate Genus + Specific Difference.  However, I am discussing Descriptive Definitions rather than Essential Definitions here, and it is Essential Definitions to which the Tree of Porphyry and the old formula apply.  So  it is interesting to see that there are additional reasons for having a generalization hierarchy.