Friday, June 29, 2012

Analogous Terms

A while back I blogged on Univocal, Equivocal, and Analogous terms.  Thomas Aquinas wrote about these in Summa Theologica, so it is not really a new topic.  That said, I think that how we deal with these three classes of term in information management is a fairly murky area, and requires the development of practical guidance. 

Let’s start with analogous terms.  What is an analogy?  The Free Dictionary provides the following definition:

similarity in some respects between things that are otherwise dissimilar.
[http://www.thefreedictionary.com/analogy, accessed 2012-06-29]  

This is a reasonable start, but I think that we need to understand a lot more about analogous terms in information management.  I would suggest that an analogous term is:

a term used to signify a concept that uses all or part of a term which signifies a second concept that is better understood.  There are supposed to be attributes in common between the two concepts (often only one or a very few), or some other similarity (maybe an emotional response to the term). 

An analogous term I particularly dislike in information management is “Data Owner”.  For IT staff this signifies a person who has some unspecified responsibility for the data in question.  The word “Owner” is clearly an analogous term, because the IT staff have no intention of referring to holding legal title – which is the true meaning of “Owner”.  An individual who is a Data Owner should be able to take the data and sell it to anyone they liked.  Of course, anyone trying that in a major enterprise would be promptly fired, sued, and have criminal charges preferred against them.  It would be no defense to say that somebody in IT told them they were a Data Owner.

So what concept does the term “Data Owner” signify?  I cannot find out what it is, but I know for sure that for IT folks at least it does not mean “an individual who holds legal title to the data”.  For the moment, let us label this new concept “Concept D” (which may or may not exist). 

What I think is going on is that IT folks have a vague idea about Concept D and that Concept D has an analogous relationship with the concept signified by the term “Owner”.  I suggest, based on years of dealing with this, that the IT folks think of one shared attribute between Concept D and “Owner” – and that attribute is “responsibility”.

An owner of some item of property will generally assume responsibility for it.  For instance, when it comes to my house I pay the mortgage, mow the lawn, take out the trash, shovel the snow, and so on.  I could enumerate a very long list of tasks that I undertake because I own my house.  All of these tasks could be generalized into the attribute of “responsibility”. 

My guess is that IT folks have no clue what the specific tasks are that a “Data Owner” is obliged to perform for the data they “own”.  However, it is nevertheless certain that all these unknown tasks could be generalized into the attribute of “responsibility”.  The logic here can be expressed as:

1.      Every Owner has responsibility for an item they own
2.      Individual X has responsibility for a specific set of data
3.      Therefore, Individual X is an Owner

This syllogism is of course an invalid argument since it contains 4 terms (rather than 3).  The problem is that the terms “an item they own” and “a specific set of data” are distinct.  As we have seen, there is no real ownership of data.  Hence, we have the 4-term fallacy.

Let us focus now on the term “responsibility” which is the core shared attribute in the analogical relationship.

IT cannot really tell anyone they are responsible for a particular set of data, as nobody in IT is empowered to assign such responsibility.  Furthermore, as noted above, IT cannot give a specific list of the tasks associated with being responsible for a set of data.  Using the term “Data Owner” gets IT out of these difficulties – unless the “Data Owner” refuses to accept the term, or asks IT what responsibilities are involved in being a Data Owner.

The abstraction of specific tasks into the term “responsibility” is an interesting one.  It shows how an abstract term can be used to suggest that someone should know (or does know) the specifics covered by the abstract term in the context under consideration.  This is good fodder for a future blog post.

So let us return to Concept D.  What exactly is it that IT is signifying by the term “Data Owner”?  It is quite possible that some IT staff do not know, in which case it is a null concept.  However, other IT staff may simply be seeking somebody who is an actual user of the data, and who can explain the data to them.  This is a good deal more mundane than being a “Data Owner”.

One last point is that new concepts emerge frequently in information management.  It seems natural to use analogical terms to signify them because the concepts themselves are not yet properly understood.  I have no problem with this, but what I have a problem with is giving the impression that the concept exists and is fully understood.  The use of an analogical term brings with it the concreteness of the second concept which the term signifies.  This gives the impression that something is fully known when it is not.  I have experienced this many times over my career in information management.

So what can we learn from this?  I suggest the following:

1.       In information management be on the lookout for terms that are clearly analogical.  They suggest that there is either a null concept or a poorly understood concept.


2.      Identify the attributes that are in common between the concept to which the analogical term is now being applied and the concept which it originally signified.

3.      Determine if the attributes that are supposedly shared between the two concepts are abstracted.  If so, this indicates that a level of specificity is missing and further suggests that the new concept is poorly understood – or unintelligible. 

4.      If an analogical term is used to signify a poorly understood concept, be honest and say this.  Try to define the concept, and what its boundaries are.

5.   Try to isolate any emotive or suggestive notions associated with the analogical term and determine what their impact is.

Sunday, June 17, 2012

Rethinking Common vs. Technical Terms in Definitions and Some New Definition Rules


After I posted the blog on "Common vs. Technical Terms in Definitions and Some New Definition Rules" I was contacted by Suzanne DalBon, who had a different view on this topic.  Suzanne kindly put it that my view might apply in certain situations, but that handing primacy to common terms might cause chaos in other situations.  

The first point that Suzanne raised is that the emphasis on common terms in definitions will lead to inconsistency.  There are simply too many common terms (words and expressions) to choose from.

A major issue in collaborative environments where any kind of content is produced is the need to achieve consistency.  Consistency in outputs is needed for users (readers) of the content to be able to reliably use it.  Such consistency is remarkably difficult to achieve.  Heavy editorial control is one way, but such control can poison any collaborative environment where contributors are providing their time and effort on a voluntary basis.  Of course, a collaborative environment is very likely to be the case for the development of definitions in enterprise environments.  By contrast, the editor of a commercial encyclopedia can be expected to whip his contributors into providing a consistent format, and has the power of the purse at his disposal.  Further, his contributors would have been chosen in the first place not simply for their substantive ("domain") knowledge, but also for their literary tradecraft.

Any directive to use common terms as much as possible will result in inconsistent content development.  Not only is it a question of having many different synonyms of common terms, but there is also the problem of subtle differences when common terms are used.  English is especially prone to this.  For instance, English terms derived from Anglo-Saxon seem masculine, while English terms derived from Latin seem feminine.  Dr Samuel Johnson gave the example of "hearty welcome" having exactly the same roots as "cordial reception".  Think for a moment about the different images these conjure up for you - even if you are not a native English speaker.     

Suzanne's suggested approach is to use more general technical terms in definitions wherever possible.   I had been thinking about descriptive definitions in my original blog post.  In essential definitions, more general terms are used because the formula for producing the definition is Definition = Superordinate Genus + Specific Difference.  However, essential definitions are comparatively rare.  Nevertheless, as pointed out in previous blog posts here, concepts are always found in concept systems.  Therefore, we should describe a concept in terms of the closest concepts to it.  These will be the ones that have a direct relationship to it.  For a technical term, it is unlikely that a common term will signify a concept that is directly related to the concept signified by the term being defined.  So a common term will be just too general to be precise.  Even so, we should still avoid using technical terms that are more detailed than the term being defined (e.g. the parts of a whole, or the subspecies of a species).  Of course, this may be unavoidable in some cases, so we cannot make it a hard and fast rule.

Suzanne also noted that if we have to define a concept using more general technical terms, this is a good way of surfacing technical terms that have not yet been recognized as requiring a definition.  It is a very good way of validating the completeness of a special vocabulary.  

So to summarize our revised rules for using terms in a definition:

  1. Use more general terms found in the same special vocabulary, whose concepts have a direct relationship to the concept being defined.  Inform the editor if any term is not yet defined.
  2. If no appropriate term is available in (1), then use even more general terms within the same special vocabulary as the term to be defined.  Inform the editor if any term is not yet defined.
  3. If no appropriate term is available in (2), then use a more specific term within the same special vocabulary as the term to be defined.  Inform the editor if any term is not yet defined.
  4. If no appropriate term is available in (3), then use a common term.  However, editorial control may be needed to rationalize the use of common terms across definitions.
I suppose that these rules presuppose the existence of an editor.  But that is a topic for future blog posts.
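
The four rules above amount to a fallback order, and a minimal sketch of that order might look like the following.  The category labels, the tuple layout, and the function name are hypothetical and only illustrate the idea; deciding which category a candidate term belongs to remains a human judgment.

```python
# Rule order taken from the list above. The category labels are hypothetical.
RULE_ORDER = ["direct_general", "more_general", "more_specific", "common"]


def choose_term(candidates):
    """Pick a term for use in a definition by walking the rules in order.

    candidates: list of (term, category, is_defined) tuples.
    Returns (chosen_term, rule_number, notify_editor).
    """
    for rule_number, category in enumerate(RULE_ORDER, start=1):
        for term, cat, is_defined in candidates:
            if cat == category:
                # Rules 1 to 3 ask the analyst to inform the editor
                # when the chosen term has no definition yet.
                notify_editor = rule_number <= 3 and not is_defined
                return term, rule_number, notify_editor
    # No candidate matched any rule.
    return None, None, False
```

Rule 4 never sets the notification flag here, because the editorial work it calls for (rationalizing common terms across definitions) is a review across many definitions and not a per-term check.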

Monday, June 11, 2012

Dangers of Automated Hyperlinking in Definitions

In my previous post I noted the example of the definition of “Mortgage-Backed Securities” in Prof. Campbell Harvey’s Hypertextual Finance Glossary.  The definition is:

Securities backed by a pool of mortgage loans  [http://www.duke.edu/~charvey/Classes/wpg/glossary.htm]


The term “pool” is hyperlinked in this definition, and the definition of “pool” is:

In capital budgeting, the concept that investment projects are financed out of a pool of bonds, preferred stock, and common stock, and a weighted-average cost of capital must be used to calculate investment returns. In insurance, a group of insurers who share premiums and losses in order to spread risk. In investments, the combination of funds for the benefit of a common project, or a group of investors who use their combined influence to manipulate prices.

This definition of “pool” does not fit the use of the term “pool” as it appears in the definition of “Mortgage-Backed Securities”.  I would argue that “pool” in this context should be preliminarily defined as:

a set of mortgages with common characteristics that act as collateral for debt instruments 

This is quite different to Prof. Harvey’s definition, so how did there come to be such a difference? 

I wonder if what we are looking at here is the automated hyperlinking of terms in definitions.  I do not know that this is happening in Prof. Harvey’s Glossary, but I have seen it as a feature of other semantic tools.  It is very convenient to type in a definition and have all the terms within it hyperlinked if they occur elsewhere in the vocabulary that is being constructed.  If this is not done, the analyst has to go and create the hyperlinks manually – a process that is very time-consuming.

But while automated hyperlinking is a great feature, it needs to be controlled.  I think the best way to do this is to have a cross reference report that provides a side by side comparison of a definition with the definitions of the terms used in it.  We can imagine the definition of a term on the left hand side of a page, with the hyperlinked terms highlighted.  On the right hand side of the page we could have the definitions of all the terms highlighted on the left hand side.  The analyst can then check whether the way in which each term is used in the definition on the left is consistent with the definition of that term on the right.

Simply relying on automated hyperlinking without this kind of check would seem to be inviting trouble.
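
A cross reference report of this kind could be sketched roughly as follows.  The glossary contents are abbreviated from the definitions quoted above, and the function names and the whole-word matching rule are assumptions made for illustration; this is not a description of how any particular glossary tool works.

```python
import re

# Hypothetical glossary: term -> definition (abbreviated from the text above).
GLOSSARY = {
    "mortgage-backed securities": "Securities backed by a pool of mortgage loans",
    "pool": "In insurance, a group of insurers who share premiums "
            "and losses in order to spread risk.",
}


def linked_terms(term, glossary):
    """Return the other glossary terms that occur as whole words in term's definition."""
    definition = glossary[term].lower()
    found = []
    for candidate in glossary:
        if candidate == term:
            continue
        if re.search(r"\b" + re.escape(candidate) + r"\b", definition):
            found.append(candidate)
    return sorted(found)


def cross_reference_report(term, glossary):
    """Build (left, right) rows: the definition under review on the left,
    the definition of each term it would be hyperlinked to on the right."""
    left = f"{term}: {glossary[term]}"
    return [(left, f"{t}: {glossary[t]}") for t in linked_terms(term, glossary)]


if __name__ == "__main__":
    for left, right in cross_reference_report("mortgage-backed securities", GLOSSARY):
        print(left)
        print("    uses -> " + right)
```

The report does not decide anything.  It only puts the two definitions next to each other so that the analyst can judge whether the usage on the left is consistent with the definition on the right.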

A further check would be to ensure that the terms used in a definition do not link to definitions that are outside the vocabulary under consideration.  For instance, here is a definition of “pool” from http://oxforddictionaries.com/definition/pool--2?region=us

a supply of vehicles or goods available for use when needed

This is definitely not the concept being identified by “pool” in Prof. Harvey’s Glossary.  However, in a broad semantics repository it is quite possible that a wide range of vocabularies might be present, and a term in a definition might get hyperlinked to a definition in an entirely different vocabulary – a term which is signifying a concept that is quite alien to the vocabulary which contains the original definition.   
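
One way to guard against this, sketched under the assumption that every definition in the repository is stored against both its vocabulary and its term, is to resolve links only within the home vocabulary and to report matches elsewhere without linking to them.  The repository layout, vocabulary names, and function names below are hypothetical.

```python
# Hypothetical repository: (vocabulary, term) -> definition.
REPOSITORY = {
    ("finance glossary", "pool"):
        "In insurance, a group of insurers who share premiums "
        "and losses in order to spread risk.",
    ("general dictionary", "pool"):
        "a supply of vehicles or goods available for use when needed",
}


def resolve_link(term, home_vocabulary, repository):
    """Return the (vocabulary, term) key to link to, or None if the term is
    not defined in the home vocabulary. Never falls back to another vocabulary."""
    key = (home_vocabulary, term)
    return key if key in repository else None


def out_of_vocabulary_matches(term, home_vocabulary, repository):
    """List the other vocabularies that also define the term, so an analyst
    can see which links were deliberately not made."""
    return sorted(v for (v, t) in repository if t == term and v != home_vocabulary)
```

Returning None for an unresolved term, instead of picking the nearest match in some other vocabulary, keeps an alien concept from being silently attached to the definition.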

Thursday, June 7, 2012

Common vs. Technical Terms in Definitions and Some New Definition Rules


A good deal of work in dealing with definitions in information management is done by analysts, and when I have been doing analytical work I have been struck by the need to capture Technical Terms.  Particular business areas always seem to have their own technical jargon, as does all of IT.  However, in capturing the concepts that lie behind these terms there is always a challenge about what terms to use in their definitions.

It is an old rule that high quality definitions should not use terms as obscure as the term being defined.  This is negative advice - telling us what not to do.   But what should we do?  I suppose that the best approach would be to use Common Terms.  A Common Term is one used in everyday discourse, and for which there is a well-known definition.   I agree that it is a noble goal to use only Common Terms in a definition of a Technical Term, and we should make every effort to do this.

However, can I really define something like "Mortgage-Backed Security" only by using Common Terms?  Campbell Harvey's Hypertextual Glossary defines "Mortgage-backed Securities" as:

Securities backed by a pool of mortgage loans [http://www.duke.edu/~charvey/Classes/wpg/bfglosm.htm]

But in this case, "pool" is not a Common Term, meaning a body of water or a swimming pool.  It is actually a Technical Term which is further defined by Prof. Harvey as:

In capital budgeting, the concept that investment projects are financed out of a pool of bonds, preferred stock, and common stock, and a weighted-average cost of capital must be used to calculate investment returns. In insurance, a group of insurers who share premiums and losses in order to spread risk. In investments, the combination of funds for the benefit of a common project, or a group of investors who use their combined influence to manipulate prices

Having developed quite a lot of securitization software in my time I would not define "pool"  that way, but as:

a set of mortgages with common characteristics that act as collateral for debt instruments 

That definition could probably stand improvement too, but let us get back to our main point.  "Pool" seems to be a Common Term but is really a Technical Term.  Yet we have no way of recognizing it is a Technical Term.  Actually, to be fair, in Prof. Harvey's Glossary we can infer it is a Technical Term because it is hyperlinked to the above definition (which is inadequate to define "pool" in the context of "Mortgage-Backed Security").

Prof. Harvey's definition of "Mortgage-backed Securities" also refers to "mortgage loans".  You could argue that this is a Common Term, but you could also argue that it is a Technical Term in the very broad area of finance, which is a much broader area than Mortgage Securitization.  This is interesting as it implies that there are vocabularies for specialized areas which are subspecies of less specialized areas.  It would seem to be helpful to use Technical Terms from a more general specialized area in definitions of terms that exist within a more specialized area.  After all, the more general specialized areas should be more widely known, so more people will understand these concepts.  But if somebody does not understand a term from a more general specialized area it should be easy for them to find and understand its definition.

From this discussion we can derive some rules of what terms to use in a definition:

  1. Always try to use Common Terms in definitions
  2. If a Technical Term has to be used, try to use a Technical Term from a vocabulary that covers a more general area than the area to which the concept being defined belongs
  3. Always try to use a Technical Term from the most general area above the area to which the concept being defined belongs 
  4. Only as a last resort should a definition contain a Technical Term that is specific to the area to which the concept being defined belongs
  5. Do not use Technical Terms from a more specialized area than that to which the concept being defined belongs

This is interesting as it implies we will always need generalization hierarchies to do definitions well if we adopt the above rules.  Of course, the Tree of Porphyry has been around for many centuries to support the old formula of Definition = Superordinate Genus + Specific Difference.  However, I am discussing Descriptive Definitions rather than Essential Definitions here, and it is Essential Definitions to which the Tree of Porphyry and the old formula apply.  So  it is interesting to see that there are additional reasons for having a generalization hierarchy.
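
To make the dependence on a generalization hierarchy concrete, here is a minimal sketch of a check for rule 5.  It assumes each vocabulary area records its parent area.  The area "pool servicing" is invented purely for illustration, and the function names are hypothetical.

```python
# Hypothetical hierarchy of vocabulary areas: area -> parent area (None at the top).
PARENT = {
    "finance": None,
    "mortgage securitization": "finance",
    "pool servicing": "mortgage securitization",  # invented area, for illustration only
}


def ancestors(area, parent):
    """Return the chain of more general areas above `area`, nearest first."""
    chain = []
    current = parent[area]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def violates_rule_5(term_area, concept_area, parent):
    """True if the term comes from an area more specialized than the area
    of the concept being defined (i.e. a descendant of that area)."""
    return concept_area in ancestors(term_area, parent)
```

The same `ancestors` chain could also support rules 2 and 3, since the terms those rules prefer come from areas that appear in the chain above the concept's own area.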

Monday, March 12, 2012

The Humpty-Dumpty Principle in Definitions

In dealing with empty concepts, we came across the issue that if somebody uses a term that potentially has an unintelligible definition, they are likely to defend themselves by quickly making up some kind of definition.   I strongly suspect that in such cases, the usage of the term will be inconsistent with the definition.  Which brings us to Humpty-Dumpty.

Lewis Carroll (Rev. Charles Lutwidge Dodgson) is best known for his children's books Alice's Adventures in Wonderland and Through the Looking Glass.  It is in the latter that Humpty Dumpty - an argumentative egg perched on a wall - has the following exchange with Alice:

'And only one for birthday presents, you know. There's glory for you!'
`I don't know what you mean by "glory",' Alice said.
Humpty Dumpty smiled contemptuously. `Of course you don't -- till I tell you. I meant "there's a nice knock-down argument for you!"'
`But "glory" doesn't mean "a nice knock-down argument",' Alice objected.
`When I use a word,' Humpty Dumpty said, in rather a scornful tone, `it means just what I choose it to mean -- neither more nor less.'
`The question is,' said Alice, `whether you can make words mean so many different things.'
`The question is,' said Humpty Dumpty, `which is to be master -- that's all.'

What Humpty-Dumpty is saying is that he can stipulate what the definition of a term is.  Lewis Carroll was a lecturer in mathematics at Oxford University (Christ Church), and wrote extensively on logic.  He elaborated on the theme in Humpty Dumpty in Chapter 2 of Book 10 of his work Symbolic Logic, where he says:

"...I maintain that any writer of a book is fully authorized in attaching any meaning he likes to any word he intends to use.  If I find an author saying at the beginning of his book, 'Let it be understood that by the word white I shall always mean black' I meekly accept his ruling, however injudicious I may think it. " 

[I quote from Lewis Carroll's Symbolic Logic by William Warren Bartley III,  ISBN 0-517-52383-3 - a book I had great difficulty in obtaining].   

Carroll implies that an author must use such a term consistently - that is, without varying the underlying definition.  This appears to be the principle that Humpty Dumpty was getting at.  Varying the definition of a term within an argument is the fallacy of equivocation, but Carroll is going beyond that and demanding consistent usage throughout a universe of discourse.

A second point here is that if a term is to be used in a nonconventional way, then the definition should be explained up front.  Again, Carroll implies this in his example of white.  This is one place where I would agree with the maxim of "starting with definitions".  However, this maxim is often misused.  For instance, in analysis we nearly always aim to arrive at definitions, as we do not understand concepts at the outset.

Thus, I think we now have a reasonable framework for dealing with individuals who use terms that they cannot supply adequate definitions for, particularly in the context of empty concepts.

One quick Lewis Carroll story.  Carroll found it necessary to deny that he had ever presented Queen Victoria with a book.  The story circulated that the Queen had been so charmed by reading Alice in Wonderland that she expressed her desire to receive the author's next work - whereupon he sent her The Condensation of Determinants.

Wednesday, March 7, 2012

What is an Empty Concept?


In the last post, univocal, equivocal, and analogous terms were discussed.  It occurred to me afterwards that all of these classes of term presuppose terms that signify concepts.  But what about a term that does not signify any concept?  

At first this sounds a bit stupid.  Surely we would not waste our time on terms that do not signify a concept.  However, I have listened to several decades of marketing hype in Information Technology and I think that I have heard terms that do not signify anything - but which have some kind of emotive power.

I have tried to look for philosophical sources about terms that signify empty concepts, but have not been able to find any - probably due to the short time I have been able to invest in the search.  This makes me cautious, so I will confine the discussion of empty concepts mostly to data management.

First, if a term signifies a concept that can supposit for actual materially existing instances, then the concept is non-empty.  E.g. "Computer".  Unfortunately, in information management, we are mostly dealing with concepts whose instances are immaterial, such as "Mortgage Backed Securities" (MBS's).  These exist, but not materially.  They are essentially contracts between human beings and/or institutions.   This way of thinking about empty concepts is now at a point where my metaphysics runs out.  

Let's try another approach.  If a concept can be empty, that implies it can possibly have content.  What "content" has traditionally been taken to mean by logicians (in the context of concepts) is a definition.  Thus, I would suggest that an empty concept is one with either (a) no definition; or (b) an unintelligible definition.
And it gets more complex.   Someone who uses a term can generally attempt to provide a definition for it.  They are very unlikely to admit there is no definition.   The definition provided is likely either a definition for a concept signified by another term or terms; or the definition supplied is unintelligible.  

Perhaps an unintelligible definition is more interesting.  Such a definition must fail significant quality checks.  Such checks may be those we formally use to assess all definitions, or may be checks based on the subject matter of what is being defined.  For instance, I suppose that "Platonic Forms" (which proposed real existence of concepts such as "table" or "chair") is a concept ultimately found to be unintelligible by philosophers.  More mundanely, I think "data owner" has an unintelligible definition insofar as it contains anything about "ownership".

This is yet another topic I will have to follow up. 

Tuesday, March 6, 2012

Univocal, Equivocal, and Analogous Terms


This is a topic which we will probably have to return to in the future, but a start has to be made.  Definitions are inextricably bound up with terms, and one classification of terms  divides them up into Univocal, Equivocal, and Analogous.  Let us briefly review these three classes.  
  • Univocal Term: A term that has only one meaning.  That is, it signifies only one concept, and thus corresponds to only one definition.  Such a term always has the same intension wherever it is used.   E.g. the term "entomology" signifies the study of insects. 
  • Equivocal Term: A term that has more than one meaning.  That is, it signifies more than one concept, and thus corresponds to more than one definition.  An equivocal term has different intensions when it is used.  E.g. the term "chihuahua" can signify (a) a breed of dog; (b) a state of Mexico. 
  • Analogous Term:  A term that is intended to convey one or more similar characteristics that exist between two concepts.  E.g. the term "data owner" is applied to individuals who have no legal title to the data they manage, but are expected to exercise responsibilities like those owners would typically exercise.  Sometimes an analogous term can be no different to an equivocal term. 
It is not always easy to know if a univocal term really is univocal.  For instance, I am not aware of any equivocal use of "entomology".  There might be, but I am unaware of it if it exists.  Also, a univocal term might become equivocal in the future.  

With respect to equivocal terms, a big problem is that there are far more concepts than there are terms to describe them.  This is a reason why equivocal terms come into being, and there seems to be no way to avoid it happening.  From a practical point of view the problems with equivocal terms arise only when a term is being used equivocally in communication.  For instance, if I use "backup" within a group managing databases, everybody knows what I mean - even if "backup" has other meanings, such as in police operations. 

One way round this is to maintain specialized vocabularies for "subject fields" (as the terminologists call them).  Roughly speaking this means that an equivocal term can be taken to be univocal in a specific context.  We will need to come back to this.
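
As a rough illustration of this idea, a term can be looked up against a subject field instead of on its own.  The sketch below assumes each specialized vocabulary maps a term to exactly one definition.  The definition texts for "backup" are placeholders written for this example, and the function names are hypothetical.

```python
# Hypothetical specialized vocabularies, keyed by subject field.
# The definition texts are illustrative placeholders.
VOCABULARIES = {
    "database administration": {
        "backup": "a copy of data kept so it can be restored after loss",
    },
    "police operations": {
        "backup": "additional officers sent to support those already at a scene",
    },
}


def meanings(term, vocabularies):
    """Return {subject_field: definition} for every field that defines the term."""
    return {field: vocab[term] for field, vocab in vocabularies.items() if term in vocab}


def is_univocal(term, vocabularies, subject_field=None):
    """Across all fields, a term is univocal only if exactly one field defines it.
    Within a named field it is univocal whenever that field defines it at all,
    because each vocabulary holds one definition per term."""
    if subject_field is not None:
        return term in vocabularies.get(subject_field, {})
    return len(meanings(term, vocabularies)) == 1
```

With this data, "backup" is equivocal when no subject field is given, but it counts as univocal once the lookup is restricted to "database administration".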

Analogous terms are more difficult.    Problems include: (a) only one of the related objects is identified; (b) the characteristics supposedly in common between the two objects are never identified.  This means that an analogous term may not actually signify a concept - it may be unintelligible and just used for emotive effect.  It certainly implies that there is a more complex relationship between analogous terms and definitions.  This too we will have to return to.