Wednesday, November 30, 2011

Solvency II and Tracking Other Organizations' Definitions

I attended an interesting webinar by Golden Source (www.thegoldensource.com) today about Solvency II.  Solvency II is a huge pan-European insurance regulatory framework that is going to be implemented over the next few years.

Not unexpectedly, definitions came up.  The point was made that an insurer has to not only know what its definitions of concepts are, but also definitions used by its partners.  For instance, an insurance company may utilize the services of many asset managers (as part of its overall investment activities).  For Solvency II purposes, the insurance company must know what its definition of e.g. "Country of Risk" is, and also how each of its Asset Manager partners defines "Country of Risk".

This is an important point.  Data managers often only look within the enterprise when it comes to definitions.  Yet there can be compelling reasons to track the definitions that are used by other organizations which the enterprise interacts with. 

I have noticed that when I bring this point up, some colleagues think that the reason to do this work is to figure out which definition is "right".  However, the practical need is for semantic interoperability, not arguing about the correctness of a term.  We need to understand the definitions used by our partners as a first step to integrating the data they send to us.  Therefore we have a requirement to track the definitions used by our partners.  An interesting challenge, but one clearly highlighted by the requirements of Solvency II.
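To make the tracking requirement a little more concrete, here is a minimal sketch in Python of a registry that stores one definition per term per organization and reports which partners define a term differently from us.  The class names, the organization names, and the definition texts are all invented for illustration; nothing here comes from Solvency II itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    term: str          # e.g. "Country of Risk"
    organization: str  # the enterprise itself or one of its partners
    definition: str    # the definition text as that organization states it


class DefinitionRegistry:
    """Keeps one definition per (term, organization) pair."""

    def __init__(self) -> None:
        self._by_term: dict[str, dict[str, TermDefinition]] = defaultdict(dict)

    def record(self, entry: TermDefinition) -> None:
        self._by_term[entry.term][entry.organization] = entry

    def definitions_of(self, term: str) -> dict[str, str]:
        """Return {organization: definition text} for one term."""
        return {org: e.definition for org, e in self._by_term.get(term, {}).items()}

    def differing_from(self, term: str, organization: str) -> list[str]:
        """List organizations whose wording differs from `organization`'s.

        Only the text is compared (ignoring case and spacing); deciding
        whether two wordings mean the same thing still needs an analyst.
        """
        definitions = self.definitions_of(term)
        ours = definitions.get(organization)
        if ours is None:
            return []

        def normalize(text: str) -> str:
            return " ".join(text.lower().split())

        return sorted(
            org
            for org, text in definitions.items()
            if org != organization and normalize(text) != normalize(ours)
        )


# Hypothetical data: the definition texts below are made up.
registry = DefinitionRegistry()
registry.record(TermDefinition("Country of Risk", "Insurer",
                               "Country where the issuer is headquartered"))
registry.record(TermDefinition("Country of Risk", "Asset Manager A",
                               "Country where the issuer earns most of its revenue"))
registry.record(TermDefinition("Country of Risk", "Asset Manager B",
                               "Country where the issuer is  headquartered"))

partners_to_review = registry.differing_from("Country of Risk", "Insurer")
```

The point of the design is that no definition is marked as "right": the registry only surfaces differences that have to be understood before data can be integrated.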

Monday, November 28, 2011

Evolution of Definitions – The Problem of Pluto

In early 2006 I had the privilege of seeing NASA’s New Horizons mission blast off on its way to Pluto.  At that time, Pluto was a planet.  By August of the same year it was not.

On August 24, 2006, the International Astronomical Union (IAU) publicly defined a planet as "a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit."

This raises questions such as:
  • Can definitions change?  Pluto had been called a planet until the IAU changed the boundaries of the definition.
  • What authority has the IAU to define a planet?
  • I still think of Pluto as a planet – am I wrong to do so?
  • The IAU’s new definition seems a bit contrived.  Will it stand up?
  • What motivation did the IAU have to change the definition? 
Definitions can change, and should as we get to know reality better.  Yet the evolution of definitions seems to be little dealt with in the traditional literature on definitions.  It would be nice to have some rules about it, and some thought about how it should be done.

As to the IAU, it is quite free to come up with a definition for any term – just as Humpty Dumpty did in Through the Looking-Glass.  And I too am free to have my own definition of a planet.  Whether the IAU’s definition will stand up is a good question.  New research suggests that extra-solar planetary systems are very diverse.  The IAU’s definition may very well not stand up in the face of future discoveries – but surely such evolution is part of what science is.  As to motivation, that is a discussion for another post – but it often matters for all kinds of reasons.

Saturday, November 26, 2011

A Note on the Role of Precision in Definitions

The term "precision" seems to have changed its meaning over the centuries, which may cause confusion to anyone dealing with the literature of definitions.  It signifies more than one concept, which muddles things up.  Unfortunately, I may be adding to the muddle, as some of my points in this post are from memory, and I will have to rediscover the references for them.  However, I wanted to capture what I now have about precision. 

According to Peirce, "precision" derives from a root meaning "to cut off at the end" (from "Issues in Pragmatism", The Monist, Vol 15, Oct 1905 pp481-499).  Apparently, it is connected with "curt denials and refusals" - cutting someone off.  Oddly, this seems to have traditionally meant that the more cutting off you did, the greater precision you achieved.  As such, it runs counter to our idea of numerical precision, where the greater the number of decimal places, the greater the precision.  On the traditional view, fewer decimal places (the more chopping off we have done) would seem to mean greater precision (though I cannot find an example to confirm this numerical aspect of precision in traditional literature).

Peirce also had something to say about precision in definitions.  According to him, removal of superfluous words achieves greater precision in a definition (again from my memory, so needs to be checked).  I think that this rule applies more to summary than to definition, and Peirce may have been thinking about definition work he was doing for dictionaries, where printing costs are a factor.  Too much text may be repetitious or confusing.  However, repetition may be valuable to drive a point home.  Confusion is not always produced by additional text, but can be a danger.

Perhaps a more important point involves using the word "prescind" (the act of precision) which today seems to be replaced by the overloaded term "abstraction".  When we prescind we cut away from concrete instances.  E.g. the concept "animal" can apply to individual instances, but "animality" cannot (from Sullivan, An Introduction to Traditional Logic ISBN 1-4196-1671-4, pp 23-24).  The greater the precision of a concept, the greater its "abstraction".  Traditionally, "concrete concepts" are abstractions "without precision" (since they apply to individuals), while "abstract concepts" are concepts "with precision" since they cannot apply to individuals (Sullivan, footnote on p24).  So, perhaps counter-intuitively, the more abstract a concept is, the more precise it is.

Friday, November 25, 2011

Must a Definition Include Identification of Related Concepts?

The traditional answer to this question is "yes", because classic essential definitions follow the formula Definition = Genus + Specific Difference.

However, definitions of concepts in natural science tend to be more like descriptions than classical definitions.  This may be unavoidable, but there is always a danger in a descriptive definition of not mentioning any related concepts.  Such definitions may give the impression of a style in which definitions should be written, and this sometimes carries over into analyst work - so that some analysts tend to write descriptive definitions, even if essential ones could be provided.  And such definitions lack mention of related concepts.    

But should a definition always identify related concepts?  I think it should.  I think that practical usage of a definition requires an understanding of the Concept System in which the definition is located.  Without such an understanding, the user runs a risk of not being able to use the definition adequately.  I will have to return to this to offer a proof in the future. 

The Concept System itself is a concept.  Merely identifying the Concept System may be at too high a level of generalization - but it is better than nothing.  Better would be to find a proximate superordinate genus (an immediate parent supertype) within the Concept System, but this may sometimes be difficult.

A concept need not have just one relation (to the Concept System or a higher-level concept within it).  It may have other relationships.  Identifying such relationships in definitions will be helpful too.  This is a topic that leads to matching definitions to Conceptual Models, which we will have to return to. 

Therefore, I think that we have an additional quality assurance point for definitions: a definition must identify a superordinate genus within the Concept System in which it is located, or identify the Concept System itself.
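This quality assurance point lends itself to a rough automated check.  The sketch below, in Python, flags a definition that names neither the Concept System nor any other concept within it.  The function name, its parameters, and the sample Concept System are assumptions made for illustration, and simple word matching is only a heuristic: it cannot tell whether the concept it finds is really a superordinate genus.

```python
import re


def identifies_context(definition: str,
                       defined_term: str,
                       concept_system_name: str,
                       concepts_in_system: set[str]) -> bool:
    """Return True if the definition text mentions the Concept System
    or any concept in it other than the term being defined."""
    candidates = {concept_system_name.lower()}
    candidates |= {c.lower() for c in concepts_in_system}
    candidates.discard(defined_term.lower())

    text = definition.lower()
    return any(
        re.search(r"\b" + re.escape(candidate) + r"\b", text)
        for candidate in candidates
    )


# Hypothetical Concept System used only for this example.
system_name = "Timekeeping Instruments"
concepts = {"Timepiece", "Wristwatch", "Clock"}

essential = identifies_context(
    "A timepiece worn on the wrist", "Wristwatch", system_name, concepts)
descriptive = identifies_context(
    "A small device that shows the hour of the day", "Wristwatch", system_name, concepts)
```

Here `essential` is true because the definition names the genus "timepiece", while `descriptive` is false: the second definition describes the thing without locating it among related concepts.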

A further quality assurance point might be that a definition must identify all relations between the concept being defined and other relevant concepts - but this point needs to be followed up in a future post.

This leads to the consideration that a definition of a concept will change depending on the Concept System it is located in, and one concept can be located in more than one Concept System.  And that is yet another topic for a future post.

Thursday, November 24, 2011

The Fallacy of Language As The Mirror of Reality

Analysts can sometimes create huge problems if they try to produce single definitions for terms within a single semantic community and within a single  universe of discourse.  This may sound crazy, so let me start by giving an example.

Not long ago I was in a discussion concerning data quality.  The group leaders decided there was a need to discuss the so-called "dimensions of data quality", e.g. Accuracy, Consistency, Timeliness, and so on.  We started with Consistency.  Each individual in the group offered their view of what Data Consistency was.  Several different definitions were offered.  Eventually, the group took a vote and decided which definition of Data Consistency they preferred.  The alternative definitions were not discussed further, nor recorded.  The individuals who had proposed the unaccepted definitions felt slighted, perhaps even hurt.  And they had a right to - as far as I could tell, the alternative definitions represented valid concepts.

What a broken process!  Definitions of valid concepts were simply rejected, and lost.  Individuals were turned off from definitional work, maybe permanently. Why did it happen?  I think I can offer a hypothesis.

The first mistake is to believe that every known concept is represented by a term in language.  Unknown concepts will obviously not be so represented.  But what is "unknown" in a semantic community?  Is it any concept not known by everyone in the community?  What about a concept only understood by a minority in the community?

Secondly, in technical language, there seems to be more expected of technical terms than is warranted.  They sound  "scientific".  They sound as if we should expect them to convey something precise - a trick taken advantage of by thousands of misleading advertisements every day.  But there is no reason to expect a technical term to have an agreed definition by everyone in a semantic community.  There may be several valid concepts competing to be signified by the term. 

The expectation - in technical areas - that terms mirror known reality should not be relied on.  The phrase "language as a mirror of reality" is connected with Wittgenstein (see http://www.percepp.com/lacus.htm).  It should be granted that he may not have been talking about terms per se, and granted that probably few analysts are consciously influenced by Wittgenstein.  However, the presupposition seems to have got about somehow, and, anyway, academics show little interest in how analysts go about their daily work.

Language cannot be assumed to mirror reality in technical areas.  Analysts must create governance processes that guide their definitional work so they harvest all valid concepts, and encourage members of semantic communities to contribute.  Terms are starting points, not a final list of signs that denote all the individual concepts in a universe of discourse.   

Wednesday, November 23, 2011

A Brief Review of Nordterm 8 - A Guide to Terminology

In the sparse literature about definitions, most publications are from Academia.  However, Nordterm 8 Guide to Terminology by Heidi Suonuuti (ISBN 952-9794-14-2) is a very useful and practical booklet from a practitioner community.  Further details can be found at

http://www.nordterm.net/info/Publ/PNORDTERM8-en.html.

Nordterm describes itself as follows: "Nordterm is an association of organisations and societies in the Nordic countries which are engaged in terminology work, training and research."  Terminology work, of course, covers much more than definitions, but A Guide to Terminology does contain a fair amount on definitions, and has a very practical focus.

Nordterm 8 contains a very useful set of references, particularly about ISO Standards.  It introduces definitions in a section on Concept Analysis, that is also valuable as an overview of ontology - again from a practical perspective. A section dedicated to definitions follows, with systematization in terminology work as a special focus.  A valuable high level methodology for terminology work is presented next - again very practical.  Deficient definitions are examined next.  The following major section is on terms.  This, of course, is a specialist topic within terminology, but obviously closely allied to definition work.  Finally, there is a summary of the terms used in terminology work (alas without definitions) and their translations in Nordic languages.  The whole booklet is 42 pages.

A Guide to Terminology would be valuable just because there is so little literature, and Academia seems to have a mission to avoid any practical contributions.  However, the booklet also packs a lot into a small volume, is highly informative, and will be a help to any practitioner.

A First Note on Partial Definitions

I think that partial definitions exist, and there are practical reasons for being interested in them.  I cannot find any literature about them, and this post is my first attempt at dealing with partial definitions.

The only conceptualization of a partial definition that I have figured out in any detail can be summarized by the formula:

Partial Definition = Name of Concept System + Type of Relation in Concept System

E.g., for "Wristwatch"

Partial Definition of "Wristwatch" = "A type of timepiece"

Obviously, this parallels the Aristotelian formula of Definition = Genus + Specific Difference.  However, I think that Aristotle commits definitions to being only in a Concept System of generic relations (supertype-subtype to our data modeling friends).  Other types of Concept System exist, e.g. partitive (part-whole), and associative.

In a partial definition we provide information by locating the concept to be defined within a particular Concept System, giving context to the minds we are communicating with.  Of course, we must expect that these minds know about the Concept System we name in the partial definition.

The Concept System does not have to be a proximate Genus, as Aristotle would like.  It could be a much higher level generic concept, though this may broaden the context too much.  In the above example, locating "Wristwatch" in the concept system "Timepiece" provides more precise context than if I said "A type of instrument", "instrument" being a more generic concept containing "Timepiece".  Obviously, there is skill required to choose the level appropriate to the mind being communicated with.

There is also the choice of Concept System to locate the concept in.  For "Wristwatch" I could have alternative partial definitions such as "A fashion accessory", or "An item of jewelry".  These identify different Concept Systems within which I wish to locate "Wristwatch" for whatever my purposes may be.

The Type of Relation in the Concept System in my formula above (e.g. "type", "part", "item") is one level of abstraction up from a description of the Concept System itself.  I think it serves to reinforce the essence of the Concept System.
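The formula can be written down as a small data structure.  What follows is only a sketch in Python of one possible reading of it: the class name, its fields, and the rendering rule (article + relation + "of" + Concept System) are assumptions chosen to reproduce the "Wristwatch" examples, not a general grammar for partial definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialDefinition:
    concept: str         # the concept being located, e.g. "Wristwatch"
    concept_system: str  # the Concept System it is located in, e.g. "timepiece"
    relation: str        # type of relation in that system, e.g. "type", "part", "item"

    def render(self) -> str:
        """Render as '<A/An> <relation> of <concept system>'."""
        article = "An" if self.relation[:1].lower() in "aeiou" else "A"
        return f"{article} {self.relation} of {self.concept_system}"


# One concept located in two different Concept Systems.
as_timepiece = PartialDefinition("Wristwatch", "timepiece", "type")
as_jewelry = PartialDefinition("Wristwatch", "jewelry", "item")

rendered = [as_timepiece.render(), as_jewelry.render()]
```

Keeping the Concept System and the relation as separate fields makes the two choices discussed above explicit: which Concept System to use, and which kind of relation predominates in it.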

While full definitions are not something we work on every day, I think partial definitions are very common in everyday communication.

That's enough for now.  To summarize (and these points may need further proof): 
  • Partial Definitions exist
  • A common kind of Partial Definition is to locate a concept in a Concept System
  • The Partial Definition also describes the type of relation that predominates in the Concept System
  • There is skill in selecting the Concept System as one concept can belong to many Concept Systems
  • There is skill in selecting the level of generalization of the Concept System
  • Partial Definitions are very common in everyday language (does that mean that everyone is an ontologist?)

Sunday, November 20, 2011

How is a Definition Different from an Explanation? (Part 1)

This topic will require more than one post, because there is more than one definition of "Explanation".  That is, the term "Explanation" signifies more than one concept.

The first kind of explanation we will deal with can be defined as:

"Bringing a mind to an understanding of a topic".

This means that an explanation must be relative to the mind which is to be brought to the intended understanding.  For instance, the way we explain the solar system to an 8-year-old child will likely be different to the way we explain the solar system to a college undergraduate.  The explanation offered will depend on prior knowledge, experience, developed intelligence, and probably many other factors.  Differences between definition and explanation can be summarized as follows:
  • A definition is not intended to be proportional to any mind.  It has to be the best description of a concept that is available.  This is quite unlike explanation, where there will be many explanations proportionate to the types of minds which are to be provided with a given explanation.
  • There can be many explanations for a concept, but there should be only one definition.  
  • An explanation may exclude difficult points (say to convey an initial understanding), but a definition must be as complete as possible.  
  • An explanation may avoid technical language.  A definition should aim to exclude technical language, but it may sometimes need to include technical language.
There is always a danger that an analyst creating a definition will treat it as an explanation.  This will likely make the definition unusable as an authoritative reference.  The analyst will probably only be guessing at the type of mind any explanation is aimed at.  It is one reason careful governance of definitions is needed (so analysts do not go in the wrong direction).  

It can be argued that there is an environment in which the definition is to be used, and that the definition should be the best for that environment - but might vary across such environments.  For instance, the insurance concept of "Incurred But Not Reported" can be defined within the environment of Insurance Company X, without any intent that it be used in any other insurance company.

An issue here is that if a definition is the best for a particular environment, how can this satisfy the vision of the Semantic Web, which would seem to require common definitions of concepts across the entire web?  However, this is an issue beyond comparing definition to explanation.

What is an Identifying Characteristic?

A definition describes a concept.  A concept has many characteristics.  The process of abstraction occurs when we consider the characteristics of a concept individually (or, at least this is one way in which the term "abstraction" is used).  Very often, the characters considered the most when doing definitions are those which describe the essence of the concept.  Consider, for instance, the traditional Aristotelian formula for a definition:

Definition = Genus + Specific Difference

In the historical literature about definitions, emphasis was put on the specific difference as being the essential characters of a concept.  This may serve the philosophers' purposes, but it ignores the problem of identification, and the need to consider identifying characteristics in a definition.

Suppose I want to define the concept of Exit Row in a plane.  Let's try to do so using Aristotle's method.

"A row of seats that provides emergency access to the outside of a plane"

A bit provisional perhaps, but it does begin to convey the essence of an Exit Row.  But how do flight attendants identify an Exit Row to passengers when they give the safety briefing?  It goes something like this:

"For those of you  seated in a row marked by the sign 'No Children In This Row'..."

The essence of an Exit Row is not that it is an area of a plane that is kept free of children.  The essence is that it is a point of emergency egress from a plane.  Only competent adults are thought to be capable of opening an emergency exit, so only they are permitted to sit in an Exit Row.

However, it is fairly easy for a passenger to identify an Exit Row based on finding a sign saying 'No Children In This Row'.  The sign functions as an identifying characteristic.

An identifying characteristic is a characteristic that easily permits the identification of an instance of a concept with the correct concept.

Identification is  a distinct use case (possibly a set of distinct use cases) that definitional work must support.  It is not automatically supported by finding essential characteristics, as our example of Exit Row shows.  We shall return to the topic of identification in the future.  Frankly, identification seems to be poorly supported in the literature of definitions.  Yet it is something we are challenged with every day. 

If practical work with a concept involves identification of instances, the definition of the concept must support this work by clearly listing identifying characteristics and saying how they should be used in identification.
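One way to picture this requirement is a definition record that keeps identifying characteristics separate from the essential definition.  The Python sketch below is an assumption about how such a record might look: the class name, the field names, and the matching rule (an instance is identified when any one identifying characteristic is observed) are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptDefinition:
    term: str
    essential_definition: str  # what the concept is
    identifying_characteristics: list[str] = field(default_factory=list)  # how to spot an instance


def identify(observed: list[str], definitions: list[ConceptDefinition]) -> list[str]:
    """Return the terms whose identifying characteristics include
    at least one of the observed features (case-insensitive)."""
    seen = {feature.lower() for feature in observed}
    return [
        d.term
        for d in definitions
        if any(c.lower() in seen for c in d.identifying_characteristics)
    ]


exit_row = ConceptDefinition(
    term="Exit Row",
    essential_definition="A row of seats that provides emergency access to the outside of a plane",
    identifying_characteristics=["Sign: No Children In This Row"],
)

matches = identify(["sign: no children in this row", "window seat"], [exit_row])
```

Note that the essential definition plays no part in `identify`: the passenger finds the Exit Row from the sign, not from its role in emergency egress.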

Thursday, November 17, 2011

How Should Defined Terms Be Formatted in Definitions?

An enterprise may go to a lot of trouble to develop definitions.  Some degree of standardization across these definitions is helpful.  One aspect of standardization is dealing with terms that appear in definitions.  Terms can be divided into two classes: Common Terms, which are widely understood and require no definition; and Defined Terms, which signify the concepts that the enterprise is defining.

A major rule in definitions is that the term signifying the concept should not appear in the definition.  This is a good idea for summary definitions, but it is hard to justify in full definitions.  Thus, if one component of a definition is a summary, it is best to keep the term being defined out of the summary.  However, the term should be allowed elsewhere in the definition.

Common terms should be allowed in a definition with no special formatting.  There is no need to distinguish them as something special.  Indeed, this would be misleading.  

Defined Terms (either the term being defined, or terms defined elsewhere by the enterprise) should be distinguished.  One reason is that terms composed of many words may not easily be understood to form a single term.  E.g. "The project code management team should be contacted".  Perhaps this refers to a team that manages project codes ("Project Code management team") or perhaps it is a team that manages code for a project ("project Code Management team").  A common way to distinguish Defined Terms is to capitalize them.  This makes it easier for the reader to understand they are a defined term (versus common term), and also to understand that a series of words forms one term.  Another approach is to make defined terms bold.  However, this tends to indicate emphasis rather than the presence of a defined term, and may be disagreeable to readers.  Underlining and italics are also used to indicate emphasis, but may sometimes be a convention for defined terms, as in biological species names, e.g. Panthera leo.

Defined Terms can also be turned into hyperlinks in definition repositories.  This clearly distinguishes them.  Usually this is an automated feature, so Common Terms can get hyperlinked too if they are included in the repository.  Therefore governance is required to keep Common Terms out of the repository.
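The automated hyperlinking described above might be sketched as follows in Python.  The function name, the repository layout (a mapping from Defined Term to URL), and the URLs are hypothetical.  The sketch assumes Defined Terms are capitalized and matches them case-sensitively, longest term first, so that a multi-word term is linked as a single term rather than as its parts.  It also assumes the input text is plain text with no HTML special characters.

```python
import re
from html import escape


def link_defined_terms(text: str, repository: dict[str, str]) -> str:
    """Wrap each Defined Term found in `text` in a hyperlink.

    `repository` maps a Defined Term to the URL of its definition.
    Matching is case-sensitive and whole-word only.
    """
    if not repository:
        return text

    # Longest first, so "Project Code Management Team" wins over "Project Code".
    terms = sorted(repository, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

    def to_link(match: re.Match) -> str:
        term = match.group(0)
        url = escape(repository[term], quote=True)
        return f'<a href="{url}">{escape(term)}</a>'

    return pattern.sub(to_link, text)


# Hypothetical repository content and URLs.
repository = {
    "Project Code": "/terms/project-code",
    "Project Code Management Team": "/terms/project-code-management-team",
}

linked = link_defined_terms(
    "The Project Code Management Team assigns each Project Code; "
    "a project code written in lower case is not linked.",
    repository,
)
```

Because only terms present in `repository` are linked, the sketch also shows why governance matters: any Common Term that finds its way into the repository would be hyperlinked everywhere it appears.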

Wednesday, November 16, 2011

Should A Term Be Analyzed to Determine if It is an Analogy Before Attempting to Define It?

Much definitional work begins with terms.  However, it is surprising that analysts very often take a term at face value and do not try to analyze it further.  In particular, they do not try to determine if the term is an analogy.  There is value in recognizing if a term is an analogy because the analyst can be put on guard against making errors in the definition.

An analogy seeks to explain a less familiar concept by some degree of resemblance to a more familiar concept.  By "explain" it is meant that there is an attempt to bring a mind to a better understanding of the less familiar concept (explanation being different to definition). 

There can be problems with analogies, e.g. a business concept explained by football analogies will be confusing to an individual with no interest in football.  Because of problems inherent in the use of analogies, the analyst must find out if the term requiring definition is an analogy.

It may be easy to detect an analogy in a term.  E.g. when it is said the economy will have a "soft landing", it is relatively easy to recognize "soft landing" as an analogy.  However, when we speak of "data ownership" (meaning responsibility for data management tasks), it is not so easy to recognize that "ownership" is an analogy (meaning possessing legal title to property).

The analyst must look at the term to be defined to see if it is entirely or partly composed of terms that signify concepts that are different to the candidate concept that the term to be defined is supposed to signify.  These analogous concepts will likely be more familiar.

A term that is an analogy may be trying to signify an empty concept.  What did Ben Bernanke mean by "green shoots"?  If it was "economic improvement", why did he not use that term (which is still vague - increased productivity at the cost of job loss might not be "improvement")?  "Green shoots" is probably just a term intended to provoke an emotional feeling of comfort rather than to signify an intelligible state of the economy.

Besides positing a null concept, an analogy may falsely suggest that an attribute in the more familiar concept must exist in the concept being defined.  This is not necessarily so (the fallacy of false analogy).

Because of these dangers, the analyst must detect if a term to be defined is an analogy.

Welcome to Definitions in Semantics

The purpose of this blog is to contribute to the overall field of semantics by advancing the knowledge and promoting the understanding of definitions.  The orientation of the blog is ultimately to practical advances in data and information management, but guided by the realization that such advances must be based on sound theory.

Last year I published the book Definitions In Information Management.  Since that time I have gathered additional facts and thoughts about definitions.  However, I have not set down these facts and thoughts, and risk losing them.  This blog will provide the means for recording these details - hopefully to be retrieved and structured more adequately at a future date.