Definitions In Semantics: 2012

Friday, August 3, 2012

Glossary versus Vocabulary versus Concept System

Just a brief note on something that has been bothering me for a while.

In reading Prof. Campbell Harvey's Hypertextual Finance Glossary (http://www.duke.edu/~charvey/Classes/wpg/glossary.htm) I noticed that it contains the ISO 4217 Currency Codes, e.g. there is an entry for "USD" with the definition "The ISO 4217 currency code for the USA Dollar". However, the ISO 4217 Currency Codes are sprinkled through the glossary, because all terms are arranged in alphabetical order (actually, lexicographical order, since numbers and symbols have to be taken into account).

This means that there is no single list of ISO 4217 Currency Codes in the Glossary. To extract them would require searching on a piece of text, such as "4217", which is hopefully in the definition of every currency code.
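The search described here can be sketched in a few lines of Python. This is only an illustration: apart from the "USD" wording, the glossary entries below are invented stand-ins, and the function name find_by_text is hypothetical. The point is that the extraction works only if every relevant definition happens to contain the search text.

```python
# Hypothetical miniature glossary. Only the "USD" wording comes from the
# glossary discussed in the post; the other entries are invented.
glossary = {
    "Basis point": "One hundredth of one percent.",       # invented entry
    "EUR": "The currency code for the Euro.",             # invented; omits "4217"
    "USD": "The ISO 4217 currency code for the USA Dollar",
    "Yield": "The return on an investment.",              # invented entry
}


def find_by_text(entries, needle):
    """Return the terms whose definition contains the search text."""
    return sorted(term for term, text in entries.items() if needle in text)


codes = find_by_text(glossary, "4217")
print(codes)  # "EUR" is missed because its definition lacks the search text
```

The invented "EUR" entry shows the weakness: a currency code whose definition does not mention "4217" silently drops out of the result.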

Figure 1: Concept System of Customer Type

It also requires the searcher to know in advance that there is such a concept system as Currency Code. But how is this always possible? Consider what is shown in Figure 1. Here there are three concepts: Customer Type, Individual Customer, and Corporate Customer. Suppose these are put into a business glossary containing a few hundred entries, and that a reader comes across "Customer Type". How will the reader be aware that there are two other very closely related concepts - Individual Customer and Corporate Customer? Or if someone finds "Corporate Customer", how will they be aware of Customer Type and Individual Customer?

This shows that a large business glossary merges all concept systems and loses relationships between concepts. It might be argued that I can put information into definitions to mitigate this. For instance, I could define Individual Customer as "a Customer Type where the customer is an individual person". However, it would be very unwise to list out all the Customer Types within every entry of the three types. This is not defining a concept, but repeatedly describing the concept system.
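To make the loss concrete, here is a minimal sketch, assuming the concept system of Figure 1 is recorded as a simple mapping from each concept to its genus. The names genus_of and related are my own for illustration. A flat glossary keeps only term and definition, whereas this structure can answer "what else is closely related?" directly.

```python
# Each concept maps to its genus (None for the top of the concept system).
# The three concepts are those of Figure 1; the structure is an assumption.
genus_of = {
    "Customer Type": None,
    "Individual Customer": "Customer Type",
    "Corporate Customer": "Customer Type",
}


def related(concept):
    """Return the genus, sibling species, and direct species of a concept."""
    parent = genus_of[concept]
    siblings = []
    if parent is not None:
        siblings = sorted(
            c for c, g in genus_of.items() if g == parent and c != concept
        )
    species = sorted(c for c, g in genus_of.items() if g == concept)
    return {"genus": parent, "siblings": siblings, "species": species}


print(related("Corporate Customer"))
print(related("Customer Type"))
```

A reader who lands on "Corporate Customer" is led to both Customer Type and Individual Customer without any of this being repeated inside the definitions.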

However, I do wonder if the understanding that definitions are to be included in a glossary affects the way they are written, so that there are attempts to document the concept system and the relations within it. This risks creating confusion.

A better attempt might be to have self-contained vocabularies. So we could form a mini-vocabulary for the three concepts shown in Figure 1. At least they will all be together. But it would seem logical to start off with the concept system itself, or at least the highest genus - Customer Type. However, "Corporate Customer" will sort lexicographically ahead of that. So this is another problem.
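The ordering problem is easy to reproduce. The sketch below sorts the three terms with Python's default string ordering, and then shows one possible workaround (my assumption, not an established convention): list the highest genus first and sort only the species.

```python
terms = ["Customer Type", "Individual Customer", "Corporate Customer"]

# Default lexicographical order: "Co..." sorts ahead of "Cu...".
alphabetical = sorted(terms)

# One possible workaround: put the highest genus first, then sort the rest.
genus = "Customer Type"
genus_first = [genus] + sorted(t for t in terms if t != genus)

print(alphabetical)
print(genus_first)
```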

Producing a diagram and explanation of the concept system in which all the concepts are defined, and their relationships explained (is a relationship a concept, I wonder?), is probably the best way. So this leads us to conceptual models.

In reality, a glossary, vocabulary, and concept system are three views of the same semantic space and we probably need all three. However, we have to recognize their limitations and advantages.

Sunday, July 29, 2012

Definition Reference

In the previous posts I had the idea of a person or group that creates a definition. This was a rough draft of a part of another concept system that I am calling Definition Reference. Figure 1 shows this concept system. The unquoted terms are the official terms in this concept system. The quoted terms are shortened versions of the official terms, which are univocal (have only one meaning) within the concept system, but are generally equivocal (have more than one meaning) if you increase scope to beyond the system shown here, and must be used with care.

Figure 1: Definition Reference Concept System

The Concepts in The System

The concepts in this system are as follows:

Definition Reference: a source of a definition

Informal Definition Source: a Definition Reference that cannot be relied upon. There is no guarantee that it is correct.

Definition Authority: a Definition Reference that can be relied on. There is some kind of guarantee that it is correct.

Definition Authoritative Reference: a Definition Authority that has recorded a definition, but did not create it.

Definition Creator: a Definition Authority that created a definition

Definition Analyst: a Definition Creator who creates a definition in the absence of one, but does not claim to have originally created the definition.

Definition Stipulator: a Definition Creator who claims to have originally created a definition

Legislative Definition Authority: a Definition Stipulator who has legitimacy sufficient to make a definition they create binding upon one or more Legislative Definition Users.

Recognized Expert: a Definition Stipulator whose prestige or reputation is sufficient to make a definition they create acceptable to one or more people.

Informal Stipulator: a Definition Stipulator who has no basis for obtaining acceptance of a definition they create.
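Because every definition above names its genus, the whole concept system can be written down as a single mapping. The following is a minimal sketch, assuming the genus of each concept is exactly the one named at the start of its definition; the names GENUS and lineage are hypothetical.

```python
# Concept -> genus, read directly off the definitions above.
GENUS = {
    "Definition Reference": None,
    "Informal Definition Source": "Definition Reference",
    "Definition Authority": "Definition Reference",
    "Definition Authoritative Reference": "Definition Authority",
    "Definition Creator": "Definition Authority",
    "Definition Analyst": "Definition Creator",
    "Definition Stipulator": "Definition Creator",
    "Legislative Definition Authority": "Definition Stipulator",
    "Recognized Expert": "Definition Stipulator",
    "Informal Stipulator": "Definition Stipulator",
}


def lineage(concept):
    """Walk from a concept up to the highest genus."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = GENUS[concept]
    return path


print(" -> ".join(lineage("Recognized Expert")))
```

Walking the lineage of Recognized Expert passes through Definition Stipulator, Definition Creator, and Definition Authority before reaching Definition Reference.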

Thoughts

I am happy with this so far. Here are some random thoughts:

(1) The concept system is a purely generic one (like a Tree of Porphyry). Everything in it is a genus or species of something else (or supertype and subtype if you prefer). This makes it easy to deal with - every relationship is of the same type ("is genus of").

(2) It is easy to see how terms will be shortened and how confusion can occur even within this concept system. For instance, if someone uses the term "Authority" they might mean Definition Authority, Definition Authoritative Reference, or Legislative Definition Authority.

(3) The species (subtypes) of Definition Stipulator are worrying. They only exist in reference to how people accept the definitions. A law with a particular definition that is passed in Canada will not affect me in the USA. I might not recognize an individual as an expert because I am unfamiliar with their work. This area needs further investigation, and it is missing relationships to users of definitions (see Legislative Definition User in prior posts). Also, this is where we depart from the Tree of Porphyry structure.

(4) I think the concept system provides good input to a governance framework for definition management.

(5) I need to expand the definitions provided above - they are preliminary and abbreviated.

(6) I think I can make some of the differentia contradictory. For instance Definition Authority descends into two contradictory species, depending on whether the definition was created or not. This means that I will not have missed any other class at this level. However, I do not think I can reliably do this everywhere.

(7) By creating a visual concept system, it becomes much easier to formulate definitions. I know this is not the point of this post, but it struck me how easy it was to write the definitions with the diagram in front of me. If I was doing a glossary, all the terms would be distributed throughout it and would need much more robust definitions. For example in a prior post I had Legislative Definition Authority (then simply termed "Authority") defined as "an individual person or organization who has legitimacy sufficient to make any Legislative Definition they create binding on one or more Legislative Definition Users".

Thursday, July 26, 2012

On to Stipulative and Legislative Definitions - Visually

Having taken care - for the moment - of the core conceptual model for concepts, terms, and definitions, I returned to where I began, which was to try to show how Stipulative Definitions and Legislative Definitions differ, and to do so visually as shown in Figure 1.

Figure 1: Conceptual Model of Stipulative and Legislative Definitions

Basic Definitions

The concepts shown in Figure 1 are defined as follows:

Stipulative Definition: a definition that a Definition Creator creates to describe a concept and which the Definition Creator assigns to a term. In performing the latter, the Definition Creator acts as a Terminologist.

Legislative Definition: a Stipulative Definition whose Definition Creator is an Authority, and whose acceptance is obligatory for Legislative Definition Users.

Authority: an individual person or organization who has legitimacy sufficient to make any Legislative Definition they create binding upon one or more Legislative Definition Users

Definition User: an individual who can potentially use a definition

Legislative Definition User: an individual person or organization that is obliged to accept the Legislative Definition assigned to a given Term by the Authority within a given context. The context may be a contract, regulation, agreement, etc.

Be warned that these definitions are preliminary, and I think I see quite a bit of circularity in them. However, they will have to do for now.

Preliminary Thoughts

What is shown in Figure 1 triggers the following immediate rough thoughts:

1. Legislative Definition is a species (a subtype) of Stipulative Definition

2. A Stipulative Definition is always created by an identified individual or organization (the Definition Creator). Here we have an issue as every Definition must have had a Definition Creator at the outset. The connection with the Definition Creator may be lost over time, at which point the Definition ceases to be a Stipulative Definition. This requires much further exploration.

3. A Definition is not a Stipulative Definition unless the Definition Creator is also known by users of the Definition. It is not a Stipulative Definition merely by having been created by a Definition User. This relationship is essential for a Definition to be a Stipulative Definition.

4. The essential difference between a Stipulative Definition and a Legislative Definition is that a Legislative Definition is created by an individual or organization that is an Authority.

5. A Definition User can freely accept or reject a Stipulative Definition.

6. The Authority must be legitimate with respect to the Legislative Definition User. The Legislative Definition User must be aware that they have an obligation that they have directly or indirectly taken upon themselves to use the Legislative Definition. If the Authority is not legitimate with respect to the Legislative Definition User, then we are dealing with a case of unlawful power being used to enforce the acceptance of a Definition on a person or organization. Such a case is not a Legislative Definition.

7. A Definition Creator has obligations to a Definition User. This might include making a Stipulative Definition intelligible. If the Definition Creator fails to meet these obligations, then the Definition User has one or more reasons to reject the Stipulative Definition. This would seem to be a warning for those individuals and institutions who decide to become Definition Creators.

8. The burden of obligation of an Authority to a Legislative Definition User for a Legislative Definition is higher than that of a Definition Creator to a Definition User for a Stipulative Definition.

Further Thoughts

There is a lot in this. A big part of how we operate in our culture and society includes what concepts we accept or reject. Furthermore, any Authority who creates Legislative Definitions had better do a good job or they might cause problems. I am thinking of the unintelligible definition of "Swap" in the Dodd-Frank Act here.

On the notation front, I do not like the fact that general relationships such as Definition User rejects Stipulative Definition cannot be related to more specific relationships that override them, such as Legislative Definition User is obliged to accept Legislative Definition. Yet something more to be tackled.
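For comparison, subtype overriding in an object-oriented language is one notation in which a specific relationship can override a general one. The sketch below is only an illustration and not a proposal for the diagram notation; the class and attribute names, and the authority names "Regulator A" and "Regulator B", are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    term: str
    authority: Optional[str] = None  # None for a plain Stipulative Definition


class DefinitionUser:
    """General relationship: a Definition User may freely reject a definition."""

    def may_reject(self, definition):
        return True


class LegislativeDefinitionUser(DefinitionUser):
    """Specific relationship: obliged to accept definitions from an Authority
    that is legitimate with respect to this user."""

    def __init__(self, binding_authorities):
        self.binding_authorities = set(binding_authorities)

    def may_reject(self, definition):
        if definition.authority in self.binding_authorities:
            return False  # overrides the general relationship
        return super().may_reject(definition)  # otherwise the general rule holds


user = LegislativeDefinitionUser({"Regulator A"})
print(user.may_reject(Definition("Swap", authority="Regulator A")))
print(user.may_reject(Definition("Swap", authority="Regulator B")))
```

Note that the override applies only when the Authority is legitimate with respect to the particular user, which matches thought 6 above.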

Saturday, July 21, 2012

Well, I had another day or so of thinking about the conceptual model I was developing - originally for stipulative and legislative definitions. Actually, a few minutes rather than a whole day was what I had, but such is life when you have a job. But even in that limited time I realized that I had not got the idea of communities in the model.

So I went back to the conceptual model and put in communities as shown in Figure 1.

Figure 1: Relations of Concept and Term

Speech Community: a group of individuals who share a vocabulary that describes a concept system. [See Wikipedia for more - http://en.wikipedia.org/wiki/Speech_community - although it seems you have to belong to the Speech Community Speech Community to understand this entry.]

Semantic Community: a group of individuals who share a common understanding of a set of concepts and relationships (a concept system), irrespective of the terms used to describe them.

I understand that both of these definitions are preliminary and need more work.

The Interpretant must belong to a Speech Community in order to use the term in communication. If the Interpretant is a member of such a community, they will be able (or will have a way to be able) to recognize the sign as signifying a concept. Recognition is the important element here. It is, of course, possible that the Interpretant may not recognize a particular term that is used within the Speech Community (e.g. if the Interpretant is new to the community and still learning its vocabulary). However, the Interpretant is in communication with at least some other members of the Speech Community, and has the opportunity to find out from them what the term means to them - how they recognize it.

Recognition seems to be very little discussed in the semantics literature (which I am not well versed in, so I can easily be wrong). However, my background is biology - and recognition is perhaps the most fundamental concept in biology. The standard analogy here is that of a lock and a key. One biologically active molecule, e.g. an antigen, is recognized by another biologically active molecule, e.g. an antibody. The antibody meets the antigen and neutralizes it. This is recapitulated in various ways across the entire science.

Back to our diagram. The key is the term and the lock is the recognition process (part of signification) that identifies the concept to the Interpretant. What is important is that recognition can be learned among the members of a Speech Community.

New Types of Existing Term

I have also done some more work on Existing Term. An Existing Term can be differentiated into the following subordinate genera:

Known Term: a term that is in use in a particular Speech Community

Unknown Term: a term that is not in use in a particular Speech Community

What is interesting here is that a New Term is not in use in any Speech Community. But how would you know if a given term, T, is not in use in any Speech Community? You could not unless you knew about all speech communities. Obviously, this is impossible, or at least unimaginably difficult. It would seem therefore that a given interpretant could not distinguish a New Term from an Unknown Term (an Unknown Term relative to the Speech Communities that the interpretant belongs to). This has important practical implications.
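The limitation can be sketched as follows, assuming an interpretant can only consult the vocabularies of the Speech Communities they belong to. The community names and vocabularies are invented, and classify is a hypothetical helper.

```python
# Invented vocabularies of the Speech Communities one interpretant belongs to.
my_communities = {
    "asset securitization": {"tranche", "servicer", "MBS"},
    "data management": {"glossary", "reference data"},
}


def classify(term, communities):
    """Classify a term relative to the communities the interpretant knows."""
    known_in = sorted(name for name, vocab in communities.items() if term in vocab)
    if known_in:
        return "Known Term", known_in
    # Without knowledge of every Speech Community, these cannot be told apart.
    return "Unknown Term or New Term (indistinguishable)", []


print(classify("MBS", my_communities))
print(classify("quasar", my_communities))
```

The second result is the practical point: from inside these two communities, "quasar" looks exactly like a term nobody has ever used.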

New Types of Previously Existing Concept

I have also differentiated Previously Existing Concept into Known Concept and Unknown Concept.

Known Concept: a Previously Existing Concept that is known to a Semantic Community

Unknown Concept: a Previously Existing Concept that is unknown to a Semantic Community

An example here might be "quasar" (derived from "quasi-stellar object"). Quasars existed long before astronomers identified them. So at one time they were an Unknown Concept, that then became a Known Concept within the Semantic Community of astronomers.

Contrast this with Mortgage-Backed Security (MBS). No MBS existed at all prior to the 1960's [I think this date is accurate, but I am not a historian of financial theory, so maybe the 1930's is a safer decade to use]. Neither instances of MBS, nor the theoretical (uninstantiated) concept, existed prior to this time. Then the concept came into being, and shortly thereafter came instances of the concept. Prior to this MBS was a Not Previously Existing Concept.

So New Concept, Known Concept, and Not Previously Existing Concept are all relative to (involve a relation with) at least one Semantic Community.

Practical Implications

The above still needs a good deal of work, but it is possible to see the outlines of practical implications. I am very interested in the work of a Business Analyst (BA) and a Data Analyst (DA - which I will subsume within BA for the moment). Here are some brief thoughts.

On a global project, say a Customer Reference Data project, a BA needs to know if the Speech Communities he is dealing with are actually one Semantic Community. The presumption is likely to be that Customer Reference Data is covered by one Semantic Community, but this is not necessarily the case. There may be differences in concepts and relations associated with some of the Speech Communities. I think the best way to figure this out is to construct a conceptual model for each Speech Community using their terms for concepts, definitions, and relations - and then to compare the conceptual models.
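A comparison of this kind might look like the sketch below. Everything in it is an assumption for illustration: the two models, their terms, and the concept identifiers C1 to C4 are invented, and the mapping from each community's terms to shared concepts is itself a judgment the BA would have to make and verify.

```python
# Invented conceptual models: each Speech Community uses its own terms.
model_a = {
    ("Client", "is genus of", "Retail Client"),
    ("Client", "is genus of", "Corporate Client"),
}
model_b = {
    ("Customer", "is genus of", "Individual Customer"),
    ("Customer", "is genus of", "Corporate Customer"),
    ("Customer", "is genus of", "Government Customer"),
}

# The analyst's mapping from each community's terms to neutral concept ids.
concepts_a = {"Client": "C1", "Retail Client": "C2", "Corporate Client": "C3"}
concepts_b = {
    "Customer": "C1",
    "Individual Customer": "C2",
    "Corporate Customer": "C3",
    "Government Customer": "C4",
}


def normalize(model, concepts):
    """Rewrite term-level relations as concept-level relations."""
    return {(concepts[s], rel, concepts[o]) for s, rel, o in model}


def compare(model_x, concepts_x, model_y, concepts_y):
    """Report which concept-level relations are shared and which are not."""
    x = normalize(model_x, concepts_x)
    y = normalize(model_y, concepts_y)
    return {"shared": x & y, "only_first": x - y, "only_second": y - x}


diff = compare(model_a, concepts_a, model_b, concepts_b)
print(diff["only_second"])
```

Any relation left in only_first or only_second is a sign that the two Speech Communities are not one Semantic Community, at least for that part of the concept system.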

A BA would be unwise to think they are a member of a Speech Community in the area they are analyzing, if they are new to the area. Even if the BA has dealt with the same people before, or the BA thinks they are a member of the related Semantic Community, there is no guarantee they are a member of the Speech Community. Each concept system has its own vocabulary for the Speech Community, and strict terminological analysis is required. I think too many BAs make presuppositions about this kind of thing and get into trouble.

I think there are more practical implications, but blogging demands brevity, and this post is already too long.

Thursday, July 19, 2012

Thinking About Concepts and Terms

I was noodling around with stipulative and legislative definitions, and started to diagram out what I was finding. It occurred to me that I have not really had a rethink about diagramming the relationships between the concepts involved in definition work for a while. Pretty soon I found that I lacked some of the fundamentals, and had to get them sorted out before I could deal with stipulative and legislative definitions.

The result of that effort is the cartoon shown in Figure 1. I am calling it a "cartoon" because I have not had time yet to work it up in some formal notation, such as a conceptual graph. I also realize it is incomplete. For instance, I have not had time to figure out where to put Nominal Definition.

Figure 1: Relations of Concept and Term

In Figure 1 all supertype-subtype relations are indicated by solid lines, with the label "is genus of", indicating how the superordinate genus is related to the subordinate genus. This is to distinguish them from other kinds of relationships.

So let us look at what we have (in no particular order):

Term: a linguistic symbol that signifies a concept.

Common Term: a term that signifies a concept that is understood by the general population.

Technical Term: a term that signifies a concept that is understood by a restricted community, and exists within a specialized context.

Existing Term: a term that has been in use prior to a specific point in time.

New Term: a term that is created at a particular point in time, and did not exist prior to this point in time.

Interpretant: a mind or machine that understands a particular term to signify a particular concept.

Terminologist: a person or group of people that assigns a term to a concept. The term may be an Existing Term or New Term.

Concept: cognition of a universal as distinguished from the particulars which it unifies [from Baldwin's Dictionary of Philosophy, under "Conception"].

Previously Existing Concept: a concept that exists prior to a particular time, irrespective of whether it is known to exist or not.

Not Previously Existing Concept: a concept that does not exist prior to a particular time. [It was this that I was interested in with respect to stipulative and legislative definitions.]

Previously Known Concept: a Previously Existing Concept that is known to exist. It is known by at least someone. However, there is no guarantee it is known by everyone.

Not Previously Known Concept: a Previously Existing Concept that is not known to exist. It might be unknown to everyone, or to a specific community.

Real Definition: an explanation of a concept.

Critique

Well, this is interesting, but within my preliminary definitions some terms remain undefined.

They are: linguistic symbol; general population; community; context [ugh! I hate that one]; time [more difficult than "context"]; mind; machine; signify; person; group of people; universal; particular; exist.

I think I am also missing a taxonomy (or other concept set) under Previously Known Concept, Not Previously Known Concept to indicate "known / not known by whom".

I will have to figure out all of this in the future - too late today.

Other Notes

· It is interesting to see how Term has two taxonomies: Common Term, Technical Term; and Existing Term, New Term. These seem to be fully external to each other ("orthogonal"), which introduces complexity.

· I wanted to show that the question "what does T mean?" where "T" is some term is a very suspect question. I do not think the diagram shows it. I think this is a false assumption of univocity, which is something else. Oh well.
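The complexity introduced by two orthogonal taxonomies can be seen by enumerating the combinations. This is a minimal sketch; the enum names Audience and Age are my own labels for the two taxonomies and do not appear in Figure 1.

```python
from enum import Enum
from itertools import product


class Audience(Enum):  # hypothetical name for the first taxonomy
    COMMON = "Common Term"
    TECHNICAL = "Technical Term"


class Age(Enum):  # hypothetical name for the second taxonomy
    EXISTING = "Existing Term"
    NEW = "New Term"


# Every term falls under one species from each taxonomy.
combinations = [(a.value, b.value) for a, b in product(Audience, Age)]
for pair in combinations:
    print(pair)
```

Two taxonomies of two species each already give four kinds of term, and each further orthogonal taxonomy multiplies the count again.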

This is very much a work in progress. I will push on from here to stipulative and legislative definitions.

Monday, July 9, 2012

"How Do You Define Yourself?" - Answer: You Can't

A few days ago my son was getting same-day surgery and I was forced to sit in a waiting room for longer than I wanted. The usual trash TV was being shown to keep the nervous relatives quiet, and there was a typical "self-help" show on which featured some bizarre members of the general public, each of whom had issues that apparently had some entertainment value.

A phrase that kept coming up, in several variations, was "define yourself", as in "How do you define yourself?", or "Don't let your [whatever problem the person had] define you". It occurred to me that the purveyors of these phrases must have had only a fuzzy idea of what they were saying. They were probably repeating a cliché they had heard before, to save themselves the difficulty of thinking and finding a way to express their thoughts.

Definitions for Concepts, Identities for Individuals

So what was wrong with asking "How Do You Define Yourself?"

The answer is that definitions apply to concepts, not to individuals. As it says in the entry for "Definition" in Baldwin's Dictionary of Philosophy:

"Individual objects and summa genera are logically indefinable."

So you cannot define yourself, or any individual person, or any individual object. Individual objects have identities, and descriptions, but not definitions.

If the people on the trash TV show were using "definition" in an attempt to express a thought they had, what could that thought have been?

Well, concepts can be implemented as individuals. These individuals possess the attributes of the concept. More properly, the attributes of the concept are expressed in the individual. That is the terminology I first learned in genetics, and I think it is appropriate (although geneticists talk of characteristics, not attributes). So an individual has a character, which is the sum of how all the attributes they possess are expressed.

I would therefore suggest that the subjects of the trash TV show did not want their characters to be judged solely on the basis of the appearance of whatever problem they had.

Moral character, which is what we are probably talking about here, is not the same as definition. You cannot define a person, but you can describe their moral character.

Thursday, July 5, 2012

Misuse of the Scientistic Analogy "Negative Feedback Loop" in Financial Services

It is a well-known fact that the financial services industry - and their regulators - often use bizarre language. For instance, Alan Greenspan, during his tenure as Chairman of the Federal Reserve System, was famous for achieving a level of incomprehensibility that spawned a generation of "Fed Watchers" needed to interpret him. However, the problem goes far beyond Mr. Greenspan, and one of the worst offenses committed by financial services is scientism.

Scientism is the application of the language and methods of the natural sciences (physics, chemistry, biology, etc.) to problem domains that are completely outside of the realm of natural science. For instance, the term "financial engineering" has been used to describe a large set of practices which contributed significantly to the great financial crisis we are still living in. Having worked in asset securitization myself, I can assure anyone that whatever these practices were, they were certainly no form of "engineering".

Which brings us to the term "Negative Feedback Loop". This is a term that originated with real engineers, but which has since been appropriated by the financial regulators, and has then trickled down to the rest of the industry. Here is an example of its use (admittedly in a quote) by the otherwise erudite Ambrose Evans-Pritchard of the Daily Telegraph.

"Mr Roberts said the collapse in Spanish tax revenues is replicating the pattern in Greece. Fiscal revenues have fallen 4.8pc over the last year, and VAT returns have slumped 14.6pc. Debt service costs have risen by 18pc.

The country is caught in a classic deflationary vice: a rising debt burden on a shrinking economic base. “Once you get into such a negative feedback loop, you can move beyond the point of no return quickly,” he said."

[ http://www.telegraph.co.uk/finance/financialcrisis/9301270/Spain-faces-total-emergency-as-fear-grips-markets.html - by Ambrose Evans-Pritchard, International business editor, 7:39PM BST 30 May 2012]

The term "Negative Feedback Loop" in its engineering sense means that the output of a process becomes an input to that process, and acts to stabilize the activity of the process. For instance, the governor of a steam engine is a device at the top of a pipe that is rotated by the engine. The governor includes two or more weighty metal balls hinged to close a hole through which steam would otherwise escape. As the pipe rotates faster, due to engine activity, the centrifugal force lifts the balls off the hole and steam escapes, reducing the power delivered by the engine. This puts an upper limit on the power the engine can provide to whatever it is driving, thus preventing "runaway" effects such as running a pump so fast that it breaks.

This example shows that a "Negative Feedback Loop" can be a very good thing in engineering - and actually it is typically so. However, the great minds of the financial services industry have little acquaintance with real science and engineering. I presume they saw the word "negative" and equated it with "bad" and simply appropriated the whole thing. So if a country owes $1 Trillion, and has a falling GDP (say due to government austerity), the debt-to-GDP ratio increases and the debt is even more difficult to service. This more closely resembles a "Positive Feedback Loop", where the output of the system is amplified by the output becoming an input, often contributing to equipment damage in real engineering. Take a look at Wikipedia if you want a fuller explanation.
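The engineering distinction can be shown with a toy calculation. This is only a sketch under simple assumptions: the system's deviation from a setpoint is fed back into itself with a fixed gain, and the function name simulate and the gain values are hypothetical. It is not a model of a steam engine or of an economy.

```python
def simulate(gain, start=1.0, setpoint=0.0, steps=10):
    """Feed the deviation from the setpoint back into the system.

    A negative gain opposes the deviation (damping); a positive gain
    reinforces it (amplification).
    """
    x = start
    history = [x]
    for _ in range(steps):
        x = x + gain * (x - setpoint)
        history.append(x)
    return history


negative = simulate(gain=-0.5)  # deviation halves at every step
positive = simulate(gain=+0.5)  # deviation grows by half at every step

print(negative[-1])
print(positive[-1])
```

With negative feedback the deviation shrinks toward the setpoint, which is the stabilizing behaviour of the governor. With positive feedback the same starting deviation runs away, which is the behaviour the financial usage seems to have in mind.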

So "Negative Feedback Loop" in financial services is either:

(a) Misuse of terminology to signify real positive feedback loops that do occur in financial services; or

(b) A scientistic analogy that is conning us into thinking there is a real concept being signified, whereas in fact there is no intelligible concept. And the analogy has gone wrong because of the desire for the term "negative" to mean "bad", rather than "dampening" as it does in engineering.

The example I gave above of the debt-to-GDP ratio has points in common with a true positive feedback loop, but that is what I would expect of any analogy. It does not prove that we are dealing with an actual positive feedback loop (that happens to be mis-termed as a "Negative Feedback Loop"). A definition is not a few selected attributes that are in common with some other generalization. In financial services strange things can happen because the domain is governed by human laws, not natural laws. Debt can be repudiated in financial services, but matter and energy must be conserved in the natural world.

My conclusion is that the class of phenomena to which the term "Negative Feedback Loop" is applied in financial services breaks down into concepts that have never been given adequate definitions. Therefore we are dealing with unintelligible concepts. The fakery of using scientistic analogies (because scientific language is always so plausible) has been poorly applied in this case, and the terminology has pointed up the problem.

Monday, July 2, 2012

More Angst on Analogies

The previous post looked at analogies, which often are unrecognized as such in information management. It posited that analogical terms are applied to new and poorly understood concepts, which go undefined or poorly defined. As a result, attributes of the concept that the analogical term originally signified are transferred to the new concept. Of course, this transfer also tends to go unrecognized. The result is a bad definition of the new concept, which can have severe negative practical effects.

I came across a discussion of analogies by Robert B. Stewart in the book Come Let Us Reason (2012, Copan and Craig, editors, ISBN 978-1-4336-7220-0). Stewart points out that analogies "are not evidence that something is so, but rather illustrations of how something could be so" (Stewart's italics). Here I think Stewart is discussing analogies in controversy - how they can be used to support a point, or in the search for an explanation. The passage made me realize that this is not what I had in mind in the previous post. What I was discussing is what happens when a new concept emerges in information management, which is quite frequent: an analogous term is used to signify it, and brings with it the associations of the definition of the original concept signified by that term, and these become a big part of an informal de facto definition.

But also, I find people in information management inventing their own terms all the time, even if good terms already exist. These invented terms are always analogies. For instance, I have heard a lot recently about "viewing" a topic space "through different lenses". What is meant by "lens" here? I think it must be a conceptual model that filters part of the concept system that constitutes the topic space. Putting it in these terms might raise a lot of questions in the mind of listeners. For some reason, analogical terms seem to slide by without question.

Stewart goes on to say that "analogies can distort our view of reality and lead us down many dead-end paths". He uses the example of the luminiferous ether. This was the hypothetical substance that was rigid with respect to electromagnetic waves, and thus the medium in which these waves were propagated. The ether was supposedly permeable to matter. The ether was supposed to exist based on the analogy between light waves and sound waves. After all, sound waves propagate through air (and other substances). As Stewart points out, the analogy was plausible, but it held up scientific progress, until the Michelson-Morley experiment eventually disproved it.

So, we can see that there are even more worries about the effect of analogies on definitions, and how analogies can negatively affect the way we think.

Friday, June 29, 2012

Analogous Terms

A while back I blogged on Univocal, Equivocal, and Analogous terms. Thomas Aquinas wrote about these in Summa Theologica, so it is not really a new topic. That said, I think that how we deal with these three classes of term in information management is a fairly murky area, and requires the development of practical guidance.

Let’s start with analogous terms. What is an analogy? The Free Dictionary provides the following definition:

similarity in some respects between things that are otherwise dissimilar.

[http://www.thefreedictionary.com/analogy, accessed 2012-06-29]

This is a reasonable start, but I think that we need to understand a lot more about analogous terms in information management. I would suggest that an analogous term is:

a term used to signify a concept that uses all or part of a term which signifies a second concept that is better understood. There are supposed to be attributes in common between the two concepts (often only one or a very few), or some other similarity (maybe an emotional response to the term).

An analogous term I particularly dislike in information management is “Data Owner”. For IT staff this signifies a person who has some unspecified responsibility for the data in question. The word “Owner” is clearly an analogous term, because the IT staff have no intention of referring to holding legal title – which is the true meaning of “Owner”. An individual who is a Data Owner should be able to take the data and sell it to anyone they liked. Of course, anyone trying that in a major enterprise would be promptly fired, sued, and have criminal charges preferred against them. It would be no defense to say that somebody in IT told them they were a Data Owner.

So what concept does the term “Data Owner” signify? I cannot find out what it is, but I know for sure that for IT folks at least it does not mean “an individual who holds legal title to the data”. For the moment, let us label this new concept “Concept D” (which may or may not exist).

What I think is going on is that IT folks have a vague idea about Concept D and that Concept D has an analogous relationship with the concept signified by the term “Owner”. I suggest, based on years of dealing with this, that the IT folks think of one shared attribute between Concept D and “Owner” – and that attribute is “responsibility”.

An owner of some item of property will generally assume responsibility for it. For instance, when it comes to my house I pay the mortgage, mow the lawn, take out the trash, shovel the snow, and so on. I could enumerate a very long list of tasks that I undertake because I own my house. All of these tasks could be generalized into the attribute of “responsibility”.

My guess is that IT folks have no clue what the specific tasks are that a “Data Owner” is obliged to perform for the data they “own”. However, it is nevertheless certain that all these unknown tasks could be generalized into the attribute of “responsibility”. The logic here can be expressed as:

1. Every Owner has responsibility for an item they own

2. Individual X has responsibility for a specific set of data

3. Therefore, Individual X is an Owner

This syllogism is of course an invalid argument since it contains 4 terms (rather than 3). The problem is that the terms “an item they own” and “a specific set of data” are distinct. As we have seen, there is no real ownership of data. Hence, we have the 4-term fallacy.
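The term count can be made concrete with a minimal Python sketch. The representation used here (each proposition reduced to a subject and a predicate string) is an assumption made purely for illustration, and the function names are hypothetical. It is not a general logic checker.

```python
# Each proposition is reduced to a (subject, predicate) pair.
# This representation is an illustrative assumption, not a full logic engine.

def distinct_terms(propositions):
    """Return the set of distinct terms used across all propositions."""
    terms = set()
    for subject, predicate in propositions:
        terms.add(subject)
        terms.add(predicate)
    return terms


def has_three_terms(propositions):
    """A categorical syllogism must use exactly three distinct terms."""
    return len(distinct_terms(propositions)) == 3


data_owner_argument = [
    ("Owner", "responsible for an item they own"),
    ("Individual X", "responsible for a specific set of data"),
    ("Individual X", "Owner"),
]

print(sorted(distinct_terms(data_owner_argument)))
print(has_three_terms(data_owner_argument))
```

The two "responsible for ..." predicates are counted as different terms, which is exactly the point being made. Note that having three terms is only a necessary condition for a valid syllogism, not a sufficient one.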

Let us focus now on the term “responsibility” which is the core shared attribute in the analogical relationship.

IT cannot really tell anyone they are responsible for a particular set of data, as nobody in IT is empowered to assign such responsibility. Furthermore, as noted above, IT cannot give a specific list of the tasks associated with being responsible for a set of data. Using the term “Data Owner” gets IT out of these difficulties – unless the “Data Owner” refuses to accept the term, or asks IT what responsibilities are involved in being a Data Owner.

The abstraction of specific tasks into the term “responsibility” is an interesting aspect of this. It shows how an abstract term can be used to suggest that someone should know (or does know) the specifics covered by the abstract term in the context under consideration. This is good fodder for a future blog post.

So let us return to Concept D. What exactly is it that IT is signifying by the term “Data Owner”? It is quite possible that some IT staff do not know, in which case it is a null concept. However, other IT staff may simply be seeking somebody who is an actual user of the data, and who can explain the data to them. This is a good deal more mundane than being a “Data Owner”.

One last point is that new concepts emerge frequently in information management. It seems natural to use analogical terms to signify them because the concepts themselves are not yet properly understood. I have no problem with this, but what I have a problem with is giving the impression that the concept exists and is fully understood. The use of an analogical term brings with it the concreteness of the second concept which the term signifies. This gives the impression that something is fully known when it is not. I have experienced this many times over my career in information management.

So what can we learn from this? I suggest the following:

1. In information management, be on the lookout for terms that are clearly analogical. They suggest that there is either a null concept or a poorly understood concept.

2. Identify the attributes that are in common between the concept to which the analogical term is now being applied and the concept which it originally signified.

3. Determine if the attributes that are supposedly shared between the two concepts are abstracted. If so, this indicates that a level of specificity is missing and further suggests that the new concept is poorly understood – or unintelligible.

4. If an analogical term is used to signify a poorly understood concept, be honest and say this. Try to define the concept, and what its boundaries are.

5. Try to isolate any emotive or suggestive notions associated with the analogical term and determine what their impact is.
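Steps 2 and 3 of this list can be pictured as simple set operations. The sketch below is only an illustration: the attribute names are my own shorthand for points made in the "Data Owner" discussion, and the decision about which attributes count as "abstract" is an assumption an analyst would have to make by hand.

```python
# Attribute sets are illustrative shorthand, not a formal model of ownership.
owner_attributes = {"holds legal title", "may sell the item", "responsibility"}
concept_d_attributes = {"responsibility"}

# Attributes judged (by the analyst) to be abstractions of unspecified tasks.
abstract_attributes = {"responsibility"}


def analyze_analogy(original, new, abstract):
    """Compare the original concept with the concept the term is now applied to."""
    shared = original & new
    return {
        "shared": shared,                        # step 2
        "not_carried_over": original - new,      # associations that do not apply
        "shared_but_abstract": shared & abstract # step 3
    }


result = analyze_analogy(owner_attributes, concept_d_attributes, abstract_attributes)
for key, value in result.items():
    print(key, sorted(value))
```

If everything that is shared turns out to be abstract, as it does here, that is a signal that the new concept is still poorly understood.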

Sunday, June 17, 2012

Rethinking Common vs. Technical Terms in Definitions and Some New Definition Rules

After I posted the blog on "Common vs. Technical Terms in Definitions and Some New Definition Rules" I was contacted by Suzanne DalBon, who had a different view on this topic. Suzanne kindly put it that my view might apply in certain situations, but that handing primacy to common terms might cause chaos in other situations.

The first point that Suzanne raised is that the emphasis on common terms in definitions will lead to inconsistency. There are simply too many common terms (words and expressions) to choose from.

A major issue in collaborative environments where any kind of content is produced is the need to achieve consistency. Consistency in outputs is needed for users (readers) of the content to be able to reliably use it. Such consistency is remarkably difficult to achieve. Heavy editorial control is one way, but such control can poison any collaborative environment where contributors are providing their time and effort on a voluntary basis. Of course, a collaborative environment is very likely to be the case for the development of definitions in enterprise environments. The editor of a commercial encyclopedia can be expected to whip his contributors into providing a consistent format, and has the power of the purse at his disposal. Further, his contributors would have been chosen in the first place not simply for their substantive ("domain") knowledge, but also for their literary tradecraft.

Any directive to use common terms as much as possible will result in inconsistent content development. Not only is it a question of having many different synonyms of common terms, but there is also the problem of subtle differences when common terms are used. English is especially prone to this. For instance, English terms derived from Anglo-Saxon seem masculine, while English terms derived from Latin seem feminine. Dr Samuel Johnson gave the example of "hearty welcome" having exactly the same roots as "cordial reception". Think for a moment about the different images these conjure up for you - even if you are not a native English speaker.

Suzanne's suggested approach is to use more general technical terms in definitions wherever possible. I had been thinking about descriptive definitions in my original blog post. In essential definitions, more general terms are used because the formula for producing the definition is Definition = Superordinate Genus + Specific Difference. However, essential definitions are comparatively rare. Nevertheless, as pointed out in previous blog posts here, concepts are always found in concept systems. Therefore, we should describe a concept in terms of the closest concepts to it. These will be the ones that have a direct relationship to it. For a technical term, it is unlikely that a common term will signify a concept that is directly related to the concept signified by the term being defined. So a common term will be just too general to be precise. Even so, we should still avoid using technical terms that are more detailed than the term being defined (e.g. the parts of a whole, or the subspecies of a species). Of course, this may be unavoidable in some cases, so we cannot make it a hard and fast rule.

Suzanne also noted that if we have to define a concept using more general technical terms, this is a good way of surfacing technical terms that have not yet been recognized as requiring a definition. It is a very good way of validating the completeness of a special vocabulary.
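This completeness check is easy to sketch in Python. The sketch assumes that an analyst has already tagged which technical terms each definition uses. The function name and the sample entries are hypothetical and chosen only for illustration.

```python
def find_undefined_terms(terms_used_by_entry, defined_terms):
    """Return, per entry, the technical terms it uses that have no definition yet."""
    missing = {}
    for entry, used in terms_used_by_entry.items():
        gaps = sorted(term for term in used if term not in defined_terms)
        if gaps:
            missing[entry] = gaps
    return missing


# Hypothetical tagging of the technical terms used in each definition.
terms_used_by_entry = {
    "Mortgage-Backed Security": ["Security", "Pool", "Mortgage Loan"],
    "Pool": ["Mortgage Loan", "Collateral", "Debt Instrument"],
}
defined_terms = {"Mortgage-Backed Security", "Pool", "Security", "Mortgage Loan"}

print(find_undefined_terms(terms_used_by_entry, defined_terms))
```

Any term reported here is a candidate for a new entry in the special vocabulary.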

So to summarize our revised rules for using terms in a definition:

1. Use more general terms found in the same special vocabulary, whose concepts have a direct relationship to the concept being defined. Inform the editor if any term is not yet defined.
2. If no appropriate term is available in (1), then use even more general terms within the same special vocabulary as the term to be defined. Inform the editor if any term is not yet defined.
3. If no appropriate term is available in (2), then use a more specific term within the same special vocabulary as the term to be defined. Inform the editor if any term is not yet defined.
4. If no appropriate term is available in (3), then use a common term. However, editorial control may be needed to rationalize the use of common terms across definitions.
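The four rules above amount to a fallback order, which can be sketched as follows. This is only a rough illustration under stated assumptions: the tier labels are names I have invented for the sketch, the candidate terms are hypothetical, and it is assumed that an analyst has already sorted the candidates into tiers.

```python
# Tier names paraphrase the four rules; they are labels invented for this sketch.
TIERS = [
    "direct, more general",  # rule 1
    "even more general",     # rule 2
    "more specific",         # rule 3
    "common",                # rule 4
]


def select_terms(candidates, defined_terms):
    """Pick terms from the first tier that has any candidates.

    candidates: dict mapping a tier name to the terms proposed for that tier.
    defined_terms: set of terms already defined in the special vocabulary.
    """
    for tier in TIERS:
        terms = candidates.get(tier, [])
        if not terms:
            continue
        if tier == "common":
            # Rule 4: an editor may need to rationalize common terms.
            return {"tier": tier, "terms": terms,
                    "notify_editor": [], "editorial_review": True}
        # Rules 1 to 3: report any term that is not yet defined.
        undefined = [term for term in terms if term not in defined_terms]
        return {"tier": tier, "terms": terms,
                "notify_editor": undefined, "editorial_review": False}
    return None


choice = select_terms(
    {"direct, more general": [],
     "even more general": ["Debt Instrument"],
     "common": ["bond"]},
    defined_terms={"Security"},
)
print(choice)
```

In this example rule 1 yields nothing, so rule 2 applies, and the editor is told that "Debt Instrument" still needs a definition.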

I suppose that these rules presuppose the existence of an editor. But that is a topic for future blog posts.

Monday, June 11, 2012

Dangers of Automated Hyperlinking in Definitions

In my previous post I noted the example of the definition of “Mortgage-Backed Securities” in Prof. Campbell Harvey’s Hypertextual Finance Glossary. The definition is:

Securities backed by a pool of mortgage loans [http://www.duke.edu/~charvey/Classes/wpg/glossary.htm]

The term “pool” is hyperlinked in this definition, and the definition of “pool” is:

In capital budgeting, the concept that investment projects are financed out of a pool of bonds, preferred stock, and common stock, and a weighted-average cost of capital must be used to calculate investment returns. In insurance, a group of insurers who share premiums and losses in order to spread risk. In investments, the combination of funds for the benefit of a common project, or a group of investors who use their combined influence to manipulate prices.

This definition of “pool” does not fit the use of the term “pool” as it appears in the definition of “Mortgage-Backed Securities”. I would argue that “pool” in this context should be preliminarily defined as:

a set of mortgages with common characteristics that act as collateral for debt instruments

This is quite different to Prof. Harvey’s definition, so how did there come to be such a difference?

I wonder if what we are looking at here is the automated hyperlinking of terms in definitions. I do not know that this is happening in Prof. Harvey’s Glossary, but I have seen it as a feature of other semantic tools. It is very convenient to type in a definition and have all the terms within it hyperlinked if they occur elsewhere in the vocabulary that is being constructed. If this is not done, the analyst has to go and create the hyperlinks manually – a process that is very time-consuming.

But while automated hyperlinking is a great feature, it needs to be controlled. I think the best way to do this is to have a cross reference report that provides a side by side comparison of a definition with the definitions of the terms used in it. We can imagine the definition of a term on the left hand side of a page, with the hyperlinked terms highlighted. On the right hand side of the page we could have the definitions of all the terms highlighted on the left hand side. The analyst can then check whether the way in which each term is used in the definition on the left is consistent with the definition of that term on the right.

Simply relying on automated hyperlinking without this kind of check would seem to be inviting trouble.
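To make the idea of a cross reference report more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not a description of how any existing semantic tool works: the vocabulary is a plain dictionary, terms are matched by simple whole-word search, the function names are hypothetical, and the entry for "pool" is shortened to the insurance sense quoted above.

```python
import re
import textwrap
from itertools import zip_longest


def linked_terms(term, vocabulary):
    """Return the other vocabulary terms that occur in the definition of `term`."""
    definition = vocabulary[term]
    found = []
    for other in vocabulary:
        if other == term:
            continue
        pattern = r"\b" + re.escape(other) + r"\b"
        if re.search(pattern, definition, flags=re.IGNORECASE):
            found.append(other)
    return found


def cross_reference_report(term, vocabulary, width=38):
    """Lay out a definition (left) beside the definitions of its linked terms (right)."""
    left = textwrap.wrap(f"{term}: {vocabulary[term]}", width)
    right = []
    for other in linked_terms(term, vocabulary):
        right.extend(textwrap.wrap(f"{other}: {vocabulary[other]}", width))
        right.append("")  # blank line between linked definitions
    rows = zip_longest(left, right, fillvalue="")
    return "\n".join(f"{left_line:<{width}} | {right_line}" for left_line, right_line in rows)


vocabulary = {
    "Mortgage-Backed Securities": "Securities backed by a pool of mortgage loans",
    # Shortened to one sense of the glossary entry for this illustration.
    "pool": "In insurance, a group of insurers who share premiums and losses "
            "in order to spread risk.",
}

print(cross_reference_report("Mortgage-Backed Securities", vocabulary))
```

The sketch only lays the definitions out side by side. Judging whether the usage on the left is consistent with the definition on the right is still the analyst's job.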

A further check would be to ensure that the terms used in a definition do not link to definitions that are outside the vocabulary under consideration. For instance, here is a definition of “pool” from http://oxforddictionaries.com/definition/pool--2?region=us:

a supply of vehicles or goods available for use when needed

This is definitely not the concept being identified by “pool” in Prof. Harvey’s Glossary. However, in a broad semantics repository it is quite possible that a wide range of vocabularies might be present, and a term in a definition might get hyperlinked to a definition in an entirely different vocabulary – a term which is signifying a concept that is quite alien to the vocabulary which contains the original definition.
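A simple guard against this kind of stray link can be sketched in Python. The repository structure (a dictionary of vocabularies keyed by name), the vocabulary names, and the function name are all assumptions made for this illustration, and the finance entry is shortened to one sense.

```python
def link_targets(term, home_vocabulary, repository):
    """Separate the in-scope definition of `term` from matches in other vocabularies."""
    in_scope = None
    out_of_scope = []
    for vocabulary_name, entries in repository.items():
        if term not in entries:
            continue
        if vocabulary_name == home_vocabulary:
            in_scope = entries[term]
        else:
            out_of_scope.append(vocabulary_name)
    return in_scope, out_of_scope


# Hypothetical repository holding two vocabularies that both define "pool".
repository = {
    "Finance Glossary": {
        "pool": "In insurance, a group of insurers who share premiums and losses "
                "in order to spread risk.",
    },
    "General English": {
        "pool": "a supply of vehicles or goods available for use when needed",
    },
}

definition, elsewhere = link_targets("pool", "Finance Glossary", repository)
print(definition)
print("Also defined in:", elsewhere)
```

An automated hyperlinker would only link to the in-scope definition, and could report the other matches to the analyst so that they are never linked silently.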

Thursday, June 7, 2012

Common vs. Technical Terms in Definitions and Some New Definition Rules

A good deal of work in dealing with definitions in information management is done by analysts, and when I have been doing analytical work I have been struck by the need to capture Technical Terms. Particular business areas always seem to have their own technical jargon, as does all of IT. However, in capturing the concepts that lie behind these terms there is always a challenge about what terms to use in their definitions.

It is an old rule that high quality definitions should not use terms as obscure as the term being defined. This is negative advice - telling us what not to do. But what should we do? I suppose that the best approach would be to use Common Terms. A Common Term is one used in everyday discourse, and for which there is a well-known definition. I agree that it is a noble goal to use only Common Terms in a definition of a Technical Term, and we should make every effort to do this.

However, can I really define something like "Mortgage-Backed Security" only by using Common Terms? Campbell Harvey's Hypertextual Glossary defines "Mortgage-backed Securities" as:

Securities backed by a pool of mortgage loans [http://www.duke.edu/~charvey/Classes/wpg/bfglosm.htm]

But in this case, "pool" is not a Common Term, meaning a body of water or a swimming pool. It is actually a Technical Term which is further defined by Prof. Harvey as:

Having developed quite a lot of securitization software in my time I would not define "pool" that way, but as:

a set of mortgages with common characteristics that act as collateral for debt instruments

That definition could probably stand improvement too, but let us get back to our main point. "Pool" seems to be a Common Term but is really a Technical Term. Yet we have no way of recognizing that it is a Technical Term. Actually, to be fair, in Prof. Harvey's Glossary we can infer it is a Technical Term because it is hyperlinked to the above definition (which is inadequate to define "pool" in the context of "Mortgage-Backed Security").

Prof. Harvey's definition of "Mortgage-backed Securities" also refers to "mortgage loans". You could argue that this is a Common Term, but you could also argue that it is a Technical Term in the very broad area of finance, which is a much broader area than Mortgage Securitization. This is interesting as it implies that there are vocabularies for specialized areas which are subspecies of less specialized areas. It would seem to be helpful to use Technical Terms from a more general specialized area in definitions of terms that exist within a more specialized area. After all, the more general specialized areas should be more widely known, so more people will understand these concepts. But if somebody does not understand a term from a more general specialized area, it should be easy for them to find and understand its definition.

From this discussion we can derive some rules of what terms to use in a definition:

Always try to use Common Terms in definitions
If a Technical Term has to be used, try to use a Technical Term from a vocabulary that covers a more general area than the area to which the concept being defined belongs
Always try to use a Technical Term from the most general area above the area to which the concept being defined belongs
Only as a last resort should a definition contain a Technical Term that is specific to the area to which the concept being defined belongs
Do not use Technical Terms from a more specialized area than that to which the concept being defined belongs

This is interesting as it implies we will always need generalization hierarchies to do definitions well if we adopt the above rules. Of course, the Tree of Porphyry has been around for many centuries to support the old formula of Definition = Superordinate Genus + Specific Difference. However, I am discussing Descriptive Definitions rather than Essential Definitions here, and it is Essential Definitions to which the Tree of Porphyry and the old formula apply. So it is interesting to see that there are additional reasons for having a generalization hierarchy.
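To show how such a generalization hierarchy could support the rules above, here is a minimal Python sketch. The hierarchy is represented as a simple child-to-parent map, which is an assumption made for illustration. "Finance" and "Mortgage Securitization" come from the discussion above, while "Prepayment Modeling" is a hypothetical sub-area added only to exercise the last rule. The function names are also hypothetical.

```python
def ancestors(area, parent_of):
    """Return the chain of more general areas above `area`, nearest first."""
    chain = []
    current = parent_of.get(area)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def classify_term(term_area, concept_area, parent_of):
    """Classify a term used in a definition against the rules of this post.

    term_area is None for a Common Term, otherwise the area whose vocabulary
    the Technical Term belongs to.
    """
    if term_area is None:
        return "common term (preferred)"
    if term_area in ancestors(concept_area, parent_of):
        return "more general area (acceptable)"
    if term_area == concept_area:
        return "same area (last resort)"
    if concept_area in ancestors(term_area, parent_of):
        return "more specialized area (avoid)"
    return "unrelated area (not covered by the rules)"


# Child area -> parent area. "Prepayment Modeling" is hypothetical.
parent_of = {
    "Prepayment Modeling": "Mortgage Securitization",
    "Mortgage Securitization": "Finance",
    "Finance": None,
}

concept_area = "Mortgage Securitization"
for term_area in [None, "Finance", "Mortgage Securitization", "Prepayment Modeling"]:
    print(term_area, "->", classify_term(term_area, concept_area, parent_of))
```

Without the parent map there is no way to tell a more general area from a more specialized one, which is the sense in which the rules depend on a generalization hierarchy.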