Monday, March 5, 2012

Is a Definition Just a List of Attributes?

If we look at a data model, is a definition of an entity type automatically produced by listing the attributes of the entity type?  If this were true then a data modeler would not need to produce entity definitions - he or she would simply need to identify and list a sufficient number of attributes.  I have actually heard data modelers being criticized by terminologists for doing just this.  The extent to which such criticism is fair or not is a separate discussion, but the question remains as to whether a list of attributes can suffice as a definition.

I do not think that a list of attributes is sufficient based on the recent discussions about concept systems in this blog.  No concept exists in isolation.  Every concept exists in some kind of concept system where it has relationships to other concepts.  At least some of these relationships and/or related concepts have to enter into a definition so that the concept being defined can be located properly in a concept system, which appears to be necessary for knowledge.

A list of attributes usually will not distinguish between those that are determining for the concept under consideration, and those which are not - some of which may be shared with other concepts.  Thus, just reviewing a list of attributes becomes a test of figuring out which ones are pertinent to a definition.  This surely defeats the practical aspects of definition.
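
As a rough illustration of this point, here is a minimal Python sketch. The two entity types (Customer and Supplier) and all of their attribute names are hypothetical, invented only for this example. It shows that a flat attribute list does not say which attributes are shared with other concepts; that only becomes visible once a second concept's list is brought in for comparison.

```python
# Hypothetical attribute lists for two entity types (illustrative only).
customer_attributes = {"name", "address", "tax_id", "credit_limit", "lifetime_value"}
supplier_attributes = {"name", "address", "tax_id", "payment_terms", "lead_time_days"}

# Attributes found in both lists cannot, on their own, tell the two concepts apart.
shared = customer_attributes & supplier_attributes

# Only the attributes outside the shared set are candidates for being
# determining characteristics of Customer relative to Supplier.
candidates = customer_attributes - supplier_attributes

print(sorted(shared))
print(sorted(candidates))
```

Even then, the comparison only narrows the field. Deciding which of the candidates actually belong in the definition is still a matter of judgment that the list itself does not record.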

It would seem therefore that something more than a list of attributes is required to produce a quality definition.  While many data modelers do produce quality definitions, it can be seen that the practice of data modeling may present the temptation to just assume that the definition of an entity type is provided by the attributes captured for it.  Of course, relationships and other concepts are present in a data model, but it will need another blog to answer the question of whether a data model has enough information to produce a definition based on entity types, attributes, and relationships alone.

Wednesday, February 22, 2012

Generic vs. Partitive Concept Systems


For the past couple of blogs I have been exploring different types of concept systems.  I have found these discussed, oddly enough, not in the literature on data modeling, but in the literature on terminology work.  At this point, I want to look at the two major concept systems.  These are very abundant in the raw material of information management, and require special attention.

Generic:  This is the familiar supertype-subtype concept system, where a more generic concept encompasses a range of more specific concepts.  E.g. Animal - Chordate - Vertebrate - Mammal - Primate - Homo sapiens.  There are a couple of interesting properties of this concept system:
  • Any instance found in a specific concept is also covered by a more general concept.  The more general concepts possess fewer attributes than the more specific ones, but every specific concept possesses the attributes of each "parent" generic concept.  
  • Intension is inversely related to extension.  That is, the greater the number of specifying characteristics (intension), the more restricted the population of instances that is covered by the concept (extension).
Partitive: This is the part-whole concept system.  The study of part-whole relationships is called mereology.  It seems a bit odd to have a named discipline for this type of concept system, but not for others.  Perhaps it is an artifact of the evolution of philosophy.  Anyway, an example of a part-whole system would be the organs of the human body, such as brain, liver, pancreas, kidney, and so on.  To have a complete view of the human body we would have to include tissues, such as epithelium, blood, muscle, nerves, etc.  This concept system is totally unlike the generic one as the parts have quite different identities that do not share characteristics.  We also run into interesting problems such as denial that the whole is anything more than the sum of its parts.  To summarize its properties:
  • Each concept in a partitive concept system covers a range of instances that are not found in any other concept in the system.  There is no overlap of instances among the concepts in the system, unlike the generic type of concept system.  
  • There is no relation between extension and intension of the concepts in the system.  Each concept has characteristics, none of which apply to the system as a whole.
I think that understanding different types of concept system has been overlooked by data modelers. Presumably this is because the arrangement of boxes and lines in a data model does not look very different for a generic or a partitive concept system.
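
The properties listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the characteristics (such as "has_backbone"), the instances, and the organ names used as identifiers are all hypothetical simplifications, not a real taxonomy or anatomy model.

```python
from itertools import combinations

# Generic system: each more specific concept adds characteristics (intension).
intension = {
    "Animal": {"multicellular"},
    "Vertebrate": {"multicellular", "has_backbone"},
    "Mammal": {"multicellular", "has_backbone", "produces_milk"},
}

# Hypothetical instances, each tagged with the characteristics it has.
instances = {
    "earthworm": {"multicellular"},
    "trout": {"multicellular", "has_backbone"},
    "dog": {"multicellular", "has_backbone", "produces_milk"},
    "human": {"multicellular", "has_backbone", "produces_milk"},
}

def extension(concept):
    """Return the instances that possess every characteristic of the concept."""
    required = intension[concept]
    return {name for name, traits in instances.items() if required <= traits}

# More characteristics (intension) go with fewer covered instances (extension).
for concept in intension:
    print(concept, len(intension[concept]), sorted(extension(concept)))

# Partitive system: each part covers its own instances, with no overlap.
part_instances = {
    "Brain": {"brain_1"},
    "Liver": {"liver_1"},
    "Kidney": {"left_kidney_1", "right_kidney_1"},
}
parts_are_disjoint = all(
    part_instances[a].isdisjoint(part_instances[b])
    for a, b in combinations(part_instances, 2)
)
print(parts_are_disjoint)

# Stored as (child, parent) pairs, the two systems have exactly the same shape.
generic_edges = [("Vertebrate", "Animal"), ("Mammal", "Vertebrate")]
partitive_edges = [("Brain", "Human body"), ("Liver", "Human body")]
```

The last two lists are the point of the closing remark: as boxes and lines (or rows with a parent key), nothing distinguishes "is a kind of" from "is a part of" unless the type of relation is recorded explicitly.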

Monday, February 20, 2012

On Types of Concept System

In my previous blog I discussed the existence of different types of concept systems.  I have found these discussed, oddly enough, not in the literature on data modeling, but in the literature on terminology work.  I have not found discussion of concept systems in philosophy, but that might merely reflect my lack of education, reading in, and general knowledge of philosophy.

Before going further into types of concept systems, we need to establish what a concept system is.  Nordterm 8 Guide to Terminology by Heidi Suonuuti (ISBN 952-9794-14-2) states the following:

Concepts are not independent phenomena.  They are always related to other concepts in one way or another, and form concept systems which can vary from fairly simple to extremely complicated.  In terminology work, an analysis of the relations among concepts and an arrangement of them into concept systems, is a prerequisite for the successful drafting of definitions.

This is not a great definition of "concept system" but it is a good start.  The ISO 704 standard from ISO/TC 37/SC 1 tells us the following:

Concepts do not exist as isolated units of knowledge but always in relation to each other. Our thought processes constantly create and refine the relations between concepts, whether these relations are formally acknowledged or not. A set of concepts structured according to the relations among them is said to form a concept system.

In organizing concepts into a concept system, it is necessary to bear in mind the subject field that gave rise to the concept and to consider the expectations and objectives of the target users. The subject field shall act as the framework within which the concept field, the set of thematically related but unstructured concepts, is established.

ISO 704 also states:

The terminology of a subject field is not an arbitrary collection of terms. The relevant concepts constitute a coherent concept system based on the relations existing between concepts. The unique position of each concept within a system is determined by the intension and the extension.

This is not the place to get into what is meant by a "subject field".  However, it does seem apparent that a concept system is a number of concepts and the relations between them.  This corresponds to what data modelers call a "conceptual data model", but which they should call a "conceptual model".  A conceptual model describes a set of business information as such, without any thought of how it might be stored as data.

A concept system also differs from a single relationship between two or more terms.   Baldwin's Dictionary of Philosophy defines "relation" in logic as follows:

The mutual dependence of two or more subjects upon a common principle, fact, or truth, of such a kind that any assertion regarding one modifies the meaning of the other.  Accordingly, the predicate is true or false of one taken not independently or in isolation, but only in reference, regard, or respect to the other.  

The way in which a concept system differs from a relationship is that a concept system contains many relationships.
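
One way to picture the difference is as a data structure. The following Python sketch is an assumption-laden illustration, not something taken from ISO 704 or Nordterm: the class names Relation and ConceptSystem, and the relation kinds "is_a" and "part_of", are hypothetical names chosen for the example. A single relation links two concepts, while a concept system holds a set of concepts together with all the relations among them.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relation:
    """A single relation between two concepts (hypothetical structure)."""
    source: str
    kind: str      # e.g. "is_a" or "part_of"; labels are illustrative
    target: str

@dataclass
class ConceptSystem:
    """A set of concepts structured by the relations among them."""
    concepts: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def relate(self, source, kind, target):
        # Register both concepts and record the relation between them.
        self.concepts.update({source, target})
        self.relations.append(Relation(source, kind, target))

    def relations_of(self, concept):
        # Every relation in which the concept takes part, on either side.
        return [r for r in self.relations if concept in (r.source, r.target)]

system = ConceptSystem()
system.relate("Mammal", "is_a", "Vertebrate")
system.relate("Vertebrate", "is_a", "Animal")
system.relate("Liver", "part_of", "Human body")

print(len(system.concepts), len(system.relations))
print(system.relations_of("Vertebrate"))
```

The relations a concept takes part in are the raw material for locating it in the system, which is what a definition is expected to do.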

So we have some idea of what a concept system is.   What is interesting is that concept systems are not each one of a kind, but fall into a limited number of types.  This idea is found in terminology work, but does not seem to be found in data modeling.  Perhaps this is because data models are oriented to building databases for data storage, and not for describing business information.

If there are types of concept system, then each type should have its own properties.  Understanding these properties might help us work with concept systems and hence with definitions.   Data modelers do not seem to have contributed much, if any, thought about types of concept systems.  It is true that there are many books on data model patterns, but these are oriented to data modeling goals such as not needing to change a database structure unless it is unavoidable.

A challenge will therefore be to catalog the different types of concept system, and their special properties, and find ways to apply them in the practical work of definition management.

Monday, February 6, 2012

The Idea of Concept Systems

An involuntary hiatus has prevented me from the pleasure of blogging on definitions for about a month.  I am now gradually getting back to normal, and am able to blog again.

Today I want to look at concept systems, and types of concept system.

In data modeling, only one type of concept system commonly appears - the generic concept system, containing Supertypes and Subtypes.  Very occasionally, the part-whole type of concept system can also be found.  The latter can be seen in "bill of material" structures.  Strangely, the visual representation of a generic concept system and a part-whole concept system can look very similar in a data model.   I think that this leads data modelers to play down the idea of concept systems, and indeed the term "concept system" is not really met with in data modeling.

However, if we turn to the discipline of terminology, the idea of concept system is very prominent, and different types of concept systems are called out.  Let me quote from the Nordterm Guide to Terminology by Heidi Suonuuti (ISBN 952-9794-14-2):

"Concepts are not independent phenomena.  They are always related to other concepts in one way or another, and form concept systems which can vary from fairly simple to extremely complicated.  In terminology work, an analysis of the relations among concepts and an arrangement of them into concept systems, is a prerequisite for the successful drafting of definitions."

It is interesting that from the data modeler's perspective, concept systems are viewed only with respect to designing data storage solutions.  A terminologist, by contrast, is more interested in business information and how concepts are related within it - irrespective of how such information might be stored as data.

This makes me wonder about semantic modelers.  We hear a lot about semantics these days, and there is no doubt that semantics involves identifying concepts and providing definitions for them.  But finding the relationships between the concepts must be done prior to forming the definitions.  This is because a definition, in part, describes a concept's relations to other concepts in the concept system in which it is found.  So what good methodologies, notations, and techniques exist for describing or visualizing concept systems?  I am not sure we have yet got any good ones.   The danger is that we then fall back on the data modeling methodologies, notations, and techniques, which fail to capture significant semantic details.

But perhaps more important is that the terminologists have the idea of types of concept systems.  The generic and partitive types of concept system are the major ones, but there are others.  We will deal with the different types in a future post.

Thursday, January 5, 2012

On Roles, Attributes, and Definitions

Dave Hay commented on my post How Many Attributes Do I Have?  Dave notes that there is a difference between me and the roles that I play.  This is an important point that I struggled with previously.  Dave states "most of the examples are attributes of my role as a customer", meaning the examples I provided in my post.

"Role" is a term that gets bandied around a lot in data modeling.  In my previous post on Role vs. Relationship I argued that roles really refer to certain kinds of relationships.  However, Dave's point is one that I have heard on a lot of occasions and has to be taken seriously.

Let's state the question this way: is the attribute Customer Lifetime Value to Hardbitten Liquors an attribute of me, or an attribute of my role as a customer of Hardbitten Liquors?  And if the latter, just what do we mean by "role"?

There is no doubt that I am an instance of a concept.  The concept is human being.  Further, Customer Lifetime Value to Hardbitten Liquors can be predicated of me, strongly suggesting it is an attribute I possess.  

But now let us think of the role that is being suggested in this discussion.  What is it?  Is this role "Customer of Hardbitten Liquors"?  If so, I would argue that this is a relationship between me and Hardbitten Liquors.  And if an entity type has attributes, and relationships do not, then we cannot say that a role has any attributes.

But suppose Dave is right and the role does have attributes.  It will have to be an entity of some kind.  What other thing could the role be - apart from "human being"?  There is a possibility.  Suppose I only ever bought one bottle of Grandpa's Tipple from Hardbitten Liquors.  Then, my entire relationship with Hardbitten Liquors could be encompassed by this one event - the purchase of this one bottle.  Now, Purchase is an entity type, albeit non-material, so it can at least be a candidate for the role.

But can Purchase really be the same as role?  I do not think that an event can have an attribute such as Customer Lifetime Value to Hardbitten Liquors, which really refers to the individual customer.  And I do not think this can be true of any aggregate of instances of Purchase events either, supposing, for instance, that I buy one bottle of Grandpa's Tipple every week.  

So if role is not to be identified either with me or my purchases, what other entity types can it be identified with?  I need to do some more research to be able to answer that.  However, for now I am still going to stick with attributes like Customer Lifetime Value to Hardbitten Liquors as being an attribute of me.  So my original point provisionally remains: a concept can have a vast number of attributes and some methodology is needed to decide which ones to include in a definition.
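
The two positions in this discussion can be laid side by side in a short Python sketch. It does not settle the question; it only shows what each position looks like as a structure. The class names Person and CustomerRole, the field names, and the figure 1250.0 are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    """Option A: the attribute is predicated of the person directly."""
    name: str
    # Lifetime value keyed by the company it relates to (hypothetical field).
    lifetime_value_to: dict = field(default_factory=dict)

@dataclass
class CustomerRole:
    """Option B: the role is treated as an entity that carries the attribute."""
    person_name: str
    company_name: str
    lifetime_value: float

# Option A: "Customer Lifetime Value to Hardbitten Liquors" belongs to me.
me = Person("Author")
me.lifetime_value_to["Hardbitten Liquors"] = 1250.0  # made-up figure

# Option B: the same value belongs to my role as a customer.
role = CustomerRole("Author", "Hardbitten Liquors", 1250.0)

print(me.lifetime_value_to["Hardbitten Liquors"] == role.lifetime_value)
```

Both structures hold the same information. What differs is the thing the attribute is said of, and in Option B the role only exists by pointing at both the person and the company, which is why it looks so much like a relationship.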

Wednesday, January 4, 2012

Legislating vs. Discovering Definitions: Radical Differences

My experience in doing definition work has mostly been from the perspective of an analyst involved in systems development.  These days it is usually for data-centric applications, such as building Master Data Management (MDM) or business intelligence (BI) environments.  The method of the analyst begins with understanding scope and requirements, and then finding the business concepts and data objects that need to be defined.

However, there are other perspectives.  Terminologists, often oriented to the language translation industry, do definition work.  So, I suspect, do brand managers, who want to control messaging to customers. I work a good deal in financial services, and I am aware that there is another group that gets involved in definitions.  These are business people who create completely new products.  For instance, Asset Backed Securities (ABS) and Collateralized Debt Obligations (CDOs) - both now notorious as weapons of mass financial destruction - were created by investment bankers in the past few years.

My experience with ABS begins with the legal documents of a bond issuance deal.  These documents contain the full definition of everything in the deal, the rules for how the bond is supposed to work over time, and the contractual obligations of the parties involved.  One of the tasks I was involved in was to take these documents, reverse engineer them, and create cashflow models under various scenarios to see how the bonds would perform.  A global meltdown caused by changing the credit rating of these bonds from AAA to DDD overnight was not part of these scenarios (in case you were wondering).

The definition work that goes into an ABS structure has to be precise.  It is essentially part of building a conceptual system - a new piece of reality - that will be set in motion.  A major problem in finance is that the products are all non-material.  It is not like manufacturing new designs of plastic gnomes to decorate gardens, or baking a different kind of doughnut.  The laws of metaphysics, mathematics, and nature do not supervene to automatically take care of things.  The new plastic gnome will not suddenly melt down overnight for no apparent reason.  A doughnut I place in the fridge will not evaporate for no apparent reason.  But equally strange things can and do happen in financial systems.  Contradictions, I would maintain, do not exist in material reality - but they can be present (albeit unrecognized) in financial reality.  An ABS issuance can both be AAA-rated and have significant defaulted underlying collateral at the same time.

Legislative definitions are those which are created as part of creating the concept being defined.  I agree that creating a concept is different from creating an instance of a new type of already existing concept.  Each ABS deal includes a lot of concepts that were defined previously - in other ABS deals.  However, differences can still arise.  My point is that the degree of care involved in such definitions must be much greater than that of the analyst.  An investment banker can create a Doomsday Machine if he or she puts together a flawed ABS deal.  An analyst can mess up an integration point for a data object, but can usually remedy it after discovering the problem.

So I think we can conclude that the consequences for bad definition work vary depending on what the work is being done for.  In some situations the definitions have to be rock solid from the start.  Other situations may be more forgiving.  Recognizing the risks involved is an important part of definition work.

One final point.  I have illustrated legislative definitions using examples from financial services.  However, I would maintain that the same problems apply to all sectors, e.g. telecommunications, pharmaceutical, and government.  The problems will arise in any situation where non-material reality is constructed.

Tuesday, January 3, 2012

How Many Attributes Do I Have?

Characteristics of a concept - its attributes - are central to definitions.  But to what extent should the characteristics of a concept be listed in its definition?  Should it be few, or some, or as many as possible?  As a step in beginning to answer that question, I think that we need to ask if we can reliably determine all the characteristics that a concept possesses.  And I now intend to see if I can answer that question by finding out if I can list all the attributes that I possess.

I am aware that I am an instance and not a concept (at least a general concept).  However, I would submit that there is a prima facie case that I should be able to provide a list of my attributes.   If such a list could be produced, then we could see if the attributes apply to the concept I am an instance of (humans).  We could then move on to figuring out what attributes should or should not be included in a definition.  But, if I cannot even reliably figure out what attributes I have, then I may have difficulties I have not yet recognized in the method I have chosen to get answers to the questions I am posing.

It is easy to start listing out all the physical characteristics I possess: height, weight, eye color, and so on.  I could add some non-material ones too, such as age, and IQ score.  However, from my experience as a data modeler and developer, these seem rather trivial.  I have come across many examples of database tables such as Customer, where I could conceivably be represented by a record.  These tables have columns (representing attributes) for e.g. Customer Lifetime Value, Customer Sales Year to Date, Customer Average Order Size, and so on.  I would guess that every company for which I am a customer maintains such attributes to describe me. 

Actually, I am guessing.  I know I possess a height, weight, eye color, etc., because I know what these attributes are and I know I possess them.  However, when it comes to a company of which I am a customer, say, Big Box Super Store, it is not so clear.  Specifically: (a) I do not really know what attributes Big Box Super Store considers I have; and (b) I do not know how Big Box Super Store defines each of these attributes.

Many of the Customer tables I have seen have had hundreds of columns (attributes).  Some have had thousands.  At this scale, even when working with these tables it is difficult to keep track of all the attributes they are representing.  Admittedly, the tables were not always designed well, and included columns that represented attributes that were not truly part of Customer.  But even allowing for this, the scale is still great.  Furthermore, Big Box Super Store is not the only company I buy from.  I probably have a similar relationship with about 50 other companies.  So the total number of attributes I have as a result of these relationships is certainly in the thousands, maybe in the tens of thousands.

It could be argued that many of these attributes are really the same.  Suppose Big Box Super Store calculates Customer Lifetime Value the same way as Hardbitten Liquors (of which I am also a customer).  Then, are we talking about one attribute or two?  As a practical problem, however, I cannot give an answer to this because I do not know how each company is calculating the attribute each calls "Customer Lifetime Value", or how each defines this attribute.
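
To see why the name alone does not answer the question, consider this small Python sketch. Both calculations are hypothetical: neither is claimed to be how Big Box Super Store, Hardbitten Liquors, or any real company computes the figure, and the purchase history and the five-year lifespan are made up for the example.

```python
# Hypothetical purchase history as (year, amount) pairs.
purchases = [(2010, 120.0), (2011, 80.0), (2011, 100.0)]

def clv_total_revenue(purchases):
    """Hypothetical definition 1: total revenue received to date."""
    return sum(amount for _, amount in purchases)

def clv_projected(purchases, expected_years=5):
    """Hypothetical definition 2: average yearly revenue multiplied by
    an assumed customer lifespan in years."""
    years = {year for year, _ in purchases}
    yearly_average = sum(amount for _, amount in purchases) / len(years)
    return yearly_average * expected_years

# Same customer, same purchases, same attribute name, different values.
print(clv_total_revenue(purchases))  # 300.0
print(clv_projected(purchases))      # 750.0
```

If two companies used calculations as different as these, it would be hard to call "Customer Lifetime Value" one attribute, even though the column name is identical in both databases.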

What I strongly suspect is that I carry around with me a vast burden of attributes that companies, government agencies, educational institutions, and other organizations have heaped on me, mainly without my consent, and certainly without my knowing what they are.  Not as many as the grains of sand on the seashore, or stars in the night sky, but enough to wonder at.

So the answer to the question posed in the title is that I cannot reliably say how many attributes I have, but it must be a vast number, and some of them are likely to be outside my range of understanding.  Does this present an issue for definition work?  I think it does.  It suggests some kind of need for scoping.  It also suggests that I appear in different ontologies, and that my definition in each may vary.  But the Muse of Blogging now decrees an end to the current post, so these topics will have to be taken up when Her inspiration returns.