Thursday, January 5, 2012

On Roles, Attributes, and Definitions

Dave Hay commented on my post How Many Attributes Do I Have?  Dave notes that there is a difference between me and the roles that I play.  This is an important point that I struggled with previously.  Dave states "most of the examples are attributes of my role as a customer", meaning the examples I provided in my post.

"Role" is a term that gets bandied around a lot in data modeling.  In my previous post on Role vs. Relationship I argued that roles really refer to certain kinds of relationships.  However, Dave's point is one that I have heard on a lot of occasions and has to be taken seriously.

Let's state the question this way: is the attribute Customer Lifetime Value to Hardbitten Liquors an attribute of me, or an attribute of my role as a customer of Hardbitten Liquors?  And if the latter, just what do we mean by "role"?

There is no doubt that I am an instance of a concept.  The concept is human being.  Further, Customer Lifetime Value to Hardbitten Liquors can be predicated of me, strongly suggesting it is an attribute I possess.  

But now let us think of the role that is being suggested in this discussion.  What is it?  Is this role "Customer of Hardbitten Liquors"?  If so, I would argue that this is a relationship between me and Hardbitten Liquors.  And if an entity type has attributes, and relationships do not, then we cannot say that a role has any attributes.

But suppose Dave is right and the role does have attributes.  It will have to be an entity of some kind.  What other thing could the role be - apart from "human being"?  There is a possibility.  Suppose I only ever bought one bottle of Grandpa's Tipple from Hardbitten Liquors.  Then, my entire relationship with Hardbitten Liquors could be encompassed by this one event - the purchase of this one bottle.  Now, Purchase is an entity type, albeit non-material, so it can at least be a candidate for the role.

But can Purchase really be the same as role?  I do not think that an event can have an attribute such as Customer Lifetime Value to Hardbitten Liquors, which really refers to the individual customer.  And I do not think this can be true of any aggregate of instances of Purchase events either, supposing, for instance, that I buy one bottle of Grandpa's Tipple every week.  

So if role is not to be identified either with me or my purchases, what other entity types can it be identified with?  I need to do some more research to be able to answer that.  However, for now I am still going to stick with attributes like Customer Lifetime Value to Hardbitten Liquors as being an attribute of me.  So my original point provisionally remains: a concept can have a vast number of attributes and some methodology is needed to decide which ones to include in a definition.
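
A minimal sketch in Python may make the two options concrete.  It is only an illustration: the class names `Person` and `CustomerRole`, the field names, and the figure of 1250.0 are all assumptions invented here, not anything taken from Dave's comment or from a real system.  Option A hangs the attribute on me, qualified by the company; Option B hangs it on a separate role entity that sits between me and the company.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    # Option A: the attribute belongs to the person, keyed by company name.
    lifetime_value_to: dict[str, float] = field(default_factory=dict)


@dataclass
class CustomerRole:
    # Option B: a separate (hypothetical) entity between person and company.
    person_name: str
    company_name: str
    lifetime_value: float


# Option A: "Customer Lifetime Value to Hardbitten Liquors" predicated of me.
me = Person("the author")
me.lifetime_value_to["Hardbitten Liquors"] = 1250.0  # illustrative figure only

# Option B: the same value carried by a role entity instead.
role = CustomerRole("the author", "Hardbitten Liquors", 1250.0)

# Both structures record the same fact; they differ only in what "has" it.
same_fact = me.lifetime_value_to[role.company_name] == role.lifetime_value
print(same_fact)  # prints True
```

Either structure stores the same value, which suggests the disagreement is not about the data but about which thing the attribute is said to belong to.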

Wednesday, January 4, 2012

Legislating vs. Discovering Definitions: Radical Differences

My experience in doing definition work has mostly been from the perspective of an analyst involved in systems development.  These days it is usually data-centric applications, such as building Master Data Management (MDM) or business intelligence (BI) environments.  The method of the analyst begins with understanding scope and requirements, and then finding the business concepts and data objects that need to be defined.

However, there are other perspectives.  Terminologists, often oriented to the language translation industry, do definition work.  So, I suspect, do brand managers, who want to control messaging to customers.  I work a good deal in financial services, and I am aware that there is another group that gets involved in definitions.  These are business people who create completely new products.  For instance, Asset Backed Securities (ABS) and Collateralized Debt Obligations - both now notorious as weapons of mass financial destruction - were created by investment bankers in the past few years.

My experience with ABS begins with the legal documents of a bond issuance deal.  These documents contain the full definition of everything in the deal, the rules for how the bond is supposed to work over time, and the contractual obligations of the parties involved.  One of the tasks I was involved in was to take these documents, reverse engineer them, and create cashflow models under various scenarios to see how the bonds would perform.  A global meltdown caused by changing the credit rating of these bonds from AAA to DDD overnight was not part of these scenarios (in case you were wondering).

The definition work that goes into an ABS structure has to be precise.  It is essentially part of building a conceptual system - a new piece of reality - that will be set in motion.  A major problem in finance is that the products are all non-material.  It is not like manufacturing new designs of plastic gnomes to decorate gardens, or baking a different kind of doughnut.  The laws of metaphysics, mathematics, and nature do not supervene to automatically take care of things.  The new plastic gnome will not suddenly melt down overnight for no apparent reason.  A doughnut I place in the fridge will not evaporate for no apparent reason.  But equally strange things can and do happen in financial systems.  Contradictions, I would maintain, do not exist in material reality - but they can be present (albeit unrecognized) in financial reality.  An ABS issuance can both be AAA-rated and have significant defaulted underlying collateral at the same time.

Legislative definitions are those which are created as part of creating the concept being defined.  I agree that creating a concept is different to creating an instance of a new type of already existing concept.  Each ABS deal includes a lot of concepts that were defined previously - in other ABS deals.  However, differences can still arise.  My point is that the degree of care involved in such definitions must be much greater than that of the analyst.  An investment banker can create a Doomsday Machine if he or she puts together a flawed ABS deal.  An analyst can mess up an integration point for a data object, but can usually remedy it after discovering the problem.

So I think we can conclude that the consequences for bad definition work vary depending on what the work is being done for.  In some situations the definitions have to be rock solid from the start.  Other situations may be more forgiving.  Recognizing the risks involved is an important part of definition work.

One final point.  I have illustrated legislative definitions using examples from financial services.  However, I would maintain that the same problems apply to all sectors, e.g. telecommunications, pharmaceutical, and government.  The problems will arise in any situation where non-material reality is constructed.

Tuesday, January 3, 2012

How Many Attributes Do I Have?

Characteristics of a concept - its attributes - are central to definitions.  But to what extent should the characteristics of a concept be listed in its definition?  Should it be few, or some, or as many as possible?  As a step in beginning to answer that question, I think that we need to ask if we can reliably determine all the characteristics that a concept possesses.  And I now intend to see if I can answer that question by finding out if I can list all the attributes that I possess.

I am aware that I am an instance and not a concept (at least a general concept).  However, I would submit that there is a prima facie case that I should be able to provide a list of my attributes.  If such a list could be produced, then we could see if the attributes apply to the concept I am an instance of (humans).  We could then move on to figuring out what attributes should or should not be included in a definition.  But, if I cannot even reliably figure out what attributes I have, then I may have difficulties I have not yet recognized in the method I have chosen to get answers to the questions I am posing.

It is easy to start listing out all the physical characteristics I possess: height, weight, eye color, and so on.  I could add some non-material ones too, such as age, and IQ score.  However, from my experience as a data modeler and developer, these seem rather trivial.  I have come across many examples of database tables such as Customer, where I could conceivably be represented by a record.  These tables have columns (representing attributes) for e.g. Customer Lifetime Value, Customer Sales Year to Date, Customer Average Order Size, and so on.  I would guess that every company for which I am a customer maintains such attributes to describe me. 

Actually, I am guessing.  I know I possess a height, weight, eye color, etc., because I know what these attributes are and I know I possess them.  However, when it comes to a company of which I am a customer, say, Big Box Super Store, it is not so clear.  Specifically: (a) I do not really know what attributes Big Box Super Store considers I have; and (b) I do not know how Big Box Super Store defines each of these attributes.

Many of the Customer tables I have seen have had hundreds of columns (attributes).  Some have had thousands.  At this scale, even when working with these tables it is difficult to keep track of all the attributes they are representing.  Admittedly, the tables were not always designed well, and included columns that represented attributes that were not truly part of Customer.  But even allowing for this, the scale is still great.  Furthermore, Big Box Super Store is not the only company I buy from.  I probably have a similar relationship with about 50 other companies.  So the total number of attributes I have as a result of these relationships is certainly in the thousands, maybe in the tens of thousands.

It could be argued that many of these attributes are really the same.  Suppose Big Box Super Store calculates Customer Lifetime Value the same way as Hardbitten Liquors (of which I am also a customer).  Then, are we talking about one attribute or two?  As a practical problem, however, I cannot give an answer to this because I do not know how each company is calculating the attribute each calls "Customer Lifetime Value", or how each defines this attribute.
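
To illustrate why the name alone cannot settle the matter, here is a small Python sketch.  Everything in it is hypothetical: the two formulas, the function names, the purchase amounts, and the figure of ten expected future orders are assumptions made up for the example, since I do not know how either company actually calculates the attribute.

```python
# Hypothetical order amounts for one customer (illustrative values only).
purchases = [100.0, 50.0, 150.0]


def clv_total_spend(amounts):
    """One conceivable definition: everything the customer has spent so far."""
    return sum(amounts)


def clv_projected(amounts, expected_future_orders=10):
    """Another conceivable definition: average order size times an assumed
    number of future orders."""
    average_order = sum(amounts) / len(amounts)
    return average_order * expected_future_orders


# Both results would sit in a column called "Customer Lifetime Value".
big_box_value = clv_total_spend(purchases)    # 300.0
hardbitten_value = clv_projected(purchases)   # 1000.0

print(big_box_value == hardbitten_value)  # prints False
```

Under these assumed definitions the same customer and the same purchases yield two different values under one name, which is why the question "one attribute or two?" cannot be answered without knowing each company's definition.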

What I strongly suspect is that I carry around with me a vast burden of attributes that companies, government agencies, educational institutions, and other organizations have heaped on me, mainly without my consent, and certainly without my knowing what they are.  Not as many as the grains of sand on the seashore, or stars in the night sky, but enough to wonder at.

So the answer to the question posed in the title is that I cannot reliably say how many attributes I have, but it must be a vast number, and some of them are likely to be outside my range of understanding.  Does this present an issue for definition work?  I think it does.  It suggests some kind of need for scoping.  It also suggests that I appear in different ontologies, and that my definition in each may vary.  But the Muse of Blogging now decrees an end to the current post, so these topics will have to be taken up when Her inspiration returns.       

Friday, December 30, 2011

The Problem of Pluto: What Is Being Defined?

I wanted to return to the issue of Pluto, which has already been the subject of a number of posts.  The International Astronomical Union (IAU) created a rich array of issues and problems when it undertook a definitional change that resulted in the demotion of Pluto to the class of "dwarf planets".

The topic this time is: what exactly did the IAU define?

I was watching a PBS special on the status of Pluto a few days ago.  It included scenes from a diner where the genial Neil deGrasse Tyson was asking customers what they thought about the new status of Pluto.  The responses varied, but the issue at hand was about whether Pluto was "a planet".  The diners all thought that they were dealing with the general concept signified by the term "planet".  Yet there is reason to think they were mistaken.

The IAU resolved (see http://www.iau.org/public_press/news/detail/iau0603/) concerning the following:

"The IAU therefore resolves that planets and other bodies in our Solar System, except satellites, be defined into three distinct categories in the following way:"

So what is being defined? Answer: "planets and other bodies in our Solar System, except satellites". 

Not planets in general.  But wait a moment - on the web page referred to, it also says "Resolution 5A is the principal definition for the IAU usage of 'planet' and related terms."  Yet this is not part of the text of Resolution 5A.  It seems to be some extraneous comment of uncertain provenance.  It certainly appears to be in conflict with the text of Resolution 5A, which, again, is only dealing with the situation in the Solar System.

So we have: a lot of people thinking that the IAU defined "planet"; and the text of Resolution 5A which is defining "planets and other bodies in our Solar System, except satellites"; and a statement on the IAU web site saying that Resolution 5A is to be used for planets in general. 

This is contradictory.  The definition is for a "planet in the Solar System" but somehow can be used for a planet not in the Solar System also.  In other words, we can substitute the definition for both A and Not-A. 

Let's try that with the proposition about one of the extrasolar planets:

"51 Pegasi b is a planet that orbits the star 51 Pegasi". 

Substituting the definition presented in Resolution 5A for the term "planet" we get:

"51 Pegasi b is a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit that orbits the star 51 Pegasi."

So we have a contradiction.  51 Pegasi b apparently orbits both the Sun and 51 Pegasi.
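
The same conflict can be seen by treating the three clauses of Resolution 5A as a test and applying it to bodies.  The Python sketch below is only an illustration under stated assumptions: the `Body` class and its field names are invented here, each body is assumed to orbit exactly one star, and the values given for 51 Pegasi b under clauses (b) and (c) are placeholders rather than astronomical claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Body:
    name: str
    orbits: str                  # assumption: a body orbits exactly one star
    nearly_round: bool           # stands in for clause (b)
    cleared_neighbourhood: bool  # stands in for clause (c)


def is_planet_5a(body):
    """Apply clauses (a), (b), and (c) of Resolution 5A as quoted above."""
    return (
        body.orbits == "Sun"            # clause (a): in orbit around the Sun
        and body.nearly_round           # clause (b)
        and body.cleared_neighbourhood  # clause (c)
    )


earth = Body("Earth", "Sun", True, True)
# Placeholder values for (b) and (c); only the orbit matters for the argument.
peg = Body("51 Pegasi b", "51 Pegasi", True, True)

print(is_planet_5a(earth))  # prints True
print(is_planet_5a(peg))    # prints False
```

Whatever values are chosen for clauses (b) and (c), clause (a) alone rules out 51 Pegasi b, so a definition written this way cannot serve for planets outside the Solar System.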

This contradiction arises from the IAU restricting the definition of "planet" to the Solar System, but pretending that it can be used for any planet.  It also shows how Natural Science is dependent on Logic, which is part of Philosophy.  But that is a far more controversial topic.

Thursday, December 29, 2011

Three Classes of Identification in a Definition

Stijn commented on my earlier post "What is an Identifying Characteristic?" (http://definitionsinsemantics.blogspot.com/2011/11/what-is-identifying-characteristic.html) raising the point that "identification of a thing is dependent on the application".  He lists out things that identify him, and notes that one cannot always be substituted for another.  E.g. a passport cannot always be substituted for a driving license.  It depends on the application, and each application has rules about what can be used as identification.  Stijn asserts that trying to capture all such rules in a definition will create conflict between the parties representing the applications.  So he advises us to separate a definition from capturing such rules.

There are a lot of topics compressed into this comment, so I am only going to pick one here.  It is the different classes of identification that should be captured in a definition.  I suggest that these are:
  • Characteristics of the concept being defined that set it apart from other related concepts.  These are the classic specific differences (differentia).
  • Characteristics that can be used to recognize an instance of the concept.  This is what I was trying to highlight in the original post when I stated that an exit row in an airplane could be recognized by a sign saying "no children in this row".  There is no reason for these characteristics to be specific differences.
  • Characteristics that can be used to identify an instance of the concept.  This is what Stijn was talking about, saying his passport could be used to identify him.  Identifying an instance is not the same as identifying or recognizing a concept. 
So it turns out that identification is quite complex in definition work.  Simply talking about "identifying characteristics" as I did, does not take this richness into account.

The per-application rules that Stijn mentions add more complexity, but that will require another post.

Wednesday, December 28, 2011

Role versus Relationship - What Does it Mean for Definitions?

A couple of days ago I was reading some material on a semantics product and came upon the term "role".  We see role used in data modeling where a primary key migrated into a child entity can be assigned a "role name".  This is the name by which the attribute is known in the child entity, and is useful to disambiguate the same attribute migrated for other relationships between the same two entities.

You also hear about "role" in the party model.  Rather than say that Unindicted Broker is a client of Overleveraged Bank, and that Unindicted Broker is a prime broker for Overleveraged Bank, we can say that Unindicted Broker is a party that plays two roles with Overleveraged Bank: (a) client; and (b) prime broker.
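
As a rough sketch of the party-model idea in Python, the two statements about Unindicted Broker become one party with two role records.  The class names `Party` and `PartyRole`, the field names, and the helper `roles_played` are assumptions made for this example and do not come from any particular party-model product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str


@dataclass(frozen=True)
class PartyRole:
    party: Party         # the party playing the role
    counterparty: Party  # the party it is played with
    role: str            # e.g. "client", "prime broker"


broker = Party("Unindicted Broker")
bank = Party("Overleveraged Bank")

# One party, two roles with the same counterparty.
roles = [
    PartyRole(broker, bank, "client"),
    PartyRole(broker, bank, "prime broker"),
]


def roles_played(party, counterparty, all_roles):
    """Return, in alphabetical order, the roles one party plays with another."""
    return sorted(
        r.role
        for r in all_roles
        if r.party == party and r.counterparty == counterparty
    )


print(roles_played(broker, bank, roles))  # prints ['client', 'prime broker']
```

Note that in this sketch each role is a record in its own right, so nothing prevents it from acquiring further fields later on.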

I think that there are deeper issues here.  We think of a relationship in a data model as a line between two entities.  We cannot allocate attributes to the relationship as we can to entities.  Our notations, methodologies, and tools will not allow it (at least the commonly used ones).  Furthermore, it is relatively rare to find multiple relationships between the same entities.  When we do find quite different relationships between the same two entities we seem to start thinking of roles.

Now, a relationship is a concept, and therefore must have a description and hopefully a definition.  If a relationship can have a definition, it must have characteristics (qualities, i.e. attributes). This worries me somewhat as relation and quality are two Aristotelian categories and one should not be reducible to the other.  However, I cannot find the theoretical foundation for what I am describing.

A further issue is that what we are calling a relationship such as "Unindicted Broker is a client of Overleveraged Bank" is a generalization about a lot of processes.  Unindicted Broker had to be solicited to be a client, then onboarded as a client, and then assessed in terms of how the relationship would be managed.  All of these processes come under the umbrella of "Unindicted Broker is a client of Overleveraged Bank" but break down to many more detailed entities and relationships. So "Unindicted Broker is a client of Overleveraged Bank" is a generalization, although it is valid.

So where does this leave us?  Not very far I am afraid, but we can begin to see the outlines of the problem.  The term "role" is used in semantics, but it is not clear if it is used technically or informally.  "Role" exists in data modeling, but is for refining names of attributes associated with relations.  And "role" exists in the party model, for "high-level" relationships.  There is some evidence that relations can be broken down into more detailed entities and relations that may serve to describe a role.  Ultimately it does seem that roles can be resolved into sets of entities and relationships at the data model level.  However, at the level of semantics it is not clear how they can be treated as other than relationships.

A role does seem to demand a definition that is greater than what is to be supplied for a "regular" relationship.  A role must be distinct from other roles, or you could argue it should be collapsed with its sibling roles into one role. So at least we can conclude that if we have identified a role we need to provide it with a good enough definition to provide such distinction.  Obviously, there is a lot more to this, but that is enough for now. 

Tuesday, December 27, 2011

Is A Data Model An Abstraction?

Rob brings up a good point in his comment on The Problem of Abstraction in Definitions of Data (http://definitionsinsemantics.blogspot.com/2011/12/problem-of-abstraction-in-definitions.html).  He notes that what I am describing is not really abstraction but a number of different things.

Today it seems the term "abstraction" is used in all kinds of situations when talking about data.  For me, it is often difficult to figure out what "abstraction" is supposed to mean in any one of these situations.  I strongly suspect that at least sometimes it does not really mean anything.  Sometimes I suspect it is even used for marketing hype.

The entry for "abstraction" in Baldwin's Dictionary of Philosophy and Psychology describes how abstraction is filtering out of attributes from an instance or a concept to achieve a particular view of the instance or concept.  Rather poetically the entry describes how a child looks at a body of water and becomes fascinated by the lustre caused by the play of sunlight on the surface of the water, to the exclusion of all the other qualities (attributes) of the water.

This traditional understanding of abstraction as creating a view by filtering out attributes can be used in a special way to create the generalization hierarchies of genus and species (a.k.a. supertype and subtype, or general concept and specific concept).  The particular attributes of a group of specific concepts are left behind and attributes that the concepts have in common remain.  These are used to form the general concepts that include the specific concepts. 
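
Generalization in this sense can be sketched in a few lines of Python by treating each specific concept as a set of attributes and keeping only what they share.  The attribute lists and the names `watch`, `wall_clock`, and `generalize` are invented for the illustration; real concepts would of course have far more attributes than this.

```python
# Each specific concept is represented by a (hypothetical) set of attributes.
watch = {"tells time", "worn on wrist", "has strap"}
wall_clock = {"tells time", "hangs on wall"}


def generalize(*specific_concepts):
    """Leave the particular attributes behind and keep only the common ones."""
    return set.intersection(*specific_concepts)


# The general concept (genus) formed from the two specific concepts (species).
timepiece = generalize(watch, wall_clock)
print(timepiece)  # prints {'tells time'}
```

The particular attributes ("worn on wrist", "hangs on wall") drop out, and what remains characterizes the general concept that includes both specific concepts.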

However, abstraction as filtering out of attributes can generate other perspectives.  Abstraction does not always have to lead to the traditional generalization hierarchy.  I can understand a man's watch as a timepiece, or a piece of jewelry, or as a fashion accessory.

Now, Rob is right in that I was not using "abstraction" in the above senses.  However, I do not have a better term to use for what I was trying to describe.  The main idea I was trying to get across is that one concept system can describe or specify another - such as how a data model describes a physical database.  The relationship of "description" here is different to every other kind of relationship because the concepts present in the concept system being described have to have some kind of presence in the concept system doing the describing.  This is not the same class of relationship we see in e.g. "I own a car".

So we somehow have the presence of a concept being described (e.g. a column of a physical database table) in a concept system doing the describing (e.g. an attribute of an entity type in a data model).

Rob terms this "representation" (if I understand his comment correctly).  This has to be right.  However, a representation can often be a picture - a mere image.  Technically, this is called a "phantasm" because it does not have the attributes differentiated from the whole.  Unfortunately, the process of recognizing and separating the attributes from a phantasm is also called abstraction.  It gets more complicated.  We cannot take a photograph of a physical database and produce anything like a data model.  A database has to be conceived, not imagined.

Obviously, we are getting into a whole lot of other issues here.  I cannot really defend myself against Rob's criticism of my having overloaded (or over-abstracted?) the term "abstraction".  However, I do not have a commonly accepted set of terms that I can use to convey the idea of one concept system describing another.  More of an excuse than a reason, but it will have to do for now.