Monday, December 12, 2011

Data-Centricity vs. Concept-Centricity - Supposition vs. Definition

Definitions of concepts are extremely valuable, but are they sufficient for data management?  There seems to be a need for something more - for "data-centric definitions".  By this I am referring to the definition of a table, or column, or other data object in a database.  A concept definition - a full description of a business concept - is undoubtedly needed for a data object, but there is a strong case that more is required to form an adequate "data-centric definition".  I do not really like calling what I am discussing here "data-centric definitions", because a definition has traditionally been thought of as a property of a concept, not of a data object.  However, the term "data-centric definition" at least focuses us well on the area of interest, so I will keep it for now.

Let us first try to discover if there is anything in logic that relates to the topic we are discussing.  And indeed there is. Definition has been part of logic since its inception, and continues to be an area of interest, even in modern logic.  However, an idea called "supposition" goes beyond definition and seems to have something to say about data-centric definitions.  "Supposition" appears to be a purely Medieval idea, apparently not appearing in either ancient or modern logic. 

Supposition works like this.  Imagine I can successfully come up with a definition of "employee" for my enterprise.  The definition, as such, describes the concept of employee.  It has nothing to say about any actual employees.  It is only when I begin to use the term "employee" in propositions that I have to consider exactly what set of instances (actual employees) I am referring to.  The definition alone cannot help me with this.

If I make a database table called "employee" I have to ensure that I am conforming to the definition of the concept "employee".  But the table may be designed to capture a specific set of employees.  Perhaps the table is only intended to store US employees, or Canadian employees.  Perhaps it is only intended to store current employees, or perhaps only past employees.  None of this information about the nature of the table can appear in the definition of the concept "employee".  This information is the supposition of "employee" - better yet, the supposition of "employee" in a data-centric context.  To work effectively with my employee database table, I need the supposition as much as the definition.
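To make the distinction concrete, here is a minimal sketch of how a definition and a supposition might be recorded separately for the same table.  Everything in it is an assumption made for illustration: the class names ConceptDefinition and TableSupposition, the scope attributes "country" and "status", and the placeholder definition text are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptDefinition:
    """Describes the business concept only; it says nothing about instances."""
    term: str
    text: str


@dataclass
class TableSupposition:
    """Records which instances of the concept a given table is meant to hold."""
    table_name: str
    concept: ConceptDefinition
    # Hypothetical scope attributes, e.g. {"country": "US", "status": "current"}
    scope: dict = field(default_factory=dict)

    def admits(self, row: dict) -> bool:
        """Return True if the row falls inside the table's intended scope."""
        return all(row.get(key) == value for key, value in self.scope.items())


# One concept definition (placeholder text, not a real enterprise definition)...
employee = ConceptDefinition(term="employee", text="<placeholder definition text>")

# ...and two tables that share it but stand for different sets of instances.
us_current = TableSupposition("employee", employee, {"country": "US", "status": "current"})
ca_past = TableSupposition("employee", employee, {"country": "CA", "status": "past"})

row = {"name": "A", "country": "US", "status": "current"}
print(us_current.admits(row))  # the row is in scope for the US/current table
print(ca_past.admits(row))     # the same row is out of scope for the CA/past table
```

Both tables conform to the same definition of "employee"; only the supposition tells me which rows belong in which table.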

So what is supposition?  Here I will paraphrase George Hayward Joyce's discussion in "Principles of Logic".  The supposition of a term in a proposition must be understood because even a univocal term (a term which signifies only one concept) can be construed in various ways.  For the logicians there were three main forms of supposition:

(1) Collective and Distributive Use.  When anything is affirmed or denied of a plural subject, the predicate may apply: (a) to individuals (instances); or (b) to the individuals taken as a group.  (a) is called the distributive use (suppositio distributiva) and (b) is called the collective use (suppositio collectiva).  E.g. for (a) "The employees attended a town hall meeting" applies only to the individuals that came to the event.  E.g. for (b) "The employees in our enterprise sign an employment contract" applies to all employees, without having to think about any instances. 

(2) Real and Logical Use.  Is the term being used: (a) as it applies to the real order, or (b) as it is at the conceptual level?  E.g. for (a) "Employee A is sitting at his desk right now" - this refers to reality.  E.g. for (b) "I am working on the definition of 'employee' right now" - this refers to my dealing with the concept of "employee", and not any individuals who are employees.  In the logicians' Latin, (a) is called suppositio realis and (b) is called suppositio logica.

(3) Material Supposition (suppositio materialis).  This one is much closer to our ideas about metadata.  It happens when I am referring to the sign I am discussing.  E.g. if I say "'Employee' is a word consisting of eight letters", I am not dealing with the concept, nor any individuals, but only discussing the sign.  Aspects of naming conventions in databases intended to keep the names of data objects reasonably short would seem to fall into this category.
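Material supposition is the easiest of the three to illustrate in code, because a program that checks a naming convention is working with the sign alone.  The sketch below is only an illustration: the function name and the limit of 30 characters are assumptions of mine, not a rule from any particular database product.

```python
# Hypothetical naming-convention limit, chosen only for illustration.
MAX_NAME_LENGTH = 30


def name_is_short_enough(name: str, limit: int = MAX_NAME_LENGTH) -> bool:
    """Check the sign itself: its length, not the concept or its instances."""
    return len(name) <= limit


sign = "Employee"
print(len(sign))                   # counts the letters of the word, nothing more
print(name_is_short_enough(sign))  # True under the assumed limit of 30
print(name_is_short_enough("employee_termination_reason_code_description"))
```

Nothing in this check depends on what "employee" means or on which employees exist; it would behave the same way for any string of the same length.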

I suspect that the distinction between supposition and definition may answer some of the problems brought up under the heading of "context" by data managers.  It seems to me that "context" is an overloaded and overabstracted term that refers to a number of quite different issues.

I hope this proves that there is a theoretical basis for "data-centric definitions" and that supposition provides part of it.  We have a lot to follow up on regarding "data-centric definitions" and will be returning to it in the future.
