Sybase iAnywhere SQL AAnywhere Mobile and Embedded Database

I'd rather play golf


Thoughts on data management, autonomic computing, and self-managing database systems.

header image

Patterns of Data Modeling

February 22nd, 2011 · No Comments

Database design is all about creating abstract – or conceptual – models of the real world that can be manipulated to support the real, live enterprise. Most of the time we have an intuitive idea of what the business entities are that need to be represented in the model: customers could be modeled with a customer entity, invoices with an invoice entity, and so on. Subsequently, we then translate these abstract “entities” into relational tables with primary key and foreign key constraints.

This intuition has a formal methodology, of course: it’s relational normalization theory, which among others encompasses functional dependencies, multi-valued dependencies, and inclusion dependencies. For those readers that find normalization theory too difficult to fully grasp, a recent book by Michael Blaha [1] looks at schema design in a different way – by mimicking the use of Design Patterns in object-oriented application programs. The book was announced in the summer of 2009 and is now available as a softcover book, with excerpts available through online channels.

Blaha enumerates a selection of schema design “patterns” that can be used as the basis for designing specific solutions for a particular business problem. Of the six parts in the book, there are two main ones, Parts I and III:

  • A description of generic models of what the author terms “mathematical templates” that include trees, directed and undirected graphs, item description (homomorphism), and star schema.
  • A description of “archetypes”, detailed examples of UML models for 20 specific business entities such as Address, Customer, Item, Flight, Payment, and so on.

Each of the enumerated patterns are illustrated as both IDEF1X and UML diagrams, often along with sample relational instances and example SQL queries to go along with them. To illustrate, from the book (page 30) here is the UML diagram for a management hierarchy, modeled using the tree pattern with modifications to handle historical information:

Blaha_Figure_2-39

along with sample SQL code to illustrate what a query over the more generic example model would look like:

Blaha_Figure_2-37

Strengths

Patterns of Data Modeling contains a wealth of ideas about how to go about modeling common business entities – Blaha’s archetypes – in addition to describing more abstract designs. This is a useful book. The enumeration of these modeling patterns can be helpful when trying to decide what the tradeoffs are in designing a conceptual model, and perhaps more importantly what pitfalls to avoid. Part VI of the book is devoted to transforming these UML models into relational database designs, and part of the discussion of mapping these models into relations concerns implementation tradeoffs. One particular example is how to model an Entity-Type Hierarchy (ETH) as a set of relations.

Weaknesses

While the book enumerates a plethora of pattern models, and is by no means brief at 245 pages, many of the book’s sections cover the material only superficially, leaving it to the reader to investigate the cited references for further background information and other omitted detail. For example, Section 16.5, which concerns selecting primary keys, explains almost nothing concerning the tradeoffs of key selection, which is of critical importance in designing a computer system (see, for example, here and here). Another weakness of the text is that, while it presents SQL examples for the generic models, it does not do so for the archetypes, nor does it consider the performance tradeoffs of specific aspects of the given model. For example, consider the SQL query above from Figure 2.37. The disjunctions are necessary because of the use of NULL to indicate the current time period (i.e., an unknown effective date). There are performance implications to that (simple) modeling decision, which are important because, after all, our desire is to not only design a schema that appropriately models the real world, but also to design and implement a computer system that can effectively and efficiently manipulate that model.

[1] Michael Blaha (2010). Patterns of Data Modeling. CRC Press, Boca Raton, Florida. ISBN 978-1-4398-1989-0.

Tags: Computer Science education · Database Administration