Monday, July 02, 2007

SOA: Canonical "Data" Model

An important topic when designing service oriented systems is how to enable different services to share semantics so that they can be composed into working solutions. Jack van Hoof has written a good article about this: How to mediate semantics in an EDA. A few weeks later, Nick Malik posted another good read on the same topic: Canonical Model, Canonical Schema, and Event Driven SOA. Read Jack's post first.

They both talk about using a canonical data model (CDM) as the Esperanto / Babel fish for mapping between the formats and semantics of the disparate systems taking part in a SOA solution. Note that CDM is not about having a common data model (EAI CDM) or a shared database across all systems in an enterprise; don't get fooled by the "data" in the term 'canonical data model'. CDM is not about making everybody speak English, but rather about having a CDM translator for each native system.
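To make the translator idea concrete, here is a minimal Python sketch. The canonical order type, the two native systems ("crm" and "billing") and all field names are hypothetical, made up for illustration only. The point is that each system gets one translator into and one out of the canonical model, so a message travels native → canonical → native, and N systems need 2N translators instead of up to N(N-1) one-way point-to-point mappings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class CanonicalOrder:
    """Hypothetical canonical order, used only for messages between systems."""
    order_id: str
    customer_id: str
    total_cents: int


# One translator into and one out of the canonical model per native system.
# System names and native field names are made up for illustration.
TO_CANONICAL: Dict[str, Callable[[dict], CanonicalOrder]] = {
    # The CRM stores amounts in currency units; round to whole cents.
    "crm": lambda m: CanonicalOrder(m["OrderNo"], m["CustNo"], round(m["Amount"] * 100)),
    "billing": lambda m: CanonicalOrder(m["invoice_ref"], m["account"], m["amount_due"]),
}

FROM_CANONICAL: Dict[str, Callable[[CanonicalOrder], dict]] = {
    "crm": lambda o: {"OrderNo": o.order_id, "CustNo": o.customer_id, "Amount": o.total_cents / 100},
    "billing": lambda o: {"invoice_ref": o.order_id, "account": o.customer_id, "amount_due": o.total_cents},
}


def route(message: dict, source: str, target: str) -> dict:
    """Translate a native message: source format -> canonical -> target format."""
    canonical = TO_CANONICAL[source](message)
    return FROM_CANONICAL[target](canonical)


def translator_count(systems: int) -> Tuple[int, int]:
    """Return (translators with a canonical model, one-way point-to-point mappings without one)."""
    return 2 * systems, systems * (systems - 1)
```

For example, `route({"OrderNo": "A-17", "CustNo": "C-42", "Amount": 99.5}, "crm", "billing")` returns `{"invoice_ref": "A-17", "account": "C-42", "amount_due": 9950}`, and `translator_count(10)` gives `(20, 90)`. A change to one native format only touches that system's pair of translators.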

Btw, Gregor Hohpe sometimes uses the term 'canonical domain model' on his blog, while using the term 'canonical data model' in the book "Enterprise Integration Patterns" [Hohpe/Woolf CDM pattern (355)]. I think it is better to talk about the business domain rather than about "data", as this helps focus on the business processes rather than on databases and other technology. You'd be surprised how many biz people concern themselves with how the data model looks; maybe a leftover from the client-server days, to show that they know what an ER diagram is? Focus on designing a business process information model (BPIM) for each business process domain.

Trying to enforce a One True Schema across your services (everyone has to speak English) is not a viable path, and it is also a recipe for future maintenance hell. Making every service contract depend on the One True Schema will make it impossible for the services to evolve separately; they will no longer be autonomous. A simple change to, e.g., the order entity will cause a ripple effect through all referring services. This is where the business process information model comes into play: it allows you to version and evolve the services independently of each other.

The Canonical "Data" Model concept is also sometimes referred to as a Common Information Model (CIM). Both the business process information model (BPIM) and the data-focused CDM/CIM models have the same goal: mediation of semantics. However, they are not the same, as the two other models are both variations of the common data model approach. The business process information model is about semantic business process integration, not only semantic data integration.


Steve Jones said...

On the other hand I'd say that Single Canonical Form doesn't work for SOA

If you are consuming 3rd party services, then you can't mandate "your" view of the world; equally, having a "mega schema" that is the ultimate truth creates a fragile base class.

CDM is one of those beguiling things: in theory it's ideal, in practice it fails.

Kjell-Sverre Jerijærvi said...

You're right about the single/superset approach being bad, and I think both Jack and Nick mean the subset approach. See Nick's follow-up post on CDM design (communicate sparingly, allow for questions): Getting the Enterprise Canonical Data Model right

I agree that the subset approach is better; that is also why I like the term "domain" (as in Domain Driven Design) better than "data", as this constrains the canonical model to a bounded context and requires you to draw a context map, and then apply a translator/anti-corruption layer against services/systems that are not within the domain. If there exists a mapping between the domains in the context map, then one can think of having 'federated canonical domain models'.
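A minimal sketch of such an anti-corruption layer, assuming a hypothetical 3rd-party credit rating service that answers in its own terms (band codes "A", "B", "C" and a `maxCredit` string). All names are invented for illustration; the idea is that the partner's model is translated at the boundary, so the sales domain only ever sees its own `CreditDecision`.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Callable


class CreditStatus(Enum):
    """Sales-domain concept; the domain never sees the partner's band codes."""
    APPROVED = "approved"
    NEEDS_REVIEW = "needs_review"
    REJECTED = "rejected"


@dataclass(frozen=True)
class CreditDecision:
    customer_id: str
    status: CreditStatus
    limit_cents: int


# Hypothetical mapping from the partner's band codes to our domain statuses.
_BAND_TO_STATUS = {
    "A": CreditStatus.APPROVED,
    "B": CreditStatus.APPROVED,
    "C": CreditStatus.NEEDS_REVIEW,
}


class CreditCheckAntiCorruptionLayer:
    """Translates the (hypothetical) partner response into the sales domain model."""

    def __init__(self, fetch_partner_rating: Callable[[str], dict]):
        # The transport to the partner is injected, so the layer is easy to stub.
        self._fetch = fetch_partner_rating

    def decide(self, customer_id: str) -> CreditDecision:
        raw = self._fetch(customer_id)
        # Unknown or missing bands are treated as REJECTED in this sketch.
        status = _BAND_TO_STATUS.get(raw.get("band", ""), CreditStatus.REJECTED)
        # The partner sends the limit as a decimal string; convert to whole cents.
        limit = int(Decimal(raw.get("maxCredit", "0")) * 100)
        if status is CreditStatus.REJECTED:
            limit = 0
        return CreditDecision(customer_id, status, limit)
```

With a stubbed partner returning `{"band": "B", "maxCredit": "1500.00"}`, `decide("C-42")` yields an APPROVED decision with a limit of 150000 cents. Treating unknown bands as REJECTED is a design choice of this sketch, not a rule. If the partner changes its codes, only this layer changes, not the domain.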

Partner/3rd party/outsourced services are not part of the core biz domain according to DDD.

Anonymous said...

Using a Common Data Model with SOA

Anonymous said...

A possible reason for 'biz people' concerning themselves with how the data model looks is that they know it's a way of ensuring that the implemented data structures reflect their actual requirements. As a communication tool, a good old fashioned E-R model is often a lot easier for the 'biz people' to understand and validate than a bunch of XSDs. Unfortunately the need for full and clear communication with the business is often overlooked in this kind of exercise.

Anonymous said...

"Trying to enforce a One True Schema across your services (everyone has to speak English) is not a viable path, and it is also a recipe for future maintenance hell."

Wouldn't it be the same hell when there are multiple schemas for multiple services?
Can't the domain model be called a subset of the data model? With the domain model there could be cross-domain issues, as we are speaking about SOA here.

Kjell-Sverre Jerijærvi said...

There should be only one schema per published 'process services' domain, limiting the ripple effect to the semantic mapping between domains (i.e. the ECDM).

The ECDM should be as small as possible (least common denominator) to keep the maintenance cost of the ECDM to a minimum. Note that the ECDM applies only to the orchestration between the domains, e.g. in a service bus layer (ESB).

Thus, the activity/capability/entity services are all independent of the ECDM and of each other's schemas, and they should not depend on one true, mandatory mega schema.

You're right that the ECDM will most likely be a subset of the complete schemas of the underlying business processes; that is the point of the ECDM. The services will have to "ask for more" if the current state/document of a biz process needs to be augmented with more reference data to move the process to the next step.
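A minimal sketch of the "ask for more" idea, assuming a hypothetical ECDM event `OrderPlaced` that carries only identifiers and a total, and a shipping service that needs a delivery address to move the process to its next step. The `lookup_address` function stands in for whatever query the owning domain publishes; its name and signature are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical ECDM event: only the least common denominator between domains."""
    order_id: str
    customer_id: str
    total_cents: int


@dataclass(frozen=True)
class ShippingRequest:
    """Shipping-domain document; richer than the ECDM event it starts from."""
    order_id: str
    customer_id: str
    delivery_address: str


def prepare_shipment(
    event: OrderPlaced,
    lookup_address: Callable[[str], Optional[str]],
) -> ShippingRequest:
    # The ECDM event lacks the address, so "ask for more" from the owning domain.
    address = lookup_address(event.customer_id)
    if address is None:
        raise LookupError(f"no delivery address for customer {event.customer_id}")
    return ShippingRequest(event.order_id, event.customer_id, address)
```

Passing `{"C-42": "1 Example Street"}.get` as the lookup yields a `ShippingRequest` for order A-17 with that address, while an unknown customer raises `LookupError`. Keeping the address out of `OrderPlaced` means a change to the address format never ripples into the ECDM.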

Anonymous said...

SOA - Common Information Model (CIM) - Part 1

Anonymous said...

InfoQ news:
Opinion: SOA doesn’t need a Common Information Model

Anonymous said...

More on the common data model thing