Friday, January 19, 2007

WCF: Core categories of data contracts

One of the famous SOA tenets is "services share contract, not class/implementation", meaning that it is the schema of your contract that is the main conveyor of how to consume the operations provided by your service. This has a huge impact on how you should design your contracts to provide for clear, understandable and comprehensive semantics, and also to minimize ambiguity in how to use your service. Contracts that have subtle or vague semantics are just more difficult to use and are thus more error prone. The same applies to contracts that are too flexible.

This post is about how to design data contracts that are simple to use rather than easy to implement (simple vs. easy), while keeping the number of data contracts to a minimum. The latter is important both for the consumers of your service and for the maintainability of your service. It is also important for SOA governance: the less stuff you have to govern, the better. Fewer schemas, less semantics, less maintenance, less governance.

Data contracts belong to one of two groupings: altering state and querying information. Generally speaking, operations that modify your system need to comply with stricter requirements and rules than operations that read data from your system. This is because operations that can leave your system in an invalid state have greater technical impact on your business than operations that just return information. Of course, if you disclose incorrect information, your business could be in serious legal trouble.

The two data contract groupings can be further refined into several categories based on the different needs for expressing contract semantics and for being unambiguous. These five data contract core categories have manifested themselves through several more or less service-oriented solutions that I have implemented:
  • Insert/update contracts: Typically one contract per domain object. Optional contained data contracts must be avoided or specifically handled.
  • Delete contracts: Typically one contract per domain object.
  • Specification/criteria contracts: Typically one contract per result contract, but it is not uncommon that a single specification can relate to multiple result contracts. Optional members are perfectly standard; the same applies to nullable criteria. Composite specifications are normal.
  • Read/query result contracts: One or more contracts per domain object. Optional contained data contracts are allowed for flexibility and this is a key mechanism for keeping the number of result contracts to a minimum. Composite contracts are also allowed for the same reasons.
  • Batch update/import contracts: Typically one contract per domain object batch operation type. Composite contracts are normal. Optional composite or contained contracts must be specifically handled.
These are core data contract categories for entity/core services. You will need to have more than just these core data contracts to provide good, event-driven, specialized business process services (EDA) in different contexts (sales, support, accounting, logistics, partners, suppliers, customers, etc).

The term ‘domain object’ also comprises complex objects (aggregate root objects) such as an order or a document card. The term ‘contained’ is used for objects nested inside a complex object. The term ‘composite’ is used for contracts that consist of several domain objects. The term ‘batch update’ includes insert, update and delete actions or a combination of these actions.


Note that I use CRUDy terms in the categories for simplicity (easier for me), to cover any real-life event that affects the state of a domain object. E.g. the “customer has moved” event falls into the “update” category.

A result contract will typically contain a composite structure of domain objects, defined by exactly the same unambiguous data contracts used for insert/update actions. The main reason for defining data contracts in the first place is to promote standardization and reuse across services and operations. To support both the rigid insert/update data contract requirements and the flexible result contract requirements, it becomes a must to separate structure from data, isolating the structure/composition to the result set data contracts. Structural elements in a data contract implicitly impose subtle semantics: how will the service handle the omission of composite/contained domain objects?

Insert/Update Contracts

It is important that insert/update contracts have little room for ambiguity, especially for complex domain objects. E.g. if the customer data contract contains a collection of addresses, what will happen if a customer update action is performed and no addresses are provided: does it mean that the customer no longer has any addresses, or does it just mean that you can update a customer's phone number without having to specify the addresses?

Such contained objects must be either A) required or B) specifically handled and by default optional/ignored. Controlled optional elements can be handled the way that the .NET 1.x XmlSerializer handled optional elements: using an extra property to indicate the state of the optional element. The XmlSerializer uses a Boolean XxxSpecified property for each optional element, e.g. OrderShippedDateSpecified.

Rather than using just a Boolean for the contained optional object, I recommend using an enumeration that contains Ignore (default value) and then some other applicable actions; much like cascading actions in SQL Server. The customer data contract should contain both an AddressList collection and an AddressListAction enumeration with e.g. the values Ignore, Replace, Alter, Purge. The point is that the user has to explicitly assign an action on the contained collection, rather than the service just assuming that an empty collection means deletion of the existing children. Assumptions are semantic coupling, and that is something you should strive to avoid.
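The action enumeration can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than WCF code. The enumeration values come from the text above, but the exact meaning given to Replace, Alter and Purge here, and the names `Address`, `CustomerUpdate` and `apply_address_action`, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class AddressListAction(Enum):
    # Assumed semantics; the post only names the values.
    IGNORE = "Ignore"    # default: leave the stored addresses untouched
    REPLACE = "Replace"  # stored addresses become exactly the supplied list
    ALTER = "Alter"      # supplied addresses are added or overwritten by key
    PURGE = "Purge"      # remove all stored addresses


@dataclass
class Address:
    key: str
    street: str


@dataclass
class CustomerUpdate:
    """Hypothetical insert/update contract for a customer."""
    customer_key: str
    phone: str
    address_list: List[Address] = field(default_factory=list)
    address_list_action: AddressListAction = AddressListAction.IGNORE


def apply_address_action(stored: Dict[str, Address],
                         update: CustomerUpdate) -> Dict[str, Address]:
    """Return the new address set; intent is never inferred from an empty list."""
    action = update.address_list_action
    if action is AddressListAction.IGNORE:
        return dict(stored)
    if action is AddressListAction.PURGE:
        return {}
    supplied = {a.key: a for a in update.address_list}
    if action is AddressListAction.REPLACE:
        return supplied
    # ALTER: keep stored addresses, add or overwrite the supplied ones
    return {**stored, **supplied}
```

With this shape, a phone-number update that leaves `address_list` empty keeps the existing addresses, because the default action is Ignore; deleting them requires an explicit Purge or Replace.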

Note that these 'insert/update' contract recommendations apply to entity/core services, which are not the services you want to expose publicly. Your public services need to reflect the events of your service-oriented business processes, and these "published" services belong to the 'application to application services' category. By layering your services according to the four service categories, you will be able to expose more specialized operations with smaller contracts. Large contracts imply stronger coupling to the service, and as large contracts are more likely to change, your service will be more subject to breaking changes. Small contracts are simpler contracts, and simple contracts are important for the reusability, reliability, quality and robustness of your service (more about this in "Patterns for High-Integrity Data Consumption and Composition" by Dion Hinchcliffe).

You can still provide a very specific business operation that builds on the core service. E.g. the "customer has moved" event can be supported by a composite operation that takes only the customer key and the new postal address. Internally it gets the complete customer, alters the address, and then stores the customer, all in a single transaction using the core services.
Services at the A2AS layer allow you to be "liberal in what you accept" as they shield the consumers from the details of the core services.
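A rough sketch of such a "customer has moved" operation, in Python for illustration only. `CustomerCoreService` is a hypothetical in-memory stand-in for the core service, and all class and function names are assumptions; a real implementation would wrap the get/alter/store steps in a transaction, which is omitted here.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Customer:
    key: str
    name: str
    phone: str
    postal_address: str


class CustomerCoreService:
    """Hypothetical in-memory stand-in for the entity/core service."""

    def __init__(self) -> None:
        self._store: Dict[str, Customer] = {}

    def get(self, key: str) -> Customer:
        return self._store[key]

    def save(self, customer: Customer) -> None:
        self._store[customer.key] = customer


@dataclass(frozen=True)
class CustomerHasMoved:
    """Small public contract: only the key and the new postal address."""
    customer_key: str
    new_postal_address: str


def handle_customer_has_moved(core: CustomerCoreService,
                              message: CustomerHasMoved) -> Customer:
    # Get the complete customer, alter the address, store it again.
    # (Transaction handling is left out of this sketch.)
    current = core.get(message.customer_key)
    updated = replace(current, postal_address=message.new_postal_address)
    core.save(updated)
    return updated
```

The consumer never sees the full customer contract; it only has to supply the two values the business event is actually about.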

Read/Query Result Contracts

Result data contracts should be able to fit multiple needs and support several views of domain objects and composite result sets. At the same time, a consumer should be able to control how much information gets returned from the service. E.g. one consumer might not be interested in address information when fetching customer data. Thus, a result data contract will most likely comprise optional elements, and consumers will not fail if some data is not present in the result set.

An empty collection does not normally imply the same ambiguity for reads as it does for insert/update contracts. A consumer will typically assume that if a fetched complex object contains no elements for a contained data contract, then the object does not have any such children; e.g. that a customer has no addresses if the customer AddressList collection is empty. An extra metadata property could be added to the result data contract as an indication of whether an optional element actually contains data even if not returned due to the processed query specification.

Note that ‘not present’ in the result set is not the same as ‘missing’ from the result set, which is clearly an error and should have caused a service fault.
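The combination of a query specification, an optional contained element and the extra metadata property might look like this. It is a sketch under assumptions: the names `CustomerSpecification`, `CustomerResult`, `has_addresses` and `fetch_customer` are made up for the example, and using `None` for "not requested" versus an empty list for "no addresses" is one possible convention, not something the categories above prescribe.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CustomerSpecification:
    """Hypothetical criteria contract; the consumer controls what is returned."""
    customer_key: str
    include_addresses: bool = False


@dataclass
class CustomerResult:
    """Hypothetical result contract with an optional contained element."""
    customer_key: str
    name: str
    # None means "not requested"; an empty list means "no addresses".
    address_list: Optional[List[str]] = None
    # Metadata: True if the customer has addresses, even when not returned.
    has_addresses: bool = False


def fetch_customer(store: Dict[str, dict],
                   spec: CustomerSpecification) -> CustomerResult:
    # A KeyError here stands in for a service fault: the data is 'missing'.
    record = store[spec.customer_key]
    addresses = record["addresses"]
    return CustomerResult(
        customer_key=spec.customer_key,
        name=record["name"],
        address_list=list(addresses) if spec.include_addresses else None,
        has_addresses=bool(addresses),
    )
```

A consumer that did not ask for addresses can still tell from `has_addresses` whether a follow-up query would return any.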

Batch Update/Import Contracts

Batch update contracts are typically used to alter the state of a set of (related) domain objects, e.g. updating the TaskList collection of a project by sending a message that contains the tasks to add, modify and remove as one batch. Batch operations are a good way to avoid having to expose transactions outside your service: package all domain objects that must be altered in a transaction into a single message and perform the update using a single transacted operation.

Note that each data contract must still follow the rules described for ‘insert/update contracts’ even when used as part of a batch contract.

To be ideal objects for batch operations, domain objects should expose a “row-state” property; if they don’t, you need something like ‘Service Data Objects’ to make your batch contracts really simple to use. A message with one collection per action should be the last alternative.
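A minimal sketch of a batch contract built on a row-state property, again in Python for illustration. The `RowState` values, the `Task` and `ProjectTaskBatch` contracts and the `apply_task_batch` function are assumptions for the example; working on a copy of the stored tasks is a simple stand-in for the single transacted operation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RowState(Enum):
    ADDED = "Added"
    MODIFIED = "Modified"
    DELETED = "Deleted"


@dataclass
class Task:
    key: str
    title: str
    row_state: RowState


@dataclass
class ProjectTaskBatch:
    """Hypothetical batch contract: one collection, each item carries its action."""
    project_key: str
    tasks: List[Task]


def apply_task_batch(stored: Dict[str, str],
                     batch: ProjectTaskBatch) -> Dict[str, str]:
    """Apply the whole batch or nothing; `stored` maps task key to title."""
    working = dict(stored)  # copy, so a failure leaves `stored` untouched
    for task in batch.tasks:
        if task.row_state is RowState.ADDED:
            if task.key in working:
                raise ValueError(f"task {task.key} already exists")
            working[task.key] = task.title
        elif task.row_state is RowState.MODIFIED:
            if task.key not in working:
                raise KeyError(task.key)
            working[task.key] = task.title
        else:  # RowState.DELETED
            if task.key not in working:
                raise KeyError(task.key)
            del working[task.key]
    return working
```

Because every task states its own action, the message needs only one collection instead of separate add, modify and remove collections.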
