Wednesday, November 05, 2008

SharePoint: Common Reference Data Management

There are two kind of common reference data typically used in SharePoint:

  1. Data that is native to the portal
  2. Data that provided by external systems such as CRM, ERP, SCM
The usage and maintenance of the common reference data differs by their information type classification as described in later sections. Doing a thorough information architecture analysis is important for this aspect of SharePoint also.

The sharing of common data across multiple site-collections will also affect how native data can be stored, accessed and maintained.

External Common Data

External data used in SharePoint should not be imported into SharePoint lists; but must rather be used as pure reference data, and be maintained in their native systems. This is to avoid replication as much as possible due to the extra implementation and operational effort each added replication requires.

Use the web-parts provided by the external system to view the external data. Usage of the external data as lookup columns in lists is restricted to either:

The “Business Data Catalog” (BDC) is the standard MOSS mechanism for utilizing external data in MOSS, but this will require MOSS enterprise edition. Note that the BDC "Business Data" column cannot be used as a site column and hence not be part of a site content type.

Suggested third-party components:

Always test and verify that custom columns can be used as site columns if you plan to use them in site content types. Also test and verify that custom columns will work in DIP when used in a document content type, even if only added to list content types.

Native Common Data

Native reference data should be stored and managed in SharePoint lists. The advantage of using SharePoint lists is that you get a data management UI for free: just use standard SharePoint to maintain the data. The downside is the need for replication of shared data across site-collections that use the reference data, as OOTB lookup columns are restricted to a single site-collection.

The common data lists will be used as reference data sources for site columns of type “lookup”. This will ensure that common data is based on native SharePoint aspects, possibly enhanced with third-party SharePoint components.

It is recommended to create the reference data list and the corresponding lookup site column at the root site of the site collection, as the lookup column can then be used in lists in any sub-site to lookup the reference data (
cross-site lookup).

Externalized Native Common Data - External/Hybrid

Sharing common data in SharePoint lists across site-collections is not trivial. It must be considered if native common reference data should be externalized to a custom database and be treated as external data. The downside is that an application must then be implemented to allow for maintaining the externalized data. Data-aware tools such as InfoPath, Excel or Access can be utilized as the data maintenance front-end.

A variant of the external database is to use SharePoint itself as the external database, then you get the best from both alternatives (hybrid): a real master for the common data and the SharePoint UI for free. Use either a BDC lookup column or a third-party lookup column that works across site-collections.

Externalizing the native common data to a separate database will require more work on backup/restore. Using the SharePoint database avoids this.

Suggested third-party components:

Always test and verify that custom columns work across site-collections.

Replicating Common Data across Site-Collections

Native common data will be managed centrally in a master site collection, replicated to multiple target site collections. There will be one master list per target list type. Lists in sites in the target site collections can then reference the target common data lists using columns of type “lookup”.

Note: always prefer using externalized native common data instead of replication.

There are two options for how to maintain and replicate the shared data:

A) Overwrite data in targets on replication, i.e. data must only be entered in master as any updates to data in target lists will be lost on next replication. Updating target lists must be prohibited.

B) Two-way data replication to allow for updating data in targets in addition to master. Conflict resolution rules must be configured so that the master wins any auto-resolvable conflicts.

Which option to use for a list depends on the classification of the information asset that the list contains. E.g. country and product lists need not be updated except from in the master; while it is critical for the business that project information assets like contacts can be updated and used immediately.

Note that there will be latency in the replication of changes for both option A) and B). This latency can be configured. The impact on this replication latency must be analyzed based on the information types identified as common portal data. Different information types will have different tolerance for data staleness.

For option A) it must be considered to use a remote list viewer/editor in relevant sites to allow for editing master lists “remotely” from target sites.

Suggested third-party components:

Always test and verify that custom components such as web-parts and column types can be replicated.

SharePoint Lookup Column Types

Note that the standard “lookup” column is not a many-to-many relationship; it is just a choice list with a multi-select option. There are several third-party lookup types available to alleviate this limitation.

Suggested third-party components:

Always test and verify that custom columns can be replicated if used in lists and content types.

Codename "Bulldog" - Master Data Management

Microsoft last year acquired the Stratature +EDM master data management (MDM) platform. It will be incorporated as part of the next version of SharePoint, however only a few details are public. Read the roadmap here.

1 comment:

SharePointFrank said...

If full SharePoint list features are required for external data too, e.g. workflow and alerts you also can use the 3rd party BDLC tool found here: