Thursday, May 08, 2014

Getting SSL Termination to work for HNSC in SP2013

We have been struggling a bit with getting off-box SSL termination to work properly for SharePoint 2013 host-named site collections (HNSC). We had issues with the ribbon, with admin pages like "manage content and structure", and with the term picker: sure signs that some JavaScript files did not load. Users could not edit the terms in managed metadata fields; terms could be selected, but clicking "ok" to save would just hang forever. A lot of scripts and links would not load, showing mixed content warnings in IE9 and nothing at all in Chrome and Firefox, which both just block HTTP content on secure HTTPS pages.

To cut to the chase, this setup for SSL offloading is what worked for us:
  • create the web-app on port 80 (not 443), do not use the -SecureSocketsLayer switch
  • do not use the server name as the web-app name, you have a farm - don't you?
  • always extend the web-app to other zones before starting to create HNSC sites
  • create a classic root site-collection with the same HTTP URL as the web-app, do not use a HNSC for this
  • a site template is not required for the root site-collection
  • alternate access mapping (AAM) is used for load balancing even for HNSC, but HNSCs can't use AAM for host name aliases
  • create the HNSC using an internal HTTP URL in New-SPSite for the default zone, remember that crawling must always use the default zone
  • create a public URL alias for the default zone by mapping an unextended zone using an HTTPS URL in Set-SPSiteUrl
  • create public HNSC mappings using HTTPS URL in Set-SPSiteUrl for the other zones
  • ensure that your gateway adds the custom header "front-end-https: on" for all your public URLs secured using SSL
  • note that using just "front-end-https: on" and HTTP in the public URL will not correctly rewrite all links in the returned pages
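To double-check a URL plan against this checklist before running New-SPSite and Set-SPSiteUrl, the rules can be written down as a small Python sketch. It only models the variant without SSL on the web-app: exactly one Default zone URL (the internal HTTP URL given to New-SPSite, used for crawling), and HTTPS for every public URL, whether it is the alias mapped via an unextended zone or a mapping for another zone. The `SiteUrl` type, the `check_hnsc_plan` function and the contoso host names are hypothetical; the sketch does not talk to SharePoint.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteUrl:
    zone: str  # SharePoint zone: "Default", "Intranet", "Internet", "Custom" or "Extranet"
    url: str   # absolute URL mapped to the host-named site collection


def check_hnsc_plan(urls: list[SiteUrl]) -> list[str]:
    """Return the problems found in a planned HNSC URL layout (empty list = OK)."""
    problems: list[str] = []
    default_urls = [u for u in urls if u.zone == "Default"]
    if len(default_urls) != 1:
        problems.append("expected exactly one Default zone URL (the New-SPSite URL)")
    for u in default_urls:
        # Crawling always uses the Default zone, which is plain HTTP in this setup
        if urlparse(u.url).scheme != "http":
            problems.append(f"Default zone URL {u.url} should be internal HTTP for crawling")
    for u in urls:
        # Public aliases and other zones need HTTPS so that links are rewritten correctly
        if u.zone != "Default" and urlparse(u.url).scheme != "https":
            problems.append(f"public URL {u.url} ({u.zone} zone) should use HTTPS")
    return problems


plan = [
    SiteUrl("Default", "http://intranet-crawl.contoso.local"),  # internal, for crawling
    SiteUrl("Internet", "https://intranet.contoso.com"),        # public alias, unextended zone
    SiteUrl("Intranet", "https://intranet-ext.contoso.com"),    # public URL, extended zone
]
print(check_hnsc_plan(plan))  # []
```

The check is deliberately about URL schemes only; whether a zone is actually extended, and which zone the search crawler uses, still has to be verified in Central Administration.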

In short, the salient point is to use HTTPS in the public URLs even though the web-app zone uses neither the SecureSocketsLayer switch nor any SSL certificates. The default zone of the web-application must be configured for crawling: either no SSL, or full SSL with certificates assigned in IIS. With no SSL, you have to simulate AAM by mapping two URLs to the HNSC default zone: the internal HTTP URL used for crawling and the public HTTPS alias.

We had to use HTTP on the default zone to crawl the content of the published pages. It seems that if the web-application does not use SSL and your site's default zone uses an HTTPS host header, then only the friendly URLs (FURL) will be crawled, while the content will generate a lot of "This item comprises multiple parts and/or may have attachments. Not all of these parts were indexed." warnings. The result is that no metadata is indexed, and thus there are no search results; not good for a search-driven solution.

Note that SSL is recommended for all web-applications in SP2013, even inside the firewall, especially if you use apps, as the OAuth tokens will otherwise be exposed in the HTTP traffic, just as classic IIS basic authentication is not recommended without SSL. For this reason we wanted to do SSL bridging with BigIP, but could not get SSL server name indication (SNI) configured successfully in BigIP v11 to allow us to have SSL certificates bound to two different IIS web-sites, even though IIS8 supports SNI.

SNI is required when the shared wildcard certificate or SAN certificate approach cannot be used for your SP2013 web-application setup, i.e. when binding to host names in multiple IIS web-sites at the web-application level. SNI is also required when you need more than one web-application or more than one zone (extended web-app), even if you could bind your one-SAN-to-rule-them-all certificate to multiple IIS web-sites. IIS cannot route the request based on the host header until the request has been decrypted; SNI sends the host name as part of the SSL handshake, which allows the request to be routed to the correct IIS web-site.
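A simplified model of what SNI makes possible: the host name arrives in the clear during the handshake, so the listener can pick the IIS web-site (and thus the certificate) before anything is decrypted. The `Binding` type, the site names and the rule that exact host names win over wildcard bindings are assumptions made for illustration, not a description of how IIS or BigIP implements the lookup. For Python servers, the standard library exposes the same hook as `ssl.SSLContext.sni_callback`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    host: str  # host name on the HTTPS binding; may be a wildcard such as "*.contoso.com"
    site: str  # IIS web-site, i.e. one SharePoint web-application zone


def host_matches(pattern: str, host: str) -> bool:
    """Certificate-style match: a leading '*.' matches exactly one extra DNS label."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".contoso.com"
        label = host[: -len(suffix)] if host.endswith(suffix) else ""
        return bool(label) and "." not in label
    return pattern == host


def select_site(sni_host: Optional[str], bindings: list[Binding],
                default_site: Optional[str] = None) -> Optional[str]:
    """Pick the IIS web-site from the SNI host name sent in the TLS handshake."""
    if not sni_host:
        # No SNI: only IP and port are known before decryption, so sites can't be told apart
        return default_site
    for b in bindings:  # exact host names first
        if b.host.lower() == sni_host.lower():
            return b.site
    for b in bindings:  # then wildcard bindings
        if host_matches(b.host, sni_host):
            return b.site
    return default_site


bindings = [
    Binding("intranet.contoso.com", "SharePoint - Intranet 443"),
    Binding("*.contoso.com", "SharePoint - Extranet 443"),
]
print(select_site("partner.contoso.com", bindings))  # SharePoint - Extranet 443
```

Without SNI, `select_site` can only ever return the default site, which is the situation described above: one certificate per IP-address and port, regardless of how many IIS web-sites sit behind it.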

Remember that this is the path the HTTP(S) request travels from the browser:

 browser >
  host header >
   DNS A-record >
    virtual IP-address (VIP) in gateway > SSL off-box termination here
     load balancing >
      IIS server configured with IP-address >
       IIS web-site bound to IP-address (or host header) > normal SSL termination here
        SP web-application >
         site-collection bound to host header (HNSC)
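The gateway hop in this path, where off-box termination happens, can be sketched as a toy forwarder: the already-decrypted request is load balanced round-robin to a web front-end over plain HTTP, the original host header is kept so that IIS and the HNSC lookup still work, and the "front-end-https: on" header is added for the public hosts secured with SSL. `OffloadingGateway`, the back-end names and the round-robin policy are hypothetical; a real BigIP does this through its own configuration, not Python.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Request:
    scheme: str
    host: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)


class OffloadingGateway:
    """Toy model of the VIP: SSL already terminated here, then load balance over HTTP."""

    def __init__(self, backends: list[str], secured_hosts: set[str]):
        self._backends = itertools.cycle(backends)  # simple round-robin load balancing
        self._secured = {h.lower() for h in secured_hosts}

    def forward(self, incoming: Request) -> tuple[str, Request]:
        backend = next(self._backends)
        headers = dict(incoming.headers)
        headers["Host"] = incoming.host  # keep the host header: IIS and the HNSC lookup need it
        if incoming.scheme == "https" and incoming.host.lower() in self._secured:
            headers["Front-End-Https"] = "On"  # tells SharePoint the browser used HTTPS
        return backend, Request("http", incoming.host, incoming.path, headers)


gateway = OffloadingGateway(["wfe1", "wfe2"], {"intranet.contoso.com"})
backend, outgoing = gateway.forward(Request("https", "intranet.contoso.com", "/Pages/default.aspx"))
print(backend, outgoing.scheme, outgoing.headers)
```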

Keeping tabs on this path will help you understand the Technet guide to HNSC, which has some room for improvement. See this article by jasonth for a step-by-step guide to HNSC and SSL. Note that binding to host names in IIS, rather than to IP-addresses, for HNSCs at the SP2013 web-application level is supported, just as it was for SP2010.

Thursday, May 01, 2014

Managed Metadata Navigation and Anonymous Users in SP2013

The new term-driven navigation in SP2013 has some gotchas for anonymous users, resulting in them not seeing a full navigation menu. These are some things to check:
Finally, remember that you have to publish a major version of each page that you link to from a navigation node; otherwise anonymous users will see neither the page nor the term. This includes all items on the page that also require approval, such as images. It is an easy thing to forget if you've been so stupid as not to use the simple publishing configuration for your site. If you, as an admin or logged-in user, can see the terms and view a page while visitors cannot, you forgot to publish. An empty page or a missing term is a sure sign.
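The publishing rule can be expressed as a small check, assuming SharePoint-style version labels where a major number of 0 (e.g. "0.3") means that no major version has ever been published. The `Item` type, its `approved` flag and the URLs are hypothetical simplifications; the sketch just lists what an anonymous visitor would not see, including assets such as images that require approval.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    version: str           # SharePoint-style label: "0.3" = drafts only, "1.0"/"1.2" = major published
    approved: bool = True  # simplification: False while the major version awaits approval
    assets: list["Item"] = field(default_factory=list)  # e.g. images used on the page


def has_published_major(item: Item) -> bool:
    major = int(item.version.split(".")[0])
    return major >= 1 and item.approved


def hidden_from_anonymous(page: Item) -> list[str]:
    """URLs anonymous visitors can't see; a hidden page also hides its navigation term."""
    hidden = [] if has_published_major(page) else [page.url]
    hidden += [a.url for a in page.assets if not has_published_major(a)]
    return hidden


hero = Item("/PublishingImages/hero.jpg", "0.1")
about = Item("/Pages/about.aspx", "1.0", assets=[hero])
print(hidden_from_anonymous(about))  # ['/PublishingImages/hero.jpg']
```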

Related to managed navigation is the friendly URL (FURL) mechanism, which uses the term set structure to build the FURL from the linked-to term. To prevent broken links when a term is moved, SP2013 stores links using the FIXUPREDIRECT.ASPX page with parameters such as the term ID, which are resolved server-side into a friendly URL when rendered (see NavigationTerm.GetResolvedDisplayUrl). Do not render a RichHtmlField using the simple "fieldvalue" web-control, as it will not resolve the fixup links.
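What the resolution step does can be illustrated offline with a Python sketch that rewrites fixup links in stored HTML using a lookup from term ID to friendly URL. It assumes links whose query string carries a `TermId` parameter (alongside e.g. `WebId` and `TermSetId`) and a ready-made dictionary of friendly URLs; both are assumptions, and inside SharePoint the server does this for you when the field is rendered by a control that resolves the links.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches href attributes that point at the fixup page (case-insensitive, as URLs vary in case)
FIXUP_HREF = re.compile(r'href="([^"]*FIXUPREDIRECT\.ASPX\?[^"]*)"', re.IGNORECASE)


def resolve_fixup_links(html: str, friendly_urls: dict[str, str]) -> str:
    """Replace fixup hrefs with the friendly URL of the referenced term ID."""

    def replace(match: re.Match) -> str:
        href = match.group(1).replace("&amp;", "&")  # undo HTML attribute encoding
        params = {k.lower(): v[0] for k, v in parse_qs(urlparse(href).query).items()}
        friendly = friendly_urls.get(params.get("termid", "").lower())
        # Leave links to unknown terms untouched rather than breaking them
        return f'href="{friendly}"' if friendly else match.group(0)

    return FIXUP_HREF.sub(replace, html)


html = ('<a href="/_layouts/15/FIXUPREDIRECT.ASPX?WebId=w1&amp;TermSetId=s1'
        '&amp;TermId=6f1c0e1a">About us</a>')
print(resolve_fixup_links(html, {"6f1c0e1a": "/about-us"}))
```

This is also why the plain "fieldvalue" control is a problem: it emits the stored HTML as-is, so the visitor gets the unresolved fixup link instead of the friendly URL.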

This all applies to author-in-place (AIP) usage of term-driven navigation and friendly URLs; cross-site publishing (XSP) has different kinds of issues.

Wednesday, October 23, 2013

Roadmap for Responsive Design and Dynamic Content in SharePoint 2013

Responsive design combined with dynamic user-driven content and a mobile-first approach seems to be the main focus everywhere these days. The approach outlined in Optimizing SharePoint 2013 websites for mobile devices by Waldek Mastykarz and his How we did it series show how it can be achieved using SharePoint 2013.

But what if you have thousands of pages with good content that you want to make responsive? What approach will you use to adapt all those articles after migrating the content over from SharePoint 2010? This post suggests a roadmap for gradually transforming your static content into dynamic content.

First of all, the metadata of your content types must be classified so that you know the ranking of each metadata field within a content type; the ranking can then be used in combination with device channel panels and responsive web design (RWD) techniques to prioritize what to show on which devices. This will most likely introduce more specialized content types with more granular metadata fields. All your content will need to be edited to fit the new content classification, at least the most important content you have. By edited, I mean that the content type of each article must be changed and the content adapted to the new metadata; in addition, selecting a new content type will also cause a new RWD page layout to be used for the content. These new RWD page layouts and master pages are also something you need to design and implement as part of your new user experience (UX) concept.
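The ranking idea can be illustrated with a small Python model in which each metadata field of a content type carries a rank and each device channel shows fields up to a given rank. In SP2013 the actual rendering is done with device channel panels in the page layouts; the channel aliases ("Smartphone", "Tablet", "Default"), the rank thresholds and the field names here are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedField:
    name: str
    rank: int  # 1 = most important metadata field in the content type


# Hypothetical device channel aliases and the lowest-priority rank each one shows
CHANNEL_MAX_RANK = {"Smartphone": 1, "Tablet": 2, "Default": 3}


def fields_for_channel(fields: list[RankedField], channel: str) -> list[str]:
    """Field names to render for a device channel, most important first."""
    max_rank = CHANNEL_MAX_RANK.get(channel, CHANNEL_MAX_RANK["Default"])
    return [f.name for f in sorted(fields, key=lambda f: f.rank) if f.rank <= max_rank]


article = [RankedField("RelatedLinks", 3), RankedField("Ingress", 1), RankedField("Body", 2)]
print(fields_for_channel(article, "Smartphone"))  # ['Ingress']
```

The point of the model is that the ranking belongs to the content type, not to each page, which is why changing the content type of every article is the big job.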

While editing all the (most important) existing content, it is also a good time to ensure that the content gets high-quality tagging according to your information architecture (IA), as tagging is the cornerstone of a good, dynamic user experience provided by term-driven navigation and search-driven content. Editing the content is the most important job here; tagging the content is not required to enable RWD for your pages.

In my experience, doing all of this at once as part of migrating to SP2013 is too much to be realistic, so this is my recommended roadmap:

Phase 1
- Focus on RWD based on the new content types and their new prioritized metadata and new responsive master pages and page layouts
- Quick win: revise the search center to exploit the new search features, even if tagging is postponed to a later phase (IA: findability and search experience)
- Keep the existing information architecture structure, and thus the navigation as-is
- Keep the page content as-is, do not add search-driven content to the pages yet, focus on making the articles responsive
- Most time-consuming effort: adapting the content type of all articles, cutting and pasting content within each article to fit the new prioritized metadata structure

Phase 2
- Focus on your new concept for structure and navigation in SP2013 (IA: content classification and structure, browse and navigate UX)
- Tagging of the articles according to the new IA-concept for dynamic structuring of the content (IA: term sets for taxonomy)
- Keep the page content as-is, no new search-driven UX in this phase, just term-driven navigation
- Most time-consuming effort: tagging all of your articles, try scripting some auto-tagging based on the existing structure of the content

Phase 3
- Focus on search-driven content in the pages according to the new concept (IA: discover and explore UX)
- New routines and processes for authors, approvers and publishers based on new SP2013 capabilities (IA: content contributor experience)
- Most time-consuming effort: tune and tag the content of all your articles to drive the ranking of the search-driven content according to the new concept

Phase 4
- Content targeting in the pages based on visitor profile segmentation, this kind of user-driven content is also search-driven content realized using query rules (and some code)

The IA aspects in the roadmap are taken from my SharePoint Information Architecture from the Field article.
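For the Phase 2 suggestion to script some auto-tagging based on the existing structure of the content, a minimal sketch could map the site and folder segments of each migrated page URL to terms in the new term sets. The mapping table and the example URLs are hypothetical, the suggested terms still need a review by the authors, and writing them back to managed metadata fields (e.g. with PowerShell) is not shown.

```python
from urllib.parse import urlparse

# Hypothetical mapping from existing SP2010 site/folder names to labels in the new term sets
PATH_TO_TERMS = {
    "hr": ["Human Resources"],
    "policies": ["Policies"],
    "news": ["News"],
}


def suggest_terms(page_url: str, mapping: dict[str, list[str]]) -> list[str]:
    """Suggest terms from the path segments of a migrated page, in order, without duplicates."""
    terms: list[str] = []
    for segment in urlparse(page_url).path.lower().split("/"):
        for term in mapping.get(segment, []):
            if term not in terms:
                terms.append(term)
    return terms


print(suggest_terms("http://intranet/hr/policies/Pages/vacation.aspx", PATH_TO_TERMS))
```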

Saturday, March 16, 2013

Controlling Content Database Size in SharePoint


A SharePoint content database can hold up to 4TB of data (max 200GB is recommended). However, storage size is not the problem; the availability problem is the recovery time needed to restore all that data. The recovery time determines how long your business-critical solution will be down. As SharePoint can spread its content across multiple databases, it is recommended that your architecture segments different content across different databases based on IA and other user experience aspects, plus business requirements for availability and recovery time. Plan for structuring your solutions with a strong focus on your information architecture (IA).
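As a back-of-the-envelope check of why recovery time, not storage, is the issue: restore time is roughly database size divided by sustained restore throughput. The 100 MB/s throughput below is a hypothetical figure, not a measurement, and the estimate ignores everything except copying data, so treat it as a lower bound and measure your own restore speed.

```python
RECOMMENDED_MAX_GB = 200  # recommended content database size quoted above
SUPPORTED_MAX_GB = 4000   # 4TB upper limit quoted above (decimal units for simplicity)


def restore_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Lower-bound restore time: data volume divided by sustained restore throughput."""
    return size_gb * 1000 / throughput_mb_s / 3600


def meets_rto(size_gb: float, throughput_mb_s: float, rto_hours: float) -> bool:
    """True if the estimated restore fits within the recovery time objective."""
    return restore_hours(size_gb, throughput_mb_s) <= rto_hours


ASSUMED_THROUGHPUT_MB_S = 100  # hypothetical; measure your own restore speed
for size in (RECOMMENDED_MAX_GB, 1000, SUPPORTED_MAX_GB):
    print(f"{size:>5} GB -> {restore_hours(size, ASSUMED_THROUGHPUT_MB_S):.1f} h")
```

With that assumed throughput, a 200GB database restores in roughly 0.6 hours while a 4TB database needs about 11 hours, which is the difference between a short outage and a lost business day.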

Here are some options for how to control the size of the content databases, without disposing and deleting content:

A) Use an out-of-the-box (OOTB) Record Center as an archive for old content: The users must manually send each document to the RC using e.g. "move and leave a link"; note that only the latest major version with metadata is kept, and all version history is lost. The information management policies supported by SharePoint for retention and disposition can be used to automate the cleanup.
As the RC has its own content databases, the live collaboration databases will grow more slowly or even shrink as outdated information is moved to the archive. Keeping the live databases small ensures a shorter recovery time, while the recovery time for the archived content can be considerable but is not business critical.
Search must be configured appropriately to cover both live and archived content.

B) Use a third-party archiving solution for SharePoint from e.g. MetaLogix or AvePoint. This has the same pros & cons as in option A, but the functionality is probably better in relation to keeping version history and batch management of outdated content.
Search must be configured appropriately to cover both live and archived content.

C) Use a third-party remote blob storage (RBS) solution for SharePoint, such as MetaLogix StoragePoint, so that documents are registered in the database, but not stored there. This gives smaller content databases, but more complicated backup and recovery as the content now resides both in databases and on disk. Provided that you don’t lose both at the same time, the recovery time should be shorter.
Search will work as before, as all content is still logically in the “database”.

D) Use PowerShell scripts or other code to implement the disposition of outdated content. The script can e.g. copy old documents to disk and delete old versions from the content database; the drawback is that all metadata will be lost and no link is left in SharePoint.
The database size will shrink as data is actually deleted, and backup and recovery are more complicated as content now resides both in the database and on disk (same as for option C).
Search can be configured to also crawl and index the files on disk, but content ranking will suffer as the valuable metadata is lost.
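The selection logic of such a disposition script can be sketched in Python, independent of how the documents are actually fetched, copied to disk or deleted (typically PowerShell against the SharePoint object model, not shown). The retention policy here, keeping the current version plus the newest major version(s) and flagging documents not modified within a given age, is a hypothetical example.

```python
from datetime import date, timedelta


def parse_label(label: str) -> tuple[int, int]:
    """Split a SharePoint-style version label such as "2.1" into (major, minor)."""
    major, minor = label.split(".")
    return int(major), int(minor)


def is_outdated(last_modified: date, today: date, max_age: timedelta) -> bool:
    """True if the document has not been modified within max_age."""
    return today - last_modified > max_age


def versions_to_delete(labels: list[str], keep_majors: int = 1) -> list[str]:
    """Keep the current version and the newest `keep_majors` major versions; return the rest."""
    if not labels:
        return []
    parsed = sorted((parse_label(label) for label in labels), reverse=True)
    keep = {parsed[0], *[v for v in parsed if v[1] == 0][:keep_majors]}
    return [f"{major}.{minor}" for major, minor in parsed if (major, minor) not in keep]


print(versions_to_delete(["1.0", "1.1", "2.0", "2.1", "2.2"]))  # ['2.1', '1.1', '1.0']
print(is_outdated(date(2010, 6, 1), date(2013, 3, 16), timedelta(days=2 * 365)))  # True
```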

My recommendation is to consider option A first, especially if you are able to define automated rules and exploit the built-in information management policies in SharePoint. The keyword is *able*: in my experience, everyone is positive about having automated retention and disposition, but no one, even at large banks and law firms, is able to come up with the policies.

Always consider using RBS for databases larger than 200GB, and note that RBS also helps you meet the disk IOPS requirements of SharePoint.