Reason 1: File Format Support

When creating a search solution you need content to be searchable. For SharePoint solutions the majority of this content is documents produced by information workers. Typically it will be Microsoft Office formats like Word, PowerPoint, and Excel, but also Adobe PDFs, e-mails and CAD drawings.

To support most binary formats outside of the Office suite, you have to purchase third-party IFilters that convert the files into plain text, which the search engine can then index. You have no real control over what metadata they extract, or how that metadata will differ from other text in the documents. While the IFilters are often fairly cheap, you have to shop around to get a complete offering that covers all of your file formats.

FAST for SharePoint comes to the rescue. Bundled with FAST for SharePoint is a feature called the Advanced Filter Pack. This is a document conversion library by Stellent (now part of Oracle), which handles 200+ document formats, including the ones for which you would usually buy IFilters.

The two formats I have come across most times with customers are the already mentioned Adobe PDF as well as AutoCAD files. If a particular business does any kind of manufacturing, product design or owns installations of some kind, they probably have CAD files as part of the content. If this is the case, my bet is they want to search it.

Oddly enough the Advanced Filter Pack is turned off by default. It can easily be enabled with a PowerShell command, and I see no reason why you would not want this feature turned on after installing FAST for SharePoint. When executing the PowerShell script to enable the Advanced Filter Pack, a warning will be displayed: Beware; you might actually get some useful metadata and text from your files!


The second point to note with the Advanced Filter Pack is that metadata from the conversions is now available to you in a structured form inside the content processing pipeline, which leads me to Reason 2.

Reason 2: Advanced Content Processing

With the default search offering in SharePoint 2010 you have no control over which data ends up in the index. Sure, you have crawled properties which you can map to managed properties, but you cannot create new ones or modify the content of the ones output by the crawlers and IFilters. With FAST for SharePoint you are given an extensibility point where you are free to do whatever you want with any data or metadata during indexing.

How does this apply to business pains and real business value? When indexing several content sources, or even a single one, it is common to find multiple variants of a project name, a product code or the name of a person. You typically want to treat these variants as the same entity. Using the built-in word mappings in FAST for SharePoint, or creating custom ones, allows you to easily map several variants to one entity.

Example 1 – Normalizing a name

Barack Hussein Obama II -> Barack Obama
Barack Obama -> Barack Obama
Obama -> Barack Obama
President Obama -> Barack Obama
Senator Obama -> Barack Obama

Example 2 – Normalizing a product

iPhone 3GS -> iPhone
iPhone 4 -> iPhone
iPhone 32GB Black -> iPhone
MC605KN/A -> iPhone
I3GS-B -> iPhone

These are just two simple illustrative examples. You can do more advanced processing as well, like parsing XML files, calling a third-party OCR (optical character recognition) module, or enriching metadata with calls to Internet services. In an enterprise setting where security is key, you might want to run the content through a module that removes terms which should not be listed due to security clearance levels.
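To make the normalization in Examples 1 and 2 concrete, here is a minimal sketch of the idea in Python. It is not the FAST for SharePoint pipeline API or its dictionary format; the `VARIANTS` table and the `normalize_entity` function are hypothetical names used only to illustrate mapping several variants onto one entity.

```python
# Hypothetical lookup table built from Examples 1 and 2. In a real solution
# this would live in a maintained dictionary, not in source code.
VARIANTS = {
    "barack hussein obama ii": "Barack Obama",
    "barack obama": "Barack Obama",
    "obama": "Barack Obama",
    "president obama": "Barack Obama",
    "senator obama": "Barack Obama",
    "iphone 3gs": "iPhone",
    "iphone 4": "iPhone",
    "iphone 32gb black": "iPhone",
    "mc605kn/a": "iPhone",
    "i3gs-b": "iPhone",
}


def normalize_entity(value: str) -> str:
    """Return the canonical entity for a known variant, else the input unchanged."""
    # Collapse repeated whitespace and ignore case before the lookup.
    key = " ".join(value.split()).lower()
    return VARIANTS.get(key, value)


if __name__ == "__main__":
    for raw in ["Senator Obama", "MC605KN/A", "Nokia N8"]:
        print(raw, "->", normalize_entity(raw))
```

Unknown values pass through untouched, so a step like this can only add consistency to the index and never drops content it does not recognize.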

This is where we start to solve business pains and extract real value from the content. By modifying, cleaning and enriching the data we can build much more sophisticated search solutions tailored to real business needs in the organization.

Of course this is not as easy as it sounds. It requires you to actually analyze the business domain, the business content (text and metadata) and business processes. Then you must try to map your findings to an information model which resembles how employees work with and consume data.

In the information model below, which was developed for an oil drilling company, the black boxes are pivotal entities around which all information revolves. The grey boxes above are instance types of the entities, while the darker grey boxes below are metadata and keys linking the different entities.

Without an information model it is hard to know how your users think about their content and how they navigate it. By interviewing employees throughout the organization for which you are creating a search solution, you will gain insight into how to create the model, and you can adapt your governance plan for search accordingly.

You will quickly find that no two businesses are alike and the navigation axes are very different from the out of the box refiners.

Once the content is structured we need to get it back out, or query it. Continue to Reason 3.

Reason 3: Advanced Query Capabilities

The advanced query capabilities of FAST for SharePoint cover several technical features listed in the comparison chart: advanced sorting, contextual search, tunable relevance, and multiple rank profiles. Many of these features are accessed via the FAST Query Language (FQL). FQL is a query language providing advanced query capabilities against textual content, much like SQL allows you to query a relational database.

FQL enables proximity searches, where you require words to be within a certain distance of each other or in a certain order. It allows search terms to be modified with lemmatization (expanding word forms in a linguistic fashion, e.g. good-better-best), and you can boost or reduce the relevance score of items based on rules specific to your business.
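The distance and order constraints described above correspond, as far as I know, to the FQL `near` and `onear` operators. The Python helper below is only a sketch of how a developer might assemble such an expression as a string before handing it to the query API; `fql_proximity` is a hypothetical name, and the exact operator syntax should be checked against the FQL reference for your version.

```python
def fql_proximity(terms, distance, ordered=False):
    """Build an FQL proximity expression as a plain string.

    ordered=False -> near(...): terms may appear in any order.
    ordered=True  -> onear(...): terms must appear in the given order.
    """
    if len(terms) < 2:
        raise ValueError("a proximity query needs at least two terms")
    if distance < 0:
        raise ValueError("distance must not be negative")
    for term in terms:
        # Keep the sketch simple: refuse input that would need escaping.
        if '"' in term:
            raise ValueError("terms must not contain double quotes")
    operator = "onear" if ordered else "near"
    quoted = ", ".join('"{}"'.format(term) for term in terms)
    return "{}({}, N={})".format(operator, quoted, distance)


if __name__ == "__main__":
    print(fql_proximity(["market", "volatile"], 5))
    print(fql_proximity(["market", "volatile"], 5, ordered=True))
```

Note that this expresses a distance in words, which only approximates "in the same sentence".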

Using FQL is not for the end-user, but it is a powerful tool IT Pros and developers can use when customizing or developing search applications and experiences.

FAST for SharePoint allows you to create dynamic search scopes, much like audiences on web parts, where you can filter or promote content for defined user roles or groups. As an example, out of the box, Excel documents will be demoted for all users. If you have users who only work with Excel, you might want to create a specific rule for them, promoting Excel documents above anything else. Users in the marketing department might favor content produced by their colleagues, so you should boost content created by employees in the same department as the user executing the search.
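One way a developer could express the department rule in a custom search application is with the FQL `xrank` operator, which raises the rank of items matching a second expression without changing which items match. The sketch below rests on assumptions: `boost_for_department` is a hypothetical helper, `department` is a hypothetical managed property that you would have to map yourself, and the boost value of 500 is arbitrary.

```python
def fql_string(text):
    """Wrap text in double quotes for use inside an FQL expression."""
    # Keep the sketch simple: refuse input that would need escaping.
    if '"' in text:
        raise ValueError("text must not contain double quotes")
    return '"{}"'.format(text)


def boost_for_department(user_query, department, boost=500):
    """Build an xrank expression that favors items from the user's department.

    All items matching user_query are still returned; items whose
    (hypothetical) 'department' managed property matches get extra rank.
    """
    if boost < 0:
        raise ValueError("boost must not be negative")
    match_part = "string({}, mode=\"and\")".format(fql_string(user_query))
    rank_part = "department:{}".format(fql_string(department))
    return "xrank({}, {}, boost={})".format(match_part, rank_part, boost)


if __name__ == "__main__":
    print(boost_for_department("campaign budget", "Marketing"))
```

The department value would typically come from the user's profile at query time, so the same search page produces different rankings for different groups.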

Creating these rules directly links to the work you do when analyzing the content, and over time you end up going in a circle with the content and the queries you need to perform on it.

Being able to turn the following business needs into queries is now within your reach:

“I want to see all invoices, contracts and mail correspondence for customer X most relevant to me”

“Sort products by largest gross margin, but also favor those that have been in inventory more than 6 months”

“Find all documents where the words market and volatile appear in the same sentence”

Conclusion

It is now possible to analyze content during indexing, adding valuable metadata that captures concepts and meaning from an otherwise unstructured collection of text, and then to query it in the manner the user expects. As a result, you are writing search queries which target specific business needs, not a general all-purpose search page.

I am not proclaiming the general search page we all know and love dead. It is a great starting point, and indeed where most companies begin their venture into more advanced search applications. But when you start to think outside the search box and add the power of FAST Search for SharePoint to your toolbox, you will have the power to create even better business solutions for your customers.

Hopefully I have managed to show how the technical features of FAST for SharePoint will allow you to create even better search solutions, and perhaps I have sparked some new ideas along the way. Creating the best search solutions for your customers is by no means an easy task: it requires planning, insight and a toolset to match, which you now have at your disposal within the realm of SharePoint 2010.