Searching for US States and Canadian Province Postal Abbreviations

If you want a text type dimension to find addresses that contain the standard postal abbreviations for US states and Canadian provinces, you'll need to create an exclusion so that these abbreviations are not removed as stop words or changed by other text analysis features.

The word set for US states, which also includes DC and US territories such as AS, GU, and PR, looks like this:

<wordset id="USStates">
  AL AK AS AZ AR CA CO CT DE DC FM FL GA GU HI ID IL IN IA KS KY LA ME MH MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND MP OH OK OR PW PA PR RI SC SD TN TX UT VT VI VA WA WV WI WY
</wordset>

The word set for Canadian provinces and territories looks like this:

<wordset id="CAProvinces">
  AB BC MB NB NL NT NS NU ON PE QC SK YT
</wordset>

The default English stop word set is:

<wordset id="defaultEnglish">
  a and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with
</wordset>

The changeset data for each item looks something like this:

Item 1:
Name: Urban Digby
Description: Life as we know it.
Address: 34 West Street
City: New York
State: NY

Item 2:
Name: Digby in Indiana
Description: Any way you like it.
Address: 14 Cactus Lane
City: Fort Wayne
State: IN

Item 3:
Name: American Samoa Digby
Description: Do what you want to do.
Address: 1455 Centurion Avenue
City: Paradise
State: AS

The dimension looks something like this:

<dimensions ignoreCase="true" accentFolding="true" storeOriginalWord="true">
  <dimension id="general_search" type="text">
    <field id="Name" />
    <field id="Description" />
    <field id="Location" key="Address,City,State" fieldPositionIncrement="0" noAnalysis-ref="USStates, CAProvinces" />
  </dimension>
</dimensions>

How It Works

The noAnalysis-ref attribute names one or more word sets whose words are indexed verbatim. When a word in the text being indexed exactly matches a word in one of these sets, it is not changed at all and is kept as-is. Words that do not match are analyzed normally and remain subject to stop-word removal.
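To make the rule concrete, here is a minimal Python sketch that models it. It is a simplified simulation, not the search engine's actual analyzer: it assumes a tokenizer that splits on non-word characters, models ignoreCase="true" as plain lowercasing, ignores accent folding, and uses hypothetical names (`analyze`, `NO_ANALYSIS`, `STOP_WORDS`).

```python
import re

# Word sets copied from the configuration above.
US_STATES = set(
    "AL AK AS AZ AR CA CO CT DE DC FM FL GA GU HI ID IL IN IA KS KY LA ME MH MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND MP OH OK OR PW PA PR RI SC SD TN "
    "TX UT VT VI VA WA WV WI WY".split()
)
CA_PROVINCES = set("AB BC MB NB NL NT NS NU ON PE QC SK YT".split())
STOP_WORDS = set(
    "a and are as at be but by for if in into is it no not of on or s such t "
    "that the their then there these they this to was will with".split()
)

# Hypothetical stand-in for noAnalysis-ref="USStates, CAProvinces".
NO_ANALYSIS = US_STATES | CA_PROVINCES


def analyze(text, no_analysis=NO_ANALYSIS, stop_words=STOP_WORDS):
    """Return the tokens that would be indexed for `text` (simplified model)."""
    # Assumption: tokens are runs of word characters; punctuation is dropped.
    tokens = re.findall(r"\w+", text)
    indexed = []
    for token in tokens:
        if token in no_analysis:
            # Exact, case-sensitive match: keep the token unchanged.
            indexed.append(token)
            continue
        lowered = token.lower()  # assumed model of ignoreCase="true"
        if lowered not in stop_words:
            indexed.append(lowered)
    return indexed


print(analyze("Fort Wayne, IN"))       # ['fort', 'wayne', 'IN']
print(analyze("Life as we know it."))  # ['life', 'we', 'know']
print(analyze("IN in In iN"))          # ['IN']
```

The last call shows the exact-match rule: only the uppercase "IN" survives, while "in", "In", and "iN" fall through to normal analysis and are removed as stop words.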

Example with Sample Data

"AS" is the abbreviation for American Samoa, "IN" is the abbreviation for Indiana and "OR" is the abbreviation for Oregon. All three of these words are also English stop words.

Without noAnalysis word sets, these three words would be converted to lowercase, matched against the English stop word list, and removed altogether. That is a problem because you could then no longer search for "Fort Wayne, IN" or "Salem, OR".
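The effect on the sample items can be sketched the same way. This Python simulation is hypothetical: it indexes only each item's Location value (Address, City, and State joined with spaces), uses a trimmed stand-in for the word sets that contains only the abbreviations discussed here, and assumes a query matches an item when every analyzed query term appears among the item's indexed terms. The names `analyze`, `search`, and `ITEMS` are illustrative, not part of any product API.

```python
import re

# Default English stop word set from above.
STOP_WORDS = set(
    "a and are as at be but by for if in into is it no not of on or s such t "
    "that the their then there these they this to was will with".split()
)
# Assumption: trimmed stand-in for the USStates/CAProvinces word sets.
NO_ANALYSIS = {"AS", "IN", "NY", "OR"}

# Assumption: Location modeled as "Address City State" for each sample item.
ITEMS = {
    "Urban Digby": "34 West Street New York NY",
    "Digby in Indiana": "14 Cactus Lane Fort Wayne IN",
    "American Samoa Digby": "1455 Centurion Avenue Paradise AS",
}


def analyze(text, no_analysis):
    """Keep exact word-set matches; lowercase and stop-filter everything else."""
    result = []
    for token in re.findall(r"\w+", text):
        if token in no_analysis:
            result.append(token)
        elif token.lower() not in STOP_WORDS:
            result.append(token.lower())
    return result


def search(query, no_analysis):
    """Return item names whose Location contains every analyzed query term."""
    terms = analyze(query, no_analysis)
    # A query whose terms were all removed during analysis matches nothing.
    return [
        name
        for name, location in ITEMS.items()
        if terms and set(terms) <= set(analyze(location, no_analysis))
    ]


for query in ["IN", "AS"]:
    print(query, search(query, NO_ANALYSIS), search(query, set()))
# IN ['Digby in Indiana'] []
# AS ['American Samoa Digby'] []
```

Without the word sets, both queries are reduced to nothing during analysis, so neither item can be found by its state abbreviation.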

Note: A word must exactly match an entry in a noAnalysis word set, including case. For example, "or", "Or", and "oR" do not match "OR"; they are converted to "or" and then discarded because they match a stop word.