Making Content Findable Through SEO

Most users arrive at government websites through search engines, so it makes sense to prepare content so that search engines can find it. It's important enough that a cottage industry has grown up around the science and art of Search Engine Optimization (SEO).

Here at GilsUtah we've been crawling and indexing Utah state agency and local government websites for over a year now, and we've discovered some all-too-common practices that present barriers to spiders. We've found that the content of some agencies is almost entirely blocked. In other cases UtahGov Search, Google, and others can retrieve some content, but only after manual intervention.

Here are some common problems preventing public access:

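(1) Linking within JavaScript. There are right ways and wrong ways to do this. Spiders generally follow only plain HTML links, so pages that can be reached only through script-generated links tend to go unindexed. Unfortunately, most of the content of an entire branch of government, an entire department, and many division sites is being missed because of this. The trend seems to be worsening.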

(2) Creating URLs with question marks. Some database and content management systems create dynamic URLs with question marks. While most search engines provide a workaround, the workaround can cause other problems. There are usually ways for webmasters to manipulate their scripts to create static-looking HTML URLs that are both search engine friendly and easier for users to remember and bookmark.

(3) HTTPS protocol. Content served over Secure Sockets Layer (SSL) generally cannot be crawled by search engines. Agencies sometimes use SSL for publications and areas where they don't need to. Limit SSL to your financial transactions and other uses where encryption is necessary.

(4) File naming. You'd be surprised how often content creators include spaces in file names. The search engine retrieves them, but inserts "%20" as the escaped encoding for the US-ASCII space character. Users often find that the resulting links are bad or that the URLs have become cryptic and undecipherable. Spaces are not advisable; if you need a separator, use underscores or other unreserved characters such as - ! and . instead, and avoid reserved characters like these: & : = / ; ? + and $.

(5) Directory hierarchies. Some agencies dump their entire content, including images and scripts, into a single directory. You should create subdirectories for administrative functions or programs that naturally lend themselves to being in their own directory. This aids search engine crawling and rule writing.

(6) No site map. It's amazing how many sites still lack site maps. Every site should have a site map linked (using a static A HREF link) from at least the homepage. This helps get around the JavaScript linking problem, and site maps can be used as starting pages for crawling.

(7) Use robots.txt files appropriately. Reputable search engines respect robots.txt files. If you want directories and files excluded, use robots.txt (or .htaccess protection) instead of hiding resources or limiting them to the innerweb. Be careful, though. I can think of at least one agency whose important services are inaccessible because of an improper use of robots.txt.
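
Several of the checks above are mechanical enough to automate. What follows is only a rough sketch in Python, not a tool we actually run: the function names audit_url and allowed_by_robots, the warning wording, and the example.gov addresses are all made up for illustration. It flags URLs that run into items (2), (3), and (4), and uses the standard library's robots.txt parser to show what a given robots.txt would block, as a guard against the mistake described in item (7).

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Reserved characters that item (4) says to avoid in file names.
# "?" and "/" are left out here because URL splitting consumes them
# before the file name is examined.
RESERVED_IN_NAMES = set("&:=;+$")


def audit_url(url):
    """Return a list of warnings for one URL (hypothetical helper)."""
    warnings = []
    parts = urlsplit(url)
    if parts.query:
        # Item (2): dynamic URL containing a question mark.
        warnings.append("dynamic URL with a question mark")
    if parts.scheme == "https":
        # Item (3): content behind SSL.
        warnings.append("served over HTTPS")
    # Item (4): inspect only the last path segment, i.e. the file name.
    filename = parts.path.rsplit("/", 1)[-1]
    if " " in filename or "%20" in filename:
        warnings.append("space in file name")
    if any(ch in RESERVED_IN_NAMES for ch in filename):
        warnings.append("reserved character in file name")
    return warnings


def allowed_by_robots(robots_txt, url, agent="*"):
    """Report whether a robots.txt body lets the given agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Made-up example addresses, not real agency pages.
    for url in [
        "http://www.example.gov/pubs/report.html",
        "https://www.example.gov/page.asp?id=12",
        "http://www.example.gov/pubs/annual report.pdf",
    ]:
        print(url, audit_url(url))

    robots = "User-agent: *\nDisallow: /admin/\n"
    print(allowed_by_robots(robots, "http://www.example.gov/admin/login.html"))
    print(allowed_by_robots(robots, "http://www.example.gov/services/"))
```

Running a list of your site's URLs through something like this before a crawl would at least surface the obvious problems. It does not address the JavaScript linking issue in item (1), which requires looking at the pages themselves.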

What we hope to do in the coming year is to create a dialogue among agency webmasters and content creators to come up with best practices for optimizing our sites for search engines. We'll be offering workshops here at the Utah State Library and creating an easy-to-use, open knowledge base and code library of some kind so that we can share our discoveries and communicate.

Some of this gets technical beyond my experience, so I'll need your help. For starters, you can leave comments here with this story or links to resources that you've found helpful. Please contact us at the Government Information Locator Service with suggestions or to let us know that you're someone we should get in touch with. You can also subscribe here to receive helpful news by email.

Posted by Ray_Matthews on October 01, 2003 at 10:24 AM

I would also suggest that using frames is a bad thing for SEO. My site is a parody of how to do bad SEO.

Posted by: black hat SEO at December 22, 2003 06:13 AM