Sunday, January 18, 2004

Searching for ... truth?

Search engines are an incredible invention. I don't think many of us could locate anything on the internet (not specifically associated with a brand name) without them. As more pages are indexed, and as designers become more proficient at writing appropriate meta tags, finding a web page has become easier than letting your fingers do the walking.

But there is one domain in which traditional search engines are weak: categorizing blog content. Because many blog archives combine many unrelated posts on a single page, search engines that look for combinations of words are fooled into selecting totally irrelevant pages, simply because one word from a Monday post, plus one word from a Tuesday post, plus one word from a Wednesday post happen to match the three-word search phrase.

Most blogs are either hosted on specific blog domains or built with specific tools (and associated templates), all of which should be parseable by a sophisticated search spider. So I believe the technology already exists for blogs to be properly indexed on a post-by-post basis, and not just on a document/page basis.
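
To make the difference concrete, here is a minimal Python sketch. It assumes a spider has already split an archive page into its individual posts; the function names and the sample posts are made up for illustration, and this is not a claim about how any real search engine works.

```python
import re

def tokens(text):
    """Lowercase a text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(query, text):
    """True if every word of the query appears somewhere in the text."""
    return tokens(query) <= tokens(text)

def search_by_page(query, archive_posts):
    """Page-level indexing: the whole archive is treated as one document."""
    return matches(query, " ".join(archive_posts))

def search_by_post(query, archive_posts):
    """Post-level indexing: return the indexes of posts containing every query word."""
    return [i for i, post in enumerate(archive_posts) if matches(query, post)]

# Hypothetical monthly archive: three unrelated posts on one page.
archive = [
    "Monday: the garden fence needs a coat of purple paint.",
    "Tuesday: my neighbor bought a monkey puzzle tree.",
    "Wednesday: finally found the wrench in the garage.",
]

query = "purple monkey wrench"
page_hit = search_by_page(query, archive)    # one word from each day's post
post_hits = search_by_post(query, archive)   # no single post has all three
```

With page-level matching the archive counts as a hit for "purple monkey wrench", even though no single post is about any such thing. With post-level matching the same query finds nothing, which is the honest answer.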

So, why this particular post? Because someone found this site by searching for a particularly offensive search phrase. And because my blog is archived on a monthly basis, all of my posts for a particular month were combined into a single document which happened to contain the "f" word (in a context related to being taken advantage of by a retail establishment), along with other non-offensive words that, by coincidence, happened to be used (individually) over the course of the month. The resulting, very offensive phrase was reported as a "match". Not only a match, but a match that appeared on the first page of search results!

That's just not right!
