I Like It, I Love It, I Want Some More of It

In information retrieval, some words are powerful. They are highly descriptive and get right to the point of what someone is looking for. Other words have little to no value. The concept of stop words came about because you can't tell much about a document from the fact that it contains words like a, an, the, and, are, etc. On the flip side are words with a high discrimination value. Recently I searched to see whether there was a FedEx office in the town where my mom lives. Even though there wasn't one, Google still returned multiple pages from the FedEx.com website (the home page and the store locator page) in the search results. That was a great search result: Google was smart to place more weight on the core concept word in the search (FedEx) and less weight on the location.
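To make "discrimination value" a little more concrete, here is a minimal sketch using inverse document frequency (IDF), idf(t) = log(N / df(t)), as a stand-in. This is an assumption for illustration: Salton's term discrimination value is computed differently, and nothing here claims to be how Google weights terms. The tiny corpus and the lowercase whitespace tokenizer are made up for the example. A word that shows up in every document (like "the") scores zero, while a rare, specific word (like "fedex") scores highest.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Return idf(t) = log(N / df(t)) for every term in a toy corpus.

    Tokenization is a plain lowercase whitespace split (an assumption
    made to keep the sketch short).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical example corpus, invented for illustration.
corpus = [
    "the fedex office is open",
    "the store is closed",
    "find the nearest store",
    "the weather is nice",
]
idf = inverse_document_frequency(corpus)
for term in ("the", "is", "store", "fedex"):
    print(f"{term:>6}: {idf[term]:.3f}")
# prints:
#    the: 0.000
#     is: 0.288
#  store: 0.693
#  fedex: 1.386
```

The ordering is the point: the stop-word-like terms sit at or near zero, and the brand name carries the most weight.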

Words with a low discrimination value may gain a higher discrimination value when combined with neighboring words. "Hot" and "dog" can mean something quite different when they appear next to each other. As explained in this Wired article:

Take, for instance, the way Google
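One simple way to capture the idea that neighboring words can carry more meaning together than apart is to index adjacent word pairs (bigrams) alongside single words. The sketch below assumes that approach for illustration only; it is not a description of how Google handles phrases. It reuses the IDF measure, idf(t) = log(N / df(t)), as a proxy for discrimination value over a small made-up corpus. Here "hot" and "dog" each appear in every document, so on their own they discriminate nothing, while the phrase "hot dog" appears in only half of them.

```python
import math
from collections import Counter


def terms_with_bigrams(text):
    """Return the unigrams plus adjacent-word bigrams (e.g. 'hot dog')."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return set(words) | set(bigrams)


def idf_table(docs):
    """idf(t) = log(N / df(t)) over unigram and bigram terms."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Each term (word or word pair) counts once per document.
        doc_freq.update(terms_with_bigrams(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical example corpus, invented for illustration.
corpus = [
    "grilled a hot dog at the game",
    "the dog sleeps when it is hot",
    "a hot day for the dog",
    "hot dog stand near the park",
]
idf = idf_table(corpus)
for term in ("hot", "dog", "hot dog"):
    print(f"{term!r}: {idf[term]:.3f}")
# prints:
# 'hot': 0.000
# 'dog': 0.000
# 'hot dog': 0.693
```

Indexing every word pair grows the index quickly, which is one reason real systems tend to be more selective about which phrases they treat as units.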
