Search Engine Basics

Lesson 3 Information retrieval services
Objective: Describe the different categories of information retrieval services.

Information Retrieval Services

The information available online today is incredibly diverse, varying not only in subject matter, but also in format. Accordingly, the search services available reflect this diversity. Some are automated, and some rely on you to work your way through categories.
The first step, then, is to learn what kinds of information retrieval services are available to you.
Let us take a quick tour of representative sites in the main categories of information retrieval services before discussing how to search them:


Directories, also known as Web catalogs, are compiled and maintained by human editors and researchers.
A directory is also characterized by a list of categories in hierarchical order (from broadest to most specific). You can "search" by clicking on a category and then choosing from increasingly specific subcategories.
Directories carry weight and can hurt your rankings. This section focuses mainly on general directories, but it ends with a few tips on how to find niche directories (or other general directories) and how to decide whether a listing in them is worthwhile.
Some examples of directories are:

Yahoo Directory

The Yahoo Directory was started in 1994 under the name "Jerry and David's Guide to the World Wide Web" and was renamed Yahoo later that same year. At the time, Yahoo was primarily a directory with search functionality and, interestingly, neither SEO nor Internet marketing was even a category.
Through the late 1990s Yahoo pushed to become a web portal, and in 2000 it even signed a deal under which Google would power Yahoo's search functionality. Its focus at the time was to gain users through acquisitions such as GeoCities, bringing more people into its portal and keeping them there. Unfortunately, Yahoo did not have the same user loyalty that Apple does, and the walled-garden approach failed as users Googled their way out of the Yahoo network of sites.

  2. Yahoo!
Directories are discussed in more detail in the next module. For now, take a quick look at the main page (or home page) of each directory and note the number and type of categories that each offers.
Question: Do any seem more inviting to use?
Is one directory organized in a manner that makes more sense to you?
Or do they seem visually identical to each other?
Clicking on any of these links will open the website in a separate browser window, so you can switch between the lesson and the site.

Automated Information Retrieval (IR)

Automated information retrieval (IR) systems were originally developed to help manage the huge scientific literature that has developed since the 1940s. Many university and public libraries now use IR systems to provide access to books, journals, and other documents. Commercial IR systems offer databases containing millions of documents in myriad subject areas. Dictionary and encyclopedia databases are now widely available for PCs. Information Retrieval has been found useful in such disparate areas as office automation and software engineering. Indeed, any discipline that relies on documents to do its work could potentially use and benefit from IR.
An IR system matches user queries to documents stored in a database. A document is a data object, usually textual, though it may also contain other types of data such as photographs and graphs. Often, the documents themselves are not stored directly in the IR system, but are represented in the system by document surrogates. This web page is a document and could be stored in its entirety in an IR database. One might instead choose to create a document surrogate for it consisting of the title, author, and abstract. This is typically done for efficiency: a surrogate reduces both the size of the database and the time needed to search it.
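The idea of a document surrogate can be sketched in a few lines of Python. This is only an illustration: the class names Document and Surrogate, the function make_surrogate, and the sample field values are assumptions made for this example, not part of any particular IR system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A full document as it might exist outside the IR system (hypothetical)."""
    doc_id: int
    title: str
    author: str
    abstract: str
    body: str


@dataclass(frozen=True)
class Surrogate:
    """A compact stand-in for a document: title, author, and abstract only."""
    doc_id: int
    title: str
    author: str
    abstract: str


def make_surrogate(doc: Document) -> Surrogate:
    # Drop the body and keep only the fields named in the text above.
    return Surrogate(doc.doc_id, doc.title, doc.author, doc.abstract)


# Sample values are made up for illustration.
page = Document(
    doc_id=1,
    title="Information Retrieval Services",
    author="Unknown",
    abstract="Categories of information retrieval services.",
    body="The information available online today is incredibly diverse. " * 50,
)
surrogate = make_surrogate(page)

# The surrogate carries far less text than the full document.
full_size = len(page.title) + len(page.author) + len(page.abstract) + len(page.body)
surrogate_size = len(surrogate.title) + len(surrogate.author) + len(surrogate.abstract)
```

Searching only the surrogate fields means less text to store and scan, at the cost of missing words that appear only in the body.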
An IR system must support certain basic operations. There must be a way to enter documents into a database, change the documents, and delete them. There must also be some way to search for documents, and present them to a user. IR systems vary greatly in the ways they accomplish these tasks.

  1. Tour Search Engines
  2. Tour Metasearch Engines
  3. Tour Subject pages and Link pages

The terms search engine and engine refer to any service that allows you to compose your own search query. Any service that provides a compiled directory or allows you to perform searches is called a search service or information retrieval service.
As you are now aware, the question you will most often ask before beginning a search is not "Where do I find a search site?" but, rather, "Which one of all these services do I start with?" This is not a trivial question. A directory, with its smaller number of hand-selected sites, may be more immediately useful than a search engine if you are searching for beginning-level information on a popular topic. A search engine, with its continual automated Web-roaming, may be more useful if you are looking for very specific information or an obscure topic.

Information Retrieval Models

Modeling in Information Retrieval is a complex process aimed at producing a ranking function.
A ranking function is a function that assigns scores to documents with regard to a given query.
This process consists of two main tasks:
  1. The conception of a logical framework for representing documents and queries
  2. The definition of a ranking function that allows quantifying the similarities among documents and queries
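The two tasks above can be illustrated with a deliberately simple model. In this sketch, the logical framework represents both documents and queries as sets of lowercase words, and the ranking function is the Jaccard coefficient (shared words divided by all distinct words). Both choices, along with the function names to_terms, score, and rank and the sample documents, are assumptions made for this example; practical IR models use more refined representations and scoring.

```python
from __future__ import annotations


def to_terms(text: str) -> set[str]:
    # Task 1: represent a document or query as a set of lowercase words.
    return set(text.lower().split())


def score(query: str, document: str) -> float:
    # Task 2: quantify similarity as shared terms over all distinct terms.
    q, d = to_terms(query), to_terms(document)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    # Score every document against the query, highest score first.
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample collection.
docs = {
    "d1": "web directories list sites by category",
    "d2": "search engines rank web pages",
    "d3": "metasearch engines query several other search services",
}
ranking = rank("search engines", docs)
order = [doc_id for doc_id, _ in ranking]
```

Here d2 ranks first because the two query words make up a larger share of its vocabulary than they do of d3's, and d1 scores zero because it shares no words with the query.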

Modeling and Ranking

Information Retrieval systems usually adopt index terms to index and retrieve documents.
Index term:
  1. In a restricted sense: it is a keyword that has some meaning on its own; usually plays the role of a noun
  2. In a more general form: it is any word that appears in a document

Retrieval based on index terms can be implemented efficiently. Index terms are also simple to refer to in a query, and this simplicity is important because it reduces the effort of query formulation.
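One common way to make retrieval by index terms efficient is an inverted index, which maps each term to the set of documents containing it. The sketch below is a minimal version under stated assumptions: the function names index_terms, build_index, and lookup are invented for this example, and the small STOPWORDS set is only a crude stand-in for the restricted sense of an index term, since reliably picking out meaningful keywords would need real linguistic processing.

```python
from collections import defaultdict

# Illustrative only: a tiny list of words treated as carrying no meaning alone.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "by"}


def index_terms(text, restricted=False):
    # General sense: every word in the text is an index term.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if restricted:
        # Restricted sense (approximated): drop words in STOPWORDS.
        words = [w for w in words if w not in STOPWORDS]
    return words


def build_index(documents, restricted=False):
    # Map each index term to the set of ids of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in index_terms(text, restricted):
            index[term].add(doc_id)
    return index


def lookup(index, query, restricted=False):
    # Return ids of documents that contain every term of the query.
    terms = index_terms(query, restricted)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Made-up sample collection.
docs = {
    1: "The directory is compiled by human editors.",
    2: "A search engine crawls the Web.",
    3: "Editors review the search directory.",
}
general = build_index(docs)
keywords_only = build_index(docs, restricted=True)

both = lookup(general, "search directory")
editors = lookup(general, "editors")
```

A query is just a few words typed by the user, and each word is answered by a single dictionary lookup instead of a scan of every document, which reflects both the simplicity and the efficiency described above.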

Retrieval Service Categories

Click the link below to review the categories of information retrieval services just viewed.
Retrieval Service Categories