Search Engines

1. Introduction: A search engine is a program or information retrieval system that helps a user retrieve a list of references or information meeting specific criteria from databases stored on a computer. The computer may be a public server on the World Wide Web, a computer inside a corporate or proprietary network, or a personal computer.

 

2. History of Search Engines: The earliest Internet search engine was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal, to search anonymous FTP sites. It is often called the grandfather of all search engines. In 1992, the University of Nevada System Computing Services group developed Veronica, a search tool similar to Archie but for Gopher files. It is treated as the grandmother of search engines.

In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called “Wandex”. The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second search engine, Archie-Like Indexing for the Web (Aliweb), appeared in November 1993 through the efforts of Martijn Koster. Aliweb did not use a web robot; instead it depended on website administrators notifying it that an index file in a particular format existed on their site.

 

3. Components of a Search Engine: A general search engine typically has three components.

a) Crawler / Spider / Robot: The Web crawler, spider, or robot is a computer program. Web crawling is the process of locating, fetching, and storing Web pages. The crawler starts from a set of seed pages and locates new pages by parsing the downloaded pages and extracting the hyperlinks within them. Extracted hyperlinks are stored in a First In, First Out (FIFO) fetch queue for later retrieval. Crawling continues until the fetch queue is empty or a satisfactory number of pages has been downloaded. Each time a spider visits a web page, it scans all the text and follows every link it sees.

Some search engines, such as Google, store the full scanned pages, while others store only the words of the scanned pages, in ever-growing databases. These stored pages are known as cached pages. The contents of each page are then analyzed, and the URL and a list of its words are catalogued in an index database for use in later queries.
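
The crawling loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (TOY_WEB) rather than real HTTP fetches, and the names crawl, LinkExtractor, and max_pages are invented for the example. It shows the seed pages, the FIFO fetch queue, link extraction, and the two stopping conditions.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
TOY_WEB = {
    "a.example": '<a href="b.example">B</a> <a href="c.example">C</a>',
    "b.example": '<a href="c.example">C</a> <a href="a.example">A</a>',
    "c.example": '<a href="d.example">D</a>',
    "d.example": "no links here",
}


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: html} in the order pages were fetched."""
    queue = deque(seeds)   # FIFO fetch queue
    seen = set(seeds)      # URLs already queued, so nothing is fetched twice
    store = {}             # downloaded pages
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:   # page missing or unreachable
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


pages = crawl(["a.example"], TOY_WEB.get)
print(list(pages))
```

Because the queue is FIFO, pages are visited breadth-first: pages linked directly from the seeds are fetched before pages that are further away.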

b) Indexer: The downloaded content is concurrently parsed by an indexer and transformed into an inverted index, which represents the downloaded collection in a compact and efficiently queryable form. The indexes are regularly updated so that queries can be answered quickly and efficiently. The database of a search engine is most often created automatically by spiders or robots.
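
As a minimal sketch of what an inverted index looks like, the following Python maps each word to the pages that contain it, with a count per page. The sample pages (DOCS) and the function names are hypothetical, and the tokenizer is deliberately naive; production indexers also handle stemming, stop words, and compression.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to {url: number of occurrences in that page}."""
    index = defaultdict(dict)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


# Hypothetical crawled pages (plain text already extracted from the HTML).
DOCS = {
    "a.example": "Search engines crawl the web",
    "b.example": "An index makes search fast",
    "c.example": "Crawlers follow links on the web",
}

index = build_inverted_index(DOCS)
print(sorted(index["search"]))  # pages containing "search"
print(sorted(index["web"]))     # pages containing "web"
```

Looking up a word is then a single dictionary access, which is what makes the inverted form efficient to query compared with scanning every stored page.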

c) Query Processor: The query processor is responsible for evaluating user queries and returning to the users the pages relevant to their query. The search engine allows one to enter criteria for the desired content (typically a given word or phrase) into a search “box”. When a user makes a query, typically by giving keywords, the engine looks up the index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document’s title and sometimes part of the text. The list is often sorted with respect to some measure of relevance of the results. Because these databases are very large, search engines often return thousands of results.
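
The look-up-and-rank step can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the sample pages, the search function, and in particular the relevance measure, which is just the summed frequency of the query words and stands in for whatever measure a real engine uses.

```python
import re
from collections import Counter

# Hypothetical page texts; a real engine would read a prebuilt index instead.
PAGES = {
    "a.example": "Search engines crawl the web and index pages",
    "b.example": "An inverted index makes keyword search fast",
    "c.example": "Cooking recipes for the weekend",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: word -> Counter({url: term frequency}).
INDEX = {}
for url, text in PAGES.items():
    for word in tokenize(text):
        INDEX.setdefault(word, Counter())[url] += 1


def search(query, limit=10):
    """Return (url, score, snippet) tuples, best match first."""
    scores = Counter()
    for word in tokenize(query):
        for url, freq in INDEX.get(word, {}).items():
            scores[url] += freq
    # Sort by score (descending), then by URL so ties come out in a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(url, score, PAGES[url][:40]) for url, score in ranked[:limit]]


for url, score, snippet in search("inverted index"):
    print(url, score, snippet)
```

The snippet returned with each hit plays the role of the short summary shown under a result; here it is simply the first 40 characters of the page text.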

4. Ranking of Sites in Search Engines: How a search engine decides which pages are the best matches, and in what order the results should be shown, varies widely from one search engine to another. The methods also change over time as internet usage changes and new techniques evolve.

Google has been widely accepted as particularly useful for retrieving relevant results. Google's ranking is based on PageRank. PageRank builds on citation analysis, which was developed in the 1950s by Dr. Eugene Garfield at the University of Pennsylvania. PageRank takes into consideration how many other websites and web pages link to a page, and the number of links on those linking pages also contributes to the PageRank of the linked page. This makes it possible for Google to order its results according to how many websites link to each page found.
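
The idea can be illustrated with a simplified PageRank computed by power iteration. This is a textbook-style sketch, not Google's production algorithm: the link graph (LINKS) is hypothetical, and the damping factor of 0.85 and the fixed iteration count are assumptions chosen for the example. Each page divides its rank equally among the pages it links to, so both the number of inbound links and the number of links on each linking page matter.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # The page's rank is shared equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "hub", which links back to "a".
LINKS = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph "hub" ends up with the highest score because three pages link to it, and "a" outranks "b" and "c" because it receives a link from the highly ranked hub.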

Researchers at the NEC Research Institute claim to have improved upon Google’s patented PageRank technology by using web crawlers to find “communities” of websites. Instead of ranking pages, this technology uses an algorithm that follows the links on a web page to find other pages that link back to the first one, and so on from page to page. The algorithm “remembers” where it has been, indexes the number of cross-links, and relates these into groupings. In this way, virtual communities of web pages are found.
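
The following Python is a loose illustration of grouping pages by cross-links, not a reconstruction of the NEC algorithm, whose details are not given here. It assumes a hypothetical link graph and the simplifying rule that two pages belong together only when each links to the other; the visited set plays the role of the algorithm "remembering" where it has been.

```python
def find_communities(links):
    """Group pages that are connected through reciprocal (cross) links.

    links maps each page to the set of pages it links to.
    """
    # Keep only mutual links, treated as an undirected graph.
    mutual = {page: set() for page in links}
    for page, targets in links.items():
        for target in targets:
            if page in links.get(target, set()):
                mutual[page].add(target)

    visited = set()  # pages already assigned to a community
    communities = []
    for start in sorted(mutual):
        if start in visited:
            continue
        group, stack = set(), [start]
        while stack:
            page = stack.pop()
            if page in visited:
                continue
            visited.add(page)
            group.add(page)
            stack.extend(mutual[page] - visited)
        communities.append(sorted(group))
    return communities


# Hypothetical link graph with two topical clusters and one unrelated page.
LINKS = {
    "jazz1": {"jazz2", "news"},
    "jazz2": {"jazz1", "jazz3"},
    "jazz3": {"jazz2"},
    "chess1": {"chess2"},
    "chess2": {"chess1", "news"},
    "news": set(),
}

print(find_communities(LINKS))
```

One-way links, such as those pointing to "news", do not pull a page into a community under this rule, so "news" ends up in a group of its own.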

 

5. Types of Search Engines: Configurable Unified Search Index (CUSI) pages, such as All-in-One Search Page and W3 Search Engines, are pages that list search engines.

a) Based on Coverage of Information: Search engines can be categorized by their coverage as follows:

i) Web Search Engine: It searches for information on the public Web.

ii) Enterprise Search Engines: It searches on intranets.

iii) Personal Search Engines: It searches individual personal computers.

iv) Custom Search Engine: It searches within the contents defined by the user(s).

v) Meta Search Engine: It searches the search engines.
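
Of the categories above, the meta search engine is the easiest to illustrate in code: it forwards the query to several engines and merges their ranked lists. In this hedged Python sketch, the two back-end engines are hypothetical stand-ins that return fixed results, and the merging rule (each URL earns 1 / position points per engine) is an illustrative choice rather than the method of any particular meta search engine.

```python
from collections import defaultdict


# Hypothetical back-end engines: each returns URLs ordered best-first.
def engine_one(query):
    return ["a.example", "b.example", "c.example"]


def engine_two(query):
    return ["b.example", "d.example", "a.example"]


def meta_search(query, engines, limit=10):
    """Send the query to every engine and merge the ranked lists."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / position
    # Highest combined score first; ties are broken alphabetically by URL.
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return merged[:limit]


print(meta_search("search engines", [engine_one, engine_two]))
```

A page that appears near the top of several lists rises in the merged list, which is why "b.example" comes first here even though only one engine ranked it first.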

b) Based on Type of Contents: Based on the contents considered for search, a search engine can be one of the following:

i) General Purpose Search Engine: It searches all types of contents over the web. Example: Google (http://www.google.com).

ii) Discussion Group Search Engine: It searches only discussion groups. Example: Google Groups (https://groups.google.com/), Yahoo Groups (https://in.groups.yahoo.com/), etc.

iii) Blog Search Engine: It searches only blogs. Example: Blogspot Blog Search (https://www.searchblogspot.com), etc.

iv) Image Search Engine: It searches images or photographs. Example: Google Images (http://images.google.co.in).

v) Maps Search Engine: It searches maps. Example: Google Maps (https://www.google.co.in/maps/).

vi) Video Search Engine: It searches videos. Example: AOL On (https://www.aol.com/video), Google Videos (https://www.google.com/videohp), YouTube (https://www.youtube.com), etc.

vii) News Search Engine: It searches the news. Example: Google News (https://news.google.com).

viii) Books Search Engine: It searches for electronic books or printed books. Example: Google Books (http://books.google.com).

ix) Subject Directory Search Engine: It searches the list of websites. Example: DMOZ.org (https://dmoz-odp.org/), Looksmart (http://www.looksmart.com/), etc.

c) Others: Some other types of search engines are as follows:

i) Crawler-Based Search Engine: In a crawler-based search engine, the robot program indexes the entire content of pages, including the titles, text, links, and so on. Example: Google (http://www.google.com).

ii) Human-Powered Search Engine: Human-powered search engines search pages or websites that have been collected for the index by humans. Example: Anoox (http://www.anoox.com), etc.

iii) Natural Language Search Engine: A natural language search engine provides an interface where the user can ask questions, as opposed to searching by keyword, and the search engine returns targeted answers to those questions. Example: Ask (http://www.ask.com), etc.

iv) Personalized Web Search: Personalized search refers to search experiences that are tailored specifically to an individual's interests by incorporating information from the browser cookie record, or other information about the individual, beyond the specific query provided. Google developed a personalized web search whereby the user can set up a profile and retrieve results based on their interests. The web search results of Google and Bing are personalized.

v) Real Time Search Engine: In a real time search engine, an individual crawl is started for each user query over fresh copies of the Web documents, i.e., the original ones rather than the cached ones, and up-to-date versions of the relevant pages are selected. In a real time search engine, a parallel and distributed system enables sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime, depending on their availability, capability, performance, cost, and users’ quality-of-service requirements. Example: Social Mentions Search (https://www.social-searcher.com/social-mention/).
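
Of the types listed above, personalized web search (item iv) lends itself to a short illustration. The Python sketch below re-ranks a result list using a user's interest profile. The result format, the topic labels, and the boost value are all assumptions made for the example; real engines draw on far richer signals such as cookies and search history.

```python
def personalize(results, interests, boost=0.5):
    """Re-rank (url, base_score, topics) results using a user's interests.

    Each topic shared with the profile adds `boost` to the base score.
    """
    rescored = []
    for url, base_score, topics in results:
        overlap = len(set(topics) & set(interests))
        rescored.append((url, base_score + boost * overlap))
    # Highest adjusted score first; ties are broken alphabetically by URL.
    return sorted(rescored, key=lambda item: (-item[1], item[0]))


# Hypothetical results for the query "python", before personalization.
RESULTS = [
    ("snakes.example", 1.0, ["animals", "nature"]),
    ("code.example", 0.9, ["programming", "software"]),
    ("films.example", 0.8, ["comedy"]),
]

programmer = personalize(RESULTS, interests={"programming", "software"})
print([url for url, _ in programmer])

anonymous = personalize(RESULTS, interests=set())
print([url for url, _ in anonymous])
```

The same query thus produces a different ordering for a user whose profile mentions programming than for a user with no profile at all.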

6. Examples of Search Engines: Some popular search engines, which marked new milestones in the origin and development of search engines, are discussed below.

a) Lycos: Lycos (http://www.lycos.com) was started at Carnegie Mellon University as a research project in 1994 and was one of the first search engines. It ceased crawling the web for its own listings in April 1999 and instead used crawler-based results provided by FAST, i.e., AllTheWeb.com. It later became part of Terra Lycos, a company formed when Lycos and Terra Networks merged in October 2000.

b) Yahoo: Yahoo (http://www.yahoo.com), the huge subject tree, was started by two Stanford graduate students, David Filo and Jerry Yang, in January 1994 as Jerry and David's Guide to the World Wide Web. It was renamed Yahoo! later in 1994 and was incorporated in March 1995. In 2004 Yahoo launched its own search engine based on the combined technologies of its acquisitions, providing a service that gave pre-eminence to the web search engine over the directory.

c) Ask: Ask.com (http://www.ask.com) is a question answering-focused web search engine founded in 1996 by Garrett Gruener and David Warthen in Berkeley, California. It lets one search by asking a question and responds with what it judges to be the right answer, i.e., it delivers search results based on one’s question.

d) Google: Google (http://www.google.com) was originally a Stanford University project called BackRub, by students Larry Page and Sergey Brin. The project was renamed Google, and in 1998 it moved off campus and became a private company. Its success was based in part on the concepts of link popularity and PageRank, which are very adept at returning relevant results. Google has also been known for offering a cached copy of each result.

e) Bing: Bing (http://www.bing.com), known previously as Live Search, Windows Live Search, and MSN Search, is a web search engine from Microsoft. Bing was released on June 1, 2009. Bing is available in many languages and has been localized for many countries. Even if the language of the search and of the results is the same, Bing delivers substantially different results in different parts of the world.

 

7. Importance of Search Engines: Search engines are among the most popular destinations on the internet. However, to use search engines effectively, it is essential to apply techniques that narrow the results and push the most relevant pages to the top of the results list. The importance of search engines is as follows:

a) Starting Point: When people are looking for something online, they go to a search engine first, because it has become increasingly difficult to remember the vast amount of information we come across daily.

b) Provide Different Points of Access: Search engines provide different access points for locating and using a particular website by entering any known information about the site. Without a search engine, trying to find what you need can be like finding a needle in a massive packet of rice.

c) Accessing Dead Sites: The cached pages maintained by some search engines are very useful when the content of a web page has been updated and the search terms are no longer in it, when the web page is no longer available, or when the site’s server is down. So, the cached pages of a search engine can be used when a particular website is withdrawn or no longer available elsewhere.

 

8. Conclusion: Nowadays, we have thousands of search engines for searching over the internet. Each search engine makes its appearance on the web and continues for some time; then a new one emerges, and the old one falls into decay and disuse. Some of the well-known search engines are Google (http://www.google.com), Yahoo! (http://www.yahoo.com), Bing (http://www.bing.com), Ask, formerly Ask Jeeves (http://www.ask.com), Excite (http://www.excite.com), etc.


How to Cite this Article?

APA Citation, 7th Ed.:  Barman, B. (2020). A comprehensive book on Library and Information Science. New Publications.

Chicago 16th Ed.:  Barman, Badan. A Comprehensive Book on Library and Information Science. Guwahati: New Publications, 2020.

MLA Citation 8th Ed:  Barman, Badan. A Comprehensive Book on Library and Information Science. New Publications, 2020.
