Tuesday, May 29, 2012

How Search Engines Work

A search engine works in four steps:

1. Web Crawling
2. Caching
3. Indexing
4. Searching


1. Web Crawling- A search engine works by storing information about web pages. Crawlers visit pages and save copies of them so that the search engine can answer searches quickly. A site owner can use a robots.txt file to prevent crawlers from crawling particular pages. Different search engines use different crawlers, as the table below shows; a sketch of how a crawler checks robots.txt follows the table.

Search Engine    Crawler
Google           Googlebot
Yahoo!           Yahoo! Slurp
MSN              MSNBot
AltaVista        AltaVista bot
Ask              Teoma
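
Before fetching a page, a polite crawler downloads the site's robots.txt file and checks whether the page is allowed. Here is a minimal sketch in Python using the standard urllib.robotparser module; the site URL and crawler name are made up for illustration.

from urllib.robotparser import RobotFileParser

# Hypothetical site and crawler name, for illustration only.
robots_url = "https://www.example.com/robots.txt"
parser = RobotFileParser(robots_url)
parser.read()  # download and parse the site's robots.txt

# Ask whether our crawler may fetch a given page before crawling it.
if parser.can_fetch("MyCrawler", "https://www.example.com/some/page.html"):
    print("allowed to crawl")
else:
    print("blocked by robots.txt")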

2. Caching- A web cache sits between the web server (origin server) and the client. It watches requests go by and saves copies of the responses, such as images and HTML pages. If a later request arrives for the same URL, the cache can answer it from the saved copy instead of asking the origin server again; a sketch of this idea follows the list of cache types below.

Types of Cache:

1. Browser Cache
2. Proxy Cache
3. Gateway Cache
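
Here is a minimal in-memory sketch of the caching idea in Python; the URL is made up for illustration. The first request for a URL goes to the origin server, and every repeat request for the same URL is answered from the saved copy.

import urllib.request

cache = {}  # maps URL -> saved copy of the response body

def fetch(url):
    if url in cache:
        return cache[url]  # cache hit: answer from the saved copy
    with urllib.request.urlopen(url) as response:
        body = response.read()  # cache miss: ask the origin server
    cache[url] = body  # save a copy for future requests
    return body

# The first call downloads the page; the second is served from the cache.
page = fetch("https://www.example.com/")
page_again = fetch("https://www.example.com/")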

3. Indexing- Web indexing is the process by which a search engine organizes the data it has stored so that it can retrieve accurate results quickly. For example, looking a word up in an index of 1000 documents can take milliseconds, while a sequential scan of every word in those documents could take hours.

Three Distinct Parts of Web Indexing:

1. The web crawler finds and fetches web documents.
2. The indexer sorts every word in every document and stores the resulting index of words in a huge database (see the sketch after this list).
3. The query processor compares your search query to the index and returns the documents it considers most relevant.
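
Here is a minimal sketch of the indexer idea in Python; the documents are made up for illustration. Each word maps to the set of document ids that contain it, so looking a word up is a single dictionary access instead of a scan over every document.

# Hypothetical documents, for illustration only.
docs = {
    1: "web crawlers find and fetch web documents",
    2: "the indexer sorts every word in every document",
    3: "the query processor compares your query to the index",
}

# Build the inverted index: word -> set of ids of documents containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

print(index["the"])  # {2, 3}: the documents that contain the word "the"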

4. Searching- A web search is a query that the user types into the search engine to retrieve the most relevant information. The query processor looks each query word up in the index and returns the documents that match.
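
Here is a minimal sketch of the query step in Python, using a small made-up inverted index like the one built above. A multi-word query returns the documents that contain every query word, found by intersecting the sets of documents for each word.

# A small hypothetical inverted index: word -> set of document ids.
index = {
    "web":       {1, 3},
    "documents": {1},
    "indexer":   {2},
    "query":     {3},
}

def search(query):
    # Look each query word up and keep only documents containing all of them.
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("web"))            # {1, 3}
print(search("web documents"))  # {1}
print(search("web query"))      # {3}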
