Web Crawling
Author(s): Christopher Olston; Marc Najork
Source: Journal: Foundations and Trends® in Information Retrieval; ISSN Print: 1554-0669; ISSN Online: 1554-0677; Publisher: Now Publishers; Volume 4, Number 3
Document Type: Article; Pages: 72 (175-246); DOI: 10.1561/1500000017
Abstract: This is a survey of the science and practice of web crawling.
While at first glance web crawling may appear to be merely an application
of breadth-first search, it in fact poses many challenges,
ranging from systems concerns, such as managing very large data structures,
to theoretical questions, such as how often to revisit evolving content
sources. This survey outlines the fundamental challenges and describes
the state-of-the-art models and solutions. It also highlights avenues
for future work.