Working with some bright undergraduates, we submitted a paper to the 12th Workshop on Algorithms and Models for the Web Graph (WAW'15). It is a pity I cannot make it this year owing to other responsibilities, especially since I have never been to the Netherlands. I haven't missed WAW for many years; in fact, I have co-chaired it for the last three years!
In the paper, we consider a simplified model for crawlers: autonomous programs that map out the link structure of complex networks such as the web graph. The big data retrieved by crawlers is indexed for use in search engines such as Google or Bing. Our motivation was the notion of a walk in a graph: imagine a particle moving from vertex to vertex along edges. Vertices and edges may be repeated, but we assume every vertex is eventually visited. As such, the walks we consider generalize Hamilton paths, which visit each vertex exactly once. That is the best-case scenario for a crawler, but it is not realistic: the crawler may revisit the same vertex (that is, web page) several times in its walk.
In our model, vertices are initially assigned unique non-positive integer weights from 0, -1, -2, …, -n+1, where n is the order of the graph. The robot crawler starts at the dirtiest vertex (that is, the one with the smallest weight), whose weight is immediately updated to 1. At each subsequent time-step, the crawler moves greedily to the dirtiest neighbour of the current vertex, and that vertex's weight is updated to the positive integer equal to the current time-step. The process stops when all weights are positive (that is, when all vertices have been cleaned).
The robot crawler numbers of a graph are defined as the minimum, maximum, and average number of steps the crawler takes to visit every vertex, taken over all possible initial weightings. For example, the minimum number of steps for G is denoted by rc(G). Notably, rc(G) = n precisely when G has a Hamilton path.
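For small graphs, the process and the resulting crawler numbers are easy to simulate directly. Here is a minimal sketch under the definitions above; the function names (`robot_crawler`, `crawler_numbers`) and the dict-of-lists graph representation are my own choices for illustration, not from the paper, and the brute force over all n! weightings is only feasible for tiny graphs.

```python
from itertools import permutations

def robot_crawler(adj, weights):
    """Run the robot crawler process once.

    adj: dict mapping each vertex to a list of its neighbours.
    weights: dict assigning each vertex a unique weight in {0, -1, ..., -(n-1)}.
    Returns the time-step at which the last vertex is cleaned.
    """
    w = dict(weights)                 # working copy of the weights
    v = min(w, key=w.get)             # start at the dirtiest vertex
    t = 1
    w[v] = t                          # its weight becomes the (positive) time-step
    while any(x <= 0 for x in w.values()):
        # greedily move to the dirtiest neighbour of the current vertex
        v = min(adj[v], key=lambda u: w[u])
        t += 1
        w[v] = t                      # clean (or re-stamp) the visited vertex
    return t

def crawler_numbers(adj):
    """Brute-force the minimum, maximum, and average number of steps
    over all n! initial weightings."""
    verts = sorted(adj)
    steps = []
    for perm in permutations(range(0, -len(verts), -1)):
        steps.append(robot_crawler(adj, dict(zip(verts, perm))))
    return min(steps), max(steps), sum(steps) / len(steps)

# A path on 3 vertices has a Hamilton path, so its minimum is n = 3;
# the star K_{1,3} has no Hamilton path, so its minimum exceeds n = 4.
path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
```

For instance, on `path3` with the weighting a = -2, b = -1, c = 0, the crawler walks a, b, c and finishes in 3 steps, matching rc = n for a graph with a Hamilton path; on the star it must shuttle back through the centre, so its minimum is 5.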
We give values of these parameters for paths, trees, and complete multi-partite graphs. We also consider the robot crawler on random graphs and preferential attachment graphs.
As I mentioned, four of the co-authors are undergraduate students, including one from Mexico City studying with me as part of the MITACS Globalink program.