Theory of Probability and Mathematical Statistics
(Teoriya Imovirnostei ta Matematychna Statystyka)
PAGERANK FOR NETWORKS, GRAPHS AND MARKOV CHAINS
C. ENGSTROM, S. SILVESTROV
Abstract: In this work it is described how a partitioning of a graph into components can be used to calculate PageRank in a large network and how such a partitioning can be used to re-calculate PageRank as the network changes. Although considered problem is that of calculating PageRank, it is worth to note that the same partitioning method could be used when working with Markov chains in general or solving linear systems as long as the method used for solving a single component is chosen appropriately. An algorithm for calculating PageRank using a modied partitioning of the graph into strongly connected components is described. Moreover, the paper focuses also on the calculation of PageRank in a changing graph from two dierent perspectives, by considering specic types of changes in the graph and calculating the dierence in rank before and after certain types of edge additions or removals between components. Moreover, some common specic types of graphs for which it is possible to nd analytic expressions for PageRank are considered, and in particular the complete bipartite graph and how PageRank can be calculated for such a graph. Finally, several open directions and problems are described.
Keywords: PageRank, random walk, Markov chain, graph, strongly connected component.
1. Google web graph. Google programming contest, 2002, http://snap.stanford.edu/data/web-Google.html.
2. F. Andersson and S. Silvestrov, The mathematics of Internet search engines, Acta Appl. Math. 104 (2008), 211-242.
3. K. E. Avrachenkov, J. A. Filar, and P. G. Howlett, Analytic Perturbation Theory and Its Applications, SIAM, Philadelphia, PA, 2013.
4. A. Arasu, J. Novak, A. Tomkins, and J. Tomlin, Pagerank computation and the structure of the web: Experiments and algorithms, Proceedings of the Eleventh International Conference on World Wide Web, Alternate Poster Tracks, 2002.
5. L. Becchetti and C. Castillo, The distribution of pagerank follows a power-law only for particular values of the damping factor, Proceedings of the 15th international conference on World Wide Web, ACM Press, 2006.
6. R. Bellman, Introduction to Matrix Analysis, Classics in Applied Mathematics, Society for Industrial and Applied Mathematics, 1970.
7. A. Berman and R. Plemmons, Nonnegative Matrices in the Mathematical Sciences, No. del 11 in Classics in Applied Mathematics, Society for Industrial and Applied Mathematics, 1994.
8. S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Proceedings of the Seventh International Conference on World Wide Web (Amsterdam, The Netherlands, 1998), Elsevier Science Publishers B. V., 107-117.
9. D. Dhyani, S. S. Bhowmick, and W.-K. Ng, Deriving and verifying statistical distribution of a hyperlink-based web page quality metric, Data Knowl. Eng. 46, no. 3 (Sept. 2003), 291-315.
10. C. Engstrom and S. Silvestrov, A componentwise pagerank algorithm, Applied Stochastic Models and Data Analysis (ASMDA 2015), The 16th Conference of the ASMDA International Society, 2015, 186-199.
11. C. Engstrom and S. Silvestrov, Non-normalized pagerank and random walks on n-partite graphs, 3rd Stochastic Modeling Techniques and Data Analysis International Conference (SMTDA2014), 2015, 193-202.
12. C. Engstrom and S. Silvestrov, Using graph partitioning to calculate pagerank in a changing network, 4th Stochastic Modeling Techniques and Data Analysis International Conference (SMTDA2016), 2016, 155-164.
13. C. Engstrom and S. Silvestrov, Calculating pagerank in a changing network with added or removed edges, International Conference in Nonlinear Problems in Aviation and Aerospace (ICNPAA), AIP Conference Proceedings, vol. 1798, 2017.
14. C. Engstrom and S. Silvestrov, Generalisation of the damping factor in PageRank for weighted networks, Modern Problems in Insurance Mathematics (D. Silvestrov and A. Martin-Lof, eds.),EAA series, Springer, Cham, 2014, 313-333.
15. C. Engstrom and S. Silvestrov, PageRank, a look at small changes in a line of nodes and the complete graph, Engineering Mathematics II. Algebraic, Stochastic and Analysis Structures for Networks, Data Classication and Optimization (S. Silvestrov and M. Rancic, eds.), Springer Proc. Math. Stat., vol. 179, Springer, Heidelberg, 2016, 223-247.
16. C. Engstrom and S. Silvestrov, PageRank, connecting a line of nodes with a complete graph, Engineering Mathematics II. Algebraic, Stochastic and Analysis Structures for Networks, Data Classication and Optimization (S. Silvestrov and M. Rancic, eds.), Springer Proc. Math. Stat., vol. 179, Springer, Heidelberg, 2016, 249-274.
17. F. Gantmacher, The Theory of Matrices, Chelsea, 1959.
18. M. Gyllenberg and D. S. Sivestrov, Quasi-Stationary Phenomena in Nonlinearly Perturbed Stochastic Systems, De Gruyter Exp. Math., vol. 44, Walter de Gruyter, Berlin, 2008.
19. T. Haveliwala and S. Kamvar, The second eigenvalue of the google matrix, Technical Report 2003-20, Stanford InfoLab, 2003.
20. H. Ishii, R. Tempo, E.-W. Bai, and F. Dabbene, Distributed randomized pagerank computation based on web aggregation, Decision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference, CDC/CCC 2009, Proceedings of the 48th IEEE Conference, 3026-3031.
21. S. Kamvar and T. Haveliwala, The condition number of the pagerank problem, Technical Report 2003-36, Stanford InfoLab, June 2003.
22. S. Kamvar, T. Haveliwala, and G. Golub, Adaptive methods for the computation of pagerank, Linear Algebra Appl. 386 (2004), 51-65.
23. V. S. Koroliuk and N. Limnios, Stochastic Systems in Merging Phase Space, World Scientic, Singapore, 2005.
24. P. Lancaster, Theory of Matrices, Academic Press, 1969.
25. A. N. Langville and C. D. Meyer, A reordering for the pagerank problem, SIAM J. Sci. Comput. 27, no. 6 (Dec. 2005), 2112-2120.
26. C. P. Lee, G. H. Golub, and S. A. Zenios, A two-stage algorithm for computing pagerank and multistage generalizations, Internet Mathematics 4 (2007), 299-327.
27. J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data, June 2014.
28. D. Silvestrov and S. Silvestrov, Asymptotic expansions for stationary distributions of perturbed semi-Markov processes, Engineering Mathematics II. Algebraic, Stochastic and Analysis Structures for Networks, Data Classication and Optimization (S. Silvestrov and M. Rancic, eds.), Springer Proc. Math. Stat., vol. 179, Springer, Heidelberg, 2016, 151-222.
29. D. Silvestrov and S. Silvestrov, Asymptotic expansions for stationary distributions of non-linearly perturbed semi-Markov processes. I, II (2016). Part I: arXiv:1603.04734, Part II: arXiv:1603.04743.
30. D. Silvestrov and S. Silvestrov, Nonlinearly Perturbed Semi-Markov Processes, SpringerBriefs Probab. Math. Stat., 2017. (To appear.)
31. R. Tarjan, Depth rst search and linear graph algorithms, SIAM J. Comput. 1 (1972), no. 2, 146-160.
32. Q. Yu, Z. Miao, G. Wu, and Y. Wei, Lumping algorithms for computing google's pagerank and its derivative, with attention to unreferenced nodes, Information Retrieval 15 (2012), no. 6, 503-526.