Stop Thinking, Just Do!

Sungsoo Kim's Blog

PageRank History

tagsTags

28 April 2014


Article Source

  • Title: PageRank
  • Source: From Wikipedia, the free encyclopedia

PageRank

페이지랭크(PageRank)는 월드 와이드 웹과 같은 하이퍼링크 구조를 가지는 문서에 상대적 중요도에 따라 가중치를 부여하는 방법이다. 이 알고리즘은 서로간에 인용과 참조로 연결된 임의의 묶음에 적용할 수 있다.

페이지랭크는 스탠퍼드 대학교에 재학 중이던 래리 페이지와 세르게이 브린이 새로운 검색 엔진에 대한 연구 기획의 일부로 개발한 것이다. 이 기획은 1995년 시작되어, 1998년 구글이라 불리는 시범 서비스로 발전하였다. 페이지와 브린은 페이지랭크에 기반한 검색 기술을 바탕으로 구글 사를 설립하였다.


“Google search algorithm” redirects here. For other search algorithms used by Google, see Google Penguin and Google Panda.

Mathematical PageRanks for a simple network, expressed as percentages. (Google uses a logarithmic scale.) Page C has a higher PageRank than Page E, even though there are fewer links to C; the one link to C comes from an important page and hence is of high value. If web surfers who start on a random page have an 85% likelihood of choosing a random link from the page they are currently visiting, and a 15% likelihood of jumping to a page chosen at random from the entire web, they will reach Page E 8.1% of the time. (The 15% likelihood of jumping to an arbitrary page corresponds to a damping factor of 85%.) Without damping, all web surfers would eventually end up on Pages A, B, or C, and all other pages would have PageRank zero. In the presence of damping, Page A effectively links to all pages in the web, even though it has no outgoing links of its own.

PageRank is an algorithm used by Google Search to rank websites in their search engine results. PageRank was named after Larry Page,[1] one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google:

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

Facts about Google and Competition

It is not the only algorithm used by Google to order search engine results, but it is the first algorithm that was used by the company, and it is the best-known.[2][3] Google uses an automated web spider called Googlebot to actually count links and gather other information on web pages.


페이지 랭크는 더 중요한 페이지는 더 많은 다른 사이트로부터 링크를 받는다는 관찰에 기초하고 있다. 예를들어 페이지 A가 페이지 B,C,D 로 총 3개의 링크를 걸었다면 B는 A의 페이지 랭크 값의 1 \over 3 만큼을 가져온다.

또한 페이지 랭크에서는 랜덤 서퍼(Random Sufer)라는 페이지를 임의로 방문하며 탐색하는 모델을 가정한다. 이 모델에서는 위 예의 A페이지를 방문한 서퍼는 A페이지를 보고 만족하여 탐색을 중단하거나, 혹은 A페이지에서 만족하지 못하여 다른 페이지를 방문할 것이다. 이러한 확률(Damping Factor)을 \alpha라 한다면, B페이지는 \alpha * {1 \over 3}만큼 페이지 랭크를 받게 된다.

페이지 랭크는 이와 같은 방법을 통해 페이지간 페이지 랭크 값을 주고 받는 것을 반복하다보면, 전체 웹 페이지가 특정한 페이지 랭크 값을 수렴한다는 사실을 통해 각 페이지의 최종 페이지 랭크를 계산한다.


History

The idea of formulating a link analysis problem as an eigenvalue problem was probably first suggested in 1976 by Gabriel Pinski and Francis Narin, who worked on scientometrics ranking scientific journals.[6] PageRank was developed at Stanford University by Larry Page and Sergey Brin in 1996[7] as part of a research project about a new kind of search engine.[8] Sergey Brin had the idea that information on the web could be ordered in a hierarchy by “link popularity”: a page is ranked higher as there are more links to it.[9] It was co-authored by Rajeev Motwani and Terry Winograd. The first paper about the project, describing PageRank and the initial prototype of the Google search engine, was published in 1998:[4] shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors that determine the ranking of Google search results, PageRank continues to provide the basis for all of Google’s web search tools.[10]

The name “PageRank” plays off of the name of developer Larry Page, as well as the concept of a web page.[11] The word is a trademark of Google, and the PageRank process has been patented (U.S. Patent 6,285,999). However, the patent is assigned to Stanford University and not to Google. Google has exclusive license rights on the patent from Stanford University. The university received 1.8 million shares of Google in exchange for use of the patent; the shares were sold in 2005 for $336 million.[12][13]

PageRank was influenced by citation analysis, early developed by Eugene Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by Massimo Marchiori at the University of Padua. In the same year PageRank was introduced (1998), Jon Kleinberg published his important work on HITS. Google’s founders cite Garfield, Marchiori, and Kleinberg in their original papers.[4][14]

A small search engine called “RankDex” from IDD Information Services designed by Robin Li was, since 1996, already exploring a similar strategy for site-scoring and page ranking.[15] The technology in RankDex would be patented by 1999[16] and used later when Li founded Baidu in China.[17][18] Li’s work would be referenced by some of Larry Page’s U.S. patents for his Google search methods.[19]

References

  1. “Google Press Center: Fun Facts”. www.google.com. Archived from the original on 2009-04-24. 
  2. http://www.google.com/competition/howgooglesearchworks.html.  Missing or empty (help)
  3. Sullivan, Danny. “What Is Google PageRank? A Guide For Searchers & Webmasters”. Search Engine Land. 
  4. Brin, S.; Page, L. (1998). “The anatomy of a large-scale hypertextual Web search engine”. Computer Networks and ISDN Systems 30: 107–117. doi:10.1016/S0169-7552(98)00110-X. ISSN 0169-7552edit
  5. Gyöngyi, Zoltán; Berkhin, Pavel; Garcia-Molina, Hector; Pedersen, Jan (2006), “Link spam detection based on mass estimation”, Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB ‘06, Seoul, Korea), pp. 439–450 .
  6. Gabriel Pinski and Francis Narin. “Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics”. Information Processing & Management 12 (5): 297–312. doi:10.1016/0306-4573(76)90048-0
  7. Raphael Phan Chung Wei (2002-05-16). “Resources”. New Straits Times (Computimes; 2 ed.). 
  8. Page, Larry, “PageRank: Bringing Order to the Web” at the Wayback Machine (archived May 6, 2002), Stanford Digital Library Project, talk. August 18, 1997 (archived 2002)
  9. 187-page study from Graz University, Austria, includes the note that also human brains are used when determining the page rank in Google
  10. “Google Technology”. Google.com. Retrieved 2011-05-27. 
  11. David Vise and Mark Malseed (2005). The Google Story. p. 37. ISBN 0-553-80457-X
  12. Lisa M. Krieger (1 December 2005). “Stanford Earns $336 Million Off Google Stock”. San Jose Mercury News, cited by redOrbit. Retrieved 2009-02-25. 
  13. Richard Brandt. “Starting Up. How Google got its groove”. Stanford magazine. Retrieved 2009-02-25. 
  14. Page, Lawrence](http://en.wikipedia.org/wiki/Larry_Page “Larry Page”); Brin, Sergey; Motwani, Rajeev and Winograd, Terry (1999). The PageRank citation ranking: Bringing order to the Web.  Cite uses deprecated parameters (help), published as a technical report on January 29, 1998 PDF
  15. Li, Yanhong (August 6, 2002). “Toward a qualitative search engine”. Internet Computing, IEEE (IEEE Computer Society) 2 (4): 24–29. doi:10.1109/4236.707687
  16. USPTO, “Hypertext Document Retrieval System and Method”, U.S. Patent number: 5920859, Inventor: Yanhong Li, Filing date: Feb 5, 1997, Issue date: Jul 6, 1999
  17. Greenberg, Andy, “The Man Who’s Beating Google”, Forbes magazine, October 05, 2009
  18. “About: RankDex”, rankdex.com
  19. Cf. especially Lawrence Page, U.S. patents 6,799,176 (2004) “Method for scoring documents in a linked database”, 7,058,628 (2006) “Method for node ranking in a linked database”, and 7,269,587 (2007) “Scoring documents in a linked database”2011
  20. Matt Cutts’s blog: Straight from Google: What You Need to Know, see page 15 of his slides.
  21. Taher Haveliwala and Sepandar Kamvar. (March 2003). “The Second Eigenvalue of the Google Matrix” (PDF). Stanford University Technical Report: 7056. arXiv:math/0307056. Bibcode:2003math……7056N
  22. Gianna M. Del Corso, Antonio Gullí, Francesco Romani (2005). “Fast PageRank Computation via a Sparse Linear System”. Internet Mathematics. Lecture Notes in Computer Science 2 (3): 118. doi:10.1007/978-3-540-30216-2_10. ISBN 978-3-540-23427-2
  23. Arasu, A. and Novak, J. and Tomkins, A. and Tomlin, J. (2002). “PageRank computation and the structure of the web: Experiments and algorithms”. Proceedings of the Eleventh International World Wide Web Conference, Poster Track. Brisbane, Australia. pp. 107–117. 
  24. Massimo Franceschet (2010). “PageRank: Standing on the shoulders of giants”. arXiv:1002.2858 [cs.IR].
  25. Nicola Perra and Santo Fortunato.; Fortunato (September 2008). “Spectral centrality measures in complex networks”. Phys. Rev. E, 78 (3): 36107. arXiv:0805.3322. Bibcode:2008PhRvE..78c6107P. doi:10.1103/PhysRevE.78.036107
  26. Vince Grolmusz (2012). “A Note on the PageRank of Undirected Graphs”. ArXiv 1205 (1960): 1960. arXiv:1205.1960. Bibcode:2012arXiv1205.1960G
  27. Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, Eli Upfal (2012). “Fast Distributed PageRank Computation”. arXiv:1208.3071 [cs.DC,cs.DS].
  28. Google Webmaster central discussion on PR
  29. Fishkin, Rand; Jeff Pollard (April 2, 2007). “Search Engine Ranking Factors - Version 2”. seomoz.org. Retrieved May 11, 2009.  Cite uses deprecated parameters (help)[unreliable\ source?]
  30. Dover, D. Search Engine Optimization Secrets Indianapolis. Wiley. 2011.
  31. Viniker, D. The Importance of Keyword Difficulty Screening for SEO. Ed. Schwartz, M. Digital Guidebook Volume 5. News Press. p 160–164.
  32. “Ranking of listings : Ranking - Google Places Help”. Google.com. Retrieved 2011-05-27. 
  33. https://en.wikipedia.orghttp://en.wikipedia.org/wiki/Google_Directory#Google_Directory
  34. “How to report paid links”. mattcutts.com/blog. April 14, 2007. Retrieved 2007-05-28. 
  35. Jøsang, A. (2007). “Trust and Reputation Systems” (PDF). In Aldini, A. Foundations of Security Analysis and Design IV, FOSAD 2006/2007 Tutorial Lectures. 4677. Springer LNCS
    1. pp. 209–245. doi:10.1007/978-3-540-74810-6
  36. SEOnotepad09_36-0) SEOnotepad. “Myth of the Google Toolbar Ranking”
  37. Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh Zadeh WTF: The who-to-follow system at Twitter, Proceedings of the 22nd international conference on World Wide Web
  38. Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel.; Rodriguez; Van De Sompel (December 2006). “Journal Status”. Scientometrics 69 (3): 1030. arXiv:cs.GL/0601030. Bibcode:2006cs……..1030B
  39. 39) Benjamin M. Schmidt and Matthew M. Chingos (2007). “Ranking Doctoral Programs by Placement: A New Method” (PDF). PS: Political Science and Politics 40 (July): 523–529. 
  40. B Jiang (2006). “Ranking spaces for predicting human movement in an urban environment”. International Journal of Geographical Information Science 23 (7): 823–837. arXiv:physics/0612011. doi:10.1080/13658810802022822
  41. Jiang B., Zhao S., and Yin J. (2008). “Self-organized natural roads for predicting traffic flow: a sensitivity study”. Journal of Statistical Mechanics: Theory and Experiment. P07008 (7): 008. arXiv:0804.1630. Bibcode:2008JSMTE..07..008J. doi:10.1088/1742-5468/2008/07/P07008
  42. Roberto Navigli, Mirella Lapata. “An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation”. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(4), IEEE Press, 2010, pp. 678–692.
  43. Andrea Esuli and Fabrizio Sebastiani. “PageRanking WordNet synsets: An Application to Opinion-Related Properties” (PDF). In Proceedings of the 35th Meeting of the Association for Computational Linguistics, Prague, CZ, 2007, pp. 424–431. Retrieved June 30, 2007. 
  44. “Working Papers Concerning the Creation of Google”. Google. Retrieved November 29, 2006. 
  45. Cho, J., Garcia-Molina, H., and Page, L. (1998). “Efficient crawling through URL ordering”. Proceedings of the seventh conference on World Wide Web (Brisbane, Australia). 
  46. Burns, Judith (2009-09-04). “Google trick tracks extinctions”. BBC News. Retrieved 2011-05-27. 
  47. G Ivan and V. Grolmusz (2011). “When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks”. Bioinformatics (Vol. 27, No. 3. pp. 405-407) 27 (3): 405–7. doi:10.1093/bioinformatics/btq680. PMID 21149343
  48. D Banky and G. Ivan and V. Grolmusz (2013). “Equal opportunity for low-degree network nodes: a PageRank-based method for protein target identification in metabolic graphs”. PLoS One (Vol. 8, No. 1. e54204) 8 (1): 405–7. Bibcode:2013PLoSO…854204B. doi:10.1371/journal.pone.0054204. PMID 23382878
  49. “Preventing Comment Spam”. Google. Retrieved January 1, 2005. 
  50. “PageRank Sculpting: Parsing the Value and Potential Benefits of Sculpting PR with Nofollow”. SEOmoz. Retrieved 2011-05-27. 
  51. “PageRank sculpting”. Mattcutts.com. 2009-06-15. Retrieved 2011-05-27. 
  52. “PageRank Distribution Removed From WMT”. PageRank Distribution Removed From WMT. Retrieved October 16, 2009 
  53. WhatCulture!. 6 October 2011 http://whatculture.com/technology/google-pagerank-is-not-dead.php missing title (help). Retrieved 7 October 2011. 
  54. Google Panda Update: Say Goodbye to Low-Quality Link Building. Search Engine Watch. 08.02.11  Check date values in: (help)
  55. So…You Think SEO Has Changed. 19.03.14.  Check date values in: (help)

comments powered by Disqus