Correlation types Website correlation




1 Correlation types

1.1 Similar structure
1.2 Same server or subnet
1.3 Same owner
1.4 Similar content

1.4.1 Google


1.5 Same category
1.6 Same tracking ID





Correlation types

There are several known types of correlation, each with different strengths and weaknesses. A practical website correlation process may require combining two or more of these methods.


Similar structure

To save time and effort, website owners often duplicate major portions of website code across many domains. Similarity of code structure can provide enough information for correlation. Organizations known to have publicly searchable databases for this kind of correlation include:



http://www.delineal.com

Note: websites can use the same structure and have no relationship to each other (as when websites coincidentally use the same content management system).
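One way to quantify structural similarity is to compare the sequence of HTML tags on two pages while ignoring their text. The following is a minimal sketch under that assumption, not the method used by any service listed above; the sample pages and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def structure_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means the two tag sequences are identical."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


# Hypothetical pages: A and B share a template, C does not.
page_a = "<html><body><div><h1>Shop A</h1><ul><li>x</li><li>y</li></ul></div></body></html>"
page_b = "<html><body><div><h1>Shop B</h1><ul><li>p</li><li>q</li></ul></div></body></html>"
page_c = "<html><body><table><tr><td>unrelated</td></tr></table></body></html>"

print(structure_similarity(page_a, page_b))
print(structure_similarity(page_a, page_c))
```

A high ratio only shows that two pages share markup. As the note above says, two unrelated sites built on the same content management system would also score highly.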


Same server or subnet

Also known as correlated reverse DNS lookup. Websites may be served from the same server, on one or more IP addresses, on one or more subnets. Several organizations retain archives of IP address data and correlate that data. Examples include:



http://www.domaintools.com

Note: correlation via this method may be misleading because websites often exist on the same server (aka shared hosting) and have no relationship to each other.
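As an illustration of subnet-based grouping, the sketch below buckets sites by the network that contains their IP address. It assumes the addresses have already been resolved (for example with `socket.gethostbyname`), and the /24 prefix length, the site names, and the addresses (taken from documentation-only ranges) are all hypothetical choices.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(site_ips, prefix_len=24):
    """Group site names by the subnet containing their IP address.

    site_ips maps a site name to an IP address string. Only subnets that
    contain more than one site are returned.
    """
    groups = defaultdict(list)
    for site, ip in site_ips.items():
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].append(site)
    return {net: sorted(sites) for net, sites in groups.items() if len(sites) > 1}


# Hypothetical, pre-resolved addresses.
resolved = {
    "example.com": "192.0.2.10",
    "example.net": "192.0.2.77",
    "example.org": "198.51.100.5",
}

print(group_by_subnet(resolved))
```

A shared subnet is weak evidence on its own: with shared hosting, many unrelated sites fall into the same group.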


Same owner

Websites may be authored by the same person or organization. Website owners are required to provide contact information to a registrar to obtain a domain name. Domain ownership can be determined via the WHOIS protocol, but the protocol provides no mechanism for searching or correlating ownership. Several organizations retain archives of WHOIS information and provide searching and correlation services. Examples include:



http://whoisology.com
http://www.domaintools.com

Note: website ownership information can be falsified, outdated, or hidden from public view. Website correlation via this method can be accurate, misleading, or impossible depending on the information contained in WHOIS records.
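Because WHOIS itself offers no search, correlation has to happen over records that were collected beforehand. The sketch below groups domains by registrant email taken from already-fetched record text. The `Registrant Email:` field label is an assumption (WHOIS output is not uniform across registries), and the records shown are hypothetical.

```python
import re
from collections import defaultdict

# Assumed field label; real WHOIS output varies by registry and registrar.
REGISTRANT_EMAIL = re.compile(
    r"^[ \t]*Registrant Email:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def registrant_email(whois_text):
    """Return the lowercased registrant email, or None if the field is absent."""
    match = REGISTRANT_EMAIL.search(whois_text)
    return match.group(1).lower() if match else None


def group_by_registrant(records):
    """Map each registrant email to the domains that share it (two or more)."""
    groups = defaultdict(list)
    for domain, text in records.items():
        email = registrant_email(text)
        if email is not None:
            groups[email].append(domain)
    return {email: sorted(domains) for email, domains in groups.items() if len(domains) > 1}


# Hypothetical archived records; the third one hides its owner.
records = {
    "example.com": "Domain Name: EXAMPLE.COM\nRegistrant Email: admin@example.org\n",
    "example.net": "Domain Name: EXAMPLE.NET\nRegistrant Email: Admin@Example.org\n",
    "example.org": "Domain Name: EXAMPLE.ORG\nRegistrar: Hypothetical Registrar\n",
}

print(group_by_registrant(records))
```

The third record illustrates the caveat in the note above: when the owner field is missing or hidden, the domain simply cannot be correlated this way.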


Similar content

Search engines provide searchable databases of indexed website content. Search engine result lists are correlated by content similarity.


Google

On google.com, type related:website_name_here.com to find websites related by name or phrases.
Find a unique-sounding phrase on a website and use one or more search engines to locate that phrase literally on other websites.

In the search box, place quotes around the phrase for a literal phrase search.
Instead of copyright 2010 xyzcompany, use "copyright 2010 xyzcompany".



Note: this method of correlation is inherently slow because one must guess which phrases to search for. Also, related websites may not contain literally similar content (as when a site is translated into another language).
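The two query forms described above can be built programmatically. The sketch below only constructs query strings and search URLs; it does not send any request. The helper names are hypothetical.

```python
from urllib.parse import quote_plus

SEARCH_URL = "https://www.google.com/search?q="


def related_query(domain):
    """Build a related: query for a domain."""
    return f"related:{domain}"


def literal_phrase_query(phrase):
    """Wrap a phrase in double quotes so it is searched for literally."""
    return f'"{phrase}"'


def search_url(query):
    """URL-encode a query and append it to the search URL."""
    return SEARCH_URL + quote_plus(query)


print(search_url(related_query("website_name_here.com")))
print(search_url(literal_phrase_query("copyright 2010 xyzcompany")))
```

Generating many candidate phrases this way does not remove the guessing problem; it only makes each guess cheaper to try.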


Same category

Websites are often categorized or tagged via automated or manual means. Examples of publicly accessible website categorization databases include:



http://www.similarsitesearch.com/
http://similarsites.com
http://similarsites.de
http://www.similarsitecheck.com
http://www.similarto.us
DMOZ

Note: manual categorization and tagging (metadata) methods are inherently subjective. Automated categorization and tagging methods are inherently subject to the varying weaknesses and strengths of the underlying categorization algorithms.


Same tracking ID

Tracking IDs, used for analytics or affiliate identification, are often embedded in website code. These IDs can be used for correlation because they imply common management of the websites. Publicly available websites for correlating tracking IDs include:



http://ewhois.com
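As a rough illustration, tracking IDs can be pulled out of page source with regular expressions and then used to group sites. The sketch below assumes two ID formats that the text above does not name: Google Analytics IDs of the form UA-account-property and AdSense publisher IDs of the form pub- followed by 16 digits. The digit-count bounds in the first pattern are a guess, and the sample pages are hypothetical.

```python
import re
from collections import defaultdict

# Assumed formats; the account number is captured so that different
# properties under one analytics account still correlate.
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-\d{1,4}\b")
ADSENSE_PATTERN = re.compile(r"\bpub-\d{16}\b")


def tracking_ids(html):
    """Return the set of tracking identifiers found in a page's source."""
    ids = {f"UA-{account}" for account in UA_PATTERN.findall(html)}
    ids.update(ADSENSE_PATTERN.findall(html))
    return ids


def correlate_by_tracking_id(pages):
    """Map each tracking ID to the sites that embed it (two or more)."""
    sites_by_id = defaultdict(set)
    for site, html in pages.items():
        for tid in tracking_ids(html):
            sites_by_id[tid].add(site)
    return {tid: sorted(sites) for tid, sites in sites_by_id.items() if len(sites) > 1}


# Hypothetical page sources.
pages = {
    "example.com": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "example.net": "<script>ga('create', 'UA-1234567-2', 'auto');</script>",
    "example.org": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}

print(correlate_by_tracking_id(pages))
```

Here example.com and example.net share an account number and are grouped together, while example.org uses a different account and is left out.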




Bruce & Wiebe. Recognizing subjectivity: a case study in manual tagging. Natural Language Engineering, 1999.
Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.





