Correlation types Website correlation
1 correlation types
1.1 similar structure
1.2 same server or subnet
1.3 same owner
1.4 similar content
1.4.1 google
1.5 same category
1.6 same tracking id
correlation types
There are several known types of correlation, each with different strengths and weaknesses. A practical website correlation process may require combining two or more of these methods.
similar structure
To save time and effort, website owners often duplicate major portions of website code across many domains. Similarity of code structure can provide enough information for correlation. Organizations known to have publicly searchable databases for this kind of correlation include:
http://www.delineal.com
Note: websites can use the same structure yet have no relationship to each other (as when websites coincidentally use the same content management system).
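One way to picture structural comparison is to reduce each page to its sequence of HTML tags and measure how much the two sequences overlap. The following is a minimal sketch using only the Python standard library; the function names and sample pages are hypothetical, and it is not a description of how any of the listed services work.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def structure_similarity(html_a, html_b):
    """Return a ratio between 0 and 1; 1.0 means identical tag sequences."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


# Hypothetical pages: A and B share a template, C does not.
page_a = "<html><body><div><h1>Shop A</h1><p>Welcome</p></div></body></html>"
page_b = "<html><body><div><h1>Shop B</h1><p>Hello</p></div></body></html>"
page_c = "<html><body><table><tr><td>Other</td></tr></table></body></html>"

print(structure_similarity(page_a, page_b))  # same tags, different text
print(structure_similarity(page_a, page_c))  # only the outer tags match
```

A high score only shows that the markup is alike. As the note above says, two unrelated sites built on the same content management system would also score high.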
same server or subnet
Also known as correlated reverse DNS lookup. Websites may be served from the same server, on one or more IP addresses, or on one or more subnets. Several organizations retain archives of IP address data and correlate the data. Examples include:
http://www.domaintools.com
Note: correlation via this method may be misleading because websites that exist on the same server (aka shared hosting) may have no relationship to each other.
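Once domain names have been resolved to IP addresses, grouping them by subnet is a small bookkeeping step. The sketch below assumes the addresses are already known (in practice they might come from a DNS lookup or an archived data set); the domain names are hypothetical and the addresses are taken from ranges reserved for documentation.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(domain_ips, prefix_len=24):
    """Group domains by the IPv4 network of the given prefix length that contains their address."""
    groups = defaultdict(set)
    for domain, ip in domain_ips.items():
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].add(domain)
    # Keep only networks shared by at least two domains
    return {net: sorted(doms) for net, doms in groups.items() if len(doms) > 1}


# Hypothetical input: domain -> resolved IPv4 address
resolved = {
    "example-a.test": "192.0.2.10",
    "example-b.test": "192.0.2.77",
    "example-c.test": "198.51.100.5",
}

print(group_by_subnet(resolved))
# {'192.0.2.0/24': ['example-a.test', 'example-b.test']}
```

Setting prefix_len to 32 restricts the grouping to domains on exactly the same address. A shared address or subnet is still only a hint, for the shared-hosting reason given in the note above.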
same owner
Websites may be authored by the same person or organization. Website owners are required to provide contact information to a registrar to obtain a domain name. Domain ownership can be determined via the WHOIS protocol, but the protocol provides no mechanism for searching or correlating ownership. Several organizations retain archives of WHOIS information and provide searching and correlation services. Examples include:
http://whoisology.com
http://www.domaintools.com
Note: website ownership information can be falsified, outdated, or hidden from public view. Website correlation via this method can be accurate, misleading, or impossible depending on the information contained in the WHOIS records.
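Because WHOIS itself offers no search, correlation has to happen over records that have already been collected. The sketch below groups stored WHOIS text by registrant email. The "Registrant Email:" label is an assumption (field names differ between registries and registrars), and the records shown are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed field label; real WHOIS output varies by registry and registrar.
REGISTRANT_EMAIL = re.compile(
    r"^\s*Registrant Email:\s*(\S+)", re.IGNORECASE | re.MULTILINE
)


def registrant_email(whois_text):
    """Return the lowercased registrant email, or None if the field is absent."""
    match = REGISTRANT_EMAIL.search(whois_text)
    return match.group(1).lower() if match else None


def group_by_registrant(records):
    """Map each registrant email to the domains sharing it (two or more only)."""
    groups = defaultdict(list)
    for domain, text in records.items():
        email = registrant_email(text)
        if email is not None:
            groups[email].append(domain)
    return {email: sorted(doms) for email, doms in groups.items() if len(doms) > 1}


# Hypothetical archived WHOIS records: domain -> raw record text
records = {
    "example-a.test": "Domain Name: EXAMPLE-A.TEST\nRegistrant Email: Owner@Example.test\n",
    "example-b.test": "Domain Name: EXAMPLE-B.TEST\nRegistrant Email: owner@example.test\n",
    "example-c.test": "Domain Name: EXAMPLE-C.TEST\nRegistrar: Example Registrar\n",
}

print(group_by_registrant(records))
# {'owner@example.test': ['example-a.test', 'example-b.test']}
```

The third record has no registrant email and is skipped, which mirrors the case where ownership is hidden from public view. A privacy-proxy address shared by thousands of unrelated domains would produce a false grouping, so results need the same caution as the note above describes.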
similar content
Search engines provide searchable databases of indexed website content. Search engine result lists are correlated by content similarity.
On google.com, type related:website_name_here.com to find websites related to that name or phrase.
Find a unique-sounding phrase on the website and use one or more search engines to locate that phrase literally on other websites.
In the search box, place quotes around the phrase to perform a literal phrase search.
Instead of copyright 2010 xyzcompany, use "copyright 2010 xyzcompany".
Note: this method of correlation is inherently slow because one must guess which phrases to search for. Also, related websites may not contain literally similar content (as when a site is translated into another language).
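The two query forms described above (the related: operator and the quoted literal phrase) can be built programmatically when many candidate phrases need to be tried. This is a minimal sketch; the helper names are hypothetical, the search URL is an assumed default, and whether a given search engine honors the related: operator is not guaranteed.

```python
from urllib.parse import urlencode


def related_query(domain):
    """Build a query asking for sites related to the given domain."""
    return f"related:{domain}"


def literal_phrase_query(phrase):
    """Wrap a phrase in double quotes so it is searched literally."""
    cleaned = phrase.replace('"', "").strip()
    return f'"{cleaned}"'


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query percent-encoded (base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


print(search_url(related_query("website_name_here.com")))
print(search_url(literal_phrase_query("copyright 2010 xyzcompany")))
```

Building the URLs does not remove the guessing problem mentioned in the note above; it only makes it cheaper to try a list of candidate phrases by hand or in a browser.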
same category
Websites may be categorized or tagged via automated or manual means. Examples of publicly accessible website categorization databases include:
http://www.similarsitesearch.com/
http://similarsites.com
http://similarsites.de
http://www.similarsitecheck.com
http://www.similarto.us
DMOZ
Note: manual categorization and tagging (metadata) methods are inherently subjective. Automated categorization and tagging methods are inherently subject to the varying weaknesses and strengths of the underlying categorization algorithms.
same tracking id
Tracking IDs, used for analytics or affiliate identification, are embedded in website code. These IDs can be used for correlation because they imply common management of the websites. Publicly available websites for correlating tracking IDs include:
http://ewhois.com
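As an illustration of the idea, tracking IDs can be pulled out of page source with a regular expression and used as a grouping key. The sketch below assumes the classic Google Analytics format "UA-<account>-<property>" and groups by the account part; other trackers use other formats, and the page snippets and domain names here are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format: classic Google Analytics IDs such as UA-1234567-1.
# The captured group is the account number shared by all of an owner's properties.
GA_ID = re.compile(r"\bUA-(\d{4,10})-\d{1,4}\b")


def analytics_accounts(html):
    """Return the set of analytics account IDs (without property suffix) found in the HTML."""
    return {f"UA-{account}" for account in GA_ID.findall(html)}


def correlate_by_tracking_id(pages):
    """Map each account ID to the domains embedding it (two or more only)."""
    sites_by_id = defaultdict(set)
    for domain, html in pages.items():
        for account in analytics_accounts(html):
            sites_by_id[account].add(domain)
    return {acc: sorted(doms) for acc, doms in sites_by_id.items() if len(doms) > 1}


# Hypothetical page sources: domain -> HTML
pages = {
    "example-a.test": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "example-b.test": "<script>ga('create', 'UA-1234567-2', 'auto');</script>",
    "example-c.test": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}

print(correlate_by_tracking_id(pages))
# {'UA-1234567': ['example-a.test', 'example-b.test']}
```

Grouping by the account part rather than the full ID is a design choice: sites run by one owner usually carry different property suffixes under the same account.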
Bruce & Wiebe. Recognizing subjectivity: a case study in manual tagging. Natural Language Engineering, 1999.
Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.