Docstoc’s DocCash Provides Incentive for Copyright Abuse and Spam

Wednesday, March 17th, 2010

Docstoc is an online document sharing service that allows users to upload files such as Microsoft Office documents, text files, and PDF files and share them with the greater internet community. Launched in November 2007, the service has become a popular way to find and distribute content in those formats and now offers more than 13 million documents.

In May 2009, Docstoc introduced DocCash, announced as “a service where users can now make money by uploading documents to Docstoc”. In the DocCash program, users receive a portion of all Google AdSense earnings generated when the documents they uploaded to Docstoc are viewed. The service expressly prohibits uploading and sharing documents for which the user does not own the copyright, and it will remove content and even ban users who violate the policy when violations are brought to its attention. However, this environment is ripe for copyright abuse: it is far too easy and inviting for individuals looking to make a quick dollar.

Take, for example, the following Docstoc user profiles, all publicly available. For each account, we noted the number of documents uploaded and the time elapsed between the user’s first and last upload, both as of this writing.


Example 1’s profile page

Example 1
Number of documents: 4,033
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 3, 2010.


Example 2’s profile page

Example 2
Number of documents: 3,683
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 7, 2010.


Example 3’s profile page

Example 3
Number of documents: 4,283
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 7, 2010.


Example 4’s profile page

Example 4
Number of documents: 17,142
Time between first and last file uploaded: 4 days, from November 26, 2009 to December 30, 2009.

Although it is remotely possible, it is very unlikely that the owners of these accounts hold the copyright to such large amounts of content. It is more likely that they scraped search engine results pages for queries like filetype:doc or filetype:pdf and then used Docstoc’s API to upload the files in an automated manner, which would explain how so much content was posted so quickly.

In fact, Cyveillance has uncovered a significant number of documents posted through Docstoc that include copyright statements naming parties other than the account owners. It is critical for brand and copyright owners to vigorously protect their intellectual property and, when infringement is identified, to pursue the offenders. If they do not, brand equity is at risk, in addition to the potential loss of common copyright protection as their content becomes public domain.

In the following two examples, the account owners attempt to earn money by uploading vast amounts of content to the site. In this case, however, it appears the account owners have scraped content from different sources across the web, stitched small bits together to form meaningless paragraphs on a single topic, and uploaded the results as rich text files to Docstoc. The spammers are likely hoping that esoteric content, although of low (or no) value, will generate traffic from long tail search queries.


Example 5’s profile page

Example 5
Number of documents: 64,166
Time between first and last file uploaded: 6 days, from March 9, 2010 to March 15, 2010.


Example 6’s profile page

Example 6
Number of documents: 2,510
Time between first and last file uploaded: 6 days, from February 25, 2010 to today.
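The figures above suggest a simple volume heuristic: divide the number of documents by the days elapsed between the first and last upload, and flag accounts whose rate is implausibly high for hand-authored content. The following is a minimal sketch of that arithmetic, not a description of how Cyveillance or Docstoc actually screens accounts. The `Account` class, the function names, and the threshold of 500 documents per day are all assumptions made for illustration. Only the examples with a clearly stated span are included, and "less than 24 hours" is treated as one full day, so those rates are lower bounds.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    documents: int      # number of documents on the profile
    days_active: float  # days between first and last upload


def uploads_per_day(account: Account) -> float:
    # Anything shorter than one day is counted as a full day,
    # so the computed rate is a lower bound on the real one.
    return account.documents / max(account.days_active, 1.0)


# Hypothetical cut-off chosen for illustration only.
SUSPICIOUS_RATE = 500.0


def flag_suspicious(accounts, threshold=SUSPICIOUS_RATE):
    # Return the names of accounts at or above the threshold rate.
    return [a.name for a in accounts if uploads_per_day(a) >= threshold]


# Figures taken from the examples in this post.
accounts = [
    Account("Example 1", 4033, 1.0),
    Account("Example 2", 3683, 1.0),
    Account("Example 3", 4283, 1.0),
    Account("Example 5", 64166, 6.0),
]

if __name__ == "__main__":
    for a in accounts:
        print(f"{a.name}: at least {uploads_per_day(a):,.0f} documents per day")
    print("Flagged:", flag_suspicious(accounts))
```

A rate check like this only identifies candidates for review. It says nothing about who owns the content, so flagged accounts would still need to be inspected by hand.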

As on youtube.com, blogspot.com, and other sites where users can add content, spam and the display of copyrighted content are an issue. The situation is made even worse when uploading such content is rewarded with cash. Like the other services mentioned, Docstoc has come of age, but it has a responsibility to offer an environment that clearly discourages copyright abuse, and it should take strong steps to ensure that the content uploaded by its users does not violate its own policies. Otherwise it will become known as a passive accomplice in copyright abuse and spam generation.

To minimize the chance that content not meant to be public is copied from a company’s website and posted by others on services like Docstoc, Cyveillance recommends that companies regularly check that neither their sensitive internal documents nor their public but copyrighted documents have been posted online by others, including their vendors, partners, or employees. As we encourage our own customers to do, brand and copyright owners need to take an aggressive posture in their own protection; otherwise, their investments are diminished.

Typosquatting and Brand Owners; Comments from Ben Edelman

Tuesday, March 2nd, 2010

In mid-February, Harvard researchers Tyler Moore and Benjamin Edelman posted their research on the prevalence of typosquatting, the practice of registering and monetizing domains that would likely be visited only by accident, when internet users misspell the web address of a legitimate website. Among several findings in their work, titled Measuring the Perpetrators and Funders of Typosquatting, they report that 80% of typo domains lead to pay per click ads, and almost two-thirds of typo domains can be traced to just five individual advertisers using Google AdSense.
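To make the idea of a "typo domain" concrete, here is a minimal sketch that generates a few kinds of single-keystroke misspellings of a domain name: a dropped letter, a doubled letter, and two adjacent letters swapped. It illustrates the concept only and is not the method used in the paper. The function name `typo_variants` is hypothetical, and the code assumes a bare name with a single dot, such as example.com. A brand owner could use a list like this as a starting point for checking which variants of its own domain have been registered.

```python
def typo_variants(domain: str) -> set[str]:
    # Split "example.com" into the name and its extension.
    # Assumes no subdomain such as "www." is present.
    name, dot, tld = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        # Dropped letter: "example" -> "exmple"
        variants.add(name[:i] + name[i + 1:])
        # Doubled letter: "example" -> "exaample"
        variants.add(name[:i] + name[i] + name[i:])
        # Adjacent letters swapped: "example" -> "exmaple"
        if i < len(name) - 1:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Remove the correct spelling and any empty result.
    variants.discard(name)
    variants.discard("")
    return {v + dot + tld for v in variants}


if __name__ == "__main__":
    for variant in sorted(typo_variants("example.com")):
        print(variant)
```

The sketch leaves out other common mistakes, such as hitting a neighboring key or omitting the dot after "www", so it understates the number of plausible typo domains for any given name.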

Edelman was kind enough to answer a few questions about their research.

Cyveillance: Your paper is premised on the idea that typosquatting unethically diverts traffic from legitimate online destinations. You open one of your paragraphs with the line, “Most large domain registrants present themselves as ‘domain parkers’ or domainers.” Some readers may be confused about your position on domaining as an industry. Can you clarify your stance on domaining in general?

Ben Edelman: I don’t see much genuine value coming from the domaining business. Yes, some users guess domain names, and domainers can cause results to be shown to users who might otherwise receive error messages. But most web browsers already show results that are at least as useful as domainers’ placeholders – often better, with genuine organic results rather than merely advertisements.

Meanwhile, domainers cause some important harms: For one, as detailed in my article, domainers deplete advertisers’ budgets. Domainers also make it more costly for entrepreneurs to obtain the domains required to run actual substantive businesses: A domain might truly be unclaimed, in the sense that no one has ever used it for anything interesting, but a domainer would nonetheless be able to withhold that domain from a would-be user until they agree on a price. Combine these harms with the remarkably widespread ongoing problem of typosquatting, as presented in my article, and the net value-add of domainers is far from clear.

Domainers will vigorously defend their right to advance-register large numbers of domains, as if this is some kind of moral entitlement. I’m not so sure. In many areas, landowners are (and, historically, have been) required to improve their property lest they be a blight or eyesore to others. The analogy here is less direct: Which domains are “near” an unimproved domainer domain? But certainly unimproved domains harm others, by impeding what could be direct navigations, and by driving up costs to others. Indeed, limits on domain purchases have ample precedent – dating back to Jon Postel’s early restrictions on how many domains a single person or entity could request, and similar restrictions in certain ccTLDs. At least as against domainers with thousands, tens of thousands, or even hundreds of thousands of domains, these ideas do ring true to me.

Cyveillance: In your attempts to collect information about the behavior of typosquatting domains, some websites prevented your systems from gathering information about them. Can you discuss which servers attempted to prevent your analysis? Are you aware of any direct or indirect response to your investigation on their part?

Ben Edelman: Google has pointed out that it will disable typosquatting domains in response to a trademark holder’s specific request. Indeed, but what about infractions that come to Google’s attention some other way, such as in my article or in a complaint from the general public? What about infractions that are readily apparent to Google, thanks to Google’s excellent semantic analysis software? Google does as little as it can – letting Google and its partners continue to profit as widely as they can. Once Google is on actual knowledge that a domain is a variation of a trademark – either because a member of the public says so, or because Google’s own software figured it out – I’d like to see Google avoid targeting ads to that domain. And there’s a strong case that that’s exactly the behavior that the ACPA requires.

Meanwhile, trademark holders have ample grounds to be angry. And reading my article, I believe a new set of trademark holders is remembering that there’s more they could do here.

Cyveillance: Many merchants make use of affiliates to promote their products and services on the internet. You mentioned that “Few affiliate merchants affirmatively allow typosquatting, and most disallow it when it comes to their attention.” What recommendations, if any, do you have for merchants in this situation? Why do you believe most do not prohibit typosquatting among their affiliates to begin with?

Ben Edelman: An easy first step is a specific contractual prohibition on affiliates registering or using typosquatting domains. But merchants then need to follow through on this prohibition by implementing effective, robust enforcement. And merchants would do well to penalize violators, including through litigation. Recall Lands End v. Remy, wherein Lands End sued several LinkShare affiliates who had used typosquatting domains to claim affiliate commissions they had never properly earned.

Cyveillance: Your article states that there are “two main uses for traffic diverted to typo domains: placing pay-per-click ads and redirecting to other (often competing) domains.” Both situations cost brand owners money. This may seem obvious, but just to be sure: which is worse for a brand owner in your opinion?

Ben Edelman: They’re both unlawful, and they’re both unacceptable.

Cyveillance: You conclude by suggesting that the parties with the most ability to reduce typosquatting are the ad platforms of Google and Yahoo. Do you expect to see either company modify its practices based on data like that found in your investigation?

Ben Edelman: I see the two main ways to compel ad platforms to change their practices: litigation and public outcry. Both are underway.

Cyveillance: Based on your research, what advice do you have for brand owners faced with the problem of typosquatting?

Ben Edelman: Trademark owners need not write off typosquatting as an unavoidable cost of doing business. Perpetrators are identifiable, and legal remedies are clear. In few other contexts do sophisticated companies sit back and let themselves get cheated. I don’t see why they’d want to do that here.


Many thanks to Edelman for taking the time to answer these questions.