How do I copy a copy protected web page?


How do I copy/paste from sites that don't permit it? There is info I'd like to send to a friend who doesn't have a computer but does have a machine that only sends and receives plain text. I want to send her material from this site as an example, but they don't permit copying and pasting. Is there any way around that?
As you might expect, the website in question is trying to protect its content from theft. They have valuable information and I'm sure that people try to steal and republish their content frequently. That is, of course, quite illegal and a violation of international copyright law.
So I'll assume that's NOT what you have in mind. (Though technically even what you have in mind - while morally acceptable in my opinion - may still be in violation of that law.)
Copy protection on websites - be it just for pictures or for entire pages of content - is in my opinion pretty close to useless. It keeps honest people honest and that's about as far as it goes.
Web pages, emails, whatever: if it can be seen, it can be copied.

Above Board Techniques

By "Above Board" all I really mean is using normal website behaviour to gain access to the text in ways that perhaps the web site owner hadn't thought to prevent ... yet.
The most common: printing.
In this case, if you install a print-to-PDF printer driver such as PDFCreator and print that page to create a PDF, two interesting things happen:
  • You have a nice PDF of the page. Perhaps that might be enough to get your friend a copy of the page. Certainly it has the highest "fidelity" in that it'll include the same formatting and images as the original web page.
  • That PDF may, itself, have copy enabled. In my test of the website in question, I was able to print to PDF, and then select the desired text from the PDF and copy it elsewhere.
Another approach is to use the File -> Save As... option in the browser when viewing the page, and save it "as" plain text format. The results may vary from browser to browser, but you're likely to get a good starting point from which you can then copy the desired text.
Yet another approach is to use the "View Source" option available in most browsers which will allow you to view the underlying HTML for the page, and copy out the relevant content as needed. You'll want to clean up the results, though, removing the HTML mark-up to make the results readable.
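The clean-up step can be automated. What follows is only a minimal sketch using Python's standard html.parser module, assuming you've already saved the page source as a string; the names TextExtractor and html_to_text are made up for illustration. It keeps the visible text and skips the contents of script and style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text from HTML, skipping <script> and <style> contents."""

    SKIPPED_ELEMENTS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_ELEMENTS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_ELEMENTS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def html_to_text(html):
    """Return the text of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Calling html_to_text(saved_source) gives you plain text you can paste anywhere. Each run of text between tags lands on its own line, so a sentence containing bold or linked words will be split across lines and may need a little manual tidying.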

Underground Techniques

By "underground" I mean actually taking steps to actively disable whatever copy protection has been placed on the web page or image.
Two techniques come to mind:
  • Disable Javascript. Many sites will use Javascript to implement copy protection. Disabling Javascript, in turn, disables the copy protection completely. (That happened to be the case with the example site. It also disabled a number of popup ads as a bonus.) The easiest way is to use Firefox and the "NoScript" plugin, which allows you to enable or disable Javascript on a site-by-site basis.
  • Disable or circumvent CSS. CSS, for Cascading Style Sheets, is actually an incredibly powerful approach to defining web page look and feel and behaviour. Using CSS it's quite possible to disable or modify the way web pages behave. It's also easy to turn off: in Firefox click on View -> Page Style and then click on No Style. The page will be re-rendered without CSS and the result, while typically visually unappealing, may well be copy-able.
There may be other approaches as well, depending on the specific techniques used to disable copying, but those are probably the 95% solution.
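The same two ideas can be applied to a saved copy of a page rather than in the browser. The sketch below is an illustration only, and it rests on an assumption: that the copy protection lives in script elements, style elements, linked stylesheets, inline style attributes, or inline event handlers such as oncopy. The names ScriptAndStyleStripper and strip_scripts_and_styles are hypothetical. It uses Python's standard html.parser to re-emit the page without those pieces.

```python
from html import escape
from html.parser import HTMLParser


class ScriptAndStyleStripper(HTMLParser):
    """Re-emits HTML without scripts, stylesheets, or inline handlers."""

    DROPPED_ELEMENTS = {"script", "style"}

    def __init__(self):
        # Keep entities as written so they can be passed through unchanged
        super().__init__(convert_charrefs=False)
        self.out = []
        self._dropping = 0

    def _is_stylesheet_link(self, tag, attrs):
        rel = (dict(attrs).get("rel") or "").lower().split()
        return tag == "link" and "stylesheet" in rel

    def _render(self, tag, attrs):
        kept = []
        for name, value in attrs:
            # Drop inline event handlers (onclick, oncopy, ...) and inline CSS
            if name.startswith("on") or name == "style":
                continue
            if value is None:
                kept.append(f" {name}")
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return f"<{tag}{''.join(kept)}>"

    def handle_starttag(self, tag, attrs):
        if tag in self.DROPPED_ELEMENTS:
            self._dropping += 1
        elif not self._dropping and not self._is_stylesheet_link(tag, attrs):
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>
        if tag in self.DROPPED_ELEMENTS or self._dropping:
            return
        if not self._is_stylesheet_link(tag, attrs):
            self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if tag in self.DROPPED_ELEMENTS:
            self._dropping = max(0, self._dropping - 1)
        elif not self._dropping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._dropping:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self._dropping:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self._dropping:
            self.out.append(f"&#{name};")


def strip_scripts_and_styles(html):
    """Return html with scripts, CSS, and inline event handlers removed."""
    stripper = ScriptAndStyleStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)
```

Save the result of strip_scripts_and_styles(saved_source) to a new .html file and open that in your browser. Comments and the doctype are discarded as well, and pages that build their content with Javascript will come out empty, so treat this as a rough equivalent of the browser settings above rather than a replacement for them.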

Off The Wall Techniques

"Off the wall" as in things that sound really stupid or something you'd never think of, but are last resort measures.
If nothing else they're proof of my original statement: if it can be seen, it can be copied.
  • Take a picture. Get your digital camera and take a picture of the screen. Instant copy.
  • Take a screen shot. Tools like SnagIt will not only automatically "page down" to get an image of the entire page (in perfect resolution, unlike your camera), but also include a "copy text" option that may well copy text for which traditional clipboard copy has been disabled.
  • OCR. Short for "Optical Character Recognition", OCR tools can take that "picture" of a web page (ideally the screen shot since it has the best quality) and extract from it all the visible text as editable text.
There are probably more odd and unique ways that I'm not thinking of.

If It Can Be Seen, It Can Be Copied.

I present this not as a "how to" for people wanting to make illegal copies of web sites, or even for people who want to do more acceptable things like share otherwise inaccessible content with others.
My intent here is really to point out the futility of copy protection schemes.
If you must present your information in a way that humans can read, listen to, or watch, then there exists a way for that content to be copied. Placing roadblocks just punishes those who would view or use your content in ways that are, ultimately, only beneficial to you, without stopping those who would steal it anyway.
If someone can see it, they can copy it, forward it, publish it, whatever.
That's simply the nature of today's technology.
Not that they should, but they can.
