CSS-based anti-copy method

noisypixy · Aug 25, 2017

TL;DR: https://jsbin.com/xowabuziwe/edit?output

---

I don't remember reading anything similar to this method, so I'll post it just for the archive. This is just a proof of concept aimed at programmers who'd like to play with this idea.

Final users (e.g. bloggers) won't benefit much from this thread; their only "gain" would be knowing something like this can be done. The link now has a simple GUI that allows anyone to obfuscate rich HTML just by copy-pasting and clicking a button.

As always, this method is not bulletproof, and I even say which function to use if you want to bypass it since I have no interest in hiding it anyway.

Description

Use a lot of empty <span> elements and abuse the CSS ::before and ::after pseudo-elements.

Advantages

For readers: Can be read with JavaScript disabled.

For readers: Doesn't mess with text flow (words won't break at weird places).

Against scrapers: Requires either (1) a JavaScript interpreter or (2) both an HTML parser and a CSS parser (emphasis on "CSS parser", which is rare).

Against scrapers: JavaScript-based scrapers need a relatively obscure function (window.getComputedStyle), used in a not-so-straightforward way.

Against copy-pasters: Can't be copied via normal methods, even with JavaScript disabled.

Disadvantages

Doesn't work at all with Internet Explorer (the text can be copied normally) (thanks to @coyoteelabs for pointing this out).

Requires inserting custom HTML and CSS (free WordPress sites can't do this, for example).

Doesn't work with formatting (so you can't have bold text, for example).

For readers: Doesn't work for screen reader software.

Against scrapers: OCR still works.

Sample result

https://jsfiddle.net/6jtutrgL/
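
As a rough illustration of the description above, here is a minimal Python sketch of how a single word could be turned into empty spans plus CSS rules. The function name `obfuscate_word` and the `#demo>span` selector are made up for this example; the full script further down is what generates the real output.

```python
def obfuscate_word(word, selector):
    """Return (html, css) for one word, two characters per inner <span>.

    `selector` is a hypothetical CSS selector matching the word's outer
    <span>. The first character of each pair goes into ::before and the
    second (if any) into ::after, both written as CSS hex escapes.
    """
    pairs = [word[i:i + 2] for i in range(0, len(word), 2)]
    html = '<span>' + '<span></span>' * len(pairs) + '</span>'

    rules = []
    for n, pair in enumerate(pairs, start=1):
        # zip() stops after one item when the last pair holds a single character.
        for pseudo, char in zip(('before', 'after'), pair):
            rules.append("{}>span:nth-child({})::{}{{content:'\\{:x}'}}".format(
                selector, n, pseudo, ord(char)))
    return html, ''.join(rules)


html, css = obfuscate_word('cat', '#demo>span')
print(html)
print(css)
```

The HTML only says how many letter slots the word has; the letters themselves live in the stylesheet.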

UPDATE 2017-08-25 01:51 AM UTC
More user-friendly version available at https://jsbin.com/xowabuziwe/edit?output, with formatting support.

Usage

You put your HTML in the first textarea (the one that says "Insert your HTML here.")

Click the "Obfuscate" button (it's right below that textarea).

The other 2 textareas will be filled with the obfuscated HTML and CSS respectively.

---
Code
Code:
#!/usr/bin/env python3
#
# This script is released to the public domain. No attribution required.
#
# Last updated: 2017-08-24

import binascii

# File containing the text that will be obfuscated.
#
# You must separate paragraphs by an empty line.
SOURCE_FILE = 'source.txt'

# File where the obfuscated HTML will be saved.
OUTPUT_HTML = 'output-html.txt'

# File where the obfuscated CSS will be saved.
OUTPUT_CSS = 'output-css.txt'

# ID of the element that will wrap the obfuscated text.
CONTAINER_ID = 'obfuscated-text'

#------------------------------------------------------------------------------
# No need to edit anything below this comment.
#------------------------------------------------------------------------------

with open(SOURCE_FILE, 'r', encoding='utf-8') as f:
    SOURCE_TEXT = f.read()

class Word:

    def __init__(self, word, paragraph_index, word_index):
        self.text = word.strip()
        self.paragraph_index = paragraph_index
        self.word_index = word_index

    def split(self):
        parts = []
        for i in range(0, len(self.text), 2):
            parts.append(self.text[i:i+2])

        return parts

    def html(self):
        result = ''

        for part in self.split():
            result += '<span></span>'

        return '<span>{}</span>'.format(result)

    def css(self):
        result = ''

        for i, part in enumerate(self.split()):
            result += '>'.join([
                '#' + CONTAINER_ID,
                'p:nth-child(' + str(self.paragraph_index) + ')',
                'span:nth-child(' + str(self.word_index) + ')',
                'span:nth-child(' + str(i + 1) + ')::before'
            ])
            result += '{'
            # A CSS escape takes the Unicode code point in hex, so use ord()
            # instead of the hex of the UTF-8 bytes (which breaks non-ASCII text).
            result += 'content:\'\\{:x}\''.format(ord(part[0]))
            result += '}'

            if len(part) > 1:
                result += '>'.join([
                    '#' + CONTAINER_ID,
                    'p:nth-child(' + str(self.paragraph_index) + ')',
                    'span:nth-child(' + str(self.word_index) + ')',
                    'span:nth-child(' + str(i + 1) + ')::after'
                ])
                result += '{'
                result += 'content:\'\\{:x}\''.format(ord(part[1]))
                result += '}'

        return result

class Paragraph:

    def __init__(self, text, index):
        self.text = text.strip()
        self.index = index
        self.words = [Word(w, self.index, i + 1) for i, w in enumerate(self.text.split())]

    def html(self):
        words = [w.html() for w in self.words]

        return '<p>{}</p>'.format(' '.join(words))

    def css(self):
        words = [w.css() for w in self.words]

        return ''.join(words)

paragraphs = [Paragraph(p, i + 1) for i, p in enumerate(SOURCE_TEXT.split('\n\n'))]

with open(OUTPUT_HTML, 'w', encoding='utf-8') as f:
    print('<div id="'+CONTAINER_ID+'">', file=f)
    # Write every paragraph, not only the first one.
    for paragraph in paragraphs:
        print(paragraph.html(), file=f)
    print('</div>', file=f)

with open(OUTPUT_CSS, 'w', encoding='utf-8') as f:
    for paragraph in paragraphs:
        print(paragraph.css(), file=f)
Usage
First, put the code in an "obfuscate.py" file (the name is irrelevant).

Then put your text in the "source.txt" file in the same directory as the "obfuscate.py" file. The text must separate paragraphs by an empty line:
Code:
This is paragraph 1. Line breaks
are
okay,
so this is still paragraph 1.
The script will replace these line breaks with a space.

This is paragraph 2 ... ... lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla semper rhoncus dapibus. Aenean faucibus vehicula tempor. Donec dignissim elit dolor, consectetur molestie erat viverra nec. Suspendisse potenti. Nullam vitae tempor erat. Nam in odio porttitor, mattis libero sed, vehicula nibh. Ut condimentum sit amet magna vitae vehicula. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ut massa a orci dignissim placerat. Sed nibh mi, vulputate vitae blandit id, posuere a nunc. Donec eu est urna.
Open a terminal in the directory of the script and run "python3 obfuscate.py". It will generate "output-html.txt" and "output-css.txt" files with the respective HTML and CSS.
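
How the script reads "source.txt" can be sketched in a few lines: paragraphs are split on an empty line and words on any whitespace, so single line breaks inside a paragraph end up as ordinary word separators. The helper name `split_source` is only for illustration and is not part of the script.

```python
def split_source(text):
    """Split text into paragraphs (separated by an empty line), then words."""
    # str.split() with no argument splits on any run of whitespace,
    # which is why single line breaks inside a paragraph are harmless.
    return [paragraph.split() for paragraph in text.split('\n\n')]


sample = 'This is paragraph 1. Line breaks\nare\nokay.\n\nThis is paragraph 2.'
for words in split_source(sample):
    print(' '.join(words))
```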

Serendipity · Aug 25, 2017

*Grudgingly gives like to genius programmer*

Crazyh3 · Aug 25, 2017

noisypixy said: ↑

(...)

Can't you just copy the whole html section, then use find+replace on all the <span> to remove them so that you are left with the text?

I guess this doesn't fall under normal methods.
*edit* Tried it and ended up with incomprehensible stuff instead of letters.

Theo Thanasia · Aug 25, 2017

*claps slowly*

noisypixy · Aug 25, 2017

Crazyh3 said: ↑

Can't you just copy the whole html section, then use find+replace on all the <span> to remove them so that you are left with the text?

Nope, that's the point of the separate CSS.

If you use only the HTML you can't know the letters.

If you use only the CSS it'll be somewhat complex to get the words (since you'll have only letters without spaces, you'd need a good dictionary).

And the CSS uses the hex representation for characters (so you'll get "61" instead of "a"). I strictly avoided putting the letters directly.
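
A small sketch of what a naive CSS-only extraction gives: pulling the escapes out of the stylesheet with a regular expression, ignoring the selectors, yields the letters in order but with no word boundaries. The function name is made up, and the regex assumes rules shaped like the output of the script above.

```python
import re


def letters_from_css(css):
    """Decode every hex escape found in a content declaration, in file order."""
    escapes = re.findall(r"content:'\\([0-9a-fA-F]+)'", css)
    return ''.join(chr(int(h, 16)) for h in escapes)


# Rules for the two words "hi you", in the shape the script produces.
css = ("#t>p:nth-child(1)>span:nth-child(1)>span:nth-child(1)::before{content:'\\68'}"
       "#t>p:nth-child(1)>span:nth-child(1)>span:nth-child(1)::after{content:'\\69'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(1)::before{content:'\\79'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(1)::after{content:'\\6f'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(2)::before{content:'\\75'}")
print(letters_from_css(css))  # hiyou
```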

Actually, if someone implements this method, they can pass the CSS through a CSS preprocessor to "merge" the rules as an additional step of this method. That would:

Make the find-and-replace approach useless.

Force scrapers to have the HTML if they want to get something useful.

Those CSS preprocessors are widely used in web development, so configuring one wouldn't be so hard.
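
One possible reading of "merging" the rules, sketched in Python rather than with a real preprocessor: every selector that shares the same declaration is grouped into a single comma-separated rule. The function name and the regex are assumptions for illustration, and they only handle flat rules like the ones the script emits.

```python
import re
from collections import OrderedDict


def merge_rules(css):
    """Group selectors that share an identical declaration block."""
    grouped = OrderedDict()
    # Each rule in the generated CSS looks like: selector{declarations}
    for selector, body in re.findall(r'([^{}]+)\{([^{}]*)\}', css):
        grouped.setdefault(body, []).append(selector)
    return ''.join('{}{{{}}}'.format(','.join(selectors), body)
                   for body, selectors in grouped.items())


css = ("#t>a::before{content:'\\61'}"
       "#t>b::before{content:'\\62'}"
       "#t>c::before{content:'\\61'}")
print(merge_rules(css))
```

After merging, the order of the rules no longer follows the order of the text, so reading the stylesheet from top to bottom stops giving the letters in sequence.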

SnowTime · Aug 25, 2017

rip, if I didn't have so much formatting this would have been so nice
\o/
Thanks for the effort though, learned something new
<- Total newb at programming

Zalpha · Aug 25, 2017

The people who want to copy-paste will find a way, so really only the lazy or the ill-informed or unskilled copy-pasters will be stopped; other than that, regular readers may be affected. I know I run into a few problems on websites because of my ad blockers, but I won't turn them off if a site tries to force me to do so.

TUSF · Aug 25, 2017

Crazyh3 said: ↑

Can't you just copy the whole html section, then use find+replace on all the <span> to remove them so that you are left with the text?

I guess this doesn't fall under normal methods.
*edit* Tried it and ended up with incomprehensible stuff instead of letters.

Doesn't work. All the text is in the CSS file, in an obfuscated manner; you need both the HTML and the CSS. The HTML knows where the letters are but not what they are. The CSS knows what the letters are and in what general order (a bit of added obfuscation could hide even that), but it doesn't know where the spaces are.

noisypixy said: ↑

Doesn't work with formatting (so you can't have bold text, for example).


This is really only a fault of how it's implemented. You can give formatting rules to the letters and words directly from the CSS.
For example, putting in:
Code:
#obfuscated-text>p:nth-child(1)>span:nth-child(2){color:#00F;font-weight: 900;}
makes the second word of your Lorem Ipsum turn bold blue.
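
A generator could emit such formatting rules next to the content rules. The sketch below is one way to build them; `format_rule` and its parameters are hypothetical names, and the selector shape is copied from the rule above.

```python
def format_rule(container_id, paragraph_index, word_index, declarations):
    """Build a CSS rule that styles one obfuscated word (indices are 1-based)."""
    selector = '#{}>p:nth-child({})>span:nth-child({})'.format(
        container_id, paragraph_index, word_index)
    body = ''.join('{}:{};'.format(prop, value) for prop, value in declarations)
    return '{}{{{}}}'.format(selector, body)


print(format_rule('obfuscated-text', 1, 2,
                  [('color', '#00F'), ('font-weight', '900')]))
```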

noisypixy · Aug 25, 2017

Zalpha said: ↑

The people who want to copy-paste will find a way, so really only the lazy or the ill-informed or unskilled copy-pasters will be stopped; other than that, regular readers may be affected. I know I run into a few problems on websites because of my ad blockers, but I won't turn them off if a site tries to force me to do so.

Yup:

noisypixy said: ↑

As always, this method is not bulletproof, and I even say which function to use if you want to bypass it since I have no interest in hiding it anyway.

Zx · Aug 25, 2017

This is incredibly easy to get past; there are image-to-text converters.

Take a picture of the text, use a program to convert the picture to text, done.

noisypixy · Aug 25, 2017

Zx said: ↑

This is incredibly easy to get past; there are image-to-text converters.

Take a picture of the text, use a program to convert the picture to text, done.

noisypixy said: ↑

Disadvantages

(...)

Against scrapers: OCR still works.


Zx · Aug 25, 2017

I admit, I did not read your entire post.

Lol

noisypixy · Aug 25, 2017

Zx said: ↑

I admit, I did not read your entire post.

Lol

I wouldn't do it either, I'm scared of text walls xD.

Darkaeluz · Aug 25, 2017

@Tony, @Parth37955 make this a sticky please

ArtistsTech · Aug 25, 2017

https://www.w3.org/standards/webdesign/accessibility

noisypixy · Aug 25, 2017

ArtistsTech said: ↑

https://www.w3.org/standards/webdesign/accessibility

noisypixy said: ↑

Disadvantages

(...)

For readers: Doesn't work for screen reader software.


noisypixy · Aug 25, 2017

SnowTime said: ↑

rip, if I didn't have so much formatting this would have been so nice

Formatting support + more friendly GUI added: https://jsbin.com/xowabuziwe

Post updated.

lnv · Aug 25, 2017

Is there really a point in going so far? We all know scrapers are going to get into it one way or the other... the people who make these scraping bots are not amateur programmers. So it's mostly a matter of time. For this to really work, each site would need its own unique formatting, to make it a pain for each individual scraper to set up. Otherwise, putting together a way to scrape it is a piece of cake really.

https://jsfiddle.net/6jtutrgL/1/
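
The "piece of cake" claim can also be illustrated without a browser. The sketch below takes the "HTML parser plus CSS parser" route from the first post: it walks the HTML with Python's html.parser and looks each letter up in rules pulled from the CSS with a regex. It assumes the exact, unformatted output of the Python script in the first post; the names `Decoder` and `decode` are made up for this example. A JavaScript-based scraper would use window.getComputedStyle instead.

```python
import re
from html.parser import HTMLParser

# Matches rules shaped like the script's output (an assumption of this sketch).
RULE = re.compile(
    r"p:nth-child\((\d+)\)>span:nth-child\((\d+)\)>span:nth-child\((\d+)\)"
    r"::(before|after)\{content:'\\([0-9a-fA-F]+)'\}")


class Decoder(HTMLParser):
    """Walk the <p>/<span> nesting and look each letter up in the parsed CSS."""

    def __init__(self, css):
        super().__init__()
        # (paragraph, word, part, pseudo-element) -> character
        self.letters = {(int(p), int(w), int(s), pseudo): chr(int(h, 16))
                        for p, w, s, pseudo, h in RULE.findall(css)}
        self.paragraphs = []
        self.depth = 0  # current <span> nesting depth inside a <p>
        self.part = 0   # index of the inner <span> within the current word

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.paragraphs.append([])
        elif tag == 'span' and self.paragraphs:
            self.depth += 1
            if self.depth == 1:      # outer span: a new word starts
                self.paragraphs[-1].append('')
                self.part = 0
            elif self.depth == 2:    # inner span: holds up to two letters
                self.part += 1
                key = (len(self.paragraphs), len(self.paragraphs[-1]), self.part)
                self.paragraphs[-1][-1] += (
                    self.letters.get(key + ('before',), '') +
                    self.letters.get(key + ('after',), ''))

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth:
            self.depth -= 1


def decode(html, css):
    decoder = Decoder(css)
    decoder.feed(html)
    return '\n\n'.join(' '.join(words) for words in decoder.paragraphs)


html = ('<div id="t"><p><span><span></span></span> '
        '<span><span></span><span></span></span></p></div>')
css = ("#t>p:nth-child(1)>span:nth-child(1)>span:nth-child(1)::before{content:'\\68'}"
       "#t>p:nth-child(1)>span:nth-child(1)>span:nth-child(1)::after{content:'\\69'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(1)::before{content:'\\79'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(1)::after{content:'\\6f'}"
       "#t>p:nth-child(1)>span:nth-child(2)>span:nth-child(2)::before{content:'\\75'}")
print(decode(html, css))  # hi you
```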

noisypixy · Aug 25, 2017

lnv said: ↑

Is there really a point with going so far?

Fun.

I was also tempted to make a variation of this to hide text in 1x1 elements (can be bypassed by checking size with Element.getBoundingClientRect).

EDIT:

lnv said: ↑

https://jsfiddle.net/6jtutrgL/1/

I'd do it like this btw: https://jsfiddle.net/6jtutrgL/2/

lnv · Aug 25, 2017

noisypixy said: ↑

Fun.

I was also tempted to make a variation of this to hide text in 1x1 elements (can be bypassed by checking size with Element.getBoundingClientRect).

Well, you can do a lot of tricks to mess up what I did, from adding some fake text hidden via display: none to using position: absolute and setting top off the screen. But again, these are all just minor grievances.

But as long as you are enjoying it.

Edit: Oh maybe add inline-flex as the display type and play around with 'order', that might be annoying too
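
A rough sketch of how the inline-flex and 'order' idea might be generated, assuming one character per inner span (unlike the two-per-span script in the first post). The function name, the seed parameter and the selector are all made up for illustration. The spans sit in the DOM in shuffled order, and the CSS `order` property puts them back in reading order on screen.

```python
import random


def shuffled_word(word, selector, seed=0):
    """Emit one empty <span> per character, shuffled in the DOM, fixed by order."""
    positions = list(range(len(word)))
    random.Random(seed).shuffle(positions)  # DOM slot -> original position

    html = '<span>' + '<span></span>' * len(word) + '</span>'
    css = '{}{{display:inline-flex}}'.format(selector)
    for slot, original in enumerate(positions, start=1):
        # Flex items are laid out by ascending order value, so giving each
        # slot its original position restores the visual reading order.
        css += '{}>span:nth-child({}){{order:{}}}'.format(selector, slot, original)
        css += "{}>span:nth-child({})::before{{content:'\\{:x}'}}".format(
            selector, slot, ord(word[original]))
    return html, css, positions


html, css, positions = shuffled_word('hello', '#w')
print(html)
print(css)
```

A scraper that reads the pseudo-elements in DOM order would then get the letters scrambled unless it also sorts them by `order`. The `positions` list is returned only so the result can be inspected.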
