CSS-based anti-copy method

Discussion in 'Tech Discussion' started by noisypixy, Aug 25, 2017.

  1. noisypixy

    noisypixy Sacatunn que pen, que summum que tun.

    Joined:
    Jun 25, 2016
    Messages:
    716
    Likes Received:
    950
    Reading List:
    Link
    TL;DR: https://jsbin.com/xowabuziwe/edit?output

    ---

    I don't remember reading anything similar to this method, so I'll post it just for the archive. This is just a proof of concept aimed at programmers who'd like to play with this idea.

    End users (e.g. bloggers) won't benefit much from the details in this thread; their main "gain" would be knowing something like this can be done. That said, the link now has a simple GUI that allows anyone to obfuscate rich HTML just by copy-pasting and clicking a button.

    As always, this method is not bulletproof, and I even say which function to use if you want to bypass it since I have no interest in hiding it anyway.

    Method
    Use a lot of empty <span> elements and abuse the CSS ::before and ::after pseudo-elements.

    Pros
    • For readers: Can be read with JavaScript disabled.
    • For readers: Doesn't mess with text flow (words won't break at weird places).
    • Against scrapers: Requires either (1) a JavaScript interpreter or (2) both an HTML parser and a CSS parser (emphasis on "CSS parser", which is rare).
    • Against scrapers: JavaScript-based scrapers need a relatively obscure function (window.getComputedStyle), used in a not-so-straightforward way.
    • Against copy-pasters: Can't be copied via normal methods, even with JavaScript disabled.

    Cons
    • Doesn't work at all with Internet Explorer (the text can be copied normally) (thanks to @coyoteelabs for pointing this out).
    • Requires inserting custom HTML and CSS (free WordPress sites can't do this, for example).
    • The Python script below doesn't support formatting (so you can't have bold text, for example); the JSBin version does.
    • For readers: Doesn't work with screen reader software.
    • Against scrapers: OCR still works.
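
To make the idea concrete, here is a minimal sketch of how a single word could be spread over pseudo-elements. It is only an illustration, not the script posted below: the name `encode_word` is made up, and it assumes each pair of characters gets one empty `<span>`, with the first character in `::before` and the second in `::after`, both written as CSS hex escapes.

```python
def encode_word(word):
    """Return (html, rules) for one word.

    `rules` maps 'N::before' / 'N::after' (N = position of the inner
    <span>) to the CSS escape that should go in its `content` property.
    """
    pairs = [word[i:i + 2] for i in range(0, len(word), 2)]
    # The HTML only carries structure: one empty inner <span> per pair.
    html = '<span>' + '<span></span>' * len(pairs) + '</span>'
    rules = {}
    for n, pair in enumerate(pairs, start=1):
        # A CSS escape is a backslash followed by the code point in hex.
        rules['{}::before'.format(n)] = '\\{:x}'.format(ord(pair[0]))
        if len(pair) > 1:
            rules['{}::after'.format(n)] = '\\{:x}'.format(ord(pair[1]))
    return html, rules


html, rules = encode_word('cat')
print(html)
print(rules)
```

For 'cat' the HTML contains two empty inner spans and no letters at all; the letters exist only as the escapes `\63`, `\61` and `\74` on the CSS side.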

    UPDATE 2017-08-25 01:51 AM UTC

    More user-friendly version available at https://jsbin.com/xowabuziwe/edit?output, with formatting support.

    Usage
    1. Put your HTML in the first textarea (the one that says "Insert your HTML here.").
    2. Click the "Obfuscate" button (it's right below that textarea).
    3. The other two textareas will be filled with the obfuscated HTML and CSS, respectively.
    ---
    Code
    Code:
    #!/usr/bin/env python3
    #
    # This script is released to the public domain. No attribution required.
    #
    # Last updated: 2017-08-24
    
    # File containing the text that will be obfuscated.
    #
    # You must separate paragraphs by an empty line.
    SOURCE_FILE = 'source.txt'
    
    # File where the obfuscated HTML will be saved.
    OUTPUT_HTML = 'output-html.txt'
    
    # File where the obfuscated CSS will be saved.
    OUTPUT_CSS = 'output-css.txt'
    
    # ID of the element that will wrap the obfuscated text.
    CONTAINER_ID = 'obfuscated-text'
    
    #------------------------------------------------------------------------------
    # No need to edit anything below this comment.
    #------------------------------------------------------------------------------
    
    with open(SOURCE_FILE, 'r', encoding='utf-8') as f:
        SOURCE_TEXT = f.read()
    
    def css_escape(char):
        # A CSS escape takes the Unicode code point in hex ('a' -> '\61'),
        # not the UTF-8 bytes, so ord() is needed for non-ASCII characters.
        return '\\{:x}'.format(ord(char))
    
    class Word:
    
        def __init__(self, word, paragraph_index, word_index):
            self.text = word.strip()
            self.paragraph_index = paragraph_index
            self.word_index = word_index
    
        def split(self):
            # Chunks of up to 2 characters: one for ::before, one for ::after.
            return [self.text[i:i+2] for i in range(0, len(self.text), 2)]
    
        def html(self):
            # One empty inner <span> per chunk; the letters live only in the CSS.
            return '<span>{}</span>'.format('<span></span>' * len(self.split()))
    
        def rule(self, part_index, pseudo, char):
            # Build one rule targeting a pseudo-element of one inner <span>.
            selector = '>'.join([
                '#' + CONTAINER_ID,
                'p:nth-child({})'.format(self.paragraph_index),
                'span:nth-child({})'.format(self.word_index),
                'span:nth-child({})::{}'.format(part_index, pseudo)
            ])
            return "{}{{content:'{}'}}".format(selector, css_escape(char))
    
        def css(self):
            result = ''
    
            for i, part in enumerate(self.split()):
                result += self.rule(i + 1, 'before', part[0])
    
                if len(part) > 1:
                    result += self.rule(i + 1, 'after', part[1])
    
            return result
    
    class Paragraph:
    
        def __init__(self, text, index):
            self.text = text.strip()
            self.index = index
            self.words = [Word(w, self.index, i + 1) for i, w in enumerate(self.text.split())]
    
        def html(self):
            words = [w.html() for w in self.words]
    
            return '<p>{}</p>'.format(' '.join(words))
    
        def css(self):
            words = [w.css() for w in self.words]
    
            return ''.join(words)
    
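    # Skip empty chunks (e.g. caused by extra blank lines) so the nth-child
    # indexes in the CSS match the <p> elements that are actually written.
    chunks = [p for p in SOURCE_TEXT.split('\n\n') if p.strip()]
    paragraphs = [Paragraph(p, i + 1) for i, p in enumerate(chunks)]
    
    # Write every paragraph, not just the first one.
    with open(OUTPUT_HTML, 'w', encoding='utf-8') as f:
        print('<div id="'+CONTAINER_ID+'">', file=f)
        for paragraph in paragraphs:
            print(paragraph.html(), file=f)
        print('</div>', file=f)
    
    with open(OUTPUT_CSS, 'w', encoding='utf-8') as f:
        for paragraph in paragraphs:
            print(paragraph.css(), file=f)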
    
    

    Usage
    First, put the code in an "obfuscate.py" file (the name is irrelevant).

    Then put your text in a "source.txt" file in the same directory as the "obfuscate.py" file. Paragraphs must be separated by an empty line:
    Code:
    This is paragraph 1. Line breaks
    are
    okay,
    so this is still paragraph 1.
    The script will replace these line breaks with a space.
    
    This is paragraph 2 ... ... lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla semper rhoncus dapibus. Aenean faucibus vehicula tempor. Donec dignissim elit dolor, consectetur molestie erat viverra nec. Suspendisse potenti. Nullam vitae tempor erat. Nam in odio porttitor, mattis libero sed, vehicula nibh. Ut condimentum sit amet magna vitae vehicula. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ut massa a orci dignissim placerat. Sed nibh mi, vulputate vitae blandit id, posuere a nunc. Donec eu est urna.
    

    Open a terminal in the directory of the script and run "python3 obfuscate.py". It will generate "output-html.txt" and "output-css.txt" files with the respective HTML and CSS.
     
    Last edited: Aug 25, 2017
    RakuraiSenshi, kenar, Hahhaa and 33 others like this.
  2. Serendipity

    Serendipity No one, nothing

    Joined:
    Mar 12, 2016
    Messages:
    457
    Likes Received:
    739
    Reading List:
    Link
    *Grudgingly gives like to genius programmer*
     
  3. Crazyh3

    Crazyh3 Well-Known Member

    Joined:
    Dec 23, 2015
    Messages:
    883
    Likes Received:
    1,234
    Reading List:
    Link
    Can't you just copy the whole html section, then use find+replace on all the <span> to remove them so that you are left with the text?

    I guess this doesn't fall under normal methods.
    *edit* Tried it and ended up with incomprehensible stuff instead of letters.
     
    Last edited: Aug 25, 2017
    kenar and noisypixy like this.
  4. Theo Thanasia

    Theo Thanasia Medicine Master

    Joined:
    Dec 12, 2015
    Messages:
    220
    Likes Received:
    138
    Reading List:
    Link
    *claps slowly*
     
  5. noisypixy

    Nope, that's the point of the separate CSS.
    • If you use only the HTML you can't know the letters.
    • If you use only the CSS it'll be somewhat complex to get the words (since you'll have only letters without spaces, you'd need a good dictionary).
    And the CSS uses the hex representation for characters (so you'll get "61" instead of "a"). I strictly avoided putting the letters directly.

    Actually, if someone implements this method, they can pass the CSS through a CSS preprocessor to "merge" the rules as an additional step of this method. That would:
    1. Make the find-and-replace approach useless.
    2. Force scrapers to have the HTML if they want to get something useful.
    Those CSS preprocessors are widely used in web development, so configuring one wouldn't be so hard.
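
Why the merge step matters can be sketched in a few lines. Assuming the CSS is exactly what the Python script in the first post emits (one unmerged rule per pseudo-element, with the paragraph and word positions visible in the `nth-child` selectors), a regular expression is enough to regroup the letters into words. The names `RULE` and `decode_css` are made up for this sketch, and it does not handle merged or reordered rules.

```python
import re
from collections import defaultdict

# Matches one rule in the format produced by the script, e.g.
# ...>p:nth-child(1)>span:nth-child(2)>span:nth-child(1)::before{content:'\61'}
RULE = re.compile(
    r"p:nth-child\((\d+)\)>span:nth-child\((\d+)\)>span:nth-child\((\d+)\)"
    r"::(before|after)\{content:'\\([0-9a-fA-F]+)'\}"
)


def decode_css(css):
    """Rebuild the paragraphs from unmerged rules, using only the CSS."""
    letters = defaultdict(dict)  # (paragraph, word) -> {(part, slot): char}
    for p, w, part, pseudo, hexcode in RULE.findall(css):
        slot = 0 if pseudo == 'before' else 1
        letters[(int(p), int(w))][(int(part), slot)] = chr(int(hexcode, 16))

    paragraphs = defaultdict(list)
    for p, w in sorted(letters):
        chars = letters[(p, w)]
        paragraphs[p].append(''.join(chars[key] for key in sorted(chars)))
    return [' '.join(paragraphs[p]) for p in sorted(paragraphs)]


prefix = '#obfuscated-text>p:nth-child(1)>'
sample = (
    prefix + "span:nth-child(1)>span:nth-child(1)::before{content:'\\68'}"
    + prefix + "span:nth-child(1)>span:nth-child(1)::after{content:'\\69'}"
    + prefix + "span:nth-child(2)>span:nth-child(1)::before{content:'\\79'}"
    + prefix + "span:nth-child(2)>span:nth-child(1)::after{content:'\\6f'}"
)
print(decode_css(sample))
```

Once rules are merged or the selectors stop exposing positions this directly, a scraper along these lines would need the HTML as well, which is the point of the extra step.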
     
  6. SnowTime

    SnowTime Busy Busy Busy, I Dug Too Many Holes

    Joined:
    Oct 23, 2015
    Messages:
    2,620
    Likes Received:
    3,612
    Reading List:
    Link
    rip, if I didn't have so much formatting this would have been so nice
    \o/
    Thanks for the effort though, learned something new
    <- Total newb at programming :blobpeek:
     
    kenar and AliceShiki like this.
  7. Zalpha

    Zalpha Well-Known Member

    Joined:
    May 23, 2016
    Messages:
    581
    Likes Received:
    488
    Reading List:
    Link
    People who want to copy-paste will find a way, so only the lazy or the ill-informed/unskilled copy-pasters will be stopped, while regular readers may be affected. I know I run into a few problems on websites because of my ad blockers, but I won't turn them off if a site tries to force me to do so.
     
    Last edited: Aug 25, 2017
  8. TUSF

    TUSF Well-Known Member

    Joined:
    Oct 20, 2015
    Messages:
    323
    Likes Received:
    236
    Reading List:
    Link
    Doesn't work. All the text is in the CSS file, in an obfuscated manner; you need both the HTML and the CSS. The HTML knows where the letters are, but not what they are. The CSS knows what the letters are and in what general order (a bit of added obfuscation could hide even that), but it doesn't know where the spaces are or anything.

    This is really only a fault of how it's implemented. You can give formatting rules to the letters and words directly from the CSS.
    For example, putting in:
    Code:
    #obfuscated-text>p:nth-child(1)>span:nth-child(2){color:#00F;font-weight: 900;}
    makes the second word of your Lorem Ipsum turn bold blue.
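
If the formatting rules were generated alongside the obfuscated CSS, a small helper could build them from word positions. This is only a sketch: `word_style_rule` is a hypothetical name, and it assumes the same `#container>p:nth-child(N)>span:nth-child(M)` selector layout as the rule above.

```python
def word_style_rule(container_id, paragraph_index, word_index, declarations):
    """Return a CSS rule that styles one whole word by its position."""
    selector = '#{}>p:nth-child({})>span:nth-child({})'.format(
        container_id, paragraph_index, word_index)
    body = ';'.join(
        '{}:{}'.format(prop, value) for prop, value in declarations.items())
    return '{}{{{}}}'.format(selector, body)


# Second word of the first paragraph, bold and blue.
print(word_style_rule('obfuscated-text', 1, 2,
                      {'color': '#00F', 'font-weight': '900'}))
```

Because the rule targets the outer word span, the `::before` and `::after` content inside it inherits the color and weight.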
     
  9. noisypixy

    Yup:
     
  10. Zx

    Zx Well-Known Member

    Joined:
    Feb 8, 2016
    Messages:
    460
    Likes Received:
    396
    Reading List:
    Link
    This is incredibly easy to get past; there are image-to-text converters.

    Take a picture of text, use program to convert picture to text, done.
     
  11. noisypixy

     
    kenar and AliceShiki like this.
  12. Zx

    I admit, I did not read your entire post.

    Lol
     
  13. noisypixy

    I wouldn't do it either, I'm scared of text walls xD.
     
    Zx likes this.
  14. Darkaeluz

    Darkaeluz 『Whosays25 Onii-chan』, 『He who gave up on Love』

    Joined:
    Oct 21, 2015
    Messages:
    893
    Likes Received:
    1,799
    Reading List:
    Link
  15. ArtistsTech

    ArtistsTech Well-Known Member

    Joined:
    Nov 5, 2015
    Messages:
    80
    Likes Received:
    71
    Reading List:
    Link
  16. noisypixy

    kenar and AliceShiki like this.
  17. noisypixy

    Formatting support + more friendly GUI added: https://jsbin.com/xowabuziwe

    Post updated.
     
    Last edited: Aug 25, 2017
  18. lnv

    lnv ✪ Well-Known Hypocrite

    Joined:
    Jan 24, 2017
    Messages:
    7,702
    Likes Received:
    9,044
    Reading List:
    Link
    Is there really a point in going so far? We all know scrapers are going to get into it one way or another... the people who make these scraping bots are not amateur programmers, so it's mostly a matter of time. For this to really work, each site would need its own unique formatting, to make it a pain for each individual to set up. Otherwise, putting together a way to scrape it is a piece of cake really.

    https://jsfiddle.net/6jtutrgL/1/
     
    kenar, noisypixy and Sherrynity like this.
  19. noisypixy

    Last edited: Aug 25, 2017
    AliceShiki likes this.
  20. lnv

    Well, you can do a lot of tricks to mess up what I did, from adding some fake text hidden via display: none to using position: absolute and setting the top off the screen. But again, these are all just minor grievances.

    But as long as you are enjoying it.

    Edit: Oh maybe add inline-flex as the display type and play around with 'order', that might be annoying too
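
The fake-text idea could look roughly like this in Python. It is a sketch under assumptions, not something from the posted script: `add_decoys` and `hide_rules` are hypothetical names, decoy words are mixed in at random positions, and each decoy's word span is hidden by its `nth-child` position with `display:none`.

```python
import random


def add_decoys(words, decoys, seed=0):
    """Return [(text, is_decoy), ...] with each decoy at a random position."""
    rng = random.Random(seed)
    mixed = [(word, False) for word in words]
    for decoy in decoys:
        # randint is inclusive, so a decoy may also land at the very end.
        mixed.insert(rng.randint(0, len(mixed)), (decoy, True))
    return mixed


def hide_rules(container_id, paragraph_index, mixed):
    """CSS that hides every decoy word span in one paragraph."""
    return ''.join(
        '#{}>p:nth-child({})>span:nth-child({}){{display:none}}'.format(
            container_id, paragraph_index, position)
        for position, (_, is_decoy) in enumerate(mixed, start=1)
        if is_decoy
    )


mixed = add_decoys(['real', 'text', 'here'], ['fake', 'noise'], seed=1)
print(mixed)
print(hide_rules('obfuscated-text', 1, mixed))
```

The mixed list would then be fed through the normal obfuscation, so decoys get letters in the CSS like any other word. A scraper that reads computed styles can still check `display`, so this only raises the effort a little.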
     
    Last edited: Aug 25, 2017