Please help me with OCR (Optical Character Recognition)

Discussion in 'Tech Discussion' started by Chosen, Oct 20, 2017.

  1. Liyus

    Liyus Laksha's Desu~ Cat

    Joined:
    Nov 10, 2015
    Messages:
    4,216
    Likes Received:
    4,757
    Keysign, Chosen and Raneday like this.
  2. Chosen

    Chosen Well-Known Member

    Joined:
    Oct 20, 2015
    Messages:
    154
    Likes Received:
    32
    All of them are like that.
    Those are VIP chapters, etc.
     
  3. elengee

    elengee Daoist Ninefaps

    Joined:
    Mar 15, 2016
    Messages:
    13,488
    Likes Received:
    25,896
    What, and just because it's black it's an issue?!?! BLACK SCREENSHOTS MATTER! :hmm:
     
    Keysign and Chosen like this.
  4. Raneday

    Raneday Not Rane

    Joined:
    Apr 24, 2016
    Messages:
    16,647
    Likes Received:
    36,634
  5. Chosen

    Chosen Well-Known Member

  6. MasterCuddler

    MasterCuddler Handsome Chicken

    Joined:
    Apr 30, 2016
    Messages:
    2,636
    Likes Received:
    3,806
    Not sure what you want, but here's a clearer pic?
    [​IMG]
     
    Last edited: Oct 20, 2017
  7. Cosmic_

    Cosmic_ [Novel Addict] [Lazy Writer] [Meh Editor]

    Joined:
    Sep 26, 2016
    Messages:
    2,680
    Likes Received:
    2,390
  8. Chosen

    Chosen Well-Known Member

    Well, I don't know.
    Thanks for the clear picture, but it's still the same problem. Maybe it really is because of the underlines.
     
  9. erowarrior

    erowarrior Well-Known Member

    Joined:
    Dec 11, 2015
    Messages:
    1,232
    Likes Received:
    771
    WTF is that?
    It shows as a blue image to me, but when I zoom in it becomes a chapter o-o

    AWESOME

    Edit: here's a PNG file o-o
     

    Chosen likes this.
  10. Chosen

    Chosen Well-Known Member

    Same error as before.
     
  11. asdfghjkl

    asdfghjkl jnvfnvfutucbdtcbbyhn

    Joined:
    Oct 24, 2015
    Messages:
    245
    Likes Received:
    294
    Overall, OCR sucks, but the best result I could get with a sample paragraph from this image was with Google's Cloud Vision API:
    https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate#AnnotateImageRequest
    I sent the following request body:
    Code:
    {
      "requests": [
        {
          "image": {
            "content": "iVBORw0KGgoAAAANSUhEUgAAAVsAAACCCAYAAADoiWu+AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAABD+SURBVHhe7Zzbjis7DkP3///0mRFwBHAEXSi7Ut2ZzQUIsSVSclUnfuw//wghhPg4umyFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFeQJetEEK8gC5bIYR4AV22QgjxArpshRDiBXTZAn/+/PmfyKjyNzzRc9vj6ec78T19hid58z10MP2eeo9Pnf0n+zw1+xP83pN9GPujxGBBbewRIyPmp33GiQep9Ns+zonv6TNs6ea8+R46mH6VZnsW19tnFQysbuKkT/T43j5jRKpcjFPOnV8E87JivtIZXQ25mTXNYHpUdNqpdhoZno/aLD5F1ft05sbnz+aRMfXr6pMXQe3pWQzTVLGF9cQ5Hl7DTyfunUlX+RjOnV8EvqBqbXQ1I8t1MD22PZ3TPqbLwulqHawOqTwnvU55+gwbH2pPzmG1LJyuhsQ8q4uc+ipOfOhh1pFYm/Ybzp1fRPeis5dXvdDMGwOJeyPqp8jIdF0gcW94rqshlmOjAzXoifETTHPjGZmIYC6rG1Xe6Hp2NQRz6K0io6tXHoYTr3uiF/ddLbLRTpw7vwh7QRhZrgrkdp/BaCKZh+3Tedm+NznD8/HTqXxvcnIG83gwoK7ydL2ymue6WkXnNSa/gRpG38HOY8P1GVmezbGcO78IfEG+Zl5a1NzuI1M9o/KwvTKd5TZ9Xc9Ehufxk4mnYPpGTaVzvB4/O1CDvikcXDtRg1R5Y+prdH7EdK5lPZFTf6XHfKaJ86o+htU8NuzUPwQ+3Dbc71R5hM1v9xGrdxHJck5XQ6a+sX7Tt8K1G8+nqM6wPVt8JsaPmkrf9clqXc9uRvRV0RE1k77DvZsenZY9l9U2Mzd8pusvw19gfJHVS2Xz3g8DiXvEa5Wm82awetNlgeA+1gz0TRHBXFxX8Umq/pu5qK3WGYy262G1LBDcx5rhuUoXPdneA8E8BkPUMT6cgeHg2oh7J3qqOOHM9WXgy4lrfHm4x7xzu0ewlukmLxsRJof7TJ/BzkKYOez8U27nRt20R7BW6Vi/E3PdjKrGrDsq3al/8lk902CO6clojCo/ceb6MvDl+DrLOdXLnHSf6sPAejJd58WarU8jA/NRj/FJqv7M3EzD5gzLY2RUeSOrbfUO1qLO9p3X6XSM39j6PZ/VMRfrseYRYXMMZ64vA18OvtTpMxLz272T5T1XeSZYXzc741N9DaxX2i4/9WfwHrHX1Luqb/KY2/Yzpp4RthZ1nc+w+o0fqbRTD6tn4eAe80iWj7nKy3Du/CLsBWEguI86rBnZPgYS90aWc7raBOt94kyWY6MD69GHkdHVNlR9ut6fqFVs+z1xtm3fDNdvfJO2q2c1zPl628NgvAx37i8BX1K1Rrb5iqjv/F7bznBYn+myqOhqCKtD0FP5nzhbx8ncjifOhEzPn0UFW/N11Hf+iGm3eoZKl+Uxx/TvNIx/4r7DB7AHO42f9t/GT8+/jdvz3/oVd/HT7/+n51sgmMvqG86dX8z0wqr65kV3Wqt5dDCaT7GZe3pG9GU9furZb3j6zE/0Y3qgZjPTtB4nPOXL+mBuM8e0HhlTvWKnFkIIcYQuWyGEeAFdtkII8QK6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV5Al60QQryALlshhHgBXbZCCPECumyFEOIFdNkKIcQL6LIVQogXOL5st/9ejOG25yfOdML6X6/9onPHuOWJHhNPz8B+p703vkr79HOx/NRc56fnO0+f47jbUwexPjFO6bxxxhQVXc1hNMhWf4M/XxYTmSdGpMpto+O0zuSn3hUb3/Z8iGnYYKm0mx6nnJzXQN8UG7b6jrFTPOgmkKxuwcL4Nv0qmB6TZnuOJ859
y8kZnnhXkU3P7hOjI6tjbvJXsL5OdzK78kxzunDNp8jmPcGmVzwDGxuOnmw7xDjxIOiP6yq2sB7X4awunKw2xSfI5lRRwdY6XQajj/09HKaHU/XKwslqHl5niF4Pp6shmGc0E1G78d6SzbJcDAZWZ2y0ztazn/Bf3jhYBP3bNcNW3/GTs0+4PW+33/RmtZv5J6B/6pXVLVcFEveG57paBPOMZoLp9ymyeTHHnulUV/kwz/ZG1g4b0kVFpq0iA/PbdYfpWC3L1C/Wp/1TWN/TcHCNVJpKn8FqTVeF129A/9Qr1jN91aPTsn0wl9WdynsaT5H1xnANEvcVrA6ZPCc9nbWzG8bWTnpYHsNh1h0bHYaT+bNcpOvB+J9iM8u0U7jOwXVH5c+I9WyfhVPlnajtyLTRU/XI8pbb6J2uZkz1yFb/FNncmMO9rU8jkuUyWF1k5aoO6bC1SnfiZ9YdrA6JnmmfgZpq/WluZ0W/77fPs/VFnYdTrTO87j2miDC5TGNM3ps+hueresVW/xQ2N5sdc+z5fuo5KujT+MHts4uKWMu0rH+77mB1SPRMe8Nyp/EJsjkxJioN5qc+nXbqP30auM7I6k/6jaqH5bNAcJ/VYi4y1RHvV8WnqWZN+wzXMFrE9FOcQjlxQDdsU5v2Eaz72j6nmGA0kcmz6ennPDnHKT6rm8mcp9JgfjMj01Y5zPs6fhq4zsjqt37D850/q8Uc7rteCKtDzNP5Tnpu8P7x04izmbNkfU649SNjp82Dbmue63wOaio90yfyCQ/b03Xx85PgDFt3kZHpPJxqjWR5VrvZVz2dbe9IVfd8589qW70Ta9Me8dqm/9PEM+C8OHs6y1bfceONrDvZ8C4qqlrnQXBG5vFcVuvY6o3KszkDaqp1hWkYXSR6uh7sOTIwn2kYHxLzm33V09n2jlgdA2G8kc7T9e9qzpSzdRcVU30CvVmfmJtmbfUdN97IulM3vKpNeeaBUJPpN72Qrd7oPFM/q0dNtu/6THUW71PFxImm8zD9jEmH9VttlcPI8HxVN7LaRo971td52B4Rq3X1DuYM3h+joqp1no5TX8a601MPGnM3D3XTazvX9Se+ytPlt54NXQ+mv58PI5LlKlgtzssCiXvEa52mgunrVFrLZ1ERa7hnfZv+SFczpjpL1ifmqlmfOOOJpyLtZAO+NW7Pf+u/jdv5t/7buJ1/61f83fHm92fL2lENmYZvDshqmZkd7Hk+zU+do5tb1TZn3T7X7XvI/Lc9N0yzTs/y1jN0c37DGTJM/4bnCd6fKIQQfyG6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV5Al60QQryALlshhHgBXbZCCPECumyFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFe4OqyPfnXZhlP9PmJf5nmPPVcW27fA2qfOOvp7Ceo+p3M2T7HFFtOPBU3vdz71HmefK4NT7yDJ7jqtD1IpX+zj2mmyJh00/4TZDM2c58+883sLZl/+zxdnT3fzYyKE08F08s19snGRObxiGSaLraceJwbb+Sq0+YgnXb7QK63zykiWQ6Z6k7Xm+1xSzZvMztqsR8Gy1NanO2RUeU33PaedCdn3MzGyNj0iniu62G1LCJZbstpj5vZT5zbuerEHsR0WThdLdLVGCY/2x91tq7iU2DvbD3Nr/yRrhZhtabLwsG1EfeO593fRYXXoj6LjEwXYwvrQV3l2fayzykiMcdotlR+y1fhdZbo7+IE2pUNrCLS5Vi909UYJj/bH3XM+mm6mb7v5lf+SNSdBoL7WDM6/cRGyzD1u6lbbRsI7mPNqfIVWc+TOYyHhfVnOstVwcDqGOhO2dAncqzesHyseS6LjEwXo6LSMesnqeZX+QirM7oaEntWRE3mYTQR01TBkmkn/209YnqPCdRUeqYP4rO7QLI6EwwbrRG1mXfTz9jqO+hO7ME3uSxvdPmq9gbTubL4BLE37mM+A/VGpTO6muMaRhvJ
PDGHe1tjOLhGsjz6Yx33sZaBfapgcW38rMA6eqZwsnys46eBayPuGVjPtnd2ztjjpuctdKds6FO5WO/0WW3CPKeBxL2D+Wr9FN7TPjGcau14btI5Xc3JerJknpir+mLe1lV0ZHXPTd6niXOn+VivtFMPB2dO4eCaZePZ9s/0MVdpTmMDrc4ab3JZILjf1qqo6GodlQ/z1foJmDndfFaHdDWD7eOYpgvXIHHvYJ7RZJz4rHYaFVir1hFG1/kRVpdhXia2bD2ZPuae6HkK3Yl5EOM0h/tM73Q1hhu/eTGcmMf4FNibWUdOdBlsn4rME3NV3zi7io6s7rnJm/GEZ9o7mGc0HbFXFRVdzZjqGVtPpff8G2fooDvZUDYibM45rTGYfwqGSsf6b8E52do+u7NU/sim1mkzXD/1qfpintFkTLNu/ROZfpPDyKjyEdSd9ro9Q2Trm+ZP/bI6m2OgXTcH2R54U7N9FSewvko3+W/OhmCPau1MuazuVDV2Tofro2/aO5i3dRUdWK+0XY9Ys30WGbd53G97RWKvKiaihvFUbL1+Rg+E6ZVp2BzDmetf2KHbA29qlbbr0cH6XGefUyBZ7gTsMfXL6qx/8iJdn0jU4t7WMTIwz2gyvN7pNrVpnvPUvImT81SeTS+PGxg/zqr0nq/qTlZncwxnrn9hh5oui4pNrdJ2PTpY383c07Mh2GPql9VZf6xttB1dX2am57Iay22PzH9zHueJHg7bKz5DFR2ZhvFVsL5Ol52nIquxOYbUZc3+1vjp5//p+bdxe/5b/98et+/v1n8bt/Nv/WyccOb6l9OhFUy/qKk827Ox+km3nXvK7Rz0V702M1jt7bl/A937moKB1TFMvWLdzzkFajewnm3fyOS/6X/qvXsiIYQQFLpshRDiBXTZCiHEC+iyFUKIF9BlK4QQL6DLVgghXkCXrRBCvIAuWyGEeAFdtkII8QK6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV7gqy/b23/Ddko3lznTE+f+qWdHnj7Db3gmIT7FK99u+xExsWXjibOYqDitOVHj82J0TPWKOGOKjqm+5el+Qvwmfs23++SHdvrjrHxsv0439cjqbM6p9FWcwPiiBmd6bNjqhfgmfs23++SHdvrjrHxsvxt/pmFzDtZ8Xem7PhXT7Cwyuvw2hPh2Xv0Wdz+akx/U6Y+w8k3nywLBfawZVX3SIjHve1Y/sdFP2k/OFuLbePXb3f2YTn5opz/Oysf0Q021dmI9hpPVLCKdBtdIlY/EfgyTfluf9kJ8M69+m7sfz8kPyzxVdFT1yWegxtf22UUk5hhNhOlhTH0cRmeaKqr6BGqinvEL8S28+m32HyATDBvdaUQwV62RmJ96OlU/Y6Pv+iCsDomeaZ+BmmotxP8Dv/Ibzf7QTn+QlY/ph5rt/M3cjdawfBUMrM5wbfQwPUxzGkJ8M7/yG8z+sE5/gJVv6of1qLV9Fs62ZhHJNBZey6jyEVZnVDM3PSLm9RDi/5Ff+c1mf3CnP8zKx/RzTdRmXkZjMF6nyt+y6eta+/TwfSTLRaKf8RimY7VC/DT0N/X0i+0+Jhxcd7C6SOW7mcvmMm68CHq2/o3etdGT9Zj6Yr1aZ2y0QvwG6G+pfaHf+lKzc07O03mYfq6J2sxb9bM81jZex3t0faYeyIk281guRkZWy/ad36k0QvwmVt/St77U7JyT83SeqV+s4z7zxrpHZJPb9DCqfGSrs0/Wg3S+Lp/Vul5C/DbSb6p/if/G+Pbn1/l1+Yrfya/6Zn7yh8L0vpmfed/44T/xXOw5P/08b7wvIX4KfbuFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFeQJetEEK8gC5bIYR4AV22QgjxArpshRDiBXTZCiHEC+iyFUKIF9BlK4QQL6DLVgghXkCXrRBCfJx//vkP2QVeNlPVwbgAAAAASUVORK5CYII="
          },
          "features": [
            {
              "type": "TEXT_DETECTION"
            }
          ],
          "imageContext": {
            "languageHints": [
              "zh"
            ]
          }
        }
      ]
    }
    The "content" field is just the base64-encoded image. The relevant part of the response was:
    Code:
      "description": "一瞬泅发生的事情,让木叶的很多人都沿\n反映了过来, 看到不断进攻的忍者,和天空\n的信号弹, 他们才知道这是木叶遭受了攻击\n马 开始组织了起来。\n",
    
    It sucks, but at least it did much better than Tesseract and Microsoft's OCR API (these are the two most common backends used by free OCR programs). Unfortunately, "DOCUMENT_TEXT_DETECTION" isn't supported with zh, which is a pity because it would improve the handling of the underlines.
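
    For anyone who wants to script this, here is a minimal Python sketch of the request shown above. It assumes the public endpoint https://vision.googleapis.com/v1/images:annotate with an API key passed as a query parameter. The helper names (build_request, extract_text, ocr_image) are made up for illustration and are not from the original post. Only ocr_image needs network access.

```python
import base64
import json

# Public REST endpoint for Cloud Vision's images:annotate method.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes, language="zh"):
    """Build the same JSON body as in the post from raw image bytes."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": [language]},
            }
        ]
    }


def extract_text(response):
    """Return the recognized text from a parsed images:annotate response.

    With TEXT_DETECTION, the first textAnnotations entry carries the whole
    text in its "description" field; an empty string means nothing was found.
    """
    annotations = response["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """Hypothetical helper: OCR one image file (requires network and a key)."""
    import urllib.request

    with open(path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

    Calling ocr_image("chapter.png", "YOUR_API_KEY") would return the same kind of "description" string quoted above; the file name and key are placeholders.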

    Baidu's free OCR engine is probably worth trying, since they focus on Chinese characters rather than the Latin alphabet. There's some info on that here: https://ocr.space/blog/2015/09/baidu-ocr-api.html but I didn't test it myself to see how it handles the sample.
     

    Chosen likes this.
  12. Chosen

    Chosen Well-Known Member

    Thanks for the help, I will try it.
    Edit: Tried it, but it gets about 60-70% right, not 100%, so... Anyway, thanks for trying to help. At least for now I have something that is about 70% better than what I was using.
     
    Last edited: Oct 21, 2017
  13. erowarrior

    erowarrior Well-Known Member

    I'd say just use Chrome and a copy app to copy from the source.
     
  14. Chosen

    Chosen Well-Known Member

    I tried it, but it's still the same.
     
  15. GekkoZockt

    GekkoZockt Well-Known Member

    Joined:
    Apr 25, 2017
    Messages:
    43
    Likes Received:
    32
    I'd recommend downloading the HTML code and writing a script that extracts the text instead of using OCR.
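
    A minimal sketch of such a script, using only Python's standard html.parser module. The names TextExtractor and html_to_text are made up for illustration. It only helps if the chapter text is actually present in the saved HTML, which may not be the case if the site serves the chapter as an image.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the text nodes of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

    Read the saved page with open(path, encoding="utf-8").read() (the encoding is an assumption about the site) and pass the result to html_to_text.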
     
    Chosen likes this.
  16. lnv

    lnv ✪ Well-Known Hypocrite

    Joined:
    Jan 24, 2017
    Messages:
    7,702
    Likes Received:
    9,044
    If you have a page, there's no need for OCR. Try printing it to XPS or PDF from your browser (you can sometimes even copy and paste from the print preview), or use "Save page as". I assure you that if it is being displayed on a page, the protection can be broken.
     
    Chosen likes this.
  17. Chosen

    Chosen Well-Known Member

    Well, it really is an image, not text, and that's my problem. If it were text, I could just use the Google Translate add-on or whatever translates the page directly.
    Well, like I said above, it really is an image on the site, so...
     
    GekkoZockt likes this.
  18. GekkoZockt

    GekkoZockt Well-Known Member

    Well, they make it difficult with those lines. You can't even remove the lines without damaging the words. Hmm...
    I'm afraid there is no way to get 100% or even 90% correct character recognition. Even those fancy APIs won't solve the problem. Your best bet is to contact the author and ask for the raws. If you can delete the lines without damaging the characters, you'll also get really good results, but that would be a lot of work.
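
    To give an idea of what deleting the lines could look like, here is a rough pure-Python sketch. It is an illustrative heuristic, not something anyone in this thread used. It assumes the screenshot has already been binarized into a rectangular list of rows (1 = ink, 0 = background), and the function name and the min_run threshold are made up. It erases long horizontal runs of ink but keeps a pixel when there is ink directly above and below it, on the guess that a character stroke crosses the underline there. It only copes with underlines up to 2 pixels thick, and it will still damage characters that contain long horizontal strokes.

```python
def remove_horizontal_lines(img, min_run=20):
    """Return a copy of img with long horizontal ink runs erased.

    img is a rectangular list of rows; 1 = ink, 0 = background.
    A pixel on a long run survives only if the pixels directly above
    and below it are both ink (a stroke probably crosses the line).
    """
    height = len(img)
    out = [row[:] for row in img]  # work on a copy, leave the input intact
    for y, row in enumerate(img):
        width = len(row)
        x = 0
        while x < width:
            if not row[x]:
                x += 1
                continue
            start = x
            while x < width and row[x]:
                x += 1
            if x - start < min_run:
                continue  # short run: likely part of a character
            for i in range(start, x):
                above = y > 0 and img[y - 1][i]
                below = y + 1 < height and img[y + 1][i]
                if not (above and below):
                    out[y][i] = 0
    return out
```

    The cleaned image would then go back to the OCR engine; whether that actually improves recognition on these screenshots is untested.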
     
    Chosen likes this.
  19. Jeebus

    Jeebus Well-Known Member

    Joined:
    Jun 20, 2017
    Messages:
    904
    Likes Received:
    780
    So, I went to the website. It's only using JavaScript to block you from copying. You should be able to use a script blocker or Just Read to skirt around that problem and copy the text. From there, you just paste it into a translator, and you should be set. No need for taking screenshots and using OCR.
     
    Chosen likes this.
  20. Chosen

    Chosen Well-Known Member

    Again, these are VIP chapters, not normal ones.