https://chrome.google.com/webstore/detail/just-read/dgmanlpmmkibanfdgjocnabmcaclkmod When you use this you can copy and paste. There are other extensions too....
Again, VIP chapters. This website is much harder than Qidian etc... I really hate this website, but what can I do? It is pretty attractive for fan fiction lovers etc..
I’m confused. I opened the attached image file in another tab and the link was this: http://forum.novelupdates.com/attachments/1395929-gif.23773/ It seems clear enough to me. Or is it that the characters are pixelated, so the OCR is not working?
Well, idk. Thanks for the clear picture, but it's still the same problem. Maybe it really is because of the underlines.
WTF is that? It shows a blue image to me, but when I zoom in it becomes a chapter o-o AWESOME Edit: here is the PNG file o-o
Overall ocr sucks but the best result I could get with a sample paragraph from this image was using google's cloud vision api. https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate#AnnotateImageRequest with the following request body Spoiler: Request body Code: { "requests": [ { "image": { "content": "iVBORw0KGgoAAAANSUhEUgAAAVsAAACCCAYAAADoiWu+AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAABD+SURBVHhe7Zzbjis7DkP3///0mRFwBHAEXSi7Ut2ZzQUIsSVSclUnfuw//wghhPg4umyFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFeQJetEEK8gC5bIYR4AV22QgjxArpshRDiBXTZAn/+/PmfyKjyNzzRc9vj6ec78T19hid58z10MP2eeo9Pnf0n+zw1+xP83pN9GPujxGBBbewRIyPmp33GiQep9Ns+zonv6TNs6ea8+R46mH6VZnsW19tnFQysbuKkT/T43j5jRKpcjFPOnV8E87JivtIZXQ25mTXNYHpUdNqpdhoZno/aLD5F1ft05sbnz+aRMfXr6pMXQe3pWQzTVLGF9cQ5Hl7DTyfunUlX+RjOnV8EvqBqbXQ1I8t1MD22PZ3TPqbLwulqHawOqTwnvU55+gwbH2pPzmG1LJyuhsQ8q4uc+ipOfOhh1pFYm/Ybzp1fRPeis5dXvdDMGwOJeyPqp8jIdF0gcW94rqshlmOjAzXoifETTHPjGZmIYC6rG1Xe6Hp2NQRz6K0io6tXHoYTr3uiF/ddLbLRTpw7vwh7QRhZrgrkdp/BaCKZh+3Tedm+NznD8/HTqXxvcnIG83gwoK7ydL2ymue6WkXnNSa/gRpG38HOY8P1GVmezbGcO78IfEG+Zl5a1NzuI1M9o/KwvTKd5TZ9Xc9Ehufxk4mnYPpGTaVzvB4/O1CDvikcXDtRg1R5Y+prdH7EdK5lPZFTf6XHfKaJ86o+htU8NuzUPwQ+3Dbc71R5hM1v9xGrdxHJck5XQ6a+sX7Tt8K1G8+nqM6wPVt8JsaPmkrf9clqXc9uRvRV0RE1k77DvZsenZY9l9U2Mzd8pusvw19gfJHVS2Xz3g8DiXvEa5Wm82awetNlgeA+1gz0TRHBXFxX8Umq/pu5qK3WGYy262G1LBDcx5rhuUoXPdneA8E8BkPUMT6cgeHg2oh7J3qqOOHM9WXgy4lrfHm4x7xzu0ewlukmLxsRJof7TJ/BzkKYOez8U27nRt20R7BW6Vi/E3PdjKrGrDsq3al/8lk902CO6clojCo/ceb6MvDl+DrLOdXLnHSf6sPAejJd58WarU8jA/NRj/FJqv7M3EzD5gzLY2RUeSOrbfUO1qLO9p3X6XSM39j6PZ/VMRfrseYRYXMMZ64vA18OvtTpMxLz272T5T1XeSZYXzc741N9DaxX2i4/9WfwHrHX1Luqb/KY2/Yzpp4RthZ1nc+w+o0fqbRTD6tn4eAe80iWj7nKy3Du/CLsBWEguI86rBnZPgYS90aWc7raBOt94kyWY6MD69GHkdHVNlR9ut6fqFVs+z1xtm3fDNdvfJO2q2c1zPl628NgvAx37i8BX1K1Rrb5iqjv/F7bznBYn+myqOhqCKtD0FP5nzhbx8ncjifOhEzPn0UFW/N11Hf+iGm3eoZKl+Uxx/TvNIx/4r7DB7AHO42f9t/GT8+/jdvz3/oVd/HT7/+n51sgmMvqG86dX8z0wqr65kV3Wqt5dDCaT7GZe3pG9GU9furZb3j6zE/0Y3qgZjPTtB4nPOXL+mBuM8e0HhlT
vWKnFkIIcYQuWyGEeAFdtkII8QK6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV5Al60QQryALlshhHgBXbZCCPECumyFEOIFdNkKIcQL6LIVQogXOL5st/9ejOG25yfOdML6X6/9onPHuOWJHhNPz8B+p703vkr79HOx/NRc56fnO0+f47jbUwexPjFO6bxxxhQVXc1hNMhWf4M/XxYTmSdGpMpto+O0zuSn3hUb3/Z8iGnYYKm0mx6nnJzXQN8UG7b6jrFTPOgmkKxuwcL4Nv0qmB6TZnuOJ859y8kZnnhXkU3P7hOjI6tjbvJXsL5OdzK78kxzunDNp8jmPcGmVzwDGxuOnmw7xDjxIOiP6yq2sB7X4awunKw2xSfI5lRRwdY6XQajj/09HKaHU/XKwslqHl5niF4Pp6shmGc0E1G78d6SzbJcDAZWZ2y0ztazn/Bf3jhYBP3bNcNW3/GTs0+4PW+33/RmtZv5J6B/6pXVLVcFEveG57paBPOMZoLp9ymyeTHHnulUV/kwz/ZG1g4b0kVFpq0iA/PbdYfpWC3L1C/Wp/1TWN/TcHCNVJpKn8FqTVeF129A/9Qr1jN91aPTsn0wl9WdynsaT5H1xnANEvcVrA6ZPCc9nbWzG8bWTnpYHsNh1h0bHYaT+bNcpOvB+J9iM8u0U7jOwXVH5c+I9WyfhVPlnajtyLTRU/XI8pbb6J2uZkz1yFb/FNncmMO9rU8jkuUyWF1k5aoO6bC1SnfiZ9YdrA6JnmmfgZpq/WluZ0W/77fPs/VFnYdTrTO87j2miDC5TGNM3ps+hueresVW/xQ2N5sdc+z5fuo5KujT+MHts4uKWMu0rH+77mB1SPRMe8Nyp/EJsjkxJioN5qc+nXbqP30auM7I6k/6jaqH5bNAcJ/VYi4y1RHvV8WnqWZN+wzXMFrE9FOcQjlxQDdsU5v2Eaz72j6nmGA0kcmz6ennPDnHKT6rm8mcp9JgfjMj01Y5zPs6fhq4zsjqt37D850/q8Uc7rteCKtDzNP5Tnpu8P7x04izmbNkfU649SNjp82Dbmue63wOaio90yfyCQ/b03Xx85PgDFt3kZHpPJxqjWR5VrvZVz2dbe9IVfd8589qW70Ta9Me8dqm/9PEM+C8OHs6y1bfceONrDvZ8C4qqlrnQXBG5vFcVuvY6o3KszkDaqp1hWkYXSR6uh7sOTIwn2kYHxLzm33V09n2jlgdA2G8kc7T9e9qzpSzdRcVU30CvVmfmJtmbfUdN97IulM3vKpNeeaBUJPpN72Qrd7oPFM/q0dNtu/6THUW71PFxImm8zD9jEmH9VttlcPI8HxVN7LaRo971td52B4Rq3X1DuYM3h+joqp1no5TX8a601MPGnM3D3XTazvX9Se+ytPlt54NXQ+mv58PI5LlKlgtzssCiXvEa52mgunrVFrLZ1ERa7hnfZv+SFczpjpL1ifmqlmfOOOJpyLtZAO+NW7Pf+u/jdv5t/7buJ1/61f83fHm92fL2lENmYZvDshqmZkd7Hk+zU+do5tb1TZn3T7X7XvI/Lc9N0yzTs/y1jN0c37DGTJM/4bnCd6fKIQQfyG6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV5Al60QQryALlshhHgBXbZCCPECumyFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFe4OqyPfnXZhlP9PmJf5nmPPVcW27fA2qfOOvp7Ceo+p3M2T7HFFtOPBU3vdz71HmefK4NT7yDJ7jqtD1IpX+zj2mmyJh00/4TZDM2c58+883sLZl/+zxdnT3fzYyKE08F08s19snGRObxiGSaLraceJwbb+Sq0+YgnXb7QK63zykiWQ6Z6k7Xm+1xSzZvMztqsR8Gy1NanO2RUeU33PaedCdn3MzGyNj0iniu62G1LCJZbstpj5vZT5zbuerEHsR0WThdLdLVGCY/2x91tq7iU2DvbD3Nr/yRrhZhtabLwsG1EfeO
593fRYXXoj6LjEwXYwvrQV3l2fayzykiMcdotlR+y1fhdZbo7+IE2pUNrCLS5Vi909UYJj/bH3XM+mm6mb7v5lf+SNSdBoL7WDM6/cRGyzD1u6lbbRsI7mPNqfIVWc+TOYyHhfVnOstVwcDqGOhO2dAncqzesHyseS6LjEwXo6LSMesnqeZX+QirM7oaEntWRE3mYTQR01TBkmkn/209YnqPCdRUeqYP4rO7QLI6EwwbrRG1mXfTz9jqO+hO7ME3uSxvdPmq9gbTubL4BLE37mM+A/VGpTO6muMaRhvJPDGHe1tjOLhGsjz6Yx33sZaBfapgcW38rMA6eqZwsnys46eBayPuGVjPtnd2ztjjpuctdKds6FO5WO/0WW3CPKeBxL2D+Wr9FN7TPjGcau14btI5Xc3JerJknpir+mLe1lV0ZHXPTd6niXOn+VivtFMPB2dO4eCaZePZ9s/0MVdpTmMDrc4ab3JZILjf1qqo6GodlQ/z1foJmDndfFaHdDWD7eOYpgvXIHHvYJ7RZJz4rHYaFVir1hFG1/kRVpdhXia2bD2ZPuae6HkK3Yl5EOM0h/tM73Q1hhu/eTGcmMf4FNibWUdOdBlsn4rME3NV3zi7io6s7rnJm/GEZ9o7mGc0HbFXFRVdzZjqGVtPpff8G2fooDvZUDYibM45rTGYfwqGSsf6b8E52do+u7NU/sim1mkzXD/1qfpintFkTLNu/ROZfpPDyKjyEdSd9ro9Q2Trm+ZP/bI6m2OgXTcH2R54U7N9FSewvko3+W/OhmCPau1MuazuVDV2Tofro2/aO5i3dRUdWK+0XY9Ys30WGbd53G97RWKvKiaihvFUbL1+Rg+E6ZVp2BzDmetf2KHbA29qlbbr0cH6XGefUyBZ7gTsMfXL6qx/8iJdn0jU4t7WMTIwz2gyvN7pNrVpnvPUvImT81SeTS+PGxg/zqr0nq/qTlZncwxnrn9hh5oui4pNrdJ2PTpY383c07Mh2GPql9VZf6xttB1dX2am57Iay22PzH9zHueJHg7bKz5DFR2ZhvFVsL5Ol52nIquxOYbUZc3+1vjp5//p+bdxe/5b/98et+/v1n8bt/Nv/WyccOb6l9OhFUy/qKk827Ox+km3nXvK7Rz0V702M1jt7bl/A937moKB1TFMvWLdzzkFajewnm3fyOS/6X/qvXsiIYQQFLpshRDiBXTZCiHEC+iyFUKIF9BlK4QQL6DLVgghXkCXrRBCvIAuWyGEeAFdtkII8QK6bIUQ4gV02QohxAvoshVCiBfQZSuEEC+gy1YIIV7gqy/b23/Ddko3lznTE+f+qWdHnj7Db3gmIT7FK99u+xExsWXjibOYqDitOVHj82J0TPWKOGOKjqm+5el+Qvwmfs23++SHdvrjrHxsv0439cjqbM6p9FWcwPiiBmd6bNjqhfgmfs23++SHdvrjrHxsvxt/pmFzDtZ8Xem7PhXT7Cwyuvw2hPh2Xv0Wdz+akx/U6Y+w8k3nywLBfawZVX3SIjHve1Y/sdFP2k/OFuLbePXb3f2YTn5opz/Oysf0Q021dmI9hpPVLCKdBtdIlY/EfgyTfluf9kJ8M69+m7sfz8kPyzxVdFT1yWegxtf22UUk5hhNhOlhTH0cRmeaKqr6BGqinvEL8S28+m32HyATDBvdaUQwV62RmJ96OlU/Y6Pv+iCsDomeaZ+BmmotxP8Dv/Ibzf7QTn+QlY/ph5rt/M3cjdawfBUMrM5wbfQwPUxzGkJ8M7/yG8z+sE5/gJVv6of1qLV9Fs62ZhHJNBZey6jyEVZnVDM3PSLm9RDi/5Ff+c1mf3CnP8zKx/RzTdRmXkZjMF6nyt+y6eta+/TwfSTLRaKf8RimY7VC/DT0N/X0i+0+Jhxcd7C6SOW7mcvmMm68CHq2/o3etdGT9Zj6Yr1aZ2y0QvwG6G+pfaHf+lKzc07O03mYfq6J2sxb9bM81jZex3t0faYeyIk281guRkZWy/ad36k0QvwmVt/S
t77U7JyT83SeqV+s4z7zxrpHZJPb9DCqfGSrs0/Wg3S+Lp/Vul5C/DbSb6p/if/G+Pbn1/l1+Yrfya/6Zn7yh8L0vpmfed/44T/xXOw5P/08b7wvIX4KfbuFEOIFdNkKIcQL6LIVQogX0GUrhBAvoMtWCCFeQJetEEK8gC5bIYR4AV22QgjxArpshRDiBXTZCiHEC+iyFUKIF9BlK4QQL6DLVgghXkCXrRBCfJx//vkP2QVeNlPVwbgAAAAASUVORK5CYII=" }, "features": [ { "type": "TEXT_DETECTION" } ], "imageContext": { "languageHints": [ "zh" ] } } ] } Content is just a base64-encoded string of the image. Spoiler: Result Code: "description": "一瞬泅发生的事情,让木叶的很多人都沿\n反映了过来, 看到不断进攻的忍者,和天空\n的信号弹, 他们才知道这是木叶遭受了攻击\n马 开始组织了起来。\n", Sucks but at least it did much better than tesseract and Microsoft's ocr api (These are the two most common backends used by free ocr programs). Unfortunately "DOCUMENT_TEXT_DETECTION" isnt supported with zh because that would improve the handling of the underline. Baidu's free ocr engine is probably worth trying out since they focus on Chinese characters rather than the Latin alphabet. Some info on that here https://ocr.space/blog/2015/09/baidu-ocr-api.html but I didnt test it out my self to see how it handled the sample.
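For anyone who wants to reproduce that request body without pasting base64 by hand, here is a minimal Python sketch. It only builds the JSON shown above (image content, a TEXT_DETECTION feature and a "zh" language hint); the function name `build_vision_request` and its parameters are made up for illustration, and sending the request (endpoint, API key) is left out.

```python
import base64
import json


def build_vision_request(image_bytes, language_hints=("zh",),
                         feature_type="TEXT_DETECTION"):
    """Build a request body shaped like the one posted above.

    The function name and parameters are hypothetical; only the JSON
    layout is taken from the post.
    """
    # The "content" field is just the raw image bytes, base64-encoded.
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": feature_type}],
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


if __name__ == "__main__":
    # Demo bytes only: the first four bytes of a PNG file, not a real image.
    body = build_vision_request(b"\x89PNG")
    print(json.dumps(body, indent=2))
```

In practice you would read the screenshot with `open(path, "rb").read()` and pass those bytes in; the path is whatever you saved the chapter image as.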
Thanks for the help, I will try it. Edit: Tried it, but it is about 60-70% accurate, not 100%, so.... Anyway, thanks for trying to help. At least for now I have something that gets about 70%, which is better than what I was using.
I'd recommend downloading the HTML code and writing a script that extracts the text instead of using OCR.
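A rough sketch of what such a script could look like, using only Python's standard library. It assumes the chapter is real text inside the saved HTML (it does nothing for text that is baked into an image), and the names `TextExtractor` and `extract_text` are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self.chunks = []       # non-empty text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html):
    """Return the page text, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = ("<html><head><script>document.oncopy = null;</script></head>"
              "<body><p>第一章</p><p>Hello <b>world</b></p></body></html>")
    print(extract_text(sample))
```

Because the script never runs the page's JavaScript, any copy-blocking handlers in the page simply don't apply to the extracted text.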
If you have a page, there's no need to go OCR. Try printing it to XPS or PDF from your browser (you can sometimes even copy and paste from the print preview). Or use "Save page as". I assure you that if it is being displayed on a page, the protection can be broken.
Well, it really is an image, not text, and that's my problem, because if it were text I could just use the Google Translate add-on or whatever translates the page directly. Like I said above, it really is an image on the site, so...
Well, they make it difficult with those lines. You can't even remove the lines without damaging the words. Hmm... I'm afraid there is no way to get 100% or even 90% correct recognition of the characters. Even those fancy APIs won't solve the problem. Your best bet is to contact the author and ask for the raws. If you can delete the lines without damaging the characters you'll also get really good results, but that would be a lot of work.
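To show why deleting the lines is so destructive, here is a naive sketch in plain Python. It assumes the image has already been turned into a black-and-white bitmap (a list of rows, 1 = ink, 0 = background) and treats any row that is mostly ink as an underline. The function names and the 0.8 threshold are arbitrary choices for illustration, not a tested recipe.

```python
def find_line_rows(bitmap, min_fill=0.8):
    """Return indexes of rows whose ink fraction is at least min_fill.

    A long horizontal underline fills almost the whole row, while a row
    of character strokes normally does not.
    """
    rows = []
    for y, row in enumerate(bitmap):
        if row and sum(row) / len(row) >= min_fill:
            rows.append(y)
    return rows


def remove_line_rows(bitmap, min_fill=0.8):
    """Blank out every detected line row and return a new bitmap.

    Any character stroke that crosses a blanked row loses those pixels
    too, which is exactly the damage described above.
    """
    line_rows = set(find_line_rows(bitmap, min_fill))
    return [
        [0] * len(row) if y in line_rows else list(row)
        for y, row in enumerate(bitmap)
    ]


if __name__ == "__main__":
    # Tiny made-up example: a vertical stroke in column 1 crossed by a line.
    page = [
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0],
    ]
    for row in remove_line_rows(page):
        print(row)
```

In the demo the vertical stroke in column 1 ends up with a gap where the line used to be, so the character is broken even though the line is gone.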
So, I went to the website. It's only using JavaScript to block you from copying. You should be able to use a script blocker or Just Read to skirt around that problem and copy the text. From there, you just paste it into a translator, and you should be set. No need for taking screenshots and using OCR.