Resource Blob Raw Text Cleaner + Bonus Google Docs Version

Discussion in 'Translator's Corner' started by Blob Translations, Jan 14, 2018.

  1. Blob Translations

    Blob Translations Group of the cutest and coolest blobs around

    Joined:
    Oct 7, 2017
    Messages:
    60
    Likes Received:
    417
    This is a raw cleaner from our group. We shared it with a small group over the past few days to help with bugs and development. Since we feel it's ready, we've decided to release it to the public.

    If you would like to give it a try, please head over to https://www.blobtranslations.com/blob-raw-text-cleaner/

    Features
    • Bold, italics, underlining and other HTML tags are kept intact
    • Optional one-click button to copy to the clipboard
    • Excessive lines/paragraphs are automatically fixed
    • Your formatting is preserved when you copy into other programs such as Google Docs
    • Clean, easy-to-use layout

    Screenshots: blobraw-1.jpg, blobraw-2.jpg

    If you want the Google Docs version, please check out the next post in this thread. The Google Docs add-on was created by @yuzuki.
     
    Last edited: Jan 14, 2018
  2. yuzuki

    yuzuki [sweet night] [plum blossoms]

    Joined:
    Nov 24, 2015
    Messages:
    662
    Likes Received:
    4,875
    View the manual here: https://yuzukicode.blogspot.com/2018/01/google-docs-blobtl-raw-cleaner-add-on.html

    Introduction
    This is a simple Google Docs add-on that will highlight all lines containing non-English text.

    This app works through a whitelist/blacklist strategy. Every character in a line is converted to its UTF-16 code unit and checked against the whitelist and the blacklist. If the character is on the whitelist, every instance of it is ignored. If the character is on the blacklist, the entire line is marked as "Raw Text". The whitelist takes precedence over the blacklist.

    Currently, the blacklist contains the Unicode blocks for hiragana, katakana, and hangul syllables, plus the Unicode block for 20,000 of the most common ideographs used in Chinese and Japanese. The Unicode blocks containing East-Asian punctuation/symbols are NOT on the blacklist by default. Here are the exact blacklist contents:
    The whitelist is currently empty.
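
    As a rough illustration of the whitelist/blacklist check described above, here is a minimal Python sketch. It is not the add-on's source code. The code-point ranges are the standard Unicode blocks for the scripts named in the post and are only assumed to match the add-on's real blacklist; the names BLACKLIST_RANGES, WHITELIST, is_blacklisted and is_raw_line are made up for this example. Python's ord() returns a code point rather than a UTF-16 code unit, but the two are the same for every range used here.

```python
# Assumed ranges: standard Unicode blocks for the scripts named in the post.
# The add-on's exact blacklist contents may differ.
BLACKLIST_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]

# Empty by default, as stated in the post.
WHITELIST = set()


def is_blacklisted(ch):
    """Return True if ch falls in a blacklisted range and is not whitelisted."""
    if ch in WHITELIST:  # the whitelist takes precedence over the blacklist
        return False
    code = ord(ch)
    return any(low <= code <= high for low, high in BLACKLIST_RANGES)


def is_raw_line(line):
    """A line counts as "Raw Text" if it has at least one blacklisted character."""
    return any(is_blacklisted(ch) for ch in line)
```

    With these assumed ranges, a line of plain English is left alone, a line containing kana, hangul or common ideographs is flagged, and East-Asian punctuation such as 「」 is not flagged on its own.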

    To suggest things to be added to the whitelist/blacklist, you can contact me (@yuzuki on NUF). Alternatively, you can get the source code (which is open-source) yourself and run your own version.

    This add-on is still actively being developed, and I'm currently working on an in-app menu that will allow you to define your own personal whitelist/blacklist. Suggestions and bug reports are always very welcome.


    Installation:
    Go to the BlobTL Raw Cleaner page in the Google Docs add-on store (it's free).

    Click the install button and follow the prompts.

    The first time I installed it, an empty Google Doc popped up. If the installation worked properly, you should see something like this:

    [​IMG]


    You can close this empty doc.

    Usage:
    Go to the Google Doc you are working on.

    Go to the Add-ons section and select the BlobTL Raw Cleaner.

    I recommend choosing Highlight Raw Text first to check what the script will actually delete.


    [​IMG]

    [​IMG]




    If you are satisfied with the selection, you can delete the selection with the [Backspace] or [Delete] key on your keyboard.

    If you accidentally delete something you wanted to keep, you can press Ctrl+Z (undo) to restore it.

    Ignoring lines:
    Suppose you have a line that you don't want the script to delete, for instance a paragraph that contains a kaomoji or some other character outside the typical Western character set.

    You can tell the script to ignore that line by putting two hash signs (##) at the beginning of it.

    [​IMG]

    [​IMG]


    Marking lines for the highlighter:
    You may occasionally want to mark lines for highlighting by hand. For example, some lines in the raw may contain only punctuation, which the blacklist does not catch by default.

    You can manually mark a line for highlighting (and later deletion) by placing two percent signs (%%) at the start of the line.

    [​IMG]

    [​IMG]
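
    The two line markers can be pictured as overrides that are checked before the normal character test. The following Python sketch is only an assumption about how such a check could be ordered, not the add-on's actual logic. The names IGNORE_MARKER, FORCE_MARKER, has_raw_chars and should_highlight are hypothetical, and has_raw_chars is a simplified stand-in detector that uses standard Unicode block ranges.

```python
IGNORE_MARKER = "##"  # a line starting with this is never highlighted
FORCE_MARKER = "%%"   # a line starting with this is always highlighted


def has_raw_chars(line):
    """Stand-in detector (an assumption): any kana, hangul syllable or CJK ideograph."""
    return any(
        0x3040 <= ord(ch) <= 0x30FF      # Hiragana and Katakana
        or 0xAC00 <= ord(ch) <= 0xD7AF   # Hangul Syllables
        or 0x4E00 <= ord(ch) <= 0x9FFF   # CJK Unified Ideographs
        for ch in line
    )


def should_highlight(line):
    """Apply the manual markers first, then fall back to the character check."""
    if line.startswith(IGNORE_MARKER):
        return False
    if line.startswith(FORCE_MARKER):
        return True
    return has_raw_chars(line)
```

    This sketch only recognizes a marker at the very first position of the line. Whether the add-on tolerates leading spaces before the marker is not stated in the post.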



    Uninstalling:
    Go to Add-ons > Manage add-ons...

    Then click Manage > Remove

    [​IMG]

    [​IMG]

    Change log:
    • 5 - Fixed bug with shift+enter - 2018-01-14 14:49
    • 4 - Added a line repairing utility - 2018-01-14 01:21
    • 3 - Fixed a bug in the selector - 2018-01-13 23:32
    • 2 - Added a raw selector - 2018-01-13 22:50
    • 1 - Initial functional version - 2018-01-10 13:54

    Credits, Source Code, and License:
    The source code for this add-on is freely available here.

    MIT License. Please credit Blob Translations if you modify or redistribute it.

    Special thanks to @Tony for reviewing the add-on and giving comments, and @BlancFrost for testing. Also, this wouldn't exist if @Action didn't request the feature in the first place.

    Finally, you can find similar tools at the following places:
     
    Last edited: Jan 14, 2018
  3. yuzuki

    yuzuki [sweet night] [plum blossoms]

    I updated the Google Docs add-on so that it now handles Shift+Enter line breaks. It should work in pretty much all circumstances now.

    You can read the manual, see the source code, and see the extra stuff here: https://yuzukicode.blogspot.com/2018/01/google-docs-blobtl-raw-cleaner-add-on.html
    Version 5 Changes:
    • Updated the highlighter so that it can now pick up the mini line breaks made with Shift+Enter
    • Added functionality so that lines marked with (%%) at the beginning will be marked for highlighting.
    • Removed the "Repair Lines" menu option because it's irrelevant now
    • Removed the "Delete Raw Text" menu option because I think it's better practice to encourage people to highlight before deleting.
    To-do:
    • An in-app menu so that users can have their personal whitelist/blacklist
    • A menu setting for how "strict" the blacklist should be. Currently, it's 100% strict (one blacklisted character will mark the line for highlighting). Tony suggested it may be nice to allow the user to change the strictness (e.g. to 50%).
     
  4. Elawn

    Elawn 『Binge Reader』

    Joined:
    Dec 29, 2016
    Messages:
    781
    Likes Received:
    1,142
    Damn. :sweating_profusely:
     
  5. LysUltima

    LysUltima Riichi! Tsumo! Toitoi! Suuankou!?

    Joined:
    Aug 6, 2017
    Messages:
    2,144
    Likes Received:
    5,554
    Finally doesn't delete my whole chapter lol
     
  6. m7vpc

    m7vpc Well-Known Member

    Joined:
    Jun 2, 2017
    Messages:
    566
    Likes Received:
    735
    Time to earn some $$ by being an MTL translator.
     
  7. IMM

    IMM 『X.O.X.O』

    Joined:
    Dec 13, 2016
    Messages:
    480
    Likes Received:
    462
    I feel like all MTL are like this >.>
     
  8. Guan Zhong

    Guan Zhong Well-Known Member

    Joined:
    Jan 12, 2017
    Messages:
    801
    Likes Received:
    2,258
    Works really well, thanks!
     
  9. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    359
    Likes Received:
    1,355
    My love for blobs has increased tenfold! Thank you! <3
     
  10. Blob Translations

    Blob Translations Group of the cutest and coolest blobs around

    Thanks for the feedback :blobokhand:
     
  11. yuzuki

    yuzuki [sweet night] [plum blossoms]

    Another update to the Google Docs add-on: it now includes a sidebar and lets you adjust your personal/custom blacklist and whitelist.

    You can read the manual, see the source code, and see the extra stuff here: https://yuzukicode.blogspot.com/2018/01/google-docs-blobtl-raw-cleaner-add-on.html
    Version 6 Changes:
    • Added a sidebar
    • You can now check/uncheck the rules that you want on your blacklist or whitelist
    • You can now add custom characters to your blacklist or whitelist
    • You can now change the sensitivity of the blacklist. A blacklist with 100% sensitivity means that 1 blacklisted character in a line will mark the entire line for highlighting. A blacklist with 50% sensitivity means that 50% of the characters must be blacklisted in order for the line to be highlighted.
    Please let me know if I broke anything while making this update...

    The add-on may ask you to authorize the app the first time you use it, but it should only ask you to do that once.

    New Feature:

    Changing the settings:
    If you would like to change the behavior of the add-on, you can go to: BlobTL Raw Cleaner > Settings.

    [​IMG]

    Here, you can modify the rules that the highlighter uses to highlight lines.

    If the highlighter is not working how you want it to, you can go to: BlobTL Raw Cleaner > Highlight Raw Characters to see which characters in the text are triggering the app.

    [​IMG]
    For example, in this instance, the づ character in the kaomoji is causing the entire line to be blacklisted. To prevent this line from being blacklisted, there are four possible strategies:

    1. Add a "##" at the beginning of the line so the entire line is ignored.
    2. Add the "づ" character to the custom whitelist.
    3. Lower the blacklist sensitivity from 100%.
    4. Uncheck Hiragana/Katakana in the blacklist settings (if you're not a Japanese translator).

    Option (1) was previously discussed.

    Option (2) involves opening the Settings menu and adding the offending character to the Whitelisted Characters text box. Each character that you put into the text box will be added to the whitelist. Any extra spaces and commas are ignored.
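
    The text box behavior described for Option (2) can be sketched in a few lines of Python. This is a hypothetical helper, not the add-on's code: the name parse_custom_characters is made up, and it assumes that only whitespace and the ASCII comma are treated as separators.

```python
def parse_custom_characters(text):
    """Return the set of characters typed into the box, skipping spaces and commas."""
    return {ch for ch in text if not ch.isspace() and ch != ","}
```

    Under that assumption, typing "づ, ◕ ,, ‿" into the box would give a whitelist of three characters, and duplicates collapse because the result is a set.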

    Option (3) involves opening the Settings menu and changing the slider on the Blacklist Sensitivity. 100% sensitivity is the strictest option (default). This means that if there is a single character in a line that is blacklisted, the entire line will be marked for highlighting.

    Sometimes, this may be too strict for some people's purposes. Perhaps you would only like the line to be highlighted if 50% of the characters are Chinese/Japanese/Korean. If this is the case, you should change the slider to 50% sensitivity.
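
    One way to read the sensitivity slider is as a threshold on the share of blacklisted characters in a line. The Python sketch below rests on several assumptions that the post does not confirm: the required share is taken as 1 minus the sensitivity, whitespace is left out of the count, and a line with no blacklisted characters is never highlighted. The names is_cjk_char, blacklisted_fraction and highlight_with_sensitivity are hypothetical, and is_cjk_char is a simplified stand-in for the real blacklist.

```python
def is_cjk_char(ch):
    """Stand-in blacklist test (an assumption) based on standard Unicode blocks."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x30FF     # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF  # Hangul Syllables
        or 0x4E00 <= code <= 0x9FFF  # CJK Unified Ideographs
    )


def blacklisted_fraction(line):
    """Share of non-whitespace characters in the line that are blacklisted."""
    chars = [ch for ch in line if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(1 for ch in chars if is_cjk_char(ch)) / len(chars)


def highlight_with_sensitivity(line, sensitivity):
    """sensitivity is between 0.0 and 1.0; 1.0 means a single hit is enough."""
    fraction = blacklisted_fraction(line)
    if fraction == 0:
        return False  # nothing blacklisted, never highlight
    return fraction >= 1.0 - sensitivity
```

    With this reading, a mostly English line containing a couple of ideographs is highlighted at 100% sensitivity but not at 50%, while a line that is half Chinese is highlighted at both settings.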

    Please remember to press the save button after modifying any settings. Keep in mind that your User Settings are shared across all of your Google documents, so saving the settings in one document carries them over to every other document you use.
     
    Last edited: Jan 18, 2018
  12. Kiki0246

    Kiki0246 Top Notch Fujoshi, Owner of ISO TLs

    Joined:
    Sep 24, 2016
    Messages:
    306
    Likes Received:
    463
    Thank you so much Blob Translations! Seriously, the part I hate most is cleaning out the raws >.<"
     
    Last edited: Jan 19, 2018
  13. Blob Translations

    Blob Translations Group of the cutest and coolest blobs around

    I'm a bit late to reply but you're welcome!