Resource Mass Search and Replace with Notepad++

Discussion in 'Translator's Corner' started by miscalculation2, Oct 17, 2022.

  1. miscalculation2

    miscalculation2 Active Member

    Joined:
    Jan 19, 2020
    Messages:
    8
    Likes Received:
    2
    Reading List:
    Link
    Post-editors will probably know the feeling of having to replace the same terms over and over again (often names of cities or people). You can of course do search & replace for each individual term but that takes time and is not feasible if you want to do a mass search & replace.

    My current solution is to use Notepad++ and regex, with thanks to this poster on Stackexchange.
    I don't think it's the most elegant solution (so if anyone else has a more refined approach do tell). Still, it may be of use to someone out here so I'll just post it here.

    Step 1 - terminology list
    While not absolutely necessary, I find this overview quite handy (column 1 = terms to be replaced, column 2 = new terms).

    [​IMG]


    Step 2 - convert into regex string
    Use whatever you like to create a regex string that can be used by Notepad++. As for me, I wrote a trashy Python script that reads the Excel terminology list and outputs a regex string in the correct format.

    [​IMG]

    Step 3 - paste regex string into Notepad++
    Paste your MTL text into Notepad++. Open search & replace window and paste your two regex strings in it. Don't forget to enable regular expressions. That's basically it.

    [​IMG]

    Python code for people who don't code

    Code:
    import pandas as pd
    import openpyxl
    
    import pandas as pd
    file = 'tab_search_replace.xlsx'
    data = pd.ExcelFile(file)
    #print(data.sheet_names) #this returns the all the sheets in the excel file
    
    df = data.parse('Sheet1')
    
    ps = openpyxl.load_workbook('tab_search_replace.xlsx')
    sheet = ps['Sheet1']
    
    bad_words = []
    good_words = []
    
    for row in range(2, sheet.max_row + 1):
    
        bad_word = sheet['A' + str(row)].value
        good_word = sheet['B' + str(row)].value
    
        bad_words.append(bad_word)
        good_words.append(good_word)
    
    find_string = ""
    replace_string = ""
    
    for bad_word in bad_words:
        find_string += '('
        find_string += bad_word
        find_string += ')'
        find_string += '|'
    
    counter = 1
    for good_word in good_words:
        replace_string += '('
        replace_string += '?'
        replace_string += str(counter)
        replace_string += good_word
        replace_string += ')'
        counter += 1
    
    find_string = find_string[:-1]
    
    print(find_string)
    print()
    print(replace_string)
     
  2. Eisvarith Devirea

    Eisvarith Devirea [Mza {Valkyrie Awakens} Mya]

    Joined:
    May 20, 2019
    Messages:
    179
    Likes Received:
    161
    Reading List:
    Link
    Um. don't you know sublime text? the meaning and its very existence? you can download packages for it too..... and its much more simple.
    I'm genuinely confused. does anyone not use sublime text or atom?
     
  3. Alvastar

    Alvastar Well-Known Member

    Joined:
    Nov 15, 2015
    Messages:
    645
    Likes Received:
    619
    Reading List:
    Link
    I was just about to comment about this one. Sublime Text is simply sublime when it comes to making this stuff less of a hassle.

    It's quite easy on the memory of your system, too.
     
  4. Eisvarith Devirea

    Eisvarith Devirea [Mza {Valkyrie Awakens} Mya]

    Joined:
    May 20, 2019
    Messages:
    179
    Likes Received:
    161
    Reading List:
    Link
    I wholeheartedly agree to that, couldn't have said it better myself.

    sublime text is just simply that sublime.
     
    miscalculation2 likes this.
  5. dahlys

    dahlys Active Member

    Joined:
    Jul 22, 2018
    Messages:
    2
    Likes Received:
    1
    Reading List:
    Link
    If you know python why not just do the text replacement in python? Import the excel file into a list/array, import the text file you want modified, then replace. You can then package into an exe file to share if needed.
     
    Last edited: Oct 18, 2022
    miscalculation2 likes this.
  6. NorthBladeTL

    NorthBladeTL Well-Known Member

    Joined:
    Aug 13, 2021
    Messages:
    104
    Likes Received:
    122
    Reading List:
    Link
    Try this python code (probably not the most cpu efficient method but code is short so whatever):
    Code:
    import pandas as pd
    from tkinter import messagebox
    from tkinter.filedialog import askopenfilename
    
    #choose excel file dialog
    messagebox.showinfo('Mass Search and Replace', 'Choose excel dictionary (.xlsx)')
    dictionary = askopenfilename()
    
    #choose text file dialog
    messagebox.showinfo('Mass Search and Replace', 'Choose text file (.txt)')
    txtfile = askopenfilename()
    
    #read excel file
    df = pd.read_excel(dictionary)
    
    #get each column words
    wrongWords = df['Bad'].values.tolist()
    rightWords = df['Good'].values.tolist()
    
    #read text file
    with open(txtfile, 'r', encoding='utf8') as file :
      filedata = file.read()
    
    #replace wrong words with right words
    repeats = len(wrongWords)
    for x in range(repeats):
       filedata = filedata.replace(wrongWords[x], rightWords[x])
    
    #rewrite the file
    with open(txtfile, 'w', encoding='utf8') as file:
      file.write(filedata)
    
    I added tkinter for dialog boxes to make choosing files easier, but you can change it to normal file location input if you prefer that.

    To people who don't have python:
    1. Install python from python.org, remember to tick "add python to PATH"
    2. Open command prompt
    3. Type: pip install openpyxl pandas tk
    4. Create MassSearchAndReplace.py using your favorite text editor (e.g. notepad)
    5. Copy the code into MassSearchAndReplace.py
    6. Double-click MassSearchAndReplace.py and follow onscreen instructions
    7. Excel file should be laid out exactly like the original post, otherwise replace the column headings "Bad" and "Good" in the code with whatever you wrote.
     
    Last edited: Oct 19, 2022
    miscalculation2 likes this.