Question: ChatGPT API-assisted machine translation with Python. Looking for opinions.

Discussion in 'Translator's Corner' started by antimuon792, Mar 26, 2023.


What's your opinion on the quality?

  1. Excellent

    5.6%
  2. Good enough

    44.4%
  3. Meh

    27.8%
  4. Bad

    11.1%
  5. I wouldn't read this even if it was the only text remaining in the entire world.

    11.1%
  1. antimuon792

    antimuon792 Member

    Joined:
    Mar 23, 2023
    Messages:
    10
    Likes Received:
    2
    Reading List:
    Link
    Heya! I wanted to test ChatGPT's capabilities as a dedicated JP-to-EN translation tool for web novels, so I made a small Python program leveraging the API.
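
    At its core the script just wraps a chat completion call. Here's a minimal sketch, assuming the openai Python package's ChatCompletion interface as it existed at the time (0.27.x); the prompt wording and settings are illustrative, not the exact ones my script uses:

    Code:
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def translate_text(japanese_text: str) -> str:
        # One chat completion request per piece of text to translate.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0301",
            messages=[
                {"role": "system",
                 "content": "You are a translator. Translate the user's "
                            "Japanese web novel text into natural English."},
                {"role": "user", "content": japanese_text},
            ],
            temperature=0.3,  # keep the output fairly literal
        )
        return response["choices"][0]["message"]["content"]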

    I tested my script using the second chapter from the novel どうも、前世で殺戮の魔道具を作っていた子爵令嬢です by 優木凛々

    Original web novel chapter (Syosetu): n5177ic 2
    ChatGPT 3.5 JP-to-EN translation (pastebin): u1uuYT9p

    It feels good, but my knowledge of Japanese only goes as far as katakana and hiragana. I can't actually understand the source, so verifying the translation is impossible for me.

    Oh, and about cost: this was 17,999 tokens, so close to 0.04 USD using the gpt-3.5-turbo-0301 model.
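
    The math, assuming gpt-3.5-turbo's published rate at the time of 0.002 USD per 1,000 tokens (prompt and completion combined):

    Code:
    tokens_used = 17_999
    price_per_1k_usd = 0.002  # published gpt-3.5-turbo rate at the time
    print(round(tokens_used / 1000 * price_per_1k_usd, 3))  # 0.036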

    Thank you for your input!

    P.S.: Sorry about the inconvenient link format. This is my first post, so I can't post links.
     
  2. Ca5e

    Ca5e Well-Known Member

    Joined:
    Aug 10, 2022
    Messages:
    157
    Likes Received:
    186
    Reading List:
    Link
    ChatGPT is basically Google Translate that also fixes the grammar. It's good enough after some human editing.
     
  3. antimuon792

    antimuon792 Member

    Joined:
    Mar 23, 2023
    Messages:
    10
    Likes Received:
    2
    Reading List:
    Link
    Yup! It seems to be more aware of context too. There was no human editing involved in creating the uploaded text, and it doesn't feel as unnatural as Google Translate. There are still some things that feel wrong, but they are few and far between in my opinion.

    I've also tried to implement a cheat sheet of previously translated common terms (proper nouns and the like) to retain consistency across multiple chapters, roughly along the lines of the sketch below.
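
    The glossary entries here are made up for illustration; the real ones get collected from chapters already translated:

    Code:
    def build_system_prompt(glossary: dict[str, str]) -> str:
        # Inject previously decided translations so recurring terms
        # stay consistent from chapter to chapter.
        terms = "\n".join(f"{jp} -> {en}" for jp, en in glossary.items())
        return (
            "Translate the user's Japanese web novel text into English.\n"
            "Always translate these terms as shown:\n" + terms
        )

    # Example entries (made up):
    glossary = {"お兄ちゃん": "onii-chan", "魔道具": "magic tool"}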

    Can't wait to get access to the GPT-4 model API. Preliminary tests show it's better at translating.
     
    Last edited: Mar 26, 2023
  4. UnGrave

    UnGrave ななひ~^^

    Joined:
    Jun 27, 2016
    Messages:
    4,076
    Likes Received:
    12,856
    Reading List:
    Link
    I've thrown the chapter into GPT-4 if you want to look at the results.

    Note: I also requested that it "Translate the following chapter from a japanese webnovel. Make sure to maintain common web novel translation formatting conventions, such as leaving お兄ちゃん as onii-chan."

    This may have resulted in some weird side effects, like line 65 in your pastebin being broken up into tiny chunks.
     
    Last edited: Mar 27, 2023
  5. antimuon792

    antimuon792 Member

    Joined:
    Mar 23, 2023
    Messages:
    10
    Likes Received:
    2
    Reading List:
    Link
    Nice! It certainly looks better than the one produced by GPT-3.5.

    I've been working around token limits in prompts with 3.5. If you send an entire chapter like the one I used for testing, it exceeds the limit. And if you split each chapter into chunks, the model starts making continuity mistakes, like translating the same term differently or changing the subject of a sentence at the start of each chunk. The same thing also happens when trying to keep continuity across chapters, even with GPT-4.

    That's why I'm working on making the Python script handle the context of each chunk. Then, by sending a fitting prompt, you can guide the model into translating a chunk of text while keeping the surrounding context in mind.

    My script provides the model with (see the rough sketch after this list):
    • a cheat sheet of contextually meaningful, previously translated terms, generated after translating the previous chunk
    • the last line of the previous contiguous chunk
    • the first line of the next contiguous chunk
    • and a concise, context-relevant summary of the story up to the previous chunk
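
    Roughly, the per-chunk translation prompt gets assembled like this (variable names and wording are placeholders, not the script's actual prompt):

    Code:
    def build_translation_prompt(chunk, prev_last_line, next_first_line,
                                 summary, cheat_sheet):
        # cheat_sheet: {japanese_term: english_translation}
        terms = "\n".join(f"{jp} -> {en}" for jp, en in cheat_sheet.items())
        return (
            f"Story so far:\n{summary}\n\n"
            f"Translate these recurring terms consistently:\n{terms}\n\n"
            f"The previous chunk ended with:\n{prev_last_line}\n"
            f"The next chunk starts with:\n{next_first_line}\n\n"
            f"Translate the following Japanese text into English:\n{chunk}"
        )
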
    After each translation, a new cheat sheet is created for the current chunk. Each term gets a weight computed as a weighted sum over 7 different dimensions: chunk frequency, document frequency, previous-chunk frequency, summary consistency, context relevance, NER, and novelty.

    Before sending it to the API, the cheat sheet is sorted by weight and capped at 15 terms to keep prompts as small as possible.
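
    The scoring boils down to something like this (the per-dimension weights and how each dimension is scored are placeholders, not my actual values):

    Code:
    # Placeholder weights for the 7 dimensions mentioned above.
    WEIGHTS = {
        "chunk_freq": 1.0, "doc_freq": 0.5, "prev_chunk_freq": 0.5,
        "summary_consistency": 1.0, "context_relevance": 1.0,
        "ner": 1.5, "novelty": 0.5,
    }

    def term_weight(scores: dict[str, float]) -> float:
        # scores holds one normalized value per dimension for a term.
        return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

    def cap_cheat_sheet(term_scores: dict[str, dict[str, float]], cap: int = 15):
        # Keep only the highest-weighted terms for the next prompt.
        ranked = sorted(term_scores, key=lambda t: term_weight(term_scores[t]),
                        reverse=True)
        return ranked[:cap]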

    I do three prompts per chunk: one for translation using the previously mentioned info; one for summary creation using the previous summary and the current translated chunk; and one more for cheat sheet creation using the current chunk.
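
    As a loose sketch of that loop, with placeholder helpers standing in for the three API calls (they are not the script's real function names):

    Code:
    # Each of these placeholders is one ChatCompletion request in practice.
    def translate_with_context(chunk, **context): ...
    def update_summary(previous_summary, translation): ...
    def extract_cheat_sheet(chunk, translation): ...

    def process_chunk(chunk, state):
        # Call 1: translate using the accumulated context.
        translation = translate_with_context(
            chunk,
            cheat_sheet=state["cheat_sheet"],
            summary=state["summary"],
            prev_last_line=state["prev_last_line"],
            next_first_line=state["next_first_line"],
        )
        # Call 2: roll the running summary forward.
        state["summary"] = update_summary(state["summary"], translation)
        # Call 3: build the cheat sheet for the next chunk.
        state["cheat_sheet"] = extract_cheat_sheet(chunk, translation)
        return translation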

    Preliminary testing shows that this works better, but I switched models to GPT-4 midway through writing the program, so my current results are better from the moment the API responds anyway.

    I also attempted to implement a way to fix paragraphs and line breaks, based on counting sentences per paragraph in the original text. It kinda works? It's a little too aggressive and almost ends up with one sentence per paragraph.
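
    Something in this direction, heavily simplified (the sentence splitting here is deliberately naive):

    Code:
    import re

    def reflow(original_jp: str, translated_en: str) -> str:
        # Count sentences per paragraph in the Japanese source...
        jp_counts = [max(p.count("。"), 1)
                     for p in original_jp.split("\n") if p.strip()]
        # ...then regroup the translated sentences to match those counts.
        en_sentences = re.split(r"(?<=[.!?])\s+", translated_en.strip())
        paragraphs, i = [], 0
        for count in jp_counts:
            paragraphs.append(" ".join(en_sentences[i:i + count]))
            i += count
        if i < len(en_sentences):  # leftovers go into a final paragraph
            paragraphs.append(" ".join(en_sentences[i:]))
        return "\n\n".join(p for p in paragraphs if p)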

    I'll switch back to GPT-3.5 and do more testing later. GPT-4 is awesome at what it can do, but it's 15 times as expensive.

    EDIT: Oh! I forgot to add: chapter splitting is done by token count using tiktoken, with 1000 tokens per chunk for GPT-3.5 and 3000 for GPT-4.
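
    A rough sketch of that splitting; how the real script handles paragraph boundaries may differ:

    Code:
    import tiktoken

    def split_chapter(text: str, max_tokens: int = 1000,
                      model: str = "gpt-3.5-turbo") -> list[str]:
        enc = tiktoken.encoding_for_model(model)
        chunks, current, current_tokens = [], [], 0
        for paragraph in text.split("\n"):
            n = len(enc.encode(paragraph))
            # Start a new chunk once the token budget would be exceeded.
            if current and current_tokens + n > max_tokens:
                chunks.append("\n".join(current))
                current, current_tokens = [], 0
            current.append(paragraph)
            current_tokens += n
        if current:
            chunks.append("\n".join(current))
        return chunks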
     
    Last edited: Mar 28, 2023
  6. antimuon792

    antimuon792 Member

    Joined:
    Mar 23, 2023
    Messages:
    10
    Likes Received:
    2
    Reading List:
    Link
    I dunno if there's a rule against double posting. First time I've used a forum since 2014 (reddit doesn't count). Please let me know if there is one.

    I added some functionality to the program. Now it gets the novel from Syosetu, translates it, and finally converts it to both an epub file and a markdown file. All it needs is a novel code and it starts working.
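
    For what it's worth, the epub step can be done with the ebooklib package; this is just one way to do it, not necessarily how my script does it, and the chapter data here is a placeholder list of (title, text) pairs:

    Code:
    from ebooklib import epub

    def chapters_to_epub(path: str, title: str,
                         chapters: list[tuple[str, str]]) -> None:
        book = epub.EpubBook()
        book.set_identifier(title)
        book.set_title(title)
        book.set_language("en")
        items = []
        for i, (ch_title, ch_text) in enumerate(chapters, 1):
            c = epub.EpubHtml(title=ch_title, file_name=f"chap_{i}.xhtml", lang="en")
            paragraphs = "".join(f"<p>{p}</p>" for p in ch_text.split("\n\n"))
            c.content = f"<h1>{ch_title}</h1>{paragraphs}"
            book.add_item(c)
            items.append(c)
        book.toc = items
        book.add_item(epub.EpubNcx())
        book.add_item(epub.EpubNav())
        book.spine = ["nav"] + items
        epub.write_epub(path, book)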

    I'll get it into a github repo and post it later today.
     
    Last edited: Mar 30, 2023
  7. 9790712701

    9790712701 New Member

    Joined:
    Apr 1, 2023
    Messages:
    1
    Likes Received:
    0
    Reading List:
    Link
    Could you please share your GitHub repository with me? I'd like to discuss some topics related to translating light novels using GPT-4, such as better prompt suggestions, how to divide the novel into smaller sections for input into GPT-4, and providing context throughout the process.
     
  8. antimuon792

    antimuon792 Member

    Joined:
    Mar 23, 2023
    Messages:
    10
    Likes Received:
    2
    Reading List:
    Link
    This is the repo where I'm working on the program: https://github.com/CombobulativeDesigns/gptwntranslator

    The config file gives you a little freedom over which models to use for which actions, so you can play around a little there.

    Documentation is hit or miss right now. I'm currently working on improving that and the test coverage.

    Later comes iterating on the prompts and translating the novel's metadata.

    You are more than welcome to refine the prompts if you can.