Discussion Novel Updates Analysis - Synopses [Feb 2020] (FINAL PART)

Discussion in 'Novel General' started by Scrya, Feb 28, 2020.

Thread Status:
Not open for further replies.
  1. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    359
    Likes Received:
    1,349
    Reading List:
    Link
    Hello everyone, welcome to the final part of my novel updates analysis!

    If you have not seen the earlier parts of my analysis, you can check them out via the following links!
    Novel Updates Analysis - General Trends [Part 1]
    Novel Updates Analysis - Genre Analysis [Part 2]
    Novel Updates Analysis - Tag Analysis [Part 3]

    This thread will be much shorter than the previous one, but it will be more interactive! Especially for the second part, I will be introducing you all to something that I'm really into, and I hope that you will be able to learn something new!

    4. Novel Synopses Analysis
    1. Overview
      1. Wordcloud of Word-Pairs
      2. Distribution of Synopsis Length
      3. Novels with the Longest Synopses
    2. Finding Similarities in Synopses
      1. Brief Introduction to Word Embeddings
      2. Visualizing in 2D
      3. Visualizing in 3D
    3. Novel Recommender System
    5. Project Final Thoughts

    4.1. Overview

    4.1.1. Wordcloud of Word-Pairs

    In the Genre Analysis thread, I showed a wordcloud of the most common words found across all novel synopses. This time, however, I will show a wordcloud of the top 200 most common word-pairs (or bigrams)!

    bigrams.png
    We have "one day" as the most common word-pair. To be exact, "one day" is used a total of 615 times, compared to 412 times for the runner-up, "another world". We also have many different combinations of "year", such as "years later" or "years ago".

    I can also spot a few "red eye" and "eye red" somewhere to the top right. Red eyes sure are popular huh! What other interesting things can you guys spot?
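
    For anyone curious how word-pair counts like these can be produced, here is a minimal sketch. The three synopses are made-up stand-ins and not the real NovelUpdates data, and the simple regex tokenizer is an assumption; the wordcloud above may well have used extra cleaning steps (such as dropping common words) that are not shown here.

```python
from collections import Counter
import re

# Hypothetical stand-in synopses (NOT the real dataset).
synopses = [
    "One day he woke up in another world.",
    "Ten years later, one day she returned to another world.",
    "Years ago, one day changed everything.",
]

def bigrams(text):
    """Lowercase the text, split it into words, and pair each word with the next."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

# Tally every word-pair across all synopses.
bigram_counts = Counter()
for synopsis in synopses:
    bigram_counts.update(bigrams(synopsis))

# The two most frequent word-pairs and their counts.
top_pairs = bigram_counts.most_common(2)
for pair, count in top_pairs:
    print(" ".join(pair), count)
```

    On this toy data, "one day" comes out on top with 3 instances, followed by "another world" with 2.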

    4.1.2. Distribution of Synopsis Length

    Overall, the average project synopsis is 107.67 words long. Here is how the distribution looks for each language:

    desc_numwords.png
    The average number of words per CN project is 118.25.
    The average number of words per JP project is 97.34.
    The average number of words per KR project is 90.71.
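
    Averages like these can be computed in a few lines. This is only a sketch on made-up rows: the `projects` list and the whitespace-based `word_count` helper are assumptions for illustration, and whether empty synopses were included in the real averages is not something this sketch can tell you.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (language, synopsis) rows; NOT the real dataset.
projects = [
    ("CN", "A young man cultivates to reach the peak of the heavens"),
    ("CN", "She was reborn and swore revenge"),
    ("JP", "He was summoned to another world"),
    ("KR", "The hunter returned"),
    ("KR", ""),  # a project with no synopsis counts as 0 words here
]

def word_count(text):
    """Count words by splitting on whitespace."""
    return len(text.split())

# Group the word counts by language.
counts_by_language = defaultdict(list)
for language, synopsis in projects:
    counts_by_language[language].append(word_count(synopsis))

average_by_language = {
    language: mean(counts) for language, counts in counts_by_language.items()
}
overall_average = mean(word_count(synopsis) for _, synopsis in projects)

print(average_by_language)
print(overall_average)
```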

    Another interesting tidbit is that we have 78 novels with no synopsis.
    And we have a novel with the synopsis "very straight". (No, it's not straight at all.)

    4.1.3. Novels with the Longest Synopses

    desc_longest.png
    The novel with the longest synopsis is "White Horse Neighing in the West Wind", closely followed by "Chu Wang Fei". Most of the projects listed here are CN projects.

    4.2. Finding Similarities in Synopses

    4.2.1. Brief Introduction to Word Embeddings

    Everyone knows that the word "man" is the opposite of the word "woman". Likewise, the word "boy" is the opposite of the word "girl".

    So we can group the words "man" and "boy" together as male-leaning words, while "woman" and "girl" can be grouped as female-leaning.

    However, we can also group the words "man" and "woman" together as adult-leaning words, and "boy" and "girl" as child-leaning words.

    Similarly, we can apply this concept to sentences, paragraphs, or in our context, novel synopses. Is this synopsis about BL? Is it geared more towards the male audience? What kind of atmosphere does the synopsis portray? Fluffy? Serious?

    Many people have been trying to model all this information about a text using groups of numbers. These groups of numbers are what we call Word Embeddings.

    To represent a paragraph of text, we combine the Word Embeddings of every word in the text to form a Sentence Embedding.
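
    To make the idea concrete, here is a toy sketch using the man/woman/boy/girl example. The two-number vectors (one number for male vs. female, one for adult vs. child) are invented for illustration, and averaging is just one simple way of "combining" word embeddings; it is an assumption here, not necessarily what real models do.

```python
# Hypothetical 2-number embeddings: (male-leaning, adult-leaning).
# Positive = male / adult, negative = female / child.
word_vectors = {
    "man": (1.0, 1.0),
    "woman": (-1.0, 1.0),
    "boy": (1.0, -1.0),
    "girl": (-1.0, -1.0),
}

def sentence_embedding(text):
    """Average the vectors of all known words in the text."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return (0.0, 0.0)  # nothing recognizable, so fall back to a neutral vector
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

print(sentence_embedding("man boy"))    # male-leaning, neutral on age
print(sentence_embedding("man woman"))  # neutral on gender, adult-leaning
```

    Here "man boy" averages out to (1.0, 0.0) and "man woman" to (0.0, 1.0), which matches the two ways of grouping the words described earlier.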

    How do we make these word embeddings though? Traditional methods simply count how many times each word appears in a text, while advanced methods take into account how often each word appears next to or before other words.
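
    The "traditional" counting approach is easy to sketch. The two example sentences and the `count_vector` helper below are made up for illustration: each text becomes a list of numbers, one per vocabulary word, saying how many times that word appears.

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return how many times each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical example texts.
texts = ["the boy met the girl", "the man met the woman"]

# The vocabulary is every distinct word across all texts, in sorted order.
vocabulary = sorted({word for text in texts for word in text.lower().split()})
vectors = [count_vector(text, vocabulary) for text in texts]

print(vocabulary)
for text, vector in zip(texts, vectors):
    print(vector, text)
```

    The vocabulary comes out as boy, girl, man, met, the, woman, so the first sentence becomes [1, 1, 0, 1, 2, 0] and the second becomes [0, 0, 1, 1, 2, 1]. Note that this kind of vector knows nothing about word order or meaning, which is the gap the more advanced methods try to fill.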

    desc_comic1.png

    The current state-of-the-art method is to use pre-trained neural networks to produce the word embeddings for you. To put it simply, it's like asking for an expert opinion about the meaning of your life.
    The most reliable "expert" right now is a neural network architecture called BERT. Yes. Remember Sesame Street? That BERT.

    desc_comic2.png

    4.2.2. Visualizing in 2D

    So now that we have the embeddings for our synopses, let's try visualizing them in 2D!

    meteor.png

    ... Ah right. We have over 6000 novels, don't we? Let's try reducing the number of novels we're looking at to just the top 500 novels based on number of readers.

    similarity_based_on_desc_plot_resized.png

    Hmm, it's still kinda small. But the general idea is that, if a data point (representing a novel) is close to another data point, the two novels have similar synopses. The reverse also holds: if a data point is far away from another data point, the two novels have vastly different synopses. Let's visualize the embeddings in 3D instead!
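
    The "close points mean similar synopses" idea can be sketched as a nearest-neighbor lookup. The novel names and 2D coordinates below are invented, and plain straight-line distance is an assumption; in a real projection, closeness is only an approximation of how similar the original embeddings are.

```python
import math

# Hypothetical 2D coordinates for three novels (NOT taken from the real plot).
points = {
    "Novel A": (0.0, 0.0),
    "Novel B": (1.0, 0.0),
    "Novel C": (5.0, 5.0),
}

def nearest_neighbor(title):
    """Return the other novel whose point is closest to the given novel's point."""
    origin = points[title]
    others = (t for t in points if t != title)
    return min(others, key=lambda t: math.dist(origin, points[t]))

print(nearest_neighbor("Novel A"))
print(nearest_neighbor("Novel C"))
```

    With these coordinates, Novel B is the closest neighbor of both Novel A and Novel C, even though Novel C is still quite far away from it.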

    4.2.3. Visualizing in 3D

    Rather than showing in screenshots, I decided to make a short video of me playing with the 3D projection!
    Do note that only novels with more than 2500 readers have been used for this visualization. So there's about 1439 novels/data points.



    If you wish to try out the visualization yourself, you can visit the link below. It's recommended to play with the visualization on a PC. If you wish to run the t-SNE simulations, do note that they will use a large chunk of your CPU, or of your GPU if you have one.
    Tensorflow Embedding Projection

    Of course, if you don't feel like watching the video and don't have the time to try out the simulations, here's a screenshot of what the visualization looks like at one point in the simulation.

    upload_2020-2-28_17-1-2.png

    Over here, you can see that the synopsis of Against the Gods has been predicted to be close to those of Unrivaled Tang Sect and The Great Ruler.

    desc_atg.PNG

    4.3. Novel Recommender System

    So, we can't just be creating embeddings for fun and just visualizing them, right? Other than tasks like text summarization, sentiment analysis, and text prediction software installed for your phone's keyboard, these embeddings can also be used for recommending things!

    Referring to the screenshot above, if a reader reads Against the Gods, we can recommend the reader The Great Ruler or Heaven's Devourer, since their synopses are predicted to be similar.

    To be more accurate with the recommendations, rather than giving just a single novel, you can also put in the entire list of novels that you have read before, and a recommender system can recommend novels based on that list! Incredible, isn't it!?
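
    Here is a rough sketch of how a list-based recommender along these lines could work. It is not the implementation shown in this thread: the titles and three-number embeddings are made up, and averaging the read novels into one "profile" and ranking by cosine similarity are assumptions chosen for simplicity.

```python
import math

# Hypothetical 3-number synopsis embeddings (real ones would be much longer).
embeddings = {
    "Cultivation Novel 1": (0.9, 0.1, 0.0),
    "Cultivation Novel 2": (0.8, 0.2, 0.1),
    "Cultivation Novel 3": (0.7, 0.1, 0.2),
    "Romance Novel 1": (0.1, 0.9, 0.1),
    "Romance Novel 2": (0.0, 0.8, 0.3),
}

def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(read_titles, top_n=2):
    """Average the embeddings of the read novels, then rank unread novels by similarity."""
    read_vectors = [embeddings[title] for title in read_titles]
    profile = tuple(sum(axis) / len(read_vectors) for axis in zip(*read_vectors))
    candidates = [title for title in embeddings if title not in read_titles]
    ranked = sorted(
        candidates,
        key=lambda title: cosine_similarity(profile, embeddings[title]),
        reverse=True,
    )
    return ranked[:top_n]

recommendations = recommend(["Cultivation Novel 1", "Cultivation Novel 2"], top_n=1)
print(recommendations)
```

    A reader who has read the first two cultivation novels gets the third one recommended, since its made-up embedding points in nearly the same direction as their averaged profile.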

    Here's my implementation of a recommender system in action:

    [IMG]

    And I've got to say, I'm pretty satisfied with the results, as most of the recommendations seem to be of the same genre as the novels used as inputs.

    5. Project Final Thoughts
    Hey everyone, hope that you have enjoyed my analyses! It was interesting to see your perspectives and insights on some of the stats!

    I don't think I told you guys, but I initially thought this entire thing wouldn't take me more than 10 pages in a Word document. However, the first thread alone took up like 10 pages. Whoops.
    Also, I realized that for each subsequent thread, things slowly began to become a little technical, especially this last one. HAHAHA.

    My entire thought process coming into this was, "Hey, I can brush up on some skills while doing this for fun." However, as I continued to work on this, my mind eventually trailed off to "Hmm, how would this data be useful? Is there a way to apply my findings?" Which kinda resulted in how long this project became. (I can't believe I even recorded a video for this!?) Though, I hope that you guys enjoyed the ride!

    Well, that's it from me. Do you all have any other ideas of how some of the NovelUpdates data can be used? Is there anything else that you want to see? Do let me know!

    For my first actual side-project other than my school work, I'm pretty satisfied with myself. I will eventually rewrite some portions and put my analyses on my site or something, to build up my portfolio, yeah! For now though, I think I will take a break and catch up on the translations I'm supposed to be working on. *coughs*
     
  2. LaDyViL

    LaDyViL New Member Staff Member

    Joined:
    Nov 9, 2015
    Messages:
    9,641
    Likes Received:
    22,607
    Reading List:
    Link
    Yes, that blob of ink finally made its appearance!!!

    Thanks for the hard work.
     
    Dr_H_16, Ophious, AliceShiki and 3 others like this.
  3. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    359
    Likes Received:
    1,349
    Reading List:
    Link
    I initially didn't plan on putting it up okay! Then I was like, some people would probably be disappointed huh. :blobpeek:
     
  4. otaku31

    otaku31 Well-Known Member

    Joined:
    Nov 26, 2015
    Messages:
    5,608
    Likes Received:
    21,735
    Reading List:
    Link
    Very interesting! Thanks! :aww:

    BTW, I don't need a predictor to tell me that almost every f*cking cultivation novel is similar, both in content and synopsis. :blobpeek:
    Also, surprised to note how CN synopses manage to pack in over 118 words while avoiding saying anything about the actual story itself!
     
    Gitami, Dr_H_16, TokioftheBel and 4 others like this.
  5. DragonMage18

    DragonMage18 Outcast

    Joined:
    Dec 29, 2016
    Messages:
    995
    Likes Received:
    1,657
    Reading List:
    Link
    :aww::aww::aww:
    Someone actually took the time to do this.
    Great work:blob_plusone:

    Here, you deserve a cookie:cookie:
     
    Dr_H_16, TokioftheBel and Scrya like this.
  6. LaDyViL

    LaDyViL New Member Staff Member

    Joined:
    Nov 9, 2015
    Messages:
    9,641
    Likes Received:
    22,607
    Reading List:
    Link
    :blobpats::blobpats:
     
    Scrya likes this.
  7. ExcitableFoci

    ExcitableFoci Well-Known Member

    Joined:
    Oct 24, 2019
    Messages:
    2,065
    Likes Received:
    3,564
    Reading List:
    Link
    I didn't know ink stains could look so beautiful.
     
    Dr_H_16 and Scrya like this.
  8. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    359
    Likes Received:
    1,349
    Reading List:
    Link
    Hahaha, that's true. But I think it's a good alternative recommender system to accommodate people who do not wish their recommendations to be based on their own personal information!
     
    Fuyuneko and otaku31 like this.
  9. runsing

    runsing status : bleeding, health -10/s Novel Updates Staff

    Joined:
    Nov 4, 2015
    Messages:
    3,068
    Likes Received:
    6,410
    Reading List:
    Link
    several of the top 25 positions, and possibly all the way to the hundredth, are there not because they're genuinely lengthily described, but because they either:
    1. have multiple synopses (for different books/volumes),
    2. contain excerpts from the story (including dialogue). more often than not, these so-called 'excerpts' are fake, and cannot be found anywhere in the story at all.
    3. contain nonsense/ads from the original raw sites, such as "please read the intro before buying the chapter". other times, someone even puts in a notice about how the translator split the chapters, or about the translator taking a vacation, or even "visit our site for more novels like this"- in.the.freaking.synopsis.

    *no.2 is the most common of the three, btw.
     
    Wen Rou, Dr_H_16, AliceShiki and 5 others like this.
  10. GonZ555

    GonZ555 [Free Hugs]

    Joined:
    Nov 10, 2015
    Messages:
    1,918
    Likes Received:
    31,268
    Reading List:
    Link
    Kinda want that recommendation system..
     
    Dr_H_16, TokioftheBel and Scrya like this.
  11. insteadofdeath

    insteadofdeath Faith

    Joined:
    May 27, 2017
    Messages:
    242
    Likes Received:
    1,243
    Reading List:
    Link
    The best part of these posts for sure hahahaha
     
    LaDyViL and Scrya like this.
  12. AliceShiki

    AliceShiki 『Ms. Tree』『Magical Girl of Love and Justice』

    Joined:
    Apr 27, 2016
    Messages:
    22,922
    Likes Received:
    90,814
    Reading List:
    Link
    The only thing I would have liked to see (that isn't listed already) is some comparison between synopsis length and number of readers, to see whether there is any correlation between the two. I think it would be quite interesting to see it!

    Great job overall though! Was really fun reading through those threads~
     
    Dr_H_16 likes this.
  13. lamperouge0

    lamperouge0 Active Member

    Joined:
    Jul 23, 2018
    Messages:
    5
    Likes Received:
    7
    Reading List:
    Link
    Very interesting! Would you be willing to post your novel recommendation system?
     