Discussion Novel Updates Analysis (Feb 2020)

Discussion in 'Novel General' started by Scrya, Feb 23, 2020.

  1. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    358
    Likes Received:
    1,334
    Reading List:
    Link
    After a recent data analytics skill test for a job application that I’m sure I flunked, I realized I needed to brush up on my basics. So, I decided hey, why not look at novelupdates data!

    This is not the first time this has been done. Dreams of Jianghu has been doing yearly reviews of NU trends on their site since 2018! Do check them out if you haven’t!

    https://dreamsofjianghu.ca/2018/08/06/asian-fan-translation-trends-2018/
    https://dreamsofjianghu.ca/2019/11/09/asian-fan-translation-trends-2019/

    Here’s a summary of what I have analyzed. In this thread, I will only be covering Section 1.
    *Things listed here are subject to change as I write the posts.

    0. How did I collect the data?

    1. General analysis
      1.1. Exploratory time analysis of projects
      1.2. Exploring projects by their ratings
      1.3. Exploring projects by their number of readers
      1.4. Final analysis with minimum chapters constraints
      1.5. Conclusion
    2. Genre analysis
      2.1. Overview
      2.2. Biannual growth trend
      2.3. Common genre combinations
      2.4. Statistics of genre combinations
        2.4.1. Ratings
        2.4.2. Number of readers
      2.5. Wordclouds of novel descriptions
        2.5.1. All novels
        2.5.2. Xianxia and Xuanhuan
        2.5.3. Yaoi and Shounen Ai
        2.5.4. Fantasy
        2.5.5. School Life
      2.6. Conclusion
    3. Tag analysis
      3.1. Overview
        3.1.1. Most Common Tags
        3.1.2. Least Used Tags
        3.1.3. Novels with Most Number of Tags
      3.2. Correlation of Number of Tags to Number of Readers [STATISTICALLY HEAVY]
      3.3. Most Common Tag Combinations
        3.3.1. 3-Tag Combinations
        3.3.2. 4-Tag Combinations
      3.4. Measuring Common 4-Tag Combinations
        3.4.1. Ratings
        3.4.2. Number of Readers
      3.5. Finding Subsets Between 2 Tags [MATH]
      3.6. Conclusion
    4. Novel synopsis analysis
      4.1. Overview
        4.1.1. Wordcloud of Word-Pairs
        4.1.2. Distribution of Synopsis Length
        4.1.3. Novels with the Longest Synopses
      4.2. Finding Similarities in Synopses
        4.2.1. Brief Introduction to Word Embeddings
        4.2.2. Visualizing in 2D
        4.2.3. Visualizing in 3D
      4.3. Novel Recommender System
    Things to note:
    • I will only be looking at CN/JP/KR novels as they make up the bulk of the projects on the site.
    • All time series analyses will start from the 2nd half of 2015, which is roughly when Novel Updates was first created.
    • All analyses are based on projects with chapters uploaded. This excludes projects with dead links/hidden chapters. As such, the counts are slightly underestimated.
    • Conclusions made are based on merely the data collected and the analyses conducted.
    =======================================
    0. How did I collect the data?

    The entire data collection process was completed between 19th and 20th February this year, and the collection script was written in Python.

    The script iterates over all the project pages on Novel Updates and scrapes the data that I thought would be useful for my analysis. To visualize this, an example of some of the data I scrape off a page is shown below:

    scrape_example.png

    I used my translation project as an example, ‘cause I’m that narcissistic!
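For readers wondering what "scraping a project page" looks like in practice, here is a minimal sketch using only the Python standard library. It is not the original collection script: the HTML snippet, the class names (`title`, `language`, `rating`, `readers`) and the helper `parse_project` are all hypothetical, and the real Novel Updates pages are structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, for illustration only. The real pages look different.
SAMPLE_PAGE = """
<div class="series">
  <span class="title">Example Novel</span>
  <span class="language">Korean</span>
  <span class="rating">4.1</span>
  <span class="readers">1234</span>
</div>
"""

# Assumed field names; these are not taken from the actual site.
FIELDS = {"title", "language", "rating", "readers"}


class ProjectParser(HTMLParser):
    """Collects the text of every <span> whose class is one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in FIELDS:
            self._current = css_class

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_project(html):
    """Turn one page of (hypothetical) markup into a typed record."""
    parser = ProjectParser()
    parser.feed(html)
    record = parser.record
    record["rating"] = float(record["rating"])
    record["readers"] = int(record["readers"])
    return record


project = parse_project(SAMPLE_PAGE)
print(project)
```

A real script would loop this over every project page and collect the records into one table before doing any analysis.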

    =======================================
    1. General analysis

    As of 19th February 2020, there are a total of 6,054 projects on Novel Updates:

    Chinese: 3,128
    Japanese: 2,463
    Korean: 365
    Others: 98

    It was kind of surprising to see that there were only 365 Korean projects, seeing how they have been quite a hot topic in the past 2 years or so.

    Out of these 6,054 projects, only 5,683 projects contain active links.
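As a quick sanity check on the numbers above, the per-language shares and the share of projects with active links can be worked out directly. The counts are the ones quoted in the post; only the variable names are made up.

```python
# Project counts per language as of 19th February 2020 (from the post).
counts = {"Chinese": 3128, "Japanese": 2463, "Korean": 365, "Others": 98}
active = 5683  # projects with active links (from the post)

total = sum(counts.values())

# Share of each language, in percent, rounded to one decimal place.
shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}

inactive = total - active
active_share = round(100 * active / total, 1)

print(total)         # 6054
print(shares)        # {'Chinese': 51.7, 'Japanese': 40.7, 'Korean': 6.0, 'Others': 1.6}
print(inactive)      # 371
print(active_share)  # 93.9
```

So Korean projects make up about 6% of the site, and roughly 94% of all projects have active links.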

    Let’s look at the line chart below that shows the number of projects over time, with respect to their country of origin.

    ------------------------------​

    1.1. Exploratory time analysis of projects

    num_projects_over_time.png

    From this chart, it can be seen that Japanese projects have been on a steady rise, but Chinese projects have been rising at an increasing rate! On 13th November 2018, the number of Chinese projects officially surpassed Japanese projects! Let’s dive a little deeper to look at the rates of increase, by looking at how many projects are added bi-annually.

    biannual_increase_projects.png

    Fumu. It seems like roughly 200 JP projects are added every 6 months, while generally more and more CN projects are added, with a big spike between Dec 2018 and Jun 2019. The number of KR novels has also been showing a gradual increase since Jun 2018.
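The bi-annual counts behind a chart like this can be sketched as follows. This is an illustration rather than the original code: the dates are made up, the use of a per-project start date is an assumption, and the buckets here are calendar halves (Jan-Jun, Jul-Dec), which may not match the exact cut-off dates used in the chart.

```python
from collections import Counter
from datetime import date

# Made-up (language, start date) pairs, for illustration only.
projects = [
    ("CN", date(2018, 11, 13)),
    ("CN", date(2019, 2, 25)),
    ("CN", date(2019, 3, 1)),
    ("JP", date(2018, 8, 6)),
    ("JP", date(2019, 1, 9)),
    ("KR", date(2019, 5, 20)),
]


def half_year(d):
    """Label a date by its calendar half-year, e.g. 2019-03-01 -> '2019-H1'."""
    return f"{d.year}-H{1 if d.month <= 6 else 2}"


# Number of projects added per (language, half-year) bucket.
added = Counter((lang, half_year(d)) for lang, d in projects)

for (lang, period), n in sorted(added.items()):
    print(lang, period, n)
```

Summing these buckets cumulatively per language gives the "number of projects over time" curve from the previous chart.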

    Looking at the current trend, I’m predicting that we will see another 700-800 active projects being added by the end of this June, with possibly more Korean projects given their popularity.

    Anyway, while working on these charts, this got me thinking. What would the CN curve look like without the official CN publishers turned EN publishers (namely Webnovel and TapRead)?
    • Webnovel’s first project upload was on 1st March 2017, so we should see the CN curve branching out in the 1st half of the year 2017.
    • TapRead’s first project upload was on 25th February 2019. As TapRead has a small number of new projects, I don’t think it would affect the curve that much.

    biannual_growth_cn.png

    Interesting, it seems like the sharp increases between 2017 and 2019 weren’t because of these 2 rising publishers after all. Maybe it’s due to an increasing number of smaller translation groups? I did not scrape information about the translation groups, so I will have to leave this analysis for another time!

    ------------------------------​

    1.2. Exploring projects by their ratings

    Now then, let’s look at how the projects are distributed by their ratings. Do note that the minimum rating that you can give is 1.0, and the maximum is 5.0. The average rating for a project on NU is 3.73. By language, the average ratings are:
    • Chinese: 3.72
    • Japanese: 3.74
    • Korean: 3.83
    Korean projects have a higher average rating than the other 2 languages, it seems. Let’s look at the actual distribution.
    dist_rating.png
    Hmm. The chart above doesn’t really give a fair comparison to KR projects, as they are significantly smaller in number. Let’s instead scale the numbers according to each language, and look at the density plots!
    scaled_dist_rating.png
    All the distributions are left-skewed, and Korean projects are generally rated near their mean score of 3.83. (This can be seen from the steeper slope near the mean.) This also means that readers generally do not give a Korean project a very high or low score that easily, as compared to the other 2 languages. Interesting!

    Chinese projects’ distribution has the shortest peak, which means that the ratings are spread more evenly than the other 2 languages. From the 4.5-5.0 range of the chart, you can also see that the Chinese distribution line is actually higher than the other 2 languages, which also means you can generally find more Chinese projects with that rating range.

    Japanese projects, on the other hand, seem to have more novels within the 3.0-3.5 range, and fewer novels within the 4.1-4.5 range, when scaled to the other 2 languages.
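To make the "scaling" step concrete: the idea is to compare the share of each language's projects in a rating range, not the raw counts. The sketch below uses a normalized histogram as a simple stand-in for the smoothed density plots (it is not a kernel density estimate), and the ratings are made up.

```python
import math
from collections import Counter


def scaled_histogram(ratings, bin_width=0.5):
    """Return {bin_start: share of projects}; the shares sum to 1."""
    bins = Counter(math.floor(r / bin_width) * bin_width for r in ratings)
    n = len(ratings)
    return {start: count / n for start, count in sorted(bins.items())}


# Made-up ratings, for illustration only: 8 "CN" projects vs 4 "KR" projects.
cn_ratings = [2.1, 3.2, 3.6, 3.8, 4.0, 4.4, 4.6, 4.8]
kr_ratings = [3.6, 3.8, 3.9, 4.1]

cn_hist = scaled_histogram(cn_ratings)
kr_hist = scaled_histogram(kr_ratings)

print(cn_hist)  # {2.0: 0.125, 3.0: 0.125, 3.5: 0.25, 4.0: 0.25, 4.5: 0.25}
print(kr_hist)  # {3.5: 0.75, 4.0: 0.25}
```

Even though the toy "KR" group is half the size of the "CN" group, the shares are directly comparable: 75% of its projects sit in the 3.5-4.0 bin, against 25% for "CN".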

    So with these charts alone, which translator/translation group, based on the language they’re translating, will thrive? I honestly think this isn’t sufficient to conclude anything, so let’s move on to look at the distribution of readers.

    ------------------------------​

    1.3. Exploring projects by their number of readers

    On average, a project on NU has 1876 readers. In terms of language:
    • Chinese: 1736
    • Japanese: 2222
    • Korean: 2462
    Let’s go straight into the scaled distribution based on the number of readers!
    scaled_dist_readers.png

    Well, I can’t say that I’m not surprised to see such a strong right skew. The Chinese projects’ distribution has the tallest peak, while the Korean projects’ distribution has the smallest. This means Chinese projects vary less in readership, with most sitting close to their mean.
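For anyone unsure what "right skew" and "left skew" mean here, a small sketch with made-up numbers: in a right-skewed sample a few very large values pull the mean above the median, and in a left-skewed sample a few very small values pull it below. The `skewness` helper is the usual moment-based definition, written out by hand for illustration.

```python
from statistics import mean, median


def skewness(values):
    """Moment-based skewness: positive for a right skew, negative for a left skew."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up data: one hugely popular project among small ones (right skew),
# and one badly rated project among well rated ones (left skew).
readers = [100, 200, 300, 400, 9000]
ratings = [1.5, 3.8, 4.0, 4.1, 4.2]

print(mean(readers), median(readers))  # 2000 300
print(skewness(readers) > 0)           # True: right skew
print(skewness(ratings) < 0)           # True: left skew
```

This is also why the average reader counts quoted above sit well to the right of where most projects actually are.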

    There’s hardly any useful information to derive from here, so let’s move on.

    ------------------------------​

    1.4. Final analysis with minimum chapters constraints

    Translation groups tend to pick up novels that are usually longer in length. After all, more chapters would mean longer series longevity, more time to build a fanbase, more clicks, and thus more views. In this final part, we will take a look at the distributions of projects with at least 100 chapters, and see if we can make any conclusions!

    scaled_dist_rating_100.png
    Now, let’s look at the distribution of ratings again, this time with the constraint in place. Hardly any change could be seen to the Chinese and Japanese distributions, but the Korean distribution is skewed even more to the left! Generally, longer Korean projects have more ratings above 4.0 than longer Chinese and Japanese projects!

    For the curious, the average ratings for projects with more than 100 chapters are:
    • Chinese: 3.74
    • Japanese: 3.83
    • Korean: 4.04
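The constrained averages above boil down to a filter followed by a per-language mean. A minimal sketch, with made-up rows and a hypothetical helper name; note that the post says both "at least 100" and "more than 100" chapters, and this sketch assumes "at least" (`>=`).

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (language, number of chapters, rating).
projects = [
    ("CN", 250, 3.6),
    ("CN", 40, 4.2),
    ("CN", 120, 3.8),
    ("JP", 100, 3.9),
    ("JP", 15, 3.1),
    ("KR", 180, 4.1),
    ("KR", 99, 3.5),
]


def average_rating_by_language(rows, min_chapters=100):
    """Average rating per language, keeping projects with >= min_chapters chapters."""
    grouped = defaultdict(list)
    for lang, chapters, rating in rows:
        if chapters >= min_chapters:  # assumption: "at least", not "more than"
            grouped[lang].append(rating)
    return {lang: round(mean(vals), 2) for lang, vals in grouped.items()}


constrained = average_rating_by_language(projects)
unconstrained = average_rating_by_language(projects, min_chapters=0)

print(constrained)    # {'CN': 3.7, 'JP': 3.9, 'KR': 4.1}
print(unconstrained)  # {'CN': 3.87, 'JP': 3.5, 'KR': 3.8}
```

Swapping the rating column for the reader count gives the constrained readership averages in the same way.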
    scaled_dist_readers_100.png

    Now let’s look at the number of readers for projects with the constraint in place! It’s definitely much more readable than the previous unconstrained chart for sure!

    The right skew for Chinese projects is much more pronounced than for Japanese and Korean projects, it seems, with a large bulk of Chinese projects having a readership of around 0-5000. The Japanese and Korean distributions seem to be a lot smoother, with readership counts spread across a wider range.

    When it comes to Chinese translation groups starting out, I would expect that unless it’s a big hit, it would be hard to gain readership for the first 100 chapters or so.

    The average readership for projects with over 100 chapters is:
    • Chinese: 3643
    • Japanese: 7327
    • Korean: 8418
    ------------------------------​

    1.5. Section 1 Conclusion
    1. There’s a general rising trend in the number of novels. I expect a bigger growth in KR projects.
    2. Though I believe there would still be a great number of CN projects in the following months, new readership for CN projects might be hard to obtain. This is most likely due to reader burnout.
    3. If committed to releasing long novels, a new Korean translation group would most likely outperform a new Chinese or Japanese translation group.
    And that’s the end of the 1st part of my analysis. If there are any questions, things I missed out, or things I probably misinterpreted, do let me know!

    Stay tuned for the next part! I guarantee that it's definitely more fun than just charts and numbers like this one! It will probably take me a few more hours to consolidate them all though... omg... Save me.

    *UPDATE: Click here for Part 2 on genre analysis!
    *UPDATE 2: Click here for Part 3 on tag analysis!
    *UPDATE 3: Click here for the final part on novel synopsis analysis!
     
    Last edited: Feb 28, 2020
    fox23, Dr_H_16, Lonelycity and 45 others like this.
  2. insteadofdeath

    insteadofdeath Faith

    Joined:
    May 27, 2017
    Messages:
    242
    Likes Received:
    1,208
    Reading List:
    Link
    I don't want to save you, then how will we get more threads like these HAHAHAH

    good job!

    I think it's kind of crazy how many Japanese projects there are, considering how many Japanese speakers there are vs Chinese speakers. Really interesting! and that's my totally non helpful takeaway to this post hahahaha
     
    katiethairu33, LaDyViL, mir and 3 others like this.
  3. GonZ555

    GonZ555 New bun

    Joined:
    Nov 10, 2015
    Messages:
    1,720
    Likes Received:
    23,703
    Reading List:
    Link
    Yeah, I'm thinking Korean novels will pick up readers really soon. But it's hard to say if they will reach up to JP and CN in terms of popularity..
     
  4. Ha-Ha

    Ha-Ha Well-Known Member

    Joined:
    Aug 29, 2018
    Messages:
    59
    Likes Received:
    65
    Reading List:
    Link
    Altho its a little confusing I’m looking forward to the next part
     
  5. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    358
    Likes Received:
    1,334
    Reading List:
    Link
    Mind letting me know which part you found confusing? Were there too many technical terms to follow? Or do I need to improve my phrasing?

    Feel free to criticize! :)
     
    pass1478 likes this.
  6. Ophious

    Ophious 〖Escaping the Witch〗〖Hiding from Demonic Teacher〗

    Joined:
    Dec 31, 2015
    Messages:
    2,748
    Likes Received:
    16,978
    Reading List:
    Link
    I thought JP novels were on the decline since I don't see them as often, but I guess it's just because the influx of Chinese novels usually hides them
    Some parts were confusing mostly cause I'm sleep deprived right now but all in all this was something interesting to read~
     
    Dr_H_16, Wujigege, mir and 3 others like this.
  7. canaria23

    canaria23 『  』

    Joined:
    Oct 21, 2015
    Messages:
    6,494
    Likes Received:
    7,104
    Reading List:
    Link
    Lazy translators too lazy to translate titles and it being drowned in a sea of red
     
    Matteus and Scrya like this.
  8. Donuts

    Donuts Endless surge of emotions

    Joined:
    Apr 21, 2017
    Messages:
    298
    Likes Received:
    309
    Reading List:
    Link
    This is very interesting. Korean novels are invading our NU website (I’m fine with it, but please no more friendship tags in upcoming KR novels)
     
  9. Ha-Ha

    Ha-Ha Well-Known Member

    Joined:
    Aug 29, 2018
    Messages:
    59
    Likes Received:
    65
    Reading List:
    Link
    Hmm well I guess there’re technical terms I didn’t understand . But I mostly think it’s confusing becuz it’s my first time reading such an analysis. Like how you find new topics confusing at first, other than that all’s good.
     
    mir, TokioftheBel and Scrya like this.
  10. Little Potato

    Little Potato Sexiest Potato Alive [SpaceBar's Master]

    Joined:
    Sep 10, 2017
    Messages:
    580
    Likes Received:
    3,293
    Reading List:
    Link
    Nice! Thanks for dedicating your time to making this beautiful analysis. I look forward to the statistics and analysis of genre growth. I for one, am predicting that there would be an exponential rise in Yaoi (including shonen ai) related content lmao
     
  11. Ai chan

    Ai chan Queen of Yuri, Devourer of Traps, Thrusted Witch

    Joined:
    Nov 7, 2015
    Messages:
    10,610
    Likes Received:
    22,845
    Reading List:
    Link
    "...we will see another 700-800 active projects being added by the end of this June, with possibly more Korean projects given their popularity.."

    Ai-chan read that as "...with possibly more NTR and Cheating novels given their popularity."
     
  12. pass1478

    pass1478 A girl in a bear suit pajama

    Joined:
    Dec 15, 2019
    Messages:
    861
    Likes Received:
    3,314
    Reading List:
    Link
    Everything here I pretty much expected, thanks for clarifying my thoughts, mate!

    Though, I honestly don't think Korean novels will reach the amount of viewership and project amounts of Chinese and Japanese novels within the next 1-5 years.
     
    Wujigege and Scrya like this.
  13. Seraphic

    Seraphic Uncomfortably close

    Joined:
    Aug 10, 2016
    Messages:
    1,288
    Likes Received:
    2,203
    Reading List:
    Link
    :blobscream: :blobparty: :blobxd: :bloblove:
     
    Scrya likes this.
  14. AliceShiki

    AliceShiki 『Ms. Tree』『Ophi-kun's Survival Teacher』

    Joined:
    Apr 27, 2016
    Messages:
    19,454
    Likes Received:
    77,235
    Reading List:
    Link
    Why didn't you include other official publishers like Wuxiaworld, Volare and Gravity Tales (is it still a thing? I dunno) to this list btw?
    I'm curious, did the data you gathered confirm this hypothesis? While I can understand bigger groups following that, I think smaller groups would much rather just translate a series they love, or translate a series they're confident in finishing, which usually means smaller series.
     
    TokioftheBel and Scrya like this.
  15. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    358
    Likes Received:
    1,334
    Reading List:
    Link
    Good question. I did not include them in the CN/publisher comparison because they were already translating even before taking up official licenses. What I wanted to achieve with that analysis was to see if the entrance of big Chinese publishers was actually a big game changer in the market.

    Yep, smaller groups would usually take up smaller projects. In fact, only about 1000 out of the 6000+ novels are more than 100 chapters in length! The final analysis is aimed more at translation groups that wish to be more than just sustainable, or to gain more profits!

    However, I would need to scrape data on translation groups for a deeper analysis to prove this. If only I could easily get information about all the novel chapters and their release dates, along with the groups that release them... Like having direct access to the database request API... *ahem* Tony *ahem*
     
  16. AliceShiki

    AliceShiki 『Ms. Tree』『Ophi-kun's Survival Teacher』

    Joined:
    Apr 27, 2016
    Messages:
    19,454
    Likes Received:
    77,235
    Reading List:
    Link
    Ooooooh, that makes plenty of sense~
    Fair enough~
    :blobpats::blobpats::blobpats::blobpats::blobpats::blobpats::blobpats:
     
    Scrya likes this.
  17. Nimroth

    Nimroth Someone

    Joined:
    Jan 15, 2016
    Messages:
    3,494
    Likes Received:
    2,744
    Reading List:
    Link
    Just curious if any of this will take into account the activity of translations, such as how many are finished, the average release rate, or the average length of time since the last release?
     
  18. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    358
    Likes Received:
    1,334
    Reading List:
    Link
    Fumu. I did conduct a few analyses in that direction, such as the distribution of release rates against the number of readers. But honestly the conclusion is something everyone already knew: projects with faster release rates generally have more readers.

    As for completed projects, I will look at them another time! :)

    In the meantime you can visit Dreams of Jianghu's site and check out what they did with their analysis on completed projects!
     
  19. Nimroth

    Nimroth Someone

    Joined:
    Jan 15, 2016
    Messages:
    3,494
    Likes Received:
    2,744
    Reading List:
    Link
    Well, I meant mainly that it would be interesting to see how many of all those projects are entirely dead, though I would guess that would be really annoying to get accurate. lol
    Aside from dead links of course.
     
    mir and Scrya like this.
  20. DocB

    DocB *Climbing the tower to kill the Mad King *

    Joined:
    Nov 10, 2015
    Messages:
    3,418
    Likes Received:
    7,595
    Reading List:
    Link
    All of these haven't been updated since 1 November 2019, and by the 3-month law, they are dead:
    https://www.novelupdates.com/series-finder/?sf=1&org=495,496,497&ss=4&sort=sdate&order=desc
    Note that they are all tagged with hiatus, and when I search for ongoing, nothing appears, so I assume that the hiatus tagging is an automatic process and is working.