Discussion Novel Updates Analysis - Tags (Feb 2020)

Discussion in 'Novel General' started by Scrya, Feb 26, 2020.

  1. Scrya

    Scrya Lurking all day long~

    Joined:
    Oct 22, 2015
    Messages:
    359
    Likes Received:
    1,349
    Reading List:
    Link
    I thought this would be easier since I initially wanted to use the genre analysis format... I ended up making so many detours... My entire week break from school is just gonna be spent on this side project, isn't it?

    If you have not read the previous threads yet:

    Novel Updates Analysis - General Trends (Feb 2020) [Part 1]
    Novel Updates Analysis - Genre Analysis (Feb 2020) [Part 2]

    Click here for my next and final thread:
    Novel Updates Analysis - Synopsis Analysis (Feb 2020) [Part 4]

    In this thread, I will be looking solely at tags! It's going to be a very long read, so I have split the thread into several posts. You can click on the table of contents to jump straight to the information you'd like to see!

    Some parts are also statistically heavy, such as the tags/readers correlation section, so take note!

    3. Tag Analysis
    1. Overview
      1. Most Common Tags
      2. Least Used Tags
      3. Novels with Most Number of Tags
    2. Correlation of Number of Tags to Number of Readers [STATISTICALLY HEAVY]
    3. Most Common Tag Combinations
      1. 3-Tag Combinations
      2. 4-Tag Combinations
    4. Measuring Common 4-Tag Combinations
      1. Ratings
      2. Number of Readers
    5. Finding Subsets Between 2 Tags [MATH]
    6. Conclusion

    3.1. Overview

    3.1.1. Most Common Tags

    As of 19th February 2020, NU has a total of 766 different tags. The most used tags are, of course, “Male Protagonist” and “Female Protagonist”.

    tags_most_common.png

    Kind of interesting to see that we have more projects tagged with “Modern Day” than “Transmigration”. I honestly thought it would be the other way around. And… Why do we need separate “Sword and Magic” and “Magic” tags again?
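
    For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. The `novels` dictionary and the helper name `tag_counts` are made up for illustration; this is not the actual script or data behind the chart.

```python
from collections import Counter

# Hypothetical sample: novel title -> set of tags (not real NU data).
novels = {
    "Novel A": {"Male Protagonist", "Modern Day", "Magic"},
    "Novel B": {"Female Protagonist", "Modern Day"},
    "Novel C": {"Male Protagonist", "Transmigration", "Magic"},
    "Novel D": {"Male Protagonist", "Modern Day"},
}


def tag_counts(novels):
    """Count how many novels carry each tag."""
    counts = Counter()
    for tags in novels.values():
        counts.update(tags)
    return counts


counts = tag_counts(novels)

# Most used tags first; reverse the ordering to get the least used ones.
for tag, n in counts.most_common(3):
    print(f"{tag}: {n}")
```

    The same `Counter` also gives the least used tags by reading `counts.most_common()` from the other end.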

    3.1.2. Least Used Tags

    tags_least_common.png

    Over here we have the least used tags. I thought that we would have more novels with the Jobless Class tag, but it seems we only have 3 of them? And if we count all the Xianxia/Xuanhuan novels where the MCs marry their spouses and then leave them for higher realms, wouldn’t that count as long-distance relationships? :thinks:

    3.1.3. Novels with Most Number of Tags

    A novel on NU has an average of 12.65 tags, and the novel with the most tags is Death March kara Hajimaru Isekai Kyusoukyoku (LN), with… with… 182 tags. Yep, 182 tags.

    tags_most_number.png

    You can see just how big the gap is between its tag count and the rest. I believe a certain degree of cleanup has to be done~ I mean, look, even the WN version has fewer than half of the tags it has!
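
    The average and the maximum can be computed along these lines. This is only a sketch on made-up data; the titles and tags below are placeholders, not the real NU listing.

```python
from statistics import mean

# Hypothetical sample: novel title -> set of tags (not real NU data).
novels = {
    "Novel A": {"Male Protagonist", "Modern Day", "Magic", "Game Elements"},
    "Novel B": {"Female Protagonist", "Modern Day"},
    "Novel C": {"Male Protagonist", "Transmigration", "Magic"},
    "Novel D": {"Male Protagonist"},
}

# Number of tags attached to each novel.
tag_totals = {title: len(tags) for title, tags in novels.items()}

average_tags = mean(list(tag_totals.values()))
most_tagged = max(tag_totals, key=tag_totals.get)

print(f"average tags per novel: {average_tags:.2f}")
print(f"most tagged: {most_tagged} ({tag_totals[most_tagged]} tags)")
```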
     
    Last edited: Feb 28, 2020
    Dr_H_16, false, tottiy and 16 others like this.
  2. Scrya

    3.2. Correlation of Number of Tags to Number of Readers

    Fumu. I thought about this a lot. Unlike genres, we have over 700 tags, so I don’t think finding trends for specific tags would be that beneficial. Instead, why not find out if there’s any correlation between the number of tags and the number of readers? And what better way of showing that than with a scatter plot! (I removed the outlier that is Death March, because yeah, screw Death March.)

    tags_scatter.png
    Over here, from how the scatter plot fans outwards, we can see that novels with fewer tags seem to have fewer readers than those with more tags. But of course, that’s not exactly definitive. So let’s calculate the estimated mean number of readers for each number of tags, and plot a correlation line across them!
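
    A rough sketch of the "mean readers per number of tags" step, assuming it means a plain average within each tag-count group (the post does not say exactly how the means were estimated). The `(tags, readers)` pairs are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (number_of_tags, number_of_readers) pairs, one per novel.
novels = [(3, 120), (3, 80), (5, 300), (5, 500), (8, 900), (8, 1100)]


def mean_readers_by_tag_count(rows):
    """Group novels by their number of tags and average the readers."""
    groups = defaultdict(list)
    for n_tags, readers in rows:
        groups[n_tags].append(readers)
    return {n_tags: mean(values) for n_tags, values in sorted(groups.items())}


means = mean_readers_by_tag_count(novels)
for n_tags, avg in means.items():
    print(n_tags, avg)
```

    Each resulting point (number of tags, mean readers) is what a line would then be fitted through.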

    tags_correlation_line.png

    Right, newcomers to statistics might not understand this chart, but I would like to first talk about the R-squared value. An R-squared value of 0.471 suggests that the correlation line is able to explain 47.1% of the variability of the response data (number of readers) around its mean.

    Don’t understand? It’s alright. Let me reiterate. An R-squared value of 0.471 suggests that the correlation line is able to explain 47.1% of the variability of the response data (number of readers) around its mean.

    Still don’t understand? Don’t worry, it took me one whole year to actually figure this out after fumbling around during my first year of school. Yeah. I’m not proud of that. Anyway, using the same chart, I will attempt to provide a clearer explanation.

    tags_correlation_line_explained.png

    Please comment if you think Correlation Line-chan isn’t useless and is doing her best.

    In any case, a good correlation or regression line would be able to explain 70~100% of the variability of the data, while one that’s doing alright would be able to explain 40~70%. Of course, this is entirely subjective.
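
    To make "explains X% of the variability" concrete, here is a small sketch of a least-squares line and its R-squared, written from the textbook definitions (R-squared = 1 - residual sum of squares / total sum of squares). The numbers are invented and the function names are my own; the chart above was not necessarily produced this way.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys around its mean that the line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical points: number of tags vs. mean number of readers.
tag_counts = [2, 4, 6, 8, 10]
mean_readers = [150, 420, 380, 900, 760]

slope, intercept = fit_line(tag_counts, mean_readers)
r2 = r_squared(tag_counts, mean_readers, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r2:.3f}")
```

    If every point sat exactly on the line, the residual sum of squares would be 0 and R-squared would be 1; the more the points scatter around the line, the closer it gets to 0.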

    In this case, however, I think this positive relationship alone is enough to suggest that more tags would bring in more readers. After all, more tags would mean more people with different tastes get lured into reading the novels!
     
    Dr_H_16, false, tottiy and 4 others like this.
  3. Scrya

    3.3. Most Common Tag Combinations

    Alright, in this section, like the genre analysis, I will share the most common combinations of tags. This time, I will only show up to combinations of 4 tags. Why just 4? We have 766 tags you say? No. It isn’t JUST 4, okay!? Let me give you guys a sense of scale here.

    We have 766 single tags in total. If we want to JUST look at all combinations of 2 tags, picking them one after another gives 766 x 765 = 585,990 ordered pairs. Even ignoring the order, that’s still 292,995 different pairs!

    Picking 3 tags in order gives 766 x 765 x 764 = 447,696,360 ordered triplets, which comes to 74,616,060 different triplets once the order is ignored!

    WHAT ABOUT COMBINATIONS OF 4, HUH!? 341,592,322,680 ordered picks, or 14,233,013,445 potentially different quadruplets! Of course, most combinations of 4 don’t exist, so the actual number is smaller. But still! It took my computer more than an hour to process them all! When I tried to save the processed data so that I didn’t have to go through this hell again, guess what! THE ENTIRE DATASET WAS OVER 4GB! At that point I was like, you know what. Let’s just only include quadruplets that have appeared at least 10 times, and I managed to cut the dataset all the way down to just 11MB!

    Geez. At one point, I was even thinking if a side project was really worth all this trouble. Just for these charts!
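
    For the curious, counting tag combinations and then dropping the rare ones could look roughly like the sketch below. It only enumerates combinations that actually occur in some novel, and the `min_count` cutoff plays the role of the "at least 10 times" filter. The sample data and the function name are hypothetical; the real processing script is not shown in this thread.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: one set of tags per novel (not real NU data).
novels = [
    {"Male Protagonist", "Weak to Strong", "Game Elements"},
    {"Male Protagonist", "Weak to Strong", "Game Elements", "Magic"},
    {"Female Protagonist", "Handsome Male Lead", "Beautiful Female Lead"},
]


def count_tag_combinations(tag_sets, size, min_count=1):
    """Count every `size`-tag combination that occurs in at least one novel."""
    counts = Counter()
    for tags in tag_sets:
        # Sorting gives each combination a single canonical ordering,
        # so ("A", "B") and ("B", "A") are never counted separately.
        for combo in combinations(sorted(tags), size):
            counts[combo] += 1
    # Keep only combinations seen at least `min_count` times.
    return {combo: n for combo, n in counts.items() if n >= min_count}


pairs = count_tag_combinations(novels, size=2, min_count=2)
triples = count_tag_combinations(novels, size=3, min_count=2)
print(pairs)
print(triples)
```

    A novel with k tags contributes C(k, 4) quadruplets, so heavily tagged novels dominate the work: a single novel with 182 tags would contribute 44,224,635 quadruplets on its own if it were included.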

    3.3.1. Most Common 3-Tag Combinations

    tags_most_common_3_comb.png

    Anyway, enough with the rant. Looking at the chart, we can see that “Beautiful Female Lead” and “Handsome Male Lead” are often seen together, along with the occasional “Love Interest Falls in Love First.”

    Then there’s the “Weak to Strong” tag which is also often seen. But is this tag really necessary though? Don’t almost all novels have this characteristic in the first place? Also, since this tag exists, why doesn’t its opposite – “Strong to Weak” exist?

    We can also see “Game Elements” often on the chart. Makes sense. Everyone likes their infinite-space dimensional bags and skill screens!

    Let’s look at combinations of 4 tags now.

    3.3.2. Most Common 4-Tag Combinations

    tags_most_common_4_comb.png

    Pretty similar to the ones in the 3-tag combinations, just with a few new additions. One thing I noted here is that novels with female protagonists are topping these charts, even though there are only about 3/5 as many of them as male protagonist novels (1693 vs 2690).

    This could mean that female protagonist novels are probably more homogeneous, while male protagonist novels are more differentiated. We need more variety of female protagonist novels!
     
    Last edited: Feb 26, 2020
    Dr_H_16, simdimdim, false and 5 others like this.
  4. Scrya

    3.4. Measuring Common 4-Tag Combinations

    Similar to the genre analysis, let’s see how different tag combinations work out when compared with different measures. Unlike the previous thread, however, I will be using tables to display my results rather than bar charts, since I find that there isn't a point in using a chart when the ratings/number of readers are all pretty similar.

    For this section, I will only look at tag combinations that appear in at least 50 novels. There’s a total of 982 4-tag combinations which fit this criterion.
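
    As a sketch of how such a table could be built: average a measure (rating here, but readers would work the same way) over every novel that contains a given combination, and keep only combinations found in enough novels. The optional `exclude_tag` argument drops all novels carrying one tag before averaging. Everything below (records, names, the tiny cutoff of 2) is illustrative and assumed, not the original code or data.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical records: (set of tags, average rating). Not real NU data.
records = [
    ({"Female Protagonist", "Handsome Male Lead", "Modern Day"}, 4.4),
    ({"Female Protagonist", "Handsome Male Lead"}, 4.0),
    ({"Male Protagonist", "Overpowered Protagonist", "Modern Day"}, 3.5),
    ({"Male Protagonist", "Overpowered Protagonist"}, 3.9),
]


def average_by_combination(records, size, min_novels, exclude_tag=None):
    """Average the measure over each tag combination seen in >= min_novels novels."""
    values = defaultdict(list)
    for tags, value in records:
        if exclude_tag is not None and exclude_tag in tags:
            continue  # skip novels carrying the excluded tag
        for combo in combinations(sorted(tags), size):
            values[combo].append(value)
    return {
        combo: mean(vals)
        for combo, vals in values.items()
        if len(vals) >= min_novels
    }


# The thread uses size=4 and min_novels=50; the toy data needs smaller numbers.
all_novels = average_by_combination(records, size=2, min_novels=2)
without_fp = average_by_combination(
    records, size=2, min_novels=2, exclude_tag="Female Protagonist"
)

# Highest average first.
for combo, avg in sorted(all_novels.items(), key=lambda kv: kv[1], reverse=True):
    print(combo, round(avg, 2))
```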

    3.4.1. Ratings

    Highest rated 4-tag combinations
    tags_avg_rating.PNG

    In terms of highest average ratings, to no one’s surprise, mostly female protagonist novel tag combinations topped the chart. Actually, you know what. Why don’t I remove all novels with the “Female Protagonist” tag and see how different it is.

    Highest rated 4-tag combinations (With female protagonist novels removed)
    tags_avg_rating_male.PNG
    Among male protagonist novels… It seems like novels with Lolis are highly rated.



    … What? Why are you looking at me? Lolis are cute okay?

    A-Anyway! Let’s move on to the lowest rated ones!

    Lowest rated 4-tag combinations
    tags_avg_rating_lowest.PNG

    Hmm, similar to the genre analysis, it seems like novels with sexual themes aren’t received well in terms of ratings. Novels with Overpowered Protagonists aren’t received that well either.

    3.4.2. Number of Readers

    4-tag combinations with highest readership
    tags_avg_readership.PNG

    Now looking at the readership, it’s clear that male protagonist novels take center stage here. However, it’s interesting to see that although novels with Overpowered Protagonist and R-15 tags are among the lowest rated, they are still read by lots of readers. (Where did my lolis go!?)

    Alright, just like earlier, let’s see what happens when I remove novels with the Male Protagonist tag.

    4-tag combinations with highest readership (with male protagonist novels removed)
    tags_avg_readership_female.PNG
    Aside from how similar the tag combinations are, the highest average readership for these tag combinations is around 4000-5000, about half that of male protagonist novels.

    4-tag combinations with lowest readership
    tags_avg_readership_lowest.PNG

    In terms of tag combinations with the lowest average readership, modern day female protagonist novels seem to have the smallest following! This is kinda disappointing because I'm into modern day novels...
     
    Last edited: Feb 26, 2020
    Dr_H_16, false, tottiy and 4 others like this.
  5. Scrya

    3.5. Finding Subsets Between 2 Tags

    In this section, I would like to see if there are any tags which are subsets of other tags. What do I mean? Remember back in secondary/middle school, when your math teacher taught you about sets and Venn diagrams? It looks something like this.
    tags_venn.png

    In the context of novels, let’s say we have the tags “System Administrator” and “World Hopping”. There’s a total of 193 novels tagged with “System Administrator”, and 144 novels tagged with “World Hopping”. Out of these novels, 103 are tagged with both “System Administrator” and “World Hopping”. We want to find out the intersection rates with respect to each of the two tags. Here’s a graphic to better explain what I’m trying to convey.


    tags_sets.png
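
    Worked through with the numbers from this example, the two intersection rates are simply the overlap divided by each tag's own total. A minimal sketch (the function name is my own):

```python
def intersection_rates(count_a, count_b, count_both):
    """Return (% of A in the intersection, % of B in the intersection)."""
    pct_a = count_both / count_a * 100
    pct_b = count_both / count_b * 100
    return pct_a, pct_b


# A = "System Administrator" (193 novels), B = "World Hopping" (144 novels),
# 103 novels carry both tags.
pct_a, pct_b = intersection_rates(193, 144, 103)
print(f"% of A in intersection: {pct_a:.1f}")
print(f"% of B in intersection: {pct_b:.1f}")
```

    So about 53.4% of "System Administrator" novels are also "World Hopping", while about 71.5% of "World Hopping" novels are also "System Administrator".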

    How would finding subsets be useful? It might not be useful for readers; rather, it’s more for our dear NU admins! Finding subsets makes it quick to spot which tags may be unnecessary, so that they can be removed!

    Sorting the table by “% of A in Intersection”, we have this resulting table.

    set_a.PNG

    100% of A in intersection tells us that A is a subset of B. To give an example, there’s a total of 5 JSDF-tagged novels. And all 5 of them also have the Military tag. So the question would be, is there still a need to have a JSDF tag?

    Now, let’s sort the table by “% of B in Intersection”.
    set_b.PNG

    Similarly, over here, 100% of B in intersection tells us that B is a subset of A. All 27 of the “Sexual Cultivation Technique” tagged novels are also tagged with “Male Protagonist”. If any mods want the entire table, let me know and I will send it to you~
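
    A hedged sketch of how tags that are complete subsets of other tags could be found: build the set of novels for each tag, then test every ordered pair of tags with a subset check. The data and the helper name are invented for illustration and only mimic the JSDF/Military example.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample: novel title -> set of tags (not real NU data).
novels = {
    "N1": {"JSDF", "Military", "Male Protagonist"},
    "N2": {"JSDF", "Military"},
    "N3": {"Military", "Male Protagonist"},
    "N4": {"Male Protagonist"},
}


def find_subset_tags(novels):
    """Return (tag_a, tag_b, size_of_a) whenever every novel with tag_a also has tag_b."""
    members = defaultdict(set)  # tag -> set of novel titles carrying it
    for title, tags in novels.items():
        for tag in tags:
            members[tag].add(title)

    results = []
    # Ordered pairs, because "A is a subset of B" is not symmetric.
    for tag_a, tag_b in permutations(sorted(members), 2):
        if members[tag_a] <= members[tag_b]:
            results.append((tag_a, tag_b, len(members[tag_a])))
    return results


subsets = find_subset_tags(novels)
print(subsets)
```

    Two tags that always appear on exactly the same novels would show up in both directions, and very small tags become subsets of big ones easily, so the size of A is worth reporting alongside each hit.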
     
    Last edited: Feb 26, 2020
    Dr_H_16, false, tottiy and 7 others like this.
  6. Scrya

    3.6. Conclusion
    • The more tags a novel has, the more readers it tends to have. Of course, making sure of this would require more data, or taking more variables into account.
    • The current selection of female-targeted novels seems to be rather similar in content, especially novels with female protagonists. New female protagonist novels with a different flair might bring fresh air to the community, but readership would still be hard to gain compared to male-targeted novels.
    • The tag analysis here has reinforced the conclusions made in the genre analysis. Novels with male-targeted tags are likely to have more readers than novels with female-targeted tags.
    Whew, this took two whole days. Hope that all of you have enjoyed reading this! As usual, leave your comments down below if you have any questions. If you have interesting ideas on what else I can do with information about tags, do let me know too and I will see if I can explore them!

    I’m not entirely sure if I would still make another thread on novel descriptions, but we’ll see~
     
    Dr_H_16, false, tottiy and 12 others like this.
  7. otaku31

    otaku31 Well-Known Member

    Joined:
    Nov 26, 2015
    Messages:
    5,608
    Likes Received:
    21,734
    Reading List:
    Link
    Scrya Presents: Fun With Tags

    Looks amazing! Will pore over it when I have time. Thanks, Scrya.
     
  8. MangoGuy

    MangoGuy Rambling Mango

    Joined:
    Apr 15, 2016
    Messages:
    7,461
    Likes Received:
    8,375
    Reading List:
    Link
    Regression lines are the best to show the trend.
     
  9. Archaic pickle

    Archaic pickle Daoist Heavenly Kimichi

    Joined:
    Feb 16, 2016
    Messages:
    4,018
    Likes Received:
    32,492
    Reading List:
    Link
    Autism and editors are next to each other in the graph :blobspearpeek:
    (Edit: The least used tag graph)

    Shade has been thrown
     
    Scrya likes this.
  10. Inuzuka

    Inuzuka Well-Known Member

    Joined:
    Jul 25, 2016
    Messages:
    712
    Likes Received:
    275
    Reading List:
    Link
    You posted this picture under both Most common 3-tag combination and 4-tag combination and missed posting the actual 4-tag combination.
    [​IMG]
     
    Dr_H_16 likes this.
  11. Scrya

    Ooof, thanks for taking notice! I fixed it!
     
    Dr_H_16 likes this.
  12. DocB

    DocB "I see you, little mouse! Run along"

    Joined:
    Nov 10, 2015
    Messages:
    3,573
    Likes Received:
    8,085
    Reading List:
    Link
    You are kind of missing the obvious: more readers means more people who can edit tags, and as such those novels have more tags.
    You could also check whether the number of chapters released influences the number of tags.
     
    Dr_H_16, GonZ555 and Scrya like this.
  13. Siceraria

    Siceraria Well-Known Member

    Joined:
    Jun 27, 2016
    Messages:
    1,114
    Likes Received:
    3,302
    Reading List:
    Link
    Yep, time to go to bed.
    what laptap computer close.gif
    ... Sorry, you worked so hard on this that my brain got completely fried by the amount of information!
    My poor brain can't handle statistics!


    ... Though it does show a lot about what readers want. Good job on what you have done!
     
  14. Jasad

    Jasad ...not oldschool, just old...

    Joined:
    Feb 12, 2016
    Messages:
    938
    Likes Received:
    528
    Reading List:
    Link
    wow, what you do is amazing....:blobok:
     
    Scrya likes this.
  15. Scrya

    Mmnhmm! Things can always go two ways! And I didn't look into the number of chapters as I felt that I'm already doing too much for now orz. Maybe next time~
     
    GonZ555 likes this.
  16. Dizzcity

    Dizzcity Watching generations of fans rise and fall away

    Joined:
    Nov 19, 2015
    Messages:
    913
    Likes Received:
    891
    Reading List:
    Link
    1) You know this already, but just to make it clear for other readers, correlation is not causality. That means we don't know if a novel having more tags is the reason why it gets more readers, or whether the reason why a novel has more tags is because it had more readers.

    2) It is possible that a reason why there seems to be a lack of variety in female-protagonist novels isn't because they are all similar, but because the tropes present are not as well-known and have not been as clearly codified yet into tags. So it's not a lack of variety in novels, it's a lack of variety in tags. For example, let's take a common genre: modern-day female protagonist romance novels. There are almost no tags to describe the kind of tone in the relationship between the two leads. (E.g. push-and-pull, mutual respect, rivalry-leads-to-love, calm-and-supportive, old-friends-become-lovers, idiots-in-love, quiet romance, domineering male lead, domineering female lead, etc.) The only one I can spot is perhaps Power Couple. You have to make an approximation based on user reviews, tags related to the character of the leads (black belly, calm protagonist, etc.), and maybe tags related to who falls in love first.

    3) This is again a side effect of the same problem I mentioned in the genre thread, I think. The wide disparity in the length of male-protagonist novels versus female-protagonist novels means that the former have more time to accrue fans than the latter. But fans of the latter may have read more novels of the same type of genre than the former.
     
    Dr_H_16, Fuyuneko, Snowbun and 3 others like this.
  17. Shio

    Shio Moderator Staff Member

    Joined:
    Oct 21, 2015
    Messages:
    5,878
    Likes Received:
    11,996
    Reading List:
    Link
    It's scary what people could do with their free time sometimes.
     
  18. GonZ555

    GonZ555 [Free Hugs]

    Joined:
    Nov 10, 2015
    Messages:
    1,918
    Likes Received:
    31,268
    Reading List:
    Link
    First of all, there's a genie tag?!

    No surprises there that romance stuff took the top tag combination..

    Also i agree on this..
     
    Scrya likes this.
  19. An Anime Addict

    An Anime Addict (≧▽≦)/̵͇/'̿'̿ ̿ ̿̿ ̿̿ ̿ ̿̿ ̿̿ (▀̿̿Ĺ̯̿▀̿ ̿)

    Joined:
    Feb 14, 2018
    Messages:
    913
    Likes Received:
    1,539
    Reading List:
    Link
    I wonder how big the charts would have gotten if you had used the 4GB data instead of the 11MB one. Now that would be something else :blobpeek:
     
    Last edited: Feb 26, 2020
    Dr_H_16 and Scrya like this.
  20. GonZ555

    Probably not that much different? Only the scaling will be prominent and a few more outliers will show up..
     
    Scrya likes this.