Are there any tools for scanning documents and tracking words??

Discussion in 'General Chat' started by Gandire Alea, Nov 3, 2021.

  1. Gandire Alea

    Gandire Alea [Wicked Awesome Translator]

    Joined:
    Jan 24, 2017
    Messages:
    4,016
    Likes Received:
    70,987
    To be more specific, I am searching for something that will scan a file or document and then categorize each word or set of characters by how often it appears. For example, if "Harry Potter" were scanned, it would list words such as "wizard, witch, spell, magic, the, etc, school, room" along with the number of times each one appears.

    Are there any such tools capable of doing this??
     
  2. Lacey_Avocato

    Lacey_Avocato New Member

    Joined:
    Sep 22, 2021
    Messages:
    12
    Likes Received:
    17
    Gandire Alea likes this.
  3. Ddraig

    Ddraig Frostfire Dragon|Retired lurker|FFF|Loved by RNG

    Joined:
    Apr 6, 2016
    Messages:
    7,855
    Likes Received:
    22,460
    Can you tell me the format of the document to be scanned? Coz that is the only messy part. Other than that, it is a simple counter application.
    Depending on what kind of stuff you are dealing with (say a book or two, or a series), it is as simple as using Counter in Python or a hashmap / dict / equivalent in other languages.
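
    A minimal sketch of the Counter approach, assuming the text is already available as a plain string. The function name word_frequencies and the sample sentence are made up for illustration, and the regex is just one possible definition of a "word".

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each word appears, ignoring case."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical sample text; in practice this would be read from a file.
sample = "The wizard cast a spell. The witch cast a better spell!"
counts = word_frequencies(sample)

# most_common(n) returns (word, count) pairs, most frequent first.
for word, count in counts.most_common(3):
    print(f"{count:>4}  {word}")
```

    For a real document, the sample string could be replaced with the contents of a text file read via open(...).read().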

    There is also a command-line solution I found:
    Code:
    cat potato.txt | tr '[:space:]' '[\n*]' | tr -d '[:punct:]' | grep -v "^\s*$" | sort | uniq -c | sort -n
     
    Gandire Alea likes this.
  4. Gandire Alea

    Gandire Alea [Wicked Awesome Translator]

    Joined:
    Jan 24, 2017
    Messages:
    4,016
    Likes Received:
    70,987
    The format is a bit flexible.
    Ideally, it would be from a web page, but it can easily be copied into a Word doc or even pasted into the application itself.
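
    If the source is a web page, one possible approach is to strip the HTML tags first and then count the remaining text. The sketch below uses only Python's standard library and assumes the page's HTML is already in a string. TextExtractor, count_words_in_html, and the sample page are hypothetical names and data for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words_in_html(html: str) -> Counter:
    """Strip the markup, then count case-insensitive word frequencies."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


# Hypothetical page content standing in for a downloaded web page.
page = (
    "<html><body><h1>Magic School</h1>"
    "<script>var magic = 1;</script>"
    "<p>The school teaches magic.</p></body></html>"
)
counts = count_words_in_html(page)
print(counts.most_common(5))
```

    Real pages also contain menus, footers, and so on, so copying just the chapter text into a plain text file may still give cleaner counts.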
     
  5. Ai chan

    Ai chan Queen of Yuri, Devourer of Traps, Thrusted Witch

    Joined:
    Nov 7, 2015
    Messages:
    11,278
    Likes Received:
    24,346
    If it's an image file, you use OCR. It would scan the image and output the text, though it's not always accurate.

    If you're trying to get certain words counted from a text file, paste it into https://wordcounter.net/

    Then look at Keyword Density section on the right side. It doesn't allow you to search for any particular keywords, but it gives the 30 most used keywords.
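
    If counts for particular keywords are needed, a short script could do the lookup instead. This is only a sketch: keyword_counts, the sample text, and the keyword list are made up for illustration, and type hints like list[str] assume Python 3.9 or newer.

```python
import re
from collections import Counter


def keyword_counts(text: str, keywords: list[str]) -> dict[str, int]:
    """Return how many times each requested keyword appears (case-insensitive)."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    # A Counter returns 0 for words that never appeared.
    return {kw: counts[kw.lower()] for kw in keywords}


# Hypothetical sample text and keyword list.
text = "The wizard and the witch studied magic at the school of magic."
print(keyword_counts(text, ["wizard", "witch", "magic", "room"]))
# prints {'wizard': 1, 'witch': 1, 'magic': 2, 'room': 0}
```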
     
    Gandire Alea likes this.