5/1/2023

Unique word counter

What my word counter script does:
- remove punctuation marks such as . , ( ) ? ! # : ' etc.
- count how often each word occurs in the text (e.g. "the" occurs 20 times, "is" occurs 15 times, etc.)
- write the list of unique words (the, is, etc.) into a new text file
- sort the unique words by their count, from most often to least often
- print out the 10 most often occurring words

After trying my own code on a couple of Monty Python sketches (the biggest is 21 KB), I wondered: what text file can I try that's really big? My first thought went to Tolstoy's War and Peace, which can be downloaded as a .txt file from Project Gutenberg and comes in at a whopping 3.2 MB (uncompressed). I hesitated to run my very simplistic (i.e. not optimized) Python script on this monster: how many hours would I wait? Would my PC start to burn?

To be fair, I removed the many debug print statements (they would have produced too much text in the shell window, which seems rather slow in IDLE). I did add some info on the number of total and unique words, plus a simple way of measuring the elapsed time (the time() function from the time module returns the current time in seconds as a float). The script's progress messages:

Reading in C:\Users\charding\War_and_Peace.txt
Writing all unique words into file: C:\Users\charding\unique_War_and_Peace.

Total time (on my 2008 Dell Vostro 17 laptop) needed to find 20,465 unique words within the total of 3,110,642 words, write them into a file, reverse sort them by frequency and print out the top 100 most often used words: about 1.6 seconds.

After this I got a bit more cocky and printed the top 1000 most often used words from Tolstoy's War and Peace into the Python shell, and that took a bit longer. Now I can finally answer this burning question: which person is mentioned most often by name in War and Peace? Answer: Pierre, beating out Natasha, 1784 times to 1092 times. Now, if I just had any idea who these people are supposed to be! My guess would point to some romantic entanglement; what else could result in the two being the top mentioned names in War and Peace? Alas, I still have to read this magnum opus for myself. Until then I'm left speculating that the importance of a person is linked to the frequency of their name.

If you are interested in the original text file and the file containing the 20,465 unique words, look here:

Eventually I will add the code for this adventure as well.

1) Advance word counter: The main objective of this exercise is to produce a Java program in Eclipse that is capable of counting the unique words in a text. (Part 3: Finding unique words and a mean value. Part 4: Apply word count to a file. Appendix A: Submitting your exercises to the Autograder.)

Printing the words that occur exactly once in a string can be done using nested loops. Extract the words from the string using the split() method and store them in an array. Iterate over the word array using a for loop, count the occurrences of each word in the string, and print the word if the count equals one.

In your current code you can either increment uniqueWordCount in the else case where you already set countword, or just look up the number of keys, since each key is one unique word. I left links to what I believe are the two best answers to help you. Sorry I didn't link to all of my suggestions, but I don't have the clout here at StackExchange to post all of them.

Well, a unique word count is readily available, and as a bonus the app can find semantic connections among your documents using a pretty terrific artificial intelligence. DEVONnote may also have this capability, but it's been years since I've used it.

Very useful for counting characters for Reddit posts. Writing stories for various subs, I almost always encounter the character limit, and all other counters I find have small discrepancies that trip me up when editing.

A Python one-liner that maps each element of aa to its count: bb = dict(zip(list(aa), [list(aa).count(i) for i in list(aa)]))
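The five steps of the word counter can be sketched in Python as below. This is a hedged sketch, not the script used for the timings above. It assumes the input file is UTF-8 and lowercases every word. It also replaces only ASCII punctuation (Python's string.punctuation) with spaces, so curly quotes or other non-ASCII marks would survive and a word like "don't" splits into "don" and "t". The function names count_words() and run() and their parameters are hypothetical. The number of unique words is simply the number of keys in the Counter, len(counts).

```python
import string
import time
from collections import Counter


def count_words(text):
    """Lowercase the text, strip ASCII punctuation, and count each word."""
    # Assumption: replace every character in string.punctuation with a space
    # (so "war--and" becomes two words rather than one merged word).
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return Counter(words)


def run(in_path, out_path, top_n=10):
    """Count words in in_path, write the unique words to out_path, print the top_n."""
    start = time.time()  # current time in seconds, as a float
    print("Reading in", in_path)
    with open(in_path, encoding="utf-8") as f:
        counts = count_words(f.read())

    print("Writing all unique words into file:", out_path)
    with open(out_path, "w", encoding="utf-8") as f:
        # Unique words are written one per line, in alphabetical order.
        for word in sorted(counts):
            f.write(word + "\n")

    # The number of unique words is just the number of keys.
    print("total words:", sum(counts.values()), "unique words:", len(counts))
    # most_common() returns (word, count) pairs, most frequent first.
    for word, n in counts.most_common(top_n):
        print(word, n)
    print("elapsed: %.2f s" % (time.time() - start))
    return counts
```

For example, run("War_and_Peace.txt", "unique_words.txt", top_n=100) would mirror the top-100 run; both file names are placeholders. Counter makes a single pass over the words, whereas calling list.count() once per word rescans the whole list every time, which becomes very slow on a text with millions of words.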
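The nested-loop method for printing the words that occur exactly once can be sketched in Python like this. Note that "unique" here means "appears only once", not "distinct" as in the 20,465 unique words counted in War and Peace. The function name words_occurring_once is hypothetical. It splits on whitespace only and is case-sensitive, so "The" and "the" count as different words and punctuation stays attached to words.

```python
def words_occurring_once(text):
    """Return the words that appear exactly once, in order of appearance."""
    words = text.split()  # split on runs of whitespace
    result = []
    # Outer loop: take each word in turn.
    for word in words:
        count = 0
        # Inner loop: count how often this word occurs in the whole list.
        for other in words:
            if other == word:
                count += 1
        if count == 1:
            result.append(word)
    return result


if __name__ == "__main__":
    for w in words_occurring_once("the cat saw the dog"):
        print(w)  # prints cat, saw, dog (one per line)
```

The inner loop rescans the entire word list for every word, so the work grows with the square of the number of words. That is fine for a sentence, but for a book-length text a dictionary of counts built in one pass is the practical choice.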