10 mb text file download
This will be on each line. That took me about 3 tries to figure out. So just run that command with a random number for the head count, and the resulting size will tell you how to adjust it to reach the target size.

Thanks. (Ravi Teja)

Um, no, this is not at all what you want for testing the effectiveness of your compressor. The resulting text is highly repetitive and does not represent what a compressor will see in the real world. Do not use this answer.
See the compression corpora in the other answers.

@MarkAdler I think I have addressed the repetition concern with the edit.

It depends on the book's length. I remember using a lot of books with "encyclopedia" in the title, and on average each of them was about 10 MB as a plain text file.
I know it's not the whole dataset you're looking for, but you could join them to reach the desired size. The great thing is that because those are English books, and not just a collection of random words, they make a good dataset for trying a search algorithm in realistic conditions. At least, that's why I used them.
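To make the "join them" idea concrete, here is a minimal sketch in Python. It assumes you already have the books saved locally as plain text files; the function name `build_corpus` and the file names in the usage note are made up for illustration and are not from the thread.

```python
from pathlib import Path


def build_corpus(sources, out_path, target_bytes):
    """Concatenate text files into out_path until it holds target_bytes.

    Returns the number of bytes written. If the sources run out first,
    the output is simply smaller than the target.
    """
    written = 0
    with open(out_path, "wb") as out:
        for src in sources:
            if written >= target_bytes:
                break
            data = Path(src).read_bytes()
            # Trim the last book so the output does not overshoot the target.
            # Note: cutting on a byte boundary can split a multi-byte
            # character in non-ASCII text.
            chunk = data[: target_bytes - written]
            out.write(chunk)
            written += len(chunk)
    return written
```

For a 10 MB file you could call something like `build_corpus(sorted(Path("books").glob("*.txt")), "corpus_10mb.txt", 10 * 1024 * 1024)`, where `books` is a hypothetical folder holding the downloaded texts. Working in bytes rather than characters keeps the size on disk exact.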
Not a massive hashtable. Something that I can simply read off the disk as pure text and not do any parsing.
I have routinely edited MB files. There are a couple of things to do: disable session snapshots and periodic backups.
Again, thank you both!