German and English word lists for download, created from Wikipedia
For my last project I needed a list of German words, but I could not find a good, free one. So I had the idea of building a word list from all Wikipedia articles.
Download free German and English word lists
For those who only want to download the word lists (license: Creative Commons, as used by Wikipedia):
|All words from German Wikipedia||word_list_german_all.txt.7z|
|All spell-checked words from German Wikipedia||word_list_german_spell_checked.txt.7z|
|All case-insensitive spell-checked words from German Wikipedia||word_list_german_uppercase_spell_checked.txt.7z|
|All words from English Wikipedia||word_list_english_all.txt.7z|
|All spell-checked words from English Wikipedia||word_list_english_spell_checked.txt.7z|
|All case-insensitive spell-checked words from English Wikipedia||word_list_english_uppercase_spell_checked.txt.7z|
Those who want to create their own word lists should read the following sections.
Create word list from Wikipedia
All Wikipedia articles are available for offline reading. The whole of Wikipedia can be downloaded as a compressed archive in the ZIM format of the openZIM project: https://download.kiwix.org/zim/wikipedia
A library called libzim decompresses and reads the articles. It is written in C++, which makes it hard to use from a C# program. Since C# is my preferred programming language, I wrote a C library that wraps libzim.
I used hunspell to check the words. In theory hunspell can also generate word lists itself, but not all of the generated words make sense. So it is better to use it only for spell checking the words found in Wikipedia articles.
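The idea of collecting words from article text and then keeping only those that pass a spell check can be sketched roughly as follows. This is a minimal Python illustration, not the actual C# program: the function names `extract_words` and `build_word_lists` are hypothetical, the article texts are assumed to be already extracted as plain strings, and `is_correct` is a placeholder callable standing in for a real spell checker such as hunspell. Treating the "uppercase" list as the spell-checked words converted to upper case is also an assumption based on the file names.

```python
import re

# Runs of Unicode word characters, excluding digits and underscores.
# This keeps umlauts and ß, but splits hyphenated words and apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_words(text):
    """Return all letter tokens found in text, in order of appearance."""
    return WORD_RE.findall(text)


def build_word_lists(articles, is_correct):
    """Build three sorted word lists from an iterable of article texts.

    is_correct is a placeholder for a spell checker: any callable that
    takes a word and returns True if the word is accepted.
    """
    seen = set()
    for text in articles:
        seen.update(extract_words(text))

    all_words = sorted(seen)
    checked = [w for w in all_words if is_correct(w)]
    # Assumption: the case-insensitive list is the checked list in upper case.
    upper_checked = sorted({w.upper() for w in checked})
    return all_words, checked, upper_checked


# Toy data; a real run would iterate over the articles of a ZIM archive.
articles = ["Der Hund läuft. Der Hnud schläft.", "Ein Hund und ein Haus."]
dictionary = {"Der", "Hund", "läuft", "schläft", "Ein", "ein", "und", "Haus"}

all_words, checked, upper_checked = build_word_lists(
    articles, lambda w: w in dictionary
)
```

In this toy run the misspelling "Hnud" appears in the list of all words but is dropped from the spell-checked list, and "Ein" and "ein" collapse into a single upper-case entry. Note that Python's `upper()` turns "ß" into "SS", which may or may not match what the real program does.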
Program to create the word list
This program was written with MonoDevelop on Linux. There are two versions, one for the console and one with a GUI. I hope the usage is self-explanatory. (License: GPL v3)
Download source code and binary: Woerterbuch.zip
(The binary is located in the directory WoerterbuchGUI/bin/debug)