Creating a password list for WPA/WPA2 dictionary attacks

In this post I will discuss some options for creating your own, most likely unique, list of passwords, which can be used for dictionary attacks against certain security setups like WPA. I have to insist that you read and understand the following disclaimers.

Disclaimer I: the information provided here is only to be used in attacks against your own setups or in all other cases with the permission of the owner. In any case you should check the laws in your country.

Disclaimer II: generating too much internet traffic on a website may get you into serious trouble. Again, check your local laws.

I have the feeling that I need to repeat for the dim-witted: make sure you know about your legal situation! Do not break the law!

There are security systems that are very hard to crack, at least in theory. From a cryptographic point of view, certain elements of the WPA/WPA2 protections belong to this class, if they are set up correctly. Still, there are many reports of protected Wi-Fis being breached. Among the possibilities to break into a WPA/WPA2-secured network we will focus on one in particular: a dictionary-based attack on a pre-shared key (PSK) by means of a captured four-way handshake between a client and the access point. The PSK setup is the most common one in private households. Let me clarify one point: not everything about WPA/WPA2 is secure. Here I will consider exclusively the issue that lies in your own hands: the choice of the passphrase.

Getting a handshake is in principle easy using the aircrack-ng suite. If you want to test your own network and know how to do it, go ahead. In layman’s terms the handshake is tested against a list of possible passwords until the correct one is found. The security of WPA-PSK type Wi-Fis relies on the computationally expensive process of simulating the handshake, which makes the attack very slow. On a typical home computer one can try a couple of thousand passwords per second. aircrack-ng and hashcat can do that for you. Given the slowness of the attack, the good old brute-force approach is unlikely to work in practice. Also, the minimum length of the key is 8 characters. Even if you were to try brute-force, the incremental mode of john-the-ripper, which you can pipe into e.g. aircrack, allows at most 8 characters. Well, let’s just accept that the brute-forcing options are very limited.
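To put a number on "very limited", here is a rough back-of-the-envelope sketch. The rate of 2000 tries per second is my assumption for "a couple of thousand", and the function names keyspace and years_to_exhaust are made up for this illustration.

```shell
# keyspace ALPHABET_SIZE LENGTH
# Prints the number of possible keys, using plain integer arithmetic.
keyspace() (
  n=1
  i=0
  while [ "$i" -lt "$2" ]; do
    n=$(( n * $1 ))
    i=$(( i + 1 ))
  done
  echo "$n"
)

# years_to_exhaust KEYS TRIES_PER_SECOND
# Prints the whole number of years needed to try every key once.
years_to_exhaust() (
  echo $(( $1 / $2 / 31536000 ))   # 31536000 seconds in a 365-day year
)

# 8 lowercase letters at an assumed 2000 tries per second
years_to_exhaust "$(keyspace 26 8)" 2000
# 8 printable ASCII characters (95 symbols) at the same rate
years_to_exhaust "$(keyspace 95 8)" 2000
```

The first case comes out at a few years, the second at something on the order of a hundred thousand years, and that is only for the shortest key WPA allows.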

As an alternative one can try passwords from a dictionary. So technically an adversary does not really attack WPA/WPA2, but rather attacks weak passwords. There are certain dictionaries around, varying in size and usefulness, e.g. the enormous rockyou list, based on previously leaked and cracked password hashes. However, they typically contain a lot of passwords shorter than 8 characters and are not very well maintained. Often they are plagued by “mangling duplicates”, like “password” and “p455w0rd”. The second one can be easily generated from the first one by using simple word mangling rules. If both are present in the dictionary, jtr (and any other comparable tool I know) will try the variations of both entries independently, resulting in an overlap of identical tries and therefore wasted computational time. And yes, I just invented the term “mangling duplicates”. To what extent that actually matters I don’t know, but it bothers me. It should also be possible to clean up existing lists.
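Cleaning up an existing list could look roughly like the following sketch. It is a crude illustration, not a tested recipe: the leetspeak table is an arbitrary choice of mine, the function name clean_wordlist is made up, and entries that still contain non-letters after the substitution are simply dropped.

```shell
# clean_wordlist: reads a word list on stdin, writes a cleaned list to stdout.
# Steps: lowercase, undo a few common leetspeak substitutions (assumed table:
# 0->o 1->l 3->e 4->a 5->s 7->t $->s @->a), keep only purely alphabetic
# entries of 8 to 63 characters, and remove duplicates.
clean_wordlist() {
  tr '[:upper:]' '[:lower:]' |
    tr '013457$@' 'oleastsa' |
    grep -x '[a-z]\{8,63\}' |
    sort -u
}

# "password" and "P455w0rd" collapse into one entry, "short" is too short
printf 'password\nP455w0rd\nshort\n' | clean_wordlist
```

On a real list one would call it as clean_wordlist < rockyou.txt > rockyou.clean.txt, where both file names are placeholders. The substitution is ambiguous (a "1" may stand for "l" or "i"), so some genuine entries will be distorted.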

The lack of dictionaries with long passwords remains a problem though. So our goal for today is to make a dictionary with words of minimum length of 8 characters. We will use one of the biggest sources available: Wikipedia. A little command line magic will help us with that.

So let’s talk about the stuff needed:

  • internet access 😉
  • Linux shell with the usual command line tools installed; if you have a Raspberry Pi, one of the usual operating systems will do; I also got it working on a rooted Android device;
  • wget
  • gnuplot (optional)

Any decent Linux distro will ship with the necessary framework, with the exception of gnuplot, which can be installed easily or can be skipped.

To collect words from Wikipedia one can iteratively download the “random article” and extract what we want. So let me just give you a bash script and explain it below.

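#!/bin/bash
# The address of the "random article" page was lost from this listing.
# Set RANDOM_ARTICLE_URL in the environment before running the script.
: "${RANDOM_ARTICLE_URL:?set RANDOM_ARTICLE_URL to the random article address}"

echo "how many rounds:"
read -r imax
echo "delay in sec:"
read -r delay
# create the dictionary and the size log if they do not exist yet
touch wikidict.txt
touch sizedev.txt

for ((i=1; i<=imax; i++)); do
  echo "Round $i"
  # download one random article
  wget -nv -O input.txt "$RANDOM_ARTICLE_URL"
  # split at underscores, keep alphabetic words of 8 to 63 characters,
  # convert to lowercase, sort and remove duplicates
  tr '_' '\n' < input.txt | tr -d '\r' | grep -o -w '[a-zA-Z]\{8,63\}' | tr '[:upper:]' '[:lower:]' | sort -u > append.txt
  # merge the new words into the sorted dictionary, discarding duplicates
  sort -u -m wikidict.txt append.txt > wikidict.tmp.txt
  mv -f wikidict.tmp.txt wikidict.txt
  rm -f input.txt append.txt
  echo "STATS"
  ls -l wikidict.txt
  wc -l wikidict.txt
  # log the current number of words for the plot
  wc -l < wikidict.txt >> sizedev.txt
  echo "END STATS"
  # wait between rounds, but not after the last one
  if (( i < imax )); then
    sleep "$delay"
  fi
done

gnuplot -e  "set terminal dumb;  plot \"sizedev.txt\"; exit; " | tr 'A' '*'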

Save this script to a file and make it executable with chmod. So what does it do? It first reads the number of rounds from the user, i.e. the number of random articles to be processed, and then the delay that is inserted between rounds. This is important! Don’t flood Wikipedia with your downloads. Don’t exaggerate! Try to mimic normal user behavior. Recall disclaimer II. Maybe a random delay would be more realistic, but I’m too lazy to code that right now. The touch commands afterwards are only relevant in the first run. Then comes the cycle of rounds. A round consists of the code in the for-loop. wget downloads the random article and saves it into input.txt. In the resulting text file each underscore is replaced by a newline, the carriage return is removed (if present), and the grep command isolates single words of between 8 and 63 alphabetic characters. The subsequent piping converts everything to lowercase, sorts it, removes duplicates and finally writes the output into append.txt. The next few commands simply merge append.txt into the existing sorted wikidict.txt, again discarding duplicates. If you haven’t realized from the name, wikidict.txt will be the dictionary. Temporary files are removed afterwards. At the end we print some statistics of the dictionary, like the file size and the number of words. The number of words is appended to sizedev.txt. If you have gnuplot available, I included a cute little gimmick, namely a plot of how the number of words has developed with the number of rounds. The result may look like this:

  25000 ++-----+-------+------+-------+------+-------+------+-------+-----++
        +      +       +      +       +      +       +"sizedev.txt" + *    +
        |                                                                 **
        |                                                           *******|
  20000 ++                                                    *******     ++
        |                                                ******            |
        |                                              ***                 |
        |                                                                  |
  15000 ++                                        *****                   ++
        |                                   ******                         |
        |                             *******                              |
  10000 ++                     ********                                   ++
        |                   ****                                           |
        |                ****                                              |
        |           *****                                                  |
   5000 ++      *****                                                     ++
        |    ****                                                          |
        |  ***                                                             |
        ****   +       +      +       +      +       +      +       +      +
      0 *+-----+-------+------+-------+------+-------+------+-------+-----++
        0      50     100    150     200    250     300    350     400    450
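Two pieces of the script are easy to try in isolation. The sketch below wraps the extraction pipeline in a function so you can feed it any text, and adds the random delay I was too lazy to code. The function names extract_words and jittered_delay are made up here, and the choice of a delay between one and two times the base value is just an assumption about what looks "normal".

```shell
# extract_words: reads text on stdin and prints the lowercase, sorted,
# duplicate-free list of alphabetic words with 8 to 63 characters.
extract_words() {
  tr '_' '\n' | tr -d '\r' | grep -o -w '[a-zA-Z]\{8,63\}' |
    tr '[:upper:]' '[:lower:]' | sort -u
}

# jittered_delay BASE: prints a number of seconds between BASE and 2*BASE.
# Relies on bash's $RANDOM (assumption: the script runs under bash).
jittered_delay() (
  base=$1
  echo $(( base + RANDOM % (base + 1) ))
)

# try the extraction on a line of sample text
printf 'The Encyclopedia of short_Cryptography words\n' | extract_words

# in the loop one could then write: sleep "$(jittered_delay "$delay")"
jittered_delay 5
```

Only "encyclopedia" and "cryptography" survive the extraction, since all the other words are shorter than 8 letters.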

Of course you can run the script multiple times; all new words will be added to your dictionary. It will grow, but mind the above precautions. Note that we will have only lowercase words without numbers or special characters. Making variations like “Pa$$word1963” is part of the word mangling every decent password cracker should have. It does not belong in the dictionary, as mentioned above. If you think your dictionary is large enough, you can check whether it contains your password. If not, look out for words that can be easily transformed into your password. Example: your PSK passphrase is “M37411!c4”. It cannot be in the list by construction, but “metallica” most certainly is. You don’t actually need to run the attack with the four-way handshake. You already know it will be successful, since turning words into leetspeak with a capitalized first letter is a standard mangling rule. Don’t waste time. Go ahead and change your PSK into something secure. Btw. some (but not all) online password security checking tools rate “M37411!c4” as a “very strong password”. Go check it out. It’s hilarious.
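If you don’t want to do the check by eye, something like this sketch can do it for you. The leetspeak table and the function names deleet and in_dictionary are my own inventions for this example; real mangling rules are far richer, so a miss here proves nothing.

```shell
# deleet PASSPHRASE: prints the passphrase in lowercase with a few leetspeak
# substitutions undone (assumed table: 0->o 1->l 3->e 4->a 5->s 7->t 8->b
# !->i $->s @->a).
deleet() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '0134578!$@' 'oleastbisa'
}

# in_dictionary PASSPHRASE DICTFILE: succeeds if the de-leeted passphrase is
# a complete line of the dictionary file.
in_dictionary() {
  grep -q -x -F -- "$(deleet "$1")" "$2"
}

# the example from the text turns back into its base word
deleet 'M37411!c4'
```

Usage would be along the lines of in_dictionary 'M37411!c4' wikidict.txt && echo "change your PSK". Keep in mind that typing your real passphrase on a command line may leave it in your shell history.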

Before I conclude, you can have some more fun with the dictionary. Here is a little script that makes a chart of the number of available words vs. their length.

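#!/bin/bash
rm -f stats.txt
touch stats.txt
for ((i=7; i<=63; i++)); do
  # write the word length, followed by a space and no newline
  printf '%s ' "$i" >> stats.txt
  # append the number of dictionary words with exactly $i letters
  grep -c -x "[a-zA-Z]\{$i\}" wikidict.txt >> stats.txt
done

gnuplot -e  "set terminal dumb; set xrange[7:25];  plot \"stats.txt\"; exit; " | tr 'A' '*'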

Again it uses gnuplot and its super-awesome terminal plotting feature. If your dictionary gets large, the performance of the script will be very bad. The result may look like

  7000 ++--+------+-------+------+-------+-------+------+-------+------+--++
       |   +      +       +      +       +       +      "stats.txt"   *+   |
       |   *                                                               |
  6000 ++                                                                 ++
       |                                                                   |
  5000 ++      *                                                          ++
       |                                                                   |
       |                                                                   |
  4000 ++                                                                 ++
       |          *                                                        |
       |                                                                   |
  3000 ++                                                                 ++
       |              *                                                    |
       |                                                                   |
  2000 ++                                                                 ++
       |                  *                                                |
  1000 ++                     *                                           ++
       |                         *                                         |
       |   +      +       +      +   *   *   *   +      +       +      +   |
     0 ++--+------+-------+------+-------+-------*--*---*---*---*--*---*--+*
           8      10      12     14      16      18     20      22     24

It’s mostly what one would expect. The longer the words, the rarer they get.
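The bad performance comes from scanning the whole dictionary once per word length. A single pass with awk should give the same counts much faster. This is only a sketch, and the function name length_histogram is made up; unlike the loop, it prints no line for lengths that do not occur at all.

```shell
# length_histogram DICTFILE: prints "length count" pairs, one per line,
# sorted by length, reading the dictionary only once.
length_histogram() {
  awk '{ count[length($0)]++ }
       END { for (len in count) print len, count[len] }' "$1" | sort -n
}

# small demonstration on a throwaway file
demo=$(mktemp)
printf 'password\nmetallica\nwikipedia\n' > "$demo"
length_histogram "$demo"
rm -f "$demo"
```

For the plot you would write length_histogram wikidict.txt > stats.txt and call gnuplot as before.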

If you haven’t thought about it already, by chance the dictionary can contain funny, rude, vulgar and swear expressions. Search with grep for them. It certainly increased my vocabulary by a couple of words, which I won’t share with you but I intend to use them on the “right occasion”.

So let me conclude with some final remarks:

  • In order for the dictionary to be effective it should be a couple of MB in size. On average, the rate of growth decreases from round to round. In fact, looking at the first plot, I suspect something like logarithmic growth. It may take many rounds to arrive at, say, 5 MB. If you intend pushing your dictionary that far, please consider making a donation to the Wikimedia Foundation.
  • Merging your Wikipedia dictionary with other dictionaries suggests itself. However for WPA I believe that one shouldn’t exaggerate. A large dictionary will limit the number of mangling rules you can apply and I believe there has to be a certain trade-off between the two.
  • Most available password lists are strongly oriented towards English. Thus words acquired from Wikipedia in other languages are probably more valuable.
  • Adding other Wikis or websites with a “random article” feature obviously helps to increase growth of your list.
  • If you want to be secure from dictionary attacks, I have a piece of advice from an earlier post. Why not make a random 63 character key? Do the math of how long it would take to brute-force it.
  • How successful is this Wikipedia dictionary in cracking passwords? I don’t know. If you have any experience with it, we’re all eager to know.
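For the random 63 character key mentioned in the list, a minimal sketch could look like this. Restricting the key to letters and digits is my assumption to keep it easy to type into devices, and the function names random_key and keyspace_digits are made up. The second function does the math: it prints how many decimal digits the number of possible keys has.

```shell
# random_key: prints one 63 character key made of letters and digits,
# drawn from the kernel's random source.
random_key() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 63
  echo
}

# keyspace_digits ALPHABET_SIZE LENGTH: prints log10 of the number of
# possible keys, i.e. roughly the number of decimal digits of that number.
keyspace_digits() {
  awk -v n="$1" -v len="$2" 'BEGIN { printf "%.1f\n", len * log(n) / log(10) }'
}

random_key
keyspace_digits 62 63
```

With 62 symbols and 63 positions the number of keys is around 10 to the power of 113. At a couple of thousand tries per second, brute-forcing is not something anyone will finish.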

About goobypl5

pizza baker, autodidact, particle physicist
This entry was posted in Passwords, Security.

One Response to Creating a password list for WPA/WPA2 dictionary attacks

  1. RonaldMuh says:

    A very nice niche blog, and a good design there sparks Simplicity yet complex algorithm of the internet. Thank You.
