Cracking Vigenère Cipher
Kasiski’s method
One method of cracking Vigenere cipher without knowing the keyword or length of keyword is Kasiski’s method. The goal of Kasiski’s method is to find the length of keyword, and after finding the length of keyword one can use Chi-square(χ2) method to find the keyword (to be discussed in the next part).
Kasiski method looks for patterns and repeated strings, and identifies mathematical relations within the cipher text, such as distance between repeated letters, to derive correct length of keyword with high probability. The general algorithm of Kasiski’s method is as follows:
- Find repeated strings of letters in cipher text.
- Count how many letters are between the first letter of repeated string and the first letter of next occurrence of repeated string, and add 1.
- Factor the number from Step 2.
- Repeat the process for each repeated string and take note of common factors among the repeated strings.
The most common factor is highly likely to be the length of keyword. To understand why, let’s look into Kasiski’s book Die Geheimschriften und die Dechiffrirkunst (Cryptography and the Art of Decryption) published in 1863. He observed that if a repeated string in plaintext is encrypted with the same string of keyword, the cipher text contained repeated strings, and the distance between the repeated strings were a multiple of the keyword length. Intuitively speaking, since keyword is repeated over the plaintext in Vigenere cipher, knowing two fixed points where repeated strings occur would logically mean that the keyword is either the length between the two points, or multiples or factors of the distance. The general algorithm introduced above suggests using factors instead of multiples of possible keyword length because in the case of multiple repeated strings in cipher text, it is easier to find keyword by comparing common factors rather than testing out every multiple of each repeated strings and then trying to find the length by trial-and-error. In fact, the possibilities of keyword length would increase in combinatorial manner the more repeated strings there are in cipher text; it is also unlikely that the keyword length would exceed the length of an astronomical number from the probabilistic combinations of multiple repeated strings.
Chi-square(χ2) method
Knowing the length of the keyword, we can find the keyword by chi-square method. The first step is to find ‘cosets’ of cipher text. For instance, if the keyword length is 3, we can make 3 cosets of cipher text by writing down every 3rd letter starting from the number of coset position.
Cipher text: NWAIWEBB RFQFOCJPUGDOJ VBGWSPTWRZ
Cosets:
1)N I B F O P D V W T Z
2)W W B Q C U O B S W
3)A E R F J G J G P R
Then since the same letter encrypts each coset, we can try to find the keyword based on the frequency of letters. Of the possible 26 shifts in the cosets, there would be one shift where the frequency is closest to English. This is where chi-squared method comes in. In probability, chi-squared measures how good data fits into regression line — in another words for cryptography, how well each shift fits into regression line of frequencies. The shift that produces smallest chi-squared value is the frequency closest to English.
References
“Polyalphabetic Substitution Cipher.” Polyalphabetic Substitution Cipher. Cornell University Department of Mathematics. Web. 12 Oct. 2015. <http://www.math.cornell.edu/~mec/2003-2004/cryptography/polyalpha/polyalpha.html>.