/* | |
* Fast, portable, and easy-to-use Twofish implementation, | |
* Version 0.3. | |
* Copyright (c) 2002 by Niels Ferguson. | |
* (See further down for the almost-unrestricted licensing terms.) | |
* | |
* -------------------------------------------------------------------------- | |
* There are two files for this implementation: | |
* - twofish.h, the header file. | |
* - twofish.c, the code file. | |
* | |
* To incorporate this code into your program you should: | |
* - Check the licensing terms further down in this comment. | |
* - Fix the two type definitions in twofish.h to suit your platform. | |
* - Fix a few definitions in twofish.c in the section marked | |
* PLATFORM FIXES. There is one important ones that affects | |
* functionality, and then a few definitions that you can optimise | |
* for efficiency but those have no effect on the functionality. | |
* Don't change anything else. | |
* - Put the code in your project and compile it. | |
* | |
* To use this library you should: | |
* - Call Twofish_initialise() in your program before any other function in | |
* this library. | |
* - Use Twofish_prepare_key(...) to convert a key to internal form. | |
* - Use Twofish_encrypt(...) and Twofish_decrypt(...) to encrypt and decrypt | |
* data. | |
* See the comments in the header file for details on these functions. | |
* -------------------------------------------------------------------------- | |
* | |
* There are many Twofish implementation available for free on the web. | |
* Most of them are hard to integrate into your own program. | |
* As we like people to use our cipher, I thought I would make it easier. | |
* Here is a free and easy-to-integrate Twofish implementation in C. | |
* The latest version is always available from my personal home page at | |
* http://niels.ferguson.net/ | |
* | |
* Integrating library code into a project is difficult because the library | |
* header files interfere with the project's header files and code. | |
* And of course the project's header files interfere with the library code. | |
* I've tried to resolve these problems here. | |
* The header file of this implementation is very light-weight. | |
* It contains two typedefs, a structure, and a few function declarations. | |
* All names it defines start with "Twofish_". | |
* The header file is therefore unlikely to cause problems in your project. | |
* The code file of this implementation doesn't need to include the header | |
* files of the project. There is thus no danger of the project interfering | |
* with all the definitions and macros of the Twofish code. | |
* In most situations, all you need to do is fill in a few platform-specific | |
* definitions in the header file and code file, | |
* and you should be able to run the Twofish code in your project. | |
* I estimate it should take you less than an hour to integrate this code | |
* into your project, most of it spent reading the comments telling you what | |
* to do. | |
* | |
* For people using C++: it is very easy to wrap this library into a | |
* TwofishKey class. One of the big advantages is that you can automate the | |
* wiping of the key material in the destructor. I have not provided a C++ | |
* class because the interface depends too much on the abstract base class | |
* you use for block ciphers in your program, which I don't know about. | |
* | |
* This implementation is designed for use on PC-class machines. It uses the | |
* Twofish 'full' keying option which uses large tables. Total table size is | |
* around 5-6 kB for static tables plus 4.5 kB for each pre-processed key. | |
* If you need an implementation that uses less memory, | |
* take a look at Brian Gladman's code on his web site: | |
* http://fp.gladman.plus.com/cryptography_technology/aes/ | |
* He has code for all AES candidates. | |
* His Twofish code has lots of options trading off table size vs. speed. | |
* You can also take a look at the optimised code by Doug Whiting on the | |
* Twofish web site | |
* http://www.counterpane.com/twofish.html | |
* which has loads of options. | |
* I believe these existing implementations are harder to re-use because they | |
* are not clean libraries and they impose requirements on the environment. | |
* This implementation is very careful to minimise those, | |
* and should be easier to integrate into any larger program. | |
* | |
* The default mode of this implementation is fully portable as it uses no | |
* behaviour not defined in the C standard. (This is harder than you think.) | |
* If you have any problems porting the default mode, please let me know | |
* so that I can fix the problem. (But only if this code is at fault, I | |
* don't fix compilers.) | |
* Most of the platform fixes are related to non-portable but faster ways | |
* of implementing certain functions. | |
* | |
* In general I've tried to make the code as fast as possible, at the expense | |
* of memory and code size. However, C does impose limits, and this | |
* implementation will be slower than an optimised assembler implementation. | |
* But beware of assembler implementations: a good Pentium implementation | |
* uses completely different code than a good Pentium II implementation. | |
* You basically have to re-write the assembly code for every generation of | |
* processor. Unless you are severely pressed for speed, stick with C. | |
* | |
* The initialisation routine of this implementation contains a self-test. | |
* If initialisation succeeds without calling the fatal routine, then | |
* the implementation works. I don't think you can break the implementation | |
* in such a way that it still passes the tests, unless you are malicious. | |
* In other words: if the initialisation routine returns, | |
* you have successfully ported the implementation. | |
* (Or not implemented the fatal routine properly, but that is your problem.) | |
* | |
* I'm indebted to many people who helped me in one way or another to write | |
* this code. During the design of Twofish and the AES process I had very | |
* extensive discussions of all implementation issues with various people. | |
* Doug Whiting in particular provided a wealth of information. The Twofish | |
* team spent untold hours discussion various cipher features, and their | |
* implementation. Brian Gladman implemented all AES candidates in C, | |
* and we had some fruitful discussions on how to implement Twofish in C. | |
* Jan Nieuwenhuizen tested this code on Linux using GCC. | |
* | |
* Now for the license: | |
* The author hereby grants a perpetual license to everybody to | |
* use this code for any purpose as long as the copyright message is included | |
* in the source code of this or any derived work. | |
* | |
* Yes, this means that you, your company, your club, and anyone else | |
* can use this code anywhere you want. You can change it and distribute it | |
* under the GPL, include it in your commercial product without releasing | |
* the source code, put it on the web, etc. | |
* The only thing you cannot do is remove my copyright message, | |
* or distribute any source code based on this implementation that does not | |
* include my copyright message. | |
* | |
* I appreciate a mention in the documentation or credits, | |
* but I understand if that is difficult to do. | |
* I also appreciate it if you tell me where and why you used my code. | |
* | |
* Please send any questions or comments to niels@ferguson.net | |
* | |
* Have Fun! | |
* | |
* Niels | |
*/ | |
/* | |
* DISCLAIMER: As I'm giving away my work for free, I'm of course not going | |
* to accept any liability of any form. This code, or the Twofish cipher, | |
* might very well be flawed; you have been warned. | |
* This software is provided as-is, without any kind of warrenty or | |
* guarantee. And that is really all you can expect when you download | |
* code for free from the Internet. | |
* | |
* I think it is really sad that disclaimers like this seem to be necessary. | |
* If people only had a little bit more common sense, and didn't come | |
* whining like little children every time something happens.... | |
*/ | |
/* | |
* Version history: | |
* Version 0.0, 2002-08-30 | |
* First written. | |
* Version 0.1, 2002-09-03 | |
* Added disclaimer. Improved self-tests. | |
* Version 0.2, 2002-09-09 | |
* Removed last non-portabilities. Default now works completely within | |
* the C standard. UInt32 can be larger than 32 bits without problems. | |
* Version 0.3, 2002-09-28 | |
* Bugfix: use instead of to adhere to ANSI/ISO. | |
* Rename BIG_ENDIAN macro to CPU_IS_BIG_ENDIAN. The gcc library | |
* header already defines BIG_ENDIAN, even though it is not | |
* supposed to. | |
*/ | |
/* | |
* Minimum set of include files. | |
* You should not need any application-specific include files for this code. | |
* In fact, adding you own header files could break one of the many macros or | |
* functions in this file. Be very careful. | |
* Standard include files will probably be ok. | |
*/ | |
#include <stdio.h> | |
#include <string.h> | |
#include <stdlib.h> | |
/* #include * for memset(), memcpy(), and memcmp() */ | |
#include "twofish.h" | |
/* | |
* PLATFORM FIXES | |
* ============== | |
* | |
* Fix the type definitions in twofish.h first! | |
* | |
* The following definitions have to be fixed for each particular platform | |
* you work on. If you have a multi-platform program, you no doubt have | |
* portable definitions that you can substitute here without changing the | |
* rest of the code. | |
*/ | |
/* | |
* Function called if something is fatally wrong with the implementation. | |
* This fatal function is called when a coding error is detected in the | |
* Twofish implementation, or when somebody passes an obviously erroneous | |
* parameter to this implementation. There is not much you can do when | |
* the code contains bugs, so we just stop. | |
* | |
* The argument is a string. Ideally the fatal function prints this string | |
* as an error message. Whatever else this function does, it should never | |
* return. A typical implementation would stop the program completely after | |
* printing the error message. | |
* | |
* This default implementation is not very useful, | |
* but does not assume anything about your environment. | |
* It will at least let you know something is wrong.... | |
* I didn't want to include any libraries to print and error or so, | |
* as this makes the code much harder to integrate in a project. | |
* | |
* Note that the Twofish_fatal function may not return to the caller. | |
* Unfortunately this is not something the self-test can test for, | |
* so you have to make sure of this yourself. | |
* | |
* If you want to call an external function, be careful about including | |
* your own header files here. This code uses a lot of macros, and your | |
* header file could easily break it. Maybe the best solution is to use | |
* a separate extern statement for your fatal function. | |
*/ | |
/* #define Twofish_fatal(pmsgx) { fprintf(stderr, pmsgx); exit(1); } */ | |
#define Twofish_fatal(pmsgx, code) { return(code); } | |
/* | |
* The rest of the settings are not important for the functionality | |
* of this Twofish implementation. That is, their default settings | |
* work on all platforms. You can change them to improve the | |
* speed of the implementation on your platform. Erroneous settings | |
* will result in erroneous implementations, but the self-test should | |
* catch those. | |
*/ | |
/* | |
* Macros to rotate a Twofish_UInt32 value left or right by the | |
* specified number of bits. This should be a 32-bit rotation, | |
* and not rotation of, say, 64-bit values. | |
* | |
* Every encryption or decryption operation uses 32 of these rotations, | |
* so it is a good idea to make these macros efficient. | |
* | |
* This fully portable definition has one piece of tricky stuff. | |
* The UInt32 might be larger than 32 bits, so we have to mask | |
* any higher bits off. The simplest way to do this is to 'and' the | |
* value first with 0xffffffff and then shift it right. An optimising | |
* compiler that has a 32-bit type can optimise this 'and' away. | |
* | |
* Unfortunately there is no portable way of writing the constant | |
* 0xffffffff. You don't know which suffix to use (U, or UL?) | |
* The UINT32_MASK definition uses a bit of trickery. Shift-left | |
* is only defined if the shift amount is strictly less than the size | |
* of the UInt32, so we can't use (1<<32). The answer it to take the value | |
* 2, cast it to a UInt32, shift it left 31 positions, and subtract one. | |
* Another example of how to make something very simple extremely difficult. | |
* I hate C. | |
* | |
* The rotation macros are straightforward. | |
* They are only applied to UInt32 values, which are _unsigned_ | |
* so the >> operator must do a logical shift that brings in zeroes. | |
* On most platforms you will only need to optimise the ROL32 macro; the | |
* ROR32 macro is not inefficient on an optimising compiler as all rotation | |
* amounts in this code are known at compile time. | |
* | |
* On many platforms there is a faster solution. | |
* For example, MS compilers have the __rotl and __rotr functions | |
* that generate x86 rotation instructions. | |
*/ | |
#define UINT32_MASK ( (((Twofish_UInt32)2)<<31) - 1 ) | |
#ifndef _MSC_VER | |
#define ROL32(x,n) ( (x)<<(n) | ((x) & UINT32_MASK) >> (32-(n)) ) | |
#define ROR32(x,n) ( (x)>>(n) | ((x) & UINT32_MASK) << (32-(n)) ) | |
#else | |
#define ROL32(x,n) (_lrotl((x), (n))) | |
#define ROR32(x,n) (_lrotr((x), (n))) | |
#endif | |
/* | |
* Select data type for q-table entries. | |
* | |
* Larger entry types cost more memory (1.5 kB), and might be faster | |
* or slower depending on the CPU and compiler details. | |
* | |
* This choice only affects the static data size and the key setup speed. | |
* Functionality, expanded key size, or encryption speed are not affected. | |
* Define to 1 to get large q-table entries. | |
*/ | |
#define LARGE_Q_TABLE 0 /* default = 0 */ | |
/* | |
* Method to select a single byte from a UInt32. | |
* WARNING: non-portable code if set; might not work on all platforms. | |
* | |
* Inside the inner loop of Twofish it is necessary to access the 4 | |
* individual bytes of a UInt32. This can be done using either shifts | |
* and masks, or memory accesses. | |
* | |
* Set to 0 to use shift and mask operations for the byte selection. | |
* This is more ALU intensive. It is also fully portable. | |
* | |
* Set to 1 to use memory accesses. The UInt32 is stored in memory and | |
* the individual bytes are read from memory one at a time. | |
* This solution is more memory-intensive, and not fully portable. | |
* It might be faster on your platform, or not. If you use this option, | |
* make sure you set the CPU_IS_BIG_ENDIAN flag appropriately. | |
* | |
* This macro does not affect the conversion of the inputs and outputs | |
* of the cipher. See the CONVERT_USING_CASTS macro for that. | |
*/ | |
#define SELECT_BYTE_FROM_UINT32_IN_MEMORY 0 /* default = 0 */ | |
/* | |
* Method used to read the input and write the output. | |
* WARNING: non-portable code if set; might not work on all platforms. | |
* | |
* Twofish operates on 32-bit words. The input to the cipher is | |
* a byte array, as is the output. The portable method of doing the | |
* conversion is a bunch of rotate and mask operations, but on many | |
* platforms it can be done faster using a cast. | |
* This only works if your CPU allows UInt32 accesses to arbitrary Byte | |
* addresses. | |
* | |
* Set to 0 to use the shift and mask operations. This is fully | |
* portable. . | |
* | |
* Set to 1 to use a cast. The Byte * is cast to a UInt32 *, and a | |
* UInt32 is read. If necessary (as indicated by the CPU_IS_BIG_ENDIAN | |
* macro) the byte order in the UInt32 is swapped. The reverse is done | |
* to write the output of the encryption/decryption. Make sure you set | |
* the CPU_IS_BIG_ENDIAN flag appropriately. | |
* This option does not work unless a UInt32 is exactly 32 bits. | |
* | |
* This macro only changes the reading/writing of the plaintext/ciphertext. | |
* See the SELECT_BYTE_FROM_UINT32_IN_MEMORY to affect the way in which | |
* a UInt32 is split into 4 bytes for the S-box selection. | |
*/ | |
#define CONVERT_USING_CASTS 0 /* default = 0 */ | |
/* | |
* Endianness switch. | |
* Only relevant if SELECT_BYTE_FROM_UINT32_IN_MEMORY or | |
* CONVERT_USING_CASTS is set. | |
* | |
* Set to 1 on a big-endian machine, and to 0 on a little-endian machine. | |
* Twofish uses the little-endian convention (least significant byte first) | |
* and big-endian machines (using most significant byte first) | |
* have to do a few conversions. | |
* | |
* CAUTION: This code has never been tested on a big-endian machine, | |
* because I don't have access to one. Feedback appreciated. | |
*/ | |
#define CPU_IS_BIG_ENDIAN 0 | |
/* | |
* Macro to reverse the order of the bytes in a UInt32. | |
* Used to convert to little-endian on big-endian machines. | |
* This macro is always tested, but only used in the encryption and | |
* decryption if CONVERT_USING_CASTS, and CPU_IS_BIG_ENDIAN | |
* are both set. In other words: this macro is only speed-critical if | |
* both these flags have been set. | |
* | |
* This default definition of SWAP works, but on many platforms there is a | |
* more efficient implementation. | |
*/ | |
#define BSWAP(x) ((ROL32((x),8)&0x00ff00ff) | (ROR32((x),8) & 0xff00ff00)) | |
/* | |
* END OF PLATFORM FIXES | |
* ===================== | |
* | |
* You should not have to touch the rest of this file. | |
*/ | |
/* | |
* Convert the external type names to some that are easier to use inside | |
* this file. I didn't want to use the names Byte and UInt32 in the | |
* header file, because many programs already define them and using two | |
* conventions at once can be very difficult. | |
* Don't change these definitions! Change the originals | |
* in twofish.h instead. | |
*/ | |
/* A Byte must be an unsigned integer, 8 bits long. */ | |
/* typedef Twofish_Byte Byte; */ | |
/* A UInt32 must be an unsigned integer at least 32 bits long. */ | |
/* typedef Twofish_UInt32 UInt32; */ | |
/* | |
* Define a macro ENDIAN_CONVERT. | |
* | |
* We define a macro ENDIAN_CONVERT that performs a BSWAP on big-endian | |
* machines, and is the identity function on little-endian machines. | |
* The code then uses this macro without considering the endianness. | |
*/ | |
#if CPU_IS_BIG_ENDIAN | |
#define ENDIAN_CONVERT(x) BSWAP(x) | |
#else | |
#define ENDIAN_CONVERT(x) (x) | |
#endif | |
/* | |
* Compute byte offset within a UInt32 stored in memory. | |
* | |
* This is only used when SELECT_BYTE_FROM_UINT32_IN_MEMORY is set. | |
* | |
* The input is the byte number 0..3, 0 for least significant. | |
* Note the use of sizeof() to support UInt32 types that are larger | |
* than 4 bytes. | |
*/ | |
#if CPU_IS_BIG_ENDIAN | |
#define BYTE_OFFSET( n ) (sizeof(Twofish_UInt32) - 1 - (n) ) | |
#else | |
#define BYTE_OFFSET( n ) (n) | |
#endif | |
/* | |
* Macro to get Byte no. b from UInt32 value X. | |
* We use two different definition, depending on the settings. | |
*/ | |
#if SELECT_BYTE_FROM_UINT32_IN_MEMORY | |
/* Pick the byte from the memory in which X is stored. */ | |
#define SELECT_BYTE( X, b ) (((Twofish_Byte *)(&(X)))[BYTE_OFFSET(b)]) | |
#else | |
/* Portable solution: Pick the byte directly from the X value. */ | |
#define SELECT_BYTE( X, b ) (((X) >> (8*(b))) & 0xff) | |
#endif | |
/* Some shorthands because we use byte selection in large formulae. */ | |
#define b0(X) SELECT_BYTE((X),0) | |
#define b1(X) SELECT_BYTE((X),1) | |
#define b2(X) SELECT_BYTE((X),2) | |
#define b3(X) SELECT_BYTE((X),3) | |
/* | |
* We need macros to load and store UInt32 from/to byte arrays | |
* using the least-significant-byte-first convention. | |
* | |
* GET32( p ) gets a UInt32 in lsb-first form from four bytes pointed to | |
* by p. | |
* PUT32( v, p ) writes the UInt32 value v at address p in lsb-first form. | |
*/ | |
#if CONVERT_USING_CASTS | |
/* Get UInt32 from four bytes pointed to by p. */ | |
#define GET32( p ) ENDIAN_CONVERT( *((Twofish_UInt32 *)(p)) ) | |
/* Put UInt32 into four bytes pointed to by p */ | |
#define PUT32( v, p ) *((Twofish_UInt32 *)(p)) = ENDIAN_CONVERT(v) | |
#else | |
/* Get UInt32 from four bytes pointed to by p. */ | |
#define GET32( p ) \ | |
( \ | |
(Twofish_UInt32)((p)[0]) \ | |
| (Twofish_UInt32)((p)[1])<< 8 \ | |
| (Twofish_UInt32)((p)[2])<<16 \ | |
| (Twofish_UInt32)((p)[3])<<24 \ | |
) | |
/* Put UInt32 into four bytes pointed to by p */ | |
#define PUT32( v, p ) \ | |
(p)[0] = (Twofish_Byte)(((v) ) & 0xff); \ | |
(p)[1] = (Twofish_Byte)(((v) >> 8) & 0xff); \ | |
(p)[2] = (Twofish_Byte)(((v) >> 16) & 0xff); \ | |
(p)[3] = (Twofish_Byte)(((v) >> 24) & 0xff) | |
#endif | |
/* | |
* Test the platform-specific macros. | |
* This function tests the macros defined so far to make sure the | |
* definitions are appropriate for this platform. | |
* If you make any mistake in the platform configuration, this should detect | |
* that and inform you what went wrong. | |
* Somewhere, someday, this is going to save somebody a lot of time, | |
* because misbehaving macros are hard to debug. | |
*/ | |
static int test_platform() | |
{ | |
/* Buffer with test values. */ | |
Twofish_Byte buf[] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0}; | |
Twofish_UInt32 C; | |
Twofish_UInt32 x,y; | |
int i; | |
/* | |
* Some sanity checks on the types that can't be done in compile time. | |
* A smart compiler will just optimise these tests away. | |
* The pre-processor doesn't understand different types, so we cannot | |
* do these checks in compile-time. | |
* | |
* I hate C. | |
* | |
* The first check in each case is to make sure the size is correct. | |
* The second check is to ensure that it is an unsigned type. | |
*/ | |
if( ((Twofish_UInt32)((Twofish_UInt32)1 << 31) == 0) || ((Twofish_UInt32)-1 < 0 )) | |
{ | |
Twofish_fatal( "Twofish code: Twofish_UInt32 type not suitable", ERR_UINT32 ); | |
} | |
if( (sizeof( Twofish_Byte ) != 1) || (((Twofish_Byte)-1) < 0) ) | |
{ | |
Twofish_fatal( "Twofish code: Twofish_Byte type not suitable", ERR_BYTE ); | |
} | |
/* | |
* Sanity-check the endianness conversions. | |
* This is just an aid to find problems. If you do the endianness | |
* conversion macros wrong you will fail the full cipher test, | |
* but that does not help you find the error. | |
* Always make it easy to find the bugs! | |
* | |
* Detail: There is no fully portable way of writing UInt32 constants, | |
* as you don't know whether to use the U or UL suffix. Using only U you | |
* might only be allowed 16-bit constants. Using UL you might get 64-bit | |
* constants which cannot be stored in a UInt32 without warnings, and | |
* which generally behave subtly different from a true UInt32. | |
* As long as we're just comparing with the constant, | |
* we can always use the UL suffix and at worst lose some efficiency. | |
* I use a separate '32-bit constant' macro in most of my other code. | |
* | |
* I hate C. | |
* | |
* Start with testing GET32. We test it on all positions modulo 4 | |
* to make sure we can handly any position of inputs. (Some CPUs | |
* do not allow non-aligned accesses which we would do if you used | |
* the CONVERT_USING_CASTS option. | |
*/ | |
if( (GET32( buf ) != 0x78563412UL) || (GET32(buf+1) != 0x9a785634UL) | |
|| (GET32( buf+2 ) != 0xbc9a7856UL) || (GET32(buf+3) != 0xdebc9a78UL) ) | |
{ | |
Twofish_fatal( "Twofish code: GET32 not implemented properly", ERR_GET32 ); | |
} | |
/* | |
* We can now use GET32 to test PUT32. | |
* We don't test the shifted versions. If GET32 can do that then | |
* so should PUT32. | |
*/ | |
C = GET32( buf ); | |
PUT32( 3*C, buf ); | |
if( GET32( buf ) != 0x69029c36UL ) | |
{ | |
Twofish_fatal( "Twofish code: PUT32 not implemented properly", ERR_PUT32 ); | |
} | |
/* Test ROL and ROR */ | |
for( i=1; i<32; i++ ) | |
{ | |
/* Just a simple test. */ | |
x = ROR32( C, i ); | |
y = ROL32( C, i ); | |
x ^= (C>>i) ^ (C<<(32-i)); | |
/*y ^= (C<>(32-i)); */ | |
y ^= (C<<i) ^ (C>>(32-i)); | |
x |= y; | |
/* | |
* Now all we check is that x is zero in the least significant | |
* 32 bits. Using the UL suffix is safe here, as it doesn't matter | |
* if we get a larger type. | |
*/ | |
if( (x & 0xffffffffUL) != 0 ) | |
{ | |
Twofish_fatal( "Twofish ROL or ROR not properly defined.", ERR_ROLR ); | |
} | |
} | |
/* Test the BSWAP macro */ | |
if( BSWAP(C) != 0x12345678UL ) | |
{ | |
/* | |
* The BSWAP macro should always work, even if you are not using it. | |
* A smart optimising compiler will just remove this entire test. | |
*/ | |
Twofish_fatal( "BSWAP not properly defined.", ERR_BSWAP ); | |
} | |
/* And we can test the b macros which use SELECT_BYTE. */ | |
if( (b0(C)!=0x12) || (b1(C) != 0x34) || (b2(C) != 0x56) || (b3(C) != 0x78) ) | |
{ | |
/* | |
* There are many reasons why this could fail. | |
* Most likely is that CPU_IS_BIG_ENDIAN has the wrong value. | |
*/ | |
Twofish_fatal( "Twofish code: SELECT_BYTE not implemented properly", ERR_SELECTB ); | |
} | |
return SUCCESS; | |
} | |
/* | |
* Finally, we can start on the Twofish-related code. | |
* You really need the Twofish specifications to understand this code. The | |
* best source is the Twofish book: | |
* "The Twofish Encryption Algorithm", by Bruce Schneier, John Kelsey, | |
* Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson. | |
* you can also use the AES submission document of Twofish, which is | |
* available from my list of publications on my personal web site at | |
* http://niels.ferguson.net/. | |
* | |
* The first thing we do is write the testing routines. This is what the | |
* implementation has to satisfy in the end. We only test the external | |
* behaviour of the implementation of course. | |
*/ | |
/* | |
* Perform a single self test on a (plaintext,ciphertext,key) triple. | |
* Arguments: | |
* key array of key bytes | |
* key_len length of key in bytes | |
* p plaintext | |
* c ciphertext | |
*/ | |
static int test_vector( Twofish_Byte key[], int key_len, Twofish_Byte p[16], Twofish_Byte c[16] ) | |
{ | |
Twofish_Byte tmp[16]; /* scratch pad. */ | |
Twofish_key xkey; /* The expanded key */ | |
int i; | |
/* Prepare the key */ | |
if ((i = Twofish_prepare_key( key, key_len, &xkey)) < 0) | |
return i; | |
/* | |
* We run the test twice to ensure that the xkey structure | |
* is not damaged by the first encryption. | |
* Those are hideous bugs to find if you get them in an application. | |
*/ | |
for( i=0; i<2; i++ ) | |
{ | |
/* Encrypt and test */ | |
Twofish_encrypt( &xkey, p, tmp ); | |
if( memcmp( c, tmp, 16 ) != 0 ) | |
{ | |
Twofish_fatal( "Twofish encryption failure", ERR_TEST_ENC ); | |
} | |
/* Decrypt and test */ | |
Twofish_decrypt( &xkey, c, tmp ); | |
if( memcmp( p, tmp, 16 ) != 0 ) | |
{ | |
Twofish_fatal( "Twofish decryption failure", ERR_TEST_DEC ); | |
} | |
} | |
/* The test keys are not secret, so we don't need to wipe xkey. */ | |
return SUCCESS; | |
} | |
/* | |
* Check implementation using three (key,plaintext,ciphertext) | |
* test vectors, one for each major key length. | |
* | |
* This is an absolutely minimal self-test. | |
* This routine does not test odd-sized keys. | |
*/ | |
static int test_vectors() | |
{ | |
/* | |
* We run three tests, one for each major key length. | |
* These test vectors come from the Twofish specification. | |
* One encryption and one decryption using randomish data and key | |
* will detect almost any error, especially since we generate the | |
* tables ourselves, so we don't have the problem of a single | |
* damaged table entry in the source. | |
*/ | |
/* 128-bit test is the I=3 case of section B.2 of the Twofish book. */ | |
static Twofish_Byte k128[] = { | |
0x9F, 0x58, 0x9F, 0x5C, 0xF6, 0x12, 0x2C, 0x32, | |
0xB6, 0xBF, 0xEC, 0x2F, 0x2A, 0xE8, 0xC3, 0x5A, | |
}; | |
static Twofish_Byte p128[] = { | |
0xD4, 0x91, 0xDB, 0x16, 0xE7, 0xB1, 0xC3, 0x9E, | |
0x86, 0xCB, 0x08, 0x6B, 0x78, 0x9F, 0x54, 0x19 | |
}; | |
static Twofish_Byte c128[] = { | |
0x01, 0x9F, 0x98, 0x09, 0xDE, 0x17, 0x11, 0x85, | |
0x8F, 0xAA, 0xC3, 0xA3, 0xBA, 0x20, 0xFB, 0xC3 | |
}; | |
/* 192-bit test is the I=4 case of section B.2 of the Twofish book. */ | |
static Twofish_Byte k192[] = { | |
0x88, 0xB2, 0xB2, 0x70, 0x6B, 0x10, 0x5E, 0x36, | |
0xB4, 0x46, 0xBB, 0x6D, 0x73, 0x1A, 0x1E, 0x88, | |
0xEF, 0xA7, 0x1F, 0x78, 0x89, 0x65, 0xBD, 0x44 | |
}; | |
static Twofish_Byte p192[] = { | |
0x39, 0xDA, 0x69, 0xD6, 0xBA, 0x49, 0x97, 0xD5, | |
0x85, 0xB6, 0xDC, 0x07, 0x3C, 0xA3, 0x41, 0xB2 | |
}; | |
static Twofish_Byte c192[] = { | |
0x18, 0x2B, 0x02, 0xD8, 0x14, 0x97, 0xEA, 0x45, | |
0xF9, 0xDA, 0xAC, 0xDC, 0x29, 0x19, 0x3A, 0x65 | |
}; | |
/* 256-bit test is the I=4 case of section B.2 of the Twofish book. */ | |
static Twofish_Byte k256[] = { | |
0xD4, 0x3B, 0xB7, 0x55, 0x6E, 0xA3, 0x2E, 0x46, | |
0xF2, 0xA2, 0x82, 0xB7, 0xD4, 0x5B, 0x4E, 0x0D, | |
0x57, 0xFF, 0x73, 0x9D, 0x4D, 0xC9, 0x2C, 0x1B, | |
0xD7, 0xFC, 0x01, 0x70, 0x0C, 0xC8, 0x21, 0x6F | |
}; | |
static Twofish_Byte p256[] = { | |
0x90, 0xAF, 0xE9, 0x1B, 0xB2, 0x88, 0x54, 0x4F, | |
0x2C, 0x32, 0xDC, 0x23, 0x9B, 0x26, 0x35, 0xE6 | |
}; | |
static Twofish_Byte c256[] = { | |
0x6C, 0xB4, 0x56, 0x1C, 0x40, 0xBF, 0x0A, 0x97, | |
0x05, 0x93, 0x1C, 0xB6, 0xD4, 0x08, 0xE7, 0xFA | |
}; | |
int ret; | |
/* Run the actual tests. */ | |
if ((ret = test_vector( k128, 16, p128, c128 )) < 0) | |
return ret; | |
if ((ret = test_vector( k192, 24, p192, c192 )) < 0) | |
return ret; | |
if ((ret = test_vector( k256, 32, p256, c256 )) < 0) | |
return ret; | |
return SUCCESS; | |
} | |
/* | |
* Perform extensive test for a single key size. | |
* | |
* Test a single key size against the test vectors from section | |
* B.2 in the Twofish book. This is a sequence of 49 encryptions | |
* and decryptions. Each plaintext is equal to the ciphertext of | |
* the previous encryption. The key is made up from the ciphertext | |
* two and three encryptions ago. Both plaintext and key start | |
* at the zero value. | |
* We should have designed a cleaner recurrence relation for | |
* these tests, but it is too late for that now. At least we learned | |
* how to do it better next time. | |
* For details see appendix B of the book. | |
* | |
* Arguments: | |
* key_len Number of bytes of key | |
* final_value Final plaintext value after 49 iterations | |
*/ | |
static int test_sequence( int key_len, Twofish_Byte final_value[] ) | |
{ | |
Twofish_Byte buf[ (50+3)*16 ]; /* Buffer to hold our computation values. */ | |
Twofish_Byte tmp[16]; /* Temp for testing the decryption. */ | |
Twofish_key xkey; /* The expanded key */ | |
int i, ret; | |
Twofish_Byte * p; | |
/* Wipe the buffer */ | |
memset( buf, 0, sizeof( buf ) ); | |
/* | |
* Because the recurrence relation is done in an inconvenient manner | |
* we end up looping backwards over the buffer. | |
*/ | |
/* Pointer in buffer points to current plaintext. */ | |
p = &buf[50*16]; | |
for( i=1; i<50; i++ ) | |
{ | |
/* | |
* Prepare a key. | |
* This automatically checks that key_len is valid. | |
*/ | |
if ((ret = Twofish_prepare_key( p+16, key_len, &xkey)) < 0) | |
return ret; | |
/* Compute the next 16 bytes in the buffer */ | |
Twofish_encrypt( &xkey, p, p-16 ); | |
/* Check that the decryption is correct. */ | |
Twofish_decrypt( &xkey, p-16, tmp ); | |
if( memcmp( tmp, p, 16 ) != 0 ) | |
{ | |
Twofish_fatal( "Twofish decryption failure in sequence", ERR_SEQ_DEC ); | |
} | |
/* Move on to next 16 bytes in the buffer. */ | |
p -= 16; | |
} | |
/* And check the final value. */ | |
if( memcmp( p, final_value, 16 ) != 0 ) | |
{ | |
Twofish_fatal( "Twofish encryption failure in sequence", ERR_SEQ_ENC ); | |
} | |
/* None of the data was secret, so there is no need to wipe anything. */ | |
return SUCCESS; | |
} | |
/* | |
* Run all three sequence tests from the Twofish test vectors. | |
* | |
* This checks the most extensive test vectors currently available | |
* for Twofish. The data is from the Twofish book, appendix B.2. | |
*/ | |
static int test_sequences() | |
{ | |
static Twofish_Byte r128[] = { | |
0x5D, 0x9D, 0x4E, 0xEF, 0xFA, 0x91, 0x51, 0x57, | |
0x55, 0x24, 0xF1, 0x15, 0x81, 0x5A, 0x12, 0xE0 | |
}; | |
static Twofish_Byte r192[] = { | |
0xE7, 0x54, 0x49, 0x21, 0x2B, 0xEE, 0xF9, 0xF4, | |
0xA3, 0x90, 0xBD, 0x86, 0x0A, 0x64, 0x09, 0x41 | |
}; | |
static Twofish_Byte r256[] = { | |
0x37, 0xFE, 0x26, 0xFF, 0x1C, 0xF6, 0x61, 0x75, | |
0xF5, 0xDD, 0xF4, 0xC3, 0x3B, 0x97, 0xA2, 0x05 | |
}; | |
/* Run the three sequence test vectors */ | |
int ret; | |
if ((ret = test_sequence( 16, r128)) < 0) | |
return ret; | |
if ((ret = test_sequence( 24, r192)) < 0) | |
return ret; | |
if ((ret = test_sequence( 32, r256)) < 0) | |
return ret; | |
return SUCCESS; | |
} | |
/* | |
* Test the odd-sized keys. | |
* | |
* Every odd-sized key is equivalent to a one of 128, 192, or 256 bits. | |
* The equivalent key is found by padding at the end with zero bytes | |
* until a regular key size is reached. | |
* | |
* We just test that the key expansion routine behaves properly. | |
* If the expanded keys are identical, then the encryptions and decryptions | |
* will behave the same. | |
*/ | |
static int test_odd_sized_keys() | |
{ | |
Twofish_Byte buf[32]; | |
Twofish_key xkey; | |
Twofish_key xkey_two; | |
int i, ret; | |
/* | |
* We first create an all-zero key to use as PRNG key. | |
* Normally we would not have to fill the buffer with zeroes, as we could | |
* just pass a zero key length to the Twofish_prepare_key function. | |
* However, this relies on using odd-sized keys, and those are just the | |
* ones we are testing here. We can't use an untested function to test | |
* itself. | |
*/ | |
memset( buf, 0, sizeof( buf ) ); | |
if ((ret = Twofish_prepare_key( buf, 16, &xkey)) < 0) | |
return ret; | |
/* Fill buffer with pseudo-random data derived from two encryptions */ | |
Twofish_encrypt( &xkey, buf, buf ); | |
Twofish_encrypt( &xkey, buf, buf+16 ); | |
/* Create all possible shorter keys that are prefixes of the buffer. */ | |
for( i=31; i>=0; i-- ) | |
{ | |
/* Set a byte to zero. This is the new padding byte */ | |
buf[i] = 0; | |
/* Expand the key with only i bytes of length */ | |
if ((ret = Twofish_prepare_key( buf, i, &xkey)) < 0) | |
return ret; | |
/* Expand the corresponding padded key of regular length */ | |
if ((ret = Twofish_prepare_key( buf, i<=16 ? 16 : (i<= 24 ? 24 : 32), &xkey_two )) < 0) | |
return ret; | |
/* Compare the two */ | |
if( memcmp( &xkey, &xkey_two, sizeof( xkey ) ) != 0 ) | |
{ | |
Twofish_fatal( "Odd sized keys do not expand properly", ERR_ODD_KEY ); | |
} | |
} | |
/* None of the key values are secret, so we don't need to wipe them. */ | |
return SUCCESS; | |
} | |
/* | |
* Test the Twofish implementation. | |
* | |
* This routine runs all the self tests, in order of importance. | |
* It is called by the Twofish_initialise routine. | |
* | |
* In almost all applications the cost of running the self tests during | |
* initialisation is insignificant, especially | |
* compared to the time it takes to load the application from disk. | |
* If you are very pressed for initialisation performance, | |
* you could remove some of the tests. Make sure you did run them | |
* once in the software and hardware configuration you are using. | |
*/ | |
static int self_test() | |
{ | |
int ret; | |
/* The three test vectors form an absolute minimal test set. */ | |
if ((ret = test_vectors()) < 0) | |
return ret; | |
/* | |
* If at all possible you should run these tests too. They take | |
* more time, but provide a more thorough coverage. | |
*/ | |
if ((ret = test_sequences()) < 0) | |
return ret; | |
/* Test the odd-sized keys. */ | |
if ((ret = test_odd_sized_keys()) < 0) | |
return ret; | |
return SUCCESS; | |
} | |
/* | |
* And now, the actual Twofish implementation. | |
* | |
* This implementation generates all the tables during initialisation. | |
* I don't like large tables in the code, especially since they are easily | |
* damaged in the source without anyone noticing it. You need code to | |
* generate them anyway, and this way all the code is close together. | |
* Generating them in the application leads to a smaller executable | |
* (the code is smaller than the tables it generates) and a | |
* larger static memory footprint. | |
* | |
* Twofish can be implemented in many ways. I have chosen to | |
* use large tables with a relatively long key setup time. | |
* If you encrypt more than a few blocks of data it pays to pre-compute | |
* as much as possible. This implementation is relatively inefficient for | |
* applications that need to re-key every block or so. | |
*/ | |
/* | |
* We start with the t-tables, directly from the Twofish definition. | |
* These are nibble-tables, but merging them and putting them two nibbles | |
* in one byte is more work than it is worth. | |
*/ | |
static Twofish_Byte t_table[2][4][16] = { | |
{ | |
{0x8,0x1,0x7,0xD,0x6,0xF,0x3,0x2,0x0,0xB,0x5,0x9,0xE,0xC,0xA,0x4}, | |
{0xE,0xC,0xB,0x8,0x1,0x2,0x3,0x5,0xF,0x4,0xA,0x6,0x7,0x0,0x9,0xD}, | |
{0xB,0xA,0x5,0xE,0x6,0xD,0x9,0x0,0xC,0x8,0xF,0x3,0x2,0x4,0x7,0x1}, | |
{0xD,0x7,0xF,0x4,0x1,0x2,0x6,0xE,0x9,0xB,0x3,0x0,0x8,0x5,0xC,0xA} | |
}, | |
{ | |
{0x2,0x8,0xB,0xD,0xF,0x7,0x6,0xE,0x3,0x1,0x9,0x4,0x0,0xA,0xC,0x5}, | |
{0x1,0xE,0x2,0xB,0x4,0xC,0x3,0x7,0x6,0xD,0xA,0x5,0xF,0x9,0x0,0x8}, | |
{0x4,0xC,0x7,0x5,0x1,0x6,0x9,0xA,0x0,0xE,0xD,0x8,0x2,0xB,0x3,0xF}, | |
{0xB,0x9,0x5,0x1,0xC,0x3,0xD,0xE,0x6,0x4,0x7,0xF,0x2,0x0,0x8,0xA} | |
} | |
}; | |
/* A 1-bit rotation of 4-bit values. Input must be in range 0..15 */ | |
#define ROR4BY1( x ) (((x)>>1) | (((x)<<3) & 0x8) ) | |
/* | |
* The q-boxes are only used during the key schedule computations. | |
* These are 8->8 bit lookup tables. Some CPUs prefer to have 8->32 bit | |
* lookup tables as it is faster to load a 32-bit value than to load an | |
* 8-bit value and zero the rest of the register. | |
* The LARGE_Q_TABLE switch allows you to choose 32-bit entries in | |
* the q-tables. Here we just define the Qtype which is used to store | |
* the entries of the q-tables. | |
*/ | |
#if LARGE_Q_TABLE | |
typedef Twofish_UInt32 Qtype; | |
#else | |
typedef Twofish_Byte Qtype; | |
#endif | |
/* | |
* The actual q-box tables. | |
* There are two q-boxes, each having 256 entries. | |
*/ | |
static Qtype q_table[2][256]; | |
/* | |
* Now the function that converts a single t-table into a q-table. | |
* | |
* Arguments: | |
* t[4][16] : four 4->4bit lookup tables that define the q-box | |
* q[256] : output parameter: the resulting q-box as a lookup table. | |
*/ | |
static void make_q_table( Twofish_Byte t[4][16], Qtype q[256] ) | |
{ | |
int ae,be,ao,bo; /* Some temporaries. */ | |
int i; | |
/* Loop over all input values and compute the q-box result. */ | |
for( i=0; i<256; i++ ) { | |
/* | |
* This is straight from the Twofish specifications. | |
* | |
* The ae variable is used for the a_i values from the specs | |
* with even i, and ao for the odd i's. Similarly for the b's. | |
*/ | |
ae = i>>4; be = i&0xf; | |
ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8); | |
ae = t[0][ao]; be = t[1][bo]; | |
ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8); | |
ae = t[2][ao]; be = t[3][bo]; | |
/* Store the result in the q-box table, the cast avoids a warning. */ | |
q[i] = (Qtype) ((be<<4) | ae); | |
} | |
} | |
/* | |
* Initialise both q-box tables. | |
*/ | |
static void initialise_q_boxes() { | |
/* Initialise each of the q-boxes using the t-tables */ | |
make_q_table( t_table[0], q_table[0] ); | |
make_q_table( t_table[1], q_table[1] ); | |
} | |
/* | |
* Next up is the MDS matrix multiplication. | |
* The MDS matrix multiplication operates in the field | |
* GF(2)[x]/p(x) with p(x)=x^8+x^6+x^5+x^3+1. | |
* If you don't understand this, read a book on finite fields. You cannot | |
* follow the finite-field computations without some background. | |
* | |
* In this field, multiplication by x is easy: shift left one bit | |
* and if bit 8 is set then xor the result with 0x169. | |
* | |
* The MDS coefficients use a multiplication by 1/x, | |
* or rather a division by x. This is easy too: first make the | |
* value 'even' (i.e. bit 0 is zero) by xorring with 0x169 if necessary, | |
* and then shift right one position. | |
* Even easier: shift right and xor with 0xb4 if the lsbit was set. | |
* | |
* The MDS coefficients are 1, EF, and 5B, and we use the fact that | |
* EF = 1 + 1/x + 1/x^2 | |
* 5B = 1 + 1/x^2 | |
* in this field. This makes multiplication by EF and 5B relatively easy. | |
* | |
* This property is no accident, the MDS matrix was designed to allow | |
* this implementation technique to be used. | |
* | |
* We have four MDS tables, each mapping 8 bits to 32 bits. | |
* Each table performs one column of the matrix multiplication. | |
* As the MDS is always preceded by q-boxes, each of these tables | |
* also implements the q-box just previous to that column. | |
*/ | |
/* The actual MDS tables. */ | |
static Twofish_UInt32 MDS_table[4][256]; | |
/* A small table to get easy conditional access to the 0xb4 constant. */ | |
static Twofish_UInt32 mds_poly_divx_const[] = {0,0xb4}; | |
/* Function to initialise the MDS tables. */ | |
static void initialise_mds_tables() | |
{ | |
int i; | |
Twofish_UInt32 q,qef,q5b; /* Temporary variables. */ | |
/* Loop over all 8-bit input values */ | |
for( i=0; i<256; i++ ) | |
{ | |
/* | |
* To save some work during the key expansion we include the last | |
* of the q-box layers from the h() function in these MDS tables. | |
*/ | |
/* We first do the inputs that are mapped through the q0 table. */ | |
q = q_table[0][i]; | |
/* | |
* Here we divide by x, note the table to get 0xb4 only if the | |
* lsbit is set. | |
* This sets qef = (1/x)*q in the finite field | |
*/ | |
qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ]; | |
/* | |
* Divide by x again, and add q to get (1+1/x^2)*q. | |
* Note that (1+1/x^2) = 5B in the field, and addition in the field | |
* is exclusive or on the bits. | |
*/ | |
q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q; | |
/* | |
* Add q5b to qef to set qef = (1+1/x+1/x^2)*q. | |
* Again, (1+1/x+1/x^2) = EF in the field. | |
*/ | |
qef ^= q5b; | |
/* | |
* Now that we have q5b = 5B * q and qef = EF * q | |
* we can fill two of the entries in the MDS matrix table. | |
* See the Twofish specifications for the order of the constants. | |
*/ | |
MDS_table[1][i] = (q <<24) | (q5b<<16) | (qef<<8) | qef; | |
MDS_table[3][i] = (q5b<<24) | (qef<<16) | (q <<8) | q5b; | |
/* Now we do it all again for the two columns that have a q1 box. */ | |
q = q_table[1][i]; | |
qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ]; | |
q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q; | |
qef ^= q5b; | |
/* The other two columns use the coefficient in a different order. */ | |
MDS_table[0][i] = (qef<<24) | (qef<<16) | (q5b<<8) | q ; | |
MDS_table[2][i] = (qef<<24) | (q <<16) | (qef<<8) | q5b; | |
} | |
} | |
/* | |
* The h() function is the heart of the Twofish cipher. | |
* It is a complicated sequence of q-box lookups, key material xors, | |
* and finally the MDS matrix. | |
* We use lots of macros to make this reasonably fast. | |
*/ | |
/* First a shorthand for the two q-tables */ | |
#define q0 q_table[0] | |
#define q1 q_table[1] | |
/* | |
* Each macro computes one column of the h for either 2, 3, or 4 stages. | |
* As there are 4 columns, we have 12 macros in all. | |
* | |
* The key bytes are stored in the Byte array L at offset | |
* 0,1,2,3, 8,9,10,11, [16,17,18,19, [24,25,26,27]] as this is the | |
* order we get the bytes from the user. If you look at the Twofish | |
* specs, you'll see that h() is applied to the even key words or the | |
* odd key words. The bytes of the even words appear in this spacing, | |
* and those of the odd key words too. | |
* | |
* These macros are the only place where the q-boxes and the MDS table | |
* are used. | |
*/ | |
#define H02( y, L ) MDS_table[0][q0[q0[y]^L[ 8]]^L[0]] | |
#define H12( y, L ) MDS_table[1][q0[q1[y]^L[ 9]]^L[1]] | |
#define H22( y, L ) MDS_table[2][q1[q0[y]^L[10]]^L[2]] | |
#define H32( y, L ) MDS_table[3][q1[q1[y]^L[11]]^L[3]] | |
#define H03( y, L ) H02( q1[y]^L[16], L ) | |
#define H13( y, L ) H12( q1[y]^L[17], L ) | |
#define H23( y, L ) H22( q0[y]^L[18], L ) | |
#define H33( y, L ) H32( q0[y]^L[19], L ) | |
#define H04( y, L ) H03( q1[y]^L[24], L ) | |
#define H14( y, L ) H13( q0[y]^L[25], L ) | |
#define H24( y, L ) H23( q0[y]^L[26], L ) | |
#define H34( y, L ) H33( q1[y]^L[27], L ) | |
/* | |
* Now we can define the h() function given an array of key bytes. | |
* This function is only used in the key schedule, and not to pre-compute | |
* the keyed S-boxes. | |
* | |
* In the key schedule, the input is always of the form k*(1+2^8+2^16+2^24) | |
* so we only provide k as an argument. | |
* | |
* Arguments: | |
* k input to the h() function. | |
* L pointer to array of key bytes at | |
* offsets 0,1,2,3, ... 8,9,10,11, [16,17,18,19, [24,25,26,27]] | |
* kCycles # key cycles, 2, 3, or 4. | |
*/ | |
static Twofish_UInt32 h( int k, Twofish_Byte L[], int kCycles ) | |
{ | |
switch( kCycles ) { | |
/* We code all 3 cases separately for speed reasons. */ | |
case 2: | |
return H02(k,L) ^ H12(k,L) ^ H22(k,L) ^ H32(k,L); | |
case 3: | |
return H03(k,L) ^ H13(k,L) ^ H23(k,L) ^ H33(k,L); | |
case 4: | |
return H04(k,L) ^ H14(k,L) ^ H24(k,L) ^ H34(k,L); | |
default: | |
/* This is always a coding error, which is fatal. */ | |
Twofish_fatal( "Twofish h(): Illegal argument", ERR_ILL_ARG ); | |
return ERR_ILL_ARG; | |
} | |
} | |
/* | |
* Pre-compute the keyed S-boxes. | |
* Fill the pre-computed S-box array in the expanded key structure. | |
* Each pre-computed S-box maps 8 bits to 32 bits. | |
* | |
* The S argument contains half the number of bytes of the full key, but is | |
* derived from the full key. (See Twofish specifications for details.) | |
* S has the weird byte input order used by the Hxx macros. | |
* | |
* This function takes most of the time of a key expansion. | |
* | |
* Arguments: | |
* S pointer to array of 8*kCycles Bytes containing the S vector. | |
* kCycles number of key words, must be in the set {2,3,4} | |
* xkey pointer to Twofish_key structure that will contain the S-boxes. | |
*/ | |
static int fill_keyed_sboxes( Twofish_Byte S[], int kCycles, Twofish_key * xkey ) | |
{ | |
int i; | |
switch( kCycles ) { | |
/* We code all 3 cases separately for speed reasons. */ | |
case 2: | |
for( i=0; i<256; i++ ) | |
{ | |
xkey->s[0][i]= H02( i, S ); | |
xkey->s[1][i]= H12( i, S ); | |
xkey->s[2][i]= H22( i, S ); | |
xkey->s[3][i]= H32( i, S ); | |
} | |
break; | |
case 3: | |
for( i=0; i<256; i++ ) | |
{ | |
xkey->s[0][i]= H03( i, S ); | |
xkey->s[1][i]= H13( i, S ); | |
xkey->s[2][i]= H23( i, S ); | |
xkey->s[3][i]= H33( i, S ); | |
} | |
break; | |
case 4: | |
for( i=0; i<256; i++ ) | |
{ | |
xkey->s[0][i]= H04( i, S ); | |
xkey->s[1][i]= H14( i, S ); | |
xkey->s[2][i]= H24( i, S ); | |
xkey->s[3][i]= H34( i, S ); | |
} | |
break; | |
default: | |
/* This is always a coding error, which is fatal. */ | |
Twofish_fatal( "Twofish fill_keyed_sboxes(): Illegal argument", ERR_ILL_ARG ); | |
} | |
return SUCCESS; | |
} | |
/* A flag to keep track of whether we have been initialised or not. */ | |
static int Twofish_initialised = 0; | |
/* | |
* Initialise the Twofish implementation. | |
* This function must be called before any other function in the | |
* Twofish implementation is called. | |
* This routine also does some sanity checks, to make sure that | |
* all the macros behave, and it tests the whole cipher. | |
*/ | |
int Twofish_initialise() | |
{ | |
int ret; | |
/* First test the various platform-specific definitions. */ | |
if ((ret = test_platform()) < 0) | |
return ret; | |
/* We can now generate our tables, in the right order of course. */ | |
initialise_q_boxes(); | |
initialise_mds_tables(); | |
/* We're finished with the initialisation itself. */ | |
Twofish_initialised = 1; | |
/* | |
* And run some tests on the whole cipher. | |
* Yes, you need to do this every time you start your program. | |
* It is called assurance; you have to be certain that your program | |
* still works properly. | |
*/ | |
return self_test(); | |
} | |
/* | |
* The Twofish key schedule uses an Reed-Solomon code matrix multiply. | |
* Just like the MDS matrix, the RS-matrix is designed to be easy | |
* to implement. Details are below in the code. | |
* | |
* These constants make it easy to compute in the finite field used | |
* for the RS code. | |
* | |
* We use Bytes for the RS computation, but these are automatically | |
* widened to unsigned integers in the expressions. Having unsigned | |
* ints in these tables therefore provides the fastest access. | |
*/ | |
static unsigned int rs_poly_const[] = {0, 0x14d}; | |
static unsigned int rs_poly_div_const[] = {0, 0xa6 }; | |
/* | |
* Prepare a key for use in encryption and decryption. | |
* Like most block ciphers, Twofish allows the key schedule | |
* to be pre-computed given only the key. | |
* Twofish has a fairly 'heavy' key schedule that takes a lot of time | |
* to compute. The main work is pre-computing the S-boxes used in the | |
* encryption and decryption. We feel that this makes the cipher much | |
* harder to attack. The attacker doesn't even know what the S-boxes | |
* contain without including the entire key schedule in the analysis. | |
* | |
* Unlike most Twofish implementations, this one allows any key size from | |
* 0 to 32 bytes. Odd key sizes are defined for Twofish (see the | |
* specifications); the key is simply padded with zeroes to the next real | |
* key size of 16, 24, or 32 bytes. | |
* Each odd-sized key is thus equivalent to a single normal-sized key. | |
* | |
* Arguments: | |
* key array of key bytes | |
* key_len number of bytes in the key, must be in the range 0,...,32. | |
* xkey Pointer to an Twofish_key structure that will be filled | |
* with the internal form of the cipher key. | |
*/ | |
int Twofish_prepare_key( Twofish_Byte key[], int key_len, Twofish_key * xkey ) | |
{ | |
/* We use a single array to store all key material in, | |
* to simplify the wiping of the key material at the end. | |
* The first 32 bytes contain the actual (padded) cipher key. | |
* The next 32 bytes contain the S-vector in its weird format, | |
* and we have 4 bytes of overrun necessary for the RS-reduction. | |
*/ | |
Twofish_Byte K[32+32+4]; | |
int kCycles; /* # key cycles, 2,3, or 4. */ | |
int i; | |
Twofish_UInt32 A, B; /* Used to compute the round keys. */ | |
Twofish_Byte * kptr; /* Three pointers for the RS computation. */ | |
Twofish_Byte * sptr; | |
Twofish_Byte * t; | |
Twofish_Byte b,bx,bxx; /* Some more temporaries for the RS computation. */ | |
/* Check that the Twofish implementation was initialised. */ | |
if( Twofish_initialised == 0 ) | |
{ | |
/* | |
* You didn't call Twofish_initialise before calling this routine. | |
* This is a programming error, and therefore we call the fatal | |
* routine. | |
* | |
* I could of course call the initialisation routine here, | |
* but there are a few reasons why I don't. First of all, the | |
* self-tests have to be done at startup. It is no good to inform | |
* the user that the cipher implementation fails when he wants to | |
* write his data to disk in encrypted form. You have to warn him | |
* before he spends time typing his data. Second, the initialisation | |
* and self test are much slower than a single key expansion. | |
* Calling the initialisation here makes the performance of the | |
* cipher unpredictable. This can lead to really weird problems | |
* if you use the cipher for a real-time task. Suddenly it fails | |
* once in a while the first time you try to use it. Things like | |
* that are almost impossible to debug. | |
*/ | |
/* Twofish_fatal( "Twofish implementation was not initialised.", ERR_INIT ); */ | |
/* | |
* There is always a danger that the Twofish_fatal routine returns, | |
* in spite of the specifications that it should not. | |
* (A good programming rule: don't trust the rest of the code.) | |
* This would be disasterous. If the q-tables and MDS-tables have | |
* not been initialised, they are probably still filled with zeroes. | |
* Suppose the MDS-tables are all zero. The key expansion would then | |
* generate all-zero round keys, and all-zero s-boxes. The danger | |
* is that nobody would notice as the encry | |
* mangles the input, and the decryption still 'decrypts' it, | |
* but now in a completely key-independent manner. | |
* To stop such security disasters, we use blunt force. | |
* If your program hangs here: fix the fatal routine! | |
*/ | |
for(;;); /* Infinite loop, which beats being insecure. */ | |
} | |
/* Check for valid key length. */ | |
if( key_len < 0 || key_len > 32 ) | |
{ | |
/* | |
* This can only happen if a programmer didn't read the limitations | |
* on the key size. | |
*/ | |
Twofish_fatal( "Twofish_prepare_key: illegal key length", ERR_KEY_LEN ); | |
/* | |
* A return statement just in case the fatal macro returns. | |
* The rest of the code assumes that key_len is in range, and would | |
* buffer-overflow if it wasn't. | |
* | |
* Why do we still use a programming language that has problems like | |
* buffer overflows, when these problems were solved in 1960 with | |
* the development of Algol? Have we not leared anything? | |
*/ | |
return ERR_KEY_LEN; | |
} | |
/* Pad the key with zeroes to the next suitable key length. */ | |
memcpy( K, key, key_len ); | |
memset( K+key_len, 0, sizeof(K)-key_len ); | |
/* | |
* Compute kCycles: the number of key cycles used in the cipher. | |
* 2 for 128-bit keys, 3 for 192-bit keys, and 4 for 256-bit keys. | |
*/ | |
kCycles = (key_len + 7) >> 3; | |
/* Handle the special case of very short keys: minimum 2 cycles. */ | |
if( kCycles < 2 ) | |
{ | |
kCycles = 2; | |
} | |
/* | |
* From now on we just pretend to have 8*kCycles bytes of | |
* key material in K. This handles all the key size cases. | |
*/ | |
/* | |
* We first compute the 40 expanded key words, | |
* formulas straight from the Twofish specifications. | |
*/ | |
for( i=0; i<40; i+=2 ) | |
{ | |
/* | |
* Due to the byte spacing expected by the h() function | |
* we can pick the bytes directly from the key K. | |
* As we use bytes, we never have the little/big endian | |
* problem. | |
* | |
* Note that we apply the rotation function only to simple | |
* variables, as the rotation macro might evaluate its argument | |
* more than once. | |
*/ | |
A = h( i , K , kCycles ); | |
B = h( i+1, K+4, kCycles ); | |
B = ROL32( B, 8 ); | |
/* Compute and store the round keys. */ | |
A += B; | |
B += A; | |
xkey->K[i] = A; | |
xkey->K[i+1] = ROL32( B, 9 ); | |
} | |
/* Wipe variables that contained key material. */ | |
A=B=0; | |
/* | |
* And now the dreaded RS multiplication that few seem to understand. | |
* The RS matrix is not random, and is specially designed to compute the | |
* RS matrix multiplication in a simple way. | |
* | |
* We work in the field GF(2)[x]/x^8+x^6+x^3+x^2+1. Note that this is a | |
* different field than used for the MDS matrix. | |
* (At least, it is a different representation because all GF(2^8) | |
* representations are equivalent in some form.) | |
* | |
* We take 8 consecutive bytes of the key and interpret them as | |
* a polynomial k_0 + k_1 y + k_2 y^2 + ... + k_7 y^7 where | |
* the k_i bytes are the key bytes and are elements of the finite field. | |
* We multiply this polynomial by y^4 and reduce it modulo | |
* y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1. | |
* using straightforward polynomial modulo reduction. | |
* The coefficients of the result are the result of the RS | |
* matrix multiplication. When we wrote the Twofish specification, | |
* the original RS definition used the polynomials, | |
* but that requires much more mathematical knowledge. | |
* We were already using matrix multiplication in a finite field for | |
* the MDS matrix, so I re-wrote the RS operation as a matrix | |
* multiplication to reduce the difficulty of understanding it. | |
* Some implementors have not picked up on this simpler method of | |
* computing the RS operation, even though it is mentioned in the | |
* specifications. | |
* | |
* It is possible to perform these computations faster by using 32-bit | |
* word operations, but that is not portable and this is not a speed- | |
* critical area. | |
* | |
* We explained the 1/x computation when we did the MDS matrix. | |
* | |
* The S vector is stored in K[32..64]. | |
* The S vector has to be reversed, so we loop cross-wise. | |
* | |
* Note the weird byte spacing of the S-vector, to match the even | |
* or odd key words arrays. See the discussion at the Hxx macros for | |
* details. | |
*/ | |
kptr = K + 8*kCycles; /* Start at end of key */ | |
sptr = K + 32; /* Start at start of S */ | |
/* Loop over all key material */ | |
while( kptr > K ) | |
{ | |
kptr -= 8; | |
/* | |
* Initialise the polynimial in sptr[0..12] | |
* The first four coefficients are 0 as we have to multiply by y^4. | |
* The next 8 coefficients are from the key material. | |
*/ | |
memset( sptr, 0, 4 ); | |
memcpy( sptr+4, kptr, 8 ); | |
/* | |
* The 12 bytes starting at sptr are now the coefficients of | |
* the polynomial we need to reduce. | |
*/ | |
/* Loop over the polynomial coefficients from high to low */ | |
t = sptr+11; | |
/* Keep looping until polynomial is degree 3; */ | |
while( t > sptr+3 ) | |
{ | |
/* Pick up the highest coefficient of the poly. */ | |
b = *t; | |
/* | |
* Compute x and (x+1/x) times this coefficient. | |
* See the MDS matrix implementation for a discussion of | |
* multiplication by x and 1/x. We just use different | |
* constants here as we are in a | |
* different finite field representation. | |
* | |
* These two statements set | |
* bx = (x) * b | |
* bxx= (x + 1/x) * b | |
*/ | |
bx = (Twofish_Byte)((b<<1) ^ rs_poly_const[ b>>7 ]); | |
bxx= (Twofish_Byte)((b>>1) ^ rs_poly_div_const[ b&1 ] ^ bx); | |
/* | |
* Subtract suitable multiple of | |
* y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1 | |
* from the polynomial, except that we don't bother | |
* updating t[0] as it will become zero anyway. | |
*/ | |
t[-1] ^= bxx; | |
t[-2] ^= bx; | |
t[-3] ^= bxx; | |
t[-4] ^= b; | |
/* Go to the next coefficient. */ | |
t--; | |
} | |
/* Go to next S-vector word, obeying the weird spacing rules. */ | |
sptr += 8; | |
} | |
/* Wipe variables that contained key material. */ | |
b = bx = bxx = 0; | |
/* And finally, we can compute the key-dependent S-boxes. */ | |
fill_keyed_sboxes( &K[32], kCycles, xkey ); | |
/* Wipe array that contained key material. */ | |
memset( K, 0, sizeof( K ) ); | |
return SUCCESS; | |
} | |
/* | |
* We can now start on the actual encryption and decryption code. | |
* As these are often speed-critical we will use a lot of macros. | |
*/ | |
/* | |
* The g() function is the heart of the round function. | |
* We have two versions of the g() function, one without an input | |
* rotation and one with. | |
* The pre-computed S-boxes make this pretty simple. | |
*/ | |
#define g0(X,xkey) \ | |
(xkey->s[0][b0(X)]^xkey->s[1][b1(X)]^xkey->s[2][b2(X)]^xkey->s[3][b3(X)]) | |
#define g1(X,xkey) \ | |
(xkey->s[0][b3(X)]^xkey->s[1][b0(X)]^xkey->s[2][b1(X)]^xkey->s[3][b2(X)]) | |
/* | |
* A single round of Twofish. The A,B,C,D are the four state variables, | |
* T0 and T1 are temporaries, xkey is the expanded key, and r the | |
* round number. | |
* | |
* Note that this macro does not implement the swap at the end of the round. | |
*/ | |
#define ENCRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \ | |
T0 = g0(A,xkey); T1 = g1(B,xkey);\ | |
C ^= T0+T1+xkey->K[8+2*(r)]; C = ROR32(C,1);\ | |
D = ROL32(D,1); D ^= T0+2*T1+xkey->K[8+2*(r)+1] | |
/* | |
* Encrypt a single cycle, consisting of two rounds. | |
* This avoids the swapping of the two halves. | |
* Parameter r is now the cycle number. | |
*/ | |
#define ENCRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \ | |
ENCRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r) );\ | |
ENCRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r)+1 ) | |
/* Full 16-round encryption */ | |
#define ENCRYPT( A,B,C,D,T0,T1,xkey ) \ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\ | |
ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 ) | |
/* | |
* A single round of Twofish for decryption. It differs from | |
* ENCRYTP_RND only because of the 1-bit rotations. | |
*/ | |
#define DECRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \ | |
T0 = g0(A,xkey); T1 = g1(B,xkey);\ | |
C = ROL32(C,1); C ^= T0+T1+xkey->K[8+2*(r)];\ | |
D ^= T0+2*T1+xkey->K[8+2*(r)+1]; D = ROR32(D,1) | |
/* | |
* Decrypt a single cycle, consisting of two rounds. | |
* This avoids the swapping of the two halves. | |
* Parameter r is now the cycle number. | |
*/ | |
#define DECRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \ | |
DECRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r)+1 );\ | |
DECRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r) ) | |
/* Full 16-round decryption. */ | |
#define DECRYPT( A,B,C,D,T0,T1, xkey ) \ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\ | |
DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 ) | |
/* | |
* A macro to read the state from the plaintext and do the initial key xors. | |
* The koff argument allows us to use the same macro | |
* for the decryption which uses different key words at the start. | |
*/ | |
#define GET_INPUT( src, A,B,C,D, xkey, koff ) \ | |
A = GET32(src )^xkey->K[ koff]; B = GET32(src+ 4)^xkey->K[1+koff]; \ | |
C = GET32(src+ 8)^xkey->K[2+koff]; D = GET32(src+12)^xkey->K[3+koff] | |
/* | |
* Similar macro to put the ciphertext in the output buffer. | |
* We xor the keys into the state variables before we use the PUT32 | |
* macro as the macro might use its argument multiple times. | |
*/ | |
#define PUT_OUTPUT( A,B,C,D, dst, xkey, koff ) \ | |
A ^= xkey->K[ koff]; B ^= xkey->K[1+koff]; \ | |
C ^= xkey->K[2+koff]; D ^= xkey->K[3+koff]; \ | |
PUT32( A, dst ); PUT32( B, dst+ 4 ); \ | |
PUT32( C, dst+8 ); PUT32( D, dst+12 ) | |
/* | |
* Twofish block encryption | |
* | |
* Arguments: | |
* xkey expanded key array | |
* p 16 bytes of plaintext | |
* c 16 bytes in which to store the ciphertext | |
*/ | |
void Twofish_encrypt( Twofish_key * xkey, Twofish_Byte p[16], Twofish_Byte c[16]) | |
{ | |
Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */ | |
/* Get the four plaintext words xorred with the key */ | |
GET_INPUT( p, A,B,C,D, xkey, 0 ); | |
/* Do 8 cycles (= 16 rounds) */ | |
ENCRYPT( A,B,C,D,T0,T1,xkey ); | |
/* Store them with the final swap and the output whitening. */ | |
PUT_OUTPUT( C,D,A,B, c, xkey, 4 ); | |
} | |
/* | |
* Twofish block decryption. | |
* | |
* Arguments: | |
* xkey expanded key array | |
* p 16 bytes of plaintext | |
* c 16 bytes in which to store the ciphertext | |
*/ | |
void Twofish_decrypt( Twofish_key * xkey, Twofish_Byte c[16], Twofish_Byte p[16]) | |
{ | |
Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */ | |
/* Get the four plaintext words xorred with the key */ | |
GET_INPUT( c, A,B,C,D, xkey, 4 ); | |
/* Do 8 cycles (= 16 rounds) */ | |
DECRYPT( A,B,C,D,T0,T1,xkey ); | |
/* Store them with the final swap and the output whitening. */ | |
PUT_OUTPUT( C,D,A,B, p, xkey, 0 ); | |
} | |
/* | |
* Using the macros it is easy to make special routines for | |
* CBC mode, CTR mode etc. The only thing you might want to | |
* add is a XOR_PUT_OUTPUT which xors the outputs into the | |
* destinationa instead of overwriting the data. This requires | |
* a XOR_PUT32 macro as well, but that should all be trivial. | |
* | |
* I thought about including routines for the separate cipher | |
* modes here, but it is unclear which modes should be included, | |
* and each encryption or decryption routine takes up a lot of code space. | |
* Also, I don't have any test vectors for any cipher modes | |
* with Twofish. | |
*/ | |