Alexandre Lision | ddd731e | 2014-01-31 11:50:08 -0500 | [diff] [blame] | 1 | /*
|
| 2 | * Fast, portable, and easy-to-use Twofish implementation,
|
| 3 | * Version 0.3.
|
| 4 | * Copyright (c) 2002 by Niels Ferguson.
|
| 5 | * (See further down for the almost-unrestricted licensing terms.)
|
| 6 | *
|
| 7 | * --------------------------------------------------------------------------
|
| 8 | * There are two files for this implementation:
|
| 9 | * - twofish.h, the header file.
|
| 10 | * - twofish.c, the code file.
|
| 11 | *
|
| 12 | * To incorporate this code into your program you should:
|
| 13 | * - Check the licensing terms further down in this comment.
|
| 14 | * - Fix the two type definitions in twofish.h to suit your platform.
|
| 15 | * - Fix a few definitions in twofish.c in the section marked
|
| 16 | * PLATFORM FIXES. There is one important ones that affects
|
| 17 | * functionality, and then a few definitions that you can optimise
|
| 18 | * for efficiency but those have no effect on the functionality.
|
| 19 | * Don't change anything else.
|
| 20 | * - Put the code in your project and compile it.
|
| 21 | *
|
| 22 | * To use this library you should:
|
| 23 | * - Call Twofish_initialise() in your program before any other function in
|
| 24 | * this library.
|
| 25 | * - Use Twofish_prepare_key(...) to convert a key to internal form.
|
| 26 | * - Use Twofish_encrypt(...) and Twofish_decrypt(...) to encrypt and decrypt
|
| 27 | * data.
|
| 28 | * See the comments in the header file for details on these functions.
|
| 29 | * --------------------------------------------------------------------------
|
| 30 | *
|
| 31 | * There are many Twofish implementation available for free on the web.
|
| 32 | * Most of them are hard to integrate into your own program.
|
| 33 | * As we like people to use our cipher, I thought I would make it easier.
|
| 34 | * Here is a free and easy-to-integrate Twofish implementation in C.
|
| 35 | * The latest version is always available from my personal home page at
|
| 36 | * http://niels.ferguson.net/
|
| 37 | *
|
| 38 | * Integrating library code into a project is difficult because the library
|
| 39 | * header files interfere with the project's header files and code.
|
| 40 | * And of course the project's header files interfere with the library code.
|
| 41 | * I've tried to resolve these problems here.
|
| 42 | * The header file of this implementation is very light-weight.
|
| 43 | * It contains two typedefs, a structure, and a few function declarations.
|
| 44 | * All names it defines start with "Twofish_".
|
| 45 | * The header file is therefore unlikely to cause problems in your project.
|
| 46 | * The code file of this implementation doesn't need to include the header
|
| 47 | * files of the project. There is thus no danger of the project interfering
|
| 48 | * with all the definitions and macros of the Twofish code.
|
| 49 | * In most situations, all you need to do is fill in a few platform-specific
|
| 50 | * definitions in the header file and code file,
|
| 51 | * and you should be able to run the Twofish code in your project.
|
| 52 | * I estimate it should take you less than an hour to integrate this code
|
| 53 | * into your project, most of it spent reading the comments telling you what
|
| 54 | * to do.
|
| 55 | *
|
| 56 | * For people using C++: it is very easy to wrap this library into a
|
| 57 | * TwofishKey class. One of the big advantages is that you can automate the
|
| 58 | * wiping of the key material in the destructor. I have not provided a C++
|
| 59 | * class because the interface depends too much on the abstract base class
|
| 60 | * you use for block ciphers in your program, which I don't know about.
|
| 61 | *
|
| 62 | * This implementation is designed for use on PC-class machines. It uses the
|
| 63 | * Twofish 'full' keying option which uses large tables. Total table size is
|
| 64 | * around 5-6 kB for static tables plus 4.5 kB for each pre-processed key.
|
| 65 | * If you need an implementation that uses less memory,
|
| 66 | * take a look at Brian Gladman's code on his web site:
|
| 67 | * http://fp.gladman.plus.com/cryptography_technology/aes/
|
| 68 | * He has code for all AES candidates.
|
| 69 | * His Twofish code has lots of options trading off table size vs. speed.
|
| 70 | * You can also take a look at the optimised code by Doug Whiting on the
|
| 71 | * Twofish web site
|
| 72 | * http://www.counterpane.com/twofish.html
|
| 73 | * which has loads of options.
|
| 74 | * I believe these existing implementations are harder to re-use because they
|
| 75 | * are not clean libraries and they impose requirements on the environment.
|
| 76 | * This implementation is very careful to minimise those,
|
| 77 | * and should be easier to integrate into any larger program.
|
| 78 | *
|
| 79 | * The default mode of this implementation is fully portable as it uses no
|
| 80 | * behaviour not defined in the C standard. (This is harder than you think.)
|
| 81 | * If you have any problems porting the default mode, please let me know
|
| 82 | * so that I can fix the problem. (But only if this code is at fault, I
|
| 83 | * don't fix compilers.)
|
| 84 | * Most of the platform fixes are related to non-portable but faster ways
|
| 85 | * of implementing certain functions.
|
| 86 | *
|
| 87 | * In general I've tried to make the code as fast as possible, at the expense
|
| 88 | * of memory and code size. However, C does impose limits, and this
|
| 89 | * implementation will be slower than an optimised assembler implementation.
|
| 90 | * But beware of assembler implementations: a good Pentium implementation
|
| 91 | * uses completely different code than a good Pentium II implementation.
|
| 92 | * You basically have to re-write the assembly code for every generation of
|
| 93 | * processor. Unless you are severely pressed for speed, stick with C.
|
| 94 | *
|
| 95 | * The initialisation routine of this implementation contains a self-test.
|
| 96 | * If initialisation succeeds without calling the fatal routine, then
|
| 97 | * the implementation works. I don't think you can break the implementation
|
| 98 | * in such a way that it still passes the tests, unless you are malicious.
|
| 99 | * In other words: if the initialisation routine returns,
|
| 100 | * you have successfully ported the implementation.
|
| 101 | * (Or not implemented the fatal routine properly, but that is your problem.)
|
| 102 | *
|
| 103 | * I'm indebted to many people who helped me in one way or another to write
|
| 104 | * this code. During the design of Twofish and the AES process I had very
|
| 105 | * extensive discussions of all implementation issues with various people.
|
| 106 | * Doug Whiting in particular provided a wealth of information. The Twofish
|
| 107 | * team spent untold hours discussion various cipher features, and their
|
| 108 | * implementation. Brian Gladman implemented all AES candidates in C,
|
| 109 | * and we had some fruitful discussions on how to implement Twofish in C.
|
| 110 | * Jan Nieuwenhuizen tested this code on Linux using GCC.
|
| 111 | *
|
| 112 | * Now for the license:
|
| 113 | * The author hereby grants a perpetual license to everybody to
|
| 114 | * use this code for any purpose as long as the copyright message is included
|
| 115 | * in the source code of this or any derived work.
|
| 116 | *
|
| 117 | * Yes, this means that you, your company, your club, and anyone else
|
| 118 | * can use this code anywhere you want. You can change it and distribute it
|
| 119 | * under the GPL, include it in your commercial product without releasing
|
| 120 | * the source code, put it on the web, etc.
|
| 121 | * The only thing you cannot do is remove my copyright message,
|
| 122 | * or distribute any source code based on this implementation that does not
|
| 123 | * include my copyright message.
|
| 124 | *
|
| 125 | * I appreciate a mention in the documentation or credits,
|
| 126 | * but I understand if that is difficult to do.
|
| 127 | * I also appreciate it if you tell me where and why you used my code.
|
| 128 | *
|
| 129 | * Please send any questions or comments to niels@ferguson.net
|
| 130 | *
|
| 131 | * Have Fun!
|
| 132 | *
|
| 133 | * Niels
|
| 134 | */
|
| 135 |
|
| 136 | /*
|
| 137 | * DISCLAIMER: As I'm giving away my work for free, I'm of course not going
|
| 138 | * to accept any liability of any form. This code, or the Twofish cipher,
|
| 139 | * might very well be flawed; you have been warned.
|
| 140 | * This software is provided as-is, without any kind of warrenty or
|
| 141 | * guarantee. And that is really all you can expect when you download
|
| 142 | * code for free from the Internet.
|
| 143 | *
|
| 144 | * I think it is really sad that disclaimers like this seem to be necessary.
|
| 145 | * If people only had a little bit more common sense, and didn't come
|
| 146 | * whining like little children every time something happens....
|
| 147 | */
|
| 148 |
|
| 149 | /*
|
| 150 | * Version history:
|
| 151 | * Version 0.0, 2002-08-30
|
| 152 | * First written.
|
| 153 | * Version 0.1, 2002-09-03
|
| 154 | * Added disclaimer. Improved self-tests.
|
| 155 | * Version 0.2, 2002-09-09
|
| 156 | * Removed last non-portabilities. Default now works completely within
|
| 157 | * the C standard. UInt32 can be larger than 32 bits without problems.
|
| 158 | * Version 0.3, 2002-09-28
|
| 159 | * Bugfix: use instead of to adhere to ANSI/ISO.
|
| 160 | * Rename BIG_ENDIAN macro to CPU_IS_BIG_ENDIAN. The gcc library
|
| 161 | * header already defines BIG_ENDIAN, even though it is not
|
| 162 | * supposed to.
|
| 163 | */
|
| 164 |
|
| 165 |
|
| 166 | /*
|
| 167 | * Minimum set of include files.
|
| 168 | * You should not need any application-specific include files for this code.
|
| 169 | * In fact, adding you own header files could break one of the many macros or
|
| 170 | * functions in this file. Be very careful.
|
| 171 | * Standard include files will probably be ok.
|
| 172 | */
|
| 173 | #include <stdio.h>
|
| 174 | #include <string.h>
|
| 175 | #include <stdlib.h>
|
| 176 | /* #include * for memset(), memcpy(), and memcmp() */
|
| 177 | #include "twofish.h"
|
| 178 |
|
| 179 |
|
| 180 | /*
|
| 181 | * PLATFORM FIXES
|
| 182 | * ==============
|
| 183 | *
|
| 184 | * Fix the type definitions in twofish.h first!
|
| 185 | *
|
| 186 | * The following definitions have to be fixed for each particular platform
|
| 187 | * you work on. If you have a multi-platform program, you no doubt have
|
| 188 | * portable definitions that you can substitute here without changing the
|
| 189 | * rest of the code.
|
| 190 | */
|
| 191 |
|
| 192 |
|
| 193 | /*
|
| 194 | * Function called if something is fatally wrong with the implementation.
|
| 195 | * This fatal function is called when a coding error is detected in the
|
| 196 | * Twofish implementation, or when somebody passes an obviously erroneous
|
| 197 | * parameter to this implementation. There is not much you can do when
|
| 198 | * the code contains bugs, so we just stop.
|
| 199 | *
|
| 200 | * The argument is a string. Ideally the fatal function prints this string
|
| 201 | * as an error message. Whatever else this function does, it should never
|
| 202 | * return. A typical implementation would stop the program completely after
|
| 203 | * printing the error message.
|
| 204 | *
|
| 205 | * This default implementation is not very useful,
|
| 206 | * but does not assume anything about your environment.
|
| 207 | * It will at least let you know something is wrong....
|
| 208 | * I didn't want to include any libraries to print and error or so,
|
| 209 | * as this makes the code much harder to integrate in a project.
|
| 210 | *
|
| 211 | * Note that the Twofish_fatal function may not return to the caller.
|
| 212 | * Unfortunately this is not something the self-test can test for,
|
| 213 | * so you have to make sure of this yourself.
|
| 214 | *
|
| 215 | * If you want to call an external function, be careful about including
|
| 216 | * your own header files here. This code uses a lot of macros, and your
|
| 217 | * header file could easily break it. Maybe the best solution is to use
|
| 218 | * a separate extern statement for your fatal function.
|
| 219 | */
|
| 220 | /* #define Twofish_fatal(pmsgx) { fprintf(stderr, pmsgx); exit(1); } */
|
| 221 | #define Twofish_fatal(pmsgx, code) { return(code); }
|
| 222 |
|
| 223 |
|
| 224 | /*
|
| 225 | * The rest of the settings are not important for the functionality
|
| 226 | * of this Twofish implementation. That is, their default settings
|
| 227 | * work on all platforms. You can change them to improve the
|
| 228 | * speed of the implementation on your platform. Erroneous settings
|
| 229 | * will result in erroneous implementations, but the self-test should
|
| 230 | * catch those.
|
| 231 | */
|
| 232 |
|
| 233 |
|
| 234 | /*
|
| 235 | * Macros to rotate a Twofish_UInt32 value left or right by the
|
| 236 | * specified number of bits. This should be a 32-bit rotation,
|
| 237 | * and not rotation of, say, 64-bit values.
|
| 238 | *
|
| 239 | * Every encryption or decryption operation uses 32 of these rotations,
|
| 240 | * so it is a good idea to make these macros efficient.
|
| 241 | *
|
| 242 | * This fully portable definition has one piece of tricky stuff.
|
| 243 | * The UInt32 might be larger than 32 bits, so we have to mask
|
| 244 | * any higher bits off. The simplest way to do this is to 'and' the
|
| 245 | * value first with 0xffffffff and then shift it right. An optimising
|
| 246 | * compiler that has a 32-bit type can optimise this 'and' away.
|
| 247 | *
|
| 248 | * Unfortunately there is no portable way of writing the constant
|
| 249 | * 0xffffffff. You don't know which suffix to use (U, or UL?)
|
| 250 | * The UINT32_MASK definition uses a bit of trickery. Shift-left
|
| 251 | * is only defined if the shift amount is strictly less than the size
|
| 252 | * of the UInt32, so we can't use (1<<32). The answer it to take the value
|
| 253 | * 2, cast it to a UInt32, shift it left 31 positions, and subtract one.
|
| 254 | * Another example of how to make something very simple extremely difficult.
|
| 255 | * I hate C.
|
| 256 | *
|
| 257 | * The rotation macros are straightforward.
|
| 258 | * They are only applied to UInt32 values, which are _unsigned_
|
| 259 | * so the >> operator must do a logical shift that brings in zeroes.
|
| 260 | * On most platforms you will only need to optimise the ROL32 macro; the
|
| 261 | * ROR32 macro is not inefficient on an optimising compiler as all rotation
|
| 262 | * amounts in this code are known at compile time.
|
| 263 | *
|
| 264 | * On many platforms there is a faster solution.
|
| 265 | * For example, MS compilers have the __rotl and __rotr functions
|
| 266 | * that generate x86 rotation instructions.
|
| 267 | */
|
| 268 | #define UINT32_MASK ( (((Twofish_UInt32)2)<<31) - 1 )
|
| 269 |
|
| 270 | #ifndef _MSC_VER
|
| 271 | #define ROL32(x,n) ( (x)<<(n) | ((x) & UINT32_MASK) >> (32-(n)) )
|
| 272 | #define ROR32(x,n) ( (x)>>(n) | ((x) & UINT32_MASK) << (32-(n)) )
|
| 273 | #else
|
| 274 | #define ROL32(x,n) (_lrotl((x), (n)))
|
| 275 | #define ROR32(x,n) (_lrotr((x), (n)))
|
| 276 | #endif
|
| 277 |
|
| 278 | /*
|
| 279 | * Select data type for q-table entries.
|
| 280 | *
|
| 281 | * Larger entry types cost more memory (1.5 kB), and might be faster
|
| 282 | * or slower depending on the CPU and compiler details.
|
| 283 | *
|
| 284 | * This choice only affects the static data size and the key setup speed.
|
| 285 | * Functionality, expanded key size, or encryption speed are not affected.
|
| 286 | * Define to 1 to get large q-table entries.
|
| 287 | */
|
| 288 | #define LARGE_Q_TABLE 0 /* default = 0 */
|
| 289 |
|
| 290 |
|
| 291 | /*
|
| 292 | * Method to select a single byte from a UInt32.
|
| 293 | * WARNING: non-portable code if set; might not work on all platforms.
|
| 294 | *
|
| 295 | * Inside the inner loop of Twofish it is necessary to access the 4
|
| 296 | * individual bytes of a UInt32. This can be done using either shifts
|
| 297 | * and masks, or memory accesses.
|
| 298 | *
|
| 299 | * Set to 0 to use shift and mask operations for the byte selection.
|
| 300 | * This is more ALU intensive. It is also fully portable.
|
| 301 | *
|
| 302 | * Set to 1 to use memory accesses. The UInt32 is stored in memory and
|
| 303 | * the individual bytes are read from memory one at a time.
|
| 304 | * This solution is more memory-intensive, and not fully portable.
|
| 305 | * It might be faster on your platform, or not. If you use this option,
|
| 306 | * make sure you set the CPU_IS_BIG_ENDIAN flag appropriately.
|
| 307 | *
|
| 308 | * This macro does not affect the conversion of the inputs and outputs
|
| 309 | * of the cipher. See the CONVERT_USING_CASTS macro for that.
|
| 310 | */
|
| 311 | #define SELECT_BYTE_FROM_UINT32_IN_MEMORY 0 /* default = 0 */
|
| 312 |
|
| 313 |
|
| 314 | /*
|
| 315 | * Method used to read the input and write the output.
|
| 316 | * WARNING: non-portable code if set; might not work on all platforms.
|
| 317 | *
|
| 318 | * Twofish operates on 32-bit words. The input to the cipher is
|
| 319 | * a byte array, as is the output. The portable method of doing the
|
| 320 | * conversion is a bunch of rotate and mask operations, but on many
|
| 321 | * platforms it can be done faster using a cast.
|
| 322 | * This only works if your CPU allows UInt32 accesses to arbitrary Byte
|
| 323 | * addresses.
|
| 324 | *
|
| 325 | * Set to 0 to use the shift and mask operations. This is fully
|
| 326 | * portable. .
|
| 327 | *
|
| 328 | * Set to 1 to use a cast. The Byte * is cast to a UInt32 *, and a
|
| 329 | * UInt32 is read. If necessary (as indicated by the CPU_IS_BIG_ENDIAN
|
| 330 | * macro) the byte order in the UInt32 is swapped. The reverse is done
|
| 331 | * to write the output of the encryption/decryption. Make sure you set
|
| 332 | * the CPU_IS_BIG_ENDIAN flag appropriately.
|
| 333 | * This option does not work unless a UInt32 is exactly 32 bits.
|
| 334 | *
|
| 335 | * This macro only changes the reading/writing of the plaintext/ciphertext.
|
| 336 | * See the SELECT_BYTE_FROM_UINT32_IN_MEMORY to affect the way in which
|
| 337 | * a UInt32 is split into 4 bytes for the S-box selection.
|
| 338 | */
|
| 339 | #define CONVERT_USING_CASTS 0 /* default = 0 */
|
| 340 |
|
| 341 |
|
| 342 | /*
|
| 343 | * Endianness switch.
|
| 344 | * Only relevant if SELECT_BYTE_FROM_UINT32_IN_MEMORY or
|
| 345 | * CONVERT_USING_CASTS is set.
|
| 346 | *
|
| 347 | * Set to 1 on a big-endian machine, and to 0 on a little-endian machine.
|
| 348 | * Twofish uses the little-endian convention (least significant byte first)
|
| 349 | * and big-endian machines (using most significant byte first)
|
| 350 | * have to do a few conversions.
|
| 351 | *
|
| 352 | * CAUTION: This code has never been tested on a big-endian machine,
|
| 353 | * because I don't have access to one. Feedback appreciated.
|
| 354 | */
|
| 355 | #define CPU_IS_BIG_ENDIAN 0
|
| 356 |
|
| 357 |
|
| 358 | /*
|
| 359 | * Macro to reverse the order of the bytes in a UInt32.
|
| 360 | * Used to convert to little-endian on big-endian machines.
|
| 361 | * This macro is always tested, but only used in the encryption and
|
| 362 | * decryption if CONVERT_USING_CASTS, and CPU_IS_BIG_ENDIAN
|
| 363 | * are both set. In other words: this macro is only speed-critical if
|
| 364 | * both these flags have been set.
|
| 365 | *
|
| 366 | * This default definition of SWAP works, but on many platforms there is a
|
| 367 | * more efficient implementation.
|
| 368 | */
|
| 369 | #define BSWAP(x) ((ROL32((x),8)&0x00ff00ff) | (ROR32((x),8) & 0xff00ff00))
|
| 370 |
|
| 371 |
|
| 372 | /*
|
| 373 | * END OF PLATFORM FIXES
|
| 374 | * =====================
|
| 375 | *
|
| 376 | * You should not have to touch the rest of this file.
|
| 377 | */
|
| 378 |
|
| 379 |
|
| 380 | /*
|
| 381 | * Convert the external type names to some that are easier to use inside
|
| 382 | * this file. I didn't want to use the names Byte and UInt32 in the
|
| 383 | * header file, because many programs already define them and using two
|
| 384 | * conventions at once can be very difficult.
|
| 385 | * Don't change these definitions! Change the originals
|
| 386 | * in twofish.h instead.
|
| 387 | */
|
| 388 | /* A Byte must be an unsigned integer, 8 bits long. */
|
| 389 | /* typedef Twofish_Byte Byte; */
|
| 390 | /* A UInt32 must be an unsigned integer at least 32 bits long. */
|
| 391 | /* typedef Twofish_UInt32 UInt32; */
|
| 392 |
|
| 393 |
|
| 394 | /*
|
| 395 | * Define a macro ENDIAN_CONVERT.
|
| 396 | *
|
| 397 | * We define a macro ENDIAN_CONVERT that performs a BSWAP on big-endian
|
| 398 | * machines, and is the identity function on little-endian machines.
|
| 399 | * The code then uses this macro without considering the endianness.
|
| 400 | */
|
| 401 |
|
| 402 | #if CPU_IS_BIG_ENDIAN
|
| 403 | #define ENDIAN_CONVERT(x) BSWAP(x)
|
| 404 | #else
|
| 405 | #define ENDIAN_CONVERT(x) (x)
|
| 406 | #endif
|
| 407 |
|
| 408 |
|
| 409 | /*
|
| 410 | * Compute byte offset within a UInt32 stored in memory.
|
| 411 | *
|
| 412 | * This is only used when SELECT_BYTE_FROM_UINT32_IN_MEMORY is set.
|
| 413 | *
|
| 414 | * The input is the byte number 0..3, 0 for least significant.
|
| 415 | * Note the use of sizeof() to support UInt32 types that are larger
|
| 416 | * than 4 bytes.
|
| 417 | */
|
| 418 | #if CPU_IS_BIG_ENDIAN
|
| 419 | #define BYTE_OFFSET( n ) (sizeof(Twofish_UInt32) - 1 - (n) )
|
| 420 | #else
|
| 421 | #define BYTE_OFFSET( n ) (n)
|
| 422 | #endif
|
| 423 |
|
| 424 |
|
| 425 | /*
|
| 426 | * Macro to get Byte no. b from UInt32 value X.
|
| 427 | * We use two different definition, depending on the settings.
|
| 428 | */
|
| 429 | #if SELECT_BYTE_FROM_UINT32_IN_MEMORY
|
| 430 | /* Pick the byte from the memory in which X is stored. */
|
| 431 | #define SELECT_BYTE( X, b ) (((Twofish_Byte *)(&(X)))[BYTE_OFFSET(b)])
|
| 432 | #else
|
| 433 | /* Portable solution: Pick the byte directly from the X value. */
|
| 434 | #define SELECT_BYTE( X, b ) (((X) >> (8*(b))) & 0xff)
|
| 435 | #endif
|
| 436 |
|
| 437 |
|
| 438 | /* Some shorthands because we use byte selection in large formulae. */
|
| 439 | #define b0(X) SELECT_BYTE((X),0)
|
| 440 | #define b1(X) SELECT_BYTE((X),1)
|
| 441 | #define b2(X) SELECT_BYTE((X),2)
|
| 442 | #define b3(X) SELECT_BYTE((X),3)
|
| 443 |
|
| 444 |
|
| 445 | /*
|
| 446 | * We need macros to load and store UInt32 from/to byte arrays
|
| 447 | * using the least-significant-byte-first convention.
|
| 448 | *
|
| 449 | * GET32( p ) gets a UInt32 in lsb-first form from four bytes pointed to
|
| 450 | * by p.
|
| 451 | * PUT32( v, p ) writes the UInt32 value v at address p in lsb-first form.
|
| 452 | */
|
| 453 | #if CONVERT_USING_CASTS
|
| 454 |
|
| 455 | /* Get UInt32 from four bytes pointed to by p. */
|
| 456 | #define GET32( p ) ENDIAN_CONVERT( *((Twofish_UInt32 *)(p)) )
|
| 457 | /* Put UInt32 into four bytes pointed to by p */
|
| 458 | #define PUT32( v, p ) *((Twofish_UInt32 *)(p)) = ENDIAN_CONVERT(v)
|
| 459 |
|
| 460 | #else
|
| 461 |
|
| 462 | /* Get UInt32 from four bytes pointed to by p. */
|
| 463 | #define GET32( p ) \
|
| 464 | ( \
|
| 465 | (Twofish_UInt32)((p)[0]) \
|
| 466 | | (Twofish_UInt32)((p)[1])<< 8 \
|
| 467 | | (Twofish_UInt32)((p)[2])<<16 \
|
| 468 | | (Twofish_UInt32)((p)[3])<<24 \
|
| 469 | )
|
| 470 | /* Put UInt32 into four bytes pointed to by p */
|
| 471 | #define PUT32( v, p ) \
|
| 472 | (p)[0] = (Twofish_Byte)(((v) ) & 0xff); \
|
| 473 | (p)[1] = (Twofish_Byte)(((v) >> 8) & 0xff); \
|
| 474 | (p)[2] = (Twofish_Byte)(((v) >> 16) & 0xff); \
|
| 475 | (p)[3] = (Twofish_Byte)(((v) >> 24) & 0xff)
|
| 476 |
|
| 477 | #endif
|
| 478 |
|
| 479 |
|
| 480 | /*
|
| 481 | * Test the platform-specific macros.
|
| 482 | * This function tests the macros defined so far to make sure the
|
| 483 | * definitions are appropriate for this platform.
|
| 484 | * If you make any mistake in the platform configuration, this should detect
|
| 485 | * that and inform you what went wrong.
|
| 486 | * Somewhere, someday, this is going to save somebody a lot of time,
|
| 487 | * because misbehaving macros are hard to debug.
|
| 488 | */
|
| 489 | static int test_platform()
|
| 490 | {
|
| 491 | /* Buffer with test values. */
|
| 492 | Twofish_Byte buf[] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0};
|
| 493 | Twofish_UInt32 C;
|
| 494 | Twofish_UInt32 x,y;
|
| 495 | int i;
|
| 496 |
|
| 497 | /*
|
| 498 | * Some sanity checks on the types that can't be done in compile time.
|
| 499 | * A smart compiler will just optimise these tests away.
|
| 500 | * The pre-processor doesn't understand different types, so we cannot
|
| 501 | * do these checks in compile-time.
|
| 502 | *
|
| 503 | * I hate C.
|
| 504 | *
|
| 505 | * The first check in each case is to make sure the size is correct.
|
| 506 | * The second check is to ensure that it is an unsigned type.
|
| 507 | */
|
| 508 | if( ((Twofish_UInt32)((Twofish_UInt32)1 << 31) == 0) || ((Twofish_UInt32)-1 < 0 ))
|
| 509 | {
|
| 510 | Twofish_fatal( "Twofish code: Twofish_UInt32 type not suitable", ERR_UINT32 );
|
| 511 | }
|
| 512 | if( (sizeof( Twofish_Byte ) != 1) || (((Twofish_Byte)-1) < 0) )
|
| 513 | {
|
| 514 | Twofish_fatal( "Twofish code: Twofish_Byte type not suitable", ERR_BYTE );
|
| 515 | }
|
| 516 |
|
| 517 | /*
|
| 518 | * Sanity-check the endianness conversions.
|
| 519 | * This is just an aid to find problems. If you do the endianness
|
| 520 | * conversion macros wrong you will fail the full cipher test,
|
| 521 | * but that does not help you find the error.
|
| 522 | * Always make it easy to find the bugs!
|
| 523 | *
|
| 524 | * Detail: There is no fully portable way of writing UInt32 constants,
|
| 525 | * as you don't know whether to use the U or UL suffix. Using only U you
|
| 526 | * might only be allowed 16-bit constants. Using UL you might get 64-bit
|
| 527 | * constants which cannot be stored in a UInt32 without warnings, and
|
| 528 | * which generally behave subtly different from a true UInt32.
|
| 529 | * As long as we're just comparing with the constant,
|
| 530 | * we can always use the UL suffix and at worst lose some efficiency.
|
| 531 | * I use a separate '32-bit constant' macro in most of my other code.
|
| 532 | *
|
| 533 | * I hate C.
|
| 534 | *
|
| 535 | * Start with testing GET32. We test it on all positions modulo 4
|
| 536 | * to make sure we can handly any position of inputs. (Some CPUs
|
| 537 | * do not allow non-aligned accesses which we would do if you used
|
| 538 | * the CONVERT_USING_CASTS option.
|
| 539 | */
|
| 540 | if( (GET32( buf ) != 0x78563412UL) || (GET32(buf+1) != 0x9a785634UL)
|
| 541 | || (GET32( buf+2 ) != 0xbc9a7856UL) || (GET32(buf+3) != 0xdebc9a78UL) )
|
| 542 | {
|
| 543 | Twofish_fatal( "Twofish code: GET32 not implemented properly", ERR_GET32 );
|
| 544 | }
|
| 545 |
|
| 546 | /*
|
| 547 | * We can now use GET32 to test PUT32.
|
| 548 | * We don't test the shifted versions. If GET32 can do that then
|
| 549 | * so should PUT32.
|
| 550 | */
|
| 551 | C = GET32( buf );
|
| 552 | PUT32( 3*C, buf );
|
| 553 | if( GET32( buf ) != 0x69029c36UL )
|
| 554 | {
|
| 555 | Twofish_fatal( "Twofish code: PUT32 not implemented properly", ERR_PUT32 );
|
| 556 | }
|
| 557 |
|
| 558 |
|
| 559 | /* Test ROL and ROR */
|
| 560 | for( i=1; i<32; i++ )
|
| 561 | {
|
| 562 | /* Just a simple test. */
|
| 563 | x = ROR32( C, i );
|
| 564 | y = ROL32( C, i );
|
| 565 | x ^= (C>>i) ^ (C<<(32-i));
|
| 566 | /*y ^= (C<>(32-i)); */
|
| 567 | y ^= (C<<i) ^ (C>>(32-i));
|
| 568 | x |= y;
|
| 569 | /*
|
| 570 | * Now all we check is that x is zero in the least significant
|
| 571 | * 32 bits. Using the UL suffix is safe here, as it doesn't matter
|
| 572 | * if we get a larger type.
|
| 573 | */
|
| 574 | if( (x & 0xffffffffUL) != 0 )
|
| 575 | {
|
| 576 | Twofish_fatal( "Twofish ROL or ROR not properly defined.", ERR_ROLR );
|
| 577 | }
|
| 578 | }
|
| 579 |
|
| 580 | /* Test the BSWAP macro */
|
| 581 | if( BSWAP(C) != 0x12345678UL )
|
| 582 | {
|
| 583 | /*
|
| 584 | * The BSWAP macro should always work, even if you are not using it.
|
| 585 | * A smart optimising compiler will just remove this entire test.
|
| 586 | */
|
| 587 | Twofish_fatal( "BSWAP not properly defined.", ERR_BSWAP );
|
| 588 | }
|
| 589 |
|
| 590 | /* And we can test the b macros which use SELECT_BYTE. */
|
| 591 | if( (b0(C)!=0x12) || (b1(C) != 0x34) || (b2(C) != 0x56) || (b3(C) != 0x78) )
|
| 592 | {
|
| 593 | /*
|
| 594 | * There are many reasons why this could fail.
|
| 595 | * Most likely is that CPU_IS_BIG_ENDIAN has the wrong value.
|
| 596 | */
|
| 597 | Twofish_fatal( "Twofish code: SELECT_BYTE not implemented properly", ERR_SELECTB );
|
| 598 | }
|
| 599 | return SUCCESS;
|
| 600 | }
|
| 601 |
|
| 602 |
|
| 603 | /*
|
| 604 | * Finally, we can start on the Twofish-related code.
|
| 605 | * You really need the Twofish specifications to understand this code. The
|
| 606 | * best source is the Twofish book:
|
| 607 | * "The Twofish Encryption Algorithm", by Bruce Schneier, John Kelsey,
|
| 608 | * Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson.
|
| 609 | * you can also use the AES submission document of Twofish, which is
|
| 610 | * available from my list of publications on my personal web site at
|
| 611 | * http://niels.ferguson.net/.
|
| 612 | *
|
| 613 | * The first thing we do is write the testing routines. This is what the
|
| 614 | * implementation has to satisfy in the end. We only test the external
|
| 615 | * behaviour of the implementation of course.
|
| 616 | */
|
| 617 |
|
| 618 |
|
| 619 | /*
|
| 620 | * Perform a single self test on a (plaintext,ciphertext,key) triple.
|
| 621 | * Arguments:
|
| 622 | * key array of key bytes
|
| 623 | * key_len length of key in bytes
|
| 624 | * p plaintext
|
| 625 | * c ciphertext
|
| 626 | */
|
| 627 | static int test_vector( Twofish_Byte key[], int key_len, Twofish_Byte p[16], Twofish_Byte c[16] )
|
| 628 | {
|
| 629 | Twofish_Byte tmp[16]; /* scratch pad. */
|
| 630 | Twofish_key xkey; /* The expanded key */
|
| 631 | int i;
|
| 632 |
|
| 633 |
|
| 634 | /* Prepare the key */
|
| 635 | if ((i = Twofish_prepare_key( key, key_len, &xkey)) < 0)
|
| 636 | return i;
|
| 637 |
|
| 638 | /*
|
| 639 | * We run the test twice to ensure that the xkey structure
|
| 640 | * is not damaged by the first encryption.
|
| 641 | * Those are hideous bugs to find if you get them in an application.
|
| 642 | */
|
| 643 | for( i=0; i<2; i++ )
|
| 644 | {
|
| 645 | /* Encrypt and test */
|
| 646 | Twofish_encrypt( &xkey, p, tmp );
|
| 647 | if( memcmp( c, tmp, 16 ) != 0 )
|
| 648 | {
|
| 649 | Twofish_fatal( "Twofish encryption failure", ERR_TEST_ENC );
|
| 650 | }
|
| 651 |
|
| 652 | /* Decrypt and test */
|
| 653 | Twofish_decrypt( &xkey, c, tmp );
|
| 654 | if( memcmp( p, tmp, 16 ) != 0 )
|
| 655 | {
|
| 656 | Twofish_fatal( "Twofish decryption failure", ERR_TEST_DEC );
|
| 657 | }
|
| 658 | }
|
| 659 |
|
| 660 | /* The test keys are not secret, so we don't need to wipe xkey. */
|
| 661 | return SUCCESS;
|
| 662 | }
|
| 663 |
|
| 664 |
|
| 665 | /*
|
| 666 | * Check implementation using three (key,plaintext,ciphertext)
|
| 667 | * test vectors, one for each major key length.
|
| 668 | *
|
| 669 | * This is an absolutely minimal self-test.
|
| 670 | * This routine does not test odd-sized keys.
|
| 671 | */
|
| 672 | static int test_vectors()
|
| 673 | {
|
| 674 | /*
|
| 675 | * We run three tests, one for each major key length.
|
| 676 | * These test vectors come from the Twofish specification.
|
| 677 | * One encryption and one decryption using randomish data and key
|
| 678 | * will detect almost any error, especially since we generate the
|
| 679 | * tables ourselves, so we don't have the problem of a single
|
| 680 | * damaged table entry in the source.
|
| 681 | */
|
| 682 |
|
| 683 | /* 128-bit test is the I=3 case of section B.2 of the Twofish book. */
|
| 684 | static Twofish_Byte k128[] = {
|
| 685 | 0x9F, 0x58, 0x9F, 0x5C, 0xF6, 0x12, 0x2C, 0x32,
|
| 686 | 0xB6, 0xBF, 0xEC, 0x2F, 0x2A, 0xE8, 0xC3, 0x5A,
|
| 687 | };
|
| 688 | static Twofish_Byte p128[] = {
|
| 689 | 0xD4, 0x91, 0xDB, 0x16, 0xE7, 0xB1, 0xC3, 0x9E,
|
| 690 | 0x86, 0xCB, 0x08, 0x6B, 0x78, 0x9F, 0x54, 0x19
|
| 691 | };
|
| 692 | static Twofish_Byte c128[] = {
|
| 693 | 0x01, 0x9F, 0x98, 0x09, 0xDE, 0x17, 0x11, 0x85,
|
| 694 | 0x8F, 0xAA, 0xC3, 0xA3, 0xBA, 0x20, 0xFB, 0xC3
|
| 695 | };
|
| 696 |
|
| 697 | /* 192-bit test is the I=4 case of section B.2 of the Twofish book. */
|
| 698 | static Twofish_Byte k192[] = {
|
| 699 | 0x88, 0xB2, 0xB2, 0x70, 0x6B, 0x10, 0x5E, 0x36,
|
| 700 | 0xB4, 0x46, 0xBB, 0x6D, 0x73, 0x1A, 0x1E, 0x88,
|
| 701 | 0xEF, 0xA7, 0x1F, 0x78, 0x89, 0x65, 0xBD, 0x44
|
| 702 | };
|
| 703 | static Twofish_Byte p192[] = {
|
| 704 | 0x39, 0xDA, 0x69, 0xD6, 0xBA, 0x49, 0x97, 0xD5,
|
| 705 | 0x85, 0xB6, 0xDC, 0x07, 0x3C, 0xA3, 0x41, 0xB2
|
| 706 | };
|
| 707 | static Twofish_Byte c192[] = {
|
| 708 | 0x18, 0x2B, 0x02, 0xD8, 0x14, 0x97, 0xEA, 0x45,
|
| 709 | 0xF9, 0xDA, 0xAC, 0xDC, 0x29, 0x19, 0x3A, 0x65
|
| 710 | };
|
| 711 |
|
| 712 | /* 256-bit test is the I=4 case of section B.2 of the Twofish book. */
|
| 713 | static Twofish_Byte k256[] = {
|
| 714 | 0xD4, 0x3B, 0xB7, 0x55, 0x6E, 0xA3, 0x2E, 0x46,
|
| 715 | 0xF2, 0xA2, 0x82, 0xB7, 0xD4, 0x5B, 0x4E, 0x0D,
|
| 716 | 0x57, 0xFF, 0x73, 0x9D, 0x4D, 0xC9, 0x2C, 0x1B,
|
| 717 | 0xD7, 0xFC, 0x01, 0x70, 0x0C, 0xC8, 0x21, 0x6F
|
| 718 | };
|
| 719 | static Twofish_Byte p256[] = {
|
| 720 | 0x90, 0xAF, 0xE9, 0x1B, 0xB2, 0x88, 0x54, 0x4F,
|
| 721 | 0x2C, 0x32, 0xDC, 0x23, 0x9B, 0x26, 0x35, 0xE6
|
| 722 | };
|
| 723 | static Twofish_Byte c256[] = {
|
| 724 | 0x6C, 0xB4, 0x56, 0x1C, 0x40, 0xBF, 0x0A, 0x97,
|
| 725 | 0x05, 0x93, 0x1C, 0xB6, 0xD4, 0x08, 0xE7, 0xFA
|
| 726 | };
|
| 727 |
|
| 728 | int ret;
|
| 729 |
|
| 730 | /* Run the actual tests. */
|
| 731 | if ((ret = test_vector( k128, 16, p128, c128 )) < 0)
|
| 732 | return ret;
|
| 733 | if ((ret = test_vector( k192, 24, p192, c192 )) < 0)
|
| 734 | return ret;
|
| 735 | if ((ret = test_vector( k256, 32, p256, c256 )) < 0)
|
| 736 | return ret;
|
| 737 | return SUCCESS;
|
| 738 | }
|
| 739 |
|
| 740 |
|
| 741 | /*
|
| 742 | * Perform extensive test for a single key size.
|
| 743 | *
|
| 744 | * Test a single key size against the test vectors from section
|
| 745 | * B.2 in the Twofish book. This is a sequence of 49 encryptions
|
| 746 | * and decryptions. Each plaintext is equal to the ciphertext of
|
| 747 | * the previous encryption. The key is made up from the ciphertext
|
| 748 | * two and three encryptions ago. Both plaintext and key start
|
| 749 | * at the zero value.
|
| 750 | * We should have designed a cleaner recurrence relation for
|
| 751 | * these tests, but it is too late for that now. At least we learned
|
| 752 | * how to do it better next time.
|
| 753 | * For details see appendix B of the book.
|
| 754 | *
|
| 755 | * Arguments:
|
| 756 | * key_len Number of bytes of key
|
| 757 | * final_value Final plaintext value after 49 iterations
|
| 758 | */
|
| 759 | static int test_sequence( int key_len, Twofish_Byte final_value[] )
|
| 760 | {
|
| 761 | Twofish_Byte buf[ (50+3)*16 ]; /* Buffer to hold our computation values. */
|
| 762 | Twofish_Byte tmp[16]; /* Temp for testing the decryption. */
|
| 763 | Twofish_key xkey; /* The expanded key */
|
| 764 | int i, ret;
|
| 765 | Twofish_Byte * p;
|
| 766 |
|
| 767 | /* Wipe the buffer */
|
| 768 | memset( buf, 0, sizeof( buf ) );
|
| 769 |
|
| 770 | /*
|
| 771 | * Because the recurrence relation is done in an inconvenient manner
|
| 772 | * we end up looping backwards over the buffer.
|
| 773 | */
|
| 774 |
|
| 775 | /* Pointer in buffer points to current plaintext. */
|
| 776 | p = &buf[50*16];
|
| 777 | for( i=1; i<50; i++ )
|
| 778 | {
|
| 779 | /*
|
| 780 | * Prepare a key.
|
| 781 | * This automatically checks that key_len is valid.
|
| 782 | */
|
| 783 | if ((ret = Twofish_prepare_key( p+16, key_len, &xkey)) < 0)
|
| 784 | return ret;
|
| 785 |
|
| 786 | /* Compute the next 16 bytes in the buffer */
|
| 787 | Twofish_encrypt( &xkey, p, p-16 );
|
| 788 |
|
| 789 | /* Check that the decryption is correct. */
|
| 790 | Twofish_decrypt( &xkey, p-16, tmp );
|
| 791 | if( memcmp( tmp, p, 16 ) != 0 )
|
| 792 | {
|
| 793 | Twofish_fatal( "Twofish decryption failure in sequence", ERR_SEQ_DEC );
|
| 794 | }
|
| 795 | /* Move on to next 16 bytes in the buffer. */
|
| 796 | p -= 16;
|
| 797 | }
|
| 798 |
|
| 799 | /* And check the final value. */
|
| 800 | if( memcmp( p, final_value, 16 ) != 0 )
|
| 801 | {
|
| 802 | Twofish_fatal( "Twofish encryption failure in sequence", ERR_SEQ_ENC );
|
| 803 | }
|
| 804 |
|
| 805 | /* None of the data was secret, so there is no need to wipe anything. */
|
| 806 | return SUCCESS;
|
| 807 | }
|
| 808 |
|
| 809 |
|
| 810 | /*
|
| 811 | * Run all three sequence tests from the Twofish test vectors.
|
| 812 | *
|
| 813 | * This checks the most extensive test vectors currently available
|
| 814 | * for Twofish. The data is from the Twofish book, appendix B.2.
|
| 815 | */
|
| 816 | static int test_sequences()
|
| 817 | {
|
| 818 | static Twofish_Byte r128[] = {
|
| 819 | 0x5D, 0x9D, 0x4E, 0xEF, 0xFA, 0x91, 0x51, 0x57,
|
| 820 | 0x55, 0x24, 0xF1, 0x15, 0x81, 0x5A, 0x12, 0xE0
|
| 821 | };
|
| 822 | static Twofish_Byte r192[] = {
|
| 823 | 0xE7, 0x54, 0x49, 0x21, 0x2B, 0xEE, 0xF9, 0xF4,
|
| 824 | 0xA3, 0x90, 0xBD, 0x86, 0x0A, 0x64, 0x09, 0x41
|
| 825 | };
|
| 826 | static Twofish_Byte r256[] = {
|
| 827 | 0x37, 0xFE, 0x26, 0xFF, 0x1C, 0xF6, 0x61, 0x75,
|
| 828 | 0xF5, 0xDD, 0xF4, 0xC3, 0x3B, 0x97, 0xA2, 0x05
|
| 829 | };
|
| 830 |
|
| 831 | /* Run the three sequence test vectors */
|
| 832 | int ret;
|
| 833 | if ((ret = test_sequence( 16, r128)) < 0)
|
| 834 | return ret;
|
| 835 | if ((ret = test_sequence( 24, r192)) < 0)
|
| 836 | return ret;
|
| 837 | if ((ret = test_sequence( 32, r256)) < 0)
|
| 838 | return ret;
|
| 839 | return SUCCESS;
|
| 840 | }
|
| 841 |
|
| 842 |
|
| 843 | /*
|
| 844 | * Test the odd-sized keys.
|
| 845 | *
|
| 846 | * Every odd-sized key is equivalent to a one of 128, 192, or 256 bits.
|
| 847 | * The equivalent key is found by padding at the end with zero bytes
|
| 848 | * until a regular key size is reached.
|
| 849 | *
|
| 850 | * We just test that the key expansion routine behaves properly.
|
| 851 | * If the expanded keys are identical, then the encryptions and decryptions
|
| 852 | * will behave the same.
|
| 853 | */
|
| 854 | static int test_odd_sized_keys()
|
| 855 | {
|
| 856 | Twofish_Byte buf[32];
|
| 857 | Twofish_key xkey;
|
| 858 | Twofish_key xkey_two;
|
| 859 | int i, ret;
|
| 860 |
|
| 861 | /*
|
| 862 | * We first create an all-zero key to use as PRNG key.
|
| 863 | * Normally we would not have to fill the buffer with zeroes, as we could
|
| 864 | * just pass a zero key length to the Twofish_prepare_key function.
|
| 865 | * However, this relies on using odd-sized keys, and those are just the
|
| 866 | * ones we are testing here. We can't use an untested function to test
|
| 867 | * itself.
|
| 868 | */
|
| 869 | memset( buf, 0, sizeof( buf ) );
|
| 870 | if ((ret = Twofish_prepare_key( buf, 16, &xkey)) < 0)
|
| 871 | return ret;
|
| 872 |
|
| 873 | /* Fill buffer with pseudo-random data derived from two encryptions */
|
| 874 | Twofish_encrypt( &xkey, buf, buf );
|
| 875 | Twofish_encrypt( &xkey, buf, buf+16 );
|
| 876 |
|
| 877 | /* Create all possible shorter keys that are prefixes of the buffer. */
|
| 878 | for( i=31; i>=0; i-- )
|
| 879 | {
|
| 880 | /* Set a byte to zero. This is the new padding byte */
|
| 881 | buf[i] = 0;
|
| 882 |
|
| 883 | /* Expand the key with only i bytes of length */
|
| 884 | if ((ret = Twofish_prepare_key( buf, i, &xkey)) < 0)
|
| 885 | return ret;
|
| 886 |
|
| 887 | /* Expand the corresponding padded key of regular length */
|
| 888 | if ((ret = Twofish_prepare_key( buf, i<=16 ? 16 : (i<= 24 ? 24 : 32), &xkey_two )) < 0)
|
| 889 | return ret;
|
| 890 |
|
| 891 | /* Compare the two */
|
| 892 | if( memcmp( &xkey, &xkey_two, sizeof( xkey ) ) != 0 )
|
| 893 | {
|
| 894 | Twofish_fatal( "Odd sized keys do not expand properly", ERR_ODD_KEY );
|
| 895 | }
|
| 896 | }
|
| 897 |
|
| 898 | /* None of the key values are secret, so we don't need to wipe them. */
|
| 899 | return SUCCESS;
|
| 900 | }
|
| 901 |
|
| 902 |
|
| 903 | /*
|
| 904 | * Test the Twofish implementation.
|
| 905 | *
|
| 906 | * This routine runs all the self tests, in order of importance.
|
| 907 | * It is called by the Twofish_initialise routine.
|
| 908 | *
|
| 909 | * In almost all applications the cost of running the self tests during
|
| 910 | * initialisation is insignificant, especially
|
| 911 | * compared to the time it takes to load the application from disk.
|
| 912 | * If you are very pressed for initialisation performance,
|
| 913 | * you could remove some of the tests. Make sure you did run them
|
| 914 | * once in the software and hardware configuration you are using.
|
| 915 | */
|
| 916 | static int self_test()
|
| 917 | {
|
| 918 | int ret;
|
| 919 | /* The three test vectors form an absolute minimal test set. */
|
| 920 | if ((ret = test_vectors()) < 0)
|
| 921 | return ret;
|
| 922 |
|
| 923 | /*
|
| 924 | * If at all possible you should run these tests too. They take
|
| 925 | * more time, but provide a more thorough coverage.
|
| 926 | */
|
| 927 | if ((ret = test_sequences()) < 0)
|
| 928 | return ret;
|
| 929 |
|
| 930 | /* Test the odd-sized keys. */
|
| 931 | if ((ret = test_odd_sized_keys()) < 0)
|
| 932 | return ret;
|
| 933 | return SUCCESS;
|
| 934 | }
|
| 935 |
|
| 936 |
|
| 937 | /*
|
| 938 | * And now, the actual Twofish implementation.
|
| 939 | *
|
| 940 | * This implementation generates all the tables during initialisation.
|
| 941 | * I don't like large tables in the code, especially since they are easily
|
| 942 | * damaged in the source without anyone noticing it. You need code to
|
| 943 | * generate them anyway, and this way all the code is close together.
|
| 944 | * Generating them in the application leads to a smaller executable
|
| 945 | * (the code is smaller than the tables it generates) and a
|
| 946 | * larger static memory footprint.
|
| 947 | *
|
| 948 | * Twofish can be implemented in many ways. I have chosen to
|
| 949 | * use large tables with a relatively long key setup time.
|
| 950 | * If you encrypt more than a few blocks of data it pays to pre-compute
|
| 951 | * as much as possible. This implementation is relatively inefficient for
|
| 952 | * applications that need to re-key every block or so.
|
| 953 | */
|
| 954 |
|
| 955 | /*
|
| 956 | * We start with the t-tables, directly from the Twofish definition.
|
| 957 | * These are nibble-tables, but merging them and putting them two nibbles
|
| 958 | * in one byte is more work than it is worth.
|
| 959 | */
|
| 960 | static Twofish_Byte t_table[2][4][16] = {
|
| 961 | {
|
| 962 | {0x8,0x1,0x7,0xD,0x6,0xF,0x3,0x2,0x0,0xB,0x5,0x9,0xE,0xC,0xA,0x4},
|
| 963 | {0xE,0xC,0xB,0x8,0x1,0x2,0x3,0x5,0xF,0x4,0xA,0x6,0x7,0x0,0x9,0xD},
|
| 964 | {0xB,0xA,0x5,0xE,0x6,0xD,0x9,0x0,0xC,0x8,0xF,0x3,0x2,0x4,0x7,0x1},
|
| 965 | {0xD,0x7,0xF,0x4,0x1,0x2,0x6,0xE,0x9,0xB,0x3,0x0,0x8,0x5,0xC,0xA}
|
| 966 | },
|
| 967 | {
|
| 968 | {0x2,0x8,0xB,0xD,0xF,0x7,0x6,0xE,0x3,0x1,0x9,0x4,0x0,0xA,0xC,0x5},
|
| 969 | {0x1,0xE,0x2,0xB,0x4,0xC,0x3,0x7,0x6,0xD,0xA,0x5,0xF,0x9,0x0,0x8},
|
| 970 | {0x4,0xC,0x7,0x5,0x1,0x6,0x9,0xA,0x0,0xE,0xD,0x8,0x2,0xB,0x3,0xF},
|
| 971 | {0xB,0x9,0x5,0x1,0xC,0x3,0xD,0xE,0x6,0x4,0x7,0xF,0x2,0x0,0x8,0xA}
|
| 972 | }
|
| 973 | };
|
| 974 |
|
| 975 |
|
| 976 | /* A 1-bit rotation of 4-bit values. Input must be in range 0..15 */
|
| 977 | #define ROR4BY1( x ) (((x)>>1) | (((x)<<3) & 0x8) )
|
| 978 |
|
| 979 | /*
|
| 980 | * The q-boxes are only used during the key schedule computations.
|
| 981 | * These are 8->8 bit lookup tables. Some CPUs prefer to have 8->32 bit
|
| 982 | * lookup tables as it is faster to load a 32-bit value than to load an
|
| 983 | * 8-bit value and zero the rest of the register.
|
| 984 | * The LARGE_Q_TABLE switch allows you to choose 32-bit entries in
|
| 985 | * the q-tables. Here we just define the Qtype which is used to store
|
| 986 | * the entries of the q-tables.
|
| 987 | */
|
| 988 | #if LARGE_Q_TABLE
|
| 989 | typedef Twofish_UInt32 Qtype;
|
| 990 | #else
|
| 991 | typedef Twofish_Byte Qtype;
|
| 992 | #endif
|
| 993 |
|
| 994 | /*
|
| 995 | * The actual q-box tables.
|
| 996 | * There are two q-boxes, each having 256 entries.
|
| 997 | */
|
| 998 | static Qtype q_table[2][256];
|
| 999 |
|
| 1000 |
|
| 1001 | /*
|
| 1002 | * Now the function that converts a single t-table into a q-table.
|
| 1003 | *
|
| 1004 | * Arguments:
|
| 1005 | * t[4][16] : four 4->4bit lookup tables that define the q-box
|
| 1006 | * q[256] : output parameter: the resulting q-box as a lookup table.
|
| 1007 | */
|
| 1008 | static void make_q_table( Twofish_Byte t[4][16], Qtype q[256] )
|
| 1009 | {
|
| 1010 | int ae,be,ao,bo; /* Some temporaries. */
|
| 1011 | int i;
|
| 1012 | /* Loop over all input values and compute the q-box result. */
|
| 1013 | for( i=0; i<256; i++ ) {
|
| 1014 | /*
|
| 1015 | * This is straight from the Twofish specifications.
|
| 1016 | *
|
| 1017 | * The ae variable is used for the a_i values from the specs
|
| 1018 | * with even i, and ao for the odd i's. Similarly for the b's.
|
| 1019 | */
|
| 1020 | ae = i>>4; be = i&0xf;
|
| 1021 | ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);
|
| 1022 | ae = t[0][ao]; be = t[1][bo];
|
| 1023 | ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);
|
| 1024 | ae = t[2][ao]; be = t[3][bo];
|
| 1025 |
|
| 1026 | /* Store the result in the q-box table, the cast avoids a warning. */
|
| 1027 | q[i] = (Qtype) ((be<<4) | ae);
|
| 1028 | }
|
| 1029 | }
|
| 1030 |
|
| 1031 |
|
| 1032 | /*
|
| 1033 | * Initialise both q-box tables.
|
| 1034 | */
|
| 1035 | static void initialise_q_boxes() {
|
| 1036 | /* Initialise each of the q-boxes using the t-tables */
|
| 1037 | make_q_table( t_table[0], q_table[0] );
|
| 1038 | make_q_table( t_table[1], q_table[1] );
|
| 1039 | }
|
| 1040 |
|
| 1041 |
|
| 1042 | /*
|
| 1043 | * Next up is the MDS matrix multiplication.
|
| 1044 | * The MDS matrix multiplication operates in the field
|
| 1045 | * GF(2)[x]/p(x) with p(x)=x^8+x^6+x^5+x^3+1.
|
| 1046 | * If you don't understand this, read a book on finite fields. You cannot
|
| 1047 | * follow the finite-field computations without some background.
|
| 1048 | *
|
| 1049 | * In this field, multiplication by x is easy: shift left one bit
|
| 1050 | * and if bit 8 is set then xor the result with 0x169.
|
| 1051 | *
|
| 1052 | * The MDS coefficients use a multiplication by 1/x,
|
| 1053 | * or rather a division by x. This is easy too: first make the
|
| 1054 | * value 'even' (i.e. bit 0 is zero) by xorring with 0x169 if necessary,
|
| 1055 | * and then shift right one position.
|
| 1056 | * Even easier: shift right and xor with 0xb4 if the lsbit was set.
|
| 1057 | *
|
| 1058 | * The MDS coefficients are 1, EF, and 5B, and we use the fact that
|
| 1059 | * EF = 1 + 1/x + 1/x^2
|
| 1060 | * 5B = 1 + 1/x^2
|
| 1061 | * in this field. This makes multiplication by EF and 5B relatively easy.
|
| 1062 | *
|
| 1063 | * This property is no accident, the MDS matrix was designed to allow
|
| 1064 | * this implementation technique to be used.
|
| 1065 | *
|
| 1066 | * We have four MDS tables, each mapping 8 bits to 32 bits.
|
| 1067 | * Each table performs one column of the matrix multiplication.
|
| 1068 | * As the MDS is always preceded by q-boxes, each of these tables
|
| 1069 | * also implements the q-box just previous to that column.
|
| 1070 | */
|
| 1071 |
|
| 1072 | /* The actual MDS tables. */
|
| 1073 | static Twofish_UInt32 MDS_table[4][256];
|
| 1074 |
|
| 1075 | /* A small table to get easy conditional access to the 0xb4 constant. */
|
| 1076 | static Twofish_UInt32 mds_poly_divx_const[] = {0,0xb4};
|
| 1077 |
|
| 1078 | /* Function to initialise the MDS tables. */
|
| 1079 | static void initialise_mds_tables()
|
| 1080 | {
|
| 1081 | int i;
|
| 1082 | Twofish_UInt32 q,qef,q5b; /* Temporary variables. */
|
| 1083 |
|
| 1084 | /* Loop over all 8-bit input values */
|
| 1085 | for( i=0; i<256; i++ )
|
| 1086 | {
|
| 1087 | /*
|
| 1088 | * To save some work during the key expansion we include the last
|
| 1089 | * of the q-box layers from the h() function in these MDS tables.
|
| 1090 | */
|
| 1091 |
|
| 1092 | /* We first do the inputs that are mapped through the q0 table. */
|
| 1093 | q = q_table[0][i];
|
| 1094 | /*
|
| 1095 | * Here we divide by x, note the table to get 0xb4 only if the
|
| 1096 | * lsbit is set.
|
| 1097 | * This sets qef = (1/x)*q in the finite field
|
| 1098 | */
|
| 1099 | qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ];
|
| 1100 | /*
|
| 1101 | * Divide by x again, and add q to get (1+1/x^2)*q.
|
| 1102 | * Note that (1+1/x^2) = 5B in the field, and addition in the field
|
| 1103 | * is exclusive or on the bits.
|
| 1104 | */
|
| 1105 | q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q;
|
| 1106 | /*
|
| 1107 | * Add q5b to qef to set qef = (1+1/x+1/x^2)*q.
|
| 1108 | * Again, (1+1/x+1/x^2) = EF in the field.
|
| 1109 | */
|
| 1110 | qef ^= q5b;
|
| 1111 |
|
| 1112 | /*
|
| 1113 | * Now that we have q5b = 5B * q and qef = EF * q
|
| 1114 | * we can fill two of the entries in the MDS matrix table.
|
| 1115 | * See the Twofish specifications for the order of the constants.
|
| 1116 | */
|
| 1117 | MDS_table[1][i] = (q <<24) | (q5b<<16) | (qef<<8) | qef;
|
| 1118 | MDS_table[3][i] = (q5b<<24) | (qef<<16) | (q <<8) | q5b;
|
| 1119 |
|
| 1120 | /* Now we do it all again for the two columns that have a q1 box. */
|
| 1121 | q = q_table[1][i];
|
| 1122 | qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ];
|
| 1123 | q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q;
|
| 1124 | qef ^= q5b;
|
| 1125 |
|
| 1126 | /* The other two columns use the coefficient in a different order. */
|
| 1127 | MDS_table[0][i] = (qef<<24) | (qef<<16) | (q5b<<8) | q ;
|
| 1128 | MDS_table[2][i] = (qef<<24) | (q <<16) | (qef<<8) | q5b;
|
| 1129 | }
|
| 1130 | }
|
| 1131 |
|
| 1132 |
|
| 1133 | /*
|
| 1134 | * The h() function is the heart of the Twofish cipher.
|
| 1135 | * It is a complicated sequence of q-box lookups, key material xors,
|
| 1136 | * and finally the MDS matrix.
|
| 1137 | * We use lots of macros to make this reasonably fast.
|
| 1138 | */
|
| 1139 |
|
| 1140 | /* First a shorthand for the two q-tables */
|
| 1141 | #define q0 q_table[0]
|
| 1142 | #define q1 q_table[1]
|
| 1143 |
|
| 1144 | /*
|
| 1145 | * Each macro computes one column of the h for either 2, 3, or 4 stages.
|
| 1146 | * As there are 4 columns, we have 12 macros in all.
|
| 1147 | *
|
| 1148 | * The key bytes are stored in the Byte array L at offset
|
| 1149 | * 0,1,2,3, 8,9,10,11, [16,17,18,19, [24,25,26,27]] as this is the
|
| 1150 | * order we get the bytes from the user. If you look at the Twofish
|
| 1151 | * specs, you'll see that h() is applied to the even key words or the
|
| 1152 | * odd key words. The bytes of the even words appear in this spacing,
|
| 1153 | * and those of the odd key words too.
|
| 1154 | *
|
| 1155 | * These macros are the only place where the q-boxes and the MDS table
|
| 1156 | * are used.
|
| 1157 | */
|
| 1158 | #define H02( y, L ) MDS_table[0][q0[q0[y]^L[ 8]]^L[0]]
|
| 1159 | #define H12( y, L ) MDS_table[1][q0[q1[y]^L[ 9]]^L[1]]
|
| 1160 | #define H22( y, L ) MDS_table[2][q1[q0[y]^L[10]]^L[2]]
|
| 1161 | #define H32( y, L ) MDS_table[3][q1[q1[y]^L[11]]^L[3]]
|
| 1162 | #define H03( y, L ) H02( q1[y]^L[16], L )
|
| 1163 | #define H13( y, L ) H12( q1[y]^L[17], L )
|
| 1164 | #define H23( y, L ) H22( q0[y]^L[18], L )
|
| 1165 | #define H33( y, L ) H32( q0[y]^L[19], L )
|
| 1166 | #define H04( y, L ) H03( q1[y]^L[24], L )
|
| 1167 | #define H14( y, L ) H13( q0[y]^L[25], L )
|
| 1168 | #define H24( y, L ) H23( q0[y]^L[26], L )
|
| 1169 | #define H34( y, L ) H33( q1[y]^L[27], L )
|
| 1170 |
|
| 1171 | /*
|
| 1172 | * Now we can define the h() function given an array of key bytes.
|
| 1173 | * This function is only used in the key schedule, and not to pre-compute
|
| 1174 | * the keyed S-boxes.
|
| 1175 | *
|
| 1176 | * In the key schedule, the input is always of the form k*(1+2^8+2^16+2^24)
|
| 1177 | * so we only provide k as an argument.
|
| 1178 | *
|
| 1179 | * Arguments:
|
| 1180 | * k input to the h() function.
|
| 1181 | * L pointer to array of key bytes at
|
| 1182 | * offsets 0,1,2,3, ... 8,9,10,11, [16,17,18,19, [24,25,26,27]]
|
| 1183 | * kCycles # key cycles, 2, 3, or 4.
|
| 1184 | */
|
| 1185 | static Twofish_UInt32 h( int k, Twofish_Byte L[], int kCycles )
|
| 1186 | {
|
| 1187 | switch( kCycles ) {
|
| 1188 | /* We code all 3 cases separately for speed reasons. */
|
| 1189 | case 2:
|
| 1190 | return H02(k,L) ^ H12(k,L) ^ H22(k,L) ^ H32(k,L);
|
| 1191 | case 3:
|
| 1192 | return H03(k,L) ^ H13(k,L) ^ H23(k,L) ^ H33(k,L);
|
| 1193 | case 4:
|
| 1194 | return H04(k,L) ^ H14(k,L) ^ H24(k,L) ^ H34(k,L);
|
| 1195 | default:
|
| 1196 | /* This is always a coding error, which is fatal. */
|
| 1197 | Twofish_fatal( "Twofish h(): Illegal argument", ERR_ILL_ARG );
|
| 1198 | return ERR_ILL_ARG;
|
| 1199 | }
|
| 1200 | }
|
| 1201 |
|
| 1202 |
|
| 1203 | /*
|
| 1204 | * Pre-compute the keyed S-boxes.
|
| 1205 | * Fill the pre-computed S-box array in the expanded key structure.
|
| 1206 | * Each pre-computed S-box maps 8 bits to 32 bits.
|
| 1207 | *
|
| 1208 | * The S argument contains half the number of bytes of the full key, but is
|
| 1209 | * derived from the full key. (See Twofish specifications for details.)
|
| 1210 | * S has the weird byte input order used by the Hxx macros.
|
| 1211 | *
|
| 1212 | * This function takes most of the time of a key expansion.
|
| 1213 | *
|
| 1214 | * Arguments:
|
| 1215 | * S pointer to array of 8*kCycles Bytes containing the S vector.
|
| 1216 | * kCycles number of key words, must be in the set {2,3,4}
|
| 1217 | * xkey pointer to Twofish_key structure that will contain the S-boxes.
|
| 1218 | */
|
| 1219 | static int fill_keyed_sboxes( Twofish_Byte S[], int kCycles, Twofish_key * xkey )
|
| 1220 | {
|
| 1221 | int i;
|
| 1222 | switch( kCycles ) {
|
| 1223 | /* We code all 3 cases separately for speed reasons. */
|
| 1224 | case 2:
|
| 1225 | for( i=0; i<256; i++ )
|
| 1226 | {
|
| 1227 | xkey->s[0][i]= H02( i, S );
|
| 1228 | xkey->s[1][i]= H12( i, S );
|
| 1229 | xkey->s[2][i]= H22( i, S );
|
| 1230 | xkey->s[3][i]= H32( i, S );
|
| 1231 | }
|
| 1232 | break;
|
| 1233 | case 3:
|
| 1234 | for( i=0; i<256; i++ )
|
| 1235 | {
|
| 1236 | xkey->s[0][i]= H03( i, S );
|
| 1237 | xkey->s[1][i]= H13( i, S );
|
| 1238 | xkey->s[2][i]= H23( i, S );
|
| 1239 | xkey->s[3][i]= H33( i, S );
|
| 1240 | }
|
| 1241 | break;
|
| 1242 | case 4:
|
| 1243 | for( i=0; i<256; i++ )
|
| 1244 | {
|
| 1245 | xkey->s[0][i]= H04( i, S );
|
| 1246 | xkey->s[1][i]= H14( i, S );
|
| 1247 | xkey->s[2][i]= H24( i, S );
|
| 1248 | xkey->s[3][i]= H34( i, S );
|
| 1249 | }
|
| 1250 | break;
|
| 1251 | default:
|
| 1252 | /* This is always a coding error, which is fatal. */
|
| 1253 | Twofish_fatal( "Twofish fill_keyed_sboxes(): Illegal argument", ERR_ILL_ARG );
|
| 1254 | }
|
| 1255 | return SUCCESS;
|
| 1256 | }
|
| 1257 |
|
| 1258 |
|
| 1259 | /* A flag to keep track of whether we have been initialised or not. */
|
| 1260 | static int Twofish_initialised = 0;
|
| 1261 |
|
| 1262 | /*
|
| 1263 | * Initialise the Twofish implementation.
|
| 1264 | * This function must be called before any other function in the
|
| 1265 | * Twofish implementation is called.
|
| 1266 | * This routine also does some sanity checks, to make sure that
|
| 1267 | * all the macros behave, and it tests the whole cipher.
|
| 1268 | */
|
| 1269 | int Twofish_initialise()
|
| 1270 | {
|
| 1271 | int ret;
|
| 1272 | /* First test the various platform-specific definitions. */
|
| 1273 | if ((ret = test_platform()) < 0)
|
| 1274 | return ret;
|
| 1275 |
|
| 1276 | /* We can now generate our tables, in the right order of course. */
|
| 1277 | initialise_q_boxes();
|
| 1278 | initialise_mds_tables();
|
| 1279 |
|
| 1280 | /* We're finished with the initialisation itself. */
|
| 1281 | Twofish_initialised = 1;
|
| 1282 |
|
| 1283 | /*
|
| 1284 | * And run some tests on the whole cipher.
|
| 1285 | * Yes, you need to do this every time you start your program.
|
| 1286 | * It is called assurance; you have to be certain that your program
|
| 1287 | * still works properly.
|
| 1288 | */
|
| 1289 | return self_test();
|
| 1290 | }
|
| 1291 |
|
| 1292 |
|
| 1293 | /*
|
| 1294 | * The Twofish key schedule uses an Reed-Solomon code matrix multiply.
|
| 1295 | * Just like the MDS matrix, the RS-matrix is designed to be easy
|
| 1296 | * to implement. Details are below in the code.
|
| 1297 | *
|
| 1298 | * These constants make it easy to compute in the finite field used
|
| 1299 | * for the RS code.
|
| 1300 | *
|
| 1301 | * We use Bytes for the RS computation, but these are automatically
|
| 1302 | * widened to unsigned integers in the expressions. Having unsigned
|
| 1303 | * ints in these tables therefore provides the fastest access.
|
| 1304 | */
|
| 1305 | static unsigned int rs_poly_const[] = {0, 0x14d};
|
| 1306 | static unsigned int rs_poly_div_const[] = {0, 0xa6 };
|
| 1307 |
|
| 1308 |
|
| 1309 | /*
|
| 1310 | * Prepare a key for use in encryption and decryption.
|
| 1311 | * Like most block ciphers, Twofish allows the key schedule
|
| 1312 | * to be pre-computed given only the key.
|
| 1313 | * Twofish has a fairly 'heavy' key schedule that takes a lot of time
|
| 1314 | * to compute. The main work is pre-computing the S-boxes used in the
|
| 1315 | * encryption and decryption. We feel that this makes the cipher much
|
| 1316 | * harder to attack. The attacker doesn't even know what the S-boxes
|
| 1317 | * contain without including the entire key schedule in the analysis.
|
| 1318 | *
|
| 1319 | * Unlike most Twofish implementations, this one allows any key size from
|
| 1320 | * 0 to 32 bytes. Odd key sizes are defined for Twofish (see the
|
| 1321 | * specifications); the key is simply padded with zeroes to the next real
|
| 1322 | * key size of 16, 24, or 32 bytes.
|
| 1323 | * Each odd-sized key is thus equivalent to a single normal-sized key.
|
| 1324 | *
|
| 1325 | * Arguments:
|
| 1326 | * key array of key bytes
|
| 1327 | * key_len number of bytes in the key, must be in the range 0,...,32.
|
| 1328 | * xkey Pointer to an Twofish_key structure that will be filled
|
| 1329 | * with the internal form of the cipher key.
|
| 1330 | */
|
| 1331 | int Twofish_prepare_key( Twofish_Byte key[], int key_len, Twofish_key * xkey )
|
| 1332 | {
|
| 1333 | /* We use a single array to store all key material in,
|
| 1334 | * to simplify the wiping of the key material at the end.
|
| 1335 | * The first 32 bytes contain the actual (padded) cipher key.
|
| 1336 | * The next 32 bytes contain the S-vector in its weird format,
|
| 1337 | * and we have 4 bytes of overrun necessary for the RS-reduction.
|
| 1338 | */
|
| 1339 | Twofish_Byte K[32+32+4];
|
| 1340 |
|
| 1341 | int kCycles; /* # key cycles, 2,3, or 4. */
|
| 1342 |
|
| 1343 | int i;
|
| 1344 | Twofish_UInt32 A, B; /* Used to compute the round keys. */
|
| 1345 |
|
| 1346 | Twofish_Byte * kptr; /* Three pointers for the RS computation. */
|
| 1347 | Twofish_Byte * sptr;
|
| 1348 | Twofish_Byte * t;
|
| 1349 |
|
| 1350 | Twofish_Byte b,bx,bxx; /* Some more temporaries for the RS computation. */
|
| 1351 |
|
| 1352 | /* Check that the Twofish implementation was initialised. */
|
| 1353 | if( Twofish_initialised == 0 )
|
| 1354 | {
|
| 1355 | /*
|
| 1356 | * You didn't call Twofish_initialise before calling this routine.
|
| 1357 | * This is a programming error, and therefore we call the fatal
|
| 1358 | * routine.
|
| 1359 | *
|
| 1360 | * I could of course call the initialisation routine here,
|
| 1361 | * but there are a few reasons why I don't. First of all, the
|
| 1362 | * self-tests have to be done at startup. It is no good to inform
|
| 1363 | * the user that the cipher implementation fails when he wants to
|
| 1364 | * write his data to disk in encrypted form. You have to warn him
|
| 1365 | * before he spends time typing his data. Second, the initialisation
|
| 1366 | * and self test are much slower than a single key expansion.
|
| 1367 | * Calling the initialisation here makes the performance of the
|
| 1368 | * cipher unpredictable. This can lead to really weird problems
|
| 1369 | * if you use the cipher for a real-time task. Suddenly it fails
|
| 1370 | * once in a while the first time you try to use it. Things like
|
| 1371 | * that are almost impossible to debug.
|
| 1372 | */
|
| 1373 | /* Twofish_fatal( "Twofish implementation was not initialised.", ERR_INIT ); */
|
| 1374 |
|
| 1375 | /*
|
| 1376 | * There is always a danger that the Twofish_fatal routine returns,
|
| 1377 | * in spite of the specifications that it should not.
|
| 1378 | * (A good programming rule: don't trust the rest of the code.)
|
| 1379 | * This would be disasterous. If the q-tables and MDS-tables have
|
| 1380 | * not been initialised, they are probably still filled with zeroes.
|
| 1381 | * Suppose the MDS-tables are all zero. The key expansion would then
|
| 1382 | * generate all-zero round keys, and all-zero s-boxes. The danger
|
| 1383 | * is that nobody would notice as the encry
|
| 1384 | * mangles the input, and the decryption still 'decrypts' it,
|
| 1385 | * but now in a completely key-independent manner.
|
| 1386 | * To stop such security disasters, we use blunt force.
|
| 1387 | * If your program hangs here: fix the fatal routine!
|
| 1388 | */
|
| 1389 | for(;;); /* Infinite loop, which beats being insecure. */
|
| 1390 | }
|
| 1391 |
|
| 1392 | /* Check for valid key length. */
|
| 1393 | if( key_len < 0 || key_len > 32 )
|
| 1394 | {
|
| 1395 | /*
|
| 1396 | * This can only happen if a programmer didn't read the limitations
|
| 1397 | * on the key size.
|
| 1398 | */
|
| 1399 | Twofish_fatal( "Twofish_prepare_key: illegal key length", ERR_KEY_LEN );
|
| 1400 | /*
|
| 1401 | * A return statement just in case the fatal macro returns.
|
| 1402 | * The rest of the code assumes that key_len is in range, and would
|
| 1403 | * buffer-overflow if it wasn't.
|
| 1404 | *
|
| 1405 | * Why do we still use a programming language that has problems like
|
| 1406 | * buffer overflows, when these problems were solved in 1960 with
|
| 1407 | * the development of Algol? Have we not leared anything?
|
| 1408 | */
|
| 1409 | return ERR_KEY_LEN;
|
| 1410 | }
|
| 1411 |
|
| 1412 | /* Pad the key with zeroes to the next suitable key length. */
|
| 1413 | memcpy( K, key, key_len );
|
| 1414 | memset( K+key_len, 0, sizeof(K)-key_len );
|
| 1415 |
|
| 1416 | /*
|
| 1417 | * Compute kCycles: the number of key cycles used in the cipher.
|
| 1418 | * 2 for 128-bit keys, 3 for 192-bit keys, and 4 for 256-bit keys.
|
| 1419 | */
|
| 1420 | kCycles = (key_len + 7) >> 3;
|
| 1421 | /* Handle the special case of very short keys: minimum 2 cycles. */
|
| 1422 | if( kCycles < 2 )
|
| 1423 | {
|
| 1424 | kCycles = 2;
|
| 1425 | }
|
| 1426 |
|
| 1427 | /*
|
| 1428 | * From now on we just pretend to have 8*kCycles bytes of
|
| 1429 | * key material in K. This handles all the key size cases.
|
| 1430 | */
|
| 1431 |
|
| 1432 | /*
|
| 1433 | * We first compute the 40 expanded key words,
|
| 1434 | * formulas straight from the Twofish specifications.
|
| 1435 | */
|
| 1436 | for( i=0; i<40; i+=2 )
|
| 1437 | {
|
| 1438 | /*
|
| 1439 | * Due to the byte spacing expected by the h() function
|
| 1440 | * we can pick the bytes directly from the key K.
|
| 1441 | * As we use bytes, we never have the little/big endian
|
| 1442 | * problem.
|
| 1443 | *
|
| 1444 | * Note that we apply the rotation function only to simple
|
| 1445 | * variables, as the rotation macro might evaluate its argument
|
| 1446 | * more than once.
|
| 1447 | */
|
| 1448 | A = h( i , K , kCycles );
|
| 1449 | B = h( i+1, K+4, kCycles );
|
| 1450 | B = ROL32( B, 8 );
|
| 1451 |
|
| 1452 | /* Compute and store the round keys. */
|
| 1453 | A += B;
|
| 1454 | B += A;
|
| 1455 | xkey->K[i] = A;
|
| 1456 | xkey->K[i+1] = ROL32( B, 9 );
|
| 1457 | }
|
| 1458 |
|
| 1459 | /* Wipe variables that contained key material. */
|
| 1460 | A=B=0;
|
| 1461 |
|
| 1462 | /*
|
| 1463 | * And now the dreaded RS multiplication that few seem to understand.
|
| 1464 | * The RS matrix is not random, and is specially designed to compute the
|
| 1465 | * RS matrix multiplication in a simple way.
|
| 1466 | *
|
| 1467 | * We work in the field GF(2)[x]/x^8+x^6+x^3+x^2+1. Note that this is a
|
| 1468 | * different field than used for the MDS matrix.
|
| 1469 | * (At least, it is a different representation because all GF(2^8)
|
| 1470 | * representations are equivalent in some form.)
|
| 1471 | *
|
| 1472 | * We take 8 consecutive bytes of the key and interpret them as
|
| 1473 | * a polynomial k_0 + k_1 y + k_2 y^2 + ... + k_7 y^7 where
|
| 1474 | * the k_i bytes are the key bytes and are elements of the finite field.
|
| 1475 | * We multiply this polynomial by y^4 and reduce it modulo
|
| 1476 | * y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1.
|
| 1477 | * using straightforward polynomial modulo reduction.
|
| 1478 | * The coefficients of the result are the result of the RS
|
| 1479 | * matrix multiplication. When we wrote the Twofish specification,
|
| 1480 | * the original RS definition used the polynomials,
|
| 1481 | * but that requires much more mathematical knowledge.
|
| 1482 | * We were already using matrix multiplication in a finite field for
|
| 1483 | * the MDS matrix, so I re-wrote the RS operation as a matrix
|
| 1484 | * multiplication to reduce the difficulty of understanding it.
|
| 1485 | * Some implementors have not picked up on this simpler method of
|
| 1486 | * computing the RS operation, even though it is mentioned in the
|
| 1487 | * specifications.
|
| 1488 | *
|
| 1489 | * It is possible to perform these computations faster by using 32-bit
|
| 1490 | * word operations, but that is not portable and this is not a speed-
|
| 1491 | * critical area.
|
| 1492 | *
|
| 1493 | * We explained the 1/x computation when we did the MDS matrix.
|
| 1494 | *
|
| 1495 | * The S vector is stored in K[32..64].
|
| 1496 | * The S vector has to be reversed, so we loop cross-wise.
|
| 1497 | *
|
| 1498 | * Note the weird byte spacing of the S-vector, to match the even
|
| 1499 | * or odd key words arrays. See the discussion at the Hxx macros for
|
| 1500 | * details.
|
| 1501 | */
|
| 1502 | kptr = K + 8*kCycles; /* Start at end of key */
|
| 1503 | sptr = K + 32; /* Start at start of S */
|
| 1504 |
|
| 1505 | /* Loop over all key material */
|
| 1506 | while( kptr > K )
|
| 1507 | {
|
| 1508 | kptr -= 8;
|
| 1509 | /*
|
| 1510 | * Initialise the polynimial in sptr[0..12]
|
| 1511 | * The first four coefficients are 0 as we have to multiply by y^4.
|
| 1512 | * The next 8 coefficients are from the key material.
|
| 1513 | */
|
| 1514 | memset( sptr, 0, 4 );
|
| 1515 | memcpy( sptr+4, kptr, 8 );
|
| 1516 |
|
| 1517 | /*
|
| 1518 | * The 12 bytes starting at sptr are now the coefficients of
|
| 1519 | * the polynomial we need to reduce.
|
| 1520 | */
|
| 1521 |
|
| 1522 | /* Loop over the polynomial coefficients from high to low */
|
| 1523 | t = sptr+11;
|
| 1524 | /* Keep looping until polynomial is degree 3; */
|
| 1525 | while( t > sptr+3 )
|
| 1526 | {
|
| 1527 | /* Pick up the highest coefficient of the poly. */
|
| 1528 | b = *t;
|
| 1529 |
|
| 1530 | /*
|
| 1531 | * Compute x and (x+1/x) times this coefficient.
|
| 1532 | * See the MDS matrix implementation for a discussion of
|
| 1533 | * multiplication by x and 1/x. We just use different
|
| 1534 | * constants here as we are in a
|
| 1535 | * different finite field representation.
|
| 1536 | *
|
| 1537 | * These two statements set
|
| 1538 | * bx = (x) * b
|
| 1539 | * bxx= (x + 1/x) * b
|
| 1540 | */
|
| 1541 | bx = (Twofish_Byte)((b<<1) ^ rs_poly_const[ b>>7 ]);
|
| 1542 | bxx= (Twofish_Byte)((b>>1) ^ rs_poly_div_const[ b&1 ] ^ bx);
|
| 1543 |
|
| 1544 | /*
|
| 1545 | * Subtract suitable multiple of
|
| 1546 | * y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1
|
| 1547 | * from the polynomial, except that we don't bother
|
| 1548 | * updating t[0] as it will become zero anyway.
|
| 1549 | */
|
| 1550 | t[-1] ^= bxx;
|
| 1551 | t[-2] ^= bx;
|
| 1552 | t[-3] ^= bxx;
|
| 1553 | t[-4] ^= b;
|
| 1554 |
|
| 1555 | /* Go to the next coefficient. */
|
| 1556 | t--;
|
| 1557 | }
|
| 1558 |
|
| 1559 | /* Go to next S-vector word, obeying the weird spacing rules. */
|
| 1560 | sptr += 8;
|
| 1561 | }
|
| 1562 |
|
| 1563 | /* Wipe variables that contained key material. */
|
| 1564 | b = bx = bxx = 0;
|
| 1565 |
|
| 1566 | /* And finally, we can compute the key-dependent S-boxes. */
|
| 1567 | fill_keyed_sboxes( &K[32], kCycles, xkey );
|
| 1568 |
|
| 1569 | /* Wipe array that contained key material. */
|
| 1570 | memset( K, 0, sizeof( K ) );
|
| 1571 | return SUCCESS;
|
| 1572 | }
|
| 1573 |
|
| 1574 |
|
| 1575 | /*
|
| 1576 | * We can now start on the actual encryption and decryption code.
|
| 1577 | * As these are often speed-critical we will use a lot of macros.
|
| 1578 | */
|
| 1579 |
|
| 1580 | /*
|
| 1581 | * The g() function is the heart of the round function.
|
| 1582 | * We have two versions of the g() function, one without an input
|
| 1583 | * rotation and one with.
|
| 1584 | * The pre-computed S-boxes make this pretty simple.
|
| 1585 | */
|
| 1586 | #define g0(X,xkey) \
|
| 1587 | (xkey->s[0][b0(X)]^xkey->s[1][b1(X)]^xkey->s[2][b2(X)]^xkey->s[3][b3(X)])
|
| 1588 |
|
| 1589 | #define g1(X,xkey) \
|
| 1590 | (xkey->s[0][b3(X)]^xkey->s[1][b0(X)]^xkey->s[2][b1(X)]^xkey->s[3][b2(X)])
|
| 1591 |
|
| 1592 | /*
|
| 1593 | * A single round of Twofish. The A,B,C,D are the four state variables,
|
| 1594 | * T0 and T1 are temporaries, xkey is the expanded key, and r the
|
| 1595 | * round number.
|
| 1596 | *
|
| 1597 | * Note that this macro does not implement the swap at the end of the round.
|
| 1598 | */
|
| 1599 | #define ENCRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \
|
| 1600 | T0 = g0(A,xkey); T1 = g1(B,xkey);\
|
| 1601 | C ^= T0+T1+xkey->K[8+2*(r)]; C = ROR32(C,1);\
|
| 1602 | D = ROL32(D,1); D ^= T0+2*T1+xkey->K[8+2*(r)+1]
|
| 1603 |
|
| 1604 | /*
|
| 1605 | * Encrypt a single cycle, consisting of two rounds.
|
| 1606 | * This avoids the swapping of the two halves.
|
| 1607 | * Parameter r is now the cycle number.
|
| 1608 | */
|
| 1609 | #define ENCRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \
|
| 1610 | ENCRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r) );\
|
| 1611 | ENCRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r)+1 )
|
| 1612 |
|
| 1613 | /* Full 16-round encryption */
|
| 1614 | #define ENCRYPT( A,B,C,D,T0,T1,xkey ) \
|
| 1615 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 );\
|
| 1616 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\
|
| 1617 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\
|
| 1618 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\
|
| 1619 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\
|
| 1620 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\
|
| 1621 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\
|
| 1622 | ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 )
|
| 1623 |
|
| 1624 | /*
|
| 1625 | * A single round of Twofish for decryption. It differs from
|
| 1626 | * ENCRYTP_RND only because of the 1-bit rotations.
|
| 1627 | */
|
| 1628 | #define DECRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \
|
| 1629 | T0 = g0(A,xkey); T1 = g1(B,xkey);\
|
| 1630 | C = ROL32(C,1); C ^= T0+T1+xkey->K[8+2*(r)];\
|
| 1631 | D ^= T0+2*T1+xkey->K[8+2*(r)+1]; D = ROR32(D,1)
|
| 1632 |
|
| 1633 | /*
|
| 1634 | * Decrypt a single cycle, consisting of two rounds.
|
| 1635 | * This avoids the swapping of the two halves.
|
| 1636 | * Parameter r is now the cycle number.
|
| 1637 | */
|
| 1638 | #define DECRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \
|
| 1639 | DECRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r)+1 );\
|
| 1640 | DECRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r) )
|
| 1641 |
|
| 1642 | /* Full 16-round decryption. */
|
| 1643 | #define DECRYPT( A,B,C,D,T0,T1, xkey ) \
|
| 1644 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 );\
|
| 1645 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\
|
| 1646 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\
|
| 1647 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\
|
| 1648 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\
|
| 1649 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\
|
| 1650 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\
|
| 1651 | DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 )
|
| 1652 |
|
| 1653 | /*
|
| 1654 | * A macro to read the state from the plaintext and do the initial key xors.
|
| 1655 | * The koff argument allows us to use the same macro
|
| 1656 | * for the decryption which uses different key words at the start.
|
| 1657 | */
|
| 1658 | #define GET_INPUT( src, A,B,C,D, xkey, koff ) \
|
| 1659 | A = GET32(src )^xkey->K[ koff]; B = GET32(src+ 4)^xkey->K[1+koff]; \
|
| 1660 | C = GET32(src+ 8)^xkey->K[2+koff]; D = GET32(src+12)^xkey->K[3+koff]
|
| 1661 |
|
| 1662 | /*
|
| 1663 | * Similar macro to put the ciphertext in the output buffer.
|
| 1664 | * We xor the keys into the state variables before we use the PUT32
|
| 1665 | * macro as the macro might use its argument multiple times.
|
| 1666 | */
|
| 1667 | #define PUT_OUTPUT( A,B,C,D, dst, xkey, koff ) \
|
| 1668 | A ^= xkey->K[ koff]; B ^= xkey->K[1+koff]; \
|
| 1669 | C ^= xkey->K[2+koff]; D ^= xkey->K[3+koff]; \
|
| 1670 | PUT32( A, dst ); PUT32( B, dst+ 4 ); \
|
| 1671 | PUT32( C, dst+8 ); PUT32( D, dst+12 )
|
| 1672 |
|
| 1673 |
|
| 1674 | /*
|
| 1675 | * Twofish block encryption
|
| 1676 | *
|
| 1677 | * Arguments:
|
| 1678 | * xkey expanded key array
|
| 1679 | * p 16 bytes of plaintext
|
| 1680 | * c 16 bytes in which to store the ciphertext
|
| 1681 | */
|
| 1682 | void Twofish_encrypt( Twofish_key * xkey, Twofish_Byte p[16], Twofish_Byte c[16])
|
| 1683 | {
|
| 1684 | Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */
|
| 1685 |
|
| 1686 | /* Get the four plaintext words xorred with the key */
|
| 1687 | GET_INPUT( p, A,B,C,D, xkey, 0 );
|
| 1688 |
|
| 1689 | /* Do 8 cycles (= 16 rounds) */
|
| 1690 | ENCRYPT( A,B,C,D,T0,T1,xkey );
|
| 1691 |
|
| 1692 | /* Store them with the final swap and the output whitening. */
|
| 1693 | PUT_OUTPUT( C,D,A,B, c, xkey, 4 );
|
| 1694 | }
|
| 1695 |
|
| 1696 |
|
| 1697 | /*
|
| 1698 | * Twofish block decryption.
|
| 1699 | *
|
| 1700 | * Arguments:
|
| 1701 | * xkey expanded key array
|
| 1702 | * p 16 bytes of plaintext
|
| 1703 | * c 16 bytes in which to store the ciphertext
|
| 1704 | */
|
| 1705 | void Twofish_decrypt( Twofish_key * xkey, Twofish_Byte c[16], Twofish_Byte p[16])
|
| 1706 | {
|
| 1707 | Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */
|
| 1708 |
|
| 1709 | /* Get the four plaintext words xorred with the key */
|
| 1710 | GET_INPUT( c, A,B,C,D, xkey, 4 );
|
| 1711 |
|
| 1712 | /* Do 8 cycles (= 16 rounds) */
|
| 1713 | DECRYPT( A,B,C,D,T0,T1,xkey );
|
| 1714 |
|
| 1715 | /* Store them with the final swap and the output whitening. */
|
| 1716 | PUT_OUTPUT( C,D,A,B, p, xkey, 0 );
|
| 1717 | }
|
| 1718 |
|
| 1719 | /*
|
| 1720 | * Using the macros it is easy to make special routines for
|
| 1721 | * CBC mode, CTR mode etc. The only thing you might want to
|
| 1722 | * add is a XOR_PUT_OUTPUT which xors the outputs into the
|
| 1723 | * destinationa instead of overwriting the data. This requires
|
| 1724 | * a XOR_PUT32 macro as well, but that should all be trivial.
|
| 1725 | *
|
| 1726 | * I thought about including routines for the separate cipher
|
| 1727 | * modes here, but it is unclear which modes should be included,
|
| 1728 | * and each encryption or decryption routine takes up a lot of code space.
|
| 1729 | * Also, I don't have any test vectors for any cipher modes
|
| 1730 | * with Twofish.
|
| 1731 | */
|
| 1732 |
|
| 1733 |
|