Blame - jni/libzrtp/sources/cryptcommon/twofish.c - jami-client-android

blob: 03d27700328a0cf827da62e6f1e18a68af4047f6 [file] [log] [blame]

Alexandre Lision	7fd5d3d	2013-12-04 13:06:40 -0500	[diff] [blame]	1	/*
				2	* Fast, portable, and easy-to-use Twofish implementation,
				3	* Version 0.3.
				4	* Copyright (c) 2002 by Niels Ferguson.
				5	* (See further down for the almost-unrestricted licensing terms.)
				6	*
				7	* --------------------------------------------------------------------------
				8	* There are two files for this implementation:
				9	* - twofish.h, the header file.
				10	* - twofish.c, the code file.
				11	*
				12	* To incorporate this code into your program you should:
				13	* - Check the licensing terms further down in this comment.
				14	* - Fix the two type definitions in twofish.h to suit your platform.
				15	* - Fix a few definitions in twofish.c in the section marked
				16	* PLATFORM FIXES. There is one important ones that affects
				17	* functionality, and then a few definitions that you can optimise
				18	* for efficiency but those have no effect on the functionality.
				19	* Don't change anything else.
				20	* - Put the code in your project and compile it.
				21	*
				22	* To use this library you should:
				23	* - Call Twofish_initialise() in your program before any other function in
				24	* this library.
				25	* - Use Twofish_prepare_key(...) to convert a key to internal form.
				26	* - Use Twofish_encrypt(...) and Twofish_decrypt(...) to encrypt and decrypt
				27	* data.
				28	* See the comments in the header file for details on these functions.
				29	* --------------------------------------------------------------------------
				30	*
				31	* There are many Twofish implementation available for free on the web.
				32	* Most of them are hard to integrate into your own program.
				33	* As we like people to use our cipher, I thought I would make it easier.
				34	* Here is a free and easy-to-integrate Twofish implementation in C.
				35	* The latest version is always available from my personal home page at
				36	* http://niels.ferguson.net/
				37	*
				38	* Integrating library code into a project is difficult because the library
				39	* header files interfere with the project's header files and code.
				40	* And of course the project's header files interfere with the library code.
				41	* I've tried to resolve these problems here.
				42	* The header file of this implementation is very light-weight.
				43	* It contains two typedefs, a structure, and a few function declarations.
				44	* All names it defines start with "Twofish_".
				45	* The header file is therefore unlikely to cause problems in your project.
				46	* The code file of this implementation doesn't need to include the header
				47	* files of the project. There is thus no danger of the project interfering
				48	* with all the definitions and macros of the Twofish code.
				49	* In most situations, all you need to do is fill in a few platform-specific
				50	* definitions in the header file and code file,
				51	* and you should be able to run the Twofish code in your project.
				52	* I estimate it should take you less than an hour to integrate this code
				53	* into your project, most of it spent reading the comments telling you what
				54	* to do.
				55	*
				56	* For people using C++: it is very easy to wrap this library into a
				57	* TwofishKey class. One of the big advantages is that you can automate the
				58	* wiping of the key material in the destructor. I have not provided a C++
				59	* class because the interface depends too much on the abstract base class
				60	* you use for block ciphers in your program, which I don't know about.
				61	*
				62	* This implementation is designed for use on PC-class machines. It uses the
				63	* Twofish 'full' keying option which uses large tables. Total table size is
				64	* around 5-6 kB for static tables plus 4.5 kB for each pre-processed key.
				65	* If you need an implementation that uses less memory,
				66	* take a look at Brian Gladman's code on his web site:
				67	* http://fp.gladman.plus.com/cryptography_technology/aes/
				68	* He has code for all AES candidates.
				69	* His Twofish code has lots of options trading off table size vs. speed.
				70	* You can also take a look at the optimised code by Doug Whiting on the
				71	* Twofish web site
				72	* http://www.counterpane.com/twofish.html
				73	* which has loads of options.
				74	* I believe these existing implementations are harder to re-use because they
				75	* are not clean libraries and they impose requirements on the environment.
				76	* This implementation is very careful to minimise those,
				77	* and should be easier to integrate into any larger program.
				78	*
				79	* The default mode of this implementation is fully portable as it uses no
				80	* behaviour not defined in the C standard. (This is harder than you think.)
				81	* If you have any problems porting the default mode, please let me know
				82	* so that I can fix the problem. (But only if this code is at fault, I
				83	* don't fix compilers.)
				84	* Most of the platform fixes are related to non-portable but faster ways
				85	* of implementing certain functions.
				86	*
				87	* In general I've tried to make the code as fast as possible, at the expense
				88	* of memory and code size. However, C does impose limits, and this
				89	* implementation will be slower than an optimised assembler implementation.
				90	* But beware of assembler implementations: a good Pentium implementation
				91	* uses completely different code than a good Pentium II implementation.
				92	* You basically have to re-write the assembly code for every generation of
				93	* processor. Unless you are severely pressed for speed, stick with C.
				94	*
				95	* The initialisation routine of this implementation contains a self-test.
				96	* If initialisation succeeds without calling the fatal routine, then
				97	* the implementation works. I don't think you can break the implementation
				98	* in such a way that it still passes the tests, unless you are malicious.
				99	* In other words: if the initialisation routine returns,
				100	* you have successfully ported the implementation.
				101	* (Or not implemented the fatal routine properly, but that is your problem.)
				102	*
				103	* I'm indebted to many people who helped me in one way or another to write
				104	* this code. During the design of Twofish and the AES process I had very
				105	* extensive discussions of all implementation issues with various people.
				106	* Doug Whiting in particular provided a wealth of information. The Twofish
				107	* team spent untold hours discussion various cipher features, and their
				108	* implementation. Brian Gladman implemented all AES candidates in C,
				109	* and we had some fruitful discussions on how to implement Twofish in C.
				110	* Jan Nieuwenhuizen tested this code on Linux using GCC.
				111	*
				112	* Now for the license:
				113	* The author hereby grants a perpetual license to everybody to
				114	* use this code for any purpose as long as the copyright message is included
				115	* in the source code of this or any derived work.
				116	*
				117	* Yes, this means that you, your company, your club, and anyone else
				118	* can use this code anywhere you want. You can change it and distribute it
				119	* under the GPL, include it in your commercial product without releasing
				120	* the source code, put it on the web, etc.
				121	* The only thing you cannot do is remove my copyright message,
				122	* or distribute any source code based on this implementation that does not
				123	* include my copyright message.
				124	*
				125	* I appreciate a mention in the documentation or credits,
				126	* but I understand if that is difficult to do.
				127	* I also appreciate it if you tell me where and why you used my code.
				128	*
				129	* Please send any questions or comments to niels@ferguson.net
				130	*
				131	* Have Fun!
				132	*
				133	* Niels
				134	*/
				135
				136	/*
				137	* DISCLAIMER: As I'm giving away my work for free, I'm of course not going
				138	* to accept any liability of any form. This code, or the Twofish cipher,
				139	* might very well be flawed; you have been warned.
				140	* This software is provided as-is, without any kind of warrenty or
				141	* guarantee. And that is really all you can expect when you download
				142	* code for free from the Internet.
				143	*
				144	* I think it is really sad that disclaimers like this seem to be necessary.
				145	* If people only had a little bit more common sense, and didn't come
				146	* whining like little children every time something happens....
				147	*/
				148
				149	/*
				150	* Version history:
				151	* Version 0.0, 2002-08-30
				152	* First written.
				153	* Version 0.1, 2002-09-03
				154	* Added disclaimer. Improved self-tests.
				155	* Version 0.2, 2002-09-09
				156	* Removed last non-portabilities. Default now works completely within
				157	* the C standard. UInt32 can be larger than 32 bits without problems.
				158	* Version 0.3, 2002-09-28
				159	* Bugfix: use instead of to adhere to ANSI/ISO.
				160	* Rename BIG_ENDIAN macro to CPU_IS_BIG_ENDIAN. The gcc library
				161	* header already defines BIG_ENDIAN, even though it is not
				162	* supposed to.
				163	*/
				164
				165
				166	/*
				167	* Minimum set of include files.
				168	* You should not need any application-specific include files for this code.
				169	* In fact, adding you own header files could break one of the many macros or
				170	* functions in this file. Be very careful.
				171	* Standard include files will probably be ok.
				172	*/
				173	#include <stdio.h>
				174	#include <string.h>
				175	#include <stdlib.h>
				176	/* #include * for memset(), memcpy(), and memcmp() */
				177	#include "twofish.h"
				178
				179
				180	/*
				181	* PLATFORM FIXES
				182	* ==============
				183	*
				184	* Fix the type definitions in twofish.h first!
				185	*
				186	* The following definitions have to be fixed for each particular platform
				187	* you work on. If you have a multi-platform program, you no doubt have
				188	* portable definitions that you can substitute here without changing the
				189	* rest of the code.
				190	*/
				191
				192
				193	/*
				194	* Function called if something is fatally wrong with the implementation.
				195	* This fatal function is called when a coding error is detected in the
				196	* Twofish implementation, or when somebody passes an obviously erroneous
				197	* parameter to this implementation. There is not much you can do when
				198	* the code contains bugs, so we just stop.
				199	*
				200	* The argument is a string. Ideally the fatal function prints this string
				201	* as an error message. Whatever else this function does, it should never
				202	* return. A typical implementation would stop the program completely after
				203	* printing the error message.
				204	*
				205	* This default implementation is not very useful,
				206	* but does not assume anything about your environment.
				207	* It will at least let you know something is wrong....
				208	* I didn't want to include any libraries to print and error or so,
				209	* as this makes the code much harder to integrate in a project.
				210	*
				211	* Note that the Twofish_fatal function may not return to the caller.
				212	* Unfortunately this is not something the self-test can test for,
				213	* so you have to make sure of this yourself.
				214	*
				215	* If you want to call an external function, be careful about including
				216	* your own header files here. This code uses a lot of macros, and your
				217	* header file could easily break it. Maybe the best solution is to use
				218	* a separate extern statement for your fatal function.
				219	*/
				220	/* #define Twofish_fatal(pmsgx) { fprintf(stderr, pmsgx); exit(1); } */
				221	#define Twofish_fatal(pmsgx, code) { return(code); }
				222
				223
				224	/*
				225	* The rest of the settings are not important for the functionality
				226	* of this Twofish implementation. That is, their default settings
				227	* work on all platforms. You can change them to improve the
				228	* speed of the implementation on your platform. Erroneous settings
				229	* will result in erroneous implementations, but the self-test should
				230	* catch those.
				231	*/
				232
				233
				234	/*
				235	* Macros to rotate a Twofish_UInt32 value left or right by the
				236	* specified number of bits. This should be a 32-bit rotation,
				237	* and not rotation of, say, 64-bit values.
				238	*
				239	* Every encryption or decryption operation uses 32 of these rotations,
				240	* so it is a good idea to make these macros efficient.
				241	*
				242	* This fully portable definition has one piece of tricky stuff.
				243	* The UInt32 might be larger than 32 bits, so we have to mask
				244	* any higher bits off. The simplest way to do this is to 'and' the
				245	* value first with 0xffffffff and then shift it right. An optimising
				246	* compiler that has a 32-bit type can optimise this 'and' away.
				247	*
				248	* Unfortunately there is no portable way of writing the constant
				249	* 0xffffffff. You don't know which suffix to use (U, or UL?)
				250	* The UINT32_MASK definition uses a bit of trickery. Shift-left
				251	* is only defined if the shift amount is strictly less than the size
				252	* of the UInt32, so we can't use (1<<32). The answer it to take the value
				253	* 2, cast it to a UInt32, shift it left 31 positions, and subtract one.
				254	* Another example of how to make something very simple extremely difficult.
				255	* I hate C.
				256	*
				257	* The rotation macros are straightforward.
				258	* They are only applied to UInt32 values, which are _unsigned_
				259	* so the >> operator must do a logical shift that brings in zeroes.
				260	* On most platforms you will only need to optimise the ROL32 macro; the
				261	* ROR32 macro is not inefficient on an optimising compiler as all rotation
				262	* amounts in this code are known at compile time.
				263	*
				264	* On many platforms there is a faster solution.
				265	* For example, MS compilers have the __rotl and __rotr functions
				266	* that generate x86 rotation instructions.
				267	*/
				268	#define UINT32_MASK ( (((Twofish_UInt32)2)<<31) - 1 )
				269
				270	#ifndef _MSC_VER
				271	#define ROL32(x,n) ( (x)<<(n) \| ((x) & UINT32_MASK) >> (32-(n)) )
				272	#define ROR32(x,n) ( (x)>>(n) \| ((x) & UINT32_MASK) << (32-(n)) )
				273	#else
				274	#define ROL32(x,n) (_lrotl((x), (n)))
				275	#define ROR32(x,n) (_lrotr((x), (n)))
				276	#endif
				277
				278	/*
				279	* Select data type for q-table entries.
				280	*
				281	* Larger entry types cost more memory (1.5 kB), and might be faster
				282	* or slower depending on the CPU and compiler details.
				283	*
				284	* This choice only affects the static data size and the key setup speed.
				285	* Functionality, expanded key size, or encryption speed are not affected.
				286	* Define to 1 to get large q-table entries.
				287	*/
				288	#define LARGE_Q_TABLE 0 /* default = 0 */
				289
				290
				291	/*
				292	* Method to select a single byte from a UInt32.
				293	* WARNING: non-portable code if set; might not work on all platforms.
				294	*
				295	* Inside the inner loop of Twofish it is necessary to access the 4
				296	* individual bytes of a UInt32. This can be done using either shifts
				297	* and masks, or memory accesses.
				298	*
				299	* Set to 0 to use shift and mask operations for the byte selection.
				300	* This is more ALU intensive. It is also fully portable.
				301	*
				302	* Set to 1 to use memory accesses. The UInt32 is stored in memory and
				303	* the individual bytes are read from memory one at a time.
				304	* This solution is more memory-intensive, and not fully portable.
				305	* It might be faster on your platform, or not. If you use this option,
				306	* make sure you set the CPU_IS_BIG_ENDIAN flag appropriately.
				307	*
				308	* This macro does not affect the conversion of the inputs and outputs
				309	* of the cipher. See the CONVERT_USING_CASTS macro for that.
				310	*/
				311	#define SELECT_BYTE_FROM_UINT32_IN_MEMORY 0 /* default = 0 */
				312
				313
				314	/*
				315	* Method used to read the input and write the output.
				316	* WARNING: non-portable code if set; might not work on all platforms.
				317	*
				318	* Twofish operates on 32-bit words. The input to the cipher is
				319	* a byte array, as is the output. The portable method of doing the
				320	* conversion is a bunch of rotate and mask operations, but on many
				321	* platforms it can be done faster using a cast.
				322	* This only works if your CPU allows UInt32 accesses to arbitrary Byte
				323	* addresses.
				324	*
				325	* Set to 0 to use the shift and mask operations. This is fully
				326	* portable. .
				327	*
				328	* Set to 1 to use a cast. The Byte * is cast to a UInt32 *, and a
				329	* UInt32 is read. If necessary (as indicated by the CPU_IS_BIG_ENDIAN
				330	* macro) the byte order in the UInt32 is swapped. The reverse is done
				331	* to write the output of the encryption/decryption. Make sure you set
				332	* the CPU_IS_BIG_ENDIAN flag appropriately.
				333	* This option does not work unless a UInt32 is exactly 32 bits.
				334	*
				335	* This macro only changes the reading/writing of the plaintext/ciphertext.
				336	* See the SELECT_BYTE_FROM_UINT32_IN_MEMORY to affect the way in which
				337	* a UInt32 is split into 4 bytes for the S-box selection.
				338	*/
				339	#define CONVERT_USING_CASTS 0 /* default = 0 */
				340
				341
				342	/*
				343	* Endianness switch.
				344	* Only relevant if SELECT_BYTE_FROM_UINT32_IN_MEMORY or
				345	* CONVERT_USING_CASTS is set.
				346	*
				347	* Set to 1 on a big-endian machine, and to 0 on a little-endian machine.
				348	* Twofish uses the little-endian convention (least significant byte first)
				349	* and big-endian machines (using most significant byte first)
				350	* have to do a few conversions.
				351	*
				352	* CAUTION: This code has never been tested on a big-endian machine,
				353	* because I don't have access to one. Feedback appreciated.
				354	*/
				355	#define CPU_IS_BIG_ENDIAN 0
				356
				357
				358	/*
				359	* Macro to reverse the order of the bytes in a UInt32.
				360	* Used to convert to little-endian on big-endian machines.
				361	* This macro is always tested, but only used in the encryption and
				362	* decryption if CONVERT_USING_CASTS, and CPU_IS_BIG_ENDIAN
				363	* are both set. In other words: this macro is only speed-critical if
				364	* both these flags have been set.
				365	*
				366	* This default definition of SWAP works, but on many platforms there is a
				367	* more efficient implementation.
				368	*/
				369	#define BSWAP(x) ((ROL32((x),8)&0x00ff00ff) \| (ROR32((x),8) & 0xff00ff00))
				370
				371
				372	/*
				373	* END OF PLATFORM FIXES
				374	* =====================
				375	*
				376	* You should not have to touch the rest of this file.
				377	*/
				378
				379
				380	/*
				381	* Convert the external type names to some that are easier to use inside
				382	* this file. I didn't want to use the names Byte and UInt32 in the
				383	* header file, because many programs already define them and using two
				384	* conventions at once can be very difficult.
				385	* Don't change these definitions! Change the originals
				386	* in twofish.h instead.
				387	*/
				388	/* A Byte must be an unsigned integer, 8 bits long. */
				389	/* typedef Twofish_Byte Byte; */
				390	/* A UInt32 must be an unsigned integer at least 32 bits long. */
				391	/* typedef Twofish_UInt32 UInt32; */
				392
				393
				394	/*
				395	* Define a macro ENDIAN_CONVERT.
				396	*
				397	* We define a macro ENDIAN_CONVERT that performs a BSWAP on big-endian
				398	* machines, and is the identity function on little-endian machines.
				399	* The code then uses this macro without considering the endianness.
				400	*/
				401
				402	#if CPU_IS_BIG_ENDIAN
				403	#define ENDIAN_CONVERT(x) BSWAP(x)
				404	#else
				405	#define ENDIAN_CONVERT(x) (x)
				406	#endif
				407
				408
				409	/*
				410	* Compute byte offset within a UInt32 stored in memory.
				411	*
				412	* This is only used when SELECT_BYTE_FROM_UINT32_IN_MEMORY is set.
				413	*
				414	* The input is the byte number 0..3, 0 for least significant.
				415	* Note the use of sizeof() to support UInt32 types that are larger
				416	* than 4 bytes.
				417	*/
				418	#if CPU_IS_BIG_ENDIAN
				419	#define BYTE_OFFSET( n ) (sizeof(Twofish_UInt32) - 1 - (n) )
				420	#else
				421	#define BYTE_OFFSET( n ) (n)
				422	#endif
				423
				424
				425	/*
				426	* Macro to get Byte no. b from UInt32 value X.
				427	* We use two different definition, depending on the settings.
				428	*/
				429	#if SELECT_BYTE_FROM_UINT32_IN_MEMORY
				430	/* Pick the byte from the memory in which X is stored. */
				431	#define SELECT_BYTE( X, b ) (((Twofish_Byte *)(&(X)))[BYTE_OFFSET(b)])
				432	#else
				433	/* Portable solution: Pick the byte directly from the X value. */
				434	#define SELECT_BYTE( X, b ) (((X) >> (8*(b))) & 0xff)
				435	#endif
				436
				437
				438	/* Some shorthands because we use byte selection in large formulae. */
				439	#define b0(X) SELECT_BYTE((X),0)
				440	#define b1(X) SELECT_BYTE((X),1)
				441	#define b2(X) SELECT_BYTE((X),2)
				442	#define b3(X) SELECT_BYTE((X),3)
				443
				444
				445	/*
				446	* We need macros to load and store UInt32 from/to byte arrays
				447	* using the least-significant-byte-first convention.
				448	*
				449	* GET32( p ) gets a UInt32 in lsb-first form from four bytes pointed to
				450	* by p.
				451	* PUT32( v, p ) writes the UInt32 value v at address p in lsb-first form.
				452	*/
				453	#if CONVERT_USING_CASTS
				454
				455	/* Get UInt32 from four bytes pointed to by p. */
				456	#define GET32( p ) ENDIAN_CONVERT( ((Twofish_UInt32 )(p)) )
				457	/* Put UInt32 into four bytes pointed to by p */
				458	#define PUT32( v, p ) ((Twofish_UInt32 )(p)) = ENDIAN_CONVERT(v)
				459
				460	#else
				461
				462	/* Get UInt32 from four bytes pointed to by p. */
				463	#define GET32( p ) \
				464	( \
				465	(Twofish_UInt32)((p)[0]) \
				466	\| (Twofish_UInt32)((p)[1])<< 8 \
				467	\| (Twofish_UInt32)((p)[2])<<16 \
				468	\| (Twofish_UInt32)((p)[3])<<24 \
				469	)
				470	/* Put UInt32 into four bytes pointed to by p */
				471	#define PUT32( v, p ) \
				472	(p)[0] = (Twofish_Byte)(((v) ) & 0xff); \
				473	(p)[1] = (Twofish_Byte)(((v) >> 8) & 0xff); \
				474	(p)[2] = (Twofish_Byte)(((v) >> 16) & 0xff); \
				475	(p)[3] = (Twofish_Byte)(((v) >> 24) & 0xff)
				476
				477	#endif
				478
				479	#ifdef ANDROID
				480	/**
				481	* Dummy function to disable some compiler optimizations.
				482	*
				483	* See comment in Twofish_cfb128_encrypt().
				484	*/
				485	void Two_debugDummy(Twofish_Byte* in, Twofish_Byte* out, Twofish_Byte* ivec)
				486	{
				487	}
				488	#endif
				489	/*
				490	* Test the platform-specific macros.
				491	* This function tests the macros defined so far to make sure the
				492	* definitions are appropriate for this platform.
				493	* If you make any mistake in the platform configuration, this should detect
				494	* that and inform you what went wrong.
				495	* Somewhere, someday, this is going to save somebody a lot of time,
				496	* because misbehaving macros are hard to debug.
				497	*/
				498	static int test_platform()
				499	{
				500	/* Buffer with test values. */
				501	Twofish_Byte buf[] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0};
				502	Twofish_UInt32 C;
				503	Twofish_UInt32 x,y;
				504	int i;
				505
				506	/*
				507	* Some sanity checks on the types that can't be done in compile time.
				508	* A smart compiler will just optimise these tests away.
				509	* The pre-processor doesn't understand different types, so we cannot
				510	* do these checks in compile-time.
				511	*
				512	* I hate C.
				513	*
				514	* The first check in each case is to make sure the size is correct.
				515	* The second check is to ensure that it is an unsigned type.
				516	*/
				517	if( ((Twofish_UInt32)((Twofish_UInt32)1 << 31) == 0) \|\| ((Twofish_UInt32)-1 < 0 ))
				518	{
				519	Twofish_fatal( "Twofish code: Twofish_UInt32 type not suitable", ERR_UINT32 );
				520	}
				521	if( (sizeof( Twofish_Byte ) != 1) \|\| (((Twofish_Byte)-1) < 0) )
				522	{
				523	Twofish_fatal( "Twofish code: Twofish_Byte type not suitable", ERR_BYTE );
				524	}
				525
				526	/*
				527	* Sanity-check the endianness conversions.
				528	* This is just an aid to find problems. If you do the endianness
				529	* conversion macros wrong you will fail the full cipher test,
				530	* but that does not help you find the error.
				531	* Always make it easy to find the bugs!
				532	*
				533	* Detail: There is no fully portable way of writing UInt32 constants,
				534	* as you don't know whether to use the U or UL suffix. Using only U you
				535	* might only be allowed 16-bit constants. Using UL you might get 64-bit
				536	* constants which cannot be stored in a UInt32 without warnings, and
				537	* which generally behave subtly different from a true UInt32.
				538	* As long as we're just comparing with the constant,
				539	* we can always use the UL suffix and at worst lose some efficiency.
				540	* I use a separate '32-bit constant' macro in most of my other code.
				541	*
				542	* I hate C.
				543	*
				544	* Start with testing GET32. We test it on all positions modulo 4
				545	* to make sure we can handly any position of inputs. (Some CPUs
				546	* do not allow non-aligned accesses which we would do if you used
				547	* the CONVERT_USING_CASTS option.
				548	*/
				549	if( (GET32( buf ) != 0x78563412UL) \|\| (GET32(buf+1) != 0x9a785634UL)
				550	\|\| (GET32( buf+2 ) != 0xbc9a7856UL) \|\| (GET32(buf+3) != 0xdebc9a78UL) )
				551	{
				552	Twofish_fatal( "Twofish code: GET32 not implemented properly", ERR_GET32 );
				553	}
				554
				555	/*
				556	* We can now use GET32 to test PUT32.
				557	* We don't test the shifted versions. If GET32 can do that then
				558	* so should PUT32.
				559	*/
				560	C = GET32( buf );
				561	PUT32( 3*C, buf );
				562	if( GET32( buf ) != 0x69029c36UL )
				563	{
				564	Twofish_fatal( "Twofish code: PUT32 not implemented properly", ERR_PUT32 );
				565	}
				566
				567
				568	/* Test ROL and ROR */
				569	for( i=1; i<32; i++ )
				570	{
				571	/* Just a simple test. */
				572	x = ROR32( C, i );
				573	y = ROL32( C, i );
				574	x ^= (C>>i) ^ (C<<(32-i));
				575	/y ^= (C<>(32-i)); /
				576	y ^= (C<<i) ^ (C>>(32-i));
				577	x \|= y;
				578	/*
				579	* Now all we check is that x is zero in the least significant
				580	* 32 bits. Using the UL suffix is safe here, as it doesn't matter
				581	* if we get a larger type.
				582	*/
				583	if( (x & 0xffffffffUL) != 0 )
				584	{
				585	Twofish_fatal( "Twofish ROL or ROR not properly defined.", ERR_ROLR );
				586	}
				587	}
				588
				589	/* Test the BSWAP macro */
				590	if( BSWAP(C) != 0x12345678UL )
				591	{
				592	/*
				593	* The BSWAP macro should always work, even if you are not using it.
				594	* A smart optimising compiler will just remove this entire test.
				595	*/
				596	Twofish_fatal( "BSWAP not properly defined.", ERR_BSWAP );
				597	}
				598
				599	/* And we can test the b macros which use SELECT_BYTE. */
				600	if( (b0(C)!=0x12) \|\| (b1(C) != 0x34) \|\| (b2(C) != 0x56) \|\| (b3(C) != 0x78) )
				601	{
				602	/*
				603	* There are many reasons why this could fail.
				604	* Most likely is that CPU_IS_BIG_ENDIAN has the wrong value.
				605	*/
				606	Twofish_fatal( "Twofish code: SELECT_BYTE not implemented properly", ERR_SELECTB );
				607	}
				608	return SUCCESS;
				609	}
				610
				611
				612	/*
				613	* Finally, we can start on the Twofish-related code.
				614	* You really need the Twofish specifications to understand this code. The
				615	* best source is the Twofish book:
				616	* "The Twofish Encryption Algorithm", by Bruce Schneier, John Kelsey,
				617	* Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson.
				618	* you can also use the AES submission document of Twofish, which is
				619	* available from my list of publications on my personal web site at
				620	* http://niels.ferguson.net/.
				621	*
				622	* The first thing we do is write the testing routines. This is what the
				623	* implementation has to satisfy in the end. We only test the external
				624	* behaviour of the implementation of course.
				625	*/
				626
				627
				628	/*
				629	* Perform a single self test on a (plaintext,ciphertext,key) triple.
				630	* Arguments:
				631	* key array of key bytes
				632	* key_len length of key in bytes
				633	* p plaintext
				634	* c ciphertext
				635	*/
				636	static int test_vector( Twofish_Byte key[], int key_len, Twofish_Byte p[16], Twofish_Byte c[16] )
				637	{
				638	Twofish_Byte tmp[16]; /* scratch pad. */
				639	Twofish_key xkey; /* The expanded key */
				640	int i;
				641
				642
				643	/* Prepare the key */
				644	if ((i = Twofish_prepare_key( key, key_len, &xkey)) < 0)
				645	return i;
				646
				647	/*
				648	* We run the test twice to ensure that the xkey structure
				649	* is not damaged by the first encryption.
				650	* Those are hideous bugs to find if you get them in an application.
				651	*/
				652	for( i=0; i<2; i++ )
				653	{
				654	/* Encrypt and test */
				655	Twofish_encrypt( &xkey, p, tmp );
				656	if( memcmp( c, tmp, 16 ) != 0 )
				657	{
				658	Twofish_fatal( "Twofish encryption failure", ERR_TEST_ENC );
				659	}
				660
				661	/* Decrypt and test */
				662	Twofish_decrypt( &xkey, c, tmp );
				663	if( memcmp( p, tmp, 16 ) != 0 )
				664	{
				665	Twofish_fatal( "Twofish decryption failure", ERR_TEST_DEC );
				666	}
				667	}
				668
				669	/* The test keys are not secret, so we don't need to wipe xkey. */
				670	return SUCCESS;
				671	}
				672
				673
				674	/*
				675	* Check implementation using three (key,plaintext,ciphertext)
				676	* test vectors, one for each major key length.
				677	*
				678	* This is an absolutely minimal self-test.
				679	* This routine does not test odd-sized keys.
				680	*/
				681	static int test_vectors()
				682	{
				683	/*
				684	* We run three tests, one for each major key length.
				685	* These test vectors come from the Twofish specification.
				686	* One encryption and one decryption using randomish data and key
				687	* will detect almost any error, especially since we generate the
				688	* tables ourselves, so we don't have the problem of a single
				689	* damaged table entry in the source.
				690	*/
				691
				692	/* 128-bit test is the I=3 case of section B.2 of the Twofish book. */
				693	static Twofish_Byte k128[] = {
				694	0x9F, 0x58, 0x9F, 0x5C, 0xF6, 0x12, 0x2C, 0x32,
				695	0xB6, 0xBF, 0xEC, 0x2F, 0x2A, 0xE8, 0xC3, 0x5A,
				696	};
				697	static Twofish_Byte p128[] = {
				698	0xD4, 0x91, 0xDB, 0x16, 0xE7, 0xB1, 0xC3, 0x9E,
				699	0x86, 0xCB, 0x08, 0x6B, 0x78, 0x9F, 0x54, 0x19
				700	};
				701	static Twofish_Byte c128[] = {
				702	0x01, 0x9F, 0x98, 0x09, 0xDE, 0x17, 0x11, 0x85,
				703	0x8F, 0xAA, 0xC3, 0xA3, 0xBA, 0x20, 0xFB, 0xC3
				704	};
				705
				706	/* 192-bit test is the I=4 case of section B.2 of the Twofish book. */
				707	static Twofish_Byte k192[] = {
				708	0x88, 0xB2, 0xB2, 0x70, 0x6B, 0x10, 0x5E, 0x36,
				709	0xB4, 0x46, 0xBB, 0x6D, 0x73, 0x1A, 0x1E, 0x88,
				710	0xEF, 0xA7, 0x1F, 0x78, 0x89, 0x65, 0xBD, 0x44
				711	};
				712	static Twofish_Byte p192[] = {
				713	0x39, 0xDA, 0x69, 0xD6, 0xBA, 0x49, 0x97, 0xD5,
				714	0x85, 0xB6, 0xDC, 0x07, 0x3C, 0xA3, 0x41, 0xB2
				715	};
				716	static Twofish_Byte c192[] = {
				717	0x18, 0x2B, 0x02, 0xD8, 0x14, 0x97, 0xEA, 0x45,
				718	0xF9, 0xDA, 0xAC, 0xDC, 0x29, 0x19, 0x3A, 0x65
				719	};
				720
				721	/* 256-bit test is the I=4 case of section B.2 of the Twofish book. */
				722	static Twofish_Byte k256[] = {
				723	0xD4, 0x3B, 0xB7, 0x55, 0x6E, 0xA3, 0x2E, 0x46,
				724	0xF2, 0xA2, 0x82, 0xB7, 0xD4, 0x5B, 0x4E, 0x0D,
				725	0x57, 0xFF, 0x73, 0x9D, 0x4D, 0xC9, 0x2C, 0x1B,
				726	0xD7, 0xFC, 0x01, 0x70, 0x0C, 0xC8, 0x21, 0x6F
				727	};
				728	static Twofish_Byte p256[] = {
				729	0x90, 0xAF, 0xE9, 0x1B, 0xB2, 0x88, 0x54, 0x4F,
				730	0x2C, 0x32, 0xDC, 0x23, 0x9B, 0x26, 0x35, 0xE6
				731	};
				732	static Twofish_Byte c256[] = {
				733	0x6C, 0xB4, 0x56, 0x1C, 0x40, 0xBF, 0x0A, 0x97,
				734	0x05, 0x93, 0x1C, 0xB6, 0xD4, 0x08, 0xE7, 0xFA
				735	};
				736
				737	int ret;
				738
				739	/* Run the actual tests. */
				740	if ((ret = test_vector( k128, 16, p128, c128 )) < 0)
				741	return ret;
				742	if ((ret = test_vector( k192, 24, p192, c192 )) < 0)
				743	return ret;
				744	if ((ret = test_vector( k256, 32, p256, c256 )) < 0)
				745	return ret;
				746	return SUCCESS;
				747	}
				748
				749
				750	/*
				751	* Perform extensive test for a single key size.
				752	*
				753	* Test a single key size against the test vectors from section
				754	* B.2 in the Twofish book. This is a sequence of 49 encryptions
				755	* and decryptions. Each plaintext is equal to the ciphertext of
				756	* the previous encryption. The key is made up from the ciphertext
				757	* two and three encryptions ago. Both plaintext and key start
				758	* at the zero value.
				759	* We should have designed a cleaner recurrence relation for
				760	* these tests, but it is too late for that now. At least we learned
				761	* how to do it better next time.
				762	* For details see appendix B of the book.
				763	*
				764	* Arguments:
				765	* key_len Number of bytes of key
				766	* final_value Final plaintext value after 49 iterations
				767	*/
				768	static int test_sequence( int key_len, Twofish_Byte final_value[] )
				769	{
				770	Twofish_Byte buf[ (50+3)16 ]; / Buffer to hold our computation values. */
				771	Twofish_Byte tmp[16]; /* Temp for testing the decryption. */
				772	Twofish_key xkey; /* The expanded key */
				773	int i, ret;
				774	Twofish_Byte * p;
				775
				776	/* Wipe the buffer */
				777	memset( buf, 0, sizeof( buf ) );
				778
				779	/*
				780	* Because the recurrence relation is done in an inconvenient manner
				781	* we end up looping backwards over the buffer.
				782	*/
				783
				784	/* Pointer in buffer points to current plaintext. */
				785	p = &buf[50*16];
				786	for( i=1; i<50; i++ )
				787	{
				788	/*
				789	* Prepare a key.
				790	* This automatically checks that key_len is valid.
				791	*/
				792	if ((ret = Twofish_prepare_key( p+16, key_len, &xkey)) < 0)
				793	return ret;
				794
				795	/* Compute the next 16 bytes in the buffer */
				796	Twofish_encrypt( &xkey, p, p-16 );
				797
				798	/* Check that the decryption is correct. */
				799	Twofish_decrypt( &xkey, p-16, tmp );
				800	if( memcmp( tmp, p, 16 ) != 0 )
				801	{
				802	Twofish_fatal( "Twofish decryption failure in sequence", ERR_SEQ_DEC );
				803	}
				804	/* Move on to next 16 bytes in the buffer. */
				805	p -= 16;
				806	}
				807
				808	/* And check the final value. */
				809	if( memcmp( p, final_value, 16 ) != 0 )
				810	{
				811	Twofish_fatal( "Twofish encryption failure in sequence", ERR_SEQ_ENC );
				812	}
				813
				814	/* None of the data was secret, so there is no need to wipe anything. */
				815	return SUCCESS;
				816	}
				817
				818
				819	/*
				820	* Run all three sequence tests from the Twofish test vectors.
				821	*
				822	* This checks the most extensive test vectors currently available
				823	* for Twofish. The data is from the Twofish book, appendix B.2.
				824	*/
				825	static int test_sequences()
				826	{
				827	static Twofish_Byte r128[] = {
				828	0x5D, 0x9D, 0x4E, 0xEF, 0xFA, 0x91, 0x51, 0x57,
				829	0x55, 0x24, 0xF1, 0x15, 0x81, 0x5A, 0x12, 0xE0
				830	};
				831	static Twofish_Byte r192[] = {
				832	0xE7, 0x54, 0x49, 0x21, 0x2B, 0xEE, 0xF9, 0xF4,
				833	0xA3, 0x90, 0xBD, 0x86, 0x0A, 0x64, 0x09, 0x41
				834	};
				835	static Twofish_Byte r256[] = {
				836	0x37, 0xFE, 0x26, 0xFF, 0x1C, 0xF6, 0x61, 0x75,
				837	0xF5, 0xDD, 0xF4, 0xC3, 0x3B, 0x97, 0xA2, 0x05
				838	};
				839
				840	/* Run the three sequence test vectors */
				841	int ret;
				842	if ((ret = test_sequence( 16, r128)) < 0)
				843	return ret;
				844	if ((ret = test_sequence( 24, r192)) < 0)
				845	return ret;
				846	if ((ret = test_sequence( 32, r256)) < 0)
				847	return ret;
				848	return SUCCESS;
				849	}
				850
				851
				852	/*
				853	* Test the odd-sized keys.
				854	*
				855	* Every odd-sized key is equivalent to a one of 128, 192, or 256 bits.
				856	* The equivalent key is found by padding at the end with zero bytes
				857	* until a regular key size is reached.
				858	*
				859	* We just test that the key expansion routine behaves properly.
				860	* If the expanded keys are identical, then the encryptions and decryptions
				861	* will behave the same.
				862	*/
				863	static int test_odd_sized_keys()
				864	{
				865	Twofish_Byte buf[32];
				866	Twofish_key xkey;
				867	Twofish_key xkey_two;
				868	int i, ret;
				869
				870	/*
				871	* We first create an all-zero key to use as PRNG key.
				872	* Normally we would not have to fill the buffer with zeroes, as we could
				873	* just pass a zero key length to the Twofish_prepare_key function.
				874	* However, this relies on using odd-sized keys, and those are just the
				875	* ones we are testing here. We can't use an untested function to test
				876	* itself.
				877	*/
				878	memset( buf, 0, sizeof( buf ) );
				879	if ((ret = Twofish_prepare_key( buf, 16, &xkey)) < 0)
				880	return ret;
				881
				882	/* Fill buffer with pseudo-random data derived from two encryptions */
				883	Twofish_encrypt( &xkey, buf, buf );
				884	Twofish_encrypt( &xkey, buf, buf+16 );
				885
				886	/* Create all possible shorter keys that are prefixes of the buffer. */
				887	for( i=31; i>=0; i-- )
				888	{
				889	/* Set a byte to zero. This is the new padding byte */
				890	buf[i] = 0;
				891
				892	/* Expand the key with only i bytes of length */
				893	if ((ret = Twofish_prepare_key( buf, i, &xkey)) < 0)
				894	return ret;
				895
				896	/* Expand the corresponding padded key of regular length */
				897	if ((ret = Twofish_prepare_key( buf, i<=16 ? 16 : (i<= 24 ? 24 : 32), &xkey_two )) < 0)
				898	return ret;
				899
				900	/* Compare the two */
				901	if( memcmp( &xkey, &xkey_two, sizeof( xkey ) ) != 0 )
				902	{
				903	Twofish_fatal( "Odd sized keys do not expand properly", ERR_ODD_KEY );
				904	}
				905	}
				906
				907	/* None of the key values are secret, so we don't need to wipe them. */
				908	return SUCCESS;
				909	}
				910
				911
				912	/*
				913	* Test the Twofish implementation.
				914	*
				915	* This routine runs all the self tests, in order of importance.
				916	* It is called by the Twofish_initialise routine.
				917	*
				918	* In almost all applications the cost of running the self tests during
				919	* initialisation is insignificant, especially
				920	* compared to the time it takes to load the application from disk.
				921	* If you are very pressed for initialisation performance,
				922	* you could remove some of the tests. Make sure you did run them
				923	* once in the software and hardware configuration you are using.
				924	*/
				925	static int self_test()
				926	{
				927	int ret;
				928	/* The three test vectors form an absolute minimal test set. */
				929	if ((ret = test_vectors()) < 0)
				930	return ret;
				931
				932	/*
				933	* If at all possible you should run these tests too. They take
				934	* more time, but provide a more thorough coverage.
				935	*/
				936	if ((ret = test_sequences()) < 0)
				937	return ret;
				938
				939	/* Test the odd-sized keys. */
				940	if ((ret = test_odd_sized_keys()) < 0)
				941	return ret;
				942	return SUCCESS;
				943	}
				944
				945
				946	/*
				947	* And now, the actual Twofish implementation.
				948	*
				949	* This implementation generates all the tables during initialisation.
				950	* I don't like large tables in the code, especially since they are easily
				951	* damaged in the source without anyone noticing it. You need code to
				952	* generate them anyway, and this way all the code is close together.
				953	* Generating them in the application leads to a smaller executable
				954	* (the code is smaller than the tables it generates) and a
				955	* larger static memory footprint.
				956	*
				957	* Twofish can be implemented in many ways. I have chosen to
				958	* use large tables with a relatively long key setup time.
				959	* If you encrypt more than a few blocks of data it pays to pre-compute
				960	* as much as possible. This implementation is relatively inefficient for
				961	* applications that need to re-key every block or so.
				962	*/
				963
				964	/*
				965	* We start with the t-tables, directly from the Twofish definition.
				966	* These are nibble-tables, but merging them and putting them two nibbles
				967	* in one byte is more work than it is worth.
				968	*/
				969	static Twofish_Byte t_table[2][4][16] = {
				970	{
				971	{0x8,0x1,0x7,0xD,0x6,0xF,0x3,0x2,0x0,0xB,0x5,0x9,0xE,0xC,0xA,0x4},
				972	{0xE,0xC,0xB,0x8,0x1,0x2,0x3,0x5,0xF,0x4,0xA,0x6,0x7,0x0,0x9,0xD},
				973	{0xB,0xA,0x5,0xE,0x6,0xD,0x9,0x0,0xC,0x8,0xF,0x3,0x2,0x4,0x7,0x1},
				974	{0xD,0x7,0xF,0x4,0x1,0x2,0x6,0xE,0x9,0xB,0x3,0x0,0x8,0x5,0xC,0xA}
				975	},
				976	{
				977	{0x2,0x8,0xB,0xD,0xF,0x7,0x6,0xE,0x3,0x1,0x9,0x4,0x0,0xA,0xC,0x5},
				978	{0x1,0xE,0x2,0xB,0x4,0xC,0x3,0x7,0x6,0xD,0xA,0x5,0xF,0x9,0x0,0x8},
				979	{0x4,0xC,0x7,0x5,0x1,0x6,0x9,0xA,0x0,0xE,0xD,0x8,0x2,0xB,0x3,0xF},
				980	{0xB,0x9,0x5,0x1,0xC,0x3,0xD,0xE,0x6,0x4,0x7,0xF,0x2,0x0,0x8,0xA}
				981	}
				982	};
				983
				984
				985	/* A 1-bit rotation of 4-bit values. Input must be in range 0..15 */
				986	#define ROR4BY1( x ) (((x)>>1) \| (((x)<<3) & 0x8) )
				987
				988	/*
				989	* The q-boxes are only used during the key schedule computations.
				990	* These are 8->8 bit lookup tables. Some CPUs prefer to have 8->32 bit
				991	* lookup tables as it is faster to load a 32-bit value than to load an
				992	* 8-bit value and zero the rest of the register.
				993	* The LARGE_Q_TABLE switch allows you to choose 32-bit entries in
				994	* the q-tables. Here we just define the Qtype which is used to store
				995	* the entries of the q-tables.
				996	*/
				997	#if LARGE_Q_TABLE
				998	typedef Twofish_UInt32 Qtype;
				999	#else
				1000	typedef Twofish_Byte Qtype;
				1001	#endif
				1002
				1003	/*
				1004	* The actual q-box tables.
				1005	* There are two q-boxes, each having 256 entries.
				1006	*/
				1007	static Qtype q_table[2][256];
				1008
				1009
				1010	/*
				1011	* Now the function that converts a single t-table into a q-table.
				1012	*
				1013	* Arguments:
				1014	* t[4][16] : four 4->4bit lookup tables that define the q-box
				1015	* q[256] : output parameter: the resulting q-box as a lookup table.
				1016	*/
				1017	static void make_q_table( Twofish_Byte t[4][16], Qtype q[256] )
				1018	{
				1019	int ae,be,ao,bo; /* Some temporaries. */
				1020	int i;
				1021	/* Loop over all input values and compute the q-box result. */
				1022	for( i=0; i<256; i++ ) {
				1023	/*
				1024	* This is straight from the Twofish specifications.
				1025	*
				1026	* The ae variable is used for the a_i values from the specs
				1027	* with even i, and ao for the odd i's. Similarly for the b's.
				1028	*/
				1029	ae = i>>4; be = i&0xf;
				1030	ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);
				1031	ae = t[0][ao]; be = t[1][bo];
				1032	ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);
				1033	ae = t[2][ao]; be = t[3][bo];
				1034
				1035	/* Store the result in the q-box table, the cast avoids a warning. */
				1036	q[i] = (Qtype) ((be<<4) \| ae);
				1037	}
				1038	}
				1039
				1040
				1041	/*
				1042	* Initialise both q-box tables.
				1043	*/
				1044	static void initialise_q_boxes() {
				1045	/* Initialise each of the q-boxes using the t-tables */
				1046	make_q_table( t_table[0], q_table[0] );
				1047	make_q_table( t_table[1], q_table[1] );
				1048	}
				1049
				1050
				1051	/*
				1052	* Next up is the MDS matrix multiplication.
				1053	* The MDS matrix multiplication operates in the field
				1054	* GF(2)[x]/p(x) with p(x)=x^8+x^6+x^5+x^3+1.
				1055	* If you don't understand this, read a book on finite fields. You cannot
				1056	* follow the finite-field computations without some background.
				1057	*
				1058	* In this field, multiplication by x is easy: shift left one bit
				1059	* and if bit 8 is set then xor the result with 0x169.
				1060	*
				1061	* The MDS coefficients use a multiplication by 1/x,
				1062	* or rather a division by x. This is easy too: first make the
				1063	* value 'even' (i.e. bit 0 is zero) by xorring with 0x169 if necessary,
				1064	* and then shift right one position.
				1065	* Even easier: shift right and xor with 0xb4 if the lsbit was set.
				1066	*
				1067	* The MDS coefficients are 1, EF, and 5B, and we use the fact that
				1068	* EF = 1 + 1/x + 1/x^2
				1069	* 5B = 1 + 1/x^2
				1070	* in this field. This makes multiplication by EF and 5B relatively easy.
				1071	*
				1072	* This property is no accident, the MDS matrix was designed to allow
				1073	* this implementation technique to be used.
				1074	*
				1075	* We have four MDS tables, each mapping 8 bits to 32 bits.
				1076	* Each table performs one column of the matrix multiplication.
				1077	* As the MDS is always preceded by q-boxes, each of these tables
				1078	* also implements the q-box just previous to that column.
				1079	*/
				1080
				1081	/* The actual MDS tables. */
				1082	static Twofish_UInt32 MDS_table[4][256];
				1083
				1084	/* A small table to get easy conditional access to the 0xb4 constant. */
				1085	static Twofish_UInt32 mds_poly_divx_const[] = {0,0xb4};
				1086
				1087	/* Function to initialise the MDS tables. */
				1088	static void initialise_mds_tables()
				1089	{
				1090	int i;
				1091	Twofish_UInt32 q,qef,q5b; /* Temporary variables. */
				1092
				1093	/* Loop over all 8-bit input values */
				1094	for( i=0; i<256; i++ )
				1095	{
				1096	/*
				1097	* To save some work during the key expansion we include the last
				1098	* of the q-box layers from the h() function in these MDS tables.
				1099	*/
				1100
				1101	/* We first do the inputs that are mapped through the q0 table. */
				1102	q = q_table[0][i];
				1103	/*
				1104	* Here we divide by x, note the table to get 0xb4 only if the
				1105	* lsbit is set.
				1106	* This sets qef = (1/x)*q in the finite field
				1107	*/
				1108	qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ];
				1109	/*
				1110	* Divide by x again, and add q to get (1+1/x^2)*q.
				1111	* Note that (1+1/x^2) = 5B in the field, and addition in the field
				1112	* is exclusive or on the bits.
				1113	*/
				1114	q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q;
				1115	/*
				1116	* Add q5b to qef to set qef = (1+1/x+1/x^2)*q.
				1117	* Again, (1+1/x+1/x^2) = EF in the field.
				1118	*/
				1119	qef ^= q5b;
				1120
				1121	/*
				1122	* Now that we have q5b = 5B * q and qef = EF * q
				1123	* we can fill two of the entries in the MDS matrix table.
				1124	* See the Twofish specifications for the order of the constants.
				1125	*/
				1126	MDS_table[1][i] = (q <<24) \| (q5b<<16) \| (qef<<8) \| qef;
				1127	MDS_table[3][i] = (q5b<<24) \| (qef<<16) \| (q <<8) \| q5b;
				1128
				1129	/* Now we do it all again for the two columns that have a q1 box. */
				1130	q = q_table[1][i];
				1131	qef = (q >> 1) ^ mds_poly_divx_const[ q & 1 ];
				1132	q5b = (qef >> 1) ^ mds_poly_divx_const[ qef & 1 ] ^ q;
				1133	qef ^= q5b;
				1134
				1135	/* The other two columns use the coefficient in a different order. */
				1136	MDS_table[0][i] = (qef<<24) \| (qef<<16) \| (q5b<<8) \| q ;
				1137	MDS_table[2][i] = (qef<<24) \| (q <<16) \| (qef<<8) \| q5b;
				1138	}
				1139	}
				1140
				1141
				1142	/*
				1143	* The h() function is the heart of the Twofish cipher.
				1144	* It is a complicated sequence of q-box lookups, key material xors,
				1145	* and finally the MDS matrix.
				1146	* We use lots of macros to make this reasonably fast.
				1147	*/
				1148
				1149	/* First a shorthand for the two q-tables */
				1150	#define q0 q_table[0]
				1151	#define q1 q_table[1]
				1152
				1153	/*
				1154	* Each macro computes one column of the h for either 2, 3, or 4 stages.
				1155	* As there are 4 columns, we have 12 macros in all.
				1156	*
				1157	* The key bytes are stored in the Byte array L at offset
				1158	* 0,1,2,3, 8,9,10,11, [16,17,18,19, [24,25,26,27]] as this is the
				1159	* order we get the bytes from the user. If you look at the Twofish
				1160	* specs, you'll see that h() is applied to the even key words or the
				1161	* odd key words. The bytes of the even words appear in this spacing,
				1162	* and those of the odd key words too.
				1163	*
				1164	* These macros are the only place where the q-boxes and the MDS table
				1165	* are used.
				1166	*/
				1167	#define H02( y, L ) MDS_table[0][q0[q0[y]^L[ 8]]^L[0]]
				1168	#define H12( y, L ) MDS_table[1][q0[q1[y]^L[ 9]]^L[1]]
				1169	#define H22( y, L ) MDS_table[2][q1[q0[y]^L[10]]^L[2]]
				1170	#define H32( y, L ) MDS_table[3][q1[q1[y]^L[11]]^L[3]]
				1171	#define H03( y, L ) H02( q1[y]^L[16], L )
				1172	#define H13( y, L ) H12( q1[y]^L[17], L )
				1173	#define H23( y, L ) H22( q0[y]^L[18], L )
				1174	#define H33( y, L ) H32( q0[y]^L[19], L )
				1175	#define H04( y, L ) H03( q1[y]^L[24], L )
				1176	#define H14( y, L ) H13( q0[y]^L[25], L )
				1177	#define H24( y, L ) H23( q0[y]^L[26], L )
				1178	#define H34( y, L ) H33( q1[y]^L[27], L )
				1179
				1180	/*
				1181	* Now we can define the h() function given an array of key bytes.
				1182	* This function is only used in the key schedule, and not to pre-compute
				1183	* the keyed S-boxes.
				1184	*
				1185	* In the key schedule, the input is always of the form k*(1+2^8+2^16+2^24)
				1186	* so we only provide k as an argument.
				1187	*
				1188	* Arguments:
				1189	* k input to the h() function.
				1190	* L pointer to array of key bytes at
				1191	* offsets 0,1,2,3, ... 8,9,10,11, [16,17,18,19, [24,25,26,27]]
				1192	* kCycles # key cycles, 2, 3, or 4.
				1193	*/
				1194	static Twofish_UInt32 h( int k, Twofish_Byte L[], int kCycles )
				1195	{
				1196	switch( kCycles ) {
				1197	/* We code all 3 cases separately for speed reasons. */
				1198	case 2:
				1199	return H02(k,L) ^ H12(k,L) ^ H22(k,L) ^ H32(k,L);
				1200	case 3:
				1201	return H03(k,L) ^ H13(k,L) ^ H23(k,L) ^ H33(k,L);
				1202	case 4:
				1203	return H04(k,L) ^ H14(k,L) ^ H24(k,L) ^ H34(k,L);
				1204	default:
				1205	/* This is always a coding error, which is fatal. */
				1206	Twofish_fatal( "Twofish h(): Illegal argument", ERR_ILL_ARG );
				1207	return ERR_ILL_ARG;
				1208	}
				1209	}
				1210
				1211
				1212	/*
				1213	* Pre-compute the keyed S-boxes.
				1214	* Fill the pre-computed S-box array in the expanded key structure.
				1215	* Each pre-computed S-box maps 8 bits to 32 bits.
				1216	*
				1217	* The S argument contains half the number of bytes of the full key, but is
				1218	* derived from the full key. (See Twofish specifications for details.)
				1219	* S has the weird byte input order used by the Hxx macros.
				1220	*
				1221	* This function takes most of the time of a key expansion.
				1222	*
				1223	* Arguments:
				1224	* S pointer to array of 8*kCycles Bytes containing the S vector.
				1225	* kCycles number of key words, must be in the set {2,3,4}
				1226	* xkey pointer to Twofish_key structure that will contain the S-boxes.
				1227	*/
				1228	static int fill_keyed_sboxes( Twofish_Byte S[], int kCycles, Twofish_key * xkey )
				1229	{
				1230	int i;
				1231	switch( kCycles ) {
				1232	/* We code all 3 cases separately for speed reasons. */
				1233	case 2:
				1234	for( i=0; i<256; i++ )
				1235	{
				1236	xkey->s[0][i]= H02( i, S );
				1237	xkey->s[1][i]= H12( i, S );
				1238	xkey->s[2][i]= H22( i, S );
				1239	xkey->s[3][i]= H32( i, S );
				1240	}
				1241	break;
				1242	case 3:
				1243	for( i=0; i<256; i++ )
				1244	{
				1245	xkey->s[0][i]= H03( i, S );
				1246	xkey->s[1][i]= H13( i, S );
				1247	xkey->s[2][i]= H23( i, S );
				1248	xkey->s[3][i]= H33( i, S );
				1249	}
				1250	break;
				1251	case 4:
				1252	for( i=0; i<256; i++ )
				1253	{
				1254	xkey->s[0][i]= H04( i, S );
				1255	xkey->s[1][i]= H14( i, S );
				1256	xkey->s[2][i]= H24( i, S );
				1257	xkey->s[3][i]= H34( i, S );
				1258	}
				1259	break;
				1260	default:
				1261	/* This is always a coding error, which is fatal. */
				1262	Twofish_fatal( "Twofish fill_keyed_sboxes(): Illegal argument", ERR_ILL_ARG );
				1263	}
				1264	return SUCCESS;
				1265	}
				1266
				1267
				1268	/* A flag to keep track of whether we have been initialised or not. */
				1269	static int Twofish_initialised = 0;
				1270
				1271	/*
				1272	* Initialise the Twofish implementation.
				1273	* This function must be called before any other function in the
				1274	* Twofish implementation is called.
				1275	* This routine also does some sanity checks, to make sure that
				1276	* all the macros behave, and it tests the whole cipher.
				1277	*/
				1278	int Twofish_initialise()
				1279	{
				1280	int ret;
				1281	/* First test the various platform-specific definitions. */
				1282	if ((ret = test_platform()) < 0)
				1283	return ret;
				1284
				1285	/* We can now generate our tables, in the right order of course. */
				1286	initialise_q_boxes();
				1287	initialise_mds_tables();
				1288
				1289	/* We're finished with the initialisation itself. */
				1290	Twofish_initialised = 1;
				1291
				1292	/*
				1293	* And run some tests on the whole cipher.
				1294	* Yes, you need to do this every time you start your program.
				1295	* It is called assurance; you have to be certain that your program
				1296	* still works properly.
				1297	*/
				1298	return self_test();
				1299	}
				1300
				1301
				1302	/*
				1303	* The Twofish key schedule uses an Reed-Solomon code matrix multiply.
				1304	* Just like the MDS matrix, the RS-matrix is designed to be easy
				1305	* to implement. Details are below in the code.
				1306	*
				1307	* These constants make it easy to compute in the finite field used
				1308	* for the RS code.
				1309	*
				1310	* We use Bytes for the RS computation, but these are automatically
				1311	* widened to unsigned integers in the expressions. Having unsigned
				1312	* ints in these tables therefore provides the fastest access.
				1313	*/
				1314	static unsigned int rs_poly_const[] = {0, 0x14d};
				1315	static unsigned int rs_poly_div_const[] = {0, 0xa6 };
				1316
				1317
				1318	/*
				1319	* Prepare a key for use in encryption and decryption.
				1320	* Like most block ciphers, Twofish allows the key schedule
				1321	* to be pre-computed given only the key.
				1322	* Twofish has a fairly 'heavy' key schedule that takes a lot of time
				1323	* to compute. The main work is pre-computing the S-boxes used in the
				1324	* encryption and decryption. We feel that this makes the cipher much
				1325	* harder to attack. The attacker doesn't even know what the S-boxes
				1326	* contain without including the entire key schedule in the analysis.
				1327	*
				1328	* Unlike most Twofish implementations, this one allows any key size from
				1329	* 0 to 32 bytes. Odd key sizes are defined for Twofish (see the
				1330	* specifications); the key is simply padded with zeroes to the next real
				1331	* key size of 16, 24, or 32 bytes.
				1332	* Each odd-sized key is thus equivalent to a single normal-sized key.
				1333	*
				1334	* Arguments:
				1335	* key array of key bytes
				1336	* key_len number of bytes in the key, must be in the range 0,...,32.
				1337	* xkey Pointer to an Twofish_key structure that will be filled
				1338	* with the internal form of the cipher key.
				1339	*/
				1340	int Twofish_prepare_key( Twofish_Byte key[], int key_len, Twofish_key * xkey )
				1341	{
				1342	/* We use a single array to store all key material in,
				1343	* to simplify the wiping of the key material at the end.
				1344	* The first 32 bytes contain the actual (padded) cipher key.
				1345	* The next 32 bytes contain the S-vector in its weird format,
				1346	* and we have 4 bytes of overrun necessary for the RS-reduction.
				1347	*/
				1348	Twofish_Byte K[32+32+4];
				1349
				1350	int kCycles; /* # key cycles, 2,3, or 4. */
				1351
				1352	int i;
				1353	Twofish_UInt32 A, B; /* Used to compute the round keys. */
				1354
				1355	Twofish_Byte * kptr; /* Three pointers for the RS computation. */
				1356	Twofish_Byte * sptr;
				1357	Twofish_Byte * t;
				1358
				1359	Twofish_Byte b,bx,bxx; /* Some more temporaries for the RS computation. */
				1360
				1361	/* Check that the Twofish implementation was initialised. */
				1362	if( Twofish_initialised == 0 )
				1363	{
				1364	/*
				1365	* You didn't call Twofish_initialise before calling this routine.
				1366	* This is a programming error, and therefore we call the fatal
				1367	* routine.
				1368	*
				1369	* I could of course call the initialisation routine here,
				1370	* but there are a few reasons why I don't. First of all, the
				1371	* self-tests have to be done at startup. It is no good to inform
				1372	* the user that the cipher implementation fails when he wants to
				1373	* write his data to disk in encrypted form. You have to warn him
				1374	* before he spends time typing his data. Second, the initialisation
				1375	* and self test are much slower than a single key expansion.
				1376	* Calling the initialisation here makes the performance of the
				1377	* cipher unpredictable. This can lead to really weird problems
				1378	* if you use the cipher for a real-time task. Suddenly it fails
				1379	* once in a while the first time you try to use it. Things like
				1380	* that are almost impossible to debug.
				1381	*/
				1382	/* Twofish_fatal( "Twofish implementation was not initialised.", ERR_INIT ); */
				1383
				1384	/*
				1385	* There is always a danger that the Twofish_fatal routine returns,
				1386	* in spite of the specifications that it should not.
				1387	* (A good programming rule: don't trust the rest of the code.)
				1388	* This would be disasterous. If the q-tables and MDS-tables have
				1389	* not been initialised, they are probably still filled with zeroes.
				1390	* Suppose the MDS-tables are all zero. The key expansion would then
				1391	* generate all-zero round keys, and all-zero s-boxes. The danger
				1392	* is that nobody would notice as the encry
				1393	* mangles the input, and the decryption still 'decrypts' it,
				1394	* but now in a completely key-independent manner.
				1395	* To stop such security disasters, we use blunt force.
				1396	* If your program hangs here: fix the fatal routine!
				1397	*/
				1398	for(;;); /* Infinite loop, which beats being insecure. */
				1399	}
				1400
				1401	/* Check for valid key length. */
				1402	if( key_len < 0 \|\| key_len > 32 )
				1403	{
				1404	/*
				1405	* This can only happen if a programmer didn't read the limitations
				1406	* on the key size.
				1407	*/
				1408	Twofish_fatal( "Twofish_prepare_key: illegal key length", ERR_KEY_LEN );
				1409	/*
				1410	* A return statement just in case the fatal macro returns.
				1411	* The rest of the code assumes that key_len is in range, and would
				1412	* buffer-overflow if it wasn't.
				1413	*
				1414	* Why do we still use a programming language that has problems like
				1415	* buffer overflows, when these problems were solved in 1960 with
				1416	* the development of Algol? Have we not leared anything?
				1417	*/
				1418	return ERR_KEY_LEN;
				1419	}
				1420
				1421	/* Pad the key with zeroes to the next suitable key length. */
				1422	memcpy( K, key, key_len );
				1423	memset( K+key_len, 0, sizeof(K)-key_len );
				1424
				1425	/*
				1426	* Compute kCycles: the number of key cycles used in the cipher.
				1427	* 2 for 128-bit keys, 3 for 192-bit keys, and 4 for 256-bit keys.
				1428	*/
				1429	kCycles = (key_len + 7) >> 3;
				1430	/* Handle the special case of very short keys: minimum 2 cycles. */
				1431	if( kCycles < 2 )
				1432	{
				1433	kCycles = 2;
				1434	}
				1435
				1436	/*
				1437	* From now on we just pretend to have 8*kCycles bytes of
				1438	* key material in K. This handles all the key size cases.
				1439	*/
				1440
				1441	/*
				1442	* We first compute the 40 expanded key words,
				1443	* formulas straight from the Twofish specifications.
				1444	*/
				1445	for( i=0; i<40; i+=2 )
				1446	{
				1447	/*
				1448	* Due to the byte spacing expected by the h() function
				1449	* we can pick the bytes directly from the key K.
				1450	* As we use bytes, we never have the little/big endian
				1451	* problem.
				1452	*
				1453	* Note that we apply the rotation function only to simple
				1454	* variables, as the rotation macro might evaluate its argument
				1455	* more than once.
				1456	*/
				1457	A = h( i , K , kCycles );
				1458	B = h( i+1, K+4, kCycles );
				1459	B = ROL32( B, 8 );
				1460
				1461	/* Compute and store the round keys. */
				1462	A += B;
				1463	B += A;
				1464	xkey->K[i] = A;
				1465	xkey->K[i+1] = ROL32( B, 9 );
				1466	}
				1467
				1468	/* Wipe variables that contained key material. */
				1469	A=B=0;
				1470
				1471	/*
				1472	* And now the dreaded RS multiplication that few seem to understand.
				1473	* The RS matrix is not random, and is specially designed to compute the
				1474	* RS matrix multiplication in a simple way.
				1475	*
				1476	* We work in the field GF(2)[x]/x^8+x^6+x^3+x^2+1. Note that this is a
				1477	* different field than used for the MDS matrix.
				1478	* (At least, it is a different representation because all GF(2^8)
				1479	* representations are equivalent in some form.)
				1480	*
				1481	* We take 8 consecutive bytes of the key and interpret them as
				1482	* a polynomial k_0 + k_1 y + k_2 y^2 + ... + k_7 y^7 where
				1483	* the k_i bytes are the key bytes and are elements of the finite field.
				1484	* We multiply this polynomial by y^4 and reduce it modulo
				1485	* y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1.
				1486	* using straightforward polynomial modulo reduction.
				1487	* The coefficients of the result are the result of the RS
				1488	* matrix multiplication. When we wrote the Twofish specification,
				1489	* the original RS definition used the polynomials,
				1490	* but that requires much more mathematical knowledge.
				1491	* We were already using matrix multiplication in a finite field for
				1492	* the MDS matrix, so I re-wrote the RS operation as a matrix
				1493	* multiplication to reduce the difficulty of understanding it.
				1494	* Some implementors have not picked up on this simpler method of
				1495	* computing the RS operation, even though it is mentioned in the
				1496	* specifications.
				1497	*
				1498	* It is possible to perform these computations faster by using 32-bit
				1499	* word operations, but that is not portable and this is not a speed-
				1500	* critical area.
				1501	*
				1502	* We explained the 1/x computation when we did the MDS matrix.
				1503	*
				1504	* The S vector is stored in K[32..64].
				1505	* The S vector has to be reversed, so we loop cross-wise.
				1506	*
				1507	* Note the weird byte spacing of the S-vector, to match the even
				1508	* or odd key words arrays. See the discussion at the Hxx macros for
				1509	* details.
				1510	*/
				1511	kptr = K + 8kCycles; / Start at end of key */
				1512	sptr = K + 32; /* Start at start of S */
				1513
				1514	/* Loop over all key material */
				1515	while( kptr > K )
				1516	{
				1517	kptr -= 8;
				1518	/*
				1519	* Initialise the polynimial in sptr[0..12]
				1520	* The first four coefficients are 0 as we have to multiply by y^4.
				1521	* The next 8 coefficients are from the key material.
				1522	*/
				1523	memset( sptr, 0, 4 );
				1524	memcpy( sptr+4, kptr, 8 );
				1525
				1526	/*
				1527	* The 12 bytes starting at sptr are now the coefficients of
				1528	* the polynomial we need to reduce.
				1529	*/
				1530
				1531	/* Loop over the polynomial coefficients from high to low */
				1532	t = sptr+11;
				1533	/* Keep looping until polynomial is degree 3; */
				1534	while( t > sptr+3 )
				1535	{
				1536	/* Pick up the highest coefficient of the poly. */
				1537	b = *t;
				1538
				1539	/*
				1540	* Compute x and (x+1/x) times this coefficient.
				1541	* See the MDS matrix implementation for a discussion of
				1542	* multiplication by x and 1/x. We just use different
				1543	* constants here as we are in a
				1544	* different finite field representation.
				1545	*
				1546	* These two statements set
				1547	* bx = (x) * b
				1548	* bxx= (x + 1/x) * b
				1549	*/
				1550	bx = (Twofish_Byte)((b<<1) ^ rs_poly_const[ b>>7 ]);
				1551	bxx= (Twofish_Byte)((b>>1) ^ rs_poly_div_const[ b&1 ] ^ bx);
				1552
				1553	/*
				1554	* Subtract suitable multiple of
				1555	* y^4 + (x + 1/x)y^3 + (x)y^2 + (x + 1/x)y + 1
				1556	* from the polynomial, except that we don't bother
				1557	* updating t[0] as it will become zero anyway.
				1558	*/
				1559	t[-1] ^= bxx;
				1560	t[-2] ^= bx;
				1561	t[-3] ^= bxx;
				1562	t[-4] ^= b;
				1563
				1564	/* Go to the next coefficient. */
				1565	t--;
				1566	}
				1567
				1568	/* Go to next S-vector word, obeying the weird spacing rules. */
				1569	sptr += 8;
				1570	}
				1571
				1572	/* Wipe variables that contained key material. */
				1573	b = bx = bxx = 0;
				1574
				1575	/* And finally, we can compute the key-dependent S-boxes. */
				1576	fill_keyed_sboxes( &K[32], kCycles, xkey );
				1577
				1578	/* Wipe array that contained key material. */
				1579	memset( K, 0, sizeof( K ) );
				1580	return SUCCESS;
				1581	}
				1582
				1583
				1584	/*
				1585	* We can now start on the actual encryption and decryption code.
				1586	* As these are often speed-critical we will use a lot of macros.
				1587	*/
				1588
				1589	/*
				1590	* The g() function is the heart of the round function.
				1591	* We have two versions of the g() function, one without an input
				1592	* rotation and one with.
				1593	* The pre-computed S-boxes make this pretty simple.
				1594	*/
				1595	#define g0(X,xkey) \
				1596	(xkey->s[0][b0(X)]^xkey->s[1][b1(X)]^xkey->s[2][b2(X)]^xkey->s[3][b3(X)])
				1597
				1598	#define g1(X,xkey) \
				1599	(xkey->s[0][b3(X)]^xkey->s[1][b0(X)]^xkey->s[2][b1(X)]^xkey->s[3][b2(X)])
				1600
				1601	/*
				1602	* A single round of Twofish. The A,B,C,D are the four state variables,
				1603	* T0 and T1 are temporaries, xkey is the expanded key, and r the
				1604	* round number.
				1605	*
				1606	* Note that this macro does not implement the swap at the end of the round.
				1607	*/
				1608	#define ENCRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \
				1609	T0 = g0(A,xkey); T1 = g1(B,xkey);\
				1610	C ^= T0+T1+xkey->K[8+2*(r)]; C = ROR32(C,1);\
				1611	D = ROL32(D,1); D ^= T0+2T1+xkey->K[8+2(r)+1]
				1612
				1613	/*
				1614	* Encrypt a single cycle, consisting of two rounds.
				1615	* This avoids the swapping of the two halves.
				1616	* Parameter r is now the cycle number.
				1617	*/
				1618	#define ENCRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \
				1619	ENCRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r) );\
				1620	ENCRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r)+1 )
				1621
				1622	/* Full 16-round encryption */
				1623	#define ENCRYPT( A,B,C,D,T0,T1,xkey ) \
				1624	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 );\
				1625	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\
				1626	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\
				1627	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\
				1628	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\
				1629	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\
				1630	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\
				1631	ENCRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 )
				1632
				1633	/*
				1634	* A single round of Twofish for decryption. It differs from
				1635	* ENCRYTP_RND only because of the 1-bit rotations.
				1636	*/
				1637	#define DECRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \
				1638	T0 = g0(A,xkey); T1 = g1(B,xkey);\
				1639	C = ROL32(C,1); C ^= T0+T1+xkey->K[8+2*(r)];\
				1640	D ^= T0+2T1+xkey->K[8+2(r)+1]; D = ROR32(D,1)
				1641
				1642	/*
				1643	* Decrypt a single cycle, consisting of two rounds.
				1644	* This avoids the swapping of the two halves.
				1645	* Parameter r is now the cycle number.
				1646	*/
				1647	#define DECRYPT_CYCLE( A, B, C, D, T0, T1, xkey, r ) \
				1648	DECRYPT_RND( A,B,C,D,T0,T1,xkey,2*(r)+1 );\
				1649	DECRYPT_RND( C,D,A,B,T0,T1,xkey,2*(r) )
				1650
				1651	/* Full 16-round decryption. */
				1652	#define DECRYPT( A,B,C,D,T0,T1, xkey ) \
				1653	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 7 );\
				1654	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 6 );\
				1655	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 5 );\
				1656	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 4 );\
				1657	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 3 );\
				1658	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 2 );\
				1659	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 1 );\
				1660	DECRYPT_CYCLE( A,B,C,D,T0,T1,xkey, 0 )
				1661
				1662	/*
				1663	* A macro to read the state from the plaintext and do the initial key xors.
				1664	* The koff argument allows us to use the same macro
				1665	* for the decryption which uses different key words at the start.
				1666	*/
				1667	#define GET_INPUT( src, A,B,C,D, xkey, koff ) \
				1668	A = GET32(src )^xkey->K[ koff]; B = GET32(src+ 4)^xkey->K[1+koff]; \
				1669	C = GET32(src+ 8)^xkey->K[2+koff]; D = GET32(src+12)^xkey->K[3+koff]
				1670
				1671	/*
				1672	* Similar macro to put the ciphertext in the output buffer.
				1673	* We xor the keys into the state variables before we use the PUT32
				1674	* macro as the macro might use its argument multiple times.
				1675	*/
				1676	#define PUT_OUTPUT( A,B,C,D, dst, xkey, koff ) \
				1677	A ^= xkey->K[ koff]; B ^= xkey->K[1+koff]; \
				1678	C ^= xkey->K[2+koff]; D ^= xkey->K[3+koff]; \
				1679	PUT32( A, dst ); PUT32( B, dst+ 4 ); \
				1680	PUT32( C, dst+8 ); PUT32( D, dst+12 )
				1681
				1682
				1683	/*
				1684	* Twofish block encryption
				1685	*
				1686	* Arguments:
				1687	* xkey expanded key array
				1688	* p 16 bytes of plaintext
				1689	* c 16 bytes in which to store the ciphertext
				1690	*/
				1691	void Twofish_encrypt( Twofish_key * xkey, Twofish_Byte p[16], Twofish_Byte c[16])
				1692	{
				1693	Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */
				1694
				1695	/* Get the four plaintext words xorred with the key */
				1696	GET_INPUT( p, A,B,C,D, xkey, 0 );
				1697
				1698	/* Do 8 cycles (= 16 rounds) */
				1699	ENCRYPT( A,B,C,D,T0,T1,xkey );
				1700
				1701	/* Store them with the final swap and the output whitening. */
				1702	PUT_OUTPUT( C,D,A,B, c, xkey, 4 );
				1703	}
				1704
				1705
				1706	/*
				1707	* Twofish block decryption.
				1708	*
				1709	* Arguments:
				1710	* xkey expanded key array
				1711	* p 16 bytes of plaintext
				1712	* c 16 bytes in which to store the ciphertext
				1713	*/
				1714	void Twofish_decrypt( Twofish_key * xkey, Twofish_Byte c[16], Twofish_Byte p[16])
				1715	{
				1716	Twofish_UInt32 A,B,C,D,T0,T1; /* Working variables */
				1717
				1718	/* Get the four plaintext words xorred with the key */
				1719	GET_INPUT( c, A,B,C,D, xkey, 4 );
				1720
				1721	/* Do 8 cycles (= 16 rounds) */
				1722	DECRYPT( A,B,C,D,T0,T1,xkey );
				1723
				1724	/* Store them with the final swap and the output whitening. */
				1725	PUT_OUTPUT( C,D,A,B, p, xkey, 0 );
				1726	}
				1727
				1728	/*
				1729	* Using the macros it is easy to make special routines for
				1730	* CBC mode, CTR mode etc. The only thing you might want to
				1731	* add is a XOR_PUT_OUTPUT which xors the outputs into the
				1732	* destinationa instead of overwriting the data. This requires
				1733	* a XOR_PUT32 macro as well, but that should all be trivial.
				1734	*
				1735	* I thought about including routines for the separate cipher
				1736	* modes here, but it is unclear which modes should be included,
				1737	* and each encryption or decryption routine takes up a lot of code space.
				1738	* Also, I don't have any test vectors for any cipher modes
				1739	* with Twofish.
				1740	*/
				1741
				1742