Notes: 2001-09-24
-----------------

This "description" (if one chooses to call it that) needed some major updating
so here goes. This update addresses a change being made at the same time to
OpenSSL, and it pretty much completely restructures the underlying mechanics of
the "ENGINE" code. So it serves a double purpose of being an "ENGINE internals
for masochists" document *and* a rather extensive commit log message. (I'd get
lynched for sticking all this in CHANGES or the commit mails :-).

ENGINE_TABLE underlies this restructuring, as described in the internal header
"eng_int.h", implemented in eng_table.c, and used in each of the "class" files;
tb_rsa.c, tb_dsa.c, etc.

However, "EVP_CIPHER" underlies the motivation and design of ENGINE_TABLE so
I'll mention a bit about that first. EVP_CIPHER (and most of this applies
equally to EVP_MD for digests) is both a "method" and an algorithm/mode
identifier that, in the current API, "lingers". These cipher description +
implementation structures can be defined or obtained directly by applications,
or can be loaded "en masse" into EVP storage so that they can be catalogued and
searched in various ways, ie. two ways of encrypting with the "des_cbc"
algorithm/mode pair are;

(i) directly;
     const EVP_CIPHER *cipher = EVP_des_cbc();
     EVP_EncryptInit(&ctx, cipher, key, iv);
     [ ... use EVP_EncryptUpdate() and EVP_EncryptFinal() ...]

(ii) indirectly;
     OpenSSL_add_all_ciphers();
     cipher = EVP_get_cipherbyname("des_cbc");
     EVP_EncryptInit(&ctx, cipher, key, iv);
     [ ... etc ... ]

The latter is more generally used because it also allows ciphers/digests to be
looked up based on other identifiers which can be useful for automatic cipher
selection, eg. in SSL/TLS, or by user-controllable configuration.

The important point about this is that EVP_CIPHER definitions and structures are
passed around with impunity and there is no safe way, without requiring massive
rewrites of many applications, to assume that EVP_CIPHERs can be reference
counted. Once an EVP_CIPHER is exposed to the caller, neither it nor anything it
comes from can "safely" be destroyed. Unless of course the way of getting to
such ciphers is via entirely distinct API calls that didn't exist before.
However, existing API usage cannot be made to understand when an EVP_CIPHER
pointer, that has been passed to the caller, is no longer being used.

The other problem with the existing API w.r.t. hooking EVP_CIPHER support
into ENGINE is storage - the OBJ_NAME-based storage used by EVP to register
ciphers simultaneously registers cipher *types* and cipher *implementations* -
they are effectively the same thing, an "EVP_CIPHER" pointer. The problem with
hooking in ENGINEs is that multiple ENGINEs may implement the same ciphers. The
solution is necessarily that ENGINE-provided ciphers simply are not registered,
stored, or exposed to the caller in the same manner as existing ciphers. This is
especially necessary considering the fact ENGINE uses reference counts to allow
for cleanup, modularity, and DSO support - yet EVP_CIPHERs, as exposed to
callers in the current API, support no such controls.

Another sticking point for integrating cipher support into ENGINE is linkage.
Already there is a problem with the way ENGINE supports RSA, DSA, etc whereby
they are available *because* they're part of a giant ENGINE called "openssl".
Ie. all implementations *have* to come from an ENGINE, but we get round that by
having a giant ENGINE with all the software support encapsulated. This creates
linker hassles if nothing else - linking a 1-line application that calls 2 basic
RSA functions (eg. "RSA_free(RSA_new());") will result in large quantities of
ENGINE code being linked in *and* because of that DSA, DH, and RAND also. If we
continue with this approach for EVP_CIPHER support (even if it *was* possible)
we would lose our ability to link selectively by selectively loading certain
implementations of certain functionality. Touching any part of any kind of
crypto would result in massive static linkage of everything else. So the
solution is to change the way ENGINE feeds existing "classes", ie. how the
hooking to ENGINE works from RSA, DSA, DH, RAND, as well as adding new hooking
for EVP_CIPHER, and EVP_MD.

The way this is now being done is by mostly reverting back to how things used to
work prior to ENGINE :-). Ie. RSA now has a "RSA_METHOD" pointer again - this
was previously replaced by an "ENGINE" pointer and all RSA code that required
the RSA_METHOD would call ENGINE_get_RSA() each time on its ENGINE handle to
temporarily get and use the ENGINE's RSA implementation. Apart from being more
efficient, switching back to each RSA having an RSA_METHOD pointer also allows
us to conceivably operate with *no* ENGINE. As we'll see, this removes any need
for a fallback ENGINE that encapsulates default implementations - we can simply
have our RSA structure pointing its RSA_METHOD pointer to the software
implementation and have its ENGINE pointer set to NULL.

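As a rough illustration of that layout (not the real OpenSSL structures), here
is a minimal sketch in C. Every TOY_* type and toy_* function is a made-up
stand-in for this note, and the "signing" arithmetic is a placeholder. The point
is only that the key always carries a usable method pointer, and that the engine
pointer is optional and may be NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins: none of these are the real OpenSSL types. */
typedef struct toy_method {
    const char *name;
    int (*sign)(int input);
} TOY_METHOD;

typedef struct toy_engine {
    const char *id;
    const TOY_METHOD *rsa_meth; /* implementation owned by the engine */
    int funct_ref;              /* functional reference count */
} TOY_ENGINE;

typedef struct toy_rsa {
    const TOY_METHOD *meth;     /* used directly for every operation */
    TOY_ENGINE *engine;         /* NULL when the software method is in use */
} TOY_RSA;

/* Placeholder "implementations" so the two paths can be told apart. */
int toy_software_sign(int input) { return input + 1; }
int toy_hardware_sign(int input) { return input * 2; }

const TOY_METHOD toy_software_method = { "software", toy_software_sign };
const TOY_METHOD toy_hardware_method = { "hardware", toy_hardware_sign };

/* Set up a key. With engine == NULL no ENGINE code is involved at all. */
void toy_rsa_init(TOY_RSA *rsa, TOY_ENGINE *engine)
{
    rsa->engine = engine;
    if (engine != NULL) {
        engine->funct_ref++;            /* the key holds a functional reference */
        rsa->meth = engine->rsa_meth;
    } else {
        rsa->meth = &toy_software_method;
    }
}

/* Release the engine reference, if the key ever held one. */
void toy_rsa_cleanup(TOY_RSA *rsa)
{
    if (rsa->engine != NULL) {
        rsa->engine->funct_ref--;
        rsa->engine = NULL;
    }
    rsa->meth = NULL;
}

/* Operations go straight through the method pointer, with no per-call lookup. */
int toy_rsa_sign(const TOY_RSA *rsa, int input)
{
    return rsa->meth->sign(input);
}
```

Calling toy_rsa_init(&key, NULL) leaves the key on the software method with a
NULL engine pointer, which is the "no fallback ENGINE needed" case described
above.
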
A look at the EVP_CIPHER hooking is most explanatory; the RSA, DSA (etc) cases
turn out to be degenerate forms of the same thing. The EVP storage of ciphers,
and the existing EVP API functions that return "software" implementations and
descriptions remain untouched. However, the storage takes more meaning in terms
of "cipher description" and less meaning in terms of "implementation". When an
EVP_CIPHER_CTX is actually initialised with an EVP_CIPHER method and is about to
begin en/decryption, the hooking to ENGINE comes into play. What happens is that
cipher-specific ENGINE code is asked for an ENGINE pointer (a functional
reference) for any ENGINE that is registered to perform the algo/mode that the
provided EVP_CIPHER structure represents. Under normal circumstances, that
ENGINE code will return NULL because no ENGINEs will have had any cipher
implementations *registered*. As such, a NULL ENGINE pointer is stored in the
EVP_CIPHER_CTX context, and the EVP_CIPHER structure is left hooked into the
context and so is used as the implementation. Pretty much how things work now
except we'd have a redundant ENGINE pointer set to NULL and doing nothing.

Conversely, if an ENGINE *has* been registered to perform the algorithm/mode
combination represented by the provided EVP_CIPHER, then a functional reference
to that ENGINE will be returned to the EVP_CIPHER_CTX during initialisation.
That functional reference will be stored in the context (and released on
cleanup) - and having that reference provides a *safe* way to use an EVP_CIPHER
definition that is private to the ENGINE. Ie. the EVP_CIPHER provided by the
application will actually be replaced by an EVP_CIPHER from the registered
ENGINE - it will support the same algorithm/mode as the original but will be a
completely different implementation. Because this EVP_CIPHER isn't stored in the
EVP storage, nor is it returned to applications from traditional API functions,
there is no associated problem with it not having reference counts. And of
course, when one of these "private" cipher implementations is hooked into
EVP_CIPHER_CTX, it is done whilst the EVP_CIPHER_CTX holds a functional
reference to the ENGINE that owns it, thus the use of the ENGINE's EVP_CIPHER is
safe.

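The two cases above (no ENGINE registered, or one registered for the nid) can be
sketched as follows. This is a toy model under stated assumptions: the TOY_*
types, the single "registered engine" slot, and the toy_* functions are all
invented for illustration and are not the EVP or ENGINE API. Each toy engine
carries just one private cipher to keep things short.

```c
#include <stddef.h>

/* Hypothetical stand-ins for EVP_CIPHER, ENGINE and EVP_CIPHER_CTX. */
typedef struct toy_cipher {
    int nid;             /* identifies the algorithm/mode */
    const char *impl;    /* label for whoever implements it */
} TOY_CIPHER;

typedef struct toy_engine {
    int funct_ref;               /* functional reference count */
    const TOY_CIPHER *cipher;    /* private cipher, never put in EVP storage */
} TOY_ENGINE;

typedef struct toy_cipher_ctx {
    const TOY_CIPHER *cipher;    /* implementation actually used */
    TOY_ENGINE *engine;          /* NULL unless an engine supplied the cipher */
} TOY_CIPHER_CTX;

/* Assumption: one global slot stands in for the cipher table in tb_cipher.c. */
static TOY_ENGINE *toy_registered_engine = NULL;

void toy_register_cipher_engine(TOY_ENGINE *e)
{
    toy_registered_engine = e;
}

/* Return a functional reference to an engine for this nid, or NULL. */
static TOY_ENGINE *toy_get_cipher_engine(int nid)
{
    TOY_ENGINE *e = toy_registered_engine;

    if (e == NULL || e->cipher->nid != nid)
        return NULL;
    e->funct_ref++;
    return e;
}

/* Initialise a context from the cipher the application handed in. */
void toy_cipher_ctx_init(TOY_CIPHER_CTX *ctx, const TOY_CIPHER *cipher)
{
    ctx->engine = toy_get_cipher_engine(cipher->nid);
    if (ctx->engine != NULL)
        ctx->cipher = ctx->engine->cipher;  /* swap in the private cipher */
    else
        ctx->cipher = cipher;               /* keep the application's cipher */
}

/* Drop the functional reference taken during initialisation. */
void toy_cipher_ctx_cleanup(TOY_CIPHER_CTX *ctx)
{
    if (ctx->engine != NULL)
        ctx->engine->funct_ref--;
    ctx->engine = NULL;
    ctx->cipher = NULL;
}
```

The private cipher is only ever reachable through a context that holds a
reference on the owning engine, which is why it needs no reference count of its
own in this model.
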
The "cipher-specific ENGINE code" I mentioned is implemented in tb_cipher.c but
in essence it is simply an instantiation of "ENGINE_TABLE" code for use by
EVP_CIPHER code. tb_digest.c is virtually identical but, of course, it is for
use by EVP_MD code. Ditto for tb_rsa.c, tb_dsa.c, etc. These instantiations of
ENGINE_TABLE essentially provide linker-separation of the classes so that even
if ENGINEs implement *all* possible algorithms, an application using only
EVP_CIPHER code will link at most code relating to EVP_CIPHER, tb_cipher.c, core
ENGINE code that is independent of class, and of course the ENGINE
implementation that the application loaded. It will *not* however link any
class-specific ENGINE code for digests, RSA, etc nor will it bleed over into
other APIs, such as the RSA/DSA/etc library code.

ENGINE_TABLE is a little more complicated than may seem necessary but this is
mostly to avoid a lot of "init()"-thrashing on ENGINEs (that may have to load
DSOs, and other expensive setup that shouldn't be thrashed unnecessarily) *and*
to duplicate "default" behaviour. Basically an ENGINE_TABLE instantiation, for
example tb_cipher.c, implements a hash-table keyed by integer "nid" values.
These nids provide the uniqueness of an algorithm/mode - and each nid will hash
to a potentially NULL "ENGINE_PILE". An ENGINE_PILE is essentially a list of
pointers to ENGINEs that implement that particular 'nid'. Each "pile" uses some
caching tricks such that requests on that 'nid' will be cached and all future
requests will return immediately (well, at least with minimal operation) unless
a change is made to the pile, eg. perhaps an ENGINE was unloaded. The reason is
that an application could have support for 10 ENGINEs statically linked
in, and the machine in question may not have any of the hardware those 10
ENGINEs support. If each of those ENGINEs has a "des_cbc" implementation, we
want to avoid every EVP_CIPHER_CTX setup from trying (and failing) to initialise
each of those 10 ENGINEs. Instead, the first such request will try to do that
and will either return (and cache) a NULL ENGINE pointer or will return a
functional reference to the first that successfully initialised. In the latter
case it will also cache an extra functional reference to the ENGINE as a
"default" for that 'nid'. The caching is acknowledged by a 'uptodate' variable
that is unset only if un/registration takes place on that pile. Ie. if
implementations of "des_cbc" are added or removed. This behaviour can be
tweaked; the ENGINE_TABLE_FLAG_NOINIT value can be passed to
ENGINE_set_table_flags(), in which case the only ENGINEs that tb_cipher.c will
try to initialise from the "pile" will be those that are already initialised
(ie. it's simply an increment of the functional reference count, and no real
"initialisation" will take place).

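The caching behaviour of a single pile can be sketched like this. It is a
simplified model, not the code in eng_table.c: the hash table keyed by nid is
left out, a pile is a fixed-size array, and all TOY_* and toy_* names (including
TOY_FLAG_NOINIT, which only imitates ENGINE_TABLE_FLAG_NOINIT) are assumptions
made for the example. The init_attempts counter exists purely so the effect of
the cache can be observed.

```c
#include <stddef.h>

#define TOY_PILE_MAX    8
#define TOY_FLAG_NOINIT 0x1   /* imitates ENGINE_TABLE_FLAG_NOINIT */

typedef struct toy_engine {
    int hardware_present;     /* whether a real initialisation can succeed */
    int funct_ref;            /* functional references; > 0 means initialised */
    int init_attempts;        /* number of expensive initialisation attempts */
} TOY_ENGINE;

typedef struct toy_pile {
    int nid;                            /* algorithm/mode this pile serves */
    TOY_ENGINE *engines[TOY_PILE_MAX];  /* engines registered for the nid */
    int count;
    TOY_ENGINE *funct;                  /* cached default; holds its own ref */
    int uptodate;                       /* cleared by any un/registration */
} TOY_PILE;

/* Obtain a functional reference; returns 1 on success, 0 on failure. */
static int toy_engine_init(TOY_ENGINE *e, unsigned int flags)
{
    if (e->funct_ref > 0) {             /* already initialised: just count */
        e->funct_ref++;
        return 1;
    }
    if (flags & TOY_FLAG_NOINIT)        /* not allowed to really initialise */
        return 0;
    e->init_attempts++;                 /* the expensive part (DSO load, etc) */
    if (!e->hardware_present)
        return 0;
    e->funct_ref = 1;
    return 1;
}

/* Add an engine to the pile and invalidate the cached answer. */
int toy_pile_register(TOY_PILE *p, TOY_ENGINE *e)
{
    if (p->count >= TOY_PILE_MAX)
        return 0;
    p->engines[p->count++] = e;
    p->uptodate = 0;
    return 1;
}

/* Return a functional reference for the pile's nid, or NULL. */
TOY_ENGINE *toy_pile_select(TOY_PILE *p, unsigned int flags)
{
    int i;

    if (!p->uptodate) {
        if (p->funct != NULL) {         /* drop the stale cached default */
            p->funct->funct_ref--;
            p->funct = NULL;
        }
        for (i = 0; i < p->count; i++) {
            if (toy_engine_init(p->engines[i], flags)) {
                p->funct = p->engines[i];   /* cache keeps this reference */
                break;
            }
        }
        p->uptodate = 1;                /* a NULL result is cached as well */
    }
    if (p->funct == NULL)
        return NULL;
    p->funct->funct_ref++;              /* extra reference for the caller */
    return p->funct;
}
```

With a pile holding only engines whose hardware is absent, the first
toy_pile_select() call pays for the failed initialisations and every later call
returns NULL straight from the cache until something is registered again.
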
RSA, DSA, DH, and RAND all have their own ENGINE_TABLE code as well, and the
difference is that they all use an implicit 'nid' of 1. Whereas EVP_CIPHERs are
actually qualitatively different depending on 'nid' (the "des_cbc" EVP_CIPHER is
not an interoperable implementation of "aes_256_cbc"), RSA_METHODs are
necessarily interoperable and don't have different flavours, only different
implementations. In other words, the ENGINE_TABLE for RSA will either be empty,
or will have a single ENGINE_PILE hashed to by the 'nid' 1 and that pile
represents ENGINEs that implement the single "type" of RSA there is.

Cleanup - the registration and unregistration may pose questions about how
cleanup works with the ENGINE_PILE doing all this caching nonsense (ie. when the
application or EVP_CIPHER code releases its last reference to an ENGINE, the
ENGINE_PILE code may still have references and thus those ENGINEs will stay
hooked in forever). The way this is handled is via "unregistration". With these
new ENGINE changes, an abstract ENGINE can be loaded and initialised, but that
is an algorithm-agnostic process. Even if initialised, it will not have
registered any of its implementations (to do so would link all class "table"
code despite the fact the application may use only ciphers, for example). This
is deliberately a distinct step. Moreover, registration and unregistration have
nothing to do with whether an ENGINE is *functional* or not (ie. you can even
register an ENGINE and its implementations without it being operational, you may
not even have the drivers to make it operate). What actually happens with
respect to cleanup is managed inside eng_lib.c with the "engine_cleanup_***"
functions. These functions are internal-only and each part of ENGINE code that
could require cleanup will, upon performing its first allocation, register a
callback with the "engine_cleanup" code. The other part of this that makes it
tick is that the ENGINE_TABLE instantiations (tb_***.c) use NULL as their
initialised state. So if RSA code asks for an ENGINE and no ENGINE has
registered an implementation, the code will simply return NULL and the tb_rsa.c
state will be unchanged. Thus, no cleanup is required unless registration takes
place. ENGINE_cleanup() will simply iterate across a list of registered cleanup
callbacks calling each in turn, and will then internally delete its own storage
(a STACK). When a cleanup callback is next registered (eg. if the cleanup() is
part of a graceful restart and the application wants to clean up all state then
start again), the internal STACK storage will be freshly allocated. This is much
the same as the situation in the ENGINE_TABLE instantiations ... NULL is the
initialised state, so only modification operations (not queries) will cause that
code to have to register a cleanup.

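A minimal sketch of that arrangement follows, assuming a singly linked list in
place of the real STACK and a one-slot "table" in place of tb_rsa.c. All toy_*
names are hypothetical, and the order in which callbacks run here (most recently
registered first) is a choice made for the sketch rather than a statement about
eng_lib.c.

```c
#include <stdlib.h>

typedef void (toy_cleanup_cb)(void);

typedef struct toy_cleanup_item {
    toy_cleanup_cb *cb;
    struct toy_cleanup_item *next;
} TOY_CLEANUP_ITEM;

/* NULL is the initialised state: nothing is allocated until first use. */
static TOY_CLEANUP_ITEM *toy_cleanup_list = NULL;

/* Register a callback to be run by the next toy_cleanup(). */
int toy_cleanup_add(toy_cleanup_cb *cb)
{
    TOY_CLEANUP_ITEM *item = malloc(sizeof(*item));

    if (item == NULL)
        return 0;
    item->cb = cb;
    item->next = toy_cleanup_list;
    toy_cleanup_list = item;
    return 1;
}

/* Number of callbacks currently waiting. */
int toy_cleanup_pending(void)
{
    int n = 0;
    const TOY_CLEANUP_ITEM *item;

    for (item = toy_cleanup_list; item != NULL; item = item->next)
        n++;
    return n;
}

/* Run every callback, then free the list so it is back to NULL. */
void toy_cleanup(void)
{
    while (toy_cleanup_list != NULL) {
        TOY_CLEANUP_ITEM *item = toy_cleanup_list;

        toy_cleanup_list = item->next;
        item->cb();
        free(item);
    }
}

/* A stand-in for one table instantiation; it also starts out as NULL. */
static int *toy_rsa_table = NULL;

static void toy_rsa_table_cleanup(void)
{
    free(toy_rsa_table);
    toy_rsa_table = NULL;
}

/* A query: never allocates, so it never needs a cleanup callback. */
int toy_rsa_table_is_empty(void)
{
    return toy_rsa_table == NULL;
}

/* A modification: the first allocation registers the cleanup callback. */
int toy_rsa_table_register(int engine_id)
{
    if (toy_rsa_table == NULL) {
        toy_rsa_table = malloc(sizeof(*toy_rsa_table));
        if (toy_rsa_table == NULL)
            return 0;
        if (!toy_cleanup_add(toy_rsa_table_cleanup)) {
            free(toy_rsa_table);
            toy_rsa_table = NULL;
            return 0;
        }
    }
    *toy_rsa_table = engine_id;
    return 1;
}
```

Because both the callback list and the table return to NULL after
toy_cleanup(), registering again afterwards simply allocates fresh storage,
which is the "graceful restart" case mentioned above.
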
What else? The bignum callbacks and associated ENGINE functions have been
removed for two obvious reasons; (i) there was no way to generalise them to the
mechanism now used by RSA/DSA/..., because there's no such thing as a BIGNUM
method, and (ii) because of (i), there was no meaningful way for library or
application code to automatically hook and use ENGINE supplied bignum functions
anyway. Also, ENGINE_cpy() has been removed (although an internal-only version
exists) - the idea of providing an ENGINE_cpy() function probably wasn't a good
one and now certainly doesn't make sense in any generalised way. Some of the
RSA, DSA, DH, and RAND functions that were fiddled during the original ENGINE
changes have now, as a consequence, been reverted back. This is because the
hooking of ENGINE is now automatic (and passive, it can internally use a NULL
ENGINE pointer to simply ignore ENGINE from then on).

Hell, that should be enough for now ... comments welcome: geoff@openssl.org