/* $Id$ */
/*
 * Copyright (C) 2008-2011 Teluu Inc. (http://www.teluu.com)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */


/**

@defgroup nat_intro Introduction to Network Address Translation (NAT) and NAT Traversal
@brief This page describes NAT, the problems it causes, and the available solutions.



\section into Introduction to NAT


NAT (Network Address Translation) is a mechanism where a device rewrites
the IP address and TCP/UDP port number of a packet, mapping the address
from one realm to another (usually from a private IP address to a
public IP address and vice versa). When the NAT device forwards an
outbound packet from an internal host towards the Internet, it allocates
a temporary port number on its public side, maintains this mapping for
some predefined time, and forwards inbound packets received from the
Internet on this public port back to the internal host.
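
The mapping behavior described above can be illustrated with a small
table-based sketch. This is only a simplified model and not how any
particular NAT device is implemented: the names (nat_table, nat_init,
nat_outbound, nat_inbound), the table size, the first public port 40000,
and the use of plain integer seconds for time are all assumptions made
for illustration, and inbound packets are accepted from any sender.

```c
#include <stddef.h>
#include <stdint.h>

#define NAT_MAX_MAPPINGS 64      // hypothetical table size
#define NAT_FIRST_PORT   40000   // hypothetical first public port

// One mapping: internal address/port <-> temporary public port.
typedef struct nat_mapping {
    uint32_t int_addr;    // internal host IPv4 address
    uint16_t int_port;    // internal host port
    uint16_t pub_port;    // port allocated on the public side
    long     expires_at;  // time (seconds) at which the mapping is dropped
    int      in_use;
} nat_mapping;

typedef struct nat_table {
    nat_mapping map[NAT_MAX_MAPPINGS];
    long        timeout;  // mapping lifetime in seconds
} nat_table;

void nat_init(nat_table *t, long timeout)
{
    size_t i;
    for (i = 0; i < NAT_MAX_MAPPINGS; ++i)
        t->map[i].in_use = 0;
    t->timeout = timeout;
}

// Outbound packet: reuse or create the mapping, restart its timer, and
// return the public port for the rewritten packet (0 if the table is full).
uint16_t nat_outbound(nat_table *t, uint32_t int_addr, uint16_t int_port,
                      long now)
{
    nat_mapping *slot = NULL;
    size_t i;

    for (i = 0; i < NAT_MAX_MAPPINGS; ++i) {
        nat_mapping *m = &t->map[i];
        if (m->in_use && m->expires_at <= now)
            m->in_use = 0;                 // timer ran out: drop the mapping
        if (m->in_use && m->int_addr == int_addr && m->int_port == int_port) {
            slot = m;                      // existing mapping: reuse it
            break;
        }
        if (!m->in_use && slot == NULL)
            slot = m;                      // remember the first free slot
    }
    if (slot == NULL)
        return 0;

    slot->int_addr = int_addr;
    slot->int_port = int_port;
    slot->pub_port = (uint16_t)(NAT_FIRST_PORT + (slot - t->map));
    slot->expires_at = now + t->timeout;
    slot->in_use = 1;
    return slot->pub_port;
}

// Inbound packet on a public port: returns 1 and fills in the internal
// destination if a live mapping exists, 0 if the packet must be dropped.
int nat_inbound(const nat_table *t, uint16_t pub_port, long now,
                uint32_t *int_addr, uint16_t *int_port)
{
    size_t i;
    for (i = 0; i < NAT_MAX_MAPPINGS; ++i) {
        const nat_mapping *m = &t->map[i];
        if (m->in_use && m->expires_at > now && m->pub_port == pub_port) {
            *int_addr = m->int_addr;
            *int_port = m->int_port;
            return 1;
        }
    }
    return 0;
}
```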


NAT devices are installed primarily to alleviate the exhaustion of the
IPv4 address space by allowing multiple hosts to share a public/Internet
address. In addition, because a mapping can only be created by
a transmission from an internal host, NAT devices are often
installed even when IPv4 address exhaustion is not a problem (for example
when there is only one host at home), to provide some degree of protection
for the internal hosts against threats from the Internet.


Although NAT provides some shielding for the internal network,
one must distinguish a NAT solution from a firewall solution. NAT is not
a firewall. A firewall is a security solution designed to enforce
the security policy of an organization, while NAT is a connectivity solution
that allows multiple hosts to use a single public IP address. Understandably,
the two functions are difficult to separate at times, since many
(typically consumer) products claim to do both with the same device and
simply label the device a NAT box. But we do want to make this distinction
clear, as PJNATH is a NAT traversal helper and not a firewall bypass
solution (yet).



\section problems The NAT traversal problems


NAT works well for typical client-server communications (such as
web and email), where it is always the client that initiates the conversation
and the client normally doesn't need to maintain the connection for a long
time. It causes major problems, however, for peer-to-peer communication,
such as (and especially) VoIP. These problems are explained in more detail
below.


\subsection peer_addr Peer address problem


In VoIP, we normally want the media (audio and video) to flow directly
between the clients, since relaying is costly (both in terms of bandwidth
cost for the service provider and the additional latency introduced by
relaying). To do this, each client sends its media transport address to
the other client via the VoIP signaling path, and the other side
sends its media to this transport address.


And there lies the problem. If the client software is not NAT aware, it
sends its private IP address to the other client, and the other
client is not able to send media to this address.
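
A NAT aware client can at least detect that the address it is about to
advertise is a private one. The sketch below is a minimal check against
the private ranges of RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and
192.168.0.0/16). The function name is hypothetical, and the check only
says that the address is unreachable from the Internet; it does not find
the public address.

```c
#include <stdio.h>

// Returns 1 if the dotted-quad IPv4 address lies in an RFC 1918 private
// range, 0 if it does not, and -1 if the string cannot be parsed.
int is_private_ipv4(const char *dotted)
{
    unsigned a, b, c, d;
    char extra;

    // Exactly four numbers must be converted; any trailing character
    // makes sscanf() return 5 and the input is rejected.
    if (sscanf(dotted, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    if (a == 10)
        return 1;                         // 10.0.0.0/8
    if (a == 172 && b >= 16 && b <= 31)
        return 1;                         // 172.16.0.0/12
    if (a == 192 && b == 168)
        return 1;                         // 192.168.0.0/16
    return 0;
}
```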


Traditionally this was solved by using STUN. With this mechanism, the client
first finds out its public IP address/port by querying a STUN server, then
sends this public address instead of its private address to the other
client. When both sides use this mechanism, they can then send media
packets to these addresses, thereby creating a mapping in the NAT (also
called opening a "hole", hence this mechanism is popularly called
"hole punching"), and both can then communicate with each other.
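
The query to the STUN server starts with a Binding Request. As an
illustration, the sketch below builds the 20-byte header of such a
request using the message layout of RFC 5389: message type 0x0001,
a zero length (no attributes), the magic cookie 0x2112A442, and
a 12-byte transaction ID. The function name is hypothetical, the
transaction ID is assumed to be random and supplied by the caller, and
sending the request and parsing the response are left out.

```c
#include <stdint.h>
#include <string.h>

#define STUN_HEADER_LEN      20
#define STUN_BINDING_REQUEST 0x0001
#define STUN_MAGIC_COOKIE    0x2112A442u

// Writes a STUN Binding Request header (no attributes) into out[].
// All multi-byte fields are written in network byte order.
void stun_build_binding_request(uint8_t out[STUN_HEADER_LEN],
                                const uint8_t tsx_id[12])
{
    out[0] = (uint8_t)(STUN_BINDING_REQUEST >> 8);    // message type
    out[1] = (uint8_t)(STUN_BINDING_REQUEST & 0xFF);
    out[2] = 0;                                       // message length,
    out[3] = 0;                                       // header excluded
    out[4] = (uint8_t)(STUN_MAGIC_COOKIE >> 24);      // magic cookie
    out[5] = (uint8_t)((STUN_MAGIC_COOKIE >> 16) & 0xFF);
    out[6] = (uint8_t)((STUN_MAGIC_COOKIE >> 8) & 0xFF);
    out[7] = (uint8_t)(STUN_MAGIC_COOKIE & 0xFF);
    memcpy(out + 8, tsx_id, 12);                      // transaction ID
}
```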


But this mechanism does not work in all cases, as explained below.



\subsection hairpin Hairpinning behavior


Hairpinning is a behavior where a NAT device forwards packets from a host
in the internal network (let's call it host A) back to another host
(host B) in the same internal network, when it detects that the (public
IP address) destination of the packet is actually a mapped IP address that
was created for the internal host (host B). This is a desirable behavior
of a NAT, but unfortunately not all NAT devices support it.


Without this behavior, two (internal) hosts behind the same NAT are not
able to communicate with each other if they exchange their public
addresses (resolved by STUN as described above).
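
One simple way for a client to sidestep a NAT without hairpinning
support is sketched below: if the peer advertises the same public IP
address as our own, try its private address first. This is only
a heuristic shown for illustration. The type and function names are
hypothetical, it assumes the peer has sent both of its addresses, and
sharing a public IP address does not guarantee that the two hosts are on
the same internal network.

```c
#include <stdint.h>

typedef struct transport_addr {
    uint32_t ip;     // IPv4 address
    uint16_t port;
} transport_addr;

// Returns the address to try first when sending to the peer: its private
// address if both hosts appear to sit behind the same NAT (same public
// IP address), its public (STUN resolved) address otherwise.
transport_addr pick_peer_addr(uint32_t my_public_ip,
                              transport_addr peer_public,
                              transport_addr peer_private)
{
    if (peer_public.ip == my_public_ip)
        return peer_private;
    return peer_public;
}
```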



\subsection symmetric Symmetric behavior


NAT devices don't behave uniformly, and people have been trying to classify
their behavior into different classes. Traditionally NAT devices are
classified into Full Cone, Restricted Cone, Port Restricted Cone, and
Symmetric types, according to <A HREF="http://www.ietf.org/rfc/rfc3489.txt">RFC 3489</A>
section 5. A more recent method of classification, explained in
<A HREF="http://www.ietf.org/rfc/rfc4787.txt">RFC 4787</A>, describes
NAT behavior with two attributes: the mapping behavior
attribute and the filtering behavior attribute. Each attribute can be
one of three types: <i>Endpoint-Independent</i>, <i>Address-Dependent</i>,
or <i>Address and Port-Dependent</i>. With this new classification method,
a Symmetric NAT is an Address and Port-Dependent mapping NAT.
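
The relation between the two classification methods can be written down
as a small lookup. The sketch below uses the commonly cited
correspondence between the RFC 4787 attributes and the RFC 3489 names;
the enum and function names are hypothetical, and combinations that have
no traditional name are reported as unclassified.

```c
// The three behavior types of RFC 4787, used for both attributes.
typedef enum nat_behavior {
    ENDPOINT_INDEPENDENT,
    ADDRESS_DEPENDENT,
    ADDRESS_AND_PORT_DEPENDENT
} nat_behavior;

// The traditional RFC 3489 types.
typedef enum legacy_nat_type {
    LEGACY_FULL_CONE,
    LEGACY_RESTRICTED_CONE,
    LEGACY_PORT_RESTRICTED_CONE,
    LEGACY_SYMMETRIC,
    LEGACY_UNCLASSIFIED
} legacy_nat_type;

legacy_nat_type legacy_nat_name(nat_behavior mapping, nat_behavior filtering)
{
    // A Symmetric NAT is identified by its mapping behavior alone.
    if (mapping == ADDRESS_AND_PORT_DEPENDENT)
        return LEGACY_SYMMETRIC;

    // The cone types share Endpoint-Independent mapping and differ only
    // in their filtering behavior.
    if (mapping == ENDPOINT_INDEPENDENT) {
        switch (filtering) {
        case ENDPOINT_INDEPENDENT:
            return LEGACY_FULL_CONE;
        case ADDRESS_DEPENDENT:
            return LEGACY_RESTRICTED_CONE;
        case ADDRESS_AND_PORT_DEPENDENT:
            return LEGACY_PORT_RESTRICTED_CONE;
        }
    }
    return LEGACY_UNCLASSIFIED;   // e.g. Address-Dependent mapping
}
```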


Among these types, the Symmetric type is the hardest one to work with.
The problem is that the NAT allocates a different mapping (for the same
internal host) for the communication with the STUN server and for the
communication with other (external) hosts, so the IP address/port that
one host sends to the other is meaningless for the recipient,
since it is not the actual IP address/port mapping that the NAT device
creates. As a result, when the recipient host tries to send a packet to
this address, the NAT device drops the packet since it does not
recognize the sender of the packet as a host "authorized" to send
to this address.


There are two solutions for this. First, we could make the client
smarter by switching transmission of the media to the source address of
the incoming media packets. This works since clients normally use a well
known trick called symmetric RTP, where they use one socket for both
transmitting and receiving RTP/media packets. We also use this
mechanism in the PJMEDIA media transport. But this solution only works
if a client behind a symmetric NAT is not communicating with another
client behind either a symmetric NAT or a port-restricted NAT.
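
The address switching idea can be sketched as follows. This is not the
PJMEDIA implementation; the structure, the function name and the
"switch after N consecutive packets" rule are assumptions made for
illustration (the threshold simply avoids switching on a single stray
packet).

```c
#include <stdint.h>

typedef struct rtp_stream {
    uint32_t rem_addr;      // where we currently send media
    uint16_t rem_port;
    uint32_t cand_addr;     // last differing source address seen
    uint16_t cand_port;
    unsigned cand_count;    // consecutive packets from that source
    unsigned switch_after;  // hypothetical threshold before switching
} rtp_stream;

// Call for every received RTP packet. Returns 1 when the stream has just
// switched its destination to the packet's source address, 0 otherwise.
int rtp_on_rx(rtp_stream *s, uint32_t src_addr, uint16_t src_port)
{
    if (src_addr == s->rem_addr && src_port == s->rem_port) {
        s->cand_count = 0;          // packet comes from where we send to
        return 0;
    }
    if (src_addr == s->cand_addr && src_port == s->cand_port) {
        s->cand_count++;
    } else {
        s->cand_addr = src_addr;    // new candidate source
        s->cand_port = src_port;
        s->cand_count = 1;
    }
    if (s->cand_count >= s->switch_after) {
        s->rem_addr = src_addr;     // send to the observed source from now on
        s->rem_port = src_port;
        s->cand_count = 0;
        return 1;
    }
    return 0;
}
```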


The second solution is to use a media relay, but as has been mentioned
above, relaying is costly, both in terms of bandwidth cost for the service
provider and the additional latency introduced by relaying.



\subsection binding_timeout Binding timeout

When a NAT device creates a binding (a public-private IP address
mapping), it associates a timer with it. The timer is used to
destroy the binding once there is no activity/traffic associated with
the binding. Because of this, a NAT aware application that wishes to
keep the binding open must periodically send outbound packets,
a mechanism known as keep-alive; otherwise it will ultimately
lose the binding and be unable to receive incoming packets from the
Internet.
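
The keep-alive bookkeeping is small. In the sketch below, the names and
the use of integer seconds are hypothetical, and the interval is
something the application has to choose to be shorter than the (usually
unknown) binding timeout of the NAT.

```c
typedef struct keepalive {
    long interval;   // seconds allowed between two outbound packets
    long last_tx;    // time of the last outbound packet
} keepalive;

// Returns 1 if the interval has elapsed since the last outbound packet,
// meaning a keep-alive packet should be sent now.
int ka_is_due(const keepalive *ka, long now)
{
    return now - ka->last_tx >= ka->interval;
}

// Call after any outbound packet on the socket, keep-alive or regular
// traffic, since either one refreshes the binding in the NAT.
void ka_on_packet_sent(keepalive *ka, long now)
{
    ka->last_tx = now;
}
```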


\section solutions The NAT traversal solutions


\subsection stun Old STUN (RFC 3489)

The original STUN (Simple Traversal of User Datagram Protocol (UDP)
Through Network Address Translators (NATs)), as defined by
<A HREF="http://www.ietf.org/rfc/rfc3489.txt">RFC 3489</A>
(published in 2003, but the work was started as early as 2001), was
meant to be a standalone, standard-based solution for the NAT
connectivity problems above. It is equipped with a NAT type detection
algorithm and methods to hole-punch the NAT in order to let traffic
get through, and it has proven to be quite successful in
traversing many types of NATs; hence it has gained a lot of popularity
as a simple and effective NAT traversal solution.

But since then the smart people at the IETF have realized that STUN alone
is not going to be enough. Besides the fact that a STUN-only solution
cannot solve the symmetric-to-symmetric or symmetric-to-port-restricted
case, people have also discovered that NAT behavior can change for
different traffic (or for the same traffic over time). It was therefore
concluded that NAT type detection can produce unreliable results, and
that one should not rely too much on it.

Because of this, STUN has since changed its strategy.
Instead of attempting to provide a standalone solution, it now provides
a partial solution and a framework on which other (STUN based) protocols,
such as TURN and ICE, are built.


\subsection stunbis STUN/STUNbis (RFC 5389)

The Session Traversal Utilities for NAT (STUN) is the further development
of the old STUN. While it still provides a mechanism for a client to
query its public/mapped address from a STUN server, it has deprecated
the use of NAT type detection, and it now serves as a framework to build
other protocols on top of it (such as TURN and ICE).
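
In RFC 5389 the mapped address is returned in the XOR-MAPPED-ADDRESS
attribute, where the port is XOR-ed with the upper 16 bits of the magic
cookie and the IPv4 address with the whole cookie. The sketch below
decodes the value part of such an attribute for IPv4; the function name
is hypothetical, and locating the attribute inside the response is left
out.

```c
#include <stddef.h>
#include <stdint.h>

#define STUN_MAGIC_COOKIE 0x2112A442u

// Decodes the value of an IPv4 XOR-MAPPED-ADDRESS attribute:
//   byte 0: reserved, byte 1: family (0x01 = IPv4),
//   bytes 2-3: X-Port, bytes 4-7: X-Address (network byte order).
// On success returns 0 and stores the address and port in host byte
// order; returns -1 if the value is too short or not IPv4.
int stun_decode_xor_mapped_v4(const uint8_t *value, size_t len,
                              uint32_t *addr, uint16_t *port)
{
    uint16_t xport;
    uint32_t xaddr;

    if (len < 8 || value[1] != 0x01)
        return -1;

    xport = (uint16_t)((value[2] << 8) | value[3]);
    xaddr = ((uint32_t)value[4] << 24) | ((uint32_t)value[5] << 16) |
            ((uint32_t)value[6] << 8)  |  (uint32_t)value[7];

    *port = (uint16_t)(xport ^ (STUN_MAGIC_COOKIE >> 16));
    *addr = xaddr ^ STUN_MAGIC_COOKIE;
    return 0;
}
```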


\subsection midcom_turn Old TURN (draft-rosenberg-midcom-turn)

Traversal Using Relay NAT (TURN), a standard-based effort started as early
as November 2001, was meant to be the complementary method for the
(old) STUN to complete the solution. The original idea was for the host to
use STUN to detect the NAT type and, when it found that the NAT type was
symmetric, to use TURN to relay the traffic. But as stated above,
this approach was deemed unreliable, and now the preferred way to use
TURN (which also has a new specification) is to combine it with ICE.


\subsection turn TURN (draft-ietf-behave-turn)

Traversal Using Relays around NAT (TURN) is the latest development of TURN.
While the protocol details have changed a lot, the objective is still
the same: to provide relaying control for the application.
As mentioned above, TURN should preferably be used with ICE, since relaying
is costly in terms of both bandwidth and latency; hence it should be used
as the last resort.


\subsection b2bua B2BUA approach

A SIP Back-to-Back User Agent (B2BUA) is a SIP entity that sits in the
middle of SIP traffic and acts as a SIP user agent on both call legs.
The primary motivations for having a B2BUA are to be able to provision
the call (e.g. billing, enforcing policy) and to help with NAT traversal
for the clients. Normally a B2BUA is equipped with media relaying;
otherwise it wouldn't be very useful.

Products that fall into this category include SIP Session Border
Controllers (SBC); PBXs such as Asterisk are technically B2BUAs
as well.

The benefit of a B2BUA with regard to NAT traversal is that it does not
require any modifications to the client to make it go through NATs.
And since it is basically a relay, it should be able to traverse
symmetric NATs successfully.

However, since it is a relay, the usual relaying drawbacks apply,
namely the bandwidth and latency issues. Moreover, since a B2BUA acts
as a user agent on both call legs (i.e. it terminates the SIP
signaling/call on one leg and creates another call on the other
leg), it may also introduce serious issues with end-to-end SIP signaling.


\subsection alg ALG approach

Nowadays many NAT devices (such as consumer ADSL routers) are equipped
with intelligence to inspect and fix VoIP traffic in an effort to help
it traverse the NAT. This feature is called Application Layer
Gateway (ALG) intelligence. The idea is that since the NAT device knows
about the mapping, it might as well try to fix the application traffic so
that the traffic can better traverse the NAT. Tricks that are
performed include, for example, replacing the private IP addresses/ports
in the SIP/SDP packet with the mapped public address/port of the host
that sends the packet.

Despite many claims about its usefulness, in reality this has caused
more problems than it fixes. Too many such devices break
SIP signaling and, in more advanced cases, ICE negotiation. Some
examples of bad situations that we have encountered in the past:

 - The NAT device alters the Via address/port fields in the SIP response
   message, making the response fail SIP response verification
   as defined by the SIP RFC.
 - In other cases, the modifications to the Via headers of the SIP
   response hide important information from the SIP server,
   namely the actual IP address/port of the client as seen by the SIP
   server.
 - Modifications to the Contact URI of the REGISTER request/response make
   the client unable to detect its registered binding.
 - Modifications to the IP addresses/ports in the SDP cause ICE
   negotiation to fail with ice-mismatch status.
 - The complexity of the ALG processing in itself seems to have caused
   the device to behave erratically when managing the address bindings
   (e.g. it creates a new binding for the second packet sent by the
   client, even when the previous packet was sent just a second ago, or
   it just sends inbound packets to the wrong host).


Many man-months of effort have been spent just troubleshooting issues
caused by these ALG (mal)functions, and as ALG adds complexity to
the problem rather than solving it, in general we do not like this
approach at all and would prefer it to go away.


\subsection upnp UPnP

Universal Plug and Play (UPnP) is a set of protocol specifications
for controlling network appliances, one of which covers
controlling a NAT device. With this protocol, a client can instruct the
NAT device to open a port on the NAT's public side and use this port
for its communication. UPnP has gained popularity due to its
simplicity, and one can expect it to be available on the majority of
NAT devices.

The drawback of UPnP is that since it uses multicast in its communication,
it only allows a client to control a NAT device that is in the
same multicast domain. While this is normally not a problem in
household installations (where people normally have only one NAT
router), it will not work if the client is behind a cascaded router
installation. Moreover, UPnP has serious security issues due to
its lack of authentication, so it's probably not the preferred solution
for organizations.
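
The multicast part of UPnP is its discovery step (SSDP), where the client
sends an M-SEARCH request over UDP to the multicast address
239.255.255.250, port 1900. The sketch below only formats such a request;
the function name is hypothetical, the search target shown (an Internet
Gateway Device, version 1) is an assumption about what the NAT device
advertises, and sending the request and the later port mapping calls are
left out.

```c
#include <stddef.h>
#include <stdio.h>

#define SSDP_MCAST_ADDR "239.255.255.250"
#define SSDP_PORT       1900

// Formats an SSDP M-SEARCH request into buf. mx_seconds is the maximum
// time devices may wait before answering. Returns the length of the
// request, or -1 if the buffer is too small.
int ssdp_build_msearch(char *buf, size_t size, int mx_seconds)
{
    int n = snprintf(buf, size,
                     "M-SEARCH * HTTP/1.1\r\n"
                     "HOST: %s:%d\r\n"
                     "MAN: \"ssdp:discover\"\r\n"
                     "MX: %d\r\n"
                     "ST: urn:schemas-upnp-org:device:"
                     "InternetGatewayDevice:1\r\n"
                     "\r\n",
                     SSDP_MCAST_ADDR, SSDP_PORT, mx_seconds);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```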

\subsection other Other solutions

Other solutions to NAT traversal include:

 - SOCKS, which has supported UDP since SOCKS5.



\section ice ICE Solution - The Protocol that Works Harder

A new protocol is being standardized (it was in Working Group Last
Call/WGLC stage at the time this article was written) by the IETF, called
Interactive Connectivity Establishment (ICE). ICE is the ultimate
weapon a client can have in its NAT traversal arsenal,
as it promises that if there is indeed a path for two clients
to communicate, then ICE will find this path. And if there is
more than one path over which the clients can communicate, ICE will
use the best/most efficient one.

ICE works by combining several protocols (such as STUN and TURN)
and offering several candidate paths for the communication,
thereby maximising the chance of success, while at the same time
prioritizing the candidates so that the more
expensive alternative (namely relay) is only used as the last
resort when all else fails. The ICE negotiation process involves several
stages:

 - candidate gathering, where the client finds out all the possible
   addresses that it can use for the communication. It may find
   three types of candidates: host candidates representing its
   physical NICs, server reflexive candidates for addresses that
   have been resolved via STUN, and relay candidates for addresses
   that the client has allocated from a TURN relay.
 - prioritizing these candidates. Typically the relay candidate has
   the lowest priority since it's the most expensive.
 - encoding these candidates, sending them to the remote peer, and
   negotiating them with offer-answer.
 - pairing the candidates, where it pairs every local candidate
   with every remote candidate that it receives from the remote peer.
 - checking the connectivity of each candidate pair.
 - concluding the result. Since every possible path combination is
   checked, if there is a path to communicate, ICE will find it.
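
To make the prioritizing and pairing stages above more concrete, the
sketch below computes candidate and pair priorities with the formulas
and recommended type preference values of RFC 5245 (the ICE
specification as eventually published). The enum and function names are
hypothetical, and a real implementation has more to do, such as pruning
and ordering the check list.

```c
#include <stdint.h>

typedef enum ice_cand_type {
    ICE_CAND_HOST,
    ICE_CAND_SRFLX,     // server reflexive (resolved via STUN)
    ICE_CAND_PRFLX,     // peer reflexive (learned during checks)
    ICE_CAND_RELAYED    // allocated from a TURN relay
} ice_cand_type;

// Type preferences recommended by RFC 5245; relay is the least preferred.
uint32_t ice_type_preference(ice_cand_type type)
{
    switch (type) {
    case ICE_CAND_HOST:    return 126;
    case ICE_CAND_PRFLX:   return 110;
    case ICE_CAND_SRFLX:   return 100;
    case ICE_CAND_RELAYED: return 0;
    }
    return 0;
}

// priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)
// component_id is expected to be in the range 1..256.
uint32_t ice_candidate_priority(ice_cand_type type, uint16_t local_pref,
                                unsigned component_id)
{
    return (ice_type_preference(type) << 24) +
           ((uint32_t)local_pref << 8) +
           (uint32_t)(256 - component_id);
}

// Pair priority, where g is the priority of the controlling agent's
// candidate and d the priority of the controlled agent's candidate:
//   2^32 * min(g, d) + 2 * max(g, d) + (g > d ? 1 : 0)
uint64_t ice_pair_priority(uint32_t g, uint32_t d)
{
    uint64_t lo = g < d ? g : d;
    uint64_t hi = g < d ? d : g;
    return (lo << 32) + 2 * hi + (g > d ? 1 : 0);
}
```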


There are many benefits of ICE:

 - it's standard based.
 - it works where STUN works (and more).
 - unlike a standalone STUN solution, it solves the hairpinning issue,
   since it also offers host candidates.
 - just like relaying solutions, it works with symmetric NATs. But unlike
   plain relaying, relay is only used as the last resort, thereby
   minimizing the bandwidth and latency issues of relaying.
 - it offers a generic framework for offering and checking address
   candidates. While the ICE core standard only talks about using STUN
   and TURN, implementors can add more types of candidates to the ICE
   offer, for example UDP over TCP or HTTP relays, or even UPnP
   candidates, and this can be done transparently to the remote
   peer, hence it's compatible and usable even when the remote peer
   does not support these.
 - it also adds some security, particularly against DoS attacks,
   since a media address must be acknowledged before it can be used.


Having said that, ICE is a complex protocol to implement, making
interoperability an issue, and at the time of writing we don't see
many implementations of it yet. Fortunately, PJNATH has been one of
the first, and hence more mature, ICE implementations, having been first
released in mid-2007, and we have been testing our implementation at
<A HREF="http://www.sipit.net">SIP Interoperability Test (SIPit)</A>
events regularly, so hopefully we are one of the most stable as well.


\section pjnath PJNATH - The building blocks for effective NAT traversal solution

PJSIP NAT Helper (PJNATH) is a library which contains implementations
of standard based NAT traversal solutions. PJNATH can be used as a
stand-alone library for your software, or you may use the PJSUA-LIB
library, a very high level library integrating PJSIP, PJMEDIA, and PJNATH
into simple to use APIs.

PJNATH has the following features:

 - STUNbis implementation, providing both a ready to use STUN-aware socket
   and a framework to implement higher level STUN based protocols such as
   TURN and ICE.
 - NAT type detection, useful for troubleshooting purposes.
 - TURN implementation.
 - ICE implementation.


More protocols will be implemented in the future.

Go back to \ref index.

*/