<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreposix man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
<li><a name="TOC9" href="#SEC9">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of PCRE's native API, which contains much
additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>libpcreposix.a</b>, so it can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
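<P>
As a minimal sketch of the calling sequence, the helper below compiles a pattern,
tests one subject string, and frees the compiled form. The name
<b>pattern_matches()</b> and the <b>USE_PCREPOSIX</b> macro are hypothetical. When
the macro is not defined, the sketch assumes the system's own <b>regex.h</b>, so
that it still builds where PCRE is not installed; to use PCRE it would be built
with something like <b>cc -DUSE_PCREPOSIX prog.c -lpcreposix -lpcre</b>.
</P>

```c
/* Sketch: one compile/match/free cycle through the POSIX-style API.
   USE_PCREPOSIX is a hypothetical switch; without it, the system's
   POSIX regex library is assumed instead of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>
#endif
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match, 0 on no match,
   and -1 if compiling or matching fails. */
int pattern_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED is defined as 0 by PCRE, but selects EREs elsewhere. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;               /* re is unusable now: do not regfree() it */

    /* nmatch = 0 and pmatch = NULL: only a yes/no answer is wanted. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);                /* release the memory regcomp() allocated */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```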
<P>
I have implemented only those POSIX option bits that can be reasonably mapped
to PCRE native options. In addition, the option REG_EXTENDED is defined with
the value zero. This has no effect, but since programs that are written to the
POSIX interface often use it, this makes it easier to slot in PCRE as a
replacement library. Other POSIX options are not even defined.
</P>
<P>
There are also some other options that are not defined by POSIX. These have
been added at the request of users who want to make use of certain
PCRE-specific features via the POSIX calling interface.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types: <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
<pre>
REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
REG_UCP
</pre>
The PCRE_UCP option is set when the regular expression is passed for
compilation to the native function. This causes PCRE to use Unicode properties
when matching \d, \w, etc., instead of just recognizing ASCII values. Note
that REG_UCP is not part of the POSIX standard.
<pre>
REG_UNGREEDY
</pre>
The PCRE_UNGREEDY option is set when the regular expression is passed for
compilation to the native function. Note that REG_UNGREEDY is not part of the
POSIX standard.
<pre>
REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they are not) or by a negative class such as [^a]
(they are).
</P>
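<P>
Because PCRE's default newline semantics differ from those of POSIX, a program
meant to work with either library might set the PCRE-only bits conditionally.
The sketch below assumes that the extension flags are preprocessor macros, so
<b>#ifdef</b> can detect them; <b>compile_caseless_dotall()</b> and
<b>USE_PCREPOSIX</b> are hypothetical names, and the system <b>regex.h</b> is
assumed when PCRE is not selected. With PCRE the helper adds REG_DOTALL; with a
POSIX library it relies on the default, in which "." already matches newline.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif
#include <stddef.h>

/* Hypothetical helper: compile a caseless pattern in which '.' also
   matches newline, whichever library is underneath. */
int compile_caseless_dotall(regex_t *re, const char *pattern)
{
    int cflags = REG_EXTENDED | REG_ICASE;   /* REG_ICASE -> PCRE_CASELESS */
#ifdef REG_DOTALL
    /* PCRE-only bit (not POSIX): sets PCRE_DOTALL so that '.' matches
       newline, as POSIX regcomp() does by default without REG_NEWLINE. */
    cflags |= REG_DOTALL;
#endif
    return regcomp(re, pattern, cflags);
}
```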
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<P>
NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
use the contents of the <i>preg</i> structure. If, for example, you pass it to
<b>regexec()</b>, the result is undefined and your program is likely to crash.
</P>
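<P>
A sketch of the rule above: the hypothetical helper <b>count_groups()</b> reads
<i>re_nsub</i> only after a successful compile, and after a failure it neither
reads the structure nor passes it on. The <b>USE_PCREPOSIX</b> switch is also
hypothetical; without it, the system <b>regex.h</b> is assumed.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif

/* Hypothetical helper: number of capturing subpatterns in pattern,
   or -1 if regcomp() fails. */
long count_groups(const char *pattern)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;          /* failure: re must not be read, matched or freed */

    n = (long)re.re_nsub;   /* the one public member, valid on success */
    regfree(&re);
    return n;
}
```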
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
<pre>
                           Default   Change with

  . matches newline        no        PCRE_DOTALL
  newline matches [^a]     yes       not changeable
  $ matches \n at end      yes       PCRE_DOLLAR_ENDONLY
  $ matches \n in middle   no        PCRE_MULTILINE
  ^ matches \n in middle   no        PCRE_MULTILINE
</pre>
This is the equivalent table for POSIX:
<pre>
                           Default   Change with

  . matches newline        yes       REG_NEWLINE
  newline matches [^a]     yes       REG_NEWLINE
  $ matches \n at end      no        REG_NEWLINE
  $ matches \n in middle   no        REG_NEWLINE
  ^ matches \n in middle   no        REG_NEWLINE
</pre>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
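<P>
The tables can be checked against whichever library a program is actually
linked with. The hypothetical <b>probe()</b> below reports whether a pattern
matches under a given set of compile flags. The last row ("^ matches \n in
middle") agrees in both tables, so the same results are expected from PCRE and
from a POSIX library; the system <b>regex.h</b> is assumed unless the
hypothetical <b>USE_PCREPOSIX</b> is defined.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif
#include <stddef.h>

/* Hypothetical probe: 1 if pattern matches subject when compiled with
   the extra cflags, 0 if not, -1 if the pattern does not compile. */
int probe(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | cflags) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Rows on which the tables disagree show which semantics are in effect: for
example, <b>probe("a.b", "a\nb", 0)</b> would be expected to return 0 with PCRE
and 1 with a conforming POSIX library, following the "." row.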
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is by default terminated by a zero byte
(but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
be:
<pre>
REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_NOTEMPTY
</pre>
The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching
function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
setting this option can give more POSIX-like behaviour in some situations.
<pre>
REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_STARTEND
</pre>
The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
(there need not actually be a NUL at that location), regardless of the value of
<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
how it is matched.
</P>
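<P>
A sketch of REG_STARTEND: the hypothetical <b>match_in_window()</b> searches
only the bytes from offset <i>so</i> up to (but not including) offset <i>eo</i>,
so the buffer need not be NUL-terminated at <i>eo</i>. It assumes the header in
use defines REG_STARTEND (PCRE does, as described above; the system
<b>regex.h</b> fallback is an assumption, as is the <b>USE_PCREPOSIX</b> switch).
Only the yes/no result is used, because the offsets returned in
<i>pmatch</i> are not relied on here.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system regex with REG_STARTEND */
#endif
#include <stddef.h>

/* Hypothetical helper: 1 if re matches within buf[so, eo), else 0. */
int match_in_window(regex_t *re, const char *buf, size_t so, size_t eo)
{
    regmatch_t pm[1];

    pm[0].rm_so = (regoff_t)so;   /* where the "string" starts */
    pm[0].rm_eo = (regoff_t)eo;   /* where its terminating NUL is assumed */
    return regexec(re, buf, 1, pm, REG_STARTEND) == 0;
}
```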
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
If the value of <i>nmatch</i> is zero, or if <i>pmatch</i> is NULL,
no data about any matched strings is returned.
</P>
<P>
Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
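<P>
The offsets can be turned into C strings as in the sketch below. The
hypothetical <b>copy_group()</b> copies entry <i>i</i> of <i>pmatch</i> into a
caller-supplied buffer (truncating if it is too small) and returns -1 for an
unused entry, which it recognizes by <i>rm_so</i> being -1. As elsewhere, the
<b>USE_PCREPOSIX</b> switch and the <b>regex.h</b> fallback are assumptions.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy pmatch entry i of subject into buf.
   Returns the copied length, or -1 if the group did not participate.
   Assumes bufsize > 0. */
int copy_group(const char *subject, const regmatch_t *pm, size_t i,
               char *buf, size_t bufsize)
{
    size_t len;

    if (pm[i].rm_so < 0)            /* unused: rm_so and rm_eo are -1 */
        return -1;
    len = (size_t)(pm[i].rm_eo - pm[i].rm_so);   /* rm_eo is one past the end */
    if (len >= bufsize)
        len = bufsize - 1;          /* truncate, leaving room for the zero */
    memcpy(buf, subject + pm[i].rm_so, len);
    buf[len] = '\0';
    return (int)len;
}
```

With the pattern <b>([a-z]+)@([a-z]+)(\.com)?</b> and the subject
"mail bob@example now", entry 0 is "bob@example", entries 1 and 2 are "bob" and
"example", and entry 3 is unused because the optional group did not match.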
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
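<P>
Because REG_NOMATCH is the expected failure, a loop that finds every match can
treat it as the normal stop condition and any other non-zero code as an error.
A minimal sketch follows. <b>count_matches()</b> is a hypothetical name; it
passes REG_NOTBOL after the first search so that ^ cannot match again at later
start positions, and it steps over one character after an empty match to avoid
looping forever. The <b>USE_PCREPOSIX</b> switch and <b>regex.h</b> fallback are
assumptions.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif

/* Hypothetical helper: count non-overlapping matches of a compiled
   regex in subject. Returns -1 if regexec() reports a real error. */
int count_matches(regex_t *re, const char *subject)
{
    const char *p = subject;
    int count = 0;
    int eflags = 0;              /* the first search starts at the true start */
    regmatch_t m[1];

    for (;;) {
        int rc = regexec(re, p, 1, m, eflags);
        regoff_t next;

        if (rc == REG_NOMATCH)
            break;               /* the "expected" way for the loop to end */
        if (rc != 0)
            return -1;           /* any other non-zero code is an error */
        count++;

        next = m[0].rm_eo;       /* resume after the end of this match */
        if (m[0].rm_eo == m[0].rm_so) {
            if (p[next] == '\0')
                break;           /* empty match at the very end */
            next++;              /* empty match: step over one character */
        }
        p += next;
        eflags = REG_NOTBOL;     /* p is no longer the start of the subject */
    }
    return count;
}
```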
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero error code from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message, including the
terminating zero.
</P>
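<P>
Since the yield is the size needed, a caller can size the buffer exactly by
calling <b>regerror()</b> twice, first with a zero <i>errbuf_size</i> (POSIX
specifies that <i>errbuf</i> is then ignored). The sketch below does this; the
name <b>error_message()</b> is hypothetical, the caller must free the result, and
the <b>USE_PCREPOSIX</b> switch and <b>regex.h</b> fallback are assumptions.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd message for errcode, or NULL
   if memory runs out. re may be NULL, or the structure that failed. */
char *error_message(int errcode, const regex_t *re)
{
    /* First call: size only, including the terminating zero. */
    size_t needed = regerror(errcode, re, NULL, 0);
    char *msg = malloc(needed);

    if (msg != NULL)
        regerror(errcode, re, msg, needed);   /* second call fills the buffer */
    return msg;
}
```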
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
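<P>
When several patterns are compiled together, the memory rules above mean that
a failure part-way through should free only the entries that did compile. The
sketch below shows one way to do that; <b>compile_all()</b>,
<b>free_all()</b>, and the <b>USE_PCREPOSIX</b> switch are hypothetical, and the
system <b>regex.h</b> is assumed when PCRE is not selected.
</P>

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* assumption: system POSIX regex as a fallback */
#endif
#include <stddef.h>

/* Hypothetical helper: compile n patterns into res[0..n-1]. Returns 0,
   or the first regcomp() error after freeing what had been compiled. */
int compile_all(regex_t *res, const char *patterns[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rc = regcomp(&res[i], patterns[i], REG_EXTENDED);
        if (rc != 0) {
            /* res[i] failed and must not be freed; release res[0..i-1]. */
            while (i-- > 0)
                regfree(&res[i]);
            return rc;
        }
    }
    return 0;
}

/* Hypothetical helper: free every entry compiled by compile_all(). */
void free_all(regex_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regfree(&res[i]);
}
```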
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P>
Last updated: 16 May 2010
<br>
Copyright © 1997-2010 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>