Blame - jni/libpcre/sources/doc/html/pcrecpp.html - jami-client-android

blob: 0ef2d4f3ee69e9b64ca919fbcac163b0d5b34b29

Tristan Matthews	0461646	2013-11-14 16:09:34 -0500	[diff] [blame]	1	<html>
				2	<head>
				3	<title>pcrecpp specification</title>
				4	</head>
				5	<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
				6	<h1>pcrecpp man page</h1>
				7	<p>
				8	Return to the <a href="index.html">PCRE index page</a>.
				9	</p>
				10	<p>
				11	This page is part of the PCRE HTML documentation. It was generated automatically
				12	from the original man page. If there is any nonsense in it, please consult the
				13	man page, in case the conversion went wrong.
				14	<br>
				15	<ul>
				16	<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
				17	<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
				18	<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
				19	<li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
				20	<li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
				21	<li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
				22	<li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
				23	<li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
				24	<li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
				25	<li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
				26	<li><a name="TOC11" href="#SEC11">AUTHOR</a>
				27	<li><a name="TOC12" href="#SEC12">REVISION</a>
				28	</ul>
				29	<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
				30	<P>
				31	<b>#include &lt;pcrecpp.h&gt;</b>
				32	</P>
				33	<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
				34	<P>
				35	The C++ wrapper for PCRE was provided by Google Inc. Some additional
				36	functionality was added by Giuseppe Maxia. This brief man page was constructed
				37	from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
				38	further details.
				39	</P>
				40	<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
				41	<P>
				42	The "FullMatch" operation checks that supplied text matches a supplied pattern
				43	exactly. If pointer arguments are supplied, it copies matched sub-strings that
				44	match sub-patterns into them.
				45	<pre>
				46	Example: successful match
				47	pcrecpp::RE re("h.*o");
				48	re.FullMatch("hello");
				49
				50	Example: unsuccessful match (requires full match):
				51	pcrecpp::RE re("e");
				52	!re.FullMatch("hello");
				53
				54	Example: creating a temporary RE object:
				55	pcrecpp::RE("h.*o").FullMatch("hello");
				56	</pre>
				57	You can pass in a "const char*" or a "string" for "text". The examples below
				58	tend to use a const char*. You can, as in the different examples above, store
				59	the RE object explicitly in a variable or use a temporary RE object. The
				60	examples below use one mode or the other arbitrarily. Either could correctly be
				61	used for any of these examples.
				62	</P>
				63	<P>
				64	You must supply extra pointer arguments to extract matched subpieces.
				65	<pre>
				66	Example: extracts "ruby" into "s" and 1234 into "i"
				67	int i;
				68	string s;
				69	pcrecpp::RE re("(\\w+):(\\d+)");
				70	re.FullMatch("ruby:1234", &s, &i);
				71
				72	Example: does not try to extract any extra sub-patterns
				73	re.FullMatch("ruby:1234", &s);
				74
				75	Example: does not try to extract into NULL
				76	re.FullMatch("ruby:1234", NULL, &i);
				77
				78	Example: integer overflow causes failure
				79	!re.FullMatch("ruby:1234567891234", NULL, &i);
				80
				81	Example: fails because there aren't enough sub-patterns:
				82	!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
				83
				84	Example: fails because string cannot be stored in integer
				85	!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
				86	</pre>
				87	The provided pointer arguments can be pointers to any scalar numeric
				88	type, or one of:
				89	<pre>
				90	string (matched piece is copied to string)
				91	StringPiece (StringPiece is mutated to point to matched piece)
				92	T (where "bool T::ParseFrom(const char*, int)" exists)
				93	NULL (the corresponding matched sub-pattern is not copied)
				94	</pre>
				95	The function returns true iff all of the following conditions are satisfied:
				96	<pre>
				97	a. "text" matches "pattern" exactly;
				98
				99	b. The number of matched sub-patterns is >= number of supplied
				100	pointers;
				101
				102	c. The "i"th argument has a suitable type for holding the
				103	string captured as the "i"th sub-pattern. If you pass in
				104	void * NULL for the "i"th argument, or a non-void * NULL
				105	of the correct type, or pass fewer arguments than the
				106	number of sub-patterns, the "i"th captured sub-pattern is
				107	ignored.
				108	</pre>
				109	CAVEAT: An optional sub-pattern that does not exist in the matched
				110	string is assigned the empty string. Therefore, the following will
				111	return false (because the empty string is not a valid number):
				112	<pre>
				113	int number;
				114	pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
				115	</pre>
				116	The matching interface supports at most 16 arguments per call.
				117	If you need more, consider using the more general interface
				118	<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
				119	<b>DoMatch</b>.
				120	</P>
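<P>
The conversion rules above (overflow fails, non-numeric text fails, the empty
string from an unmatched optional sub-pattern fails) can be pictured with a
small stand-alone sketch. The helper <b>parse_int_strict</b> is hypothetical
and is not part of pcrecpp; it only illustrates the kind of check that makes
an integer extraction return false.
</P>

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of pcrecpp: converts captured text to an int
// and reports failure for empty text, unparsed trailing text, or overflow.
bool parse_int_strict(const std::string& text, int* out) {
    if (text.empty()) return false;  // e.g. an optional group that did not match
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end != text.c_str() + text.size()) return false;  // not entirely a number
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;  // overflow
    if (out != nullptr) *out = static_cast<int>(value);  // NULL means "do not store"
    return true;
}
```

<P>
With this sketch, "1234" is accepted, while "1234567891234", "ruby", and the
empty string are all rejected, mirroring the failing examples above.
</P>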
				121	<P>
				122	NOTE: Do not use <b>no_arg</b>, which is used internally to mark the end of a
				123	list of optional arguments, as a placeholder for missing arguments, as this can
				124	lead to segfaults.
				125	</P>
				126	<br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
				127	<P>
				128	You can use the "QuoteMeta" operation to insert backslashes before all
				129	potentially meaningful characters in a string. The returned string, used as a
				130	regular expression, will exactly match the original string.
				131	<pre>
				132	Example:
				133	string quoted = RE::QuoteMeta(unquoted);
				134	</pre>
				135	Note that it's legal to escape a character even if it has no special meaning in
				136	a regular expression -- so this function does that. (This also makes it
				137	identical to the Perl function of the same name; see "perldoc -f quotemeta".)
				138	For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
				139	</P>
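<P>
As a rough illustration of the escaping rule described above, the following
sketch backslash-escapes every byte that is not an ASCII letter, digit, or
underscore. The function <b>quote_meta_sketch</b> is hypothetical; passing
bytes with the high bit set through unchanged is an assumption of this sketch,
and NUL bytes get no special treatment here.
</P>

```cpp
#include <string>

// Hypothetical stand-in for RE::QuoteMeta, for illustration only.
std::string quote_meta_sketch(const std::string& unquoted) {
    std::string quoted;
    quoted.reserve(unquoted.size() * 2);
    for (unsigned char c : unquoted) {
        const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                             (c >= '0' && c <= '9') || c == '_';
        const bool high_bit = (c & 0x80) != 0;  // assumed: leave non-ASCII bytes alone
        if (!is_word && !high_bit) quoted.push_back('\\');
        quoted.push_back(static_cast<char>(c));
    }
    return quoted;
}
```

<P>
Applied to "1.5-2.0?", the sketch produces the same escaped form as the
example above.
</P>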
				140	<br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
				141	<P>
				142	You can use the "PartialMatch" operation when you want the pattern
				143	to match any substring of the text.
				144	<pre>
				145	Example: simple search for a string:
				146	pcrecpp::RE("ell").PartialMatch("hello");
				147
				148	Example: find first number in a string:
				149	int number;
				150	pcrecpp::RE re("(\\d+)");
				151	re.PartialMatch("x*100 + 20", &number);
				152	assert(number == 100);
				153	</PRE>
				154	</P>
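<P>
For readers without PCRE at hand, the full-match versus partial-match
distinction can be tried out with the C++ standard library, where
<b>std::regex_match</b> requires the whole text to match and
<b>std::regex_search</b> accepts any substring. This is a comparison under
that substitution, not pcrecpp code; the two wrapper names are hypothetical.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: true only if the pattern matches the entire text.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical wrapper: true if the pattern matches anywhere in the text.
bool partial_match(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```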
				155	<br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
				156	<P>
				157	By default, pattern and text are plain text, one byte per character. The UTF8
				158	flag, passed to the constructor, causes both pattern and string to be treated
				159	as UTF-8 text, still a byte stream but potentially multiple bytes per
				160	character. In practice, the text is likelier to be UTF-8 than the pattern, but
				161	the match returned may depend on the UTF8 flag, so always use it when matching
				162	UTF8 text. For example, "." will match one byte normally but with UTF8 set may
				163	match all the bytes of a multi-byte character.
				164	<pre>
				165	Example:
				166	pcrecpp::RE_Options options;
				167	options.set_utf8();
				168	pcrecpp::RE re(utf8_pattern, options);
				169	re.FullMatch(utf8_string);
				170
				171	Example: using the convenience function UTF8():
				172	pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
				173	re.FullMatch(utf8_string);
				174	</pre>
				175	NOTE: The UTF8 flag is ignored if pcre was not configured with the
				176	<pre>
				177	--enable-utf8 flag.
				178	</PRE>
				179	</P>
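<P>
To make the byte-versus-character distinction concrete, the sketch below
reports how many bytes a UTF-8 character occupies, judged from its first byte.
The helper <b>utf8_sequence_length</b> is hypothetical and inspects only the
bit pattern of the lead byte; it does not validate the bytes that follow.
</P>

```cpp
#include <cstddef>

// Hypothetical helper: length in bytes of the UTF-8 sequence whose first byte
// is `lead`, judged by bit pattern only; returns 0 for a byte that cannot
// start a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: single-byte (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid lead
}
```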
				180	<br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
				181	<P>
				182	PCRE defines some modifiers to change the behavior of the regular expression
				183	engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
				184	pass such modifiers to a RE class. Currently, the following modifiers are
				185	supported:
				186	<pre>
				187	modifier              description               Perl equivalent
				188
				189	PCRE_CASELESS         case insensitive match    /i
				190	PCRE_MULTILINE        multiple lines match      /m
				191	PCRE_DOTALL           dot matches newlines      /s
				192	PCRE_DOLLAR_ENDONLY   $ matches only at end     N/A
				193	PCRE_EXTRA            strict escape parsing     N/A
				194	PCRE_EXTENDED         ignore whitespace         /x
				195	PCRE_UTF8             handles UTF8 chars        built-in
				196	PCRE_UNGREEDY         reverses * and *?         N/A
				197	PCRE_NO_AUTO_CAPTURE  disables capturing parens N/A (*)
				198	</pre>
				199	(*) Both Perl and PCRE allow non-capturing parentheses by means of the
				200	"?:" modifier within the pattern itself, e.g. (?:ab|cd) does not
				201	capture, while (ab|cd) does.
				202	</P>
				203	<P>
				204	For a full account of how each modifier works, please check the
				205	PCRE API reference page.
				206	</P>
				207	<P>
				208	For each modifier, there are two member functions whose names are derived
				209	from the modifier name in lowercase, without the "PCRE_" prefix. For
				210	instance, PCRE_CASELESS is handled by
				211	<pre>
				212	bool caseless()
				213	</pre>
				214	which returns true if the modifier is set, and
				215	<pre>
				216	RE_Options & set_caseless(bool)
				217	</pre>
				218	which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
				219	accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
				220	functions. Setting <i>match_limit</i> to a non-zero value will limit the
				221	execution of pcre to keep it from doing bad things like blowing the stack or
				222	taking an eternity to return a result. A value of 5000 is good enough to stop
				223	stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
				224	match limiting. Alternatively, you can call <b>set_match_limit_recursion()</b>
				225	which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
				226	recurses. <b>match_limit()</b> limits the number of matches PCRE does;
				227	<b>match_limit_recursion()</b> limits the depth of internal recursion, and
				228	therefore the amount of stack that is used.
				229	</P>
				230	<P>
				231	Normally, to pass one or more modifiers to a RE class, you declare
				232	a <i>RE_Options</i> object, set the appropriate options, and pass this
				233	object to a RE constructor. Example:
				234	<pre>
				235	RE_Options opt;
				236	opt.set_caseless(true);
				237	if (RE("HELLO", opt).PartialMatch("hello world")) ...
				238	</pre>
				239	RE_Options has two constructors. The default constructor takes no arguments and
				240	creates a set of flags that are all off. The second constructor takes an
				241	<i>option_flags</i> argument, which eases the transfer of legacy code from C programs.
				242	This lets you do
				243	<pre>
				244	RE(pattern,
				245	RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
				246	</pre>
				247	However, new code is better off doing
				248	<pre>
				249	RE(pattern,
				250	RE_Options().set_caseless(true).set_multiline(true))
				251	.PartialMatch(str);
				252	</pre>
				253	If you are going to pass one of the most used modifiers, there are some
				254	convenience functions that return a RE_Options object with the
				255	appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
				256	<b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>.
				257	</P>
				258	<P>
				259	If you need to set several options at once, and you don't want to go through
				260	the pains of declaring a RE_Options object and setting several options, there
				261	is a parallel method that gives you such ability on the fly. You can concatenate
				262	several <b>set_xxxxx()</b> member functions, since each of them returns a
				263	reference to its class object. For example, to pass PCRE_CASELESS,
				264	PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
				265	<pre>
				266	RE(" ^ xyz \\s+ .* blah$",
				267	RE_Options()
				268	.set_caseless(true)
				269	.set_extended(true)
				270	.set_multiline(true)).PartialMatch(sometext);
				271
				272	</PRE>
				273	</P>
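<P>
The reason the setters can be chained is that each one returns a reference to
the object it was called on. The miniature class below sketches that pattern
in isolation. <b>OptionsSketch</b> and its flag values are hypothetical
placeholders, not the real RE_Options or the PCRE constants.
</P>

```cpp
// Hypothetical miniature of the setter-chaining pattern; not pcrecpp code.
class OptionsSketch {
 public:
    enum : int { kCaseless = 1, kMultiline = 2, kExtended = 4 };  // placeholder bits

    // Each setter returns *this, so calls can be strung together.
    OptionsSketch& set_caseless(bool on) { return set_flag(kCaseless, on); }
    OptionsSketch& set_multiline(bool on) { return set_flag(kMultiline, on); }
    OptionsSketch& set_extended(bool on) { return set_flag(kExtended, on); }

    bool caseless() const { return (flags_ & kCaseless) != 0; }
    int all_options() const { return flags_; }

 private:
    OptionsSketch& set_flag(int flag, bool on) {
        if (on) {
            flags_ |= flag;   // switch the bit on
        } else {
            flags_ &= ~flag;  // switch the bit off
        }
        return *this;
    }

    int flags_ = 0;  // all modifiers off by default
};
```

<P>
A chained expression such as
OptionsSketch().set_caseless(true).set_extended(true).set_multiline(true)
then builds the whole option set in a single statement.
</P>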
				274	<br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
				275	<P>
				276	The "Consume" operation may be useful if you want to repeatedly
				277	match regular expressions at the front of a string and skip over
				278	them as they match. This requires use of the "StringPiece" type,
				279	which represents a sub-range of a real string. Like RE, StringPiece
				280	is defined in the pcrecpp namespace.
				281	<pre>
				282	Example: read lines of the form "var = value" from a string.
				283	string contents = ...; // Fill string somehow
				284	pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
				285
				286	string var;
				287	int value;
				288	pcrecpp::RE re("(\\w+) = (\\d+)\n");
				289	while (re.Consume(&input, &var, &value)) {
				290	...;
				291	}
				292	</pre>
				293	Each successful call to "Consume" will set "var/value", and also
				294	advance "input" so it points past the matched text.
				295	</P>
				296	<P>
				297	The "FindAndConsume" operation is similar to "Consume" but does not
				298	anchor your match at the beginning of the string. For example, you
				299	could extract all words from a string by repeatedly calling
				300	<pre>
				301	pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
				302	</PRE>
				303	</P>
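<P>
The consume-and-advance loop can also be sketched without PCRE. The version
below substitutes <b>std::regex</b> from the standard library, using
<b>match_continuous</b> to anchor each attempt at the current position, and
moves an iterator past each match in the way "Consume" advances "input". The
function name <b>consume_assignments</b> is hypothetical, and std::stoi will
throw on values too large for an int, which this sketch does not handle.
</P>

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads consecutive "var = value" lines from the front of
// `contents`, stopping at the first position where the pattern does not match.
std::vector<std::pair<std::string, int>> consume_assignments(
        const std::string& contents) {
    static const std::regex line_re("(\\w+) = (\\d+)\n");
    std::vector<std::pair<std::string, int>> result;
    std::string::const_iterator pos = contents.begin();
    std::smatch m;
    // match_continuous only accepts a match that starts exactly at `pos`.
    while (std::regex_search(pos, contents.end(), m, line_re,
                             std::regex_constants::match_continuous)) {
        result.emplace_back(m[1].str(), std::stoi(m[2].str()));
        pos = m[0].second;  // advance past the matched text
    }
    return result;
}
```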
				304	<br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
				305	<P>
				306	By default, if you pass a pointer to a numeric value, the
				307	corresponding text is interpreted as a base-10 number. You can
				308	instead wrap the pointer with a call to one of the operators Hex(),
				309	Octal(), or CRadix() to interpret the text in another base. The
				310	CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
				311	prefixes, but defaults to base-10.
				312	<pre>
				313	Example:
				314	int a, b, c, d;
				315	pcrecpp::RE re("(.*) (.*) (.*) (.*)");
				316	re.FullMatch("100 40 0100 0x40",
				317	pcrecpp::Octal(&a), pcrecpp::Hex(&b),
				318	pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
				319	</pre>
				320	will leave 64 in a, b, c, and d.
				321	</P>
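<P>
The arithmetic behind that example can be checked with <b>std::strtol</b>,
whose base argument accepts 8, 16, or 0 (0 auto-detects the "0" and "0x"
prefixes and otherwise assumes base 10, much as CRadix is described above).
The helper <b>parse_with_base</b> is hypothetical and is not how pcrecpp is
necessarily implemented.
</P>

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses all of `text` as an integer in the given base.
// Base 0 auto-detects a leading "0" (octal) or "0x" (hexadecimal).
bool parse_with_base(const std::string& text, int base, long* out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, base);
    if (end != text.c_str() + text.size()) return false;  // leftover characters
    if (errno == ERANGE) return false;                    // out of range for long
    *out = value;
    return true;
}
```

<P>
Parsing "100" in base 8, "40" in base 16, and both "0100" and "0x40" in base 0
each yields 64.
</P>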
				322	<br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
				323	<P>
				324	You can replace the first match of "pattern" in "str" with "rewrite".
				325	Within "rewrite", backslash-escaped digits (\1 to \9) can be
				326	used to insert text matching corresponding parenthesized group
				327	from the pattern. \0 in "rewrite" refers to the entire matching
				328	text. For example:
				329	<pre>
				330	string s = "yabba dabba doo";
				331	pcrecpp::RE("b+").Replace("d", &s);
				332	</pre>
				333	will leave "s" containing "yada dabba doo". The result is true if the pattern
				334	matches and a replacement occurs, false otherwise.
				335	</P>
				336	<P>
				337	<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
				338	occurrences of the pattern in the string with the rewrite. Replacements are
				339	not subject to re-matching. For example:
				340	<pre>
				341	string s = "yabba dabba doo";
				342	pcrecpp::RE("b+").GlobalReplace("d", &s);
				343	</pre>
				344	will leave "s" containing "yada dada doo". It returns the number of
				345	replacements made.
				346	</P>
				347	<P>
				348	<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
				349	"rewrite" is copied into "out" (an additional argument) with substitutions.
				350	The non-matching portions of "text" are ignored. Returns true iff a match
				351	occurred and the extraction happened successfully; if no match occurs, the
				352	string is left unaffected.
				353	</P>
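<P>
The difference between replacing the first match and replacing every match
can be reproduced with <b>std::regex_replace</b> from the standard library,
where the <b>format_first_only</b> flag plays the role of Replace and the
default behaviour plays the role of GlobalReplace. This is a comparison under
that substitution, not pcrecpp code. Note that std::regex writes group
references as $1 rather than \1. The two function names are hypothetical.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: replaces only the first match of `pattern` in `s`.
std::string replace_first(const std::string& s, const std::string& pattern,
                          const std::string& rewrite) {
    return std::regex_replace(s, std::regex(pattern), rewrite,
                              std::regex_constants::format_first_only);
}

// Hypothetical wrapper: replaces every non-overlapping match of `pattern`.
std::string replace_all(const std::string& s, const std::string& pattern,
                        const std::string& rewrite) {
    return std::regex_replace(s, std::regex(pattern), rewrite);
}
```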
				354	<br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
				355	<P>
				356	The C++ wrapper was contributed by Google Inc.
				357	<br>
				358	Copyright © 2007 Google Inc.
				359	<br>
				360	</P>
				361	<br><a name="SEC12" href="#TOC1">REVISION</a><br>
				362	<P>
				363	Last updated: 17 March 2009
				364	<br>
				365	Minor typo fixed: 25 July 2011
				366	<br>
				367	<p>
				368	Return to the <a href="index.html">PCRE index page</a>.
				369	</p>