.TH PCRECPP 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
.sp
Example: successful match
  pcrecpp::RE re("h.*o");
  re.FullMatch("hello");
.sp
Example: unsuccessful match (requires full match):
  pcrecpp::RE re("e");
  !re.FullMatch("hello");
.sp
Example: creating a temporary RE object:
  pcrecpp::RE("h.*o").FullMatch("hello");
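.P
The difference between a full and a partial match can be tried out without
pcrecpp. The sketch below is only an analogy, assuming a C++11 compiler: the
standard function std::regex_match requires the whole text to match, like
"FullMatch", while std::regex_search accepts any matching substring, like
"PartialMatch" (described later). The helper names full_match and
partial_match are hypothetical, and std::regex uses ECMAScript syntax rather
than PCRE syntax, so complex patterns may behave differently.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of FullMatch: the entire text must match.
bool full_match(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

// Hypothetical analogue of PartialMatch: any substring may match.
bool partial_match(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}
```
.fi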
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
Example: extracts "ruby" into "s" and 1234 into "i"
  int i;
  string s;
  pcrecpp::RE re("(\e\ew+):(\e\ed+)");
  re.FullMatch("ruby:1234", &s, &i);
.sp
Example: does not try to extract any extra sub-patterns
  re.FullMatch("ruby:1234", &s);
.sp
Example: does not try to extract into NULL
  re.FullMatch("ruby:1234", NULL, &i);
.sp
Example: integer overflow causes failure
  !re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
Example: fails because there aren't enough sub-patterns:
  !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
Example: fails because string cannot be stored in integer
  !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
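.P
The extraction rules shown above (skip a NULL pointer, fail if the captured
text cannot be stored in the target type, fail on integer overflow) can be
sketched with the standard library. This is an illustrative analogue, not
pcrecpp's implementation: the helper names parse_int and match_word_number
are hypothetical, the word and digit classes are spelled out as bracket
expressions, and std::strtol stands in for pcrecpp's own number parsing.
.sp
.nf
```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical helper: parse decimal text into an int, failing on empty
// input, trailing characters, or a value that does not fit in an int.
bool parse_int(const std::string& s, int* out) {
  if (s.empty()) return false;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(s.c_str(), &end, 10);
  if (errno == ERANGE || *end != '\0' || v < INT_MIN || v > INT_MAX) {
    return false;  // overflow, or text that is not a number
  }
  *out = static_cast<int>(v);
  return true;
}

// Hypothetical analogue of re.FullMatch(text, &word, &number) for a
// "word:digits" pattern. A null pointer means "do not extract this group".
bool match_word_number(const std::string& text, std::string* word, int* number) {
  static const std::regex re("([A-Za-z0-9_]+):([0-9]+)");
  std::smatch m;
  if (!std::regex_match(text, m, re)) return false;  // whole text must match
  if (word != nullptr) *word = m[1].str();
  if (number != nullptr && !parse_int(m[2].str(), number)) return false;
  return true;
}
```
.fi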
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
  string       (matched piece is copied to string)
  StringPiece  (StringPiece is mutated to point to matched piece)
  T            (where "bool T::ParseFrom(const char*, int)" exists)
  NULL         (the corresponding matched sub-pattern is not copied)
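.P
A user-defined type becomes a valid argument by providing the ParseFrom
member named above. The sketch assumes a hypothetical Switch type that accepts
the captured text "on" or "off". The captured text arrives as a pointer and a
length and need not be NUL-terminated. Returning false is assumed to make the
whole match fail, as for a built-in type that cannot hold the text.
.sp
.nf
```cpp
#include <cstddef>
#include <string>

// Hypothetical user type usable as a match argument: it provides
// bool ParseFrom(const char*, int), which receives the captured sub-pattern.
struct Switch {
  bool on = false;

  // str points at the captured text and n is its length; the text is not
  // necessarily NUL-terminated, so never read past str + n.
  bool ParseFrom(const char* str, int n) {
    const std::string text(str, static_cast<std::size_t>(n));
    if (text == "on")  { on = true;  return true; }
    if (text == "off") { on = false; return true; }
    return false;  // text cannot be stored in a Switch
  }
};
```
.fi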
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
a. "text" matches "pattern" exactly;
.sp
b. The pattern contains at least as many capturing sub-patterns as there are
supplied pointers;
.sp
c. The "i"th argument has a suitable type for holding the
string captured as the "i"th sub-pattern. If you pass in
a void * NULL for the "i"th argument, or a non-void * NULL
of the correct type, or pass fewer arguments than the
number of sub-patterns, the "i"th captured sub-pattern is
ignored.
.sp
CAVEAT: An optional sub-pattern that does not exist in the matched
string is assigned the empty string. Therefore, the following will
return false (because the empty string is not a valid number):
.sp
  int number;
  pcrecpp::RE("[a-z]+(\e\ed+)?").FullMatch("abc", &number);
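.P
The caveat can be reproduced with the standard library: an optional group
that did not take part in the match reports an empty string, and converting an
empty string to a number fails. This std::regex sketch is an analogy with a
hypothetical helper name, not pcrecpp code; std::stoi is used for brevity and
throws on out-of-range input.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of RE("[a-z]+([0-9]+)?").FullMatch(text, &number).
bool match_optional_number(const std::string& text, int* number) {
  static const std::regex re("[a-z]+([0-9]+)?");
  std::smatch m;
  if (!std::regex_match(text, m, re)) return false;
  // If group 1 did not take part in the match, str() returns "".
  const std::string captured = m[1].str();
  if (captured.empty()) return false;  // "" is not a valid number
  *number = std::stoi(captured);       // note: throws on out-of-range input
  return true;
}
```
.fi
.sp
In std::regex, m[1].matched would tell a group that did not participate
apart from one that matched the empty string; the pointer interface described
here stores an empty string in both cases.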
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
\fBDoMatch\fP.
.P
NOTE: Do not pass \fBno_arg\fP as a placeholder for a missing argument. It is
used internally to mark the end of the list of optional arguments, and using
it as a placeholder can lead to segfaults.
.
.
.SH "QUOTING METACHARACTERS"
.rs
.sp
You can use the "QuoteMeta" operation to insert backslashes before all
potentially meaningful characters in a string. The returned string, used as a
regular expression, will exactly match the original string.
.sp
Example:
  string quoted = pcrecpp::RE::QuoteMeta(unquoted);
.sp
Note that it is legal to escape a character even if it has no special meaning
in a regular expression, so this function escapes such characters too. (This
also makes it identical to the Perl function of the same name; see
"perldoc -f quotemeta".)
For example, "1.5-2.0?" becomes "1\e.5\e-2\e.0\e?".
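.P
The escaping rule can be sketched as follows. This is a hypothetical
re-implementation for illustration only, assuming the rule is "escape every
byte that is not an ASCII letter, digit, or underscore", which agrees with the
example above. The real QuoteMeta may handle some bytes (such as non-ASCII or
NUL bytes) differently. The backslash is written as its ASCII code, 0x5C.
.sp
.nf
```cpp
#include <string>

// Hypothetical sketch: put a backslash before every byte that is not an
// ASCII letter, digit, or underscore. Not the library's own implementation.
std::string quote_meta(const std::string& unquoted) {
  const char kBackslash = 0x5C;  // ASCII backslash
  std::string result;
  result.reserve(unquoted.size() * 2);
  for (char c : unquoted) {
    const bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') || c == '_';
    if (!plain) result += kBackslash;  // escape everything else
    result += c;
  }
  return result;
}
```
.fi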
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
Example: simple search for a string:
  pcrecpp::RE("ell").PartialMatch("hello");
.sp
Example: find first number in a string:
  int number;
  pcrecpp::RE re("(\e\ed+)");
  re.PartialMatch("x*100 + 20", &number);
  assert(number == 100);
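.P
A standard-library analogue of the "find first number" example follows. It is
a sketch with a hypothetical helper name: std::regex_search returns the
leftmost match, which is why 100 rather than 20 is extracted. The digit class
is written as a bracket expression.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of RE("([0-9]+)").PartialMatch(text, &number).
bool first_number(const std::string& text, int* number) {
  static const std::regex re("([0-9]+)");
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;  // leftmost match only
  *number = std::stoi(m[1].str());  // throws on out-of-range input
  return true;
}
```
.fi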
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and text to be treated
as UTF-8, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." matches one byte normally, but with UTF8 set it
matches a whole multi-byte character, which in valid UTF-8 can be up to four
bytes long.
.sp
Example:
  pcrecpp::RE_Options options;
  options.set_utf8(true);
  pcrecpp::RE re(utf8_pattern, options);
  re.FullMatch(utf8_string);
.sp
Example: using the convenience function UTF8():
  pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
  re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if PCRE was not configured with the
--enable-utf8 flag.
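.P
The byte-oriented default can be observed with any matcher that treats a
std::string as a sequence of bytes. In the sketch below (an analogy using
std::regex, not pcrecpp), the two-byte UTF-8 encoding of U+00E9 does not match
a lone "." as a whole string, but does match "..", which is the behaviour the
text describes for patterns used without the UTF8 flag. The names kEAcute and
bytewise_full_match are hypothetical.
.sp
.nf
```cpp
#include <regex>
#include <string>

// U+00E9 (small e with acute accent) is the two bytes C3 A9 in UTF-8.
const std::string kEAcute = {static_cast<char>(0xC3), static_cast<char>(0xA9)};

// Hypothetical helper: true if the whole text matches the pattern when both
// are treated as plain bytes, as std::regex over std::string does.
bool bytewise_full_match(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}
```
.fi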
.
.
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
.rs
.sp
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
.sp
.nf
  modifier              description                Perl corresponding
.sp
  PCRE_CASELESS         case insensitive match     /i
  PCRE_MULTILINE        multiple lines match       /m
  PCRE_DOTALL           dot matches newlines       /s
  PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A
  PCRE_EXTRA            strict escape parsing      N/A
  PCRE_EXTENDED         ignore white space         /x
  PCRE_UTF8             handles UTF-8 chars        built-in
  PCRE_UNGREEDY         reverses * and *?          N/A
  PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*)
.fi
.sp
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
"?:" modifier within the pattern itself, e.g. (?:ab|cd) does not
capture, while (ab|cd) does.
.P
For a full account of how each modifier works, please check the
PCRE API reference page.
.P
For each modifier, there are two member functions whose name is made
out of the modifier in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
.sp
  bool caseless()
.sp
which returns true if the modifier is set, and
.sp
  RE_Options & set_caseless(bool)
.sp
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the \fBset_match_limit()\fP and \fBmatch_limit()\fP member
functions. Setting \fImatch_limit\fP to a non-zero value will limit the
execution of PCRE to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fP to zero disables
match limiting. Alternatively, you can call \fBset_match_limit_recursion()\fP,
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how deeply PCRE
recurses. \fBmatch_limit()\fP limits the number of times PCRE's internal match
function is called (roughly, the amount of backtracking), while
\fBmatch_limit_recursion()\fP limits the depth of internal recursion, and
therefore the amount of stack that is used.
.P
Normally, to pass one or more modifiers to a RE class, you declare
a \fIRE_Options\fP object, set the appropriate options, and pass this
object to a RE constructor. Example:
.sp
  RE_Options opt;
  opt.set_caseless(true);
  if (RE("HELLO", opt).PartialMatch("hello world")) ...
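.P
For comparison, the C++ standard library expresses the same caseless search
with a flag passed to the std::regex constructor instead of an options object.
This is an analogy only: std::regex::icase plays the role of PCRE_CASELESS
here, and the helper name caseless_partial_match is hypothetical.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of RE(pattern, opt).PartialMatch(text) with
// opt.set_caseless(true): the icase flag makes the match case-insensitive.
bool caseless_partial_match(const std::string& pattern, const std::string& text) {
  const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
  return std::regex_search(text, re);
}
```
.fi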
.sp
RE_Options has two constructors. The default constructor takes no arguments
and creates a set of flags that are all off. The other takes an integer
\fIoption_flags\fP argument, to facilitate transfer of legacy code from C
programs. This lets you do
.sp
  RE(pattern,
     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
.sp
However, new code is better off doing
.sp
  RE(pattern,
     RE_Options().set_caseless(true).set_multiline(true))
    .PartialMatch(str);
.sp
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: \fBCASELESS()\fP, \fBUTF8()\fP,
\fBMULTILINE()\fP, \fBDOTALL()\fP, and \fBEXTENDED()\fP.
.P
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting several options, there
is a parallel method that gives you such ability on the fly. You can concatenate
several \fBset_xxxxx()\fP member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
.sp
  RE(" ^ xyz \e\es+ .* blah$",
     RE_Options()
       .set_caseless(true)
       .set_extended(true)
       .set_multiline(true)).PartialMatch(sometext);
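.P
Both styles, legacy integer flags and chained setters, can be supported by one
small class, as sketched below. The class Options, its flag constants, and
their values are hypothetical and are not pcrecpp's RE_Options. The point is
that each setter returns a reference to *this, so calls can be concatenated in
a single expression, while a constructor taking an int accepts C-style OR-ed
flags.
.sp
.nf
```cpp
// Hypothetical options class, not pcrecpp's RE_Options.
class Options {
 public:
  // Made-up flag bits for this sketch.
  enum : int { kCaseless = 1 << 0, kMultiline = 1 << 1, kExtended = 1 << 2 };

  Options() : flags_(0) {}
  explicit Options(int flags) : flags_(flags) {}  // legacy C-style flags

  // Each setter returns *this by reference, so calls can be chained.
  Options& set_caseless(bool x)  { return set(kCaseless, x); }
  Options& set_multiline(bool x) { return set(kMultiline, x); }
  Options& set_extended(bool x)  { return set(kExtended, x); }

  bool caseless() const  { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const  { return (flags_ & kExtended) != 0; }
  int all_options() const { return flags_; }

 private:
  Options& set(int bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }
  int flags_;
};
```
.fi
.sp
Because every setter returns the object by reference, a temporary such as
Options() can be configured and passed along in one statement, which is the
pattern the RE_Options() chains above use.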
.sp
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
Example: read lines of the form "var = value" from a string.
  string contents = ...;                 // Fill string somehow
  pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece
.sp
  string var;
  int value;
  pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
  while (re.Consume(&input, &var, &value)) {
    ...;
  }
.sp
Each successful call to "Consume" sets "var" and "value", and also
advances "input" so it points past the matched text.
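.P
The Consume loop can be mimicked with the standard library to show the
mechanics. In this sketch (hypothetical helper name, std::regex rather than
pcrecpp), the match_continuous flag anchors each search at the current
position, and the position is then advanced past the matched text, much as
Consume advances "input". The trailing newline of the original pattern is
relaxed to optional white space, and std::atoi is used without overflow
checking for brevity.
.sp
.nf
```cpp
#include <cstdlib>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analogue of the Consume loop: collects each leading
// "name = number" entry and stops at the first text that does not match.
std::vector<std::pair<std::string, int>> read_assignments(const std::string& contents) {
  static const std::regex re("([A-Za-z0-9_]+) = ([0-9]+)[[:space:]]*");
  std::vector<std::pair<std::string, int>> out;
  const char* pos = contents.data();
  const char* const end = pos + contents.size();
  std::cmatch m;
  // match_continuous: the match must start exactly at pos (anchored).
  while (pos != end &&
         std::regex_search(pos, end, m, re, std::regex_constants::match_continuous)) {
    out.emplace_back(m[1].str(), std::atoi(m[2].str().c_str()));  // no overflow check
    pos += m.length(0);  // advance past the consumed text
  }
  return out;
}
```
.fi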
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
  pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
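.P
Without the anchoring, each search may skip ahead to the next match. The
following std::regex sketch (an analogy with a hypothetical helper name)
collects every word in a string this way, resuming each search where the
previous match ended.
.sp
.nf
```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical analogue of calling FindAndConsume repeatedly with a word
// pattern: the search is unanchored, then resumes after each match.
std::vector<std::string> all_words(const std::string& text) {
  static const std::regex re("([A-Za-z0-9_]+)");
  std::vector<std::string> words;
  const char* pos = text.data();
  const char* const end = pos + text.size();
  std::cmatch m;
  while (std::regex_search(pos, end, m, re)) {
    words.push_back(m[1].str());
    pos = m[0].second;  // continue after the end of this match
  }
  return words;
}
```
.fi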
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
Example:
  int a, b, c, d;
  pcrecpp::RE re("(.*) (.*) (.*) (.*)");
  re.FullMatch("100 40 0100 0x40",
               pcrecpp::Octal(&a), pcrecpp::Hex(&b),
               pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
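.P
The three radix rules correspond closely to the base argument of the C library
function strtol: 8 for Octal(), 16 for Hex(), and 0 for CRadix(), where 0
makes strtol honour a "0x" or leading "0" prefix and otherwise read decimal.
The helper below is a hypothetical sketch of that correspondence, not
pcrecpp's code; it rejects empty input, trailing characters, and out-of-range
values.
.sp
.nf
```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: base 8 ~ Octal(), 16 ~ Hex(), 0 ~ CRadix().
// With base 0, strtol reads a "0x"/"0X" prefix as hex, a leading "0" as
// octal, and anything else as decimal.
bool parse_radix(const std::string& text, int base, long* out) {
  if (text.empty()) return false;
  errno = 0;
  char* end = nullptr;
  const long v = std::strtol(text.c_str(), &end, base);
  if (errno == ERANGE || *end != '\0') return false;  // out of range or junk
  *out = v;
  return true;
}
```
.fi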
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert text matching the corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
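.P
Standard C++ offers the same two behaviours through std::regex_replace, which
can be used to check the expected results above. This sketch uses
hypothetical helper names and returns a new string instead of modifying one
in place or reporting a count. Note also that std::regex_replace writes group
references as $1, $2 and so on (and $& for the whole match) rather than in
pcrecpp's backslash-digit form.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of Replace: only the first match is rewritten.
std::string replace_first(const std::string& s, const std::string& pattern,
                          const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite,
                            std::regex_constants::format_first_only);
}

// Hypothetical analogue of GlobalReplace: every match is rewritten.
std::string replace_all(const std::string& s, const std::string& pattern,
                        const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite);
}
```
.fi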
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
"out" string is left unaffected.
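.P
An Extract-like helper can be sketched on top of std::match_results::format,
which expands a rewrite template using only the matched text. The helper name
and argument order are hypothetical, and the template uses std::regex's $n
syntax rather than pcrecpp's backslash-digit form. As with Extract, "out" is
left untouched when there is no match.
.sp
.nf
```cpp
#include <regex>
#include <string>

// Hypothetical analogue of Extract: on a match, build *out from the rewrite
// template alone; text outside the match is discarded.
bool extract(const std::string& pattern, const std::string& rewrite,
             const std::string& text, std::string* out) {
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;  // *out left untouched
  *out = m.format(rewrite);  // "$n" expands to group n
  return true;
}
```
.fi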
.
.
.SH AUTHOR
.rs
.sp
.nf
The C++ wrapper was contributed by Google Inc.
Copyright (c) 2007 Google Inc.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 17 March 2009
Minor typo fixed: 25 July 2011
.fi