Blame - jni/libpcre/sources/doc/html/pcresyntax.html - jami-client-android

Tristan Matthews	0461646	2013-11-14 16:09:34 -0500	[diff] [blame]	1	<html>
				2	<head>
				3	<title>pcresyntax specification</title>
				4	</head>
				5	<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
				6	<h1>pcresyntax man page</h1>
				7	<p>
				8	Return to the <a href="index.html">PCRE index page</a>.
				9	</p>
				10	<p>
				11	This page is part of the PCRE HTML documentation. It was generated automatically
				12	from the original man page. If there is any nonsense in it, please consult the
				13	man page, in case the conversion went wrong.
				14	<br>
				15	<ul>
				16	<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
				17	<li><a name="TOC2" href="#SEC2">QUOTING</a>
				18	<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
				19	<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
				20	<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
				21	<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
				22	<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
				23	<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
				24	<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
				25	<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
				26	<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
				27	<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
				28	<li><a name="TOC13" href="#SEC13">CAPTURING</a>
				29	<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
				30	<li><a name="TOC15" href="#SEC15">COMMENT</a>
				31	<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
				32	<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
				33	<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
				34	<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
				35	<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
				36	<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
				37	<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
				38	<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
				39	<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
				40	<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
				41	<li><a name="TOC26" href="#SEC26">AUTHOR</a>
				42	<li><a name="TOC27" href="#SEC27">REVISION</a>
				43	</ul>
				44	<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
				45	<P>
				46	The full syntax and semantics of the regular expressions that are supported by
				47	PCRE are described in the
				48	<a href="pcrepattern.html"><b>pcrepattern</b></a>
				49	documentation. This document contains just a quick-reference summary of the
				50	syntax.
				51	</P>
				52	<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
				53	<P>
				54	<pre>
				55	\x where x is non-alphanumeric is a literal x
				56	\Q...\E treat enclosed characters as literal
				57	</PRE>
				58	</P>
				59	<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
				60	<P>
				61	<pre>
				62	\a alarm, that is, the BEL character (hex 07)
				63	\cx "control-x", where x is any ASCII character
				64	\e escape (hex 1B)
				65	\f formfeed (hex 0C)
				66	\n newline (hex 0A)
				67	\r carriage return (hex 0D)
				68	\t tab (hex 09)
				69	\ddd character with octal code ddd, or backreference
				70	\xhh character with hex code hh
				71	\x{hhh..} character with hex code hhh..
				72	</PRE>
				73	</P>
				74	<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
				75	<P>
				76	<pre>
				77	. any character except newline;
				78	in dotall mode, any character whatsoever
				79	\C one byte, even in UTF-8 mode (best avoided)
				80	\d a decimal digit
				81	\D a character that is not a decimal digit
				82	\h a horizontal whitespace character
				83	\H a character that is not a horizontal whitespace character
				84	\N a character that is not a newline
				85	\p{<i>xx</i>} a character with the <i>xx</i> property
				86	\P{<i>xx</i>} a character without the <i>xx</i> property
				87	\R a newline sequence
				88	\s a whitespace character
				89	\S a character that is not a whitespace character
				90	\v a vertical whitespace character
				91	\V a character that is not a vertical whitespace character
				92	\w a "word" character
				93	\W a "non-word" character
				94	\X an extended Unicode sequence
				95	</pre>
				96	In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
				97	characters, even in UTF-8 mode. However, this can be changed by setting the
				98	PCRE_UCP option.
				99	</P>
				100	<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
				101	<P>
				102	<pre>
				103	C Other
				104	Cc Control
				105	Cf Format
				106	Cn Unassigned
				107	Co Private use
				108	Cs Surrogate
				109
				110	L Letter
				111	Ll Lower case letter
				112	Lm Modifier letter
				113	Lo Other letter
				114	Lt Title case letter
				115	Lu Upper case letter
				116	L& Ll, Lu, or Lt
				117
				118	M Mark
				119	Mc Spacing mark
				120	Me Enclosing mark
				121	Mn Non-spacing mark
				122
				123	N Number
				124	Nd Decimal number
				125	Nl Letter number
				126	No Other number
				127
				128	P Punctuation
				129	Pc Connector punctuation
				130	Pd Dash punctuation
				131	Pe Close punctuation
				132	Pf Final punctuation
				133	Pi Initial punctuation
				134	Po Other punctuation
				135	Ps Open punctuation
				136
				137	S Symbol
				138	Sc Currency symbol
				139	Sk Modifier symbol
				140	Sm Mathematical symbol
				141	So Other symbol
				142
				143	Z Separator
				144	Zl Line separator
				145	Zp Paragraph separator
				146	Zs Space separator
				147	</PRE>
				148	</P>
				149	<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
				150	<P>
				151	<pre>
				152	Xan Alphanumeric: union of properties L and N
				153	Xps POSIX space: property Z or tab, NL, VT, FF, CR
				154	Xsp Perl space: property Z or tab, NL, FF, CR
				155	Xwd Perl word: property Xan or underscore
				156	</PRE>
				157	</P>
				158	<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
				159	<P>
				160	Arabic,
				161	Armenian,
				162	Avestan,
				163	Balinese,
				164	Bamum,
				165	Bengali,
				166	Bopomofo,
				167	Braille,
				168	Buginese,
				169	Buhid,
				170	Canadian_Aboriginal,
				171	Carian,
				172	Cham,
				173	Cherokee,
				174	Common,
				175	Coptic,
				176	Cuneiform,
				177	Cypriot,
				178	Cyrillic,
				179	Deseret,
				180	Devanagari,
				181	Egyptian_Hieroglyphs,
				182	Ethiopic,
				183	Georgian,
				184	Glagolitic,
				185	Gothic,
				186	Greek,
				187	Gujarati,
				188	Gurmukhi,
				189	Han,
				190	Hangul,
				191	Hanunoo,
				192	Hebrew,
				193	Hiragana,
				194	Imperial_Aramaic,
				195	Inherited,
				196	Inscriptional_Pahlavi,
				197	Inscriptional_Parthian,
				198	Javanese,
				199	Kaithi,
				200	Kannada,
				201	Katakana,
				202	Kayah_Li,
				203	Kharoshthi,
				204	Khmer,
				205	Lao,
				206	Latin,
				207	Lepcha,
				208	Limbu,
				209	Linear_B,
				210	Lisu,
				211	Lycian,
				212	Lydian,
				213	Malayalam,
				214	Meetei_Mayek,
				215	Mongolian,
				216	Myanmar,
				217	New_Tai_Lue,
				218	Nko,
				219	Ogham,
				220	Old_Italic,
				221	Old_Persian,
				222	Old_South_Arabian,
				223	Old_Turkic,
				224	Ol_Chiki,
				225	Oriya,
				226	Osmanya,
				227	Phags_Pa,
				228	Phoenician,
				229	Rejang,
				230	Runic,
				231	Samaritan,
				232	Saurashtra,
				233	Shavian,
				234	Sinhala,
				235	Sundanese,
				236	Syloti_Nagri,
				237	Syriac,
				238	Tagalog,
				239	Tagbanwa,
				240	Tai_Le,
				241	Tai_Tham,
				242	Tai_Viet,
				243	Tamil,
				244	Telugu,
				245	Thaana,
				246	Thai,
				247	Tibetan,
				248	Tifinagh,
				249	Ugaritic,
				250	Vai,
				251	Yi.
				252	</P>
				253	<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
				254	<P>
				255	<pre>
				256	[...] positive character class
				257	[^...] negative character class
				258	[x-y] range (can be used for hex characters)
				259	[[:xxx:]] positive POSIX named set
				260	[[:^xxx:]] negative POSIX named set
				261
				262	alnum alphanumeric
				263	alpha alphabetic
				264	ascii 0-127
				265	blank space or tab
				266	cntrl control character
				267	digit decimal digit
				268	graph printing, excluding space
				269	lower lower case letter
				270	print printing, including space
				271	punct printing, excluding alphanumeric
				272	space whitespace
				273	upper upper case letter
				274	word same as \w
				275	xdigit hexadecimal digit
				276	</pre>
				277	In PCRE, POSIX character set names recognize only ASCII characters by default,
				278	but some of them use Unicode properties if PCRE_UCP is set. You can use
				279	\Q...\E inside a character class.
				280	</P>
				281	<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
				282	<P>
				283	<pre>
				284	? 0 or 1, greedy
				285	?+ 0 or 1, possessive
				286	?? 0 or 1, lazy
				287	* 0 or more, greedy
				288	*+ 0 or more, possessive
				289	*? 0 or more, lazy
				290	+ 1 or more, greedy
				291	++ 1 or more, possessive
				292	+? 1 or more, lazy
				293	{n} exactly n
				294	{n,m} at least n, no more than m, greedy
				295	{n,m}+ at least n, no more than m, possessive
				296	{n,m}? at least n, no more than m, lazy
				297	{n,} n or more, greedy
				298	{n,}+ n or more, possessive
				299	{n,}? n or more, lazy
				300	</PRE>
				301	</P>
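<P>
The difference between greedy and lazy quantifiers can be sketched as follows. This is a minimal illustration that uses Python's re module rather than PCRE itself (an assumption made only so the example is easy to run; the two engines treat these particular quantifiers the same way). The subject strings and variable names are illustrative.
</P>

```python
import re

subject = "<b>bold</b>"

# Greedy: .+ takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as it can, so the match stops at the first '>'.
lazy = re.search(r"<.+?>", subject).group(0)

# Bounded: {2,3} is greedy and takes three digits when three are available.
bounded = re.search(r"\d{2,3}", "12345").group(0)

print(greedy, lazy, bounded)
```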
				302	<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
				303	<P>
				304	<pre>
				305	\b word boundary
				306	\B not a word boundary
				307	^ start of subject
				308	also after internal newline in multiline mode
				309	\A start of subject
				310	$ end of subject
				311	also before newline at end of subject
				312	also before internal newline in multiline mode
				313	\Z end of subject
				314	also before newline at end of subject
				315	\z end of subject
				316	\G first matching position in subject
				317	</PRE>
				318	</P>
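<P>
A short sketch of how ^ and $ behave with and without multiline mode. It uses Python's re module as a stand-in for PCRE (an assumption; the anchors shown here behave the same way in both, but \Z differs between the two engines and is deliberately left out). The subject string is illustrative.
</P>

```python
import re

subject = "first line\nsecond line\n"

# Without multiline mode, ^ matches only at the very start of the subject.
starts_default = re.findall(r"^\w+", subject)

# (?m) turns on multiline mode, so ^ also matches after an internal newline.
starts_multiline = re.findall(r"(?m)^\w+", subject)

# $ also matches before a newline that ends the subject.
last_word = re.search(r"\w+$", subject).group(0)

print(starts_default, starts_multiline, last_word)
```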
				319	<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
				320	<P>
				321	<pre>
				322	\K reset start of match
				323	</PRE>
				324	</P>
				325	<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
				326	<P>
				327	<pre>
				328	expr|expr|expr...
				329	</PRE>
				330	</P>
				331	<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
				332	<P>
				333	<pre>
				334	(...) capturing group
				335	(?<name>...) named capturing group (Perl)
				336	(?'name'...) named capturing group (Perl)
				337	(?P<name>...) named capturing group (Python)
				338	(?:...) non-capturing group
				339	(?|...) non-capturing group; reset group numbers for
				340	capturing groups in each alternative
				341	</PRE>
				342	</P>
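<P>
As an illustration of named and non-capturing groups, the sketch below uses the Python form (?P&lt;name&gt;...) with Python's re module, not PCRE itself (an assumption made for runnability; the other named-group spellings and the (?|...) form listed above are not accepted by that module). The group names and subject string are hypothetical.
</P>

```python
import re

# (?P<year>...) and (?P<month>...) are named capturing groups (Python form).
# (?:-|/) is a non-capturing group, so it does not receive a group number.
pattern = re.compile(r"(?P<year>\d{4})(?:-|/)(?P<month>\d{2})")

m = pattern.search("released 2010/11")
year = m.group("year")
month = m.group("month")
group_count = pattern.groups  # number of capturing groups in the pattern

print(year, month, group_count)
```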
				343	<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
				344	<P>
				345	<pre>
				346	(?>...) atomic, non-capturing group
				347	</PRE>
				348	</P>
				349	<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
				350	<P>
				351	<pre>
				352	(?#....) comment (not nestable)
				353	</PRE>
				354	</P>
				355	<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
				356	<P>
				357	<pre>
				358	(?i) caseless
				359	(?J) allow duplicate names
				360	(?m) multiline
				361	(?s) single line (dotall)
				362	(?U) default ungreedy (lazy)
				363	(?x) extended (ignore white space)
				364	(?-...) unset option(s)
				365	</pre>
				366	The following are recognized only at the start of a pattern or after one of the
				367	newline-setting options with similar syntax:
				368	<pre>
				369	(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
				370	(*UTF8) set UTF-8 mode (PCRE_UTF8)
				371	(*UCP) set PCRE_UCP (use Unicode properties for \d etc)
				372	</PRE>
				373	</P>
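<P>
The inline options (?i), (?s) and (?x) can be tried out with the sketch below. It relies on Python's re module rather than PCRE (an assumption; that module accepts these three options at the start of a pattern, but does not accept (?J), (?U), or the (*...) items). The patterns and subjects are illustrative.
</P>

```python
import re

# (?i) makes the pattern caseless.
caseless = bool(re.fullmatch(r"(?i)pcre", "PCRE"))

# By default '.' does not match a newline; (?s) (dotall) lets it match one.
dot_default = bool(re.fullmatch(r"a.b", "a\nb"))
dot_dotall = bool(re.fullmatch(r"(?s)a.b", "a\nb"))

# (?x) ignores unescaped white space in the pattern, so it can be spaced out.
extended = bool(re.fullmatch(r"(?x) \d+ - \d+", "10-20"))

print(caseless, dot_default, dot_dotall, extended)
```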
				374	<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
				375	<P>
				376	<pre>
				377	(?=...) positive look ahead
				378	(?!...) negative look ahead
				379	(?<=...) positive look behind
				380	(?<!...) negative look behind
				381	</pre>
				382	Each top-level branch of a look behind must be of a fixed length.
				383	</P>
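<P>
The following sketch shows lookahead and fixed-length lookbehind assertions in use. It runs on Python's re module, not PCRE (an assumption; note that this module is stricter and requires the whole lookbehind, not just each top-level branch, to have a fixed length). The subject strings are illustrative.
</P>

```python
import re

prices = "cost 40 USD, discount 15%, fee 7 USD"

# Positive lookahead: digits followed by " USD"; the suffix is not consumed.
usd_amounts = re.findall(r"\d+(?= USD)", prices)

# Negative lookahead: whole digit runs that are not followed by '%'.
non_percent = re.findall(r"\b\d+\b(?!%)", prices)

# Positive lookbehind with a fixed length of one character.
after_hash = re.findall(r"(?<=#)\w+", "see #42 and #abc")

print(usd_amounts, non_percent, after_hash)
```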
				384	<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
				385	<P>
				386	<pre>
				387	\n reference by number (can be ambiguous)
				388	\gn reference by number
				389	\g{n} reference by number
				390	\g{-n} relative reference by number
				391	\k<name> reference by name (Perl)
				392	\k'name' reference by name (Perl)
				393	\g{name} reference by name (Perl)
				394	\k{name} reference by name (.NET)
				395	(?P=name) reference by name (Python)
				396	</PRE>
				397	</P>
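<P>
A minimal sketch of backreferences by number and by name, here used to find doubled words. It uses Python's re module instead of PCRE (an assumption; of the forms listed above, that module accepts \n and (?P=name) but not the \g{...} or \k forms). The subject text and the group name "word" are hypothetical.
</P>

```python
import re

text = "this is is a test test of doubled words"

# \1 refers back to the first capturing group by number.
by_number = re.findall(r"\b(\w+) \1\b", text)

# (?P=word) refers back to the group named "word" (Python form).
by_name = [
    m.group("word")
    for m in re.finditer(r"\b(?P<word>\w+) (?P=word)\b", text)
]

print(by_number, by_name)
```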
				398	<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
				399	<P>
				400	<pre>
				401	(?R) recurse whole pattern
				402	(?n) call subpattern by absolute number
				403	(?+n) call subpattern by relative number
				404	(?-n) call subpattern by relative number
				405	(?&name) call subpattern by name (Perl)
				406	(?P>name) call subpattern by name (Python)
				407	\g<name> call subpattern by name (Oniguruma)
				408	\g'name' call subpattern by name (Oniguruma)
				409	\g<n> call subpattern by absolute number (Oniguruma)
				410	\g'n' call subpattern by absolute number (Oniguruma)
				411	\g<+n> call subpattern by relative number (PCRE extension)
				412	\g'+n' call subpattern by relative number (PCRE extension)
				413	\g<-n> call subpattern by relative number (PCRE extension)
				414	\g'-n' call subpattern by relative number (PCRE extension)
				415	</PRE>
				416	</P>
				417	<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
				418	<P>
				419	<pre>
				420	(?(condition)yes-pattern)
				421	(?(condition)yes-pattern|no-pattern)
				422
				423	(?(n)... absolute reference condition
				424	(?(+n)... relative reference condition
				425	(?(-n)... relative reference condition
				426	(?(<name>)... named reference condition (Perl)
				427	(?('name')... named reference condition (Perl)
				428	(?(name)... named reference condition (PCRE)
				429	(?(R)... overall recursion condition
				430	(?(Rn)... specific group recursion condition
				431	(?(R&name)... specific recursion condition
				432	(?(DEFINE)... define subpattern for reference
				433	(?(assert)... assertion condition
				434	</PRE>
				435	</P>
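<P>
The absolute reference condition (?(n)...) can be illustrated with a pattern that demands a closing parenthesis only when an opening one was matched. This sketch uses Python's re module rather than PCRE (an assumption; that module supports conditions on a group number or name, but not the recursion, DEFINE, or assertion conditions). The test subjects are illustrative.
</P>

```python
import re

# Group 1 optionally matches an opening parenthesis. The condition (?(1)\))
# requires a closing parenthesis only if group 1 took part in the match.
pattern = re.compile(r"(\()?\d+(?(1)\))")

subjects = ["123", "(123)", "(123", "123)"]
results = {s: bool(pattern.fullmatch(s)) for s in subjects}

print(results)
```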
				436	<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
				437	<P>
				438	The following act as soon as they are reached:
				439	<pre>
				440	(*ACCEPT) force successful match
				441	(*FAIL) force backtrack; synonym (*F)
				442	</pre>
				443	The following act only when a subsequent match failure causes a backtrack to
				444	reach them. They all force a match failure, but they differ in what happens
				445	afterwards. Those that advance the start-of-match point do so only if the
				446	pattern is not anchored.
				447	<pre>
				448	(*COMMIT) overall failure, no advance of starting point
				449	(*PRUNE) advance to next starting character
				450	(*SKIP) advance start to current matching position
				451	(*THEN) local failure, backtrack to next alternation
				452	</PRE>
				453	</P>
				454	<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
				455	<P>
				456	These are recognized only at the very start of the pattern or after a
				457	(*BSR_...) or (*UTF8) or (*UCP) option.
				458	<pre>
				459	(*CR) carriage return only
				460	(*LF) linefeed only
				461	(*CRLF) carriage return followed by linefeed
				462	(*ANYCRLF) all three of the above
				463	(*ANY) any Unicode newline sequence
				464	</PRE>
				465	</P>
				466	<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
				467	<P>
				468	These are recognized only at the very start of the pattern or after a
				469	(*...) option that sets the newline convention or UTF-8 or UCP mode.
				470	<pre>
				471	(*BSR_ANYCRLF) CR, LF, or CRLF
				472	(*BSR_UNICODE) any Unicode newline sequence
				473	</PRE>
				474	</P>
				475	<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
				476	<P>
				477	<pre>
				478	(?C) callout
				479	(?Cn) callout with data n
				480	</PRE>
				481	</P>
				482	<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
				483	<P>
				484	<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
				485	<b>pcrematching</b>(3), <b>pcre</b>(3).
				486	</P>
				487	<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
				488	<P>
				489	Philip Hazel
				490	<br>
				491	University Computing Service
				492	<br>
				493	Cambridge CB2 3QH, England.
				494	<br>
				495	</P>
				496	<br><a name="SEC27" href="#TOC1">REVISION</a><br>
				497	<P>
				498	Last updated: 21 November 2010
				499	<br>
				500	Copyright © 1997-2010 University of Cambridge.
				501	<br>
				502	<p>
				503	Return to the <a href="index.html">PCRE index page</a>.
				504	</p>