Planet For Application Life Development Presents
MY IT World

Explore and uptodate your technology skills...

PHP -Regular Expressions

Regular expressions are nothing more than a sequence or pattern of characters itself. They provide the foundation for pattern-matching functionality.

Using regular expression you can search a particular string inside a another string, you can replace one string by another string and you can split a string into many chunks.

PHP offers functions specific to two sets of regular expression functions, each corresponding to a certain type of regular expression. You can use any of them based on your comfort.

  • POSIX Regular Expressions

  • PERL Style Regular Expressions

POSIX Regular Expressions:

The structure of a POSIX regular expression is not dissimilar to that of a typical arithmetic expression: various elements (operators) are combined to form more complex expressions.

The simplest regular expression is one that matches a single character, such as g, inside strings such as g, haggle, or bag.

Lets give explaination for few concepts being used in POSIX regular expression. After that we will introduce you wih regular expression related functions.

Brackets

Brackets ([]) have a special meaning when used in the context of regular expressions. They are used to find a range of characters.

ExpressionDescription
[0-9] It matches any decimal digit from 0 through 9.
[a-z] It matches any character from lowercase a through lowercase z.
[A-Z] It matches any character from uppercase A through uppercase Z.
[a-Z] It matches any character from lowercase a through uppercase Z.

The ranges shown above are general; you could also use the range [0-3] to match any decimal digit ranging from 0 through 3, or the range [b-v] to match any lowercase character ranging from b through v.

Quantifiers:

The frequency or position of bracketed character sequences and single characters can be denoted by a special character. Each pecial character having a specific connotation. The +, *, ?, {int. range}, and $ flags all follow a character sequence.

ExpressionDescription
p+ It matches any string containing at least one p.
p* It matches any string containing zero or more p's.
p? It matches any string containing zero or more p's. This is just an alternative way to use p*.
p{N} It matches any string containing a sequence of N p's
p{2,3} It matches any string containing a sequence of two or three p's.
p{2, } It matches any string containing a sequence of at least two p's.
p$ It matches any string with p at the end of it.
^p It matches any string with p at the beginning of it.

Examples:

Following examples will clear your concepts about matching chracters.

ExpressionDescription
[^a-zA-Z] It matches any string not containing any of the characters ranging from a through z and A through Z.
p.p It matches any string containing p, followed by any character, in turn followed by another p.
^.{2}$ It matches any string containing exactly two characters.
<b>(.*)</b> It matches any string enclosed within <b> and </b>.
p(hp)* It matches any string containing a p followed by zero or more instances of the sequence hp.

Predefined Character Ranges

For your programming convenience several predefined character ranges, also known as character classes, are available. Character classes specify an entire range of characters, for example, the alphabet or an integer set:

ExpressionDescription
[[:alpha:]] It matches any string containing alphabetic characters aA through zZ.
[[:digit:]] It matches any string containing numerical digits 0 through 9.
[[:alnum:]] It matches any string containing alphanumeric characters aA through zZ and 0 through 9.
[[:space:]] It matches any string containing a space.

PERL Style Regular Expressions:

Perl-style regular expressions are similar to their POSIX counterparts. The POSIX syntax can be used almost interchangeably with the Perl-style regular expression functions. In fact, you can use any of the quantifiers introduced in the previous POSIX section.

Lets give explaination for few concepts being used in PERL regular expressions. After that we will introduce you wih regular expression related functions.

Metacharacters

A metacharacter is simply an alphabetical character preceded by a backslash that acts to give the combination a special meaning.

For instance, you can search for large money sums using the '\d' metacharacter: /([\d]+)000/, Here \d will search for any string of numerical character.

Following is the list of metacharacters which can be used in PERL Style Regular Expressions.

Character		Description
.              a single character
\s             a whitespace character (space, tab, newline)
\S             non-whitespace character
\d             a digit (0-9)
\D             a non-digit
\w             a word character (a-z, A-Z, 0-9, _)
\W             a non-word character
[aeiou]        matches a single character in the given set
[^aeiou]       matches a single character outside the given set
(foo|bar|baz)  matches any of the alternatives specified

Modifiers

Several modifiers are available that can make your work with regexps much easier, like case sensitivity, searching in multiple lines etc.

Modifier	Description
i 	Makes the match case insensitive
m 	Specifies that if the string has newline or carriage
	return characters, the ^ and $ operators will now
	match against a newline boundary, instead of a
	string boundary
o 	Evaluates the expression only once
s 	Allows use of . to match a newline character
x 	Allows you to use white space in the expression for clarity
g 	Globally finds all matches
cg 	Allows a search to continue even after a global match fails