Form Checking – Verifying Name Using PHP Ereg

One important use of Regular Expressions (Regex) is to verify fields submitted via a form. In this article, we attempt to write an expression that is able to verify the user’s first name, middle name, last name or just names in general.

The expression should allow names such as “Mary”, “Mr. James Smith” and “Mrs O’Shea” for example. So the challenge here is to allow spaces, periods and single quotation marks in the name field and reject any other characters.

Technique

We try to identify and detect all illegal characters in the name field. I came up with the following list:

Punctuation: ~`!@#$%^&*()_=+{}|\:;<>"/?,[]-

Numerics: 0-9

Notice that I left out the empty space ( ), period (.) and single quotation mark (') because we are allowing these 3 characters to pass the verification. In other words, the verification will fail if the name field contains any of the punctuation marks or digits above.

The Regex

Now, the hardcore part. The regex pattern I came up with is as follows:

([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\:;<>"/?,]|\[|\]|-)+

Let me briefly explain what this pattern means. The expression can be represented by: (expression1 | expression2 | expression3 | expression4 | expression5)

What we are trying to do here is to match the name field to the patterns in expression 1, 2, 3, 4 or 5. If you look at the regex closely, you will see that expression1 is actually [[:digit:]].

Expression2 is:

[~`!@#$%^&*\(\)_=\+{}\|\\:;<>"/?,]

Notice that I added a backslash (\) before each of the 5 characters “()+|\”. By backslashing these characters, I am telling the function to treat them literally and not as special built-in characters. For example, the brackets “()” normally mean grouping in regex, but if I backslash them, i.e. “\(\)”, it simply means that I want to match “(” and “)”. (Inside the square brackets most of these characters are already literal, so the backslashes are a precaution rather than a requirement.)

Expression3 is “\[”, expression4 is “\]” and expression5 is “-”. We left the 3 characters “[]-” out of expression2 just to avoid confusion, because we already used “[]” as the outer brackets. As for “-”, we left it out because it is normally used to express a range within the brackets “[]”, like so: [A-Z].

Implementation

To implement it in PHP, we write the code as follows:

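// Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.
$pattern = '([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\:;<>"/?,]|\[|\]|-)+';
// Undo magic quotes, if enabled, so that O'Shea does not arrive as O\'Shea.
$name = stripslashes($_POST['name_field']);
if (ereg($pattern, $name)) {
    echo "write your error message";
}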

We stripslashed the name field just in case you have magic quotes turned on. If magic quotes is turned on, the single quotation mark will be passed as \' instead of just '. The ereg function will look for digits and illegal punctuation in the stripslashed name. If any are found, we can do something such as alerting the user to the error.
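Since ereg is no longer available from PHP 7 onwards, the same check has to go through preg_match on current PHP versions. The following is only a minimal sketch under that assumption: the function name name_has_illegal_chars is made up for illustration, and the five alternatives from this article are folded into a single PCRE character class.

```php
<?php
// Hypothetical helper (not from the original article): returns true when the
// name contains a digit or one of the punctuation marks rejected above.
function name_has_illegal_chars(string $name): bool
{
    // A single character class replaces the five alternatives. Inside a PCRE
    // class only the backslash, the square brackets and the '/' delimiter
    // are escaped here; '-' is placed last so that it is taken literally.
    $pattern = '/[0-9~`!@#$%^&*()_=+{}|\\\\:;<>"\/?,\[\]-]/';
    return preg_match($pattern, $name) === 1;
}

var_dump(name_has_illegal_chars('Mr. James Smith')); // false: only letters, space, period
var_dump(name_has_illegal_chars("Mrs O'Shea"));      // false: single quote is allowed
var_dump(name_has_illegal_chars('J0hn'));            // true: contains a digit
```

As in the ereg version, a hyphen counts as illegal, so a name like Anne-Marie would be rejected; remove the trailing '-' from the class if that is not what you want.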

Conclusion

Hopefully, this article can give you some insight into regex and save you some time when verifying name fields. You can modify the regex to have stricter rules. For example, you may not want the name field to start with a space or a period. That’s all for now. Cheers.
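As one possible take on the stricter rule mentioned above, a second check could reject names whose first character is a space or a period. This is a rough sketch only; the function name name_has_bad_start is hypothetical, and it uses preg_match because ereg is not available on PHP 7 and later.

```php
<?php
// Hypothetical stricter rule: a name must not begin with a space or a period.
function name_has_bad_start(string $name): bool
{
    // '^' anchors the match at the start of the string; inside [ ] the
    // period is an ordinary character, not a wildcard.
    return preg_match('/^[ .]/', $name) === 1;
}

var_dump(name_has_bad_start('.James'));    // true
var_dump(name_has_bad_start(' Mary'));     // true
var_dump(name_has_bad_start('Mr. James')); // false: the period is not the first character
```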

 

Introduction to Regular Expressions In PHP

In Linux and Unix, the syntax commonly used by applications for specifying text patterns is known as regular expressions, or regex for short. Regex is a very powerful technique for describing patterns, and many programs use it to describe sequences of characters to be matched. Search programs such as ‘grep’ rely heavily on regex, and it is used throughout the Linux world. Many scripting languages such as Perl, Ruby and PHP have built-in regex functions as well. So you can see that learning regular expressions is important, because they are used a lot in many places and probably will be even more so in the future.

Regex can be scary at first, but once you get the basics it is really not too hard to understand. In this article, we are going to look at how regex comes into the picture when writing PHP applications.

In simple terms, a regular expression is a sequence of literal characters, wildcards, modifiers and anchors.

Literal Characters

Literal characters are letters, digits and special characters that match only themselves. Examples are abc, 123, ~@ and so on (some characters are reserved though).

– An inclusion range [m-n] matches any one character included in the range from m to n. Example ‘[a-z]’ will match any lowercase letter that falls within the a to z range.
– An exclusion range [^m-n] matches any one character not included in the range from m to n. Example ‘[^0-9]’ will match any non-digit character.
– A period ‘.’ matches any single character. It is also known as the wildcard. Example ‘a.c’ will match ‘aec’, ‘acc’, ‘a@c’ and so on.
– The escape character ‘\’ makes the special character that follows it literal. Example ‘a\.c’ will match ‘a.c’ only. Remember that ‘.’ is a reserved character that represents a wildcard? Therefore, to match a literal period, we need to escape it like so: ‘\.’
– The expression [:alnum:] will match any alphanumeric character. Like the other named classes below, it is written inside a bracket expression, as in [[:alnum:]], which is equivalent to [A-Za-z0-9]. As you can see, it is not really a shortcut, but it might be easier to remember for some people.
– The expression [:alpha:] will match any letter. It is a shortcut for A-Za-z.
– The expression [:blank:] will match a space or tab.
– The expression [:digit:] will match a numeric digit. It is a shortcut for 0-9.
– The expression [:lower:] will match any lowercase letter. It is a shortcut for a-z.
– The expression [:upper:] will match any uppercase letter. It is a shortcut for A-Z.
– The expression [:punct:] will match any printable character, excluding spaces and alphanumerics.
– The expression [:space:] will match a whitespace character.
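The items in the list above can be tried out with a few one-liners. This is only an illustrative sketch: it uses preg_match (which accepts the same bracket expressions and named classes), and the helper name class_demo_matches is invented for the example.

```php
<?php
// Hypothetical helper: true when $regex matches somewhere in $subject.
// The pattern is wrapped in '~' delimiters, so it must not contain '~'.
function class_demo_matches(string $regex, string $subject): bool
{
    return preg_match('~' . $regex . '~', $subject) === 1;
}

var_dump(class_demo_matches('a.c', 'aec'));           // true: '.' matches 'e'
var_dump(class_demo_matches('a.c', 'ac'));            // false: '.' needs one character
var_dump(class_demo_matches('a\.c', 'a.c'));          // true: escaped period is literal
var_dump(class_demo_matches('a\.c', 'aec'));          // false
var_dump(class_demo_matches('[[:digit:]]', 'abc1'));  // true: contains a digit
var_dump(class_demo_matches('[^0-9]', '123'));        // false: only digits present
var_dump(class_demo_matches('[[:upper:]]', 'php'));   // false: no uppercase letter
```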

Modifiers

A modifier alters the meaning of the immediately preceding pattern character.

– An asterisk (‘*’) matches 0 or more of the preceding term. Example ‘a*’ will match '', ‘a’, ‘aa’, ‘aaaaa’ and so on (note the use of ''; it simply means that the expression matches the empty string as well).
– A question mark (‘?’) matches 0 or 1 of the preceding term. Example ‘a?’ will match '' and ‘a’ only.
– A plus sign (‘+’) matches 1 or more of the preceding term. Example ‘a+’ will match ‘a’, ‘aaaaaaa’ and so on. It will not match ''.
– {m,n} matches between m and n occurrences of the preceding term. Example ‘a{1,3}’ will match ‘a’, ‘aa’ and ‘aaa’ only.
– {n} matches exactly n occurrences of the preceding term. Example ‘a{2}’ will match ‘aa’ only.
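To see the modifiers in action, the sketch below tests whole strings against each example. It is an illustration only: the helper name quant_demo_whole is made up, preg_match stands in for ereg, and the pattern is wrapped in ^ and $ so that the entire string has to match rather than just a part of it.

```php
<?php
// Hypothetical helper: true when the WHOLE of $subject matches $regex.
// ^ and $ pin the pattern to both ends; the D modifier stops $ from
// also matching just before a trailing newline.
function quant_demo_whole(string $regex, string $subject): bool
{
    return preg_match('~^(?:' . $regex . ')$~D', $subject) === 1;
}

var_dump(quant_demo_whole('a*', ''));         // true: zero occurrences are allowed
var_dump(quant_demo_whole('a?', 'aa'));       // false: at most one 'a'
var_dump(quant_demo_whole('a+', ''));         // false: at least one 'a' is required
var_dump(quant_demo_whole('a{1,3}', 'aaa'));  // true
var_dump(quant_demo_whole('a{1,3}', 'aaaa')); // false: four is too many
var_dump(quant_demo_whole('a{2}', 'aa'));     // true
```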

Anchors

Anchors establish the context for the pattern, such as “the beginning of a line” or “the end of a line”.

– The caret ‘^’ marks the beginning of a line. Example ‘^http’ will match any line that starts with ‘http’.
– The dollar sign ‘$’ marks the end of a line. Example ‘after$’ will match any line that ends with ‘after’. (Variables in PHP also start with $. Try not to confuse the two.)
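The two anchors can be checked quickly as follows. This is a small sketch using preg_match rather than ereg; the helper name anchor_demo is invented, and the patterns are written in single quotes so that PHP does not try to read $ as the start of a variable.

```php
<?php
// Hypothetical helper: $pattern must already include its delimiters and
// any modifiers, e.g. '/^http/m'.
function anchor_demo(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(anchor_demo('/^http/', 'http://example.com'));     // true
var_dump(anchor_demo('/^http/', 'see http://example.com')); // false: not at the start
var_dump(anchor_demo('/after$/', 'the day after'));         // true
var_dump(anchor_demo('/after$/', 'afterwards'));            // false: not at the end

// By default preg_match treats the subject as a single line. The m modifier
// makes ^ and $ apply to every line inside a multi-line string.
$text = "first line\nhttp://example.com";
var_dump(anchor_demo('/^http/', $text));  // false
var_dump(anchor_demo('/^http/m', $text)); // true
```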

Grouping

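Grouping ‘( )’ allows modifiers to apply to groups of regex specifiers instead of only the immediately preceding specifier. Inside a group, the pipe ‘|’ separates alternatives. Example ‘(aa|bb)’ will match either ‘aa’ or ‘bb’. Do not put spaces around the pipe, because a space in a pattern is matched literally.

Enough of the boring stuff. It is time to put the theory of regex to good use.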
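The difference between a modifier applied to a group and one applied to a single character is easiest to see side by side. The sketch below is illustrative only; it relies on preg_match, and the helper name group_demo_matches is hypothetical.

```php
<?php
// Hypothetical helper: true when $regex matches somewhere in $subject.
function group_demo_matches(string $regex, string $subject): bool
{
    return preg_match('~' . $regex . '~', $subject) === 1;
}

var_dump(group_demo_matches('^(aa|bb)$', 'aa'));     // true: first alternative
var_dump(group_demo_matches('^(aa|bb)$', 'ab'));     // false: neither alternative
var_dump(group_demo_matches('^(ab)+$', 'ababab'));   // true: '+' repeats the group 'ab'
var_dump(group_demo_matches('^ab+$', 'ababab'));     // false: '+' repeats only 'b'
var_dump(group_demo_matches('^ab+$', 'abbb'));       // true
var_dump(group_demo_matches('^( aa | bb )$', 'aa')); // false: the spaces are literal
```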

PHP Implementation

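There are 2 main variants of regex in PHP: Perl-compatible regex (PCRE) and POSIX-extended. PHP offers quite a lot of functions for both. The most commonly used PCRE function is ‘preg_match’, and its POSIX-extended counterpart is ‘ereg’ (‘eregi’ is the case-insensitive version). The two syntaxes are slightly different, but both are powerful enough for the examples here. The preference for ‘preg_match’ or ‘ereg’ used to be entirely up to the individual, although Zend suggested that preg_match is slightly faster. I prefer to use ‘ereg’ and ‘eregi’ simply because of my background in Linux administration. Note, however, that the ereg family was deprecated in PHP 5.3 and removed in PHP 7, so the examples in this article run as written only on older PHP versions; on current versions the same patterns have to be passed to ‘preg_match’ with delimiters added.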

Example 1: Matching United States 5 or 9 digit zip codes

Zip codes in the USA have the format ##### or #####-####, where # is a digit. If you want to verify a zip code submitted, say, from an online form, you will need to use regex somewhere in your script. The matching POSIX-extended regex pattern will be:

[[:digit:]]{5}(-[[:digit:]]{4})?

Confused? Let me explain. This regex is split up into 2 parts: [[:digit:]]{5} and (-[[:digit:]]{4})?.

First Part: ‘[[:digit:]]’ means any digit and {5} means that the digit must occur 5 times.

Second Part: The brackets ‘( )’ group ‘-[[:digit:]]{4}’ together, and the ‘?’ means that the expression ‘(-[[:digit:]]{4})’ can occur either 0 or 1 time. Note that ereg looks for the pattern anywhere in the string, so ‘abc12345xyz’ also matches; to accept nothing but a zip code, anchor the pattern as ^[[:digit:]]{5}(-[[:digit:]]{4})?$.

To implement the regex in PHP, we use the following code:

$zipCodes = '12345-6789'; // sample value; use the submitted zip code here
$pattern = '[[:digit:]]{5}(-[[:digit:]]{4})?';
if (ereg($pattern, $zipCodes)) {
    echo "match found";
}
else {
    echo "match not found";
}
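For PHP versions where ereg is no longer available, the zip code check could look roughly like the sketch below. It is not the code from this article: the function name is_us_zip is made up, preg_match replaces ereg, and the pattern is anchored at both ends so that the whole string, not just a part of it, has to be a zip code.

```php
<?php
// Hypothetical helper: true for ##### or #####-####, and nothing else.
function is_us_zip(string $zip): bool
{
    // ^ and $ anchor the pattern to the whole string; the D modifier keeps
    // $ from also matching just before a trailing newline.
    return preg_match('/^[0-9]{5}(-[0-9]{4})?$/D', $zip) === 1;
}

var_dump(is_us_zip('12345'));      // true
var_dump(is_us_zip('12345-6789')); // true
var_dump(is_us_zip('1234'));       // false: too short
var_dump(is_us_zip('123456'));     // false: six digits in a row
var_dump(is_us_zip('12345-678'));  // false: the extension needs four digits
```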

Example 2: Matching Dates

Say we want to verify the dates entered by the user. If we only accept dates like “YYYY-MM-DD” or “YYYY-M-D”, the regex pattern will be:

[0-9]{4}(-[0-9]{1,2})+

The ‘+’ after the term (-[0-9]{1,2}) means that the term must occur at least once. Note that I can also rewrite the regex as:

[[:digit:]]{4}(-[[:digit:]]{1,2})+

or

[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}

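As you can see, there can be many solutions to a problem. They are not strictly equivalent, though: because ‘+’ only requires one occurrence, the first two patterns also accept an incomplete date such as ‘2008-1’, while the last pattern insists on both the month and the day.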
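A regex can only check the shape of a date, not whether the date exists. One way to cover both, sketched here under the assumption that preg_match is used instead of ereg, is to capture the three numbers and hand them to PHP's built-in checkdate function. The function name is_valid_ymd is hypothetical.

```php
<?php
// Hypothetical helper: true when $date looks like YYYY-MM-DD (or YYYY-M-D)
// and also names a real calendar date.
function is_valid_ymd(string $date): bool
{
    // Three capture groups: year, month, day. Anchored to the whole string.
    if (preg_match('/^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})$/D', $date, $m) !== 1) {
        return false;
    }
    // checkdate() expects month, day, year in that order.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(is_valid_ymd('2008-02-29')); // true: 2008 was a leap year
var_dump(is_valid_ymd('2008-2-9'));   // true: single-digit month and day
var_dump(is_valid_ymd('2008-13-01')); // false: there is no month 13
var_dump(is_valid_ymd('2008-1'));     // false: the day is missing
```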

Conclusion

Regex may be hard to digest at first, but the logic becomes simple with practice. Learning regex is as important as learning PHP. More examples can be seen at web-developer.sitecritic.net. Good luck.