Strings and Patterns – PHP Certification Exam Series [3]



Introduction: This is the third part of my study notes for the Zend PHP Certification Exam. You can read more about my PHP Certification exam journey here.

String Basics

  • delimited by single or double quotes
  • double quotes are used when variables need to be parsed or escape sequences (\n) are needed; there is no meaningful difference in processing speed
  • strlen($string);   // returns the string length, i.e. the number of bytes (rather than characters)
  • str_word_count($string, $format, $charlist);   // counts the words; $format: 0 – return the number of words, 1 – return an array containing all words, 2 – return an associative array whose keys are the positions of the words in the string; $charlist – additional characters to be treated as part of a word (e.g. with "3", fri3nd is considered one word)
  • count_chars($string, 1);   // counts the occurrences of each byte; the second argument is the mode: 1 returns an array of only the byte values that occur (mode 0 also includes byte values with 0 occurrences)
  • soundex($string);  // calculates the soundex key of a string (a string 4 characters long, starting with a letter), for searching by pronunciation, e.g. soundex("Euler") == soundex("Ellery"); // E460
  • metaphone($string, $phonemes);  // calculates the metaphone key of a string (more accurate than soundex() as it knows the basic rules of English pronunciation; the generated keys are of variable length), $phonemes restricts the number of characters returned
  • trim ($string, $optional_char_list);    // trims whitespace (space/tab/newline) from the beginning and end; if $optional_char_list is present, trims the characters listed in $optional_char_list instead
  • ltrim ($string, $optional_char_list);   // trims whitespace from the beginning
  • rtrim ($string, $optional_char_list);   // same as chop(), trims whitespace from the end
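
The functions above can be tried out with a short script. This is only a sketch: the sample strings and variable names are my own, and it assumes PHP 7+ for the "\u{...}" escape.

```php
<?php
// "\u{00E9}" (PHP 7+) is "é", which occupies 2 bytes in UTF-8.
$word = "caf\u{00E9}";
$byteLength = strlen($word);                       // 5 bytes for 4 characters

$sentence    = 'Hello fri3nd, how are you?';
$plainCount  = str_word_count($sentence);          // 6: "fri3nd" splits into "fri" and "nd"
$customCount = str_word_count($sentence, 0, '3');  // 5: "3" now counts as a word character
$wordList    = str_word_count('Hello fri3nd', 1, '3');

$sameSound = soundex('Euler') === soundex('Ellery'); // both keys are "E460"
$metaKey   = metaphone('Thompson');                  // "TMSN"

$trimmed  = trim("  hello \n");     // "hello"
$stripped = trim('xxhixx', 'x');    // "hi"
$noSlash  = rtrim('path///', '/');  // "path"

// Mode 1: only bytes that occur; keys are byte values (97 is "a").
$frequency = count_chars('abca', 1);

var_dump($byteLength, $plainCount, $customCount, $wordList, $metaKey, $frequency);
```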

Comparing Strings

  • $string1 == $string2  // compare with data type conversion
  • $string1 === $string2  //compare with data type check
  • strcmp($string1,$string2) // case-sensitive comparison, returns < 0 if str1 is less than str2; > 0 if str1 is greater than str2, and 0 if they are equal (=== 0)
  • strcasecmp($string1,$string2)  // case-insensitive comparison
  • strncasecmp($string1,$string2,$length)  // case-insensitive comparison of the first $length characters
  • similar_text($string1,$string2[,$percentage])  // returns the number of matching chars; if the third argument is passed in, the % of similarity is assigned to that variable ($percentage)
  • levenshtein($string1,$string2)  // Levenshtein distance between the strings, i.e. the minimum number of characters to be replaced/inserted/deleted to transform $string1 into $string2; used for fuzzy matching, e.g. guessing misspelt words. e.g. levenshtein("ca","cn"); //1
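
A quick sketch of the comparison functions in action. The sample strings and variable names are my own choices for illustration.

```php
<?php
// == converts numeric strings to numbers first; === compares the strings as they are.
$looseEqual  = ('1e3' == '1000');   // true
$strictEqual = ('1e3' === '1000');  // false

$order  = strcmp('apple', 'banana');                     // negative: "apple" sorts first
$noCase = strcasecmp('HELLO', 'hello');                  // 0
$prefix = strncasecmp('Hello world', 'hello there', 5);  // 0: only "Hello" vs "hello" is compared

$common = similar_text('World', 'Word', $percent);       // 4 matching characters
$edits  = levenshtein('kitten', 'sitting');              // 3 single-character edits

var_dump($looseEqual, $strictEqual, $order < 0, $noCase, $prefix, $common, round($percent, 2), $edits);
```

Because strcmp() returns 0 for equal strings, a check such as if (strcmp($a, $b)) reads "if different", which is easy to get backwards; comparing against 0 with === makes the intent clearer.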

Formatting Strings

  • localeconv ();  // get an array containing localized numeric and monetary formatting information
  • nl_langinfo($item);  // returns a single piece of language and locale information, selected by $item, as a string (not available on Windows)
  • setlocale ( LC_ALL, "zh-hk");   // set the locale for locale-aware functions
  • number_format ($number);  // returns a formatted number
  • money_format ($format, $number);  // money_format() is undefined in Windows; it was deprecated in PHP 7.4 and removed in PHP 8.0
  • quotemeta ($string);   // adds a backslash character (\) before every character that is among these: . \ + * ? [ ^ ] ( $ )
  • htmlspecialchars ($string);  // convert &, ", <, > (and ' when the ENT_QUOTES flag applies, which is the default since PHP 8.1) into HTML entities, e.g. " becomes &quot;
  • htmlspecialchars_decode ($string);  // decode the above HTML entities to special characters
  • htmlentities ($string,FLAGS);  // converts all applicable characters to HTML entities (European accents, etc.), only necessary if your pages use encodings such as ASCII or LATIN-1 instead of UTF-8.
  • html_entity_decode ($string);  // converts the HTML entities in the string back to their characters
  • get_html_translation_table (HTML_ENTITIES or HTML_SPECIALCHARS);  // returns the translation table that is used internally for htmlentities() or htmlspecialchars() respectively; the default is HTML_SPECIALCHARS
  • strip_tags ($string,$allowable_tags);   // remove all tags [opening&closing] (except $allowable_tags) from the string
  • nl2br ($string);   // convert newline to <br>
  • wordwrap ($string,$length,$break_symbol,$breakwords);   // insert the $break_symbol or default “\n” at every $length, if $breakwords is TRUE, words will be broken down (default FALSE)
  • ucfirst ($string);   // uppercase the first character
  • lcfirst ($string);    // lowercase the first character
  • strtoupper ($string);   // uppercase the string
  • strtolower ($string);    // lowercase the string
  • ucwords ($string);    // uppercase the first character of each word
  • bin2hex ($string); hex2bin ($string);   // convert binary data into hexadecimal data (ASCII representation), vice versa
  • convert_cyr_string ($string, $from, $to);   // convert from one Cyrillic character set to another (deprecated in PHP 7.4, removed in PHP 8.0)
  • hebrev ($string);    // convert logical Hebrew text to visual text (for rtl reading direction)
  • hebrevc ($string);    // convert logical Hebrew text to visual text with newline conversion (deprecated in PHP 7.4, removed in PHP 8.0)
  • chr($ascii);   // return the character with the given ASCII value, e.g. chr(10) -> \n
  • ord($string);    // return the ASCII value of the first character, e.g. ord("\n") -> 10
  • convert_uuencode($string);    // encodes a string using the uuencode algorithm, translate strings into printable characters, about 35% larger than original
  • convert_uudecode($string);     // decodes
  • base64_encode($string);   // encodes using the base64 algorithm, about 33% larger than original
  • base64_decode($string);   // decodes
  • quoted_printable_encode($string);   // PHP >= 5.3, encodes an 8-bit string in Quoted-Printable Content-Transfer-Encoding for use in MIME
  • quoted_printable_decode($string);   // decodes
  • print ($string); or print "$string";   // always returns 1
  • printf ($format, $args, …);   // prints a formatted string, returns the length of the outputted string, e.g. $str = printf('Hello %s', 'Edward'); prints Hello Edward, $str == 12
  • sprintf ($format, $args);   // returns a formatted string
  • vprintf ($format, $array_args);   // prints a formatted string, accepts an array as argument
  • vsprintf ();   // returns a formatted string, accepts an array as argument
  • fprintf ($handle, $format, $args);    // sends a formatted string to a resource, e.g. write to an open file stream
  • sscanf ($string, $format, $optional_args);  // read from $string and parse it to the required $format as an array, if $optional_args are supplied, the values of the $format parts will be assigned to the $optional_args
  • fscanf ($handle, $format, $optional_args);  // read from a resource, e.g. file stream, each call reads 1 line
  • str_pad ($string,$length,$optional_padding_character);   // pad a string to a certain $length with the padding character
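
A few of the formatting functions above, sketched with sample input of my own. The expected values in the comments assume the default arguments of each function.

```php
<?php
// Escape HTML special characters before output.
$escaped = htmlspecialchars('<a href="x">Tom & Jerry</a>');
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

// Remove all tags except the allowed <b>.
$stripped = strip_tags('<p>Hello <b>World</b></p>', '<b>');  // Hello <b>World</b>

// nl2br() inserts a <br /> before each newline; the newline itself is kept.
$withBreaks = nl2br("a\nb");                                  // "a<br />\nb"

// Wrap at 10 characters using "\n" as the break symbol.
$wrapped = wordwrap('The quick brown fox', 10, "\n", true);  // "The quick\nbrown fox"

$leftPadded  = str_pad('5', 3, '0', STR_PAD_LEFT);  // "005"
$rightPadded = str_pad('abc', 6, '-');              // "abc---" (right padding is the default)

$title   = ucwords('hello world');   // "Hello World"
$hex     = bin2hex('abc');           // "616263"
$encoded = base64_encode('Hello');   // "SGVsbG8="

echo $escaped, "\n", $stripped, "\n", $wrapped, "\n";
```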

Examples

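$format = 'The %2$s contains %1$04d monkeys'; // 2$ => the second argument, 04d => an integer zero-padded to 4 digits
$num = 5; $location = 'tree';
printf($format, $num, $location); // The tree contains 0005 monkeys (echo printf(...) would also print the returned length)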
$s = 'monkey'; $t = 'many monkeys';
printf("[%s]\n",$s); // standard string output [monkey]
printf("[%10s]\n",$s); // right-justification with spaces [    monkey]
printf("[%-10s]\n",$s); // left-justification with spaces [monkey    ]
printf("[%010s]\n",$s); // zero-padding works on strings too [0000monkey]
printf("[%'#10s]\n",$s); // use the custom padding character '#' [####monkey]
printf("[%10.10s]\n",$t); // width of 10 with a cutoff of 10 characters [many monke]

$number = 1234.56;
setlocale(LC_MONETARY, 'en_US');
$money_format = money_format('%i', $number) . "\n"; // USD 1,234.56 (money_format() is not available on Windows and was removed in PHP 8.0)
$english_format_number = number_format($number, 2, '.', ','); // 1,234.56

$test = "string 1234 string 5678";
$result = sscanf($test, "%s %d %s %d"); // $result = array('string', 1234, 'string', 5678); %d yields integers
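
The difference between the printing and the returning variants, and the two ways of calling sscanf(), can be sketched as follows. The input strings and variable names are made up for illustration.

```php
<?php
// sprintf() returns the string instead of printing it.
$line = sprintf('%2$s has %1$04d items', 7, 'cart');      // "cart has 0007 items"

// vsprintf() takes the arguments as one array.
$same = vsprintf('%2$s has %1$04d items', [7, 'cart']);

// With only two arguments, sscanf() returns the parsed values as an array.
list($month, $day, $year) = sscanf('January 01 2000', '%s %d %d');

// With extra arguments, the values are assigned by reference and
// the number of assigned values is returned.
$assigned = sscanf('id:42 name:bob', 'id:%d name:%s', $id, $name);

var_dump($line, $month, $day, $year, $assigned, $id, $name);
```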

Printf Formatting Specifiers

  • % – a literal percent character. No argument is required, i.e. %%
  • b – the argument is treated as an integer, and presented as a binary number.
  • c – the argument is treated as an integer, and presented as the character with that ASCII value.
  • d – the argument is treated as an integer, and presented as a (signed) decimal number. 10d – padded to a minimum width of 10 characters
  • e – the argument is treated as a number and presented in scientific notation (e.g. 1.2e+2).
  • u – the argument is treated as an integer, and presented as an unsigned decimal number.
  • f – the argument is treated as a float, and presented as a floating-point number (locale aware). .10f – 10 decimal places
  • F – the argument is treated as a float, and presented as a floating-point number (non-locale aware). Available since PHP 4.3.10 and PHP 5.0.3.
  • o – the argument is treated as an integer, and presented as an octal number.
  • s – the argument is treated as and presented as a string.
  • x – the argument is treated as an integer and presented as a hexadecimal number (with lowercase letters).
  • X – the argument is treated as an integer and presented as a hexadecimal number (with uppercase letters).
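
One sprintf() call per specifier makes the list above easier to remember. This is a sketch with arbitrary sample values; %F is used rather than %f so that the result does not depend on the current locale.

```php
<?php
$binary  = sprintf('%b', 5);          // "101"
$char    = sprintf('%c', 65);         // "A"
$padded  = sprintf('%05d', 42);       // "00042"
$signed  = sprintf('%+d', 5);         // "+5"
$sci     = sprintf('%.2e', 1234.5);   // "1.23e+3"
$float   = sprintf('%.2F', 3.14159);  // "3.14"
$octal   = sprintf('%o', 8);          // "10"
$hexLow  = sprintf('%x', 255);        // "ff"
$hexUp   = sprintf('%X', 255);        // "FF"
$percent = sprintf('%d%%', 50);       // "50%"

echo implode(' ', [$binary, $char, $padded, $signed, $sci, $float, $octal, $hexLow, $hexUp, $percent]), "\n";
```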

Escape Sequences (Control Characters)

  • \n linefeed (LF or 0x0A (10) in ASCII)
  • \r carriage return (CR or 0x0D (13) in ASCII)
  • \t horizontal tab (HT or 0x09 (9) in ASCII)
  • \\ backslash
  • \$ dollar sign
  • \" double-quote
  • \[0-7]{1,3} the sequence of characters matching the regular expression is a character in octal notation
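
These escape sequences are only interpreted inside double quotes. A small sketch, with variable names of my own:

```php
<?php
$name = 'World';

$parsed  = "Hello $name";   // variables are expanded in double quotes
$literal = 'Hello $name';   // single quotes keep the text as written

$doubleLength = strlen("a\nb");  // 3: \n is a single linefeed byte
$singleLength = strlen('a\nb');  // 4: backslash and "n" are two separate characters

$octalA  = "\101";   // octal 101 = 65 = "A"
$hexA    = "\x41";   // hexadecimal 41 = 65 = "A"
$escaped = "\$name"; // the dollar sign is escaped, so no variable is expanded

var_dump($parsed, $literal, $doubleLength, $singleLength, $octalA, $hexA, $escaped);
```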

Functions

substr($string, $start, $length)
– returns the substring from the start position with the given length
– a negative start counts from the end of the string
– a negative length stops that many characters before the end of the string
– if $start is beyond the end of the string, returns FALSE before PHP 8.0 and an empty string from PHP 8.0 onwards

substr_compare($string1, $string2, $offset, $length, $case_insensitivity)
– binary-safe comparison of two strings from an offset, up to $length characters
– returns 0 if they are equal

substr_count($haystack, $needle, $offset)
– count the number of substring occurrences

substr_replace($string, $replacement, $start, $length)
– replace text within a portion of a string
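
The substr family with positive and negative arguments, as a sketch on sample strings of my own:

```php
<?php
$s = 'abcdef';

$middle  = substr($s, 1, 3);    // "bcd": start at index 1, take 3 characters
$tail    = substr($s, -2);      // "ef": start 2 characters from the end
$allBut  = substr($s, 0, -1);   // "abcde": stop 1 character before the end
$oneChar = substr($s, -3, 1);   // "d"

// Compare the last 3 characters of "Hello" with "LLO", ignoring case.
$cmp = substr_compare('Hello', 'LLO', -3, 3, true);          // 0 means equal

$count    = substr_count('hello hello', 'll');               // 2
$replaced = substr_replace('Hello World', 'PHP', 6, 5);      // "Hello PHP"

var_dump($middle, $tail, $allBut, $oneChar, $cmp, $count, $replaced);
```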

strstr($haystack, $needle, TRUE | FALSE)  // TRUE returns the part before $needle; FALSE (default) returns $needle and everything after it
stristr($haystack, $needle, TRUE | FALSE)  // case-insensitive
strchr($haystack, $needle, TRUE | FALSE )  // alias of strstr()
– return the part of $haystack after (and including) or before the first occurrence of $needle
– e.g. $email = 'name@example.com'; $domain = strstr($email, '@'); // $domain == '@example.com'
$user = strstr($email, '@', TRUE); // $user == 'name'

strpos($haystack, $needle, $offset)
stripos($haystack, $needle, $offset)   // case-insensitive
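strrpos($haystack, $needle, $offset)   // finds the last occurrence instead
– find the position of the first occurrence of a substring in a string (optionally skipping the first $offset characters); returns FALSE if not found, so compare the result with === because a match at position 0 is also falsy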
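
The classic pitfall with the position functions is a match at index 0. A minimal sketch, using sample strings of my own:

```php
<?php
$first    = strpos('hello', 'l');      // 2: first "l"
$last     = strrpos('hello', 'l');     // 3: last "l"
$skipped  = strpos('hello', 'l', 3);   // 3: search starts at offset 3
$noCase   = stripos('Hello', 'L');     // 2
$notFound = strpos('hello', 'z');      // false

// A match at position 0 is falsy, so a loose check gives the wrong answer.
$atStart      = strpos('hello', 'h');        // 0
$looseCheck   = ($atStart == false);         // true  (misleading)
$strictCheck  = ($atStart === false);        // false (correct: the needle was found)

var_dump($first, $last, $skipped, $noCase, $notFound, $looseCheck, $strictCheck);
```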

strpbrk($string, $charlist)   // case sensitive
– searches the string for any character in $charlist and returns the rest of the string starting from the first character found, or FALSE if none is found

str_replace($search, $replace, $subject)
str_ireplace($search, $replace, $subject)  // case-insensitive
– find and replace; $search and $replace can be arrays; an optional 4th argument ($count) receives the number of replacements performed

strtr($string, $from, $to)
strtr($string, $replace_array)
– translate characters or replace substrings
– e.g. echo strtr("baab", "ab", "01"); // 1001
– e.g. $trans = array("ab" => "01"); echo strtr("baab", $trans); // ba01
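
One difference worth remembering: str_replace() with arrays works through the pairs one after another, so a later pair can change the result of an earlier one, while strtr() never re-replaces text it has already substituted. A sketch with sample input of my own:

```php
<?php
// str_replace(): "a" becomes "b" first, then every "b" (old and new) becomes "c".
$sequential = str_replace(['a', 'b'], ['b', 'c'], 'ab');   // "cc"

// strtr(): each part of the input is translated at most once.
$singlePass = strtr('ab', ['a' => 'b', 'b' => 'c']);       // "bc"

// The optional 4th argument receives the number of replacements.
$result = str_replace(['a', 'b'], ['1', '2'], 'aabbc', $count);  // "1122c", $count is 4

// Case-insensitive variant.
$insensitive = str_ireplace('WORLD', 'PHP', 'Hello world');      // "Hello PHP"

var_dump($sequential, $singlePass, $result, $count, $insensitive);
```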

str_repeat($string, $times)
– repeat a string a number of $times

str_split($string,$maximum_length)
– split the string into an array of chunks, each with a length equal to or below the maximum length

strspn($string,$charlist,$start,$length)
– returns the length of the initial segment of $string that contains only characters from $charlist

strcspn($string,$charlist,$start,$length)
– returns the length of the initial segment of $string that does not contain any characters from $charlist
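
A short sketch of the four functions above, on sample strings of my own:

```php
<?php
$repeated = str_repeat('ab', 3);          // "ababab"
$chunks   = str_split('abcdefg', 3);      // ["abc", "def", "g"]

// Length of the leading run made up only of digits.
$digits = strspn('42 is the answer', '1234567890');   // 2

// Length of the leading run that contains no space.
$firstWordLength = strcspn('hello world', ' ');        // 5

var_dump($repeated, $chunks, $digits, $firstWordLength);
```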

strrev($string)
– reverse a given string

str_shuffle($string)
– shuffle the string

parse_str($string,$array)
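– parse the query string into variables, e.g. $str = "first=value&arr[]=foo+bar&arr[]=baz"; parse_str($str, $array); // $array['first'] == 'value'
– calling parse_str($str) without the second argument creates variables in the current scope ($first == 'value'); this form was deprecated in PHP 7.2 and the second argument is required from PHP 8.0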

parse_url($url)
– parse the url to get the information in an associative array
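
Both parsers in a quick sketch. The URL below is a made-up example.

```php
<?php
// parse_str() decodes a query string into an array ("+" becomes a space).
parse_str('first=value&arr[]=foo+bar&arr[]=baz', $query);
// $query['first'] is "value", $query['arr'] is ["foo bar", "baz"]

// parse_url() splits a URL into its components.
$parts = parse_url('https://user@example.com:8080/path/page.php?x=1#top');

// A single component can be requested with a PHP_URL_* constant.
$host = parse_url('https://example.com/index.php', PHP_URL_HOST);

var_dump($query, $parts, $host);
```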

str_getcsv($input,$delimiter,…)
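– PHP >= 5.3, parse a CSV string into an array

get_magic_quotes_gpc()
– returns 1 if magic_quotes_gpc is on, 0 if off; always returns FALSE since PHP 5.4 (magic quotes were removed), and the function itself was deprecated in PHP 7.4 and removed in PHP 8.0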

addslashes($string)
– quote string with backslashes before ", ', \ and NUL
stripslashes($string)
– unquote a quoted string
addcslashes($string, $charList_to_add)
– quote string with backslashes before the characters listed in $charList_to_add; ranges such as 'A..Z' are allowed, but note that a range like 'A..z' also covers [\]^_` because they sit between the upper- and lowercase letters in ASCII
stripcslashes($string)
– unquote a string quoted with addcslashes()

strtok($string,$delimiter)
– split and return the $string into a smaller chunk based on the $delimiter (not included)
– subsequent calls only requires strtok($delimiter), delimiter not included
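
The usual strtok() loop, sketched with a sample sentence of my own. Note that only the first call receives the string.

```php
<?php
$tokens = [];

// First call: pass the string and the delimiter characters (space and tab here).
$token = strtok("This is\tan example", " \t");

while ($token !== false) {
    $tokens[] = $token;
    // Subsequent calls: pass only the delimiters to continue with the same string.
    $token = strtok(" \t");
}

print_r($tokens);   // This, is, an, example
```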

chunk_split($string)
– split the string into smaller chunks (76 characters followed by "\r\n" by default), e.g. to format base64_encode() output for use in email

crypt($string,$optional_salt)
– one-way hashing of $string (it cannot be decrypted); returns a hashed string using the standard Unix DES-based algorithm or alternative algorithms that may be available on the system (from PHP 5.3, PHP contains its own implementation of the algorithms), e.g. $hashed_password = crypt('mypassword', $salt); if (hash_equals($hashed_password, crypt($user_input, $hashed_password))) { echo "Password verified!"; } // the salt was optional before PHP 8.0; hash_equals() (PHP >= 5.6) gives a timing-safe comparison
– deliberately slow, for better protection against brute-force attacks
– CRYPT_MD5 (crypt() with a salt starting with $1$) vs md5() // CRYPT_MD5 applies the MD5 algorithm many times, md5() only once

hash($algo,$data)
– returns the hash according to the $algo selected
md5($string) // not for passwords
md5_file($filename)
– returns the hash as a 32-character hexadecimal number
crc32($string) // not for passwords
– generates the 32-bit cyclic redundancy checksum polynomial, usually used to validate the integrity of data being transmitted
sha1($string) // not for passwords
sha1_file($filename)
– returns the hash as a 40-character hexadecimal number
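
A sketch that checks the output lengths mentioned above; 'hello' is just a sample input.

```php
<?php
$md5    = md5('hello');             // 32 hexadecimal characters
$sha1   = sha1('hello');            // 40 hexadecimal characters
$sha256 = hash('sha256', 'hello');  // 64 hexadecimal characters

// hash() with the matching algorithm name gives the same result as the shortcut functions.
$sameMd5  = (hash('md5', 'hello') === $md5);
$sameSha1 = (hash('sha1', 'hello') === $sha1);

// crc32() returns an integer rather than a hex string.
$checksum = crc32('hello');

var_dump(strlen($md5), strlen($sha1), strlen($sha256), $sameMd5, $sameSha1, is_int($checksum));
```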

str_rot13($string)
– shifts every letter by 13 places in the alphabet, encode and decode with the same function

mbstring(Multibyte String)

  • PHP strings are plain sequences of bytes with no built-in notion of character encoding; the mbstring functions use the internal encoding set with mb_internal_encoding() (UTF-8 by default in recent PHP versions)
  • While there are many languages in which every necessary character can be represented by a one-to-one mapping to an 8-bit value, there are also several languages which require so many characters for written communication that they cannot be contained within the range a mere byte can code (a byte is made up of eight bits; each bit can contain only two distinct values, one or zero, so a byte can only represent 256 unique values, i.e. two to the power of eight). Multibyte character encoding schemes were developed to express more than 256 characters in the regular bytewise coding system.
  • not a default module, must be enabled with the configure option (--enable-mbstring)
  • to use function overloading (automatically use the mb_ counterpart of some built-in functions), set mbstring.func_overload in php.ini to a positive value that represents a combination of bitmasks specifying the categories of functions to be overloaded: 1 to overload the mail() function, 2 for string functions, 4 for regular expression functions, etc. (function overloading was deprecated in PHP 7.2 and removed in PHP 8.0)
  • handles encoding conversion
  • mb_check_encoding ($string, $encoding);   // verifies whether the string is valid for the specified encoding
  • mb_internal_encoding ("UTF-8");  // set internal character encoding to UTF-8
  • mb_strlen ($string, $encoding);   // returns the number of characters (not bytes) in the string
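
Bytes versus characters, as a sketch. It assumes PHP 7+ (for the "\u{...}" escape) and only calls the mb_ functions when the mbstring extension is loaded; the sample text is my own.

```php
<?php
// Three Japanese characters, each encoded as 3 bytes in UTF-8.
$text = "\u{65E5}\u{672C}\u{8A9E}";

$byteCount = strlen($text);   // 9 bytes

$charCount = null;
$firstChar = null;
$validUtf8 = null;

if (function_exists('mb_strlen')) {
    $charCount = mb_strlen($text, 'UTF-8');            // 3 characters
    $firstChar = mb_substr($text, 0, 1, 'UTF-8');      // the first character, 3 bytes long
    $validUtf8 = mb_check_encoding("\xFF", 'UTF-8');   // false: 0xFF is never valid in UTF-8
}

var_dump($byteCount, $charCount, $validUtf8);
```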

PCRE(Perl Compatible Regular Expressions)

  • multi-byte string compatible (UTF-8 subjects and patterns are supported with the u modifier)
  • delimiter – used at the beginning and end of each pattern; can be any non-alphanumeric, non-backslash, non-whitespace character, usually "/", "#", "~" or "!", or a pair of brackets: {pattern}
  • greediness – by default each quantifier matches as much as possible

Meta-characters

  • \ general escape character
  • [] a class
  • | or
  • () a sub-pattern
  • [^] negate the class, must be put in the first character
  • [-] range

Character Classes

  • \d Digits 0-9 [:digit:]
  • \D Anything not a digit
  • \w Any alphanumeric character or an underscore (_) [:word:]
  • \W Anything not an alphanumeric character or an underscore
  • \s Any whitespace (spaces, tabs, newlines) [:space:]
  • \S Any non-whitespace character
  • . Any character except for a newline
  • [:alnum:] letters and digits
  • [:alpha:] letters
  • [:lower:] lower case letters
  • [:upper:] upper case letters
  • the [:name:] classes are only valid inside a class, e.g. [[:digit:]]

Anchors

  • ^ Start of the subject (in multiline mode, also the start of each line)
  • $ End of the subject or just before a final newline (in multiline mode, also just before every \n)

Positioners

  • \b word boundary
  • \B not a word boundary
  • \A Start of a string
  • \Z End of a string or newline at end
  • \z End of a string
  • \G first matching position in subject

Quantifiers

  • ? Occurs 0 or 1 time
  • * Occurs 0 or more times
  • + Occurs 1 or more times
  • {n} Occurs exactly n times
  • {0,n} Occurs at most n times (older PCRE versions treat the short form {,n} as literal text, not as a quantifier)
  • {m,} Occurs m or more times
  • {m,n} Occurs between m and n times
  • Combination of ? with * or + makes the pattern non-greedy, i.e. *? or +?
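
Greedy versus non-greedy matching is easiest to see on a small HTML-like string. A sketch with sample input of my own:

```php
<?php
$html = '<b>bold</b>';

// Greedy: .+ grabs as much as possible, up to the last ">".
preg_match('/<.+>/', $html, $greedy);    // $greedy[0] is "<b>bold</b>"

// Non-greedy: .+? stops at the first ">".
preg_match('/<.+?>/', $html, $lazy);     // $lazy[0] is "<b>"

// {m,n}: between 2 and 3 repetitions, taking 3 when available.
preg_match('/a{2,3}/', 'aaaa', $range);  // $range[0] is "aaa"

// ?: the "u" is optional, so both spellings match.
$american = preg_match('/colou?r/', 'color');   // 1
$british  = preg_match('/colou?r/', 'colour');  // 1

var_dump($greedy[0], $lazy[0], $range[0], $american, $british);
```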

Unicode Character Properties (for UTF-8)

  • \p{xx} a character with the xx property
  • \P{xx} a character without the xx property
  • \X an extended Unicode sequence

Pattern Modifiers

  • i – Case insensitive search
  • m – Multiline, $ and ^ will match at newlines
  • s – Makes the dot metacharacter match newlines
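  • x – Extended mode: whitespace in the pattern is ignored and # starts a comment, which allows for commenting
  • U – Makes the engine un-greedy
  • u – Turns on UTF-8 support (pattern and subject are treated as UTF-8)
  • e – Used with preg_replace(), evaluates the replacement string as PHP code (deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead)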

Example
$pattern = '/^\s+/i';
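
The effect of each modifier can be checked by running the same pattern with and without it. A sketch with sample subjects of my own; the "\u{...}" escape assumes PHP 7+.

```php
<?php
// i: case-insensitive.
$withI = preg_match('/php/i', 'PHP');          // 1

// m: ^ also matches at the start of each line.
$withoutM = preg_match('/^b/', "a\nb");        // 0
$withM    = preg_match('/^b/m', "a\nb");       // 1

// s: the dot also matches a newline.
$withoutS = preg_match('/a.b/', "a\nb");       // 0
$withS    = preg_match('/a.b/s', "a\nb");      // 1

// x: whitespace is ignored and # starts a comment inside the pattern.
$datePattern = '/
    (\d{4})   # year
    -
    (\d{2})   # month
/x';
$withX = preg_match($datePattern, '2024-01', $date);   // 1, $date[1] is "2024"

// U: quantifiers become non-greedy by default.
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);        // $ungreedy[0] is "<b>"

// u: "é" is 2 bytes, so a single dot only matches it in UTF-8 mode.
$withoutU = preg_match('/^.$/', "\u{00E9}");    // 0
$withU    = preg_match('/^.$/u', "\u{00E9}");   // 1

var_dump($withI, $withM, $withS, $withX, $ungreedy[0], $withU);
```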

Functions

preg_match ($pattern, $subject, $matches, $flags, $offset);   // perform a regular expression match, stop once matched, return 1 if matched, 0 if not matched, FALSE if error occurs
preg_match_all ($pattern, $subject, $matches);  // perform a global regular expression match, returns the number of matches
preg_grep ($pattern, $array);  // returns the array consisting of the elements of the input array that match the given pattern, keys preserved, like preg_filter except without replacement
preg_filter ($pattern, $replace, $subject);   // like preg_replace(), but returns only the subjects where there was a match; $subject can be an array
preg_replace ($pattern, $replace, $subject);   // returns the $subject with every match replaced; subjects without a match are returned unchanged
preg_replace_callback ($pattern, $callback, $subject)  // replace each match with the return value of a callback function
$array = preg_split ($pattern, $string);  // the array contains the $string split with $pattern
preg_quote ($string, $optional_delimiter); // escapes the characters that have a special meaning in a PCRE pattern (and the delimiter, if given)
preg_last_error ();   // return the error code of the last regex execution, e.g. PREG_NO_ERROR, PREG_BAD_UTF8_OFFSET_ERROR
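
The preg_ functions side by side, as a sketch. The subjects, patterns and variable names are my own examples.

```php
<?php
// preg_match(): first match only; $m[0] is the full match, $m[1..] the sub-patterns.
$found = preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Date: 2024-01-15', $m);

// preg_match_all(): every match; returns how many were found.
$total = preg_match_all('/\d+/', 'a1b22c333', $all);      // 3, $all[0] is ["1", "22", "333"]

// preg_replace(): collapse runs of whitespace into one space.
$clean = preg_replace('/\s+/', ' ', "a  b \t c");         // "a b c"

// preg_replace_callback(): compute each replacement in a function.
$doubled = preg_replace_callback('/\d+/', function (array $match) {
    return (string) ((int) $match[0] * 2);
}, 'a1b22');                                              // "a2b44"

// preg_split(): split on commas and/or whitespace.
$pieces = preg_split('/[\s,]+/', 'a, b  c,d');            // ["a", "b", "c", "d"]

// preg_grep(): keep matching elements, original keys preserved.
$numeric = preg_grep('/^\d+$/', ['1', 'a', '22']);        // [0 => "1", 2 => "22"]

// preg_quote(): escape special characters before using text inside a pattern.
$quoted = preg_quote('Hello.World?', '/');                // Hello\.World\?

$lastError = preg_last_error();                           // PREG_NO_ERROR when all went well

var_dump($found, $m, $total, $clean, $doubled, $pieces, $numeric, $quoted);
```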

 

 


Edward Chung

Edward Chung aspires to become a full-stack web developer and project manager. In the quest to become a more competent professional, Edward studied for and passed the PMP Certification, ITIL v3 Foundation Certification, PMI-ACP Certification and Zend PHP Certification. Edward shares his certification experience and resources here in the hope of helping others who are pursuing these certification exams to achieve exam success.


2 Responses

  1. Ajay Aggarwal says:

    Hi, I am preparing for Zend PHP certification.
    But I am not able to find the latest PHP 7 study material.
    And it's very tough to go through the PHP manual and practice and remember all the functions, as there are thousands of them.
    Can you please guide me on how to prepare for the exam in less time?
    And have you found PHP certification mock tests anywhere?

    • Hi Ajay,

      Really sorry that I do not have any shortcuts. At the time of my exam prep, I read through the PHP documentation website page by page – it was hard work.

      Hope you can find an easier way.

      Wish you PHP Certification success!