Using JavaScript RegExp

How to use JavaScript RegExp objects.

  • Some string methods, such as match(), replace() and search(), look for text within a string and need a RegExp object that specifies the search pattern.
  • RegExp objects may be created with the RegExp() constructor function, or as a literal expression written between a pair of slash (/) characters.
    // create a pattern object with the RegExp constructor
    // (the backslash must be doubled inside a string literal)
    var pattern1 = new RegExp("\\s");
    // create a pattern object by using the special literal syntax
    var pattern2 = /\s/;
    // Both patterns match the first whitespace character in a string.
  • A regular-expression pattern consists of a series of characters. Most characters, including all alphanumeric characters, simply match themselves literally. Thus, the expression /JavaScript/ matches a string that contains the substring "JavaScript".
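
The following is a minimal sketch of how the three string methods named above use a pattern. The sample text and the variable names are assumptions chosen for illustration.

```javascript
// Sample text (an assumption for illustration, not taken from the page).
var text = "Roast Beef and Tuna salad";

// The same pattern written both ways. In the string form the backslash
// must itself be escaped, otherwise "\s" is read as a plain "s".
var fromConstructor = new RegExp("\\s");
var fromLiteral = /\s/;

// search() returns the index of the first match, or -1 if there is none.
var firstIndex = text.search(fromLiteral);

// match() without the g flag returns an array describing the first match.
var firstMatch = text.match(fromConstructor);

// replace() without the g flag replaces only the first match.
var replaced = text.replace(fromLiteral, "_");

console.log(firstIndex);  // 5
console.log(replaced);    // "Roast_Beef and Tuna salad"
```

The constructor form is mostly useful when the pattern is built from a string at run time; for a fixed pattern the literal form avoids the doubled backslashes.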

Escape sequence characters.

  • Regular-expression syntax supports certain nonalphabetic characters through escape sequences that begin with a backslash (\).
    Character Matches
    Alphanumeric character The character itself.
    \0 The NUL character (\u0000)
    \t Tab (\u0009)
    \n Newline (same as \u000A or \x0A or \cJ)
    \v Vertical tab (same as \u000B or \x0B or \cK)
    \f Form feed (same as \u000C or \x0C or \cL)
    \r Carriage return (same as \u000D or \x0D or \cM)
    \xnn The Latin character specified by the hexadecimal number nn
    \unnnn The Unicode character specified by the hexadecimal number nnnn
    \cN A control character, where N is a letter from A to Z, indicating Control+A through Control+Z. These are equivalent to \x01 through \x1A (1 through 26 decimal). So for example \cJ = \x0A = \u000A = \n
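
As a quick check of the table above, the sketch below tests a few escape sequences against the characters they are said to describe. The variable names are illustrative only, and test() is the RegExp method that returns true when the pattern matches.

```javascript
// Each line pairs an escape sequence with the character it should match.
var tabMatches = /\t/.test("\u0009");      // tab
var newlineMatches = /\cJ/.test("\n");     // Control+J is the newline
var hexMatches = /\x41/.test("A");         // \x41 is the letter A
var unicodeMatches = /\u0041/.test("A");   // \u0041 is also the letter A
var nulMatches = /\0/.test("\u0000");      // the NUL character

console.log([tabMatches, newlineMatches, hexMatches, unicodeMatches, nulMatches]);
```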

Regular expression control characters.

  • Some characters have special meanings in regular expressions. These characters are:
    ^ $ . * + ? = ! : | \ / ( ) [ ] { }
  • If you want to include any of these characters literally in a regular expression, you must precede them with the backslash character \ (escape character).
    var pattern1 = /\\/;  // match a backslash character
    var pattern2 = /\[/;  // match an opening square bracket
    var pattern3 = /\:/;  // match a colon character
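
To see why the backslash matters, the sketch below compares escaped and unescaped patterns on a sample string. The string and the variable names are assumptions for illustration.

```javascript
// Sample text (an assumption for illustration).
var receipt = "Total: $3.80 (incl. tax)";

// Escaped characters are matched literally.
var dollarIndex = receipt.search(/\$/);   // position of the "$"
var dotIndex = receipt.search(/\./);      // position of the first "."
var parenIndex = receipt.search(/\(/);    // position of the "("

// Unescaped, the dot keeps its special meaning and matches any character,
// so the very first character of the string is a match.
var anyCharIndex = receipt.search(/./);

// In the constructor form the backslash itself has to be doubled.
var dollarFromString = receipt.search(new RegExp("\\$"));

console.log(dollarIndex, dotIndex, parenIndex);  // 7 9 13
console.log(anyCharIndex);                       // 0
```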
    

Regular expression character classes.

  • Characters can be combined into character classes by placing them within square brackets ([]).
  • A character class matches any one character that is contained within it.
    var pattern1 = /[klm]/;  // matches any one of the letters k, l, or m.
  • A negated character class is specified by placing a caret (^) as the first character inside the brackets.
    var pattern1 = /[^klm]/;  // matches any one character other than k, l, or m.
  • To specify a range of characters within the character class you must use a hyphen (-).
    var pattern1 = /[a-z]/;  // matches any one character from a to z.
  • Here is a list of the expressions that can be used as character classes:
    Character Matches
    [...] Any one character between the brackets. (hyphen (-) may be used to specify a range of characters)
    [^...] Any one character not between the brackets. (hyphen (-) may be used to specify a range of characters)
    \d Any ASCII digit. Equivalent to [0-9].
    \D Any character other than an ASCII digit. Equivalent to [^0-9].
    \w Any ASCII word character. Equivalent to [a-zA-Z0-9_].
    \W Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_].
    \s Any Unicode whitespace character.
    \S Any character that is not Unicode whitespace.
    . Any character except newline or another Unicode line terminator.
    Examples:
    <script type="text/javascript">
    var myarr =["Roast Beef","Tuna salad","Turkey Breast"];
    for (var menu in myarr){
    // replace the first whitespace character in each menu item with an underscore
      document.write(myarr[menu].replace(/\s/,"_")+"<br>");
    }
    </script>
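
The class shorthands from the table can be tried out with a small helper. This is a sketch only: describeCharacter is a hypothetical function, and it assumes it is given a string of exactly one character.

```javascript
// Hypothetical helper: reports which class a single character falls into.
// The order matters, because every digit is also a word character.
function describeCharacter(ch) {
  if (/\d/.test(ch)) return "digit";
  if (/\w/.test(ch)) return "word character";
  if (/\s/.test(ch)) return "whitespace";
  return "other";
}

console.log(describeCharacter("7"));   // digit
console.log(describeCharacter("_"));   // word character
console.log(describeCharacter("\t"));  // whitespace
console.log(describeCharacter("-"));   // other
```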

Regular expression repetition.

  • You can specify how many times an element of a regular expression may be repeated.
  • The characters that indicate repetition always follow the item that is to be repeated.
  • Characters that are used to indicate repetition:
    Character Matches
    {n} Will match exactly n occurrences of the previous item.
    {n,} Will match the previous item n times or more.
    {n,m} Will match the previous item at least n times but no more than m times.
    * Will match zero or more occurrences of the previous item.
    It is a short notation for {0,}.
    + Will match one or more occurrences of the previous item.
    It is a short notation for {1,}.
    ? Will match zero or one occurrence of the previous item.
    It is a short notation for {0,1}.
    Examples:
    <script type="text/javascript">
      var strarr =["Roast Beef $3.80","Tuna salad $.90","Turkey Breast $2.65"];
      for (var menu in strarr){
        // match prices of the form $.dd, $d.dd, $dd.dd, $ddd.dd and $dddd.dd
        document.write(strarr[menu].replace(/\$\d{0,4}\.\d{2}/,
                                    "{need new price}")+"<br>");
      }
    </script>
    <script type="text/javascript">
      var str ="The Roast Beef tasted ........ !";
      // match for 2 or more dots followed by a space
      document.write(str.replace(/\.{2,}\s/,"very good"));
    </script>
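
The short notations and their brace forms can be compared directly. The sketch below uses an assumed sample string and takes element 0 of the array returned by match(), which holds the matched text.

```javascript
// Sample text (an assumption for illustration).
var sample = "Order 2345 costs $12.50";

// + and {1,} are two ways to write "one or more".
var plusForm = sample.match(/\d+/)[0];
var braceForm = sample.match(/\d{1,}/)[0];

// {2,3} takes as many digits as it can, but never more than three.
var twoToThree = sample.match(/\d{2,3}/)[0];

// * allows zero or more digits in front of the decimal point.
var price = sample.match(/\$\d*\.\d{2}/)[0];

// ? makes the preceding character optional.
var bothSpellings = /colou?r/.test("color") && /colou?r/.test("colour");

console.log(plusForm, braceForm);  // 2345 2345
console.log(twoToThree);           // 234
console.log(price);                // $12.50
console.log(bothSpellings);        // true
```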

Alternation, Grouping, and References.

  • The | character is used to separate alternatives in the regular expression.
    Example:
    <script type="text/javascript">
      var strarr =["Roast Beef ....","Tuna salad $$","Turkey Breast ---"];
      for (var menu in strarr){
        // match one or more dots,
        // one or more $
        // or one or more -
        document.write(strarr[menu].replace(/\.+|\$+|-+/,
                                    "{need new price}")+"<br>");
      }
    </script>
  • Parentheses () can be used to group elements of an expression into sub-expressions, so that each sub-expression can be treated as a single unit.
    Example:
    <script type="text/javascript">
      var str =["Registration number is HG....","Registration number is AC2345",
                "Registration number is KU234523"]
       for( var word in str) {
         // match two characters in the range A-Z, followed by
         // one or more dots
         // or one or more digits
         document.write(str[word].replace(/[A-Z]{2}(\.+|\d+)/,"secret")+ "<br>");
       }
    </script>
  • Sub-expressions that you put in parentheses are given a reference number that you can use later in the same regular expression. You refer to a preceding sub-expression using the notation \n, where n is the reference number.

    Such a reference enforces the constraint that separate portions of a string contain exactly the same characters.

    Example:
    <script type="text/javascript">
      var str =["Registration number is 'HG3462'","Registration number is 'AC2345\"",
       "Registration number is \"KU234523'","Registration number is \"BXY523\""];
       // First without any reference
       document.write("First without any reference: <br>");
       for( var word in str) {
         document.write(str[word].replace(/['"][^'"]*['"]/,"secret")+ "<br>");
       }
       // Using reference
       document.write("Using reference: <br>");
       for( var word in str) {
         document.write(str[word].replace(/(['"])[^'"]*\1/,"secret")+ "<br>");
       }
    </script>

    A reference does not work within a character class (ex.: [^\1]).

    It is possible to group items in a regular expression without creating a numbered reference to those items. To do this, begin the group with (?: and end it with ).

  • Here is a summary of the characters used for alternation, grouping, and references:
    Character Matches
    | Will match either the subexpression to the left or the subexpression to the right. The character can be used several times in the same regular expression to separate more than two alternatives.
    (...) Group items into a single unit that can be used with *, +, ?, |, and so on. The group is given a reference number, and the characters it matched can be referred to later in the same expression.
    \N Match the same characters that were matched when group number N was first matched. Group numbers are assigned by counting left parentheses from left to right.
    (?:...) Group items into a single unit, but without a reference number, so the group cannot be referred to later.
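
The difference between numbered and unnumbered groups shows up in the array returned by match(): element 0 is the whole match, and each numbered group adds one further element. The sketch below uses assumed sample strings in the style of the earlier examples.

```javascript
// Sample text (an assumption for illustration).
var plate = "Registration number is AC2345";

// Two numbered groups: letters are group 1, digits are group 2.
var capturing = plate.match(/([A-Z]{2})(\d+)/);

// The letters are grouped with (?:...), so the digits become group 1.
var nonCapturing = plate.match(/(?:[A-Z]{2})(\d+)/);

console.log(capturing[0], capturing[1], capturing[2]);  // AC2345 AC 2345
console.log(nonCapturing[0], nonCapturing[1]);          // AC2345 2345

// \1 requires the closing quote to be the same character as the opening one.
var sameQuotes = /(['"])[^'"]*\1/.test("'HG3462'");
var mixedQuotes = /(['"])[^'"]*\1/.test("'AC2345\"");

console.log(sameQuotes, mixedQuotes);  // true false
```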

To specify regular-expression anchors.

  • The control character ^ binds the pattern to the beginning of the string, and the control character $ binds the pattern to the end of the string (remember that the caret (^) is also used to negate character classes).

    Example:
    <script type="text/javascript">
      var str =["Beginning of a string","Start of a string",
                "3. String in array"]
       for( var word in str) {
         // Match only in strings that start with B or S
         document.write(str[word].replace(/^[BS]/,"#")+ "<br>");
       }
       for( var word in str) {
         // Match only in strings that end with y
         document.write(str[word].replace(/[y]$/,"#")+ "<br>");
       }
    </script>
  • Sometimes we want to match a word without having to include the characters in front of or after the word (e.g. a space). In such cases, you can specify word boundaries with the \b control character.

    Example:
    <script type="text/javascript">
      var str =["Java is not Javascript","Javascript is untyped",
                "Java is typed"]
       for( var word in str) {
         // Match only the word 'typed' (not 'untyped')
         document.write(str[word].replace(/\btyped\b/,"______")+ "<br>");
       }
    </script>
  • The control character \B has the opposite meaning and matches a position that is not a word boundary.

    Example:
    <script type="text/javascript">
      var str =["Java is not Javascript","Javascript is untyped",
                "Java is typed"]
       for( var word in str) {
         // Match 'typed' at the end of any word, including 'untyped'
         document.write(str[word].replace(/(\B|\b)typed\b/,"______")+ "<br>");
       }
    </script>
  • You can specify a special expression (?=p), where p is a pattern that the characters following the match are required to match. Those characters do not become part of the match.

    Example:
    <script type="text/javascript">
      var str =["Java is not Javascript","Javascript is untyped",
                "Java is typed"]
       for( var word in str) {
       // Match the first 'Java' where the following character is 's'
         document.write(str[word].replace(/Java(?=s)/,"____")+ "<br>");
       }
    </script>
  • You can specify a special expression (?!p), where p is a pattern that the characters following the match are required not to match.

    Example:
    <script type="text/javascript">
      var str =["Java is not Javascript","Javascript is untyped",
                "Java is typed"]
       for( var word in str) {
       // Match the first 'Java' where the following character is NOT 's'
         document.write(str[word].replace(/Java(?!s)/,"____")+ "<br>");
       }
    </script>
  • Summary of regular-expression anchors:
    Character Matches
    ^ Will match the beginning of a string, or the beginning of each line in multiline searches.
    $ Will match the end of a string, or the end of each line in multiline searches.
    \b Will match a word boundary.
    \B Will match a position that is not a word boundary.
    (?=p) A match requires that the following characters match the pattern p, but does not include those characters in the match.
    (?!p) A match requires that the following characters do not match the pattern p.
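
One common use of ^ and $ together is to require that the whole string fits a pattern rather than just some part of it. The sketch below assumes a hypothetical helper, isPrice, built on the price pattern from the repetition examples, and also shows that a lookahead leaves the tested characters out of the match.

```javascript
// Hypothetical validator: true only if the entire string is a price
// such as $3.80 or $.90, with nothing before or after it.
function isPrice(text) {
  return /^\$\d{0,4}\.\d{2}$/.test(text);
}

console.log(isPrice("$3.80"));             // true
console.log(isPrice("Roast Beef $3.80"));  // false

// The lookahead checks for the 's' but does not include it in the match.
var lookaheadMatch = "Javascript".match(/Java(?=s)/)[0];
console.log(lookaheadMatch);  // Java

// \b needs a word boundary in front of 'typed', \B needs there to be none.
var boundary = /\btyped\b/.test("untyped");
var noBoundary = /\Btyped\b/.test("untyped");
console.log(boundary, noBoundary);  // false true
```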

To specify flags for regular-expression.

  • Unlike the rest of regular-expression syntax, flags are specified outside the pattern, after the second slash, and define high-level pattern-matching rules.
  • To do a case-insensitive search you must follow the second slash with 'i' (ex.: /p/i).

    Example:
    <script type="text/javascript">
      var str =["We are learning javascript","I like both java and javascript",
                "Java is not the same as Javascript"]
       for( var word in str) {
       // Match the first 'java' at the start of a word, ignoring case
         document.write(str[word].replace(/\bjava(\b|\B)/i,"[JAVA]")+ "<br>");
       }
    </script>
  • To achieve multiple matches within the searched string you must follow the second slash with 'g' (ex.: /p/g). This is called a global search.
  • You can of course combine this with a case-insensitive search (ex.: /p/gi).

    Example:
    <script type="text/javascript">
      var str =["We are learning javascript","I like both java and javascript",
                "Java is not the same as Javascript"]
       for( var word in str) {
       // Match every 'java' at the start of a word, ignoring case
         document.write(str[word].replace(/\bjava(\b|\B)/gi,"[JAVA]")+ "<br>");
       }
    </script>
  • Summary of these regular-expression flags:
    Character Matches
    i Perform a case-insensitive matching search.
    g Perform a global matching search, which is to find all matches rather than stopping after the first match.
    m Perform a multiline mode search. ^ matches beginning of line or beginning of string, and $ matches end of line or end of string.
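
The m flag is the only one in the table without an example, so here is a minimal sketch. The multi-line sample string is an assumption for illustration.

```javascript
// Sample text with three lines (an assumption for illustration).
var menuText = "Roast Beef\nTuna salad\nTurkey Breast";

// Without m, ^ only matches at the very start of the string,
// which is an "R", so nothing is replaced.
var withoutM = menuText.replace(/^T/g, "#");

// With m, ^ also matches at the start of each line.
var withM = menuText.replace(/^T/gm, "#");

console.log(withoutM === menuText);  // true
console.log(withM);
// Roast Beef
// #una salad
// #urkey Breast
```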

© 2010 by Finnesand Data. All rights reserved.