Regular expressions in Java with regex

A regular expression is a string of characters that describes a pattern in a sequence of characters. You can use the java.util.regex API to:
  • Validate a sequence of characters, for example, check the validity of an email address or a password;
  • Search in a string;
  • Replace a pattern or a set of characters in a string.
The java.util.regex API has a single interface (MatchResult) and three classes:
  • Pattern class: a compiled representation of a regular expression. To create a pattern, you invoke one of its public static compile methods, which returns a Pattern object. These methods take the regular expression as an argument.
  • Matcher class: the search engine that interprets the pattern and runs it against an input string. You get a Matcher object by calling the matcher method on a Pattern object. These two classes work together.
  • PatternSyntaxException class: an unchecked exception thrown when the syntax of the regular expression is invalid.
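
To illustrate how Pattern and PatternSyntaxException fit together, here is a minimal sketch. The class name RegexSyntaxCheck and the helper isValidRegex are illustrative names, not part of the API; only Pattern.compile, getDescription() and getIndex() come from java.util.regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexSyntaxCheck {

    // Hypothetical helper: returns true if the regular expression compiles
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error, getIndex() gives its approximate position
            System.out.println("Invalid regex: " + e.getDescription()
                    + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false (the character class is not closed)
    }
}
```

Because PatternSyntaxException is unchecked, catching it is optional; it is mostly useful when the regular expression comes from user input.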

Matcher class

The methods of the Matcher:

No. Method Description
1 boolean matches() returns true if the entire string matches the pattern
2 boolean find() finds the next subsequence that matches the pattern
3 boolean find(int start) finds the next subsequence that matches the pattern, starting at the index start
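
The difference between these three methods is easiest to see on a short input. The following is a minimal sketch; the class MatcherMethodsSketch and its helper methods are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsSketch {

    // true only if the whole input matches the regex
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // true if the regex matches somewhere in the input
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // index of the first match at or after 'start', or -1 if there is none
    static int findFrom(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(start) ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("a", "ab"));  // false ("ab" is longer than the pattern)
        System.out.println(findsAnywhere("a", "ab")); // true ("a" occurs at index 0)
        System.out.println(findFrom("a", "abca", 1)); // 3 (the search skips index 0)
    }
}
```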

Pattern class

This is the compiled version of a regular expression. It defines the pattern that the regex engine searches for.

No. Method Description
1 static Pattern compile(String regex) compiles the regex and returns an instance of Pattern
2 Matcher matcher(CharSequence input) creates a Matcher that matches the input sequence against this pattern
3 static boolean matches(String regex, CharSequence input) compiles the regex and checks whether the entire input sequence matches it
4 String[] split(CharSequence input) splits the input sequence around the matches of this pattern and returns the resulting substrings
5 String pattern() returns the regular expression string

Example

import java.util.regex.*;

public class RegexTest {

    public static void main(String[] args) {
        Pattern p;
        Matcher m;
        // compile the regex with the pattern "a"
        p = Pattern.compile("a");
        // create the matcher that runs the pattern against the string "ab"
        m = p.matcher("ab");
        // if the pattern is found
        if (m.find()) {
            System.out.println("pattern found");
        }
    }
}
pattern found
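
The other methods of the Pattern class, split, pattern and the static matches, can be sketched in the same way. The class name PatternMethodsSketch, the helper splitOnCommas and the sample strings are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsSketch {

    // Hypothetical helper: splits on commas, ignoring the whitespace around them
    static String[] splitOnCommas(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\s*,\\s*");
        // pattern() gives back the regex string that was compiled
        System.out.println(p.pattern());                                         // \s*,\s*
        // split() cuts the input around every match of the pattern
        System.out.println(Arrays.toString(splitOnCommas("red , green,blue"))); // [red, green, blue]
        // the static matches() compiles and tests the whole input in one call
        System.out.println(Pattern.matches("a.c", "abc"));                       // true
    }
}
```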

Regular expression syntax

1- Meta characters

Metacharacters are characters that have a special meaning in a regular expression: they control how the pattern is constructed. If you precede a metacharacter with the character \, the parser treats it as an ordinary character instead. The metacharacters supported by Java regular expressions are listed in the following table:

Character Description
[ ] Defines a set of characters (a character class)
{ } Quantifier
\ Escapes the next character so that it is not treated as a metacharacter
^ Beginning of line
$ End of line
| OR operator
? The preceding expression occurs zero times or once
* The preceding expression occurs zero or more times
+ The preceding expression occurs one or more times
. Matches any character
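
Escaping a metacharacter with \ can be sketched as follows. The class name EscapeSketch and its helpers are illustrative names; Pattern.quote is a standard method that escapes a whole string at once.

```java
import java.util.regex.Pattern;

class EscapeSketch {

    // true if the whole input matches the regex
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // true if the input is exactly the given text, with every metacharacter taken literally
    static boolean matchesLiteral(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a.c", "abc"));          // true (. matches any character)
        System.out.println(matches("a\\.c", "abc"));        // false (\. matches only a literal dot)
        System.out.println(matches("a\\.c", "a.c"));        // true
        System.out.println(matches("1+1", "1+1"));          // false (+ is a quantifier here)
        System.out.println(matchesLiteral("1+1", "1+1"));   // true (+ is taken literally)
    }
}
```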

Example:

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class MetaCharactersExample {

    public static void main(String[] args) {
        System.out.println(
            Pattern.matches(".c", "abc"));    // false (. stands for only one character)
        System.out.println(
            Pattern.matches("..c", "abc"));   // true (3rd character is c)
        System.out.println(
            Pattern.matches("...c", "abbc")); // true (4th character is c)

        System.out.println(Pattern.matches("\\d", "2"));   // true (only one digit)
        System.out.println(Pattern.matches("\\d", "332")); // false (multiple digits)
        System.out.println(Pattern.matches(
            "\\d", "123abc"));                             // false (digits and letters)

        System.out.println(Pattern.matches(
            "\\D*", "geek")); // true (non-digit characters, zero or more times)
    }
}
The matches() method exists in both the Matcher class and the Pattern class. It returns true only if the entire string matches the pattern; to check whether the pattern occurs somewhere inside the string, use find().

2- Character classes

A character class is a set of characters. The metacharacters [ ] delimit a character class within a regular expression. You can define a range with the hyphen '-'. For example, [0-9] represents the digits from 0 to 9.

[abc] a, b or c
[^abc] Negation: matches any character except a, b and c
[a-zA-Z] Range: matches any character from a to z or from A to Z
[a-d[m-p]] Union: matches characters from a to d or from m to p: [a-dm-p]
[a-z&&[abc]] Intersection: matches the characters common to a-z and abc, that is a, b or c
[a-z&&[^cd]] Subtraction: matches all characters from a to z except c and d: [abe-z]
[a-z&&[^m-p]] Subtraction: matches all characters from a to z except from m to p: [a-lq-z]

Example

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class MetaFeaturesExample {

    public static void main(String[] args) {
        Pattern p;
        Matcher m;
        // all digits from 0 to 9 except 3, 4 and 5
        p = Pattern.compile("[0-9&&[^345]]");
        m = p.matcher("7");
        boolean b = m.matches();
        System.out.println(b);
    }
}
true
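
The other character class operations (negation, union and subtraction) can be checked in the same way. This is a minimal sketch; the class name CharClassSketch and the helper matches are illustrative names.

```java
import java.util.regex.Pattern;

class CharClassSketch {

    // true if the whole input matches the regex
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // negation: any character except a, b and c
        System.out.println(matches("[^abc]", "z"));       // true
        System.out.println(matches("[^abc]", "a"));       // false
        // union: a to d or m to p
        System.out.println(matches("[a-d[m-p]]", "n"));   // true
        System.out.println(matches("[a-d[m-p]]", "e"));   // false
        // subtraction: a to z except c and d
        System.out.println(matches("[a-z&&[^cd]]", "e")); // true
        System.out.println(matches("[a-z&&[^cd]]", "c")); // false
    }
}
```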

Predefined character classes

These are the character classes already defined in Java:

Class Description
. Any character
\d A digit: [0-9]
\D Any character except a digit: [^0-9]
\s A whitespace character (space, tab, line break): [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

Example

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class CharacterClassesExample {
    public static void main(String[] args) {
        // \d: one digit
        // +: one or more times
        String regex = "\\d+";
        Pattern p = Pattern.compile(regex);

        String phrase = "the year 2015 21";

        Matcher m = p.matcher(phrase);

        if (m.find()) {
            // show the first match (group 0)
            System.out.println("group 0:" + m.group(0));
        }
    }
}
group 0:2015
In this example, the regex "\\d+" contains two backslashes because, in a Java string literal, the backslash must itself be escaped with another '\'. \d matches a digit in the range from 0 to 9. If you remove the '+', only the first digit found is matched: 2.
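
The example above stops at the first match. To collect every match, find() can be called in a loop. The following is a minimal sketch; the class name FindAllSketch and the helper findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllSketch {

    // Hypothetical helper: returns every substring matched by the regex, in order
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // each call to find() resumes after the previous match
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "the year 2015 21")); // [2015, 21]
        System.out.println(findAll("\\d", "the year 2015 21"));  // [2, 0, 1, 5, 2, 1]
    }
}
```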

3-  Quantifiers

Quantifiers allow you to set the number of times a character is repeated.

Quantifier Description
X? X occurs once or not at all
X+ X occurs one or more times
X* X occurs zero or more times
X{n} X occurs exactly n times
X{n,} X occurs at least n times
X{y,z} X occurs at least y times but not more than z times

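Examples

In the first three rows the whole string is tested against the pattern. The other rows list the successive matches found in the string, where { } is an empty match (one is also reported at the end of the string).

Pattern String Results
[abc]? a a
[abc]? aa none
[abc]? ab none
a? abdaaaba {a},{ },{ },{a},{a},{a},{ },{a},{ }
a* abdaaaba {a},{ },{ },{aaa},{ },{a},{ }
a+ abdaaaba {a},{aaa},{a}
a{3} aaaa aaa
a{3,6} aaaaaaaa aaaaaa
[0-9]{4} The year 2038 bug is similar to the year 2000 bug {2038},{2000}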
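
To check how a quantifier behaves on a given string, the successive matches can be printed with a small helper. This is a minimal sketch; the class name QuantifierSketch and the helper findAllBraced are illustrative names, and the braces only mimic the notation used in the table.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {

    // Hypothetical helper: lists every match found by find(), each wrapped in braces
    static String findAllBraced(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append("{").append(m.group()).append("}");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(findAllBraced("a+", "abdaaaba"));     // {a},{aaa},{a}
        System.out.println(findAllBraced("a*", "abdaaaba"));     // {a},{},{},{aaa},{},{a},{}
        System.out.println(findAllBraced("a{3,6}", "aaaaaaaa")); // {aaaaaa}
        System.out.println(findAllBraced("[0-9]{4}",
                "The year 2038 bug is similar to the year 2000 bug")); // {2038},{2000}
    }
}
```

Note that a* also reports an empty match at every position where no a is present, including the end of the string, whereas a+ never produces an empty match.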

4- Capture groups

Capture groups give the ability to process multiple characters as a single unit or sub-pattern. For example, (abc) creates a single group containing the characters "a", "b" and "c".

Capture groups are numbered by counting their opening parentheses from left to right. In the expression (A(B(C))), there are 3 capture groups, plus group 0, which always contains the entire expression:
  1. Group 0: (A(B(C)))
  2. Group 1: (A(B(C)))
  3. Group 2: (B(C))
  4. Group 3: (C)
To find out in Java how many groups there are in an expression, invoke the groupCount() method of the Matcher object. groupCount() returns an int that represents the number of capture groups in the pattern; group 0 is not included in this count. In the following example, groupCount() returns 3.

The substring captured by a group is returned by the group(int) method.

Example 1:

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class Group {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("(A(B(C)))");
        Matcher m = p.matcher("ABC");
        if (m.matches()) {
            // group 0 is the whole match; groups 1 to groupCount() are the capture groups
            for (int i = 0; i <= m.groupCount(); ++i) {
                System.out.println("group " + i + ":" + m.group(i));
            }
        }
    }
}
group 0:ABC
group 1:ABC
group 2:BC
group 3:C
Example 2

This program creates a regular expression that reads and checks the validity of a phone number:

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class RegexTelephone {

    public static void main(String[] args) {

        // four groups of 3, 4, 3 and 3 digits, delimited by word boundaries
        String regex = "\\b(\\d{3})(\\d{4})(\\d{3})(\\d{3})\\b";
        Pattern p = Pattern.compile(regex);
        String tel = "2541724156348";

        Matcher m = p.matcher(tel);

        if (m.find()) {
            System.out.println("Phone: ("
                + m.group(1) + ") " + m.group(2) + "-"
                + m.group(3) + "-" + m.group(4));
        }
    }
}
Phone: (254) 1724-156-348

5- Search boundaries

You can make your pattern more precise by specifying where in the input the match must start or end.

Boundary Description
^ Beginning of line
$ End of line
\b Word boundary
\B Non-word boundary
\A Beginning of the input sequence
\G End of the previous match
\Z End of the input sequence, except for the final line terminator, if any
\z End of the input sequence

Examples

Pattern String Results
^java$ java java
\s*java$ java java
^hello\w* helloblahblah helloblahblah
\bjava\B javascript is a programming language java
\Gtest test test test
\btest\b this is a test test
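
The effect of these boundaries can be verified by counting the matches that find() reports. This is a minimal sketch; the class name BoundarySketch, the helper countMatches and the extra sample strings are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {

    // Hypothetical helper: number of successive matches that find() reports
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \b...\b: "test" as a whole word
        System.out.println(countMatches("\\btest\\b", "this is a test")); // 1
        System.out.println(countMatches("\\btest\\b", "testing"));        // 0
        // \bjava\B: "java" at the start of a longer word
        System.out.println(countMatches("\\bjava\\B",
                "javascript is a programming language"));                 // 1
        // \G: each match must start where the previous one ended
        System.out.println(countMatches("\\Gtest", "testtest"));          // 2
        System.out.println(countMatches("\\Gtest", "test test"));         // 1
        // ^ and $: the whole line must be "java"
        System.out.println(countMatches("^java$", "java"));               // 1
    }
}
```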

Find a pattern and replace it

The regex API gives you the ability to find a piece of text and replace it with another. In Java, you can use two methods of the Matcher class to accomplish this task:
  • replaceFirst(String): Replaces the first occurrence only;
  • replaceAll(String): Iterates and replaces all occurrences.
Example:

import java.util.regex.Matcher; 
import java.util.regex.Pattern;

public class Replacement {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("bus");
        Matcher m = p.matcher("I'm traveling by bus");
        // replace every occurrence of "bus" with "train"
        String s = m.replaceAll("train");
        System.out.println(s);
    }
}
I'm traveling by train
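
The difference between replaceFirst and replaceAll can be sketched as follows. The class name ReplaceSketch, its helpers and the sample strings are illustrative. The last call additionally assumes you want to reuse captured text: in a replacement string, $1 and $2 refer to capture groups 1 and 2.

```java
import java.util.regex.Pattern;

class ReplaceSketch {

    // replaces only the first match of the regex
    static String replaceFirst(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // replaces every match of the regex
    static String replaceAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceFirst("bus", "bus after bus", "train")); // train after bus
        System.out.println(replaceAll("bus", "bus after bus", "train"));   // train after train
        // $1 and $2 insert the text captured by groups 1 and 2
        System.out.println(replaceAll("(\\d{3})(\\d{4})", "call 5551234", "$1-$2")); // call 555-1234
    }
}
```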