SCJP – Regex

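  • The source is scanned from left to right.
  • Once characters have been matched they are consumed and can't be reused by a later match.
  • "\d" is a compile error because \d isn't a valid Java string escape. Escape the backslash like this: "\\d". The regex engine then sees \d, the digit metacharacter.
  • "\." is a compile error too. Escape it like this: "\\.". The regex engine then sees \., which is a literal dot, not the "any character" metacharacter.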
  • 0[xX]([0-9a-fA-F])+    -- The parentheses and "+" augment the previous find-the-hex expression by saying in effect: "Once we've found our 0x or 0X, you can find from one to many occurrences of hex digits."
  • [AB] means an A or a B
    • [\\s\\d] (as written in a Java string) means a whitespace character or a digit
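
The rules above can be tried out with a small sketch. The class name RegexBasicsSketch, the helper findAll, and the sample strings are made up for illustration and are not part of the original notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class RegexBasicsSketch {

	// Collects "start-match" for every match, scanning the source left to right.
	static List<String> findAll(String regex, String source) {
		List<String> found = new ArrayList<>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			found.add(m.start() + "-" + m.group());
		}
		return found;
	}

	public static void main(String[] args) {
		// "\\d" in Java source reaches the regex engine as \d (a digit).
		System.out.println(findAll("\\d", "a1b22"));      // [1-1, 3-2, 4-2]
		// "\\." reaches the regex engine as \. (a literal dot).
		System.out.println(findAll("\\.", "a.b.c"));      // [1-., 3-.]
		// 0x or 0X followed by one or more hex digits.
		System.out.println(findAll("0[xX]([0-9a-fA-F])+", "12 0x4A 0X1g")); // [3-0x4A, 8-0X1]
		// A character class matches exactly one of the listed characters.
		System.out.println(findAll("[AB]", "xAyBz"));     // [1-A, 3-B]
		// Consumed characters are not reused: the overlapping "aba" at index 2 is skipped.
		System.out.println(findAll("aba", "abababa"));    // [0-aba, 4-aba]
	}
}
```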

Greedy vs Reluctant quantifiers

Greedy   Reluctant
*        *?
?        ??
+        +?

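A greedy quantifier first grabs as much of the source data as it can, then gives characters back one at a time (working right to left) until the rest of the expression matches, so it finds the longest possible match. A reluctant quantifier starts with as little as possible and grows left to right only as far as needed, so it finds the shortest match from that starting point.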

	void testSimpleGreedy() {
		System.out.println("======= testSimpleGreedy ====");
		Pattern p = Pattern.compile(".*xx");
		Matcher m = p.matcher("yyxxxyxx");
		while (m.find()) {
			System.out.println(m.start() + "-" + m.group());
		}
	}

	void testSimpleReluctant() {
		System.out.println("======= testSimpleReluctant ====");
		Pattern p = Pattern.compile(".*?xx");
		Matcher m = p.matcher("yyxxxyxx");
		while (m.find()) {
			System.out.println(m.start() + "-" + m.group());
		}
	}

Output

======= testSimpleGreedy ====
0-yyxxxyxx
======= testSimpleReluctant ====
0-yyxx
4-xyxx
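
The tests above only exercise * and *?. The sketch below runs the other two pairs from the table against the same source string. The class name QuantifierPairsSketch and the helper findAll are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class QuantifierPairsSketch {

	// Collects "start-match" for every match found in the source.
	static List<String> findAll(String regex, String source) {
		List<String> found = new ArrayList<>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			found.add(m.start() + "-" + m.group());
		}
		return found;
	}

	public static void main(String[] args) {
		String source = "yyxxxyxx";
		// Greedy +: takes every consecutive x it can.
		System.out.println(findAll("x+", source));   // [2-xxx, 6-xx]
		// Reluctant +?: stops after the single x that is required.
		System.out.println(findAll("x+?", source));  // [2-x, 3-x, 4-x, 6-x, 7-x]
		// Greedy ?: takes the optional x when it is there.
		System.out.println(findAll("yx?", source));  // [0-y, 1-yx, 5-yx]
		// Reluctant ??: skips the optional x because nothing after it requires one.
		System.out.println(findAll("yx??", source)); // [0-y, 1-y, 5-y]
	}
}
```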

Match empty string

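Because '*' also allows zero occurrences, a pattern such as \w* can match the empty string. After the whole source has been consumed, find() succeeds one more time at the very end with a zero-length match!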

	void testSimple2() {
		System.out.println("======= testSimple2 ====");
		Pattern p = Pattern.compile("\\w*");
		Matcher m = p.matcher("babaa2aaab");
		while (m.find()) {
			System.out.println(m.start() + "-" + m.group());
		}
	}

Output

======= testSimple2 ====
0-babaa2aaab
10-
  • group() throws an IllegalStateException if no match has been attempted yet or if the last find() didn't find anything
    • so m.group() is usually called only when m.find() has returned true, as in the example above
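
A minimal sketch of that guard, and of what happens without it. The class name GroupGuardSketch and both helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class GroupGuardSketch {

	// Safe usage: group() is called only after find() has returned true.
	static String firstMatch(String regex, String source) {
		Matcher m = Pattern.compile(regex).matcher(source);
		return m.find() ? m.group() : null;
	}

	// Reports whether group() throws when it is called without checking find().
	static boolean groupThrowsUnguarded(String regex, String source) {
		Matcher m = Pattern.compile(regex).matcher(source);
		m.find(); // result ignored on purpose
		try {
			m.group();
			return false;
		} catch (IllegalStateException e) {
			return true;
		}
	}

	public static void main(String[] args) {
		System.out.println(firstMatch("\\d+", "abc123"));            // 123
		System.out.println(firstMatch("\\d+", "abc"));               // null
		System.out.println(groupThrowsUnguarded("\\d+", "abc"));     // true
		System.out.println(groupThrowsUnguarded("\\d+", "abc123"));  // false
	}
}
```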

Metacharacters

  • \d A digit
  • \s A whitespace character
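  • \w A word character (letters, digits, or "_" (underscore))
  • "\s" is a compile error in the Java versions the SCJP covers, because \s isn't a valid Java string escape there. Escape the backslash: "\\s" is the correct way to write the whitespace metacharacter. (Java 15 and later accept "\s" in a string literal, but it is then just a plain space character, not the regex metacharacter.)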
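
To see the three metacharacters side by side, here is a small sketch. The class name MetacharSketch, the helper findAll, and the sample string are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class MetacharSketch {

	// Collects "start-match" for every match found in the source.
	static List<String> findAll(String regex, String source) {
		List<String> found = new ArrayList<>();
		Matcher m = Pattern.compile(regex).matcher(source);
		while (m.find()) {
			found.add(m.start() + "-" + m.group());
		}
		return found;
	}

	public static void main(String[] args) {
		String source = "a 1_b";
		// \d matches only the digit at index 2.
		System.out.println(findAll("\\d", source));   // [2-1]
		// \s matches only the space at index 1.
		System.out.println(findAll("\\s", source));   // [1- ]
		// \w matches letters, digits and the underscore, one character at a time.
		System.out.println(findAll("\\w", source));   // [0-a, 2-1, 3-_, 4-b]
		// \w+ groups consecutive word characters into one match.
		System.out.println(findAll("\\w+", source));  // [0-a, 2-1_b]
	}
}
```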
