Regular Expressions for Beginners — Common Patterns and Syntax

A regular expression (regex) is a small language for describing patterns in text with a short expression. It is widely used wherever you handle text: searching, extracting, replacing, and validating input. This article organizes the basic building blocks — character classes, metacharacters, quantifiers, groups, anchors, and flags — and then covers simple patterns for email, phone, URL and date, along with pitfalls such as ReDoS, all based on JavaScript syntax.

To start: with regex, being readable and correctly scoped matters more than being short. Rather than cramming a complex expression into one line, it is safer to break it up sensibly for the task and always verify it against real examples as you build. All code examples in this article use JavaScript syntax (/pattern/flags).

1. What a regular expression is

A regular expression represents a condition on text — such as "three digits in a row" or "something that looks like an email containing @" — as an expression built from symbols. In JavaScript you create one with a /pattern/flags literal, or with new RegExp("pattern", "flags").

For example, to simply look for the sequence "cat", you write /cat/. A part made of plain characters matches those characters themselves. The main uses are as follows.

Searching / testing: /cat/.test("category") returns true, because "category" contains "cat".
Extracting: "a1b2".match(/\d/g) pulls out the matches as ["1","2"].
Replacing: "a-b-c".replace(/-/g, "_") yields "a_b_c".
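The three uses above can be tried directly in a browser console or Node.js. The following is a minimal sketch; the variable names are illustrative only.

```javascript
// Two equivalent ways to build the same pattern.
const literal = /cat/;
const constructed = new RegExp("cat");

// Searching / testing: is the pattern contained in the text?
const found = literal.test("category");        // true
const foundToo = constructed.test("category"); // true

// Extracting: with the g flag, match() returns every match.
const digits = "a1b2".match(/\d/g);            // ["1", "2"]

// Replacing: with the g flag, every hyphen is replaced.
const replaced = "a-b-c".replace(/-/g, "_");   // "a_b_c"

console.log(found, foundToo, digits, replaced);
```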

From here, we look in turn at the meaning of the symbols (metacharacters) that make up a pattern. When you want to strip a symbol's special meaning and treat it as "just a character", place a \ (backslash) before it to escape it. For instance, to represent a literal dot, write \. (a backslash followed by the dot).
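As a small illustration of escaping, the sketch below compares an unescaped dot with an escaped one. The file name is a made-up example, and the anchors ^ and $ (explained in section 4) are used only to make the test cover the whole string.

```javascript
// Unescaped: "." matches any single character (except a newline).
const loose = /^file.txt$/;
// Escaped: "\." matches only a literal dot.
const strict = /^file\.txt$/;

const looseHit = loose.test("fileXtxt");    // true, "." accepted "X"
const strictHit = strict.test("fileXtxt");  // false
const strictDot = strict.test("file.txt");  // true

// In a string passed to new RegExp, the backslash itself must be doubled.
const fromString = new RegExp("^file\\.txt$");
const fromStringHit = fromString.test("fileXtxt"); // false

console.log(looseHit, strictHit, strictDot, fromStringHit);
```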

2. Basic building blocks — character classes, metacharacters, quantifiers

The foundation of regex is the symbols that express "which characters" to match and "how many". Let us go through them in order.

Character classes [ ] and \d \w \s

A character class lists candidates inside [ ] and matches any one of them. Use - for a range and a leading ^ for negation.

[abc] … any one of a, b, or c.
[a-z] … one lowercase letter (range specification).
[^0-9] … one non-digit character (the leading ^ negates).

Commonly used character classes have shorthand forms. Uppercasing them gives "the negation".

Notation	Meaning	Negation
`\d`	One digit (`0`-`9` in JavaScript)	`\D` (non-digit)
`\w`	One word character (`a-z A-Z 0-9 _`)	`\W` (non-word character)
`\s`	One whitespace character (space, tab, newline, etc.)	`\S` (non-whitespace)
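The classes and shorthands above can be compared side by side. This is a minimal sketch with made-up sample strings.

```javascript
const text = "Room 42_b\tis open";

// Explicit classes
const lower = "aBc".match(/[a-z]/g);         // ["a", "c"]
const nonDigits = "a1b2".match(/[^0-9]/g);   // ["a", "b"]

// Shorthand classes
const digits = text.match(/\d/g);            // ["4", "2"]
const wordChars = "a_1-!".match(/\w/g);      // ["a", "_", "1"]
const spaces = text.match(/\s/g);            // [" ", "\t", " "]

// Uppercase shorthand is the negation.
const nonWord = "a_1-!".match(/\W/g);        // ["-", "!"]

console.log(lower, nonDigits, digits, wordChars, spaces, nonWord);
```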

Metacharacters . ^ $

Metacharacters are symbols with special meaning. Here are the representative ones.

. … any single character (excluding newline by default). To represent a literal ., escape it as \..
^ … an anchor for the start of the string, or of each line with the m flag (see section 4).
$ … an anchor for the end of the string, or of each line with the m flag (see section 4).
| … "or". cat|dog matches either cat or dog.
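A short sketch of the dot and the "or" symbol follows. The sample words are arbitrary. The last two lines illustrate one point not spelled out above: | splits the whole pattern, so parentheses (covered in section 3) are needed to limit its reach.

```javascript
// "." stands for any one character except a newline.
const dotHit = /c.t/.test("cut");     // true
const dotMiss = /c.t/.test("c\nt");   // false, "." does not match the newline

// "|" means "or"; the search returns the leftmost match in the text.
const pet = "hotdog and cat".match(/cat|dog/)[0]; // "dog"

// "|" applies to everything on each side of it unless parentheses limit it.
const limited = /gr(a|e)y/.test("grey");    // true: "gr", then a or e, then "y"
const unlimited = /gra|ey/.test("hockey");  // true: the "ey" branch matches alone

console.log(dotHit, dotMiss, pet, limited, unlimited);
```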

Quantifiers * + ? {n,m}

Quantifiers express how many times the preceding element repeats.

Notation	Meaning	Example
`*`	0 or more times	`ab*c` matches `ac`, `abc`, `abbc` …
`+`	1 or more times	`ab+c` matches `abc`, `abbc` … (not `ac`)
`?`	0 or 1 time	`colou?r` matches `color` and `colour`
`{n}`	Exactly n times	`\d{4}` is four digits
`{n,m}`	At least n, at most m times	`\d{2,4}` is two to four digits
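The rows of the table can be checked with a few test strings. In this sketch the patterns are wrapped in ^ and $ (see section 4) so that each test covers the whole string; the sample inputs are illustrative.

```javascript
const samples = ["ac", "abc", "abbc"];

// * allows zero b's, + requires at least one.
const star = samples.map((s) => /^ab*c$/.test(s));  // [true, true, true]
const plus = samples.map((s) => /^ab+c$/.test(s));  // [false, true, true]

// ? makes the "u" optional, but allows it at most once.
const optional = ["color", "colour", "colouur"].map((s) => /^colou?r$/.test(s));
// [true, true, false]

// {n} and {n,m} fix the repetition count.
const exact = /^\d{4}$/.test("2026");               // true
const range = ["1", "12", "1234", "12345"].map((s) => /^\d{2,4}$/.test(s));
// [false, true, true, false]

console.log(star, plus, optional, exact, range);
```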

Greedy and lazy matching

Quantifiers are greedy by default and match as much as possible while still satisfying the pattern. Adding ? right after them makes them lazy, matching as little as possible.

When the target string is <a><b>, the greedy <.+> grabs everything from the first < to the last >, that is, the whole <a><b>.
The lazy <.+?> stops at the first > and matches only <a>.

When "more is captured than you expected", greedy matching is a common cause. Add a ? to make the quantifier lazy (*?, +?), or use a character class that excludes the delimiter, such as [^>]+, to scope the match as intended.
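The <a><b> example can be reproduced as follows. This is a minimal sketch; the variable names are illustrative.

```javascript
const html = "<a><b>";

// Greedy: .+ runs to the last ">" it can reach.
const greedy = html.match(/<.+>/)[0];     // "<a><b>"

// Lazy: .+? stops at the first ">".
const lazy = html.match(/<.+?>/)[0];      // "<a>"

// Negated class: [^>]+ can never cross a ">", so the scope is explicit.
const scoped = html.match(/<[^>]+>/)[0];  // "<a>"
const allTags = html.match(/<[^>]+>/g);   // ["<a>", "<b>"]

console.log(greedy, lazy, scoped, allTags);
```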

3. Groups and capturing — ( ), backreferences, named

Parentheses ( ) bundle several elements into one unit. You can apply a quantifier to the whole group, and groups also capture (extract) the matched part.

Grouping: (ab)+ is one or more repetitions of ab (ab, abab …).
Capturing: (\d{4})-(\d{2}) captures the year in the first group and the month in the second. You can read them from the match result array or receive them as arguments in a replace callback.
Non-capturing: when you only need grouping and not extraction, write (?:...). (?:ab)+ bundles the repetition without capturing.
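The three roles of parentheses can be sketched like this. The sample strings and variable names are assumptions made for illustration.

```javascript
// Grouping: the quantifier applies to the whole group.
const repeated = /^(ab)+$/.test("ababab");          // true

// Capturing: index 0 is the full match, 1 and 2 are the groups.
const m = "2026-05".match(/(\d{4})-(\d{2})/);
const [full, year, month] = m;                      // "2026-05", "2026", "05"

// Non-capturing: (?:...) groups without taking a number,
// so the digit is group 1 here.
const nc = "abab-7".match(/(?:ab)+-(\d)/);
const digit = nc[1];                                // "7"

// Callback: captures arrive as arguments after the full match.
const swapped = "2026-05".replace(
  /(\d{4})-(\d{2})/,
  (match, y, mo) => `${mo}/${y}`
);                                                  // "05/2026"

console.log(repeated, full, year, month, digit, swapped);
```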

Backreferences

Captured content can be backreferenced within the same pattern as \1, \2 … (numbered by the order the parentheses appear). It is used to express a repetition of a string that appeared earlier.

(\w)\1 … the same word character twice in a row (such as oo).
Captures can also be used in replacements: in the replacement string of JavaScript's replace, write $1, $2 … to insert them.
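A brief sketch of both forms follows: \1 inside the pattern, and $1, $2 in a replacement string. The sample text is made up.

```javascript
// \1 must repeat exactly what group 1 captured.
const doubled = "book keeper".match(/(\w)\1/g);    // ["oo", "ee"]

// A whole word followed by a space and the same word again.
const repeatedWord = /(\w+) \1/.test("the the cat"); // true

// $1 and $2 in the replacement string refer to the captures.
const swapped = "Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1"); // "Jane Doe"

console.log(doubled, repeatedWord, swapped);
```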

Named captures

Giving a name instead of a number improves readability. Capture with (?<name>...) and retrieve it by name from groups in the match result. A backreference within the same pattern is \k<name>, and in the replacement string it is $<name>.

Example: (?<year>\d{4})-(?<month>\d{2}) lets you retrieve the year and month via groups.year and groups.month.
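The named forms can be sketched as below, covering groups, $<name>, and \k<name>. The group name ch in the last pattern is an illustrative choice.

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})/;

// Retrieve captures by name from groups.
const m = "2026-05".match(re);
const { year, month } = m.groups;                   // "2026", "05"

// $<name> in a replacement string.
const reordered = "2026-05".replace(re, "$<month>/$<year>"); // "05/2026"

// \k<name> as a backreference inside the pattern.
const sameTwice = /(?<ch>\w)\k<ch>/.test("moon");   // true ("oo")

console.log(year, month, reordered, sameTwice);
```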

4. Anchors and boundaries — ^ $ \b

Anchors match a "position" rather than a character itself. They are zero-width (they consume no characters).

^ … the start position. With the m flag it becomes the start of each line.
$ … the end position. With the m flag it becomes the end of each line.
\b … a word boundary. It matches the position between a word character (\w) and a non-word character (\W), and also the start or end of the string when the adjacent character is a word character.
\B … a position that is not a word boundary.

For example, ^\d+$ expresses the condition "the whole thing is only digits" (one or more digits from start to end). Without anchors it merely means "contains a digit somewhere", which is insufficient for input validation.

When you want to search by whole words, \b is handy. \bcat\b matches the word cat but not part of category. Conversely, plain /cat/ also matches inside category — keep that in mind.
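The effect of anchors and boundaries can be seen by running the same text with and without them. This is a minimal sketch with arbitrary sample strings.

```javascript
// Without anchors: "contains a digit somewhere".
const containsDigit = /\d+/.test("abc123");      // true
// With anchors: "the whole thing is only digits".
const onlyDigits = /^\d+$/.test("abc123");       // false
const onlyDigitsOk = /^\d+$/.test("123");        // true

// \b limits the match to whole words.
const plain = /cat/.test("category");            // true
const wholeWord = /\bcat\b/.test("category");    // false
const wholeWordOk = /\bcat\b/.test("a cat sat"); // true

// \B is the opposite: "cat" surrounded by word characters.
const inside = /\Bcat\B/.test("concatenate");    // true

// With the m flag, ^ matches at the start of each line.
const lineStarts = "one\ntwo".match(/^\w+/gm);   // ["one", "two"]

console.log(containsDigit, onlyDigits, onlyDigitsOk, plain, wholeWord, wholeWordOk, inside, lineStarts);
```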

5. Flags — g i m s u

Flags are options that change the behavior of the whole pattern; in a literal you append them after the closing / (e.g. /abc/gi). Here are the representative ones.

Flag	Name	Effect
`g`	global	Does not stop at the first match; targets all matches
`i`	ignoreCase	Does not distinguish upper and lower case
`m`	multiline	Applies `^` and `$` to the start and end of each line
`s`	dotAll	Makes `.` match newlines as well
`u`	unicode	Matches by Unicode code point rather than by UTF-16 code unit (so an emoji counts as one character) and enables the `\u{...}` notation

For example, to "find all cat ignoring case", use /cat/gi. Flags can be combined and their order does not matter.
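Each flag in the table can be observed in isolation. The sketch below uses made-up sample text; the emoji is written as the escape \u{1F600} in a string so the example does not depend on file encoding.

```javascript
const text = "Cat, cat, CAT";

// Without g and i: only the first case-sensitive match is found.
const first = text.match(/cat/);                 // match at index 5
// With g and i: all matches, ignoring case.
const all = text.match(/cat/gi);                 // ["Cat", "cat", "CAT"]

// Flag order does not matter; .flags reports the same set.
const sameFlags = /cat/gi.flags === /cat/ig.flags; // true

// m: ^ and $ apply to each line.
const lastWords = "a b\nc d".match(/\w$/gm);     // ["b", "d"]

// s: "." also matches a newline.
const dotDefault = /a.b/.test("a\nb");           // false
const dotAll = /a.b/s.test("a\nb");              // true

// u: one emoji is one character instead of two UTF-16 code units.
const face = "\u{1F600}";
const oneCharNoU = /^.$/.test(face);             // false
const oneCharU = /^.$/u.test(face);              // true

console.log(first.index, all, sameFlags, lastWords, dotDefault, dotAll, oneCharNoU, oneCharU);
```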

6. Common patterns — email, phone, URL, date

The table below collects simple patterns commonly seen in practice, combining the elements above. None of them are strict spec-compliant validations; they are practical rules of thumb only. When you truly need accurate validation (especially for email), pair them with a dedicated library or an actual delivery check.

Target	Simple pattern (example)	Description
Email (simple)	`^[^\s@]+@[^\s@]+\.[^\s@]+$`	The minimal shape "non-whitespace/non-`@` + `@` + domain + `.` + TLD". Not strictly RFC-compliant
Phone (Japan, hyphen-separated)	`^0\d{1,4}-\d{1,4}-\d{4}$`	A simple form starting with `0`, of digits and hyphens. Digit counts vary by region, so this is only a guide
URL (http/https)	`^https?:\/\/[^\s]+$`	`http` or `https` (`s?`) followed by a non-whitespace string. `/` is escaped as `\/`
Date (YYYY-MM-DD form)	`^\d{4}-\d{2}-\d{2}$`	Checks only the digit-count shape. Logical validity such as "month 13" must be checked separately

A pattern only looks at the "shape". For instance, the date pattern ^\d{4}-\d{2}-\d{2}$ also accepts 2026-13-99. Non-existent dates, or the real existence of an email, are semantic validity outside the scope of regex. Keep shape checking and value validation separate.
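One way to keep shape checking and value validation separate is sketched below. The patterns are the ones from the table; the object patterns and the helper isRealDate are hypothetical names introduced for this example, and the value check relies on the built-in Date rolling invalid dates over to a different day.

```javascript
// Shape-only patterns, copied from the table above.
const patterns = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  phoneJp: /^0\d{1,4}-\d{1,4}-\d{4}$/,
  url: /^https?:\/\/[^\s]+$/,
  date: /^\d{4}-\d{2}-\d{2}$/,
};

// Hypothetical helper: step 1 checks the shape, step 2 checks the value.
function isRealDate(text) {
  if (!patterns.date.test(text)) return false;
  const [y, m, d] = text.split("-").map(Number);
  // An impossible date such as month 13 rolls over, so the parts no longer agree.
  const date = new Date(Date.UTC(y, m - 1, d));
  return (
    date.getUTCFullYear() === y &&
    date.getUTCMonth() === m - 1 &&
    date.getUTCDate() === d
  );
}

console.log(patterns.date.test("2026-13-99")); // true: the shape is fine
console.log(isRealDate("2026-13-99"));         // false: the value is not
console.log(isRealDate("2024-02-29"));         // true: 2024 is a leap year
```

Note that this sketch also rejects years 0000 to 0099, because Date.UTC maps two-digit years to 1900 through 1999; that is acceptable for a rule of thumb but is a limitation to be aware of.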

7. Pitfalls — over-complication and ReDoS

Regex is powerful, but writing too much makes it unreadable and invites performance problems. Finally, here are practical pitfalls.

Avoid over-complication: cramming everything into one expression makes it hard to read and hard to fix later. Splitting the work into multiple steps, adding comments (in languages with an x flag), or delegating to a dedicated parser are all sound choices.
Watch out for ReDoS (regex denial of service): patterns with nested quantifiers such as (a+)+ or (.*)* can cause backtracking to explode on certain inputs, making processing extremely slow. Places that pass external input straight into a complex pattern are especially dangerous.
Mitigations: avoid nested quantifiers, limit the range with a character class such as [^>]+, and cap the input length where possible. Designing so that you never execute untrusted patterns also helps.
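The mitigations above might be combined as in the following sketch. MAX_INPUT_LENGTH and matchesSafely are hypothetical names, and the limit of 256 is an arbitrary assumption; pick a value that fits the field being validated. The risky pattern appears only in a comment and is never executed.

```javascript
// Hypothetical limit on input length (an assumption for this sketch).
const MAX_INPUT_LENGTH = 256;

// Risky shape, shown for comparison only: /^(a+)+$/
// It accepts the same strings as the pattern below,
// but without the nested quantifier there is nothing to backtrack over.
const safeRepeat = /^a+$/;

// A negated class keeps the match from running past the delimiter.
const safeTag = /<[^>]+>/;

// Hypothetical guard: reject non-strings and over-long input before matching.
function matchesSafely(pattern, input) {
  if (typeof input !== "string" || input.length > MAX_INPUT_LENGTH) {
    return false;
  }
  return pattern.test(input);
}

console.log(matchesSafely(safeRepeat, "aaaa"));            // true
console.log(matchesSafely(safeRepeat, "aaab"));            // false
console.log(matchesSafely(safeRepeat, "a".repeat(1000)));  // false: too long
console.log(matchesSafely(safeTag, "x <b> y"));            // true
```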

In short, the basics of regex are "keep it small, keep it readable, and verify with real examples". Master the four building blocks (character classes, quantifiers, anchors, and flags) and you can cover most everyday patterns.


Frequently Asked Questions (FAQ)

What is a regular expression?

A regular expression (regex) is a small language for describing patterns in text. It expresses conditions such as "three digits in a row" or "something that looks like an email containing @" as a short expression, and is used for searching, extracting, replacing, and validation. It is built into many programming languages and editors; in JavaScript you work with it through /pattern/flags literals or the RegExp object.

What is the difference between \d and \w?

\d matches a single digit (0-9 in JavaScript). \w matches a single "word character": the alphanumerics (a-z, A-Z, 0-9) plus the underscore _. In other words, \w includes everything \d matches and additionally covers letters and the underscore, making it the broader character class. Uppercasing either one negates it (\D is any non-digit, \W is any non-word character).

What is greedy matching?

Quantifiers (* + ? {n,m}) are "greedy" by default and try to match as much as possible while still satisfying the pattern. For example, <.+> grabs everything from the first < to the last >. Adding ? right after them, as in *? +? ?? {n,m}?, makes them "lazy" so they match as little as possible. <.+?> stops at the first >.
