A regular expression (regex) is a small language for describing patterns in text with a short expression. It is widely used wherever you handle text: searching, extracting, replacing, and validating input. This article organizes the basic building blocks — character classes, metacharacters, quantifiers, groups, anchors, and flags — and then covers simple patterns for email, phone, URL and date, along with pitfalls such as ReDoS, all based on JavaScript syntax.
All examples use JavaScript syntax (/pattern/flags).
1. What a regular expression is
A regular expression represents a condition on text — such as "three digits in a row" or "something that looks like an email containing @" — as an expression built from symbols. In JavaScript you create one with a /pattern/flags literal, or with new RegExp("pattern", "flags").
For example, to simply look for the sequence "cat", you write /cat/. A part made of plain characters matches those characters themselves. The main uses are as follows.
- Searching / testing: /cat/.test("category") returns true if the pattern is contained.
- Extracting: "a1b2".match(/\d/g) pulls out the matches as ["1","2"].
- Replacing: "a-b-c".replace(/-/g, "_") yields "a_b_c".
From here, we look in turn at the meaning of the symbols (metacharacters) that make up a pattern. When you want to strip a symbol's special meaning and treat it as "just a character", place a \ (backslash) before it to escape it. For instance, to represent a literal ., write \..
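The three uses and the escaping rule above can be sketched as follows. This is a minimal illustration; the variable names are made up for the example.

```javascript
// Two equivalent ways to create the same pattern.
const literal = /cat/;
const constructed = new RegExp("cat");

// Searching / testing: "category" contains "cat".
const found = literal.test("category");         // true
const alsoFound = constructed.test("category"); // true

// Extracting: with the g flag, match returns every match.
const digits = "a1b2".match(/\d/g);             // ["1", "2"]

// Replacing: with the g flag, every hyphen is replaced.
const replaced = "a-b-c".replace(/-/g, "_");    // "a_b_c"

// Escaping: an unescaped . matches any character, \. only a literal dot.
const loose = /a.c/.test("abc");                // true
const strict = /a\.c/.test("abc");              // false
const strictDot = /a\.c/.test("a.c");           // true

console.log(found, alsoFound, digits, replaced, loose, strict, strictDot);
```

When building a pattern from a string with new RegExp, remember that backslashes must be doubled inside the string (for example "\\d" for \d).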
2. Basic building blocks — character classes, metacharacters, quantifiers
The foundation of regex is the symbols that express "which characters" to match and "how many". Let us go through them in order.
Character classes [ ] and \d \w \s
A character class lists candidates inside [ ] and matches any one of them. Use - for a range and a leading ^ for negation.
- [abc] … any one of a, b, or c.
- [a-z] … one lowercase letter (range specification).
- [^0-9] … one non-digit character (the leading ^ negates).
Commonly used character classes have shorthand forms. Uppercasing them gives "the negation".
| Notation | Meaning | Negation |
|---|---|---|
| \d | One digit (essentially 0-9) | \D (non-digit) |
| \w | One word character (a-z A-Z 0-9 _) | \W (non-word character) |
| \s | One whitespace character (space, tab, newline, etc.) | \S (non-whitespace) |
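A small sketch of bracketed character classes and the shorthand forms from the table. The sample string is an arbitrary choice for illustration.

```javascript
// Bracketed classes: candidates, ranges, and negation.
const anyOfAbc = /[abc]/.test("xbz");    // true: contains b
const lowercase = /[a-z]/.test("ABC");   // false: no lowercase letter
const nonDigit = /[^0-9]/.test("123");   // false: every character is a digit

// Shorthand classes and their uppercase negations.
const sample = "id_42 ok";
const digitChars = sample.match(/\d/g);          // ["4", "2"]
const wordChars = sample.match(/\w/g).join("");  // "id_42ok"
const nonWordChars = sample.match(/\W/g);        // [" "]
const spaces = sample.match(/\s/g);              // [" "]

console.log(anyOfAbc, lowercase, nonDigit, digitChars, wordChars, nonWordChars, spaces);
```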
Metacharacters . ^ $
Metacharacters are symbols with special meaning. Here are the representative ones.
- . … any single character (excluding newline by default). To represent a literal ., escape it as \..
- ^ … an anchor for the start of the line (string) (see below).
- $ … an anchor for the end of the line (string) (see below).
- | … "or". cat|dog matches either cat or dog.
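The behavior of . and | can be checked with a few one-liners. This is an illustrative sketch with made-up sample strings.

```javascript
// . matches any single character except a newline (by default).
const dotAny = /c.t/.test("cut");        // true
const dotNewline = /c.t/.test("c\nt");   // false

// An unescaped . is loose; \. requires a literal dot.
const looseDot = /3.14/.test("3514");    // true: . matched "5"
const literalDot = /3\.14/.test("3514"); // false

// | means "or".
const either = "cat dog bird".match(/cat|dog/g); // ["cat", "dog"]

console.log(dotAny, dotNewline, looseDot, literalDot, either);
```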
Quantifiers * + ? {n,m}
Quantifiers express how many times the preceding element repeats.
| Notation | Meaning | Example |
|---|---|---|
| * | 0 or more times | ab*c matches ac, abc, abbc … |
| + | 1 or more times | ab+c matches abc, abbc … (not ac) |
| ? | 0 or 1 time | colou?r matches color and colour |
| {n} | Exactly n times | \d{4} is four digits |
| {n,m} | At least n, at most m times | \d{2,4} is two to four digits |
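The rows of the table can be verified with a short sketch. The patterns are wrapped in ^ and $ (anchors, covered in section 4) so that each test applies to the whole string; that wrapping is an addition for the example, not part of the table.

```javascript
const inputs = ["ac", "abc", "abbc"];

// * allows zero b's, + requires at least one.
const star = inputs.map((s) => /^ab*c$/.test(s)); // [true, true, true]
const plus = inputs.map((s) => /^ab+c$/.test(s)); // [false, true, true]

// ? makes the preceding u optional.
const optional = ["color", "colour"].map((s) => /^colou?r$/.test(s)); // [true, true]

// {n} and {n,m} count repetitions.
const exact = /^\d{4}$/.test("2026"); // true
const range = ["1", "12", "1234", "12345"].map((s) => /^\d{2,4}$/.test(s));
// [false, true, true, false]

console.log(star, plus, optional, exact, range);
```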
Greedy and lazy matching
Quantifiers are greedy by default and match as much as possible while still satisfying the pattern. Adding ? right after them makes them lazy, matching as little as possible.
- When the target string is <a><b>, the greedy <.+> grabs everything from the first < to the last >, that is, the whole <a><b>.
- The lazy <.+?> stops at the first > and matches only <a>.

Add ? to make a quantifier lazy (*?, +?), or use a character class that excludes the delimiter, such as [^>]+, to scope the match as intended.
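The difference between greedy, lazy, and delimiter-excluding patterns can be seen on the same input. A minimal sketch:

```javascript
const html = "<a><b>";

// Greedy: .+ runs to the last > it can still match.
const greedy = html.match(/<.+>/)[0];  // "<a><b>"

// Lazy: .+? stops at the first > that completes the match.
const lazy = html.match(/<.+?>/)[0];   // "<a>"

// Excluding the delimiter: [^>]+ can never run past a >.
const scoped = html.match(/<[^>]+>/g); // ["<a>", "<b>"]

console.log(greedy, lazy, scoped);
```

The character-class version is often the better choice, since it cannot cross the delimiter and so has less to backtrack over.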
3. Groups and capturing — ( ), backreferences, named
Parentheses ( ) bundle several elements into one unit. You can apply a quantifier to the whole group, and groups also capture (extract) the matched part.
- Grouping: (ab)+ is one or more repetitions of ab (ab, abab …).
- Capturing: (\d{4})-(\d{2}) captures the year into the first parentheses and the month into the second. You can reference them in the match result array or in a callback.
- Non-capturing: when you only need grouping and not extraction, write (?:...). (?:ab)+ bundles the repetition without capturing.
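As a sketch of the three cases above, the following reads captures from the match array and from a replace callback. The callback parameter names are arbitrary.

```javascript
// Grouping: the quantifier applies to the whole group.
const repeated = /^(ab)+$/.test("abab"); // true

// Capturing: index 0 is the whole match, 1 and 2 are the groups.
const m = "2026-05".match(/(\d{4})-(\d{2})/);
const whole = m[0];     // "2026-05"
const yearPart = m[1];  // "2026"
const monthPart = m[2]; // "05"

// Captures are also passed to a replace callback, in order.
const swapped = "2026-05".replace(
  /(\d{4})-(\d{2})/,
  (match, year, month) => `${month}/${year}`
); // "05/2026"

// Non-capturing: the result holds only the whole match.
const nonCapturing = "abab".match(/(?:ab)+/);
const captureCount = nonCapturing.length - 1; // 0

console.log(repeated, whole, yearPart, monthPart, swapped, captureCount);
```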
Backreferences
Captured content can be backreferenced within the same pattern as \1, \2 … (numbered by the order the parentheses appear). It is used to express a repetition of a string that appeared earlier.
- (\w)\1 … the same word character twice in a row (such as oo).
- Captures can also be referenced in a replacement: in the replacement string of JavaScript's replace, write $1, $2.
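A brief sketch of both forms: \1 inside the pattern, and $1 / $2 in a replacement string. The sample strings are illustrative.

```javascript
// \1 must repeat exactly what group 1 captured.
const doubled = "book keeper".match(/(\w)\1/g); // ["oo", "ee"]

// In the replacement string, $1 and $2 refer to the captures.
const reordered = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1"); // "John Doe"

console.log(doubled, reordered);
```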
Named captures
Giving a name instead of a number improves readability. Capture with (?<name>...) and retrieve it by name from groups in the match result. A backreference within the same pattern is \k<name>, and in the replacement string it is $<name>.
Example: (?<year>\d{4})-(?<month>\d{2}) lets you retrieve the year and month via groups.year and groups.month.
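The named forms described above can be sketched together: groups on the match result, $<name> in a replacement, and \k<name> inside a pattern. The group name ch in the last line is an arbitrary choice.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})/;

// Retrieve captures by name from the groups object.
const result = "2026-05".match(datePattern);
const { year, month } = result.groups; // "2026", "05"

// $<name> in a replacement string.
const reordered = "2026-05".replace(datePattern, "$<month>/$<year>"); // "05/2026"

// \k<name> as a backreference within the same pattern.
const hasDouble = /(?<ch>\w)\k<ch>/.test("hello"); // true ("ll")

console.log(year, month, reordered, hasDouble);
```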
4. Anchors and boundaries — ^ $ \b
Anchors match a "position" rather than a character itself. They are zero-width (they consume no characters).
- ^ … the start position. With the m flag it becomes the start of each line.
- $ … the end position. With the m flag it becomes the end of each line.
- \b … a word boundary. It matches the boundary between a word character (\w) and a non-word character (\W), or the edge of the string.
- \B … a position that is not a word boundary.
For example, ^\d+$ expresses the condition "the whole thing is only digits" (one or more digits from start to end). Without anchors it merely means "contains a digit somewhere", which is insufficient for input validation.
For whole-word matching, \b is handy. \bcat\b matches the word cat but not part of category. Keep in mind that plain /cat/, by contrast, also matches inside category.
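The effect of anchors and word boundaries can be confirmed with a short sketch. The sample inputs are made up for illustration.

```javascript
// Anchored: the whole string must be digits.
const onlyDigits = /^\d+$/;
const allDigits = onlyDigits.test("12345");    // true
const mixed = onlyDigits.test("12a45");        // false

// Unanchored: "contains a digit somewhere" is enough.
const containsDigit = /\d+/.test("12a45");     // true

// \b restricts the match to a whole word.
const wordCat = /\bcat\b/.test("a cat sat");   // true
const insideWord = /\bcat\b/.test("category"); // false
const plainCat = /cat/.test("category");       // true

// With the m flag, ^ and $ apply to each line.
const perLine = "one\ntwo".match(/^\w+$/gm);   // ["one", "two"]
const wholeOnly = "one\ntwo".match(/^\w+$/g);  // null

console.log(allDigits, mixed, containsDigit, wordCat, insideWord, plainCat, perLine, wholeOnly);
```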
5. Flags — g i m s u
Flags are options that change the behavior of the whole pattern; in a literal you append them after the closing / (e.g. /abc/gi). Here are the representative ones.
| Flag | Name | Effect |
|---|---|---|
| g | global | Does not stop at the first match; targets all matches |
| i | ignoreCase | Does not distinguish upper and lower case |
| m | multiline | Applies ^ and $ to the start and end of each line |
| s | dotAll | Makes . match newlines as well |
| u | unicode | Handles Unicode correctly (code-point units for emoji etc., the \u{...} notation) |
For example, to "find all cat ignoring case", use /cat/gi. Flags can be combined and their order does not matter.
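A sketch of what each flag changes, using small made-up inputs. The m flag is omitted here because it only affects ^ and $.

```javascript
const text = "Cat cat CAT";

// Without flags: the first case-sensitive match only.
const first = text.match(/cat/)[0];      // "cat"
// g + i: every match, ignoring case.
const all = text.match(/cat/gi);         // ["Cat", "cat", "CAT"]

// s (dotAll): . also matches a newline.
const withoutS = /a.b/.test("a\nb");     // false
const withS = /a.b/s.test("a\nb");       // true

// u (unicode): . consumes a whole code point rather than one UTF-16 unit.
const emoji = "\u{1F600}";               // one emoji, two UTF-16 code units
const withoutU = /^.$/.test(emoji);      // false
const withU = /^.$/u.test(emoji);        // true

// Flag order does not matter; the flags property is normalized.
const sameFlags = /cat/gi.flags === /cat/ig.flags; // true

console.log(first, all, withoutS, withS, withoutU, withU, sameFlags);
```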
6. Common patterns — email, phone, URL, date
The table below collects simple patterns commonly seen in practice, combining the elements above. None of them are strict spec-compliant validations; they are practical rules of thumb only. When you truly need accurate validation (especially for email), pair them with a dedicated library or an actual delivery check.
| Target | Simple pattern (example) | Description |
|---|---|---|
| Email (simple) | ^[^\s@]+@[^\s@]+\.[^\s@]+$ | The minimal shape "non-whitespace/non-@ + @ + domain + . + TLD". Not strictly RFC-compliant |
| Phone (Japan, hyphen-separated) | ^0\d{1,4}-\d{1,4}-\d{4}$ | A simple form starting with 0, of digits and hyphens. Digit counts vary by region, so this is only a guide |
| URL (http/https) | ^https?:\/\/[^\s]+$ | http or https (s?) followed by a non-whitespace string. / is escaped as \/ |
| Date (YYYY-MM-DD form) | ^\d{4}-\d{2}-\d{2}$ | Checks only the digit-count shape. Logical validity such as "month 13" must be checked separately |
^\d{4}-\d{2}-\d{2}$ also accepts 2026-13-99. Non-existent dates, or the real existence of an email, are semantic validity outside the scope of regex. Keep shape checking and value validation separate.
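One way to keep shape checking and value validation separate is sketched below. The patterns are the ones from the table. The helper isRealDate is a hypothetical name introduced for this example, and its use of the built-in Date object for the value check is an assumption about how you might validate, not the only option.

```javascript
// Shape checks only: the simple patterns from the table.
const patterns = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  phoneJp: /^0\d{1,4}-\d{1,4}-\d{4}$/,
  url: /^https?:\/\/[^\s]+$/,
  date: /^\d{4}-\d{2}-\d{2}$/,
};

// Hypothetical helper: shape check first, then a value check.
// Date.UTC rolls invalid parts over (month 13 becomes next January),
// so a real date must survive the round trip unchanged.
// Note: years below 0100 are rejected by this simple check.
function isRealDate(text) {
  if (!patterns.date.test(text)) return false;
  const [y, m, d] = text.split("-").map(Number);
  const date = new Date(Date.UTC(y, m - 1, d));
  return (
    date.getUTCFullYear() === y &&
    date.getUTCMonth() === m - 1 &&
    date.getUTCDate() === d
  );
}

console.log(patterns.email.test("user@example.com")); // true
console.log(patterns.date.test("2026-13-99"));        // true (shape only)
console.log(isRealDate("2026-13-99"));                // false
console.log(isRealDate("2024-02-29"));                // true (leap year)
```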
7. Pitfalls — over-complication and ReDoS
Regex is powerful, but writing too much makes it unreadable and invites performance problems. Finally, here are practical pitfalls.
- Avoid over-complication: cramming everything into one expression makes it unreadable and unfixable later. It is also a sound choice to split it into multiple steps, add comments (in languages with an x flag), or delegate to a dedicated parser.
- Watch out for ReDoS (regex denial of service): patterns with nested quantifiers such as (a+)+ or (.*)* can cause backtracking to explode on certain inputs, making processing extremely slow. Places that pass external input straight into a complex pattern are especially dangerous.
- Mitigations: avoid nested quantifiers, limit the range with a character class such as [^>]+, and cap the input length where possible. Designing so that you never execute untrusted patterns also helps.
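The mitigations above might look like the following sketch. The names MAX_LENGTH, safeTest, and escapeRegExp are hypothetical, and the cap of 1000 characters is an arbitrary assumption. Escaping user text before building a pattern is shown as one possible way to avoid running untrusted patterns. The risky pattern is deliberately only run on a short, matching input.

```javascript
// Risky: nested quantifiers. On a long run of a's followed by a
// non-matching character, backtracking can grow exponentially.
const risky = /^(a+)+$/;
// Same intent without nesting: one or more a's.
const safer = /^a+$/;

const MAX_LENGTH = 1000; // assumed cap; choose one that fits your input

// Hypothetical helper: reject over-long input before the regex runs.
function safeTest(pattern, input, maxLength = MAX_LENGTH) {
  if (typeof input !== "string" || input.length > maxLength) return false;
  return pattern.test(input);
}

// Hypothetical helper: escape metacharacters so user text is matched
// literally instead of being executed as a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const hostile = "a".repeat(5000) + "b";
const rejected = safeTest(safer, hostile);                   // false: over the cap
const accepted = safeTest(safer, "aaaa");                    // true
const agree = risky.test("aaaa") === safer.test("aaaa");     // true
const literal = new RegExp(escapeRegExp("a.b")).test("axb"); // false

console.log(rejected, accepted, agree, literal);
```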
In short, the basics of regex are "keep it small, keep it readable, and verify with real examples". Master the four basics (character classes, quantifiers, anchors, and flags) and you can cover most everyday patterns.
Frequently Asked Questions (FAQ)
What is a regular expression?
A regular expression (regex) is a small language for describing patterns in text. It expresses conditions such as "three digits in a row" or "something that looks like an email containing @" as a short expression, and is used for searching, extracting, replacing, and validation. It is built into many programming languages and editors; in JavaScript you work with it through /pattern/flags literals or the RegExp object.
What is the difference between \d and \w?
\d matches a single digit (essentially 0-9). \w matches a single "word character": the alphanumerics (a-z, A-Z, 0-9) plus the underscore _. In other words, \w includes \d and additionally covers letters and the underscore, making it a broader character class. Uppercasing either one negates it (\D is any non-digit, \W is any non-word character).
What is greedy matching?
Quantifiers (* + ? {n,m}) are "greedy" by default and try to match as much as possible while still satisfying the pattern. For example, <.+> grabs everything from the first < to the last >. Adding ? right after them, as in *? +? ?? {n,m}?, makes them "lazy" so they match as little as possible. <.+?> stops at the first >.