# 正则的扩展

# 构造函数

在 ES6 中，RegExp构造函数的参数有三种情况。

第一种情况是，参数是字符串，这时第二个参数表示正则表达式的修饰符（flag）。

var regex = new RegExp('xyz', 'i');
// 等价于
var regex = /xyz/i;

1
2
3

第二种情况是，参数是一个正则表示式，这时会返回一个原有正则表达式的拷贝。

var regex = new RegExp(/xyz/i);
// 等价于
var regex = /xyz/i;

1
2
3

第三种情况，如果RegExp构造函数第一个参数是一个正则对象，那么可以使用第二个参数指定修饰符。而且，返回的正则表达式会忽略原有的正则表达式的修饰符，只使用新指定的修饰符。

new RegExp(/abc/ig, 'i').flags
// "i"

1
2

# u修饰符

ES6 对正则表达式添加了u修饰符，含义为“Unicode 模式”，用来正确处理大于\uFFFF的 Unicode 字符。也就是说，会正确处理四个字节的 UTF-16 编码。

/^\uD83D/u.test('\uD83D\uDC2A') // false /^\uD83D/.test('\uD83D\uDC2A') // true

# 1. 点字符

点（.）字符在正则表达式中，含义是除了换行符以外的任意单个字符。对于码点大于0xFFFF的 Unicode 字符，点字符不能识别，必须加上u修饰符。

var s = '𠮷';

/^.$/.test(s) // false
/^.$/u.test(s) // true

1
2
3
4

# 2. Unicode 字符表示法

ES6 新增了使用大括号表示 Unicode 字符，这种表示法在正则表达式中必须加上u修饰符，才能识别当中的大括号，否则会被解读为量词。

/\u{61}/.test('a') // false
/\u{61}/u.test('a') // true
/\u{20BB7}/u.test('𠮷') // true

1
2
3

# 3. 量词

使用u修饰符后，所有量词都会正确识别码点大于0xFFFF的 Unicode 字符。

/a{2}/.test('aa') // true
/a{2}/u.test('aa') // true
/𠮷{2}/.test('𠮷𠮷') // false
/𠮷{2}/u.test('𠮷𠮷') // true

1
2
3
4

# 4. 预定义模式

u修饰符也影响到预定义模式，能否正确识别码点大于0xFFFF的 Unicode 字符。

/^\S$/.test('𠮷') // false
/^\S$/u.test('𠮷') // true

1
2

# 5. i 修饰符

有些 Unicode 字符的编码不同，但是字型很相近，比如，\u004B与\u212A都是大写的K。

/[a-z]/i.test('\u212A') // false
/[a-z]/iu.test('\u212A') // true

1
2

# 6. RegExp.prototype.unicode 属性

正则实例对象新增unicode属性，表示是否设置了u修饰符。

const r1 = /hello/;
const r2 = /hello/u;

r1.unicode // false
r2.unicode // true

1
2
3
4
5

# y修饰符

y修饰符的作用与g修饰符类似，也是全局匹配，后一次匹配都从上一次匹配成功的下一个位置开始。不同之处在于，g修饰符只要剩余位置中存在匹配就可，而y修饰符确保匹配必须从剩余的第一个位置开始，这也就是“粘连”的涵义。

var s = 'aaa_aa_a';
var r1 = /a+/g;
var r2 = /a+/y;

r1.exec(s) // ["aaa"]
r2.exec(s) // ["aaa"]

r1.exec(s) // ["aa"]
r2.exec(s) // null

1
2
3
4
5
6
7
8
9

如果改一下正则表达式，保证每次都能头部匹配，y修饰符就会返回结果了。

var s = 'aaa_aa_a';
var r = /a+_/y;

r.exec(s) // ["aaa_"]
r.exec(s) // ["aa_"]

1
2
3
4
5

使用lastIndex属性，可以更好地说明y修饰符。

const REGEX = /a/g;

// 指定从2号位置（y）开始匹配
REGEX.lastIndex = 2;

// 匹配成功
const match = REGEX.exec('xaya');

// 在3号位置匹配成功
match.index // 3

// 下一次匹配从4号位开始
REGEX.lastIndex // 4

// 4号位开始匹配失败
REGEX.exec('xaya') // null

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

与y修饰符相匹配，ES6 的正则实例对象多了sticky属性，表示是否设置了y修饰符。

var r = /hello\d/y;
r.sticky // true

1
2

# flags属性

ES6 为正则表达式新增了flags属性，会返回正则表达式的修饰符。

// ES5 的 source 属性
// 返回正则表达式的正文
/abc/ig.source
// "abc"

// ES6 的 flags 属性
// 返回正则表达式的修饰符
/abc/ig.flags
// 'gi'

1
2
3
4
5
6
7
8
9

# 具名组匹配

ES2018 引入了具名组匹配（Named Capture Groups），允许为每一个组匹配指定一个名字，既便于阅读代码，又便于引用。

const RE_DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj.groups.year; // 1999
const month = matchObj.groups.month; // 12
const day = matchObj.groups.day; // 31

1
2
3
4
5
6

上面代码中，“具名组匹配”在圆括号内部，模式的头部添加“问号 + 尖括号 + 组名”（?<year>），然后就可以在exec方法返回结果的groups属性上引用该组名。同时，数字序号（matchObj[1]）依然有效。

# matchAll

目前有一个提案，增加了String.prototype.matchAll方法，可以一次性取出所有匹配。不过，它返回的是一个遍历器（Iterator），而不是数组。

const string = 'test1test2test3';

// g 修饰符加不加都可以
const regex = /t(e)(st(\d?))/g;

for (const match of string.matchAll(regex)) {
  console.log(match);
}
// ["test1", "e", "st1", "1", index: 0, input: "test1test2test3"]
// ["test2", "e", "st2", "2", index: 5, input: "test1test2test3"]
// ["test3", "e", "st3", "3", index: 10, input: "test1test2test3"]

1
2
3
4
5
6
7
8
9
10
11

遍历器转为数组是非常简单的，使用...运算符和Array.from方法就可以了。

// 转为数组方法一
[...string.matchAll(regex)]

// 转为数组方法二
Array.from(string.matchAll(regex));

1
2
3
4
5