Lexical grammar

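This page describes JavaScript's lexical grammar. ECMAScript source text is scanned from left to right and converted into a sequence of input elements: tokens, control characters, line terminators, comments, and white space. ECMAScript also defines certain keywords and literals, and has rules for automatically inserting semicolons at the end of statements.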

Format-control characters

Unicode format-control characters control how source text is interpreted, but they have no visual representation.

Code point	Name	Abbreviation	Description
U+200C	Zero width non-joiner	<ZWNJ>	Placed between characters to prevent being connected into ligatures in certain languages (Wikipedia).
U+200D	Zero width joiner	<ZWJ>	Placed between characters that would not normally be connected in order to cause the characters to be rendered using their connected form in certain languages (Wikipedia).
U+FEFF	Byte order mark	<BOM>	Used at the start of the script to mark it as Unicode and the text's byte order (Wikipedia).

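White space

White space characters improve the readability of source text and separate tokens from each other. They usually do not affect what the code does. Minifiers are often used to strip white space from source text and reduce the amount of data that has to be transferred.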

Code point	Name	Abbreviation	Description	Escape sequence
U+0009	Character tabulation	<HT>	Horizontal tabulation	\t
U+000B	Line tabulation	<VT>	Vertical tabulation	\v
U+000C	Form feed	<FF>	Page breaking control character (Wikipedia).	\f
U+0020	Space	<SP>	Normal space
U+00A0	No-break space	<NBSP>	Normal space, but no point at which a line may break
Others	Other Unicode space characters	<USP>	Spaces in Unicode on Wikipedia
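As a small illustration of the table above, the following sketch evaluates the same expression with a different white space character between the tokens each time. The names `whiteSpace` and `sums` are made up for this example, and the `Function` constructor is used only so that the separator can be written as an escape sequence.

```javascript
// White space characters from the table above, written as escape sequences.
const whiteSpace = ['\t', '\v', '\f', ' ', '\u00A0'];

// Build "return 1<ws>+<ws>2;" for each character and evaluate it.
const sums = whiteSpace.map((ws) => new Function(`return 1${ws}+${ws}2;`)());

console.log(sums); // every entry is 3
```

Whichever separator is used, the parser sees the same three tokens.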

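Line terminators

In ECMAScript, only the following Unicode code points are treated as line terminators. Other line-breaking characters, such as Next Line (NEL, U+0085), are not treated as line terminators.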

Code point	Name	Abbreviation	Description	Escape sequence
U+000A	Line Feed	<LF>	New line character in UNIX systems.	\n
U+000D	Carriage Return	<CR>	New line character in Commodore and early Mac systems.	\r
U+2028	Line Separator	<LS>	Wikipedia
U+2029	Paragraph Separator	<PS>	Wikipedia
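One way to observe which characters end a line is to put them after a single-line comment, since such a comment runs until the next line terminator. This is a minimal sketch; `lineTerminators`, `afterTerminator`, and `afterNel` are illustrative names.

```javascript
// The four line terminators from the table above.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

// Each one ends the single-line comment, so the return statement runs.
const afterTerminator = lineTerminators.map(
  (lt) => new Function(`// comment${lt}return 42;`)()
);
console.log(afterTerminator); // [42, 42, 42, 42]

// U+0085 (NEL) does not end the comment, so "return 42;" stays commented out.
const afterNel = new Function('// comment\u0085return 42;')();
console.log(afterNel); // undefined
```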

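Comments

There are two ways to write comments in JavaScript.

The first is the single-line comment, written with //. Everything after it on the same line is treated as a comment:

function comment() {
  // This is a one line JavaScript comment
  console.log('Hello world!');
}
comment();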

The second is the multi-line comment, written with /* */, which is more flexible.

For example, you can use it on a single line:

function comment() {
  /* This is a one line JavaScript comment */
  console.log('Hello world!');
}
comment();

You can also write comments that span several lines:

function comment() {
  /* This comment spans multiple lines. Notice
     that we don't need to end the comment until we're done. */
  console.log('Hello world!');
}
comment();

You can also use it in the middle of a line if you wish, although this makes code harder to read, so use it with caution:

function comment(x) {
  console.log('Hello ' + x /* insert the value of x */ + ' !');
}
comment('world');

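In addition, you can disable code and prevent it from running by wrapping it in a comment, like this:

function comment() {
  /* console.log('Hello world!'); */
}
comment();

In this case the console.log() call is never made, because it is inside a comment. Any number of lines of code can be disabled this way.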

Keywords

Reserved keywords as of ECMAScript 2015

break

case

catch

class

const

continue

debugger

default

delete

do

else

export

extends

finally

for

function

if

import

in

instanceof

new

return

super

switch

this

throw

try

typeof

var

void

while

with

yield

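Future reserved keywords

The following are reserved as future keywords by the ECMAScript specification. They have no special functionality at present, but they might at some future time, so they cannot be used as identifiers.

These are always reserved:

enum

The following are only reserved when they are found in strict mode code: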

implements

interface

let

package

private

protected

public

static

The following are only reserved when they are found in module code:

await

Future reserved keywords in older standards

The following are reserved as future keywords by older ECMAScript specifications (ECMAScript 1 till 3).

abstract

boolean

byte

char

double

final

float

goto

int

long

native

short

synchronized

throws

transient

volatile

Additionally, the literals null, true, and false cannot be used as identifiers in ECMAScript.
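The different reservation rules above can be probed at run time. The sketch below uses a hypothetical helper, `canBeVariableName`, which is not part of any library: it asks the parser to compile `var <name>;`, optionally in strict mode, and reports whether that succeeded.

```javascript
// Hypothetical helper: true if `name` can be declared as a variable.
function canBeVariableName(name, strict = false) {
  const prologue = strict ? '"use strict"; ' : '';
  try {
    // The Function constructor only parses the body here; nothing is called.
    new Function(`${prologue}var ${name};`);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(canBeVariableName('enum'));             // false: always reserved
console.log(canBeVariableName('implements'));       // true: allowed in non-strict code
console.log(canBeVariableName('implements', true)); // false: reserved in strict mode
console.log(canBeVariableName('abstract', true));   // true: only reserved in older editions
console.log(canBeVariableName('null'));             // false: literal, not an identifier
```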

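Reserved word usage

Reserved words actually only apply to Identifiers (as opposed to IdentifierNames). As described in es5.github.com/#A.1, the following are all IdentifierNames, which do not exclude ReservedWords.

a.import
a['import']
a = { import: 'test' };

On the other hand, the following is illegal because it is an Identifier, which is an IdentifierName that is not a reserved word. Identifiers are used for FunctionDeclaration, FunctionExpression, VariableDeclaration, and so on. IdentifierNames are used for MemberExpression, CallExpression, and so on.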

function import() {} // Illegal.
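The contrast can be checked with a short sketch. The object `settings` and the variable `declarationError` are illustrative names. The illegal declaration is passed to the `Function` constructor as a string so that the rest of the example still parses.

```javascript
// Reserved words are fine as property names (IdentifierNames).
const settings = { import: 'test', class: 'demo' };
console.log(settings.import);   // "test"
console.log(settings['class']); // "demo"

// As a function name (an Identifier) the same word is rejected by the parser.
let declarationError = null;
try {
  new Function('function import() {}');
} catch (e) {
  declarationError = e;
}
console.log(declarationError instanceof SyntaxError); // true
```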

Literals

Null literal

See also null for more information.

null

Boolean literal

See also Boolean for more information.

true
false

Numeric literals

Decimal

1234567890
42

// Caution when using with a leading zero:
0888 // 888 parsed as decimal
0777 // parsed as octal, 511 in decimal

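Note that a decimal literal can start with a zero (0) followed by another decimal digit, but if every digit after the leading 0 is smaller than 8, the number is interpreted as an octal number. In non-strict code this does not throw (see bug 957513); strict mode code rejects such literals with a SyntaxError. See also the page about parseInt().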
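The leading-zero behaviour can be observed with the sketch below. It compiles each literal through the `Function` constructor, whose body is non-strict unless it opts in with a "use strict" directive. The variable names are illustrative.

```javascript
// Non-strict code: a leading zero followed only by digits below 8 means octal.
const legacyOctal = new Function('return 0777;')();     // 511
// A digit of 8 or 9 makes the whole literal decimal again.
const decimalWithZero = new Function('return 0888;')(); // 888

// Strict mode code refuses to parse the legacy octal form.
let strictError = null;
try {
  new Function('"use strict"; return 0777;');
} catch (e) {
  strictError = e;
}

console.log(legacyOctal, decimalWithZero, strictError instanceof SyntaxError);
// 511 888 true
```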

Binary

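Binary number syntax uses a leading zero followed by a lowercase or uppercase letter "B" (0b or 0B). This syntax is new in ECMAScript 2015 (ES6); see the browser compatibility table below. If the digits after the 0b are anything other than 0 or 1, a SyntaxError is thrown: "Missing binary digits after 0b".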

var FLT_SIGNBIT  = 0b10000000000000000000000000000000; // 2147483648
var FLT_EXPONENT = 0b01111111100000000000000000000000; // 2139095040
var FLT_MANTISSA = 0B00000000011111111111111111111111; // 8388607

Octal

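Octal number syntax uses a leading zero followed by a lowercase or uppercase letter "O" (0o or 0O). This syntax is new in ECMAScript 2015 (ES6); see the browser compatibility table below. If the digits after the 0o are outside the range (01234567), a SyntaxError is thrown: "Missing octal digits after 0o".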

var n = 0O755; // 493
var m = 0o644; // 420

// Also possible with just a leading zero (see note about decimals above)
0755
0644

Hexadecimal

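Hexadecimal number syntax uses a leading zero followed by a lowercase or uppercase letter "X" (0x or 0X). If the digits after the 0x are outside the range (0123456789ABCDEF), a SyntaxError is thrown: "Identifier starts immediately after numeric literal".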

0xFFFFFFFFFFFFFFFFF // 295147905179352830000
0x123456789ABCDEF   // 81985529216486900
0XA                 // 10
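To tie the three prefixed notations together, the sketch below writes the number ten in each of them, converts it back to text with toString(radix), and checks that malformed literals are rejected at parse time. The helper `throwsSyntaxError` and the other names are assumptions made for this example.

```javascript
// The same value written in binary, octal, and hexadecimal.
const values = [0b1010, 0o12, 0xA];
console.log(values); // [10, 10, 10]

// Number() understands the same prefixes when parsing strings.
const fromStrings = ['0b1010', '0o12', '0xA'].map(Number);
console.log(fromStrings); // [10, 10, 10]

// Converting back to text in each base.
const asText = [(10).toString(2), (10).toString(8), (10).toString(16)];
console.log(asText); // ["1010", "12", "a"]

// Hypothetical helper: true if the source text fails to parse.
function throwsSyntaxError(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// A digit outside the allowed set makes the literal invalid.
const malformed = ['0b12', '0o8', '0xG'].map((lit) => throwsSyntaxError(`return ${lit};`));
console.log(malformed); // [true, true, true]
```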

Object literals

See also Object and Object initializer for more information.
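This section has no example of its own, so here is a minimal sketch of object literal syntax. It includes the shorthand property notation that the compatibility table later refers to. All names in it (`title`, `year`, `longhand`, `shorthand`) are illustrative.

```javascript
const title = 'foo';
const year = 2014;

// Longhand: each property name is followed by its value.
const longhand = { title: title, year: year };

// Shorthand (ECMAScript 2015): the variable name doubles as the property name.
const shorthand = { title, year };

console.log(longhand.title === shorthand.title); // true
console.log(shorthand.year);                     // 2014
```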

Array literals

See also Array for more information.

[1954, 1974, 1990, 2014]

String literals

'foo'
"bar"

Hexadecimal escape sequences

'\xA9' // "©"

Unicode escape sequences

A Unicode escape sequence requires four hexadecimal digits after \u.

'\u00A9' // "©"

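Unicode code point escapes

New in ECMAScript 2015 (ES6). With Unicode code point escapes, any character can be escaped using its hexadecimal code point, up to 0x10FFFF. With simple Unicode escapes, the two surrogate halves often have to be written separately to achieve the same result.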

See also String.fromCodePoint() or String.prototype.codePointAt().

'\u{2F804}'

// the same with simple Unicode escapes
'\uD87E\uDC04'
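The relationship between the escape forms above can be checked directly. This is a small sketch with illustrative variable names. It relies only on String.fromCodePoint() and String.prototype.codePointAt(), which are mentioned above.

```javascript
const viaCodePoint = '\u{2F804}';
const viaSurrogates = '\uD87E\uDC04';

// Both spellings produce the same string of two UTF-16 code units.
console.log(viaCodePoint === viaSurrogates); // true
console.log(viaCodePoint.length);            // 2

// codePointAt() reads the surrogate pair back as one code point.
console.log(viaCodePoint.codePointAt(0).toString(16)); // "2f804"
console.log(String.fromCodePoint(0x2F804) === viaCodePoint); // true

// For small code points all three escape forms agree.
const copyright = ['\xA9', '\u00A9', '\u{A9}'];
console.log(copyright.every((s) => s === '©')); // true
```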

Regular expression literals

See also RegExp for more information.

/ab+c/g

// An "empty" regular expression literal
// The empty non-capturing group is necessary 
// to avoid ambiguity with single-line comments.
/(?:)/
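A brief sketch of how these two literals behave at run time. The variable names are illustrative.

```javascript
// The global flag makes match() return every occurrence.
const matches = 'abbc abc ac'.match(/ab+c/g);
console.log(matches); // ["abbc", "abc"]

// The "empty" regular expression matches any string, including the empty one.
const empty = /(?:)/;
console.log(empty.test(''));         // true
console.log(empty.test('anything')); // true

// An empty pattern built with the constructor reports the same source text.
console.log(new RegExp('').source); // "(?:)"
```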

Template literals

See also template strings for more information.

`string text`

`string text line 1
 string text line 2`

`string text ${expression} string text`

tag `string text ${expression} string text`
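The forms above can be made concrete with a runnable sketch. The tag function `tag` and the variable `expression` are stand-ins for the placeholders used above. This particular tag wraps each substituted value in square brackets, purely for illustration.

```javascript
// A tag function receives the literal chunks and the substituted values.
function tag(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? `[${values[i]}]` : ''),
    ''
  );
}

const expression = 6 * 7;

const plain = `string text ${expression} string text`;
console.log(plain); // "string text 42 string text"

const tagged = tag`string text ${expression} string text`;
console.log(tagged); // "string text [42] string text"

// A line break inside the backticks becomes part of the string.
const multiLine = `string text line 1
string text line 2`;
console.log(multiLine.includes('\n')); // true
```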

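Automatic semicolon insertion

Some JavaScript statements must be terminated with semicolons and are therefore affected by automatic semicolon insertion (ASI):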

Empty statement

let, const, variable statement

import, export, module declaration

Expression statement

debugger

continue, break, throw

return

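The ECMAScript specification mentions three rules of semicolon insertion.

1. A semicolon is inserted before a line terminator or "}" that the grammar does not allow at that position.

2. A semicolon is inserted at the end when the end of the input stream of tokens is reached and the parser cannot parse the single input stream as a complete program.

In the following snippet, ++ is not treated as a postfix operator applying to the variable b, because a line terminator occurs between b and ++.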

a = b
++c

// is transformed by ASI into

a = b;
++c;
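Running the snippet with concrete values makes the effect visible. This is a minimal sketch in which the starting values of b and c are assumptions chosen for the example.

```javascript
let a;
let b = 1;
let c = 1;

// No semicolons on purpose: ASI splits this into "a = b;" and "++c;".
a = b
++c

console.log(a, b, c); // 1 1 2
```

If ++ had been applied to b as a postfix operator, b would have ended up as 2. Instead b keeps its value and c is incremented.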

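3. A semicolon is inserted at the end when a statement with a restricted production in the grammar is followed by a line terminator. The statements with "no LineTerminator here" rules are: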

PostfixExpressions (++ and --)

continue

break

return

yield, yield*

module

return
a + b

// is transformed by ASI into

return;
a + b;
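The practical consequence of the return rule shows up when the snippet is placed inside a function. In this sketch, `sumWithLineBreak` and `sumOnOneLine` are hypothetical names.

```javascript
function sumWithLineBreak(a, b) {
  // ASI ends the statement right after "return".
  return
  a + b; // never evaluated
}

function sumOnOneLine(a, b) {
  // Keeping the expression on the same line as "return" avoids the problem.
  return a + b;
}

console.log(sumWithLineBreak(1, 2)); // undefined
console.log(sumOnOneLine(1, 2));     // 3
```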

Specifications

Specification	Status	Comment
ECMAScript 1st Edition (ECMA-262)	Standard	Initial definition.
ECMAScript 5.1 (ECMA-262), the definition of 'Lexical Conventions' in that specification.	Standard
ECMAScript 2015 (6th Edition, ECMA-262), the definition of 'Lexical Grammar' in that specification.	Standard	Added: Binary and Octal Numeric literals, Unicode code point escapes, Templates
ECMAScript Latest Draft (ECMA-262), the definition of 'Lexical Grammar' in that specification.	Living Standard

Browser compatibility

Feature	Chrome	Edge	Firefox (Gecko)	Internet Explorer	Opera	Safari
Basic support	(Yes)	(Yes)	(Yes)	(Yes)	(Yes)	(Yes)
Binary and octal numeric literals (0b and 0o)	41	12	25 (25)	?	28	9
Unicode code point escapes (\u{})	44	12	40 (40)	No support	31	9
Shorthand notation for object literals	43	12	33 (33)	No support	30	9
Template literals	41	12	34 (34)	No support	28	9

Feature	Android	Chrome for Android	Firefox Mobile (Gecko)	IE Mobile	Opera Mobile	Safari Mobile
Basic support	(Yes)	(Yes)	(Yes)	(Yes)	(Yes)	(Yes)
Binary and octal numeric literals (0b and 0o)	?	41	33.0 (33)	?	?	?
Unicode code point escapes (\u{})	?	?	40.0 (40)	?	?	?
Shorthand notation for object literals	No support	No support	33.0 (33)	No support	No support	No support
Template literals	No support	No support	34.0 (34)	No support	No support	No support
