Text scanning | text/scanner
Package scanner
import "text/scanner"
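- Overview
- Index
- Examples
Overview
Package scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes an io.Reader providing the source, which can then be tokenized through repeated calls to the Scan function. For compatibility with existing tools, the NUL character is not allowed. If the first character in the source is a UTF-8 encoded byte order mark (BOM), it is discarded.
By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.
Example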
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	const src = `
// This is scanned code.
if a > 10 {
someParsable = text
}`
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "example"
	// Scan returns one token per call until it reaches EOF.
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}
}
Index
- Constants
- func TokenString(tok rune) string
- type Position
- func (pos *Position) IsValid() bool
- func (pos Position) String() string
- type Scanner
- func (s *Scanner) Init(src io.Reader) *Scanner
- func (s *Scanner) Next() rune
- func (s *Scanner) Peek() rune
- func (s *Scanner) Pos() (pos Position)
- func (s *Scanner) Scan() rune
- func (s *Scanner) TokenText() string
Examples
Package
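Constants
Predefined mode bits control the recognition of tokens. For instance, to configure a Scanner so that it only recognizes (Go) identifiers and integers and skips comments, set the Scanner's Mode field to: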
ScanIdents | ScanInts | SkipComments
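With the exception of comments, which are skipped if SkipComments is set, unrecognized tokens are not ignored. Instead, the scanner simply returns the respective individual characters (or possibly sub-tokens). For instance, if the mode is ScanIdents (not ScanStrings), the string "foo" is scanned as the token sequence '"' Ident '"'.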
const (
	ScanIdents     = 1 << -Ident
	ScanInts       = 1 << -Int
	ScanFloats     = 1 << -Float // includes Ints
	ScanChars      = 1 << -Char
	ScanStrings    = 1 << -String
	ScanRawStrings = 1 << -RawString
	ScanComments   = 1 << -Comment
	SkipComments   = 1 << -skipComment // if set with ScanComments, comments become white space
	GoTokens       = ScanIdents | ScanFloats | ScanChars | ScanStrings | ScanRawStrings | ScanComments | SkipComments
)
The result of Scan is one of these tokens or a Unicode character.
const (
	EOF = -(iota + 1)
	Ident
	Int
	Float
	Char
	String
	RawString
	Comment
)
GoWhitespace is the default value for the Scanner's Whitespace field. Its value selects Go's white space characters.
const GoWhitespace = 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' '
func TokenString
func TokenString(tok rune) string
TokenString returns a printable string for a token or Unicode character.
type Position
A source position is represented by a Position value. A position is valid if Line > 0.
type Position struct {
	Filename string // filename, if any
	Offset   int    // byte offset, starting at 0
	Line     int    // line number, starting at 1
	Column   int    // column number, starting at 1 (character count per line)
}
func (*Position) IsValid
func (pos *Position) IsValid() bool
IsValid reports whether the position is valid.
func (Position) String
func (pos Position) String() string
type Scanner
A Scanner implements reading of Unicode characters and tokens from an io.Reader.
type Scanner struct {
	// Error is called for each error encountered. If no Error
	// function is set, the error is reported to os.Stderr.
	Error func(s *Scanner, msg string)

	// ErrorCount is incremented by one for each error encountered.
	ErrorCount int

	// The Mode field controls which tokens are recognized. For instance,
	// to recognize Ints, set the ScanInts bit in Mode. The field may be
	// changed at any time.
	Mode uint

	// The Whitespace field controls which characters are recognized
	// as white space. To recognize a character ch <= ' ' as white space,
	// set the ch'th bit in Whitespace (the Scanner's behavior is undefined
	// for values ch > ' '). The field may be changed at any time.
	Whitespace uint64

	// IsIdentRune is a predicate controlling the characters accepted
	// as the ith rune in an identifier. The set of valid characters
	// must not intersect with the set of white space characters.
	// If no IsIdentRune function is set, regular Go identifiers are
	// accepted instead. The field may be changed at any time.
	IsIdentRune func(ch rune, i int) bool

	// Start position of most recently scanned token; set by Scan.
	// Calling Init or Next invalidates the position (Line == 0).
	// The Filename field is always left untouched by the Scanner.
	// If an error is reported (via Error) and Position is invalid,
	// the scanner is not inside a token. Call Pos to obtain an error
	// position in that case, or to obtain the position immediately
	// after the most recently scanned token.
	Position

	// contains filtered or unexported fields
}
func (*Scanner) Init
func (s *Scanner) Init(src io.Reader) *Scanner
Init initializes a Scanner with a new source and returns s. Error is set to nil, ErrorCount is set to 0, Mode is set to GoTokens, and Whitespace is set to GoWhitespace.
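func (*Scanner) Next
func (s *Scanner) Next() rune
Next reads and returns the next Unicode character. It returns EOF at the end of the source. It reports a read error by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr. Next does not update the Scanner's Position field; use Pos() to get the current position.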
func (*Scanner) Peek
func (s *Scanner) Peek() rune
Peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at the last character of the source.
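func (*Scanner) Pos
func (s *Scanner) Pos() (pos Position)
Pos returns the position of the character immediately after the character or token returned by the last call to Next or Scan. Use the Scanner's Position field for the start position of the most recently scanned token.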
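func (*Scanner) Scan
func (s *Scanner) Scan() rune
Scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens t for which the respective Mode bit (1<<-t) is set. It returns EOF at the end of the source. It reports scanner errors (read and token errors) by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr.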
func (*Scanner) TokenText
func (s *Scanner) TokenText() string
TokenText returns the string corresponding to the most recently scanned token. Valid after calling Scan().