Whitespace
Lexer:
WHITESPACE :
      U+0009
    | U+000D
    | U+0020
LINE_ENDING :
      U+000A
LINE_TERMINATOR :
      \
EOL :
      WHITESPACE* LINE_TERMINATOR? WHITESPACE* LINE_ENDING
The following is a list of valid whitespace characters.
U+0009: horizontal tab
U+000D: carriage return
U+0020: space
Whitespace characters serve only to separate tokens in the grammar and have no semantic significance. This differs from programming languages such as Python or markup languages like YAML, where indentation is significant.
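As a rough illustration of the point above, the following Python sketch splits a line on the three whitespace code points and shows that extra indentation or spacing does not change the resulting chunks. The helper name `split_on_whitespace` is hypothetical, and a real lexer would recognise tokens by their own rules rather than by splitting on whitespace alone.

```python
import re

# The three whitespace code points listed above:
# U+0009 (tab), U+000D (carriage return), U+0020 (space).
WHITESPACE_RUN = re.compile(r"[\t\r ]+")


def split_on_whitespace(line: str) -> list[str]:
    """Split one line into whitespace-separated chunks (hypothetical helper)."""
    return [chunk for chunk in WHITESPACE_RUN.split(line) if chunk]


# Different indentation and spacing, same sequence of chunks.
print(split_on_whitespace("int x = 0"))
print(split_on_whitespace("\t\tint   x\t=  0\r"))
```

Both calls yield the same four chunks, which is what "no semantic significance" means in practice.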
Line Endings
Both LF and CRLF are valid line endings, because the Creation Kit compilers separate lines on U+000A (line feed), while U+000D (carriage return) is treated as whitespace.
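The rule above can be sketched in Python as follows: lines are split on U+000A only, and any U+000D left at the end of a line is then discarded along with the other whitespace. The function names are hypothetical and this is not the Creation Kit compilers' actual implementation.

```python
def split_lines(source: str) -> list[str]:
    """Split on U+000A (line feed) only; U+000D stays inside the line."""
    return source.split("\n")


def trim_whitespace(line: str) -> str:
    """Drop leading and trailing tab, carriage return, and space characters."""
    return line.strip("\t\r ")


lf_source = "int x = 0\nint y = 1"
crlf_source = "int x = 0\r\nint y = 1"

lf_lines = [trim_whitespace(line) for line in split_lines(lf_source)]
crlf_lines = [trim_whitespace(line) for line in split_lines(crlf_source)]

# Both sources produce the same two lines once whitespace is trimmed.
print(lf_lines == crlf_lines)
```

Because the carriage return is ordinary whitespace, no special case for CRLF is needed.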
Encoding
Files should be encoded using UTF-8. However, because the Creation Kit compilers allow only a limited character set, other encodings that share the following code points are also allowed (UTF-16-LE, UTF-16-BE, UTF-32):
- [0-9] must start at 0x30 (0) and end at 0x39 (9)
- [A-Z] must start at 0x41 (A) and end at 0x5A (Z)
- [a-z] must start at 0x61 (a) and end at 0x7A (z)
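One possible reading of "share the following code points" is that each character in the three ranges, once encoded, has a code unit whose numeric value equals the listed code point. The Python sketch below checks this for several codecs. The function name is hypothetical, the choice of `utf-32-le` and `utf-32-be` is an assumption (the text only says UTF-32), and `cp037` (an EBCDIC codec) is included purely as a counterexample.

```python
# Ranges from the list above: (first code point, last code point).
REQUIRED_RANGES = [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]

# Byte order used to read one encoded character back as an integer.
BYTE_ORDER = {
    "utf-8": "big",        # single byte per character here, order is irrelevant
    "utf-16-le": "little",
    "utf-16-be": "big",
    "utf-32-le": "little",  # assumption: explicit-endian UTF-32 variants
    "utf-32-be": "big",
    "cp037": "big",         # EBCDIC, expected to fail the check
}


def shares_required_code_points(encoding: str) -> bool:
    """Return True if every required character encodes to its own code point value."""
    order = BYTE_ORDER[encoding]
    for first, last in REQUIRED_RANGES:
        for code_point in range(first, last + 1):
            encoded = chr(code_point).encode(encoding)
            if int.from_bytes(encoded, order) != code_point:
                return False
    return True


for name in BYTE_ORDER:
    print(name, shares_required_code_points(name))
```

The Unicode encodings pass because they all map these characters to the same values. In the EBCDIC codec the digits start at 0xF0, so the check fails.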
Line Terminators
For [Statements] to be separated correctly, a language must define a separator. Languages such as C++, C#, or Java use a semicolon (;) to separate statements, but Papyrus uses line endings instead, so only one statement is allowed per line. A single statement can still be split across multiple lines using the backslash character (\).
int x = 0 +\
1
The backslash may be followed only by whitespace (any amount, including none) and then a line ending. The following is not valid Papyrus code, because a comment follows the backslash:
int x = 0+\ ; comment after line terminator is not allowed
1
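The continuation rule can be sketched in Python as a pre-pass that joins physical lines into logical lines. This is a minimal illustration under stated assumptions, not the compilers' implementation: the name `join_continued_lines` is hypothetical, and the sketch ignores string literals and comments, which a real lexer would have to handle before looking for a backslash.

```python
import re

# A backslash followed only by whitespace (tab, carriage return, space)
# up to the end of the physical line marks a continuation.
CONTINUATION = re.compile(r"\\[\t\r ]*\Z")


def join_continued_lines(source: str) -> list[str]:
    """Join physical lines that end in a line terminator into logical lines."""
    logical = []
    pending = []
    for physical in source.split("\n"):
        match = CONTINUATION.search(physical)
        if match:
            # Keep the text before the backslash and continue on the next line.
            pending.append(physical[:match.start()])
        else:
            logical.append("".join(pending) + physical)
            pending = []
    if pending:
        # The source ended while a continuation was still open.
        logical.append("".join(pending))
    return logical


valid = "int x = 0 +\\\n1"
invalid = "int x = 0+\\ ; comment after line terminator is not allowed\n1"

print(join_continued_lines(valid))
print(join_continued_lines(invalid))
```

For the valid input the two physical lines become one logical line. For the invalid input the backslash is not the last non-whitespace character, so nothing is joined and the stray backslash is left in place for a later stage to report as an error.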