mirror of
https://github.com/zaphar/ucg.git
synced 2025-07-21 18:10:42 -04:00
This makes it both easier to correctly write a select expression as well as easier to parse and report syntax errors.
5.0 KiB
5.0 KiB
+++ title = "Formal Grammar" slug = "grammar" weight = 1 sort_by = "weight" in_search_index = true +++ UCG Formal Grammar
Definitions
- WS is any non-visible utf-8 whitespace.
- DIGIT is any ascii number character.
- VISIBLE_CHAR is any visible utf-8 character.
- ASCII_CHAR is any visible ascii letter character.
- UTF8_CHAR is any utf8 character including ws.
Tokens
ws: WS ;
dot: ".";
quot: '"' ;
pipe: '|' ;
percent: "%" ;
star: "*" ;
plus: "+" ;
minus: "-" ;
slash: "/" ;
equal: "=" ;
gtequal: ">=" ;
ltequal: "<=" ;
equalequal: "<=" ;
gt: ">" ;
lt: "<" ;
fatcomma: "=>" ;
comma: "," ;
integer: DIGIT+ ;
lbrace: "{" ;
rbrace: "}" ;
lbracket: "[" ;
rbracket: "]" ;
lparen: "(" ;
rparen: ")" ;
bareword: ASCII_CHAR, { DIGIT | VISIBLE_CHAR | "_" } ;
let_keyword: "let" ;
import_keyword: "import" ;
include_keyword: "include" ;
as_keyword: "as" ;
func_keyword: "func" ;
select_keyword: "select" ;
map_keyword: "map" ;
reduce_keyword: "map" ;
filter_keyword: "filter" ;
module_keyword: "module" ;
mod_keyword: "mod" ;
out_keyword: "out" ;
convert_keyword: "convert" ;
assert_keyword: "assert" ;
fail_keyword: "fail" ;
trace_keyword: "TRACE" ;
null_keyword: "NULL" ;
in_keyword: "in" ;
is_keyword: "in" ;
not_keyword: "module" ;
escaped: "\", VISIBLE_CHAR ;
str: quot, { escaped | UTF8_CHAR }, quot ;
float: (DIGIT+, dot, { DIGIT }) | (dot, DIGIT+) ;
number: ["-" | "+"](float | integer) ;
Whitespace is discarded before parsing the rest of the AST.
Complex Values
Lists
field: bareword | str ;
list_elements: expr, (comma, expr)*, [comma] ;
list: lbracket, [ list_elements ], rbracket ;
Tuples
field_pair: field, equal, expr ;
field_list: field_pair, { comma, field_pair }, [comma]
tuple: lbrace, [ field_list ], rbrace;
Expressions
Simple Expressions
simple_expr: literal | bareword ;
Literals
literal: str | integer | float | list | tuple | null_keyword;
Complex Expressions
Grouped Expression
grouped: lparen, expr, rparen ;
Select expressions
select_expr: select_keyword, lparen, expr, [comma, expr], fatcomma, tuple ;
Function Definition
arglist: expr, { comma, expr }, [comma] ;
func_def: func_keyword, lparen, [ arglist ], rparen, fatcomma, tuple ;
Module Definition
module_def: module_keyword, tuple, fatcomma, [lparen, expr, rparen], lbrace, [ { statement } ], rbrace ;
Copy and Call Expression
copy_expr: bareword, tuple ;
call_expr: bareword, lparen, [arglist], rparen ;
Format Expression
format_arg_list: lparen, [arglist], rparen ;
foramt_expr_arg: expression ;
format_expr: str, percent, (format_arg_list | format_expr_arg) ;
Functional processing expressions
func_op_kind: map_keyword | filter_keyword ;
map_or_filter_expr: func_op_kind, lparen, expr, expr, rparen ;
reduce_expr: reduce_keyword, lparen, expr, expr, expr, rparen ;
processing_expr: map_or_filter_expr | reduce_expr
Range Expression
range_expr: expr, ':', [int, ':'], expr ;
Include Expression
include_expr: include_keyword, bareword, str ;
Import expression
import_expr: import_keyword, str ;
Fail expressions
fail_expr: fail_keyword, (str | format_expr) ;
Not Expression
not_expr: not_keyword, expr ;
Not Expression
trace_expr: trace_keyword, expr ;
Non Operator Expression
non_operator_expr: literal
| grouped
| select_def
| import_expr
| funcdef
| module_def
| fail_expr
| not_expr
| trace_expr
| format_expr
| range_expr
| include_expr
| copy_expr
| processing_expr
| call_expr ;
Operator Expressions
sum_op: plus | minus ;
product_op: start | slash ;
compare_op: equalequal | gtequal | ltequal | gt | lt | in_keyword | is_keyword ;
binary_op: sum_op | product_op | dot | compare_op ;
binary_expr: non_operator_expr, binary_op, expr ;
Operator expressions have a defined precedence order for evaluation:
- First the
dot
operator binds the tightest of all the operators. - Next the
product_op
is the next tightest binding of the operators. - Next the
sum_op
is the next tightest binding of the operators. - And lastly the
compare_op
is the least tightest binding of the operators.
Any Expression
expr: binary_expr | non_operator_expr ;
Statements
let_statement: let_keyword, bareword, equal, expr ;
out_statement: out_keyword, bareword, str ;
convert_statement: convert_keyword, bareword, str ;
assert_statement: assert_keyword, pipe, { statement }, pipe ;
simple_statement: expr ;
statement: ( let_statement
| out_statement
| convert_statement
| assert_statement
| simple_statement ), semicolon ;
UCG File
grammar: { statement } ;