+++
title = "UCG Formal Grammar"
slug = "grammar"
weight = 1
sort_by = "weight"
in_search_index = true
+++
UCG Formal Grammar
------------------------

## Definitions

* WS is any non-visible UTF-8 whitespace character.
* DIGIT is any ASCII digit character.
* VISIBLE_CHAR is any visible UTF-8 character.
* ASCII_CHAR is any visible ASCII letter character.
* UTF8_CHAR is any UTF-8 character, including whitespace.

## Tokens

```
ws: WS ;
dot: "." ;
quot: '"' ;
pipe: '|' ;
percent: "%" ;
star: "*" ;
plus: "+" ;
minus: "-" ;
slash: "/" ;
equal: "=" ;
gtequal: ">=" ;
ltequal: "<=" ;
equalequal: "==" ;
gt: ">" ;
lt: "<" ;
fatcomma: "=>" ;
comma: "," ;
semicolon: ";" ;
integer: DIGIT+ ;
lbrace: "{" ;
rbrace: "}" ;
lbracket: "[" ;
rbracket: "]" ;
lparen: "(" ;
rparen: ")" ;
bareword: ASCII_CHAR, { DIGIT | VISIBLE_CHAR | "_" } ;
let_keyword: "let" [WS] ;
import_keyword: "import" [WS] ;
as_keyword: "as" [WS] ;
macro_keyword: "macro" ;
module_keyword: "module" ;
mod_keyword: "mod" ;
out_keyword: "out" ;
assert_keyword: "assert" ;
null_keyword: "NULL" ;
in_keyword: "in" ;
escaped: "\", VISIBLE_CHAR ;
str: quot, { escaped | UTF8_CHAR }, quot ;
```

Whitespace tokens are discarded before the remaining tokens are parsed into the AST.

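The token rules above can be sketched as a small regex-based tokenizer. This is only an illustration under stated assumptions, not UCG's actual lexer: `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical names, only a subset of the tokens is covered, barewords are simplified to ASCII letters, digits, and `_`, and the spellings `==` for `equalequal` and `;` for `semicolon` are assumed.

```python
import re

# Hypothetical token table for a subset of the grammar's tokens.
# Order matters: multi-character operators come before their prefixes.
TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("str", r'"(?:\\.|[^"\\])*"'),           # quoted string with backslash escapes
    ("integer", r"[0-9]+"),
    ("bareword", r"[A-Za-z][A-Za-z0-9_]*"),  # simplified to ASCII (assumption)
    ("fatcomma", r"=>"),
    ("equalequal", r"=="),                   # assumed spelling
    ("gtequal", r">="),
    ("ltequal", r"<="),
    ("equal", r"="),
    ("gt", r">"),
    ("lt", r"<"),
    ("dot", r"\."),
    ("comma", r","),
    ("plus", r"\+"),
    ("minus", r"-"),
    ("star", r"\*"),
    ("slash", r"/"),
    ("percent", r"%"),
    ("pipe", r"\|"),
    ("lbrace", r"\{"),
    ("rbrace", r"\}"),
    ("lbracket", r"\["),
    ("rbracket", r"\]"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("semicolon", r";"),                     # assumed spelling
]
KEYWORDS = {"let", "import", "as", "macro", "module", "mod",
            "out", "assert", "NULL", "in"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs, with whitespace discarded."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "ws":
            continue  # whitespace never reaches the parser
        if kind == "bareword" and text in KEYWORDS:
            kind = text.lower() + "_keyword"
        tokens.append((kind, text))
    return tokens
```

For example, `tokenize('let x = 1 + 2;')` yields seven tokens, starting with `("let_keyword", "let")` and ending with `("semicolon", ";")`. No `ws` entries appear in the result.
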
## Simple Scalar Values

```
float: (DIGIT+, dot, { DIGIT }) | (dot, DIGIT+) ;
number: ["-" | "+"](float | integer) ;
```

## Complex Values

### Lists

```
field: bareword | str ;
list_elements: expr, { comma, expr }, [comma] ;
list: lbracket, [ list_elements ], rbracket ;
```

### Tuples

```
field_pair: field, equal, expr ;
field_list: field_pair, { comma, field_pair }, [comma] ;
tuple: lbrace, [ field_list ], rbrace ;
```

## Expressions

### Simple Expressions

```
simple_expr: literal | bareword ;
```

#### Literals

```
literal: str | integer | float | list | tuple | null_keyword ;
```

### Complex Expressions

#### Grouped Expression

```
grouped: lparen, expr, rparen ;
```

#### Macro Definition

```
arglist: expr, { comma, expr }, [comma] ;
macro_def: macro_keyword, lparen, [ arglist ], rparen, fatcomma, tuple ;
```

#### Module Definition

```
module_def: module_keyword, tuple, fatcomma, lbrace, [ { statement } ], rbrace ;
```

#### Copy and Call Expression

```
copy_expression: bareword, tuple ;
call_expression: bareword, lparen, [ arglist ], rparen ;
```

#### Format Expression

```
format_expr: str, percent, lparen, [ arglist ], rparen ;
```

#### Non Operator Expression

```
non_operator_expr: literal
                 | grouped
                 | macro_def
                 | module_def
                 | format_expr
                 | copy_expression
                 | call_expression ;
```

#### Operator Expressions

```
sum_op: plus | minus ;
product_op: star | slash ;
compare_op: equalequal | gtequal | ltequal | gt | lt | in_keyword ;
binary_op: sum_op | product_op | dot | compare_op ;
binary_expr: non_operator_expr, binary_op, expr ;
```

Operator expressions are evaluated in a defined precedence order, from tightest to loosest binding:

* The `dot` operator binds the tightest of all the operators.
* The `product_op` operators bind the next tightest.
* The `sum_op` operators bind the next tightest after that.
* The `compare_op` operators bind the least tightly.

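The precedence order above can be illustrated with a small precedence-climbing sketch. It is not UCG's parser: `PRECEDENCE` and `parse_binary` are hypothetical names, operands are treated as opaque strings, operators are assumed to be left-associative (the grammar does not state associativity), and `==` is assumed as the spelling of `equalequal`.

```python
# Binding power per operator, following the precedence list:
# dot > product_op > sum_op > compare_op.
PRECEDENCE = {
    ".": 4,
    "*": 3, "/": 3,
    "+": 2, "-": 2,
    "==": 1, ">=": 1, "<=": 1, ">": 1, "<": 1, "in": 1,
}


def parse_binary(tokens, min_prec=1):
    """Parse a flat token list into nested (op, left, right) tuples.

    Consumes tokens from the front of the list, so the list is mutated.
    """
    left = tokens.pop(0)  # an operand, kept as an opaque string
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Assumed left-associativity: the right-hand side may only
        # absorb operators that bind tighter than `op`.
        right = parse_binary(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left
```

Parsing `["a", ".", "b", "+", "c", "*", "d", "==", "e"]` gives `("==", ("+", (".", "a", "b"), ("*", "c", "d")), "e")`. The `dot` pair groups first, then the product, then the sum, and the comparison ends up at the root.
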
### Any Expression

```
expr: binary_expr | non_operator_expr ;
```

## Statements

```
let_statement: let_keyword, bareword, equal, expr ;
import_statement: import_keyword, str, as_keyword, bareword ;
out_statement: out_keyword, bareword, str ;
assert_statement: assert_keyword, pipe, { statement }, pipe ;
simple_statement: expr ;

statement: ( let_statement
           | import_statement
           | out_statement
           | assert_statement
           | simple_statement ), semicolon ;
```

## UCG File

```
grammar: { statement } ;
```