[Compiler Principles] The Complete C Grammar (Including the Preprocessor Grammar)
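Notes:
The following is based on the ISO C89 standard. A | at the start of a line means "or", and anything inside [] is optional. A | that is not at the start of a line is the literal character '|'. Where the grammar needs the literal characters [ and ], they are written as '[' and ']'.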
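This grammar is good for understanding the syntax of C, but it is not suitable for writing a compiler as it stands. It contains many left-recursive rules and common left factors, it is ambiguous (the if/else rules are one example), and it runs to roughly 500 lines. Most importantly, it is not enough on its own to parse C as a context-free language: whether an identifier is a typedef_name depends on the declarations that came before it. So the LL and LR methods from compiler theory cannot be applied to it directly T_T.
In other words, the grammar needs some special handling before it can be used to develop a C compiler.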
The start symbol of the grammar is the translation unit, translation_unit.
Typed entirely by hand.
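To make the context-sensitivity point concrete, here is a small sketch in which the same token sequence `T * name` is a declaration in one function and a multiplication in another. The function names are made up for this illustration; only the typedef behavior itself comes from the language.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;   /* at file scope, T is a typedef_name */

/* Here "T * p" is a declaration: p is a pointer to T (that is, to int). */
int star_as_declaration(int *target)
{
    T * p = target;
    assert(target != NULL);
    *p = 5;
    return *p;
}

/* Here the local variable T hides the typedef, so the same token shape
   "T * x" is parsed as a multiplicative_expression. */
int star_as_expression(void)
{
    int T = 6, x = 7;
    return T * x;
}
```

A parser cannot choose between the two readings from the tokens alone. It has to know what T currently names, which is why C compilers usually let the lexer or parser consult the symbol table.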

C89 Grammar
I. Lexical grammar
1. Tokens and preprocessing tokens
token :
keyword
identifier
constant
string_literal
operator
punctuator
preprocessing_token :
header_name
identifier
pp_number
character_constant
string_literal
operator
punctuator
any non-white-space character that is none of the above
2. Keywords
keyword:
auto double int struct
break else long switch
case enum register typedef
char extern return union
const float short unsigned
for signed void default
goto sizeof volatile do
if static while continue
3. Identifiers
identifier :
nondigit
identifier nondigit
identifier digit
nondigit :
_ a b c d e f g h i j k l m n o p q r s t
u v w x y z A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z
digit :
0 1 2 3 4 5 6 7 8 9
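The identifier rule above translates almost directly into a loop: the first character must be a letter or underscore, and every later character may also be a digit. The following is a minimal sketch; the function names are invented for this example, and it assumes a character set in which the letters of each case are contiguous (true for ASCII).

```c
#include <assert.h>
#include <stddef.h>

/* Underscore or a letter: the characters allowed in first position. */
static int is_letter_or_underscore(char c)
{
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if s matches the identifier rule, 0 otherwise.
   Keywords are not excluded here; the token rule lists them separately. */
int is_identifier(const char *s)
{
    size_t i;

    assert(s != NULL);
    if (!is_letter_or_underscore(s[0]))
        return 0;                      /* must not start with a digit */
    for (i = 1; s[i] != '\0'; i++)
        if (!is_letter_or_underscore(s[i]) && !is_digit(s[i]))
            return 0;
    return 1;
}
```

The left-recursive grammar rule and the left-to-right loop accept the same strings; the loop is simply the usual way to write it in a hand-made lexer.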
4. Constants
constant :
floating_constant
| integer_constant
| enumeration_constant
| character_constant
floating_constant :
fractional_constant [exponent_part] [floating_suffix]
| digit_sequence exponent_part [floating_suffix]
fractional_constant :
[digit_sequence] . digit_sequence
| digit_sequence .
exponent_part:
e [sign] digit_sequence
| E [sign] digit_sequence
sign:
+ -
digit_sequence:
digit
| digit_sequence digit
floating_suffix:
f l F L
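As a rough check of the floating constant rules, the sketch below accepts the C89 forms `1.5`, `.5`, `5.`, and `1e10`, each with an optional exponent and suffix where the grammar allows one. The function names are hypothetical and the code is only meant to mirror the rules, not to convert the value.

```c
#include <assert.h>
#include <stddef.h>

/* Number of decimal digits at the start of s. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* Returns 1 if s is a floating_constant, 0 otherwise. */
int is_floating_constant(const char *s)
{
    size_t whole, frac;
    int has_dot = 0, has_exp = 0;

    assert(s != NULL);
    whole = count_digits(s);
    s += whole;
    if (*s == '.') {
        has_dot = 1;
        s++;
        frac = count_digits(s);
        s += frac;
        if (whole == 0 && frac == 0)
            return 0;              /* a lone "." is not a constant */
    } else if (whole == 0) {
        return 0;
    }
    if (*s == 'e' || *s == 'E') {
        size_t exp_digits;
        has_exp = 1;
        s++;
        if (*s == '+' || *s == '-')
            s++;
        exp_digits = count_digits(s);
        if (exp_digits == 0)
            return 0;              /* the exponent needs at least one digit */
        s += exp_digits;
    }
    if (!has_dot && !has_exp)
        return 0;                  /* plain digits form an integer constant */
    if (*s == 'f' || *s == 'l' || *s == 'F' || *s == 'L')
        s++;
    return *s == '\0';
}
```

Note that a digit sequence without a dot is only a floating constant when an exponent follows, which is exactly why the grammar has two alternatives.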
integer_constant:
decimal_constant [integer_suffix]
| octal_constant [integer_suffix]
| hexadecimal_constant [integer_suffix]
decimal_constant:
nonzero_digit
| decimal_constant digit
octal_constant:
0
| octal_constant octal_digit
hexadecimal_constant:
0x hexadecimal_digit
| 0X hexadecimal_digit
| hexadecimal_constant hexadecimal_digit
nonzero_digit:
1 2 3 4 5 6 7 8 9
octal_digit:
0 1 2 3 4 5 6 7
hexadecimal_digit:
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer_suffix:
unsigned_suffix [long_suffix]
| long_suffix [unsigned_suffix]
unsigned_suffix :
u U
long_suffix :
l L
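The integer constant rules can be read as "pick the base from the prefix, consume the digits of that base, then accept at most one u and one l in either order". Here is a sketch under that reading; `integer_constant_base` is a made-up name, and it reports the base rather than the value to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

static int is_hex_digit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Returns 8, 10 or 16 for a valid integer_constant, 0 otherwise. */
int integer_constant_base(const char *s)
{
    int base, seen_u = 0, seen_l = 0;
    size_t i;

    assert(s != NULL);
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        i = 2;
        if (!is_hex_digit(s[i]))
            return 0;                  /* 0x needs at least one digit */
        while (is_hex_digit(s[i]))
            i++;
    } else if (s[0] == '0') {
        base = 8;                      /* a lone 0 is an octal constant */
        i = 1;
        while (s[i] >= '0' && s[i] <= '7')
            i++;
    } else if (s[0] >= '1' && s[0] <= '9') {
        base = 10;
        i = 1;
        while (s[i] >= '0' && s[i] <= '9')
            i++;
    } else {
        return 0;
    }
    /* integer_suffix: u and l in either order, each at most once */
    for (; s[i] != '\0'; i++) {
        if ((s[i] == 'u' || s[i] == 'U') && !seen_u)
            seen_u = 1;
        else if ((s[i] == 'l' || s[i] == 'L') && !seen_l)
            seen_l = 1;
        else
            return 0;
    }
    return base;
}
```

One consequence of the grammar that surprises people: `0` by itself is an octal constant, and `08` is not a valid constant at all.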
enumeration_constant :
identifier
character_constant:
' c_char_sequence '
| L' c_char_sequence '
c_char_sequence:
c_char
| c_char_sequence c_char
c_char:
any character in the source character set except the single quote ', the backslash \, and the new-line character \n
| escape_sequence
escape_sequence:
simple_escape_sequence
| octal_escape_sequence
| hexadecimal_escape_sequence
simple_escape_sequence:
\' \" \? \\ \a \b \f \n \r \t \v
octal_escape_sequence:
\ octal_digit
| \ octal_digit octal_digit
| \ octal_digit octal_digit octal_digit
hexadecimal_escape_sequence:
\x hexadecimal_digit
| hexadecimal_escape_sequence hexadecimal_digit
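The three kinds of escape sequence differ mainly in how many characters they consume: a simple escape is always two characters, an octal escape takes one to three octal digits, and a hexadecimal escape takes as many hex digits as follow. The decoder below is a sketch of that behavior; the name `decode_escape` and its return convention are assumptions made for this example, and overflow of long hexadecimal escapes is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes the escape sequence at the start of s. Stores its value in
   *value and returns the number of characters consumed, or 0 if s does
   not start with a valid escape sequence. */
size_t decode_escape(const char *s, unsigned long *value)
{
    size_t i;

    assert(s != NULL && value != NULL);
    if (s[0] != '\\')
        return 0;
    switch (s[1]) {
    case '\'': *value = '\''; return 2;
    case '"':  *value = '"';  return 2;
    case '?':  *value = '?';  return 2;
    case '\\': *value = '\\'; return 2;
    case 'a':  *value = '\a'; return 2;
    case 'b':  *value = '\b'; return 2;
    case 'f':  *value = '\f'; return 2;
    case 'n':  *value = '\n'; return 2;
    case 'r':  *value = '\r'; return 2;
    case 't':  *value = '\t'; return 2;
    case 'v':  *value = '\v'; return 2;
    default:   break;
    }
    if (s[1] >= '0' && s[1] <= '7') {
        /* octal: at most three digits */
        *value = 0;
        for (i = 1; i <= 3 && s[i] >= '0' && s[i] <= '7'; i++)
            *value = *value * 8 + (unsigned long)(s[i] - '0');
        return i;
    }
    if (s[1] == 'x') {
        /* hexadecimal: one or more digits, no upper limit in the grammar */
        *value = 0;
        for (i = 2; hex_value(s[i]) >= 0; i++)
            *value = *value * 16 + (unsigned long)hex_value(s[i]);
        return i > 2 ? i : 0;
    }
    return 0;
}
```

For example, in `\1234` only `\123` belongs to the escape; the `4` is an ordinary character that follows it.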
5. String literals
string_literal:
" [s_char_sequence] "
| L" [s_char_sequence] "
s_char_sequence:
s_char
| s_char_sequence s_char
s_char:
any character in the source character set except the double quote ", the backslash \, and the new-line character \n
| escape_sequence
6. Operators
operator:
[ ] ( ) . -> ++ -- & * + - ~ ! sizeof / % << >> < > <= >= == != ^ | && || ? :
= *= /= %= += -= <<= >>= &= ^= |= , # ##
7. Punctuators
punctuator:
[ ] ( ) { } * , : = ; ... #
8. Header names
header_name:
< h_char_sequence >
| " q_char_sequence "
h_char_sequence:
h_char
| h_char_sequence h_char
h_char:
any character in the source character set except the new-line character \n and the greater-than sign >
q_char_sequence:
q_char
| q_char_sequence q_char
q_char:
any character in the source character set except the new-line character \n and the double quote "
9. Preprocessing numbers
pp_number:
digit
| .digit
| pp_number digit
| pp_number nondigit
| pp_number e sign
| pp_number E sign
| pp_number .
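A preprocessing number is deliberately looser than a real numeric constant. The sketch below measures the longest pp_number at the start of a string, assuming the C89 form in which a pp_number may be continued by any letter or underscore as well as by digits, dots, and `e`/`E` followed by a sign. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

static int is_letter_or_underscore(char c)
{
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Length of the longest pp_number at the start of s, or 0 if none. */
size_t pp_number_length(const char *s)
{
    size_t i;

    assert(s != NULL);
    if (is_digit(s[0]))
        i = 1;                               /* digit */
    else if (s[0] == '.' && is_digit(s[1]))
        i = 2;                               /* . digit */
    else
        return 0;
    for (;;) {
        if ((s[i] == 'e' || s[i] == 'E') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;                          /* pp_number e sign */
        else if (is_digit(s[i]) || is_letter_or_underscore(s[i]) ||
                 s[i] == '.')
            i++;                             /* digit, letter, _ or . */
        else
            break;
    }
    return i;
}
```

The well-known oddity is `0xe+1`: the preprocessor sees one pp_number of five characters, not `0xe` plus `1`, because `e+` is absorbed by the rule.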
II. Syntax
1. Expressions
primary_expression:
identifier
| constant
| string_literal
| (expression)
postfix_expression:
primary_expression
| postfix_expression '[' expression ']'
| postfix_expression ([argument_expression_list])
| postfix_expression . identifier
| postfix_expression -> identifier
| postfix_expression ++
| postfix_expression --
argument_expression_list:
assignment_expression
| argument_expression_list , assignment_expression
unary_expression:
postfix_expression
| ++ unary_expression
| -- unary_expression
| unary_operator cast_expression
| sizeof unary_expression
| sizeof (type_name)
unary_operator:
& * + - ~ !
cast_expression:
unary_expression
| (type_name) cast_expression
multiplicative_expression:
cast_expression
| multiplicative_expression * cast_expression
| multiplicative_expression / cast_expression
| multiplicative_expression % cast_expression
additive_expression:
multiplicative_expression
| additive_expression + multiplicative_expression
| additive_expression - multiplicative_expression
shift_expression:
additive_expression
| shift_expression << additive_expression
| shift_expression >> additive_expression
relational_expression:
shift_expression
| relational_expression < shift_expression
| relational_expression <= shift_expression
| relational_expression > shift_expression
| relational_expression >= shift_expression
equality_expression:
relational_expression
| equality_expression == relational_expression
| equality_expression != relational_expression
and_expression:
equality_expression
| and_expression & equality_expression
exclusive_or_expression:
and_expression
| exclusive_or_expression ^ and_expression
inclusive_or_expression:
exclusive_or_expression
| inclusive_or_expression | exclusive_or_expression
logical_and_expression:
inclusive_or_expression
| logical_and_expression && inclusive_or_expression
logical_or_expression:
logical_and_expression
| logical_or_expression || logical_and_expression
conditional_expression:
logical_or_expression
| logical_or_expression ? expression : conditional_expression
assignment_expression:
conditional_expression
| unary_expression assignment_operator assignment_expression
assignment_operator:
= *= /= %= += -= <<= >>= &= ^= |=
expression:
assignment_expression
| expression , assignment_expression
constant_expression:
conditional_expression
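The expression rules are left-recursive, which a recursive-descent parser cannot use as written. A common workaround is to turn each left-recursive rule into a loop, which keeps the operators left-associative. The sketch below does this for a tiny subset (decimal numbers, parentheses, `*`, `+`, and `-`); the `Parser` type and all function names are hypothetical, and error handling is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser state: just a cursor into the input text. */
typedef struct {
    const char *p;
} Parser;

static long parse_additive(Parser *ps);

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

/* primary: decimal digits | ( additive ) */
static long parse_primary(Parser *ps)
{
    long value = 0;

    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        value = parse_additive(ps);
        skip_spaces(ps);
        if (*ps->p == ')')
            ps->p++;
        return value;
    }
    while (isdigit((unsigned char)*ps->p)) {
        value = value * 10 + (*ps->p - '0');
        ps->p++;
    }
    return value;
}

/* multiplicative: primary { * primary }
   The loop replaces the left-recursive alternative. */
static long parse_multiplicative(Parser *ps)
{
    long value = parse_primary(ps);

    for (;;) {
        skip_spaces(ps);
        if (*ps->p != '*')
            return value;
        ps->p++;
        value *= parse_primary(ps);
    }
}

/* additive: multiplicative { (+ or -) multiplicative } */
static long parse_additive(Parser *ps)
{
    long value = parse_multiplicative(ps);

    for (;;) {
        char op;

        skip_spaces(ps);
        op = *ps->p;
        if (op != '+' && op != '-')
            return value;
        ps->p++;
        if (op == '+')
            value += parse_multiplicative(ps);
        else
            value -= parse_multiplicative(ps);
    }
}

/* Evaluates an expression from the subset described above. */
long evaluate(const char *text)
{
    Parser ps;

    assert(text != NULL);
    ps.p = text;
    return parse_additive(&ps);
}
```

Because each level calls the next tighter-binding level, the nesting of the functions encodes operator precedence in the same way the chain of grammar rules does.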
2. Declarations
declaration:
declaration_specifier [init_declarator_list] ;
declaration_specifier:
storage_class_specifier [declaration_specifier]
| type_specifier [declaration_specifier]
| type_qualifier [declaration_specifier]
init_declarator_list:
init_declarator
| init_declarator_list , init_declarator
init_declarator:
declarator
| declarator = initializer
storage_class_specifier:
typedef
| extern
| static
| auto
| register
type_specifier:
void
| char
| short
| int
| long
| float
| double
| signed
| unsigned
| struct_or_union_specifier
| enum_specifier
| typedef_name
struct_or_union_specifier:
struct_or_union [identifier] { struct_declaration_list }
| struct_or_union identifier
struct_or_union:
struct
| union
struct_declaration_list:
struct_declaration
| struct_declaration_list struct_declaration
struct_declaration:
specifier_qualifier_list struct_declarator_list ;
specifier_qualifier_list:
type_specifier [specifier_qualifier_list]
| type_qualifier [specifier_qualifier_list]
struct_declarator_list:
struct_declarator
| struct_declarator_list , struct_declarator
struct_declarator:
declarator
| [declarator] : constant_expression
enum_specifier:
enum [identifier] { enumerator_list }
| enum identifier
enumerator_list:
enumerator
| enumerator_list , enumerator
enumerator:
enumeration_constant
| enumeration_constant = constant_expression
type_qualifier:
const
| volatile
parameter_declaration:
declaration_specifier declarator
| declaration_specifier [abstract_declarator]
declarator:
[pointer] direct_declarator
abstract_declarator:
pointer
| [pointer] direct_abstract_declarator
direct_declarator:
identifier
| (declarator)
| direct_declarator '[' [constant_expression] ']'
| direct_declarator (parameter_type_list)
| direct_declarator ( [identifier_list] )
direct_abstract_declarator:
(abstract_declarator)
| [direct_abstract_declarator] '[' [constant_expression] ']'
| [direct_abstract_declarator] ( [parameter_type_list] )
pointer:
* [type_qualifier_list]
| * [type_qualifier_list] pointer
type_qualifier_list:
type_qualifier
| type_qualifier_list type_qualifier
parameter_type_list:
parameter_list
| parameter_list , ...
parameter_list:
parameter_declaration
| parameter_list , parameter_declaration
identifier_list:
identifier
| identifier_list , identifier
type_name:
specifier_qualifier_list [abstract_declarator]
typedef_name:
identifier
initializer:
assignment_expression
| {initializer_list}
| {initializer_list,}
initializer_list:
initializer
| initializer_list, initializer
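The declarator rules are easiest to follow on a concrete declaration. The sketch below declares an array of pointers to functions and reads it inside-out using the rules of this section; the function and variable names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* Reading the declarator inside-out:
     table              identifier
     table[2]           direct_declarator '[' constant_expression ']'
     *table[2]          pointer direct_declarator
     (*table[2])        ( declarator )
     (*table[2])(int)   direct_declarator ( parameter_type_list )
   so table is an array of 2 pointers to functions that take an int
   and return an int. */
static int (*table[2])(int) = { twice, square };

int apply_entry(size_t index, int x)
{
    assert(index < 2);
    return table[index](x);
}

/* The type_name inside sizeof uses an abstract_declarator: the same
   shape as above with the identifier removed. */
size_t table_size_in_bytes(void)
{
    return sizeof(int (*[2])(int));
}
```

The abstract declarator in `table_size_in_bytes` shows why the grammar needs a separate set of rules for type names: there is no identifier to anchor the reading, only the brackets and parentheses around where it would have been.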
3. Statements
statement:
labeled_statement
| compound_statement
| expression_statement
| selection_statement
| iteration_statement
| jump_statement
labeled_statement:
identifier : statement
| case constant_expression : statement
| default : statement
compound_statement:
{[declaration_list] [statement_list]}
declaration_list:
declaration
| declaration_list declaration
statement_list:
statement
| statement_list statement
expression_statement:
[expression] ;
selection_statement:
if(expression) statement
| if(expression) statement else statement
| switch(expression) statement
iteration_statement:
while(expression) statement
| do statement while(expression) ;
| for([expression] ; [expression] ; [expression]) statement
jump_statement:
goto identifier ;
| continue ;
| break ;
| return [expression] ;
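The two if alternatives of selection_statement make the grammar ambiguous for nested ifs with a single else. C resolves this by attaching the else to the nearest if that does not yet have one. The following sketch (with a made-up function name) shows the effect; the indentation reflects how the compiler actually groups the code, and some compilers warn about the missing braces.

```c
/* Returns 2 if the inner "then" branch runs, 1 if the else branch runs,
   and 0 if neither runs. */
int dangling_else(int a, int b)
{
    int result = 0;

    if (a)
        if (b)
            result = 2;
        else
            result = 1;     /* belongs to "if (b)", not to "if (a)" */
    return result;
}
```

If the else were bound to the outer if instead, `dangling_else(0, 0)` would return 1 rather than 0. A parser generator has to be told about this choice explicitly, for example by preferring shift over reduce at the else token.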
4. External definitions
translation_unit:
external_declaration
| translation_unit external_declaration
external_declaration:
function_definition
| declaration
function_definition:
[declaration_specifier] declarator [declaration_list] compound_statement
5. Preprocessing directives
preprocessing_file:
[group]
group:
group_part
| group group_part
group_part:
[pp_tokens] new_line
| if_section
| control_line
if_section:
if_group [elif_groups] [else_group] endif_line
if_group:
# if constant_expression new_line [group]
| # ifdef identifier new_line [group]
| # ifndef identifier new_line [group]
elif_groups:
elif_group
| elif_groups elif_group
elif_group:
# elif constant_expression new_line [group]
else_group:
# else new_line [group]
endif_line:
# endif new_line
control_line:
# include pp_tokens new_line
| # define identifier replacement_list new_line
| # define identifier lparen [identifier_list] ) replacement_list new_line
| # undef identifier new_line
| # line pp_tokens new_line
| # error [pp_tokens] new_line
| # pragma [pp_tokens] new_line
| # new_line
lparen:
a left parenthesis ( that is not preceded by white space
replacement_list:
[pp_tokens]
pp_tokens:
preprocessing_token
| pp_tokens preprocessing_token
new_line:
the new-line character \n
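The lparen rule is what separates the two # define forms in control_line: a single space before the parenthesis changes a function-like macro into an object-like one. A minimal sketch, with macro and function names chosen only for this example:

```c
/* '(' directly follows the name, so it is an lparen:
   INC is a function-like macro with one parameter. */
#define INC(x) ((x) + 1)

/* White space before '(' means there is no lparen: TWICE_X is an
   object-like macro whose replacement_list is the tokens ( x ) * 2. */
#define TWICE_X (x) * 2

int use_function_like(int value)
{
    return INC(value);      /* expands to ((value) + 1) */
}

int use_object_like(int x)
{
    return TWICE_X;         /* expands to (x) * 2, using the local x */
}
```

In the second macro the `(x)` is not a parameter list at all; it is ordinary replacement text that happens to mention whatever `x` is in scope at the point of use.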