yacc

Actions

An action is a set of C statements enclosed in curly braces, '{' and '}'. The user can associate one or more actions with each grammar rule; these actions are performed when the rule is recognized. An action can appear anywhere in the list of symbols defining a rule, including before the first symbol; usually, actions follow the final symbol in the definition. Actions can return values, obtain the values returned by previous actions, and use values for tokens returned by the lexical analyzer. They can also carry out other tasks that can be programmed in C, such as input and output operations, calls to subroutines, and modification of variables.

The following are examples of grammar rules with actions:

   A       :  '('  B  ')'
           {
                  hello( 1, "abc" );
           }
           ;

and:

   XXX     :  YYY  ZZZ
           {
                   printf("a message\n");
                   flag = 25;
           }
           ;

Values of symbols

Each symbol in a grammar rule, including the one to the left of the colon, may have some value associated with it. For a terminal symbol, this value can be assigned by the lexical analyzer (for example, the literal value of an identifier). Non-terminal symbols recognized by the parser can have values associated with them by parser actions. These values can be numbers, text, or another kind of data structure.
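
As a rough sketch of how a lexical analyzer might supply a token's value, the hand-written yylex below stores the value in the global yylval before returning the token code, which is the conventional yacc interface. The token codes NUMBER and IDENT, the layout of YYSTYPE, and the set_input helper are assumptions made for this illustration; in a real project yacc generates the token codes and YYSTYPE.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; yacc normally generates these in y.tab.h. */
enum { NUMBER = 257, IDENT = 258 };

/* Hypothetical value type; a %union declaration would normally produce it. */
typedef union {
    int  ival;       /* value of a NUMBER token */
    char text[32];   /* spelling of an IDENT token */
} YYSTYPE;

YYSTYPE yylval;                 /* read by the parser as the token's value */
static const char *input = "";  /* assumed input source: a C string */

/* Hypothetical helper: point the lexer at a new input string. */
void set_input(const char *s) { input = s; }

/* Return the next token code and store its value in yylval; 0 means end of input. */
int yylex(void)
{
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        char *end;
        yylval.ival = (int)strtol(input, &end, 10);
        input = end;
        return NUMBER;
    }
    if (isalpha((unsigned char)*input)) {
        size_t n = 0;
        while (isalnum((unsigned char)*input) && n < sizeof yylval.text - 1)
            yylval.text[n++] = *input++;
        yylval.text[n] = '\0';
        while (isalnum((unsigned char)*input))   /* drop any overlong tail */
            input++;
        return IDENT;
    }
    return (unsigned char)*input++;   /* single-character literal such as '(' */
}
```

An action that refers to a NUMBER token through $n would then see the integer stored in yylval.ival at the time the token was returned.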

The dollar-sign symbol ($) is used in actions to access the value of a symbol. The pseudo-variable $$ represents the value returned by the action. For example, the following action returns the value 1:

   {  $$ = 1;  }

If the action follows the final symbol in the definition, then $$ is the value associated with the symbol to the left of the colon, and is that symbol's value when it appears to the right of the colon in another grammar rule.

To obtain the values returned by previous actions and the lexical analyzer, the action can use the pseudo-variables $1, $2, ... $n. $n refers to the value of the nth symbol or action to the right of the colon. In the following rule, $2 is the value returned by C, and $3 is the value returned by D:

   A   :  B  C  D   ;

Consider the rule:

   expr   :   '('  expr  ')'   ;

The value returned by this rule is expected to be the value of the expr within the parentheses. Because the first component of the rule is the literal left parenthesis, the inner expr is the second component, and the desired result can be obtained with the following action:

   expr   :    '('  expr  ')'
          {
               $$ = $2 ;
          }
          ;
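
One way to picture $$ and $n is as positions on a stack of values that the parser keeps alongside its parse stack. The sketch below is a simplified model written for this explanation, not the code yacc generates: the names vstack, shift, dollar, and reduce are hypothetical, and values are plain ints.

```c
#include <assert.h>

#define STACK_MAX 64

static int vstack[STACK_MAX];  /* one value per symbol recognized so far */
static int depth = 0;          /* number of values currently on the stack */

/* Shift: push the value the lexical analyzer supplied for a token. */
void shift(int value)
{
    assert(depth < STACK_MAX);
    vstack[depth++] = value;
}

/* $n inside a rule whose right-hand side has len symbols (1 <= n <= len). */
int dollar(int n, int len)
{
    assert(n >= 1 && n <= len && len <= depth);
    return vstack[depth - len + n - 1];
}

/* Reduce: pop the len right-hand-side values and push $$ in their place. */
void reduce(int len, int result)
{
    assert(len <= depth && depth - len < STACK_MAX);
    depth -= len;
    vstack[depth++] = result;
}

/* Model of:  expr : '(' expr ')' { $$ = $2; } ;  */
int reduce_paren_expr(void)
{
    int result = dollar(2, 3);   /* $$ = $2 */
    reduce(3, result);
    return vstack[depth - 1];    /* the value now attached to expr */
}
```

In this model, after '(' , an expr with value 7, and ')' have been pushed, reduce_paren_expr leaves a single value 7 on the stack, which is what a later rule sees when it refers to this expr.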

Default action

By default, the value of a rule (that is, the value assigned to the symbol to the left of the colon) is the value of the first element in the definition ($1). Thus, grammar rules such as the following example, which has only one symbol to the right of the colon, often need no explicit action:

   A   :   B    ;

This example is equivalent to:

   A   :   B
          {
              $$ = $1;
          }
          ;

Actions in the middle of rules

In previous examples, all actions come at the end of rules. Sometimes it is desirable for an action to take place before a rule is fully parsed, so yacc permits an action to be written in the middle of a rule. Such an action can return a value that actions to its right can access through the usual $ mechanism, and it can in turn access the values returned by the symbols or actions to its left. The following rule sets x to 1 and sets y to the value returned by C:

   A   :   B
          {
              $$ = 1;
          }
       C
          {
              x = $2;
              y = $3;
          }
          ;

The first action is given a value by the assignment to $$. Because that action is the second component of the list to the right of the colon, its value is referred to in subsequent actions as $2. The value returned by C, which would normally have been $2, is now $3.

yacc treats the previous example as if it were written as follows, where ACT is a new non-terminal symbol that matches the empty string and carries the embedded action:

   ACT   :   /* empty */
          {
              $$ = 1;
          }
          ;
   
   A      :   B  ACT  C
          {
              x = $2;
              y = $3;
          }
          ;

Accessing left-context symbols

The following material is more advanced and merits careful reading.

An action associated with the left-hand symbol in a rule may need to refer to values associated with symbols that occurred before the current left-hand symbol. These values are referred to as left-context values, because they are associated with symbols that appeared to the left of the current left-hand symbol in another rule in the specification. Consider the following yacc specification for a grammar that recognizes dates:

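   %{
   #include <stdio.h>
   #include <string.h>
   %}
   %union{
           char *text;
           int  ival;
   };
   %token <text> t_MONTH
   %token <ival> t_DAY t_YEAR
   %type  <ival> day year
   %%
   date    : year day month
           ;

   month   : t_MONTH
           {
                   /* $0 and $-1 lie outside this rule, so yacc cannot
                      infer their types; the <ival> tag supplies it. */
                   if (!strcmp($1, "February"))
                           if ($<ival>0 == 29 && ($<ival>-1) % 4 != 0)
                                   printf("Too many days!\n");
           }
           ;

   day     : t_DAY
           {
                   $$ = $1;
           }
           ;

   year    : t_YEAR
           {
                   $$ = $1;
           }
           ;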

In this example the lexical analyzer routine associates an integer value with the tokens t_DAY and t_YEAR, and a character string with the token t_MONTH.

The action associated with the symbol ``month'' checks whether ``February 29'' occurs in a non-leap year. To do so, it needs to know what values are associated with the ``day'' and ``year'' symbols. These symbols appear to the left of ``month'' in the first rule in the specification, and so their values are left-context values with respect to the symbol ``month''.

There are two constructions for accessing left-context values. The value associated with the symbol immediately to the left of the current left-hand symbol is referred to as $0. Values farther to the left are referred to by constructions of the form $-n. For example, the pseudo-variable $-1 refers to the value associated with the symbol that is two symbols to the left of ``month''. In general, the pseudo-variable $-n refers to the value associated with the symbol that is n+1 symbols to the left of the current symbol.

In the action associated with ``month'' in the example, $0 refers to the value associated with ``day'', and $-1 refers to the value associated with ``year''.
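
The indexing can be pictured as offsets into the parser's stack of values. The sketch below is an illustrative model under that assumption, not generated parser code: value_t and too_many_days are hypothetical names, and top is assumed to point at the value of t_MONTH, the only right-hand-side symbol of the month rule.

```c
#include <string.h>

/* Hypothetical value type mirroring the %union in the date grammar. */
typedef union {
    const char *text;
    int ival;
} value_t;

/*
 * top points at the value of t_MONTH, so $1 is top[0]. The values below
 * it belong to the symbols that precede month in the date rule:
 * $0 is top[-1] (day) and $-1 is top[-2] (year).
 */
int too_many_days(const value_t *top)
{
    const char *month = top[0].text;   /* $1  */
    int day  = top[-1].ival;           /* $0  */
    int year = top[-2].ival;           /* $-1 */

    /* Same simplified leap-year test as the grammar: divisible by 4. */
    return strcmp(month, "February") == 0 && day == 29 && year % 4 != 0;
}
```

The model also shows why left-context references are fragile: they are correct only if every rule that uses month places a day and a year immediately before it.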

Parse trees

In many applications, output is not done directly by the actions. Instead, a data structure, such as a parse tree, is constructed in memory, and transformations are applied to it before output is generated. Parse trees are particularly easy to construct, given routines to build and maintain the desired tree structure. For example, suppose a C function node is written so that the following call creates a node with label L and descendants n1 and n2, and returns a pointer to the newly created node:

   node( L, n1, n2 )

Then a parse tree can be built by supplying actions such as:

   expr   :   expr  '+'  expr
          {
              $$ = node( '+', $1, $3 );
          }
          ;
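
A minimal sketch of such a node routine might look like the following. The struct tnode layout and the helpers leaf, eval, and free_tree are assumptions made for this illustration; the text specifies only the calling convention of node. For the action above to compile, the value type of expr would also have to be a pointer to this structure, for example through a %union member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node: a label plus up to two descendants. */
struct tnode {
    int label;              /* operator character, or 0 for a leaf */
    int value;              /* leaf value; unused for operators */
    struct tnode *left;
    struct tnode *right;
};

/* Create a node with label L and descendants n1 and n2; NULL if out of memory. */
struct tnode *node(int L, struct tnode *n1, struct tnode *n2)
{
    struct tnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->label = L;
    n->value = 0;
    n->left = n1;
    n->right = n2;
    return n;
}

/* Hypothetical helper: a leaf node holding a number. */
struct tnode *leaf(int value)
{
    struct tnode *n = node(0, NULL, NULL);
    if (n != NULL)
        n->value = value;
    return n;
}

/* Example of a transformation applied after parsing: evaluate the tree. */
int eval(const struct tnode *t)
{
    assert(t != NULL);
    switch (t->label) {
    case '+': return eval(t->left) + eval(t->right);
    case '*': return eval(t->left) * eval(t->right);
    default:  return t->value;   /* leaf */
    }
}

/* Release a tree built with node and leaf. */
void free_tree(struct tnode *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

Separating construction (node) from later passes such as eval keeps the grammar actions short: each action only links subtrees, and all further processing happens once the whole input has been parsed.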