C++ Blog

Boost.Spirit: Semantic Actions

Posted in boost, Uncategorized by Umesh Sirsiwal on January 1, 2010

Note that this post applies to Spirit.Classic or 2.0.

In the first two posts we introduced the basic parsing techniques defined by Spirit. Parsing is only useful if we can do something with the results. In Spirit you achieve this with semantic actions.

Semantic actions are expected to use functional programming paradigms. The most basic semantic action has the following prototype:

    void f(IteratorT first, IteratorT last);

or, as a functor:

    struct my_functor
    {
        void operator()(IteratorT first, IteratorT last) const;
    };

Here is an example of a simple action from the Spirit user guide:

    void
    my_action(char const* first, char const* last)
    {
        std::string str(first, last);
        std::cout << str << std::endl;
    }

Attaching an action is rather simple: you specify it in [] after the rule or parser. Using the e-mail address parser from the previous post, you can write:

 
    r = *(mailTo | anychar_p);
    mailTo = "mailTo:" >> emailAddress[&my_action];
    emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

my_action will be called with iterators pointing to the start and one past the end of each parsed e-mail address. The above action therefore prints every e-mail address in the input that follows "mailTo:".
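Printing is rarely what you want in a real program. Here is a minimal sketch of a functor action that collects the matches instead. The name collect_action and the demo function are my own and not part of Spirit. To keep the snippet free of Boost, the functor is called by hand on a fixed buffer, the same way Spirit would call it after a match.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical action functor (not part of Spirit): stores every matched
// range in a vector instead of printing it.
struct collect_action
{
    explicit collect_action(std::vector<std::string>& out) : out_(out) {}

    // Same shape as the basic action prototype: [first, last) is the match.
    void operator()(const char* first, const char* last) const
    {
        assert(first <= last);
        out_.push_back(std::string(first, last));
    }

private:
    std::vector<std::string>& out_;
};

// Calls the functor by hand on a fixed buffer; no parsing happens here.
std::vector<std::string> demo_collect()
{
    std::vector<std::string> found;
    collect_action action(found);
    const char text[] = "mailTo:a@b.com";
    action(text + 7, text + 14);   // the range covering "a@b.com"
    return found;
}
```

In the grammar above, such a functor would be attached as emailAddress[collect_action(found)] in place of &my_action, assuming found is a std::vector<std::string> that outlives the parse.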

In many cases it is wasteful to call an action with iterators pointing to the first and last characters. After all, Spirit has just parsed the contents and can hand over the result directly. For this purpose Spirit defines specialized action signatures. For example:

    void func(NumT val);

or the equivalent functor:

    struct fctr
    {
        void operator()(NumT val) const;
    };

can be attached to any of the numeric parsers (real_p, ureal_p, int_p, uint_p); the action then receives the parsed number. Similar actions exist for other types of parsers. Please check the Spirit guide for details.
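As a rough sketch of what such a numeric action might look like, the functor below adds every value it receives to a running total. The names add_to and demo_sum are made up for illustration, and the functor is again called by hand rather than through Spirit.

```cpp
#include <cassert>

// Hypothetical numeric action (not part of Spirit): adds each parsed value
// to a running total owned by the caller.
struct add_to
{
    explicit add_to(double& total) : total_(total) {}

    // Same shape as the numeric action prototype, with NumT = double.
    void operator()(double val) const { total_ += val; }

private:
    double& total_;
};

// Calls the functor by hand with the values a numeric parser would pass in.
double demo_sum()
{
    double total = 0.0;
    add_to action(total);
    action(1.5);
    action(2.5);
    assert(total == 4.0);  // 1.5 + 2.5 is exact in binary floating point
    return total;
}
```

With Spirit this would be attached to a numeric parser, for example real_p[add_to(total)], so the action receives the converted number and never has to look at the raw characters.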

The complete program looks like:

#include <boost/spirit/core.hpp>  // newer Boost: <boost/spirit/include/classic_core.hpp>
#include <cstring>
#include <iostream>
#include <string>
using namespace boost::spirit;    // newer Boost: boost::spirit::classic

// Semantic action: prints the text matched by the rule it is attached to.
void
my_action(const char* first, const char* last)
{
    std::string str(first, last);
    std::cout << str << std::endl;
}

struct my_grammar : public grammar<my_grammar>
{
    template <typename ScannerT>
    struct definition
    {
        rule<ScannerT> r, mailTo, emailAddress;

        definition(my_grammar const& self)
        {
            r = *(mailTo | anychar_p);
            mailTo = "mailTo:" >> emailAddress[&my_action];
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p) ];
        }

        rule<ScannerT> const& start() const { return r; }
    };
};

int main()
{
    const char* str = "mailTo:a@b.com  test mailTo:d@f.com";
    my_grammar g;
    if (parse(str, str + std::strlen(str), g, space_p).full)
        std::cout << "parsing succeeded\n";
    else
        std::cout << "parsing failed\n";
    return 0;
}

In addition, there is a large selection of pre-defined actions. They are listed in the Spirit documentation.


Boost.Spirit – Easy to use Parser

Posted in boost by Umesh Sirsiwal on December 20, 2009

Note that this post applies to Spirit.Classic or 2.0.

I recently had to write a parser in C++ and decided to give Boost.Spirit a chance. I was delighted with how easy the parser itself is to use. The learning curve was steep at first, but once I got started, it made life significantly simpler.

Boost.Spirit provides a very simple way to create a parser using an EBNF-like grammar. Let us consider an e-mail address parser as an example.

    +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);

Simple! Let us understand the above statement. In Boost.Spirit the rule<> template defines a basic grammar rule. A rule is made up of parsers and other rules. All built-in parsers in Boost.Spirit have the suffix _p. The alnum_p parser matches any letter or digit. '+' is an operator which matches one or more instances of the parser it is applied to.

'>>' is an overloaded operator which simply says "followed by". '@' is the short form of a parser which matches the character '@'.

So the above statement says: any alphanumeric string, followed by '@', followed by another alphanumeric string. This can then be followed by any number of ".<alphanumeric>" groups.
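To make the shape of that rule concrete, here is a hand-written matcher in plain C++ that accepts the same pattern. This is not how Spirit works internally and it uses no Spirit code at all. The names one_or_more_alnum and looks_like_email are made up for this illustration, and the sketch does no white space skipping.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Consumes one or more alphanumeric characters starting at pos.
// Returns false, leaving pos unchanged, if there are none.
static bool one_or_more_alnum(const std::string& s, std::size_t& pos)
{
    std::size_t start = pos;
    while (pos < s.size() && std::isalnum(static_cast<unsigned char>(s[pos])))
        ++pos;
    return pos > start;
}

// Hypothetical helper: true if the whole string has the shape
// alnum+ '@' alnum+ ('.' alnum+)*
bool looks_like_email(const std::string& s)
{
    std::size_t pos = 0;
    if (!one_or_more_alnum(s, pos)) return false;          // +alnum_p
    if (pos >= s.size() || s[pos] != '@') return false;    // '@'
    ++pos;
    if (!one_or_more_alnum(s, pos)) return false;          // +alnum_p
    while (pos < s.size() && s[pos] == '.') {              // *('.' >> +alnum_p)
        ++pos;
        if (!one_or_more_alnum(s, pos)) return false;
    }
    return pos == s.size();                                // nothing left over
}

// A few sanity checks on the matcher.
void check_examples()
{
    assert(looks_like_email("a@b.com"));
    assert(!looks_like_email("a@"));
    assert(!looks_like_email("a @ b"));
}
```

The one-line Spirit rule replaces all of this bookkeeping, which is the main attraction of the library.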

When parsing with a skip parser such as space_p, white space is skipped between the elements joined by >>. This means the above parser will match both a@b and a @ b.

We don't want white space skipping inside an e-mail address, so we need to modify the above rule such that >> does not skip white space. For this and other purposes, Spirit provides directives. These are special instructions which can be applied to a part of a rule. Skipping is turned off with lexeme_d:

    lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p) ];

I have no clue where the name comes from, but all directives have the suffix _d. OK! We have a rule. Now let us check whether this rule matches a specified string:

bool isEmailAddress(const std::string& str){
    // parse() takes a null-terminated C string (or an iterator pair), not a std::string
    return parse(str.c_str(), lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p) ], space_p).full;
}

The above function checks whether the supplied string is an e-mail address.

In the next post we will discuss how to create more complex examples as well as deal with parse actions.