C++ Blog

Functional Programming & Boost.Lambda Programming

Posted in boost, templates by Umesh Sirsiwal on December 26, 2009

I will like to take a break from Boost.Spirit related topic to talk about Functional Programming. Understanding of Functional Programming is essential for understanding how Spirit related actions are implemented. Typical object oriented programming paradigm combines mutable data and set of algorithms which operates on data. In contrast FP avoids mutable data or state and emphasizes on application of functions. From Wikipedia:

In practice, the difference between a mathematical function and the notion of a “function” used in imperative programming is that imperative functions can have side effects, changing the value of already calculated computations. Because of this they lack referential transparency, i.e. the same language expression can result in different values at different times depending on the state of the executing program. Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times. Eliminating side-effects can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.

STL provides set of algorithms which can be viewed as FP. For example std::copy will always copy source to destination.

The next concept in FP is called currying, technique in which a function is applied to its arguments one at a time, with each application returning a new function that accepts the next argument. The STL function bind1st, bind2nd and binary_compose can be used for currying

All funcitonal programming languages are based on Lambda Calculus introduced in 1930s which is based on anonymous functions and currying. You can read more on it here.

Boost.Lambda and Boost.Phoenix introduce formal system of funcitonal programming and lambda expressions to C++. I typically use Boost.Lambda. I only use Phoenix when using Spirit since they are more closely integrated. They share a lot of similar concept. This post will discuss Boost.Lambda. The next post will use these concepts to introduce Phonix as applied to Boost.Spirit.

Let us start with a simple lambda expression _1 = 1 is definition of an anonymous function which takes one argument and sets it to 1. So to initialize a variable to 1 you could write:

(_1 + 1)(i);

or if you wanted to initialize all variables in a container to 1, you would write something like:

std::vector v;

std::for_each(v.begin(), v.end(), _1=1);

or to print everyobject in a container:

std::for_each(v.begin(), v.end(), std::cout << _1);

What if you wanted to define a function with calls a function with one argument and assigns results to the same argument:

That function will be _1 = bind(foo, _1);

Bind is generalized version of std::bind1st and bind2nd. With the above expression in place one can write:

(_1 = bind(foo, _1))(i);

Which will be equivalent to:

i = foo(i);

So with for_each one can write:

std::for_each(v.begin(), v.end(); _1 = bind(foo, _1));

_1 in the above examples are called placeholders which are equlvalent of lambda in the lambda expression. The above can be generalized with larger number of placeholders. For example one can write: (_1 + _2) to create a function which adds its two arguments. Boost Lambda defines up to 3 place holders. Higher order functions can be constructed using currying.

A function definition which used _2 takes 2 arguments and one which uses _3 takes 3 arguments.

For example _3 = 1, takes 3 argument and

(_3 = 1)(k) will have compile time errors.

(_3 = 1)(i,j,k) is good and is equivalent to k = 1.

Above, we are using simple expressions to create anonymous functions which do not have side effects and depend only on their arguments. As alluded earlier BLL also provides bind expression as generalization of bind1st and bind2nd. With BLL bind expression it is possible to bind any method with up to 9 arguments for delayed executions. It can target C++ class members, or simple functions as target. The syntax used is similar to bind1st and bind2nd.

Advertisements
Tagged with: , ,

Boost.Spirit : Complete Parser Design

Posted in boost, templates by Umesh Sirsiwal on December 25, 2009

Note that this post applies to Spirit.Classic or 2.0.

In the last post we discussed how to write a simple parser using Spirit. Most real life parsers are a lot more complex requiring several rules and combining them to create a full grammar. Spirit provides support to declare a full grammar. Let us create a parser which looks for all mailto: tag with associated e-mail address in an input. All Spirit grammar are derived from grammar class. The following is definition of grammar:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            definition(my_grammar const& self)  { r = /*..define here..*/; }
            rule<ScannerT> const& start() const { return r; }
        };
    };

The my_grammar is derived from grammar class using curiously recurring template pattern (CRTP). For those who are new to CRTP, it was introduced by Jim Copelien and Wikipedia has some details on it. Using CRTP it is possible to achieve effect of dynamic polymorphism without taking cost of virtual function. More on CRTP another day.

Each grammar class must define another nested structure called definition. The definition must have a function called start which returns starting parser rule. So let us start writing rules for e-mail address parser. The mailto rule can be written as:

     mailTo = "mailto:" >> emailAddress;

From our previous post, emailAddress can be written as:

     emailAddress =  lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

Of course the full input has things other than the mailto: tags. So we must skip other characters. You do that as follows:

       r = mailTo | anychar_p;

The above rule is saying that try matching the input with mailTo and if it does not match mailTo rule, match it against any character parser.

So combining these two the grammar constructor now can be written as:

      definition(my_grammar const& self)  { 
            r = mailTo | anychar_p;
            mailTo = "mailTo:" >> emailAddress;
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
      }

First thing one notices is that the rule “r” refers to rules “emailAddress” and “mailTo” before they are initialized. It works because the rules are held by reference and not by value. The referred rule can be initialized anytime. This does complicate programming a little bit. It means that it is user’s responsibility to make sure that the referred rules never go out of scope. So emailAddress cannot be local to definition. Typically, one declares all rules as part of the definition call definition. The full grammar now becomes:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            rule<ScannerT>  emailAddress mailTo;
            definition(my_grammar const& self)  { 
                r = mailTo | anychar_p;
                mailTo = "mailTo:" >> emailAddress;
                emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
            }
            rule<ScannerT> const& start() const { return r; }
        };
    };

OK! now we have the grammar. Let us use it.

   my_grammar g;
    if (parse(first, last, g, space_p).full)
        cout << "parsing succeeded\n";
    else
        cout << "parsing failed\n";

Here first and last are iterators pointing to first and the last characters in the input.

Note that the above grammar matches exactly one e-mail address or any other character. This can be easily be modified to match all e-mail addresses.

The new rule will be:

r = *(mailTo | anychar_p);


Boost.Spirit – Easy to use Parser

Posted in boost by Umesh Sirsiwal on December 20, 2009

Note that this post applies to Spirit.Classic or 2.0.

I recently had write a parser in C++ and decided to give Boost.Spirit a chance. I was delighted with ease of use of the parser itself. It was a steep learning curve to get started. However, once I got started, it made life significantly simple.

Boost.Spirit provides a very simple way to create a parser using EBNF grammar. Let us consider an e-mail address parser as an example.
+alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);
Simple! Let us understand the above statement. In Boost.Spirit rule<> template defines basic grammar rule. A rule is made up of parsers and other rules. In Boost.Spirit built-in parser all have _p as suffix. The alnum_p parser matches any character or number. ‘+’ represents operator which matches one more more instances of the enclosed parser.

‘>>’ is an overloaded operator which simply says followed by.’@’ is short form of parser which matches character ‘@’.

So the above statement says any alphanumeric string followed by ‘@’ followed by any other alpha numeric string. This can than be followed by any number of “.<alphanumber>”.

By default the >> includes white space skipping. This implies the above parser will match (a@b as well as a @ b).

We don’t want white space skipping for e-mail address. So we need to modify the above rule such that >> does not skip white space. For this and other purposes, the Spirit allows directives. These are special instructions which can be applied to a part of the rule. No white space skipping is achieved by using lexeme_d.
lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);];

I have no clue why the name. But the suffix is _d for all directives.OK! we got a rule. Now let us check if this rule matches a specified string:

bool isEmailAddress(const std::string& str){
return parse(str, lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);], space_p).full;
}

The above statement will check if the supplied string is an e-mail address or not.

In the next post we will discuss how to create more complex examples as well as deal with parse actions.