C++ Blog

Boost.Spirit: Semantic Actions

Posted in boost, Uncategorized by Umesh Sirsiwal on January 1, 2010

Note that this post applies to Spirit.Classic or 2.0.

In the first two posts we introduced basic parsing techniques defined by Spirit. The parsing is only useful, if we can do something with parsing results. In Spirit you achieve this with semantic actions.

Semantic actions are expected to use functional programming paradigms. The most basic semantic action has the following prototype:

 void f(IteratorT first, IteratorT last);

or as functor

  struct my_functor
    {
        void operator()(IteratorT first, IteratorT last) const;
    };

Here is an example of simple action from Spirit user guide:

 void
    my_action(char const* first, char const* last)
    {
        std::string str(first, last);
        std::cout << str << std::endl;
    }

Applying the action is rather simple. You specify them in [] after the rule. From previous e-mail address parser, you can write:

 
    r = *(mailTo | anychar_p);
    mailTo = "mailTo:" >> emailAddress[&my_action];
    emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

my_action will be called with iterator pointing to start and end of the parsed e-mail address. The above action will result in printing all the e-mail addresses in the input.

In a lot of cases, it is wasteful to call actions with the Iterators pointing to first and last. After all, Spirit has just parsed the contents. For this purpose the boost defines specialized actions. For example

  void func(NumT val);

or equivalent functor

struct fctr
    {
        void operator()(NumT val) const;
    };

can be applied to any numeric parsers (real_p, ureal_p, int_p, uint_p). Similar actions exist for other other types of parsers. Please check Spirit guide for details.

The complete program looks like:

#include <boost/spirit/core.hpp>
#include <iostream>
using namespace boost::spirit;

void
my_action(const char* first, const char* last)
{
std::string str(first, last);
std::cout << str << std::endl;
}

struct my_grammar : public grammar<my_grammar>
{
template <typename ScannerT>
struct definition
{
rule<ScannerT>  r, mailTo, emailAddress;
definition(my_grammar const& self)  {
r = *(mailTo | anychar_p);
mailTo = “mailTo:” >> emailAddress[&my_action];
emailAddress = lexeme_d[ +alnum_p >> ‘@’ >> +alnum_p >> *(‘.’ >> +alnum_p)];
}
rule<ScannerT> const& start() const { return r; }
};
};

int main(){
const char* str = “mailTo:a@b.com  test mailTo:d@f.com”;
my_grammar g;
if (parse(str, str + strlen(str), g, space_p).full)
std::cout << “parsing succeeded\n”;
else
std:: cout << “parsing failed\n”;
return 0;
}
In addition, there is a large selection of pre-defined actions. You an find them here.

Advertisements

Functional Programming & Boost.Lambda Programming

Posted in boost, templates by Umesh Sirsiwal on December 26, 2009

I will like to take a break from Boost.Spirit related topic to talk about Functional Programming. Understanding of Functional Programming is essential for understanding how Spirit related actions are implemented. Typical object oriented programming paradigm combines mutable data and set of algorithms which operates on data. In contrast FP avoids mutable data or state and emphasizes on application of functions. From Wikipedia:

In practice, the difference between a mathematical function and the notion of a “function” used in imperative programming is that imperative functions can have side effects, changing the value of already calculated computations. Because of this they lack referential transparency, i.e. the same language expression can result in different values at different times depending on the state of the executing program. Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times. Eliminating side-effects can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.

STL provides set of algorithms which can be viewed as FP. For example std::copy will always copy source to destination.

The next concept in FP is called currying, technique in which a function is applied to its arguments one at a time, with each application returning a new function that accepts the next argument. The STL function bind1st, bind2nd and binary_compose can be used for currying

All funcitonal programming languages are based on Lambda Calculus introduced in 1930s which is based on anonymous functions and currying. You can read more on it here.

Boost.Lambda and Boost.Phoenix introduce formal system of funcitonal programming and lambda expressions to C++. I typically use Boost.Lambda. I only use Phoenix when using Spirit since they are more closely integrated. They share a lot of similar concept. This post will discuss Boost.Lambda. The next post will use these concepts to introduce Phonix as applied to Boost.Spirit.

Let us start with a simple lambda expression _1 = 1 is definition of an anonymous function which takes one argument and sets it to 1. So to initialize a variable to 1 you could write:

(_1 + 1)(i);

or if you wanted to initialize all variables in a container to 1, you would write something like:

std::vector v;

std::for_each(v.begin(), v.end(), _1=1);

or to print everyobject in a container:

std::for_each(v.begin(), v.end(), std::cout << _1);

What if you wanted to define a function with calls a function with one argument and assigns results to the same argument:

That function will be _1 = bind(foo, _1);

Bind is generalized version of std::bind1st and bind2nd. With the above expression in place one can write:

(_1 = bind(foo, _1))(i);

Which will be equivalent to:

i = foo(i);

So with for_each one can write:

std::for_each(v.begin(), v.end(); _1 = bind(foo, _1));

_1 in the above examples are called placeholders which are equlvalent of lambda in the lambda expression. The above can be generalized with larger number of placeholders. For example one can write: (_1 + _2) to create a function which adds its two arguments. Boost Lambda defines up to 3 place holders. Higher order functions can be constructed using currying.

A function definition which used _2 takes 2 arguments and one which uses _3 takes 3 arguments.

For example _3 = 1, takes 3 argument and

(_3 = 1)(k) will have compile time errors.

(_3 = 1)(i,j,k) is good and is equivalent to k = 1.

Above, we are using simple expressions to create anonymous functions which do not have side effects and depend only on their arguments. As alluded earlier BLL also provides bind expression as generalization of bind1st and bind2nd. With BLL bind expression it is possible to bind any method with up to 9 arguments for delayed executions. It can target C++ class members, or simple functions as target. The syntax used is similar to bind1st and bind2nd.

Tagged with: , ,

Boost.Spirit : Complete Parser Design

Posted in boost, templates by Umesh Sirsiwal on December 25, 2009

Note that this post applies to Spirit.Classic or 2.0.

In the last post we discussed how to write a simple parser using Spirit. Most real life parsers are a lot more complex requiring several rules and combining them to create a full grammar. Spirit provides support to declare a full grammar. Let us create a parser which looks for all mailto: tag with associated e-mail address in an input. All Spirit grammar are derived from grammar class. The following is definition of grammar:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            definition(my_grammar const& self)  { r = /*..define here..*/; }
            rule<ScannerT> const& start() const { return r; }
        };
    };

The my_grammar is derived from grammar class using curiously recurring template pattern (CRTP). For those who are new to CRTP, it was introduced by Jim Copelien and Wikipedia has some details on it. Using CRTP it is possible to achieve effect of dynamic polymorphism without taking cost of virtual function. More on CRTP another day.

Each grammar class must define another nested structure called definition. The definition must have a function called start which returns starting parser rule. So let us start writing rules for e-mail address parser. The mailto rule can be written as:

     mailTo = "mailto:" >> emailAddress;

From our previous post, emailAddress can be written as:

     emailAddress =  lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

Of course the full input has things other than the mailto: tags. So we must skip other characters. You do that as follows:

       r = mailTo | anychar_p;

The above rule is saying that try matching the input with mailTo and if it does not match mailTo rule, match it against any character parser.

So combining these two the grammar constructor now can be written as:

      definition(my_grammar const& self)  { 
            r = mailTo | anychar_p;
            mailTo = "mailTo:" >> emailAddress;
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
      }

First thing one notices is that the rule “r” refers to rules “emailAddress” and “mailTo” before they are initialized. It works because the rules are held by reference and not by value. The referred rule can be initialized anytime. This does complicate programming a little bit. It means that it is user’s responsibility to make sure that the referred rules never go out of scope. So emailAddress cannot be local to definition. Typically, one declares all rules as part of the definition call definition. The full grammar now becomes:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            rule<ScannerT>  emailAddress mailTo;
            definition(my_grammar const& self)  { 
                r = mailTo | anychar_p;
                mailTo = "mailTo:" >> emailAddress;
                emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
            }
            rule<ScannerT> const& start() const { return r; }
        };
    };

OK! now we have the grammar. Let us use it.

   my_grammar g;
    if (parse(first, last, g, space_p).full)
        cout << "parsing succeeded\n";
    else
        cout << "parsing failed\n";

Here first and last are iterators pointing to first and the last characters in the input.

Note that the above grammar matches exactly one e-mail address or any other character. This can be easily be modified to match all e-mail addresses.

The new rule will be:

r = *(mailTo | anychar_p);


Boost.Spirit – Easy to use Parser

Posted in boost by Umesh Sirsiwal on December 20, 2009

Note that this post applies to Spirit.Classic or 2.0.

I recently had write a parser in C++ and decided to give Boost.Spirit a chance. I was delighted with ease of use of the parser itself. It was a steep learning curve to get started. However, once I got started, it made life significantly simple.

Boost.Spirit provides a very simple way to create a parser using EBNF grammar. Let us consider an e-mail address parser as an example.
+alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);
Simple! Let us understand the above statement. In Boost.Spirit rule<> template defines basic grammar rule. A rule is made up of parsers and other rules. In Boost.Spirit built-in parser all have _p as suffix. The alnum_p parser matches any character or number. ‘+’ represents operator which matches one more more instances of the enclosed parser.

‘>>’ is an overloaded operator which simply says followed by.’@’ is short form of parser which matches character ‘@’.

So the above statement says any alphanumeric string followed by ‘@’ followed by any other alpha numeric string. This can than be followed by any number of “.<alphanumber>”.

By default the >> includes white space skipping. This implies the above parser will match (a@b as well as a @ b).

We don’t want white space skipping for e-mail address. So we need to modify the above rule such that >> does not skip white space. For this and other purposes, the Spirit allows directives. These are special instructions which can be applied to a part of the rule. No white space skipping is achieved by using lexeme_d.
lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);];

I have no clue why the name. But the suffix is _d for all directives.OK! we got a rule. Now let us check if this rule matches a specified string:

bool isEmailAddress(const std::string& str){
return parse(str, lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);], space_p).full;
}

The above statement will check if the supplied string is an e-mail address or not.

In the next post we will discuss how to create more complex examples as well as deal with parse actions.

Domain Specific Embedded Langauge

Posted in boost by Umesh Sirsiwal on January 6, 2009

We typically program in a general purpose programming language like C, C++, Java, Perl, etc. These languages are designed to solve all the problems of the world and no problem in particular. However, it will be a lot more elegant if sound processing library exposed a sound specific language, an image processing library exposed an image processing specific language not just API calls which is the case currently. In effect these will enhance overall general purpose language. These languages are called Domain Specific Embedded Language (DSEL).  For example:

result  = src * dest;

should multiply two numbers for arithmetic domain. For images, it may layer the two images, for music this expression may allow mixing of the two tunes. As you probably know, in C++ this can easily be achieved using operator overloading. But, the DSELs probably want to be significantly more expressive than depicted above. C++ is probably the best language for defining a DSEL since it allows rich set of operator overloading and metaprogramming constructs to implement your language. This article compares various programming language’s capability to define DSEL.

It is possible to define a complete DSEL manually in C++. However, anything beyond overloading of small set of operators can be time consuming. Boost 1.37.0 now includes Boost.Proto. The Boost.Proto provides facilities to quickly develop DSEL. From Boost.Proto documentation:

Expression Templates are an advanced technique that C++ library developers use to define embedded mini-languages that target specific problem domains. The technique has been used to create efficient and easy-to-use libraries for linear algebra as well as to define C++ parser generators with a readable syntax. But developing such a library involves writing an inordinate amount of unreadable and unmaintainable template mumbo-jumbo. Boost.Proto eases the development of domain-specific embedded languages (DSELs). Use Proto to define the primitives of your mini-language and let Proto handle the operator overloading and the construction of the expression parse tree. Immediately evaluate the expression tree by passing it a function object. Or transform the expression tree by defining the grammar of your mini-language, decorated with an assortment of tree transforms provided by Proto or defined by you. Then use the grammar to give your users short and readable syntax errors for invalid expressions! No more mumbo-jumbo — an expression template library developed with Proto is declarative and readable.

In short, Proto is a DSEL for defining DSELs.

Boost.Proto uses template meta-programming to implement the help bild DSELs quickly and efficiently. Boost.Proto can be slightly overwhelming for the first time users and specifically those who are not used to of templates meta-programming. If you want to use Boost.Proto remember to read the user’s guide a couple of times before trying it out.

Good luck.

Happy DSELing.

Shared Memory based C++ programming

Posted in boost by Umesh Sirsiwal on January 5, 2009

When it comes to shared memory based programming, we C++ users have traditionally found our selves at disadvantage. Most our beloved constructs like shared pointers, various containers, scoped locks, etc. don’t work very well when programming in C++. In addition, we cannot place objects which use inheritance in shared memory since the virtual pointers cannot work in shared memory. These limitation has implied that the shared memory is rarely used in C++ programming. Whenever it is used, the object placed in the shared memory are more like C constructs rather then C++ constructs. This is pity since using shared memory provides significant performance advantage in certain conditions.

Boost 1.35 includes Interproces which provides C++ based programming environment for shared memory. The facilities provided are:

  • Shared memory based pool of memory and capability to new C++ objects from this pool
  • Mutexes placed in the shared memory
  • Various smart pointers
  • Various STL-like containers which can be placed in shared memory. These containers include maps, deque, set, list, flat_map, flat_set, slist, and string

In addition various boost containers can be placed in shared memory. These containers include the multi-index container.

Hopefully these constructs make placing objects in shared memory a little easier for you all. Please note that placing data in shared memory has traditionally been dangerous due to complexity in debugging crashes, not easily understood deadlocks, etc. Please use shared memory based containers carefully.

C++ Monitor Pattern

Posted in boost by Umesh Sirsiwal on January 5, 2009

Locking is an integral part of concurrent programming. A usual class used with threads looks like:

class A {
void F() {
ScopedLock l(m);
....
....
}
Void G()
{
ScopedLock l(m);
....
....
}
mutex m;
};
This guarantees that the class data part of class A are always protected from multiple thread access. However, this code is fragile and difficult to maintain. It moves the responsibility of maintaining concurrency to the class writer. Also, sooner or later somebody will make a change of the following form:

Void G()
{
ScopedLock l(m);
....
....
F();
}

This will deadlock unless you were using recursive mutex.

Also, a construct like this is difficult to use if you are trying to use third party library like STL. For example, let us say you were using std::map. In order to enforce appropriate concurrency control you must wrap every map method you are going to use with a lock that implies you must implement a forwarding class which takes the user argument performs a lock and then call the map function. This is tedious, time consuming and absolute waste of time.

Won’t it be nice if there was a way to generically wrap class and provided method forwarders? The forwarder will provide the lock/unlock capability and the class implementer will not have to worry about concurrency issues. Welcome to monitor pattern. This pattern was first introduced in this paper by Douglas Schmidt.The monitor object guarantees that only one method runs within an object at any given point of time. Because the monitor pattern guarantees that there is any need to perform locking within class methods.

One of the C++ monitor pattern implementation is part of libpoet. ACE provides another implementation of the monitor pattern. I am sure there are other implementations. The libpoet implementation is partially based on the this paper by Bjarne Stroustrup. The library includes two implementations of the monitor pattern. One is based on smart pointer and the other is based on inheritance and has additional functionality. In most cases the pointer based implementation is sufficient and is considered in this post.

The monitor pointer is implemented as smart pointer and is called monitor_ptr. Like any other smart pointer it provides an object wrapper. The wrapper in this case takes a lock before the object is called and releases the lock once the object call is complete.  To use the class A defined above with monitor_ptr we will remove the locks from the class and wrap A as follows:

poet::monitor_ptr<A>  a(new A);

Now calls to  a->F() and a->G() will result in automatic locks taken by the monitor_ptr and call to G() will wait till call to F is complete.

class A {
void F() {
....
....
}
Void G()
{
....
....
}
};
Look no locks. If we need to use a map with concurrent programming we will write something like:

poet::monitor_ptr< std::map<int, std::string> > m(new  std::map<int, std::string>);

We will now access the monitored map using something like:

m->insert(1,std::string("abc"));

without worrying about the locking.

The inheritance based implementation provides additional functionality since it can use condition wait constructs.

Of course it will be much nicer to have a language level support (similar to synchronized keyword in Java) for the monitor pattern. That is a lot more elegant but till the language supports it we will have to live with the smart pointer based implementation.

C++ Weak Pointers

Posted in boost by Umesh Sirsiwal on January 5, 2009

C++ TR1 has added concept of weak pointers. Most of the C++ developers understand use of shared pointers but they don’t understand use of weak pointers very well.

Weak Pointer Definition

According to boost:

The weak_ptr class template stores a “weak reference” to an object that’s already managed by a shared_ptr. To access the object, a weak_ptr can be converted to a shared_ptr using the shared_ptr constructor or the member function lock. When the last shared_ptr to the object goes away and the object is deleted, the attempt to obtain a shared_ptr from the weak_ptr instances that refer to the deleted object will fail: the constructor will throw an exception of type boost::bad_weak_ptr, and weak_ptr::lock will return an empty shared_ptr.

There is no way to directly access underlying resources managed by the weak_ptr. In order to access the underlying pointer the weak pointer must be converted to a shared_ptr using lock or shared_ptr constructor.


shared_ptr<int> p(new int(5));

weak_ptr<int> q(p);

// some time later

if(shared_ptr<int> r = q.lock()) {

// use *r

}

Weak Pointer Usage

This all looks interesting but where will one use it. DDJ covered this subject in one of its articles. Consider an application where thread reads messages from a queue operators on it and then sends response back to the queue. This is tricky if the processing of the message can take a significant time. In order to correctly send back the response the message must hold a reference to the queue. The problem is that the queue may close before the response is ready to be sent. One possible solution is to store shared pointer to queue in the message structure:

shared_ptr<QueueType> q;

struct {

shared_ptr<QueueType> q;

} MessageType;

MessageType m;

m = q->Get();

m.q = q; //m.q holds a shared reference to the queue

...

...

m.q.Send(data);

If the queue is closed after putting a reference to the queue in the message, the queue cannot be destroyed since it is shared pointer. In some of the applications number of messages may be 1000s and can significantly delay queue destruction. A far better alternative is to put a weak_pointer to the queue in the message structure and check the queue state before calling the send. The modified code looks like the following:

shared_ptr<QueueType> q;

struct {

weak_ptr<QueueType> q;

} MessageType;

MessageType m;

m = q->Get();

m.q = q; //m.q holds a  weak reference to the queue. If the queue is closed, the underlying object can be destroyed and resources reclaimed.

...

...

if (shared_ptr<QueueType> realq = m.q.lock() ) {

realq.Send(data);

}

This code segment allows early destruction of queue to reclaim resources.