C++ Blog

Boost.Spirit: Semantic Actions

Posted in boost, Uncategorized by Umesh Sirsiwal on January 1, 2010

Note that this post applies to Spirit.Classic or 2.0.

In the first two posts we introduced basic parsing techniques defined by Spirit. The parsing is only useful, if we can do something with parsing results. In Spirit you achieve this with semantic actions.

Semantic actions are expected to use functional programming paradigms. The most basic semantic action has the following prototype:

 void f(IteratorT first, IteratorT last);

or as functor

  struct my_functor
    {
        void operator()(IteratorT first, IteratorT last) const;
    };

Here is an example of simple action from Spirit user guide:

 void
    my_action(char const* first, char const* last)
    {
        std::string str(first, last);
        std::cout << str << std::endl;
    }

Applying the action is rather simple. You specify them in [] after the rule. From previous e-mail address parser, you can write:

 
    r = *(mailTo | anychar_p);
    mailTo = "mailTo:" >> emailAddress[&my_action];
    emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

my_action will be called with iterator pointing to start and end of the parsed e-mail address. The above action will result in printing all the e-mail addresses in the input.

In a lot of cases, it is wasteful to call actions with the Iterators pointing to first and last. After all, Spirit has just parsed the contents. For this purpose the boost defines specialized actions. For example

  void func(NumT val);

or equivalent functor

struct fctr
    {
        void operator()(NumT val) const;
    };

can be applied to any numeric parsers (real_p, ureal_p, int_p, uint_p). Similar actions exist for other other types of parsers. Please check Spirit guide for details.

The complete program looks like:

#include <boost/spirit/core.hpp>
#include <iostream>
using namespace boost::spirit;

void
my_action(const char* first, const char* last)
{
std::string str(first, last);
std::cout << str << std::endl;
}

struct my_grammar : public grammar<my_grammar>
{
template <typename ScannerT>
struct definition
{
rule<ScannerT>  r, mailTo, emailAddress;
definition(my_grammar const& self)  {
r = *(mailTo | anychar_p);
mailTo = “mailTo:” >> emailAddress[&my_action];
emailAddress = lexeme_d[ +alnum_p >> ‘@’ >> +alnum_p >> *(‘.’ >> +alnum_p)];
}
rule<ScannerT> const& start() const { return r; }
};
};

int main(){
const char* str = “mailTo:a@b.com  test mailTo:d@f.com”;
my_grammar g;
if (parse(str, str + strlen(str), g, space_p).full)
std::cout << “parsing succeeded\n”;
else
std:: cout << “parsing failed\n”;
return 0;
}
In addition, there is a large selection of pre-defined actions. You an find them here.

Boost.Spirit : Complete Parser Design

Posted in boost, templates by Umesh Sirsiwal on December 25, 2009

Note that this post applies to Spirit.Classic or 2.0.

In the last post we discussed how to write a simple parser using Spirit. Most real life parsers are a lot more complex requiring several rules and combining them to create a full grammar. Spirit provides support to declare a full grammar. Let us create a parser which looks for all mailto: tag with associated e-mail address in an input. All Spirit grammar are derived from grammar class. The following is definition of grammar:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            definition(my_grammar const& self)  { r = /*..define here..*/; }
            rule<ScannerT> const& start() const { return r; }
        };
    };

The my_grammar is derived from grammar class using curiously recurring template pattern (CRTP). For those who are new to CRTP, it was introduced by Jim Copelien and Wikipedia has some details on it. Using CRTP it is possible to achieve effect of dynamic polymorphism without taking cost of virtual function. More on CRTP another day.

Each grammar class must define another nested structure called definition. The definition must have a function called start which returns starting parser rule. So let us start writing rules for e-mail address parser. The mailto rule can be written as:

     mailTo = "mailto:" >> emailAddress;

From our previous post, emailAddress can be written as:

     emailAddress =  lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];

Of course the full input has things other than the mailto: tags. So we must skip other characters. You do that as follows:

       r = mailTo | anychar_p;

The above rule is saying that try matching the input with mailTo and if it does not match mailTo rule, match it against any character parser.

So combining these two the grammar constructor now can be written as:

      definition(my_grammar const& self)  { 
            r = mailTo | anychar_p;
            mailTo = "mailTo:" >> emailAddress;
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
      }

First thing one notices is that the rule “r” refers to rules “emailAddress” and “mailTo” before they are initialized. It works because the rules are held by reference and not by value. The referred rule can be initialized anytime. This does complicate programming a little bit. It means that it is user’s responsibility to make sure that the referred rules never go out of scope. So emailAddress cannot be local to definition. Typically, one declares all rules as part of the definition call definition. The full grammar now becomes:

 struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            rule<ScannerT>  emailAddress mailTo;
            definition(my_grammar const& self)  { 
                r = mailTo | anychar_p;
                mailTo = "mailTo:" >> emailAddress;
                emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
            }
            rule<ScannerT> const& start() const { return r; }
        };
    };

OK! now we have the grammar. Let us use it.

   my_grammar g;
    if (parse(first, last, g, space_p).full)
        cout << "parsing succeeded\n";
    else
        cout << "parsing failed\n";

Here first and last are iterators pointing to first and the last characters in the input.

Note that the above grammar matches exactly one e-mail address or any other character. This can be easily be modified to match all e-mail addresses.

The new rule will be:

r = *(mailTo | anychar_p);


C++0x Concepts Contd..

Posted in c++0x, templates by Umesh Sirsiwal on January 18, 2009

In previous article we had discussed BCCL. This post continues to discuss Concept with ConceptC++. The ConceptC++ was the playground for language level Concept ideas which eventually became part of currently proposed C++0x standard. In this post we take in-depth view to ConceptC++. These extensions were implemented by modifying GCC.

ConceptC++

Unlike the Boost Concept Check Library (BCCL), which is implemented purely using standard C++ constructs, these extensions are implemented by extending C++ language. By changing the C++ definition, it is possible to do things which was not possible otherwise. In particular, remember that with BCCL, it was not possible to make template parameter restriction part of template or function definition. With ConceptC++, this becomes now possible. In addition to enforcing restriction on instantiation of template, ConceptC++ also imposes restrictions on template implementation and makes sure that it only uses template parameter’s facility defined in the contract.

To understand ConceptC++, let us consider sum template which adds two parameters and returns the value:

template <typename T>
T sum(T a, T b){
   return a+b;
}

With Concept C++ we can rewrite this to:

template <CopyConstructable T>
T sum(T a, T b){
   return a+b;
}

This says T must be copy constructable. If we compile this template using ConceptC++ (even without any instantiation) we get the following error:

error: no match for 'operator+' in 'a+b'

That is because the template definition did not require T to implement operator+ but is trying to use it. This forces template signature to accurately define the template signature as:

template <CopyConstructable T>
  requires Addable<T>
T sum(T a, T b){
   return a+b;
}

In other words ConceptC++ puts restrictions on both template instantiation and template definition and binds them in a contract with respect to template parameter. This is what function, class or template definition, etc. are supposed to be.

We have been using magic words like CopyConstructable, Addable, etc. What are they? Are they new language features or are provided by library? What if we wanted to implement our own custom concepts? How will we go about that.

The C++ language addition provides capability to define new concepts while the library provides set of standard concepts. It is rather easy to build your own concept. For example let us look at definition of Addable concept:

auto concept Addable<typename T> {
  T operator+(T x, T y);
};

How simple can that be? The above definition is saying that the Addable concept implies that the template parameter must implement operator+ which takes two parameters of type T and returns T.

With this background, let us modify out example sort template. In case of sort, the template parameters are iterators and value pointed to by the iterator must implement less than operator. With ConceptC++ you can specify these restrictions as follows:

#include <concepts>
template<std::ForwardIterator T>
    requires LessThanComparable<T::value_type>
void sort(T b, T e)
{
  ....
}

As you can see ConceptC++ provides natural way of definining restrictions on template parameters allowing compilers to perform early error checking.


Performance


The ConceptC++ implements its functionality by reducing compile speed. However, just like BCCL the ConceptC++ does not incur any run time overhead.




	

Java Style Dynamic Proxy in C++

Posted in c++0x by Umesh Sirsiwal on January 9, 2009

This post defines C++ template based implementation for C++ proxy.

I recently came across Java’s proxy object. That is some cool technology. Using this one can wrap one object in another object in a completely typesafe way. From user’s point of view, the proxy object looks something like:

ProxiedObjectInterface obj = (ProxiedObjectInterface)ProxyCreator.newInstance( proxiedObject );

More details on proxy object are available here.

Since the interface exposed is of the proxied type, all calls to obj are typsef and can be checked by compiler. In java ProxyCreator class does not need to know anything about proxied object. This makes it very powerful.

I have been wondering if there is any easy way to create a similar facility in C++. I cannot think of one right away. I know we can do some cool things with template meta-programming. What will that be?

Tagged with: , , ,

Domain Specific Embedded Langauge

Posted in boost by Umesh Sirsiwal on January 6, 2009

We typically program in a general purpose programming language like C, C++, Java, Perl, etc. These languages are designed to solve all the problems of the world and no problem in particular. However, it will be a lot more elegant if sound processing library exposed a sound specific language, an image processing library exposed an image processing specific language not just API calls which is the case currently. In effect these will enhance overall general purpose language. These languages are called Domain Specific Embedded Language (DSEL).  For example:

result  = src * dest;

should multiply two numbers for arithmetic domain. For images, it may layer the two images, for music this expression may allow mixing of the two tunes. As you probably know, in C++ this can easily be achieved using operator overloading. But, the DSELs probably want to be significantly more expressive than depicted above. C++ is probably the best language for defining a DSEL since it allows rich set of operator overloading and metaprogramming constructs to implement your language. This article compares various programming language’s capability to define DSEL.

It is possible to define a complete DSEL manually in C++. However, anything beyond overloading of small set of operators can be time consuming. Boost 1.37.0 now includes Boost.Proto. The Boost.Proto provides facilities to quickly develop DSEL. From Boost.Proto documentation:

Expression Templates are an advanced technique that C++ library developers use to define embedded mini-languages that target specific problem domains. The technique has been used to create efficient and easy-to-use libraries for linear algebra as well as to define C++ parser generators with a readable syntax. But developing such a library involves writing an inordinate amount of unreadable and unmaintainable template mumbo-jumbo. Boost.Proto eases the development of domain-specific embedded languages (DSELs). Use Proto to define the primitives of your mini-language and let Proto handle the operator overloading and the construction of the expression parse tree. Immediately evaluate the expression tree by passing it a function object. Or transform the expression tree by defining the grammar of your mini-language, decorated with an assortment of tree transforms provided by Proto or defined by you. Then use the grammar to give your users short and readable syntax errors for invalid expressions! No more mumbo-jumbo — an expression template library developed with Proto is declarative and readable.

In short, Proto is a DSEL for defining DSELs.

Boost.Proto uses template meta-programming to implement the help bild DSELs quickly and efficiently. Boost.Proto can be slightly overwhelming for the first time users and specifically those who are not used to of templates meta-programming. If you want to use Boost.Proto remember to read the user’s guide a couple of times before trying it out.

Good luck.

Happy DSELing.

Shared Memory based C++ programming

Posted in boost by Umesh Sirsiwal on January 5, 2009

When it comes to shared memory based programming, we C++ users have traditionally found our selves at disadvantage. Most our beloved constructs like shared pointers, various containers, scoped locks, etc. don’t work very well when programming in C++. In addition, we cannot place objects which use inheritance in shared memory since the virtual pointers cannot work in shared memory. These limitation has implied that the shared memory is rarely used in C++ programming. Whenever it is used, the object placed in the shared memory are more like C constructs rather then C++ constructs. This is pity since using shared memory provides significant performance advantage in certain conditions.

Boost 1.35 includes Interproces which provides C++ based programming environment for shared memory. The facilities provided are:

  • Shared memory based pool of memory and capability to new C++ objects from this pool
  • Mutexes placed in the shared memory
  • Various smart pointers
  • Various STL-like containers which can be placed in shared memory. These containers include maps, deque, set, list, flat_map, flat_set, slist, and string

In addition various boost containers can be placed in shared memory. These containers include the multi-index container.

Hopefully these constructs make placing objects in shared memory a little easier for you all. Please note that placing data in shared memory has traditionally been dangerous due to complexity in debugging crashes, not easily understood deadlocks, etc. Please use shared memory based containers carefully.