How is My Signal?
My first Android app, “How is My Signal?”, is on the market now. These days I am busy developing it and blogging about it. My blog on the subject is at http://blogs.techievarta.com/howismysignal/. Happy reading.
I need to move back…
Update: Done. I have moved back.
I physically moved to Shrewsbury, MA and was forced to use the town-run ISP. It turns out they block port 80. Techievarta.com, which used to be hosted at my home, is now down. I am trying to find a way to move my existing posts to wordpress.com.
Stay tuned.
Boost.Spirit: Semantic Actions
Note that this post applies to Spirit.Classic or 2.0.
In the first two posts we introduced the basic parsing techniques defined by Spirit. Parsing is only useful if we can do something with the parsing results. In Spirit you achieve this with semantic actions.
Semantic actions are expected to follow functional programming paradigms. The most basic semantic action has the following prototype:
void f(IteratorT first, IteratorT last);
or as functor
struct my_functor
{
    void operator()(IteratorT first, IteratorT last) const;
};
Here is an example of a simple action from the Spirit user guide:
void
my_action(char const* first, char const* last)
{
    std::string str(first, last);
    std::cout << str << std::endl;
}
Applying an action is rather simple: you specify it in [] after the rule. Starting from the e-mail address parser of the previous posts, you can write:
r = *(mailTo | anychar_p);
mailTo = "mailTo:" >> emailAddress[&my_action];
emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
my_action will be called with iterators pointing to the start and the end of the parsed e-mail address. The above action will therefore print all the e-mail addresses in the input.
In a lot of cases it is wasteful to call actions with iterators pointing to first and last; after all, Spirit has just parsed the contents. For this purpose Spirit defines specialized actions. For example
void func(NumT val);
or equivalent functor
struct fctr
{
    void operator()(NumT val) const;
};
can be applied to any of the numeric parsers (real_p, ureal_p, int_p, uint_p). Similar actions exist for other types of parsers. Please check the Spirit guide for details.
The complete program looks like:
#include <boost/spirit/core.hpp>
#include <cstring>
#include <iostream>
#include <string>

using namespace boost::spirit;

void
my_action(const char* first, const char* last)
{
    std::string str(first, last);
    std::cout << str << std::endl;
}

struct my_grammar : public grammar<my_grammar>
{
    template <typename ScannerT>
    struct definition
    {
        rule<ScannerT> r, mailTo, emailAddress;
        definition(my_grammar const& self) {
            r = *(mailTo | anychar_p);
            mailTo = "mailTo:" >> emailAddress[&my_action];
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
        }
        rule<ScannerT> const& start() const { return r; }
    };
};

int main(){
    const char* str = "mailTo:a@b.com test mailTo:d@f.com";
    my_grammar g;
    if (parse(str, str + strlen(str), g, space_p).full)
        std::cout << "parsing succeeded\n";
    else
        std::cout << "parsing failed\n";
    return 0;
}
In addition, there is a large selection of pre-defined actions. You can find them here.
Functional Programming & Boost.Lambda Programming
I would like to take a break from Boost.Spirit related topics to talk about functional programming. An understanding of functional programming is essential for understanding how Spirit actions are implemented. The typical object-oriented programming paradigm combines mutable data with a set of algorithms which operate on that data. In contrast, FP avoids mutable data or state and emphasizes the application of functions. From Wikipedia:
In practice, the difference between a mathematical function and the notion of a “function” used in imperative programming is that imperative functions can have side effects, changing the value of already calculated computations. Because of this they lack referential transparency, i.e. the same language expression can result in different values at different times depending on the state of the executing program. Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times. Eliminating side-effects can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.
The STL provides a set of algorithms which can be viewed as FP. For example, std::copy will always copy the source to the destination.
The next concept in FP is called currying: a technique in which a function is applied to its arguments one at a time, with each application returning a new function that accepts the next argument. The STL functions bind1st, bind2nd and binary_compose can be used for currying.
All functional programming languages are based on the lambda calculus, introduced in the 1930s, which is built on anonymous functions and currying. You can read more on it here.
Boost.Lambda and Boost.Phoenix introduce a formal system of functional programming and lambda expressions to C++. I typically use Boost.Lambda; I only use Phoenix when using Spirit, since they are more closely integrated. They share a lot of similar concepts. This post will discuss Boost.Lambda. The next post will use these concepts to introduce Phoenix as applied to Boost.Spirit.
Let us start with a simple lambda expression: _1 = 1 is the definition of an anonymous function which takes one argument and sets it to 1. So to initialize a variable to 1 you could write:
(_1 = 1)(i);
or if you wanted to initialize all elements of a container to 1, you would write something like:
std::vector<int> v;
std::for_each(v.begin(), v.end(), _1 = 1);
or to print every object in a container:
std::for_each(v.begin(), v.end(), std::cout << _1);
What if you wanted to define a function which calls a function foo on its argument and assigns the result back to the same argument?
That function is _1 = bind(foo, _1);
bind is a generalized version of std::bind1st and std::bind2nd. With the above expression in place one can write:
(_1 = bind(foo, _1))(i);
Which will be equivalent to:
i = foo(i);
So with for_each one can write:
std::for_each(v.begin(), v.end(), _1 = bind(foo, _1));
_1 in the above examples is called a placeholder; placeholders are the equivalent of the bound variables in a lambda expression. The above can be generalized with a larger number of placeholders. For example, one can write (_1 + _2) to create a function which adds its two arguments. Boost.Lambda defines up to 3 placeholders. Higher-order functions can be constructed using currying.
A function definition which uses _2 takes 2 arguments, and one which uses _3 takes 3 arguments. For example, _3 = 1 takes 3 arguments, so
(_3 = 1)(k) will produce compile-time errors, while
(_3 = 1)(i, j, k) is fine and is equivalent to k = 1.
Above, we are using simple expressions to create anonymous functions which have no side effects and depend only on their arguments. As alluded to earlier, BLL also provides bind expressions as a generalization of bind1st and bind2nd. With a BLL bind expression it is possible to bind any function with up to 9 arguments for delayed execution. It can target C++ class members or plain functions. The syntax is similar to that of bind1st and bind2nd.
Boost.Spirit : Complete Parser Design
Note that this post applies to Spirit.Classic or 2.0.
In the last post we discussed how to write a simple parser using Spirit. Most real-life parsers are a lot more complex, requiring several rules which are combined to create a full grammar. Spirit provides support for declaring a full grammar. Let us create a parser which looks for all mailTo: tags with associated e-mail addresses in an input. All Spirit grammars are derived from the grammar class. The following is the definition of a grammar:
struct my_grammar : public grammar<my_grammar>
{
    template <typename ScannerT>
    struct definition
    {
        rule<ScannerT> r;
        definition(my_grammar const& self) { r = /*..define here..*/; }
        rule<ScannerT> const& start() const { return r; }
    };
};
my_grammar is derived from the grammar class using the curiously recurring template pattern (CRTP). For those who are new to CRTP, it was introduced by Jim Coplien, and Wikipedia has some details on it. Using CRTP it is possible to achieve the effect of dynamic polymorphism without paying the cost of virtual functions. More on CRTP another day.
Each grammar class must define a nested structure called definition. The definition must have a function called start which returns the starting parser rule. So let us start writing the rules for the e-mail address parser. The mailTo rule can be written as:
mailTo = "mailTo:" >> emailAddress;
From our previous post, emailAddress can be written as:
emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
Of course, the full input contains things other than mailTo: tags, so we must skip the other characters. You do that as follows:
r = mailTo | anychar_p;
The above rule says: try to match the input against mailTo, and if it does not match the mailTo rule, match it against the any-character parser.
So combining these two, the grammar constructor can now be written as:
definition(my_grammar const& self) {
    r = mailTo | anychar_p;
    mailTo = "mailTo:" >> emailAddress;
    emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
}
The first thing one notices is that the rule r refers to the rules emailAddress and mailTo before they are initialized. This works because the rules are held by reference and not by value; a referred rule can be initialized at any time. This does complicate programming a little: it is the user’s responsibility to make sure that referred rules never go out of scope, so emailAddress cannot be local to the definition constructor. Typically, one declares all rules as members of the definition structure. The full grammar now becomes:
struct my_grammar : public grammar<my_grammar>
{
    template <typename ScannerT>
    struct definition
    {
        rule<ScannerT> r, mailTo, emailAddress;
        definition(my_grammar const& self) {
            r = mailTo | anychar_p;
            mailTo = "mailTo:" >> emailAddress;
            emailAddress = lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
        }
        rule<ScannerT> const& start() const { return r; }
    };
};
OK! now we have the grammar. Let us use it.
my_grammar g;
if (parse(first, last, g, space_p).full)
    cout << "parsing succeeded\n";
else
    cout << "parsing failed\n";
Here first and last are iterators pointing to the first and last characters of the input.
Note that the above grammar matches exactly one e-mail address or one other character. It can easily be modified to match all e-mail addresses.
The new rule will be:
r = *(mailTo | anychar_p);
Boost.Spirit – Easy to use Parser
Note that this post applies to Spirit.Classic or 2.0.
I recently had to write a parser in C++ and decided to give Boost.Spirit a chance. I was delighted by the ease of use of the parser itself. There was a steep learning curve to get started; however, once I got going, it made life significantly simpler.
Boost.Spirit provides a very simple way to create a parser from an EBNF-style grammar. Let us consider an e-mail address parser as an example.
+alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p);
Simple! Let us understand the above statement. In Boost.Spirit the rule<> template defines a basic grammar rule. A rule is made up of parsers and other rules. Boost.Spirit's built-in parsers all have the suffix _p. The alnum_p parser matches any alphanumeric character. '+' is an operator which matches one or more instances of the enclosed parser. '>>' is an overloaded operator which simply means "followed by". '@' is shorthand for a parser which matches the character '@'. So the above statement says: any alphanumeric string, followed by '@', followed by another alphanumeric string, which can then be followed by any number of ".<alphanum>" groups.
By default, >> includes white space skipping. This implies the above parser will match a@b as well as a @ b.
We don't want white space skipping inside an e-mail address, so we need to modify the rule so that >> does not skip white space. For this and other purposes, Spirit provides directives: special instructions which can be applied to part of a rule. No white space skipping is achieved by using lexeme_d.
lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)];
I have no clue why the name, but the suffix is _d for all directives. OK! We have a rule. Now let us check whether this rule matches a specified string:
bool isEmailAddress(const std::string& str){
    return parse(str.c_str(),
                 lexeme_d[ +alnum_p >> '@' >> +alnum_p >> *('.' >> +alnum_p)],
                 space_p).full;
}
The above function checks whether the supplied string is an e-mail address or not.
In the next post we will discuss how to create more complex examples as well as deal with parse actions.
C++ Concept Part 3
This is the last post in a series on C++0x concepts. In this one we will look at C++0x concepts as proposed for the standard. The C++0x design for concepts is almost identical to ConceptC++: the syntax is nearly the same, and significant parts of ConceptC++ have become part of the draft standard.
One thing we have not looked at is the definition of a concept map. In the previous example of sort, the template takes a ForwardIterator as an argument, but the article did not discuss what makes a forward iterator. Can we use an int* as a forward iterator? We also said that ForwardIterator must have a value_type member. int* does not have a member value_type, but we want to pass int* as a parameter to sort so that, for example, we can sort an array of integers.
C++0x provides a way to solve this problem, called concept_map. Using concept_map we can define what constitutes a ForwardIterator.
template <typename T>
concept_map ForwardIterator<T*> {
    // T*'s value_type is T
    typedef T value_type;
};
Using the above syntax we are saying that for any T, the value_type of T* is T. This removes the need to wrap int* in another container.
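With that concept_map in place, the intended usage is exactly the array case mentioned above (a ConceptC++ sketch, not compilable with a standard C++ compiler; assumes the sort template from the previous post):

```
int a[] = { 3, 1, 2 };
sort(a, a + 3); // int* now satisfies ForwardIterator via the concept_map
```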
All in all, C++0x concepts provide a solution for one of the biggest ongoing frustrations with generic programming in C++. They will provide early and clear compilation errors for template users and authors alike, and bind both parties to the template parameter contract.
Other C++0x sources on the web
C++0x Concepts Contd..
In the previous article we discussed BCCL. This post continues the discussion of concepts with ConceptC++. ConceptC++ was the playground for language-level concept ideas which eventually became part of the currently proposed C++0x standard. In this post we take an in-depth look at ConceptC++. These extensions were implemented by modifying GCC.
ConceptC++
Unlike the Boost Concept Check Library (BCCL), which is implemented purely using standard C++ constructs, these extensions are implemented by extending the C++ language itself. By changing the C++ definition, it is possible to do things which were not possible otherwise. In particular, remember that with BCCL it was not possible to make template parameter restrictions part of the template or function definition. With ConceptC++ this becomes possible. In addition to enforcing restrictions on the instantiation of a template, ConceptC++ also imposes restrictions on the template implementation, making sure that it only uses the template parameter facilities defined in the contract.
To understand ConceptC++, let us consider sum template which adds two parameters and returns the value:
template <typename T>
T sum(T a, T b){
    return a+b;
}
With Concept C++ we can rewrite this to:
template <CopyConstructible T>
T sum(T a, T b){
    return a+b;
}
This says T must be copy constructible. If we compile this template using ConceptC++ (even without any instantiation) we get the following error:
error: no match for 'operator+' in 'a+b'
That is because the template signature did not require T to implement operator+, yet the body tries to use it. This forces the author to accurately state the template's requirements:
template <CopyConstructible T> requires Addable<T>
T sum(T a, T b){
    return a+b;
}
In other words, ConceptC++ puts restrictions on both template instantiation and template definition, and binds them in a contract with respect to the template parameter. This is what the definition of a function, class or template is supposed to be: a contract.
We have been using magic words like CopyConstructible, Addable, etc. What are they? Are they new language features, or are they provided by a library? What if we wanted to implement our own custom concepts? How would we go about that?
The C++ language extension provides the capability to define new concepts, while the library provides a set of standard concepts. It is rather easy to build your own concept. For example, let us look at the definition of the Addable concept:
auto concept Addable<typename T> {
    T operator+(T x, T y);
};
How simple can that be? The above definition says that the Addable concept requires the template parameter to implement an operator+ which takes two parameters of type T and returns a T.
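The LessThanComparable concept used with sort below can be written in the same style (a sketch in ConceptC++ syntax; not compilable with a standard compiler):

```
auto concept LessThanComparable<typename T> {
    bool operator<(T x, T y);
};
```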
With this background, let us modify our example sort template. In the case of sort, the template parameters are iterators, and the value pointed to by the iterator must implement the less-than operator. With ConceptC++ you can specify these restrictions as follows:
#include <concepts>
template<std::ForwardIterator T>
requires LessThanComparable<T::value_type>
void sort(T b, T e)
{
....
}
As you can see, ConceptC++ provides a natural way of defining restrictions on template parameters, allowing compilers to perform early error checking.
Performance
ConceptC++ implements its functionality at the cost of compilation speed. However, just like BCCL, ConceptC++ does not incur any run-time overhead.