præclarum

Search:

Frank A. Krueger
Seattle, WA, United States

Recent Posts

Using SQLite as an On-disk File Format, Part 2
Using SQLite as an On-disk File Format, Part 1
Travel Guide to India
Time Delays in IO

Simplifying the Interview: Solving Permutation Problems

Tuesday, April 04, 2006 Link

I’m so tired of first-round interview questions. They always seem to boil down to “play with pointers” (linked lists, strings, trees, etc.) or “play with permutations or combinations”. The “play with permutations” questions are always the most annoying for me to answer, mostly because the solutions involve a bit of cookie cutter code that I know exists but have never taken the time to fully develop. Because of my laziness in defining a rigorous approach to solving these types of problems I am forced to invent a custom solution each time.

Well no more. Here is a cookie-cutter approach to solving permutation problems. Once you have read through this article, you will be able to tackle any of these problems with ease.

I will not just put the cookie-cutter code in front of you, but will instead build up to it through a series of examples. We’ll start with a very simple question and build up from it to harder and harder questions. The method developed in the earlier (easier) problems is shown to be of such value that the harder problems become trivial.

Once that is accomplished, I play around with C++ templates to create a function that directly implements the ideas of this article in such a manner that it can be plugged in to most situations.

With all that said, let’s move on to the first problem.

Generate all values for the integer n where 0 <= n < m

We can solve this problem most simply with a for loop:

void func1(int m)
{
for (int n = 0; n < m; n++) {
cout << n << endl;
}
}

That solution is obviously correct but does not serve well as cookie cutter code when we move on to more complex problems. Let’s think about that for loop for a second; it contains three sections: initialization, a check (predicate) for continuation, and some mutation (change) of the value n to progress on to new values. We could in fact rewrite the function in parts (factored):

int init()
{
return 0;
}

bool cont(int n, int m)
{
return n < m;
}

void next(int &n)
{
n++;
}

void func1a(int m)
{
int n = init();

while (cont(n, m)) {
cout << n << endl;

//
// Increment
//
next(n);
}
}

That sure is a lot more code, but it works identically to the first function.

First you’ll notice that I defined three functions: init(ialize), cont(inue), and next. We use these functions instead of their inline forms (= 0, <= m, n++) in order to force ourselves to recognize their roles. Because of their important role in these permutation problems, I call them the kernel functions. You may also recognize their similarity to the iterators of C++ and enumerators of .NET.

Notice also that we switched from a for loop to a while loop. That change wasn’t explicitly necessary for the refactoring, but made the structure of this solution match the structure of the solutions that will be presented later on.

Did we gain much over func1? No. However, we will soon see that the concept of the three functions init, cont, and next, and the layout of the code will make solving the harder problems much easier.

Generate all pairs n x n where n is an integer such that 0 <= n < m

Using the little kernel functions we developed in the previous section, we can easily solve this problem:

void func2(int m)
{
int n0 = init();
int n1 = init();

while (cont(n1, m)) {
cout << n0 << " x " << n1 << endl;

//
// Increment n0. If it reaches the end of its range,
// reset it to 0 and increment n1.
//
next(n0);
if (!cont(n0, m)) {
n0 = init();
next(n1);
}
}
}

The trick to understanding this code is to recognize these three points:

When permuting more than one object, establish some order to things by declaring that one object will increment faster or slower than all the other objects. In this case, I decided to have n0 increment faster than n1.

The permutation is complete when you are no longer able to continue incrementing the slowest object.

When the fastest object can no longer increment (cont(obj) is false), then the object slower than that one must be incremented. If that one cannot be incremented, then the next slowest from that one must be incremented. And so on.

To see these rules in action, let’s move on to the next example:

Generate all triples n x n x n where n is an integer such that 0 <= n < m

Based upon our experience in the previous section, we might go about implementing this one as:

void func3(int m)
{
int n0 = init();
int n1 = init();
int n2 = init();

while (cont(n2, m)) {
cout << n0 << " x " << n1 << " x " << n2 << endl;

//
// Increment
//
next(n0);
if (!cont(n0, m)) {
n0 = init();
next(n1);
if (!cont(n1, m)) {
n1 = init();
next(n2);
}
}
}
}

This code works (thanks to our rules) but the increment section of the code is already getting a bit unweidly. Just for a minute imagine what it would look like if we were dealing with four integers: we would keep nesting if statement after if statement. How redundant!

I hate redundancy in code, especially when it makes that code harder to understand. So let’s take a few minutes and think of a solution to simplify this whole mess.

What are we doing in that increment section? Well, we’re incrementing some value (the fastest object) and then checking whether that value has reached the end of its range. If it has not, then we do not need to do anything. If it has, then we need to reset it and increment the next fastest value. If that value has reached its end, then we need to reset it and increment the next fastest. And on and on.

I hope you made it through that last paragraph. If you didn’t have a cup of coffee and come back, because it’s important.

Good. Ok, so we have just described the generic algorithm that comprises all of the increment sections of all of the examples thus far. If we can code that algorithm, then our code should become simpler, less buggy (due to the code duplication removal), and hopefully easier to understand.

The algorithm states that we should increment the fastest object. That object will then either be able to continue or will have reached its end. If it is still able to continue, then we don’t need to do anything further. Instead, if it has reached the end, then we need to increment the next fastest object.

That sounds pretty easy to implement! For ease of implementation let’s assume that the current permutation values are stuck together in an array and are sorted from fastest moving to slowest moving. We then want to implement this mutating function:

bool incr(vector<int> &values, int m)
{
size_t fastest = 0;

while (fastest < values.size()) {
next(values[fastest]);

if (cont(values[fastest], m)) {
return true;
}
else {
//
// Reinitialize this one, and move on to
// increment the next fastest.
//
values[fastest] = init();
fastest++;
}
}

return false;
}

This function increases values just as we described in the algorithm. First the fastest value is incremented. If it is able to continue, then there is nothing more to do. If it has reached the end, then we need to reset it and increment the next value. We know that this algorithm will terminate because in each iteration it will either explicitly terminate (by using return) or it will increment the local variable called fastest. Because the loop is guarantees termination once the local variable has gotten to a certain size, the entire loop is guaranteed to terminate.

You should note that whatever extra data that the cont function needs, the incr function will itself need (since it calls cont). In our case, this extra information is simply the integer m.

A very important aspect of this function is its return value. The function will return true if there are still more permutations of values remaining. It returns false if we have already witnessed all permutations.

With this very convenient return value, we can re-implement func3 in a much more straight-forward manner:

void func3a(int m)
{
vector<int> n(3, init());

do {
cout << n[0] << " x " << n[1] << " x " << n[2] << endl;
} while (incr(n, m));
}

Generate all 6-tuples n x n x n x n x n x n where n is an integer such that 0 <= n < m

Given all our hard work in the last section, this is actually quite a breeze!

void func6(int m)
{
vector<int> n(6, init());

do {
cout << n[0] << " x " << n[1] << " x " << n[2] << " x " <<
n[3] << " x " << n[4] << " x " << n[5] << endl;
} while (incr(n, m));
}

Generate all k-tuples n x n x … x n where n is an integer such that 0 <= n < m

This is just as easy:

void funck(size_t k, int m)
{
vector<int> n(k, init());

do {
const char *head = "";
for (size_t i = 0; i < k; i++) {
cout << head << n[i];
head = " x ";
}
cout << endl;
} while (incr(n, m));
}

Here, we had to spend more thought coming up with a way to print the tuples rather than figuring out how to form then. I would say that we have mastered permuting integers at this point!

Generate all permutations of five letter words

As a final example, let’s consider a non-integer example. If our solution (or cookie-cutter code) is truly versatile then we should be able to implement this problem with little thought.

Let’s start with our kernel functions:

char init_letter()
{
return 'a';
}

bool cont_letter(char ch)
{
return (ch <= 'z');
}

void next_letter(char &ch)
{
ch++;
}

These functions are very simple and very reminiscent of their integer counterparts. Notice here however, that the cont functions requires no additional information (unlike its integer counterpart). This is because we were able to statically hard-code the terminating condition.

Lastly we need to implement the incr function:

bool incr_letter(vector<char> &values)
{
size_t fastest = 0;

while (fastest < values.size()) {
next_letter(values[fastest]);

if (cont_letter(values[fastest])) {
return true;
}
else {
//
// Reinitialize this one, and move on to
// increment the next fastest.
//
values[fastest] = init_letter();
fastest++;
}
}

return false;
}

We can see that this function is nearly identical to the incr for the integer examples. This should clue you in to a pending refactoring of the code (something that we will do in the next section).

With the incr function written, it is a trivial matter to write the solution:

void print_words()
{
vector<char> letters(5, init_letter());

do {
for (size_t i = 0; i < letters.size(); i++) {
cout << letters[i];
}
cout << endl;
} while (incr_letters(letters));
}

(Be careful running that program, there are a lot of 5-letter words!)

A generic incr

WARNING! Everything up until this point in the article addressed solving permutation problems. This last section is just riffing on that solution. Understanding this next bit is not necessary for understanding the rest of this article.

We noticed that the two incr functions in this article are essentially identical; they only differ by what functions they called. It is therefore beneficial to write a generic form of this function to reduce code duplication. Utilizing templates, we can write:

template<typename T, typename FInit, typename FCont, typename FNext >
bool incr_generic(vector<T> &values, FInit init, FCont cont, FNext next)
{
size_t fastest = 0;

while (fastest < values.size()) {
next(values[fastest]);

if (cont(values[fastest])) {
return true;
}
else {
//
// Reinitialize this one, and move on to
// increment the next fastest.
//
values[fastest] = init();
fastest++;
}
}

return false;
}

This generic form can be used just as easily as the custom form thanks to our kernel functions:

void print_words_generic()
{
vector<char> letters(2, init_letter());

do {
for (size_t i = 0; i < letters.size(); i++) {
cout << letters[i];
}
cout << endl;
} while (incr_generic(letters, init_letter, cont_letter, next_letter));
}

I have chosen to use templates for the function parameters such that we can use functors (objects that act like functions, closures) for any of the passed functions. This simplifies passing the extra data needed by the cont function.

To demonstrate this, let’s consider the k-tuple example once more. Using incr_generic, we would first attempt to write something like:

incr_generic(n, init, cont, next) 

in the while clause (look back at the funck function from before). However, this will not compile. The error lies in the cont function. The generic version of incr insists that the cont function only take one argument: the value to be checked. However, the cont that we passed requires two arguments: the value and the terminating integer.

To correct this, we will utilize an object that acts like a function and that stores away the terminal integer. Such an object can be defined simply as:

class Cont {
int m_;
public:
Cont(int m) : m_(m) { }
bool operator () (int n) { return n < m_; }
};

It is the operator member function that makes this object act like a function that takes just one value.

Now we can write the generic version of funck:

void funck_generic(size_t k, int m)
{
Cont cont(m);
vector<int> n(k, 0);

do {
const char *head = "";
for (size_t i = 0; i < k; i++) {
cout << head << n[i];
head = " x ";
}
cout << endl;
} while (incr_generic(n, init, cont, next));
}

That’s enough code for one day. Thanks for sticking in to the end!

Reader Comments

I moved this item from the old site.

Posted at 5:35 PM by Frank Krueger

Another comment to test the styling...

Posted at 5:53 PM by Frank Krueger

Could you also move the "Assimilated by Win32" article from the old site? I found it very useful and didn't get to finish going through it.

Posted at 2:14 AM by Anonymous

Post a Comment