Computational Complexity part 2
All you budding computer scientists our there are no doubt dying for me to resolve last months cliffhanger. As I stated last time, my goal with this series of posts is to describe the PCP Theorem in plain English. The PCP Theorem is a major result in the field of computational complexity, but to really understand what it says, some background is required.
Last post I explained that in computer science, we typically look for the fastest procedure, or “algorithm”, for solving a particular problem. Furthermore, the “complexity” of an algorithm is typically considered to be the number of steps it requires in relation to the length of its input. For example, you (and also a computer) can add two numbers with n digits using the approach you learned in elementary school. The number of steps required is proportional to n. Multiplying two n digit numbers, however, requires more steps (the standard approach uses about n2 steps).
The number of steps an algorithm takes is called its “running time”. When theoretical computer scientists study the running time of an algorithm, they aren’t concerned with the running time on a particular input, they are concerned with how to bound the running time in terms of n, the input length, for large inputs. This makes a lot of sense, since if I told you I’d discovered a way to multiply numbers very quickly, and told you that 318 x 702 = 223,236, you’d rightly suspect that I had just memorized that answer. A real test of my method would be to see how it performs on randomly chosen large inputs. When the running time of an algorithm is at most n raised to some power (for example, n7), it is said to run in “polynomial time” (back in high school, you probably learned that x raised to a power is called a polynomial).
Any problem that can be solved by an algorithm that runs in polynomial time is said to be in a class of problems that is very sensibly denoted P. An algorithm that takes n100 steps to run will take a very long time to run, so it is not true that any problem in P can be solved quickly. In practice however, many many useful problems in P can be solved quickly. Furthermore, problems not in P can be extremely hard to solve even for relatively small inputs. For example, if an algorithms running time is 2n, it does not run in polynomial time, and in fact there isn’t even enough time in the universe for it to run on arbitrary inputs of length 100 (if each step of the algorithm takes only a nanosecond, executing 2100 steps would still take far longer than the estimated current age of the age of the universe).
Over the last 40 years, diligent computer scientists have identified a host of practical problems that do not appear to be in P. Some of these problems, even though we do not think they can be solved in polynomial time, can still be checked in polynomial time. As an example, suppose you were asked to pack n objects in a box. You might need to try many arrangements of these objects before determining that they could, or could not, fit in the box. Notice, however, that if I told you they could fit in the box, I could prove this fact quite quickly by actually packing them in the box. Simply put, the box-packing task may be hard to solve, but a correct solution is easy to verify.
Box-packing makes for a nice example, but as explained in part one of this lengthy exposition, computers like numbers. Put another way, before a computer scientist can study a problem, it needs to be precisely specified using numbers (or at the very least, symbols). In the case of box-packing, each object can be described as a collection of points in space, and the end goal, fitting the objects in a box, can be described by a large number of mathematical relationships. These relationships, which assert that no point lies outside of the box, and no point from one object lies inside another object, hold true only when the objects are packed properly in the box. Clearly the use of numbers has made our nice example irritatingly complex!
To keep things simple, but still be precise, we can instead consider a related problem, which computer scientists call “subset-sum”. In subset-sum, you are given a list of n numbers and asked if there exists some subset of those numbers that adds up to a given target value. Similar to box-packing, this problem can be solved by trying all combinations of the numbers (of which there are 2n). A correct solution, however, can be verified quickly, since once a correct subset is revealed, adding it up takes very little time.
There are a whole host of problems that, like subset-sum and box-packing, appear hard to solve (i.e. not to be in P) but easy to verify (i.e. the task of verifying that a given solution is correct, is in P). Problems that can be verified (but not necessarily solved) in polynomial time belong to a class called NP (the “N” stands for nondeterministic, but that’s a whole other can of worms).
Any problem that can be solved in polynomial time can be verified in polynomial time (just re-solve the problem and make sure you get the same answer), so any problem in P is also in NP. The million dollar question (literally) is whether P = NP, meaning “Is every problem in NP also in P?” Computer scientists the world over strongly suspect that the answer is “no”, but no one has a proof, and in fact, no one even feels like they are anywhere close.
If someone were to prove that P is not equal to NP that person would necessarily prove that some problem in NP cannot be solved in polynomial time, but in fact, they would also prove something far stronger. Problems such as subset-sum and box-packing aren’t just in NP, they are “NP-complete“. This means that they can both be restated in terms of each other. For example, in polynomial time, one could take the irritatingly complex box-packing problem that we alluded to above and transform it into a subset-sum problem. This can be done not just for the box-packing problem, but for every single problem in NP. As a result, if subset-sum, or any other NP-complete problem can be solved in polynomial time, so can every problem in NP. This means that if P does not equal NP (as computer scientists suspect), no NP-complete problem can be solved in polynomial time.
The existence of NP-complete problems was first revealed in 1971. Soon after, many well-known problems were shown to be NP-complete. For example, the problem of finding the shortest route that visits a certain set of cities, called “traveling-salesperson”, is NP-complete. “Independent-set”, the problem of determining whether a party contains some set of n individuals such that no two have ever met, is as well.
It is fascinating a that broad class of well-known problems are all “equally hard”, meaning that either they are all in P, or none of them are. This is exactly the type of thing that gets theoretical computer scientists excited. Even though there is still no proof of whether or not P = NP, the search for a proof has led to further insights about the nature of solving problems. One of these, the PCP theorem, will be the focus of part three of this post.
A final note: Since we do not expect that P = NP, we do not have algorithms that solve NP-complete problems in polynomial time. In practice, however, certain inputs to certain problems can still be solved fairly quickly. For example, subset-sum is easy to solve when the numbers involved are only one digit long. Also, in the real world, when someone has an NP-complete problem that they need to solve, it is often possible find a solution that is “close” to correct. For example, even when many large numbers are involved, subset-sum can be solved in polynomial time if one is only required to find a subset that adds up to within 10 percent of the target value. None-the-less, it is still easy to generate many, many examples of NP-complete problems that we do not have a way to solve in any reasonable amount of time.
October 15th, 2008 at 8:50 pm
[…] public links >> npcomplete Computational Complexity part 2 Saved by soulrot on Tue 14-10-2008 Drivel for today Saved by kamiel on Mon 29-9-2008 Against […]