Thursday, July 3, 2014

Cleaning up the code

"All the most important mistakes on a project are made on the first day"
--Old design adage

"There is no teacher but the enemy. No one but the enemy will tell you what the enemy is going to do. No one but the enemy will ever teach you how to destroy and conquer. Only the enemy shows you where you are weak. Only the enemy tells you where he is strong. And the only rules of the game are what you can do to him and what you can stop him from doing to you."
-- Mazer Rackham

"Reinhardt regarded the mysteries of the Universe not as indifferent questions of physics or chemistry, but as implacable, malicious foes. They were to be assaulted with science, vanquished at any cost, forced to yield their treasure house of knowledge."
-- from The Black Hole by Alan Dean Foster

In any interesting problem, you don't know enough about the problem to begin with to even form a plan for solving it. You do the best you can, and in the process of solving the problem, you learn what you should have been doing from the beginning. If time and resources weren't an obstacle, you could go back and solve the problem right from the beginning. But, usually for any interesting problem, time and resources are an obstacle, and you are stuck with what you started with. At each point in solving the problem, you only change what came before to the absolute minimum degree necessary.

As a result, sometimes legacy design choices make the solution quite a bit more complicated than a clean solution would be.

Now I have time and resources -- a whole year of time. It's time to clean house.

Once upon a time, the code was purely integer math. A data recorder only needs integers. All measurements are converted from analog to digital at some point, and all ADCs I work with report their results as either unsigned integers or signed twos-complement integers of various bit lengths. Also, the ARM7TDMI-S is capable of being interfaced to a floating-point processor, but the LPC2148 does not take this option, and I am stuck with that. Any floating point math would have to be done in software. Since the floating-point library has a long reputation of being slow, I chose to put this off as long as possible.

GPS is the one exception, but in that case, it reports results as ASCII strings in NMEA format. In particular, latitude and longitude are reported as degrees and decimal minutes, like this: DDMM.MMMMM... So for the example, if the AVC track was at 40deg 01.23456 minutes, it reports 4001.23456. My choice at this point was to keep things in integer. So, figuring that longitude would always be in the range -180 to +180deg, I scaled it such that -180 was -1.8 billion and +180 was +1.8 billion, fitting comfortably in the range of a 32-bit signed integer. The scaling factor is 10^7, and the resulting DN size is just over 1cm, plenty of accuracy for all of my intended purposes. I never came up with a good name for this DN size, but right now I am thinking something like pseudo-cm, symbol cm' . Other measurements had in effect a home-spun floating point format, where the number part was done by just ramming together the part before and after the decimal point, and a separate scaling factor was tracked to keep track of the power of 10 to scale the number. So an altitude reported as 1520m would be number 1520, scale 0. The same altitude reported as 1520.000m would be number 1520000, scale 3. This is reinventing the floating point scheme, in a particularly inefficient way, but that's ok. My code never actually computed in this format, it just recorded.

With the robot, navigation, guidance, and control problems are all in the continuous domain, not the discrete domain of integers. Even sensor measurements above are all continuous, but as a logger, the Loginator didn't have to care. It just wrote down the bits that the sensors gave it. A robot controller doesn't have that choice. It must interpret the sensors. Now the choice is, do I use the floating point library that comes with gnuarm, or do I spin my own floating- or fixed-point solution? The more I thought about that, the less I thought I could do a better job than those who have been working on the problem for years. I chose to use the floating-point library.

Now we have a conflict with previous design choices. The natural way to consider longitude in floating point is of course purely as degrees, with the floating point taking care of the fractional part. However, everything done before reported cm'. I had a routine laboriously computing number parts and scaling factors, then another routine which cast the number part to float and then repeatedly divided by 10 to handle scaling.

The next version of the code will abandon the number/scaling format, in favor of standard IEEE-754 floating-point. It will report these in the packets, which means new versions of all packets which used number/scaling.

Also, I found this clever hack in completely unrelated research on the IEEE-754 format. The guidance routine needs atan2(), a transcendental trigonometry function, in order to calculate the desired heading from the current position and waypoint. It needs sqrt() to calculate the steer-to point beyond the waypoint. But it doesn't need seven significant figures for either. Sometimes good enough is good enough, and it only needs to be good enough to calculate the correct steering direction and rough magnitude. With these, any real-world errors will shrink.

I was digging through the math library, which happens to be newlib 1.20, compiled along with the GCC/G++ cross-compiler. The atan2() function does a lot of argument checking and scaling, but boils down to a Taylor series with 10 terms, so about 10 multplies and adds. Sqrt() has the following interesting comment in its header:

/* __ieee754_sqrt(x)
 * Return correctly rounded sqrt.
 *           ------------------------------------------
 *           |  Use the hardware sqrt if you have one |
 *           ------------------------------------------
 * Method: 
 *   Bit by bit method using integer arithmetic. (Slow, but portable) 

That doesn't inspire a lot of confidence. Anyway, as I said above, while poking through Wikipedia about IEEE-754, I came across this article on Fast inverse square root. I won't attempt to explain it here, as its algorithm, coding, and authorship are all complicated questions, some of which haven't been fully answered even yet. It boils down to:

union floatint {
  float f;
  int32_t i;

//From Wikipedia:Fast inverse square root. Must be float, not fp. Edited
//to use correct types and avoid strict-aliasing warning on conversion from float to int
static inline float Q_rsqrt( float number ) {
        floatint i;
        float x2, y;
        const float threehalfs = 1.5F;
        x2 = number * 0.5F;
        i.f  = number;        // evil floating point bit level hacking
        i.i  = 0x5f3759df - ( i.i >> 1 );               // what the...?
        y  = i.f;
        y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
        return y;

The highlighted line is the interesting part. It calculates the initial guess to a Newton's method approximation by treating the float as an int, and subtracting it from a cleverly constructed constant. In gross effect, this is treating the floating point representation as an approximation to the log2 of the represented value, which it is in a sense. The most significant bits are the exponent. There are tons of details and nuances, the most interesting one of which is what happens to the bit of the exponent which shifts into the mantissa. But like I said, lots of other people have explained this better than me.

The point is that this approximation is good to within 0.2% (always an underestimate, though) which is good enough for lots of things. The original purpose was calculating unit-length normals for surfaces for lighting calculations. As it turns out, this is what I need in the guidance code as well. Reciprocal square root is more useful than direct.

Also as it turns out, this code isn't run very much for me. In its original home, this code was run once per triangle, or perhaps once per pixel, in a graphics engine. For me, the code is run once per waypoint change. But, it does improve the size of the code image by quite a bit, maybe a couple of kilobytes. Pulling in sqrtf() is pulling in quite a bit of double-precision logic as well, just for the case when sqrt throws an exception, and just because the exception reporting mechanism holds onto the arguments' values in double precision.

No comments:

Post a Comment