Sunday, April 28, 2013

DirectTask and nested interrupts

One problem with the old rocketometer code is that the sensor readout and the SD card writing code ran in the same thread. When the SD card took its endless milliseconds to write the data, the sensors were not being read, leaving irregular gaps in the record. My brilliant idea was to read the sensors in a task at interrupt priority, effectively creating another thread. The first effort used the task manager I described below, and it was a dismal failure.

For whatever reason (perhaps the same one described below), the task was not able to read the sensors. I came up with a much simpler task manager, with which I am getting incredible accuracy.

I call it the DirectTask manager. Its concept: Rather than using one match channel of the timer and a heap priority queue, we just use all three available match channels (there are four, but the zeroth channel resets the timer every 1 minute). This limits the flexibility enormously, but I only need two tasks. I set up one task to reschedule itself on a regular basis (5ms in my first test) and I use the other task to read the BMP sensor.
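A minimal sketch of the idea, written as a host-side simulation in C rather than real timer code. Every name here (`direct_task_mgr`, `dt_schedule`, `dt_tick`, `periodic_task`) and the 1 MHz tick rate are assumptions for illustration; this is not the actual rocketometer code, and real hardware would fire the tasks from the timer match interrupt instead of a tick loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: names and tick rate are illustrative assumptions. */
#define TIMER_HZ     1000000u           /* assumed 1 MHz timer tick */
#define TIMER_PERIOD (60u * TIMER_HZ)   /* channel 0 resets the timer once a minute */
#define N_CHANNELS   4

typedef void (*task_fn)(void *arg);

typedef struct {
    uint32_t match;  /* timer count at which the task fires */
    task_fn  fn;     /* NULL means the channel is idle */
    void    *arg;
} direct_channel;

typedef struct {
    uint32_t       tc;              /* simulated timer counter */
    direct_channel ch[N_CHANNELS];  /* ch[0] is reserved for the timer reset */
} direct_task_mgr;

void dt_init(direct_task_mgr *m) {
    m->tc = 0;
    for (int i = 0; i < N_CHANNELS; i++) {
        m->ch[i].match = 0;
        m->ch[i].fn = NULL;
        m->ch[i].arg = NULL;
    }
}

/* Arm channel 1..3 to run fn once, delay_ticks from now. */
void dt_schedule(direct_task_mgr *m, int channel, uint32_t delay_ticks,
                 task_fn fn, void *arg) {
    assert(channel >= 1 && channel < N_CHANNELS);
    assert(delay_ticks > 0 && delay_ticks < TIMER_PERIOD);
    m->ch[channel].match = (m->tc + delay_ticks) % TIMER_PERIOD;
    m->ch[channel].fn = fn;
    m->ch[channel].arg = arg;
}

/* Advance the simulated timer one tick and run any task whose match hits. */
void dt_tick(direct_task_mgr *m) {
    m->tc = (m->tc + 1) % TIMER_PERIOD;
    for (int i = 1; i < N_CHANNELS; i++) {
        direct_channel *c = &m->ch[i];
        if (c->fn != NULL && c->match == m->tc) {
            task_fn fn = c->fn;
            void *arg = c->arg;
            c->fn = NULL;  /* one-shot: the task may re-arm its own channel */
            fn(arg);
        }
    }
}

/* Example of a task that reschedules itself on a regular basis. */
typedef struct {
    direct_task_mgr *mgr;
    int              channel;
    uint32_t         period;  /* in ticks */
    unsigned         runs;
} periodic_ctx;

void periodic_task(void *arg) {
    periodic_ctx *p = (periodic_ctx *)arg;
    p->runs++;  /* the sensor readout would go here */
    dt_schedule(p->mgr, p->channel, p->period, periodic_task, p);
}
```

To try it, arm channel 1 with `dt_schedule(&m, 1, 5000, periodic_task, &ctx)` and call `dt_tick` in a loop; with the assumed 1 MHz tick, 5000 ticks corresponds to the 5ms period. There is no queue and no sorting, which is the whole point: each task owns one match channel.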

However, the sensor readout runs an I2C read, which itself is interrupt driven. The code does not currently support nested interrupts, which means all interrupts are delayed until the current one returns. The I2C state machine was interrupt driven, and its interrupts were getting delayed, including the one that tells the state machine to stop waiting, so the state machine waited forever.

So, we put in an option to I2C to run without interrupts, instead using a busy loop to check the I2C interrupt status bit, then calling the same state machine driver. We weren't getting anything by being interrupt driven anyway, since we had to wait for the read to finish.
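The polled option can be sketched like this, against a fake peripheral so it runs on a desktop. The `fake_i2c` struct, `i2c_state_step`, and `i2c_read_polled` are hypothetical names for illustration, and the state machine is reduced to "take one byte per event"; a real I2C driver has many more states, and the status bit would live in a volatile hardware register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the I2C peripheral. */
typedef struct {
    bool           si;       /* simulated "interrupt pending" status bit */
    const uint8_t *src;      /* bytes the fake device will hand back */
    size_t         src_len;
    size_t         src_pos;
} fake_i2c;

typedef struct {
    uint8_t *buf;
    size_t   want;
    size_t   got;
    bool     done;
} i2c_read_state;

/* The state machine driver. The same function serves both modes:
   an ISR would call it once per event, and so does the polling loop. */
void i2c_state_step(fake_i2c *hw, i2c_read_state *st) {
    if (st->got < st->want && hw->src_pos < hw->src_len) {
        st->buf[st->got++] = hw->src[hw->src_pos++];
    }
    if (st->got == st->want || hw->src_pos == hw->src_len) {
        st->done = true;
    }
    hw->si = false;  /* acknowledge the event */
}

/* Stand-in for the bus finishing its next step and raising the status bit. */
void fake_i2c_advance(fake_i2c *hw) {
    hw->si = true;
}

/* Blocking read with no interrupts: spin on the status bit, then run
   the same state machine step an interrupt handler would have run. */
size_t i2c_read_polled(fake_i2c *hw, uint8_t *buf, size_t n) {
    i2c_read_state st = { buf, n, 0, false };
    assert(buf != NULL);
    while (!st.done) {
        fake_i2c_advance(hw);  /* real hardware does this by itself */
        while (!hw->si) {
            /* busy-wait on the status bit */
        }
        i2c_state_step(hw, &st);
    }
    return st.got;
}
```

Because the caller blocks until the read finishes either way, polling costs nothing extra here, and it cannot deadlock behind a delayed interrupt.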

With that, the DirectTask manager worked fine. Maybe the heap task manager would have worked fine too, but this one is simpler.

It takes 641 microseconds to read the sensors. We could probably easily bump the read rate to 500Hz, and maybe to 1000Hz, but this doesn't count anything but reading the MPU and HighAcc sensors.
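As a rough check on those rates, here is the arithmetic in C. The helper names are made up for illustration, and the only measured input is the 641 microsecond readout; anything else the task would have to do is not included.

```c
#include <assert.h>
#include <stdint.h>

#define READ_US 641u  /* measured readout time from the text */

/* Length of one sample period in microseconds at the given rate. */
uint32_t period_us(uint32_t rate_hz) {
    assert(rate_hz > 0);
    return 1000000u / rate_hz;
}

/* Percent of each period spent in the readout, rounded down. */
uint32_t busy_percent(uint32_t read_us, uint32_t rate_hz) {
    return read_us * 100u / period_us(rate_hz);
}

/* Microseconds left over in each period after the readout. */
int32_t slack_us(uint32_t read_us, uint32_t rate_hz) {
    return (int32_t)period_us(rate_hz) - (int32_t)read_us;
}
```

At 500Hz the period is 2000 microseconds, so the readout takes about 32% of it and leaves 1359 microseconds. At 1000Hz the readout takes about 64% of the period and leaves only 359 microseconds for everything else.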

Wednesday, April 24, 2013

Flight Level 390 and autopilots

Started reading an interesting blog - Flight Level 390 by an airliner pilot who obviously wishes to remain anonymous.

One of the interesting things he pointed to was an incident where the ground crew turned off the pressurization system of a 737, failed to turn it back on, and the flight crew failed three times during the preflight check to turn it back on. As a result, the plane never pressurized on the way up on its flight from Cyprus to Athens, Greece. The oxygen masks in back dropped at 18000 feet, and no communications from the crew happened after the plane flew through 28000 feet. Presumably the crew lost consciousness at this point.

The interesting thing to me is what happened next. The autopilot leveled off at its flight plan altitude of 34000 feet, flew to Athens automatically, and then entered the holding pattern. The program did exactly what it was supposed to do at that point: wait for further instructions, which never came. After flying for almost 3 hours, the plane ran out of fuel and plummeted to the ground.

I find it amazing that the autopilot could be programmed to do that -- not that it is technically difficult, but that they bothered to do it. I always think of an autopilot as more of just a wing leveller and maybe a beam follower. I would have supposed that the plane would get to Athens or wherever the end of its beam was, then keep flying straight and level in the same direction.

If you can program an autopilot to do that, it's not that big of a jump to complete auto control of the whole flight, with the human crew there just to be able to think on their feet if there is trouble. And here then is the interesting part. In my previous essay on the interaction between pilots and automation, I advocated a system where each part of the system was assigned to the component that could do it best. In that case it was lunar landing, where the computer calculated the best course to the landing spot while the human used his vision and judgement to select that spot. In this case, we assign most of the actual flying to the autopilot and assign oversight and handling of emergencies to the human crew. But here is where the human and autopilot system are anti-complementary. The human crew will have very little to do during the 999,999 flights in 1,000,000 where nothing happens. A crew member might go his entire career without ever seeing a true emergency, although I have been on enough flights to know that a good chunk of them try to fly over, under, or around turbulence. Is the latter just a matter of dialing a different number into the altitude hold?

Maybe the autopilot could have been programmed to descend to 18000 feet if it detects a pressure loss (the masks dropping) and gets no command inputs for several minutes. Will this incident type ever happen again? Maybe; ever is a long time. It turns out that the system did have a pressurization alarm, which worked, but the alarm was the same sound as one which can only occur on the ground. The crew thought it was the alarm for the thing that can only happen on the ground, and therefore ignored it as a bad alarm.

The point is, you become good at what you practice, and lose ability at what you don't. If a crew is mind-numbed by 10,000 hours of babysitting an autopilot, what makes us think that the humans will actually be able to handle an emergency? Air France 447 seems to have run into just that sort of difficulty. The crew was so used to the autopilot and fly-by-wire handling things that when those programs shut down, the human pilot didn't remember how to fly that high up, and stalled the plane for 35000 feet while believing that the fly-by-wire would keep the plane from stalling.

But here's the thing. When autopilots work, they work better than human pilots. They don't make the same kinds of mistakes as human pilots. And, even if the system was fully manual, being mind-numbed by holding the yoke in place for 10,000 hours is no better training for an emergency.

I look at how astronauts trained for the Apollo missions and I see how football teams practice. They spend at least ten times as much time practicing as they do in the event. I look at how airliners are flown and I see how baseball players and basketball players and basketball band musicians practice -- by playing the game. By performing. By actually doing it every single day.

The airlines are doing something right. They fly 10 million operations per year, and there have been no fatal airline crashes in the United States since 2009. Maybe this is just as good as it gets.

Wednesday, April 3, 2013

They took down NTRS

Back when NASA was testing the X-43, they said something that has stuck in my mind ever since. The X-43 is a scramjet test vehicle. The thing is boosted on a Pegasus to near Mach 10, then the scramjet is fired for 10 seconds. Afterwards, the vehicle is not recovered. "The only thing we get back from this mission is data."

When you think about it that way, all the robotic space missions are like that. We just get back data. No artifacts, no samples, just bits. We then assemble those bits back into knowledge back here on the ground. When we talk about the $800M SDO mission, we are spending the $800M to get those bits, so they better be valuable. Furthermore, I paid for those bits so I should have access to them.

The NASA Technical Report Server is where the knowledge collected from those bits goes. It is now closed.

They claim that it is due to an ITAR review, and that it will be reopened when that is complete. There are over a million papers on NTRS. That is never going to get the kind of review they are implying.

So: For all those missions that only send back data, the only evidence that we have that they even happened is now gone. We spent $XB a year on NASA, and what we paid for is now locked up. This is a bad precedent. Furthermore, the NASA library is now untrustworthy. We build libraries to hold knowledge, and if they close at random without notice, they fail to serve their purpose.

Similarly, I now don't think that the kernels for reconstructed entry of MSL will ever be released.

This is not a good way to start the day.