Thursday, May 5, 2016

Yet Another Episode in the Annals of Data Stewardship

Having learned my lesson from before, I did not set up my filesystem as one big raid0. I did a btrfs raid5 instead. When one of the disks finally did give out, it wasn't with the click of death I heard before, but with read errors. The btrfs degraded, and by mounting read-only in recovery mode, I was able to use the two good disks in order to get my data.

Or so I thought.

A word on the issue I was having. I was seeing "stale file handle" warnings, of the type you see when you are in a folder that is NFS mounted, after you lose connection. But this wasn't an NFS system. I rebooted the system and it wouldn't come up, because the btrfs refused to mount. After manually mounting in degraded mode, many of the disk accesses reported errors in dmesg, about the generation of certain metadata differing from the expected value, often by hundreds or thousands of generations.

First, I decided that I had lost confidence in btrfs -- if it wasn't going to keep working in the presence of a disk failure, what was the point? I spent the next several days scraping data off of the btrfs and putting it wherever I could find a place for it - on the USB disks I have, on other computers, on the system disk, etc. I then replaced the bad disk and formatted them all as zfs - now possible since Ubuntu 16.04 includes a native zfs driver.

Finally, I started copying data back onto the zfs. All appeared to go well, until I tried to bring up the wiki. The LocalSettings.php file was effectively blank - it had the expected size, but all bytes in the file were 0x01. Hrm.

Turns out a lot of files were like this. Files I care about, like the database, the git repositories, etc. It seems like the newer the file is, the more likely it is to be damaged like this.
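A check for this kind of damage is easy to sketch. The following is a minimal illustration, not the script I actually ran: the function names are made up for the example, and it assumes "damaged" means a non-empty file in which every byte is 0x01.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// True if the buffer is non-empty and every byte equals `fill`.
bool is_filled_with(const std::vector<std::uint8_t>& data, std::uint8_t fill) {
    return !data.empty() &&
           std::all_of(data.begin(), data.end(),
                       [fill](std::uint8_t b) { return b == fill; });
}

// Reads the whole file into memory, so this is only suitable for small files
// such as configuration files. An unreadable file yields an empty buffer.
std::vector<std::uint8_t> read_file(const std::string& path) {
    assert(!path.empty());  // caller error, not a damaged file
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// True if the file at `path` matches the damage pattern: all bytes 0x01.
bool looks_damaged(const std::string& path) {
    return is_filled_with(read_file(path), 0x01);
}
```

Running something like `looks_damaged` over each file before a backup overwrites an older copy would at least flag this particular failure mode.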

No problem, I've got backups. A raid5 is not a backup, so I had the most important data copied off onto several other systems.

Or so I thought.

My backup script runs on a cron every night, and had backed up the bad data and spread it all around over the good data.

Oops.

It isn't a total loss. I have all my code in a git repository on the big USB disk. I have an old backup (from December, I think) of all the data I considered important. I did lose a lot of video :( but I don't think I lost anything from Florida 5.

So I think.

Monday, April 4, 2016

Hearing but not Understanding

I just heard a conversation drift over the walls of my cube. I could identify the speakers, I could recognize their voices, but I couldn't understand it. It was as if I couldn't parse spoken English. What was actually happening was that, in between the noise level of the fan in my cube, the sound insulation in the cube partitions, and the low level of the conversation to begin with, I just couldn't make it out.

But then how was it that I was able to identify the voices and put names to them, when I couldn't parse them? It means that at some level, identifying voices is easier and more noise-resistant than picking words out.

Or it means that the spoken English section of my brain is broken. I have neither spoken nor heard anyone speak since then, a few minutes ago.

Thursday, March 31, 2016

Check PCLK measurement

Check if the user code measures PCLK properly. If it doesn't, then the baud rate calculation will be wrong. Since one of the symptoms that has been seen is that the RX light on the FT232 doodad flickers, but no characters appear in putty, it is possible that the baud rate isn't what we think it is.

Consider also calculating the baud rate registers and stuffing them manually. If this works, then it's the PCLK stuff that is broken.
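To see how much a bad PCLK measurement matters, here is a small sketch of the coarse divisor calculation. It assumes the usual 16x-oversampling UART relationship, baud = PCLK / (16 * divisor), with the fractional divider left neutral. The function names and the 60 MHz figure in the usage note are illustrative assumptions, not values from the real code.

```cpp
#include <cassert>
#include <cstdint>

// Divisor latch value (DLM:DLL combined) for a requested baud rate,
// rounded to the nearest integer. Assumes baud = PCLK / (16 * divisor).
std::uint32_t uart_divisor(std::uint32_t pclk_hz, std::uint32_t baud) {
    assert(baud > 0);
    return (pclk_hz + 8u * baud) / (16u * baud);
}

// Baud rate the UART actually produces for the true PCLK and a given divisor.
double actual_baud(std::uint32_t true_pclk_hz, std::uint32_t divisor) {
    assert(divisor > 0);
    return static_cast<double>(true_pclk_hz) / (16.0 * divisor);
}
```

For example, if the code believed PCLK was 60 MHz while the part was really running at 12 MHz, `uart_divisor(60000000, 115200)` gives 33, and `actual_baud(12000000, 33)` comes out near 22.7 kbaud instead of 115.2 kbaud. That would be consistent with an RX light that flickers while nothing readable shows up.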

Friday, March 11, 2016

The Secret to Success

  1. Pick something you like doing.
  2. Do it and do it and do it until you don't like doing it any more. This will always happen at some point.
  3. Keep doing it.

Following these steps doesn't guarantee success, but failing to follow them guarantees failure.

Tuesday, February 16, 2016

Cortex M4 FPU

For a while I was having trouble getting my part to print any FPU calculations. Finally it occurred to me that maybe the FPU has to be turned on, and that the ISP wasn't doing it since it didn't use it.

It turns out that you DO need to turn on the FPU:

4.6.6 Enabling the FPU
The FPU is disabled from reset. You must enable it before you can use any floating-point instructions. Example 4-1 shows an example code sequence for enabling the FPU in both privileged and user modes. The processor must be in privileged mode to read from and write to the CPACR.
Example 4-1 Enabling the FPU
; CPACR is located at address 0xE000ED88
LDR.W R0, =0xE000ED88
; Read CPACR
LDR R1, [R0]
; Set bits 20-23 to enable CP10 and CP11 coprocessors
ORR R1, R1, #(0xF << 20)
; Write back the modified value to the CPACR
STR R1, [R0]; wait for store to complete
DSB
;reset pipeline now the FPU is enabled
ISB

In effect, the FPU is counted as coprocessors 10 and 11. Cortex-M doesn't fully support the concept of coprocessors, but it does in this context. We allow full unprivileged access to coprocessors 10 and 11.

I don't know about waiting for the store to complete and resetting the pipeline. I just put some C++ code to do this long before the FPU is used, and let a bunch of other work instructions flush the pipeline.
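A C++ rendering of Example 4-1 might look like the sketch below. This is not my actual startup code: the names are made up for illustration, and the barrier instructions are included only when building with a GNU-style compiler for ARM. The CPACR address and the bit positions come from the ARM text quoted above.

```cpp
#include <cstdint>

constexpr std::uintptr_t kCpacrAddress = 0xE000ED88u;     // CPACR, per the ARM text
constexpr std::uint32_t kCp10Cp11FullAccess = 0xFu << 20;  // bits 20-23

// Pure helper: the CPACR value with CP10 and CP11 set to full access.
constexpr std::uint32_t cpacr_with_fpu(std::uint32_t cpacr) {
    return cpacr | kCp10Cp11FullAccess;
}

// Only meaningful on the target, and only in privileged mode.
inline void enable_fpu() {
    volatile std::uint32_t* cpacr =
        reinterpret_cast<volatile std::uint32_t*>(kCpacrAddress);
    *cpacr = cpacr_with_fpu(*cpacr);  // read-modify-write, as in Example 4-1
#if defined(__arm__) && defined(__GNUC__)
    // Wait for the store to complete, then flush the pipeline.
    __asm__ volatile("dsb\n\tisb" ::: "memory");
#endif
}
```

Keeping the bit manipulation in a separate pure function means it can be checked on a desktop compiler, while `enable_fpu` itself only makes sense on the part.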

Friday, February 12, 2016

Getting on board the LPC4078

A few things make the LPC4078 dramatically different from the LPC2148 that I am used to:

  1. This is a Cortex-M4, with a much different interrupt and reset vector table. Instead of a set of ldr pc,[pc,#24] instructions followed by addresses 24 bytes later, we have just a table of addresses. The first one is the value to put into the stack pointer upon reset (so there is no required stack setup code) and the second one is the value to put in PC on reset. Subsequent values include exception and interrupt handler addresses.
  2. As with the 2148, there is a bootstrap program. On reset, the bootstrap vector table is mapped to 0x00000000, rather than the more obvious solution of having the reset value of the vector table address register point at the bootstrap.

One of the ugly things about my old LPC2148 code is that much of it was Not Invented Here. Since this is a hobby project, I can use as much of my time as I like to do whatever I feel like, including reinventing wheels! So, I get to figure out how to start up a Cortex-M4 from scratch.

I'm not there yet.

I haven't gotten my code to run directly yet, so there is still a problem with my vector table. But, I can get my code to run with the help of the ISP, after some experimentation.

The ISP maps a total of 512 (0x200) bytes of memory as its vector table. This is room for 128 vectors, while the actual number of vectors it uses is only 7. I can tell because the reset vector points at what would be in slot 7. In any case, this code is mapped to address 0 at startup. When I was using the ISP to launch my code, I originally had only exactly enough memory reserved to hold my table, and my code started immediately after, at address 0xE4 as it happened. Well, that code was covered up by the bootstrap. After rearranging my program so that a full 512 bytes was allocated for the table, things worked much better.
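The arithmetic above can be pinned down with a few compile-time checks. This is only a sketch of the layout, with hypothetical names. The entries are declared as 32-bit integers so the sizes also hold on a 64-bit desktop compiler, while on the part they would be handler addresses.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBytes   = 4;      // each entry is one 32-bit address
constexpr std::size_t kIspTableBytes = 0x200;  // region the ISP maps over address 0
constexpr std::size_t kIspTableSlots = kIspTableBytes / kVectorBytes;

// Entry 0 is the initial stack pointer, entry 1 the reset handler, and the
// rest are exception and interrupt handlers, padded to fill the whole region.
struct VectorTable {
    std::uint32_t initial_sp;
    std::uint32_t reset_handler;
    std::uint32_t handlers[kIspTableSlots - 2];
};

static_assert(kIspTableSlots == 128, "512 bytes hold 128 four-byte vectors");
static_assert(sizeof(VectorTable) == kIspTableBytes,
              "table must cover the ISP-mapped region");
static_assert(offsetof(VectorTable, reset_handler) == 4,
              "reset vector is the second word");

// True if code placed at `code_start` would be hidden by the ISP's mapped table.
constexpr bool hidden_by_isp_table(std::size_t code_start) {
    return code_start < kIspTableBytes;
}
```

With this, `hidden_by_isp_table(0xE4)` is true, which is the original problem, and `hidden_by_isp_table(0x200)` is false, which is the fixed layout.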

Also, the ISP uses an autobaud feature when it sets up the serial port. You send a question mark, and it times the bits on that to set up the baud rate registers. However, the ISP uses a feature I don't, and that is the fractional baud rate register. This tunes the baud rate at a relatively fine grain by scaling the divisor from the coarse baud rate registers by a factor between 1x and 2x. When the ISP was used to kick off my code, my code set the coarse baud rate but didn't touch the fractional baud rate register, which was left at about 1.5. Therefore the part was programmed to talk at roughly two-thirds of the intended rate, well out of tolerance for the serial port. The FT232 could tell that the part was talking, but couldn't understand any of it.
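The effect of a leftover fractional setting can be sketched numerically. The formula below is the one I understand NXP's UART documentation to give, baud = PCLK / (16 * divisor * (1 + DivAddVal / MulVal)), and should be treated as an assumption to check against the user manual. The function name and the sample numbers are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Baud rate including the fractional divider, assuming
// baud = PCLK / (16 * divisor * (1 + div_add_val / mul_val)).
double uart_baud(std::uint32_t pclk_hz, std::uint32_t divisor,
                 std::uint32_t div_add_val, std::uint32_t mul_val) {
    assert(divisor > 0 && mul_val > 0 && div_add_val < mul_val);
    const double fraction = 1.0 + static_cast<double>(div_add_val) / mul_val;
    return pclk_hz / (16.0 * divisor * fraction);
}
```

Under that assumption, a neutral setting (`div_add_val` 0, `mul_val` 1) leaves the coarse rate alone, while a leftover fraction of 1.5 (for instance 1 and 2) divides the rate by 1.5. Writing the neutral values back before setting the coarse divisor removes the dependence on whatever the ISP left behind.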

With those two things taken care of, I can now start my code with the ISP, from the very beginning of my code. Next step is to see what I am doing wrong such that my code doesn't start itself.

Tuesday, February 9, 2016

LPC4078 operational

I am continuing my project of doing robot stuff with a budget of zero, plus the stuff on my bench. Well, the stuff on my bench includes a Loginator2368 purple board, and a box of LPC4078 Cortex-M4 microcontrollers. Those controllers happen to be pin-compatible with the LPC2368 the board was designed for. In this case that means that all the power pins have the same jobs and voltage levels (even if they have different names and perhaps different internal connections), all the special pins like reset are in the same place, all the GPIO pins have the same numbers, and where the 2368 and 4078 have the same peripherals, they have compatible pin assignments.

One thing that the 4078 has that the 2148 doesn't is internal pullup resistors. I can take advantage of these to reduce the part count in several places.

Therefore I got out my soldering iron and finally attached the LPC4078 to the board. I was trying to figure out the minimum number of components I could get away with, since I can't find my solder paste stencil for this board, and would have to manually solder everything. I decided to skip the power section, so no external battery or regulator. I skipped the USB section, the LEDs, and the voltage reference. I also skipped the crystal, since the part has an internal RC oscillator which is good enough for now. I even skipped all of the bypass caps.

The only part that it looked like I absolutely needed was the reset button, since my first read on the datasheet didn't show the reset pin as having a pullup. It turns out that it does, so, I didn't even need that. I did end up stealing a push button from another board and using it as a reset switch.

I did the normal SMD IC soldering thing. I carefully lined up the part on the board, then used a normal soldering iron and normal solder to glob all the pins on each side down, and to each other. I then used wick to clean up all the bridges. This naturally leaves the connections to the board intact. I then soldered on the reset button, more as a convenience than anything. (Next time, leave a reset terminal on the edge!) I also soldered a shield 6-pin top-and-bottom connector onto the power and serial connections.

Then, hook it up to power. No smoke, the chip isn't hot, it passes the smoke test. But, I've seen this board not work before, with the LPC2368 that it was designed for. So, the real test is running the ISP.

?
Synchronized
Synchronized
OK
12000
OK 

It works!!! I had to guess at the frequency it wants, since I don't have an oscillator attached at all. The internal RC oscillator runs at 12MHz, so I put that in.

Now to massively restructure the code. A Cortex-M4 is quite a different beast than an ARM7TDMI. It has an FPU and a built-in NVIC instead of the VIC as a peripheral. It also doesn't use 32-bit ARM instructions, but (mostly) 16-bit Thumb-2 instructions. The 4078 is also a much different part than the 2148, with a different set of registers in different places. But, probably 90% of the code can be in common, if we carefully separate the code into the part which is different and the part which is common. That's the next big adventure. Throw one switch in the Makefile and compile for a 4078 instead of 2148.
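One way the Makefile switch could surface in the code is a single preprocessor symbol that selects the part-specific constants. The sketch below is only an illustration of the idea: the macro `TARGET_LPC4078` and the constant names are hypothetical, and the real split would cover registers and startup code as well.

```cpp
// Hypothetical: the Makefile would pass -DTARGET_LPC4078 to build for the
// Cortex-M4 part, and nothing extra to build for the LPC2148.
#if defined(TARGET_LPC4078)
constexpr bool kHasFpu = true;               // Cortex-M4 has an FPU to enable
constexpr const char* kCoreName = "Cortex-M4";
#else
constexpr bool kHasFpu = false;              // ARM7TDMI has no FPU
constexpr const char* kCoreName = "ARM7TDMI";
#endif

// Common code asks questions like this instead of naming the part directly.
constexpr bool needs_fpu_enable() {
    return kHasFpu;
}
```

The common 90% of the code would then only ever see names like `needs_fpu_enable()`, with everything part-specific hidden behind the one switch.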