Byte ordering is a thing....

I’m upgrading the Air Quality sensor software so that I can send configuration messages over LoRa to the sensor. The thing I’m most interested in is the ability to tell the sensor the rate at which it should send readings. It would be useful to have a message sent every five minutes or so for testing, and then dial back the rate once the sensor has been installed. I’m using a tiny packet format to do this. The packet is three bytes long:

01 58 02

The first byte is the command number (there is only one command at the moment). The other two bytes are the number of seconds the sensor should wait between sending each reading. The value I want to send won’t fit into a single byte (a byte can represent an unsigned integer up to the value 255 and I want to have intervals longer than 255 seconds) so I’m using two bytes. The first byte is the “low” byte of the value I’m sending and the second byte is the “high” byte. In the tiny sample above the low byte is HEX 58 and the high byte is 2. This gives an interval of 2 * 256 + whatever 58 hex is in decimal. This boils down to an interval value of 600 seconds, or 10 minutes.

I wrote the software, tested it, installed it in all four of the sensors we are building and then set them off. I then sent the above command to the sensors to set 10 minute updates and then relaxed in the knowledge of a job well done.

And all my sensors stopped transmitting completely. Wah.

Turns out that I am an idiot. I’d done everything right except the last bit, when I’d sent the following command:

01 02 58

I’d got the MSB and LSB bytes the wrong way round. Rather than having 2*256 I now have HEX 58 times 256. A big number. A six hour or so number. So all my sensors were now not going to transmit for six hours.

Sending the new command wasn’t a problem, but because of the way that we are using LoRa the sensors only look for new commands when they send a reading. So I had to wait for six hours before the command would get picked up and acted on. This is something to bear in mind about LoRa. The remote node is not always listening for a message, it only turns on after it has sent something. This is one way they manage to get such impressive battery life for LoRa connected devices.

I managed to solve the problem by turning the sensors off and on again. They always transmit a packet when they start up so I was able to update the timing with more sensible values without having to wait six hours.

Refactoring Monitair

Fun fact. Users don’t care how their code is factored. They just want their programs to do what they want. All the time. Programmers have a different perspective. They want code that looks nice and is easy to maintain. This means that you often have to revisit decisions made at the start of the project, and change things to make them better.

Real world architects don’t have this luxury. If half way through the construction of a building they find a better structure there is no-way they can reconstruct it. But programmers can. But it is not without its risks. Ages ago I found a Haiku that I rather liked. I think it went something like this:

Your code was ugly
I refactored it for you
It no longer works

Having said that, I’ve just spent a day re-factoring the Monitair software. Because it was ugly. And it seems to still work. Which is nice.

Snaps now on GitHub

A couple of days ago I got an email from James. He’s been working through my “Begin to Code with C#” book and having fun learning C#, which is great to know. He’s even reached the point where he has spotted an improvement to my code. It occurred to me that what he really wants to do is to put the changes into the GitHub repository where the code was stored, so that everyone could benefit from it.

But, rather stupidly, I’ve never put Snaps on GitHub. Well, that’s easy to fix, and so you can now get the latest version of Snaps for download from here:

https://github.com/CrazyRobMiles/snaps

If you don’t know what GitHub is, then you’re missing out. GitHub is a way that you can manage data, whether its a bunch of code or that book you’re writing. GitHub holds the source and also allows you to make incremental changes, all the while keeping track of the differences so that at any point you can go back to a previous version. It also has fantastic facilities for group working, so that several people can work on a single large project and manage the changes that their different work items produce. Until recently you could only use GitHub for free if you were happy to make your creation public. To have a private GitHub repository cost you a monthly subscription. That’s changed now, users can create private GitHub repositories for free, which is awesome. Three words of advice: Get. Into. GitHub.

As for Snaps, that’s a set of language extensions that make it very easy to create Windows Universal Applications. I created it so that people can learn to program without being hit by a lot of extra stuff that you need to know to create modern applications. It has lots and lots of features, including a sprite engine for game creation. It also has all the sample code from the programming book built into it, so you can play with and modify the sample code very easily.

I’m going to message James and suggest that he propose his modifications and we can put them into place as part of the Snaps release. That will also give him something nice to put on his CV.

Dealing with the impossible

I’ve got a new saying that I quite like. It’s probably not original, but that’s true of a lot of stuff that I do.

If what you are observing is impossible it is either not happening, or not impossible.

I’ve found and fixed my impossible bug. Of course it wasn’t where I thought it was, because if it was, I’d have looked there and mended it. The fault wasn’t even in the code doing the job, it was in the code telling me what the system was doing.

I’d missed a break out of a C switch statement (a frighteningly easy thing to do in that language) so that my reporting code was printing out the wrong messages. Which led me to concoct all kinds of fanciful theories about memory corruption and race conditions - rather than simply checking that what I was being told was happening was actually happening.

The important thing about situations like these is that you change the way that you work to reduce the chances of them happening again. So, in future I’ll make sure that I test all the possible outputs of my diagnostic system.

Debugging with Rob

OK. Today I’m going to find out why my program is crashing. I’m going to do this in an “experimental” way. I’m going to put up a hypothesis (a theory about why something happens) and then try to test it. Before I do that though, perhaps I’d better describe the problem. I apologise if some of this explanation is a bit technical:

The program runs for a while and then stops.

The length of time the program runs varies, and I can’t tell what happens at the point of failure because it is inside a device. And I can’t take a look inside the program because those tools aren’t available to me.

So, let’s try my first hypothesis:

Hypothesis 1: “The program flashes the neopixel leds at the same time as it receives data from the air quality sensor. Perhaps the code that controls the lights is affecting the code that gets the data.

Test:

  • Increased speed of pixel update.

  • Loaded up the serial interface with data.

  • Increased rate of MQTT update to once per second

Results

Display update slowed right down (as we might expect) but no crash

On this basis I’ll conclude that this hypothesis is not valid. So let’s try another:

Hypothesis 2: Serial data reception is interfering with the WiFi transmission. When the device has got a complete reading from the air quality sensor it sends a message over WiFi to the server.

Test:

  • Turned off the Pixel updates

  • Loaded up the serial interface with data

  • Added code to confirm successful network message sent

Result:

When the serial data is being transmitted the data transfer is slower because the serial interface steals cycles from the processor. Eventually the transfer collapses and the Publish method starts to return false. Shortly after that the whole system falls over.

Further Test:

restored the pixels and loaded up the serial port

Result:

When the serial port is loaded the performance collapses before.

So, it is not a good idea to use the network connection while you are receiving data from the serial port.

This is because I’m using a software simulation of the hardware that normally receives serial data. This simulation has specific timing constraints that means that it needs to “lock out” other processes when it runs. And it seems that this is causing the problem. Under normal circumstances the node only sends a network message every six minutes or so, and the chances of interference are small. But when I’m doing “proper” testing - sending lots of messages and receiving loads of data - I notice the problem.

The solution is to re-work the code so that the two things don’t occur at the same time. Which means I get to write more code. Which I quite like.

You might like to know (if you’ve read this far) that the notes above actually came from my diary. I always write these things up each day. The idea is that if I get a similar problem in the future I’ll have something to go back to. If you don’t write a diary/log when you do this kind of thing you’re really missing out on a trick.

Pointers to Functions in C# and Python

After the description of pointers to functions in C it occurred to me that you might like to know how to do function pointers in other languages. Then again you might not, in which case just look at the picture above that I took in Singapore a while back.

Pointers to methods in C#

If you want to make a reference to a function in C# you need to create a delegate type which contains the reference. Instances of the delegate type can then be made to refer to methods. It’s a bit more long-winded than the C way of doing things, but delegates can also be used as the basis of some quite powerful publish and subscribe mechanisms which I might mention later.

class Program
{
delegate void SimpleDelegate();
static void aMethod()
{
Console.WriteLine("Hello from a method");
}
static void Main(string[] args)
{
SimpleDelegate myDelegateInstance = aMethod;
myDelegateInstance();
}
}

This is a tiny example of a program that uses a delegate to make a reference to a method. The delegate type that I’ve created is called SimpleDelegate. Variables of this type (I made one called myDelegateInstance) can be made to refer to a a method and then called.

If you compare this with the C syntax, it’s not that different. The good news is that before the program runs the compiler can make sure a program will never call a method incorrectly using a delegate. Because SimpleDelegate is declared as returning void (i.e. nothing) and accepting no parameters it is impossible to make instances of SimpleDelegate refer to a method that returns a different type or has a different set of parameters. If you worry about these kind of things, this is a good thing.

Pointers to Functions in Python

If you want to make a pointer to a function in Python you just do it:

def a_function():
print("Hello from a function")

function_ref = a_function
function_ref()

The variable function_ref is made to refer to the function a_function and the function_ref variable can then be called as a function. Later in the program function_ref might store an integer, or string, or a reference to an object of type Cheese. That’s how Python is. This means that a Python program can fail in ways that a C# program won’t, because the errors that cause the Python program to fail would be picked up by the C# compiler.

This nicely encapsulates the difference between the cultures of C# and Python. C# is great because it forces you to invest effort to make sure that the objects in your programs can only ever be fitted together correctly. Python takes everything on trust. It lets the programmer just get on with expressing the actions to be performed and assumes that things will turn out OK. If they don’t the program will fail at runtime.

I love the way that C# forces me to think about my code, and I love the way that Python just lets me get on with writing, without me having to worry about adding lots of extra syntax to express what I know I want. You should know about both ways of working.

Pointers to Functions in C

I’m re-writing my neopixel animation software for the new light. I’m using a C feature that is sometimes very useful. I’m using pointers to functions.

Note that this is quite strong magic. If you don’t use it properly your program will just stop with no errors or warnings. But then again, that’s pretty much how it is when you write Arduino code anyway, so feel free to have a go. You won’t break any hardware.

The reason why I’m using function pointers is that I want to make the program to select between different animations to control the lights. The update behaviour for each animation is a different function. For example, I’ve got two different displays, one which walks colours around the lights and one which just displays the text in a particular colour. That means I’ve got two update functions:

void update_coloured_message()
{
// update the coloured message
}

void update_text_message()
{
// update the text message
}

Each of these is a function that accepts no parameters and does not return a result. I can create a C variable that can refer to these methods as follows

void (*animation_update) ();

This rather convoluted syntax creates a variable called animation_update that can refer to methods that accept no parameters and don’t return a value. I can make this variable refer to one of my update methods by using a simple assignment:

animation_update = update_text_message;

Note that normally when you want to create a reference to a value in C or C++ you need to put an ampersand (&) in front of the thing you are getting a pointer to. In the case of function references you can leave this out. This is either a neat trick to make C programming easier or a nasty trick to make C programming more confusing. I’ll let you decide.

Anyhoo, once I’ve set animation_update to refer to a function I can call the pointer just as I would the function:

animation_update();

I’ve actually gone one more, in that I’m using function pointers in a structure. Go me. I’ve got a structure that describes an animation in terms of the function that sets up the animation, the function that updates the animation and the number of frames it should run for:

struct Animation_item
{
void(*animation_setup) ();
void(*animation_update)();
int duration;
};

I then create an array of these items that contain all the animations that I want to move between:

struct Animation_item animations[] = {
{ setup_coloured_message, update_coloured_message, 600 },
{setup_walking_colours, update_walking_colours, 600} };

Then just need to count around the array to move between animations.

animations[animation_number].animation_update();

This statement calls the animation update method for the animation that is selected by the value in animation_number. If I add more animations to the array my program just picks them up automatically with no need to change anything else.

Code like an idiot

Hmm. No matter how good you are at programming, developers will still put bugs in their programs. Including me. Lost a couple of hours today because of this little beauty.

if ((tickCount % lights[i].colourSpeed)!=0 || lights[i].colourSpeed==0)
    return;

This code has been working inside HullOS for a couple of years. No problems. However, when I tried to port the code onto a Wemos device it kept crashing with an invalid instruction message.

Wah. The idea is that the code will exit if is not time to update the colour animation (tick count is not an exact multiple of colourSpeed) or if no animation is being performed (colourSpeed is zero). 

It took me a while to figure out what was happening. Turns out this code is wrong, but under some circumstances it will work perfectly well. 

It's all to do with the % (modulus) operator. This gives the remainder that you get if you divide one number by another (or example 7 % 4 would be 3). If the modulus is zero that we have an exact multiple on our hands. So, every time tickCount reaches the next multiple of colourSpeed the program will perform an update. 

Snag is that I've been stupid, in that I use a colourSpeed value of 0 to mean that the light is not being updated. And the behaviour of a C++ program when you do x % 0 is undefined. In other words, any number modulus zero is a stupid question to ask.

To make things worse the fact that I'm doing this bad thing as one half of an or expression means that, depending on the whim of the compiler, the modulus calculation may not get evaluated when colourSpeed is zero. If the code produced by the compiler does the test for zero before the modulus my program works. If the tests are done the other way round the program explodes.

It's an easy fix, just break the single test into two tests so that I control the order they are performed:

if (lights[i].colourSpeed == 0)
    return;
if ((tickCount % lights[i].colourSpeed) != 0 )
    return;

Now we never use the value of 0 in a modulus expression.  

Oh well, just goes to show that you can learn stuff by doing stuff.....

C# Quick Question 2 resolution

Ha. Nobody seemed to know this. Or nobody cared....

Anyhoo, the question was:

"When would a value type be stored on the heap?" And no, the answer doesn't involve boxing.

A value type just holds a value. Not a reference to something. If I have an object managed by reference I have to be careful about removing it from memory, because I need to be sure that nothing in the program is referring to the object. This means that objects managed by reference are stored in a "heap" of objects. It is the job of the "garbage collector" to find objects that are not referred to by anything, and remove these. 

However, value types are easy to get rid of. There can't be any references to them. 

Think of a value type as a book in a library. We can take the book away and nobody can use it any more. That's sad. But we can be sure that nobody is using the book when we take it away. Think of a reference type as a web site. We can't just delete a web site because we don't know how many people out there have made references to the site. 

So, when do we need to put a value type on the heap, and treat it a bit like a reference type? One answer is when we "box" a value type to convert a value type into a into a reference type, but this this is not what I'm after. 

The answer is when a value type is used in a "closure". We see closures when we use lambda expressions. OK, so what's a lambda expression? Lambda expressions are a pure way of expressing the "something goes in, something happens and something comes out" part of behaviours.

The types of the elements and the result to be returned are inferred from the context in which the lambda expression is used. Instead of writing a method and creating a reference to the method, you can just create a lambda expression.

We get into lambda expressions shortly after we start trying to treat lumps of functional code as we would data. Consider the following method:

static int add(int a, int b)
{
    return a + b;
}

This method is called add, it takes two integers and returns their sum. Maybe, for some reason, I want to my program to select between adding, subtracting and some other mathematical operations that take in two integers and return an integer result. C# lets me create pointers to methods. They are called delegates and we would make such a delegate like this:

delegate int IntOperation(int a, int b);

A delegate is another type I can use in my program. So I can create a variable of type IntOperation. And I can make my delegate refer to a method. Watch carefully:

IntOperation thingToDo;

This statement has made a delegate called thingToDo that can refer to methods that accept two integers and return a single integer result. Now I can make this delegate refer to a method:

thingToDo = add;

Now, if I call thingToDo, the add method will run:

int result = thingToDo(1,10);

This would set the value result to 11, because thingToDo presently refers to the add method. With me so far? Good. We can create our methods "the hard way", or we can use a lambda function instead. Look at this statement:

thingTodo = (a, b) => { return a - b; };

Don't panic. It wants to be your friend. Let's take a careful look at what I've done. I've made thingToDo refer to a lump of code that will perform a subtract operation. The code that does the subtraction doesn't live anywhere and has no name.  It is a lambda expression .Sometimes called an "anonymous function". The sequence => is the lambda operator and it separates the "stuff goes in" from the "stuff comes out" parts of the expression. In this case the parameters a and b to in, and the result of a-b comes out. Because thingToDo accepts integers and returns integers, inside the lambda expression the types of a and b are integers. 

OK. We can put these anonymous lumps of code in our program and pass them around on the end of references, just like any other object. What about closures? Take a look at this strange code:

{
    int localValue = 99;
    thingTodo = (a, b) => { return a + b + localValue; };
}

This is stupid, useless code. The statement inside the lambda function is using the local variable localValue for some silly reason. This is quite legal, compiles fine, and gives the compiler a problem to solve. The problem is that the lifetime of localValue exceeds the block in which it is declared. Remember that local variables are usually stored on the stack and deleted when the program leaves the block of code in which the local variable is declared. However, thingToDo is still referring to a lambda expression that uses the value of localValue. So localValue can't die. It must live on outside the block in which it is declared.

The compiler solves this problem by putting the localValue variable on the heap, rather than the stack. The heap is where we keep stuff that has to stick around for a while. This extension of the life of a local variable is called a "closure". 

Short question. Long answer. Phew. 

C# Quick Question 1 resolution

This was the question:

Can I make some C# that compiles and contains these two statements? What type are d and x?

d = null;
d += x;

I wrote the question as a result of my surprise that you could do this with delegate types. In other words, you could add things to a null delegate.

However, it turns out that it works with other things too, including strings. It seems that a += overload (which is how we get the behaviour that allows us to use += to append strings and add handlers to delegates) is also smart enough to make a new object if it turns out that the original is null. Which makes very good sense I suppose. 

Using the StopWatch class properly

Most people don't really care about the speed that their programs run at. Unless they run too slow of course. 

However, if really do want to know timings, C# provides a rather useful StopWatch class in the System.Diagnostics namespace that you can use to measure the time it takes your program to do something. You use it like this:

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// Do something that takes a while
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);

It is especially useful if you're trying to re-write a program so that it uses parallel processing, and you want to find out how long things take to complete. Parallel programming is where your code makes use of all the processors in your computer, not just one of them. It should be faster. 

I'm writing some stuff about parallel features in C# at the moment. For a book that I might have mentioned. So I wrote two versions of the code and then discovered, to my dismay, that the one I'd carefully optimised actually seemed to run slower than the original. I know that for small data sets a parallel solution might not be worth the effort of setting up all the parallel gubbins, and I also know that if there are any shared variables that the parallel code ends up fighting over, this can impact on speed, but whatever I did, the parallel version always took at least as long as the original. Wah. Took me a little while to find the mistake. Here's my code:

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// Do single threaded version
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);
stopwatch.Start();
// Do parallel version
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);

The parallel version always, always takes longer than the serial version. Have you found the mistake yet?

Turns out that stopping and starting a stopwatch doesn't reset it. So the time for the parallel version is added onto the time for the original. And I'm an idiot. The method: 

stopwatch.Restart();

- resets the stopwatch and starts it, so that my second operation is timed correctly. I'm now getting sensible speedups, which is nice...

"In-band" error signalling is dangerous

I've been working on the Hull Pixelbot for what seems like ages (You probably think I've been blogging about it for roughly as long. Don't care. My blog.)

Anyhoo, today I learned the perils of "in band error signalling". The phrase "In-band" is a radio term. It means that control information is sent in the came channel as the data. My "in-band" errors worked like this:

int findVariablePos (char * name)
{
    int result;
     /* Stuff happens in here to find the variable  */
    return result; // return the offset of the variable */
}

The function findVariablePos returns the offset into the variable name table of the variable with the given name. In a program a variable is a named location that you can use to store data that is used as the program runs. If my program does this:

i = 99

- this is an attempt to store 99 in a variable called 'i' for no readily apparent reason. The thing running the program needs to have a way of finding out where in the computer the variable i is actually stored. That's the job of findVariablePos. Deep inside the program that actually runs the Hull Pixelbot script there is a statement like this:

int iPos = findVariablePos ("i")

The statement above (we're writing C by the way) would get me the position in the variable table of the variable with the name 'i'. "Aha", you say. "What if the program doesn't contain a variable called 'i'. ". Well, in that case the findVariablePos function returns the value -1 to indicate that the name was not found. This is kind of sensible, because you can't have anything at the position -1 in a table, negative numbers are meaningless in this context. All good. To make things clearer I even did this:

#define VARIABLE_NOT_FOUND -1

This gives meaning to the value, so that I can write tests that make sense:

if (iPOS==VARIABLE_NOT_FOUND)
{
    Serial.println("Variable not found");
}

All good. Works fine. Then I re-factor the code and add a bunch of new error codes. I then decide it would be nice to have a set of numbers that the user (and other programs) can use to make sense of error messages. And I make the following change:

#define VARIABLE_NOT_FOUND 2

This makes sense in the context of fiddling with my error numbers, but it means that if the program ever puts a variable in location 2 in the variable table, the program will completely fail to find it because every time it gets the offset value this will be regarded as meaning that no variable was found. 

Which is of course what happened. The problem in caused by a bad design decision (using the data value as a means of signalling errors) and then doing something without considering the consequences. The latest version of findVariablePos looks like this:

int findVariablePos (char * name, int * position)
{
    int result;
     /* Stuff happens in here to find the variable  */
    
    *position = result;
    return FOUND_OK;
}

The result of the call is returned via one channel (the result of function) and the position value is returned by the method setting the value of the second parameter, which is a pointer to the variable. The call is a bit more complicated:

int iPos;
if (findVariablePos ("i",  &iPos) == FOUND_OK)
{
    Serial.println("Found the variable");
}

However, it now doesn't matter what the error numbers are, and whether or not they clash with any valid variable positions. 

"In-band" error handling is great if your'e in a hurry and you're trying to keep the code simple and quick. But they also leave you open to problems further down the tracks.