[...] create an array with commands and help texts.
Me too; I recommend this approach.
Apologies for the long post that follows; feel free to ignore. Some might find the ideas in it useful, though.
On embedded/microcontrollers, I use a buffer to store the string data, so that it is trivial to support command editing via backspace (ASCII 8, BS) and so on.
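To make the buffering concrete, here is a minimal sketch of such a line buffer. The names cmd_buf, cmd_feed(), cmd_reset() and the CMD_BUF_SIZE limit are made up for this illustration, and treating DEL (127) like backspace is my assumption, not something your terminal necessarily needs.

```c
#include <stddef.h>

#ifndef CMD_BUF_SIZE
#define CMD_BUF_SIZE 80     /* Hypothetical limit; pick one to suit your RAM */
#endif

static unsigned char cmd_buf[CMD_BUF_SIZE + 1];
static size_t        cmd_len = 0;

/* Feed one received byte to the buffer.
   Returns 1 when cmd_buf holds a complete NUL-terminated line
   (call cmd_reset() after processing it), and 0 otherwise. */
int cmd_feed(unsigned char c)
{
    if (c == '\0' || c == '\n' || c == '\r') {
        if (cmd_len == 0)
            return 0;               /* Skip empty lines, e.g. the LF of a CR LF pair */
        cmd_buf[cmd_len] = '\0';
        return 1;
    }
    if (c == 8 || c == 127) {       /* BS; DEL handled the same way (assumption) */
        if (cmd_len > 0)
            cmd_len--;
        return 0;
    }
    if (cmd_len < CMD_BUF_SIZE)
        cmd_buf[cmd_len++] = c;     /* Excess characters are silently dropped */
    return 0;
}

/* Discard the current line so the next one starts from an empty buffer. */
void cmd_reset(void)
{
    cmd_len = 0;
}
```

You would call cmd_feed() from the serial receive loop (or from the main loop draining a receive FIFO), and only tokenize when it returns 1.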
When a line terminator (NUL, LF, CR, or any combination of them) is received, I tokenize the command into something like
#include <stddef.h>
#include <stdint.h>
#ifndef MAX_ARGS
#define MAX_ARGS 16
#endif
unsigned char *arg_ptr[MAX_ARGS];
uint32_t arg_hlen[MAX_ARGS];
unsigned char args;
The hlen is a combination of hash and length, with the length in the least significant bits and the DJB2 xor hash variant in the high bits:
#ifndef LEN_BITS
#define LEN_BITS 8
#endif
#define LEN_MASK ((1 << LEN_BITS) - 1)
uint32_t hlen(unsigned char *const ptr) {
    // Empty strings yield zero
    if (!ptr || !*ptr)
        return 0;
    size_t len = 0;
    uint32_t result = 5381;
    while (ptr[len])
        result = (result * 33) ^ (uint32_t)(ptr[len++]);
    return (result << LEN_BITS) | (len & LEN_MASK);
}
but the values are calculated while the command buffer is tokenized. Each token is terminated with a NUL ('\0') by overwriting the whitespace that follows it.
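As a rough sketch of that tokenization step: the function below splits the buffer in place and computes each hash-and-length value on the fly, so the string is only scanned once. The name tokenize() is hypothetical, and I assume here that only spaces and tabs separate tokens and that tokens past MAX_ARGS are simply ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 16
#define LEN_BITS 8
#define LEN_MASK ((1u << LEN_BITS) - 1u)

unsigned char *arg_ptr[MAX_ARGS];
uint32_t       arg_hlen[MAX_ARGS];
unsigned char  args;

/* Split buf in place at spaces and tabs, filling arg_ptr[], arg_hlen[], args.
   Tokens beyond MAX_ARGS are ignored (an assumption; you may prefer an error). */
void tokenize(unsigned char *buf)
{
    args = 0;
    while (args < MAX_ARGS) {
        while (*buf == ' ' || *buf == '\t')
            buf++;                              /* Skip whitespace before the token */
        if (!*buf)
            break;                              /* End of the command line */

        uint32_t hash = 5381;
        size_t   len  = 0;
        arg_ptr[args] = buf;
        while (*buf && *buf != ' ' && *buf != '\t') {
            hash = (hash * 33) ^ (uint32_t)(*buf++);    /* Same hash as hlen() */
            len++;
        }
        arg_hlen[args++] = (hash << LEN_BITS) | (uint32_t)(len & LEN_MASK);

        if (*buf)
            *buf++ = '\0';                      /* Terminate the token in place */
    }
}
```

After tokenize(), arg_ptr[0] is the command name and arg_hlen[0] is what you compare against the command table.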
For defining commands, I use a very similar structure as nctnico:
typedef __attribute__((aligned (sizeof (void *)))) struct {
    const unsigned char *cmd;
    uint32_t hlen;
    int (*func)(int, unsigned char **, uint32_t *);  // args, arg_ptr, arg_hlen
    const unsigned char *help;
} command_definition;
and all command structures and their strings will be stored in Flash. The return value of func() varies; sometimes I use const unsigned char *, with the function returning either NULL for OK or an error string. It also varies whether I pass args, arg_ptr, and arg_hlen as parameters to func() or not. (On AVR, ARM, and x86_64 architectures I do prefer to pass them as parameters, as their calling conventions pass at least the first four scalar parameters in registers.)
The reason for the separate typedef is that it allows the aligned attribute to set the alignment exactly. The last member is a pointer, to ensure sizeof (command_definition) is a multiple of the pointer size too.
Because I use ELF-based toolchains (gcc, clang), I often use a dedicated section via a preprocessor macro (and standard linker script section start and end symbols) to collect all command structures into a linear array:
#define DEFINE_COMMAND(_var, _cmd, _hlen, _func, _help) \
    __attribute__((used, section ("cmds"))) \
    static const command_definition _var = { \
        .cmd = _cmd, \
        .hlen = _hlen, \
        .func = _func, \
        .help = _help \
    }
The linker script exposes a symbol __start_cmds at the start of the combined array, and __stop_cmds just past its end, so that you can use
extern const command_definition __start_cmds[];
extern const command_definition __stop_cmds[];
#define CMDS ((size_t)(__stop_cmds - __start_cmds))
#define CMD(i) (__start_cmds[i])
Thing is, the linker will combine all DEFINE_COMMAND() statements, even in completely different source files (as long as they are all linked to the same binary), into a single array this way. If you have different configurations, where some source files are included or dropped from the binary, this makes it very easy to control whether commands related to those source files are available or not.
To find a matching command, you scan through the CMDS entries, checking whether the hlen matches. If it does, you do a string compare to confirm the match. This way, finding the actual command is very fast even if you have a few dozen of them.
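The lookup could look something like the sketch below. To keep it self-contained and portable, a plain array stands in for the linker-collected section, the structure is a trimmed-down stand-in without the function pointer, and the hlen fields are filled in at startup rather than at build time; all the demo_ names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEN_BITS 8
#define LEN_MASK ((1u << LEN_BITS) - 1u)

/* Trimmed-down stand-in for command_definition (func omitted). */
typedef struct {
    const char *cmd;
    uint32_t    hlen;
    const char *help;
} demo_command;

/* Same hash-and-length value as hlen() above, for plain char strings. */
static uint32_t demo_hlen(const char *s)
{
    uint32_t hash = 5381;
    size_t   len  = 0;
    if (!s || !*s)
        return 0;
    while (s[len])
        hash = (hash * 33) ^ (uint32_t)(unsigned char)s[len++];
    return (hash << LEN_BITS) | (uint32_t)(len & LEN_MASK);
}

/* Plain array standing in for the linker-collected section;
   the hlen fields are filled in by demo_init() (an assumption). */
static demo_command demo_cmds[] = {
    { "get",  0, "get NAME: show a variable" },
    { "set",  0, "set NAME VALUE: change a variable" },
    { "help", 0, "help [COMMAND]: describe commands" },
};
#define DEMO_CMDS (sizeof demo_cmds / sizeof demo_cmds[0])

void demo_init(void)
{
    for (size_t i = 0; i < DEMO_CMDS; i++)
        demo_cmds[i].hlen = demo_hlen(demo_cmds[i].cmd);
}

/* Returns the matching entry, or NULL if the token is not a known command. */
const demo_command *demo_find(const char *token, uint32_t token_hlen)
{
    for (size_t i = 0; i < DEMO_CMDS; i++)
        if (demo_cmds[i].hlen == token_hlen &&  /* Cheap 32-bit compare first */
            !strcmp(demo_cmds[i].cmd, token))   /* Confirm, since hashes can collide */
            return &demo_cmds[i];
    return NULL;
}
```

With the section mechanism you would loop over CMD(i) for i below CMDS instead, and the string compare only ever runs on an entry whose hash and length already match.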
A few years ago, I wrote this RPN calculator example (at StackOverflow) to run in Linux as an example of how to use the ELF section mechanism. Basically, it implements a simple reverse Polish notation calculator, with operators (functions/commands) implemented in separate files. Just by selecting which files are linked in to the calculator, you select which operators are available.
With GCC and Clang toolchains, the linker (GNU ld or LLD) generates the __start_SECTION and __stop_SECTION symbols automatically, as long as the section name is a valid C identifier. For other toolchains, you need to edit the linker script to define the symbols.
My projects often have a set of variables that I want to modify. (Sometimes the interface can only modify these variables, in which case I don't have a command interface at all per se, just an interface that accepts varname? to query a variable by name, ?varname to describe a variable, and varname=value to set a variable, with whitespace around = and ? ignored.)
These use the same basic logic, except with a different structure, definition macro, and section name:
typedef __attribute__((aligned (sizeof (void *)))) struct {
    void *ref;
    const unsigned char *name;
    const unsigned char *help;
    uint32_t hlen;
    uint_fast16_t type;
    void *limits;
} variable_definition;
#define DEFINE_VARIABLE(_refname, _var, _name, _help, _limits, _hlen, _type) \
    __attribute__((used, section ("vars"))) \
    static const variable_definition _refname = { \
        .ref = &(_var), \
        .name = _name, \
        .help = _help, \
        .limits = _limits, \
        .hlen = _hlen, \
        .type = _type \
    }
extern const variable_definition __start_vars[];
extern const variable_definition __stop_vars[];
#define VARS ((size_t)(__stop_vars - __start_vars))
#define VAR(i) (__start_vars[i])
The type member is basically an enum that dictates what the ref pointer is cast to when accessing the variable. You can use any type you want for it, but it will take at least the same size as a pointer, because the entire structure needs to be aligned to pointer size.
The limits member is either NULL or a type-dependent pointer to a structure defining the allowed range of values.
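To illustrate how type, ref, and limits work together, here is a small sketch of a setter. The VAR_INT32 and VAR_UINT8 tags, the two limits structures, and var_set() itself are hypothetical; the actual set of types and the layout of the limits structures are up to you.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags and limits layouts, for illustration only. */
enum { VAR_INT32 = 1, VAR_UINT8 = 2 };

typedef struct { int32_t min, max; } int32_limits;
typedef struct { uint8_t min, max; } uint8_limits;

/* Store value into the variable at ref, interpreted according to type.
   Returns 0 on success, -1 if the value is out of range or the type is unknown. */
int var_set(void *ref, uint_fast16_t type, const void *limits, int64_t value)
{
    switch (type) {
    case VAR_INT32: {
        const int32_limits *lim = limits;
        if (value < INT32_MIN || value > INT32_MAX)
            return -1;                          /* Does not fit the type at all */
        if (lim && (value < lim->min || value > lim->max))
            return -1;                          /* Outside the allowed range */
        *(int32_t *)ref = (int32_t)value;
        return 0;
    }
    case VAR_UINT8: {
        const uint8_limits *lim = limits;
        if (value < 0 || value > UINT8_MAX)
            return -1;
        if (lim && (value < lim->min || value > lim->max))
            return -1;
        *(uint8_t *)ref = (uint8_t)value;
        return 0;
    }
    default:
        return -1;                              /* Unknown type tag */
    }
}
```

A set command would look up the variable_definition by name, parse the value, and call something like var_set(v->ref, v->type, v->limits, value).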
These typically end up used via three commands: get, set, and help (or describe). Sometimes it can be useful to let the user define a few variables of their own (of some fixed type); you'd probably create those with let and delete with del (or unset).
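For the command-less variant mentioned earlier (varname? to query, ?varname to describe, varname=value to set), the line can be classified with very little code. This is only a sketch: request_type, parse_request(), and the helper names are invented for this example, and it treats only spaces and tabs as whitespace.

```c
#include <stddef.h>

typedef enum { REQ_INVALID = 0, REQ_QUERY, REQ_DESCRIBE, REQ_SET } request_type;

static int is_blank(unsigned char c) { return c == ' ' || c == '\t'; }

/* Remove leading and trailing blanks in place; returns the new start. */
static unsigned char *trim(unsigned char *s)
{
    while (is_blank(*s))
        s++;
    unsigned char *end = s;
    while (*end)
        end++;
    while (end > s && is_blank(end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Classify a NUL-terminated line, modifying it in place.
   On success *name (and for REQ_SET, *value) point into the line. */
request_type parse_request(unsigned char *line,
                           unsigned char **name, unsigned char **value)
{
    *name = *value = NULL;
    line = trim(line);

    if (*line == '?') {                         /* ?varname */
        *name = trim(line + 1);
        return **name ? REQ_DESCRIBE : REQ_INVALID;
    }

    for (unsigned char *p = line; *p; p++) {
        if (*p == '=') {                        /* varname=value */
            *p = '\0';
            *name  = trim(line);
            *value = trim(p + 1);
            return (**name && **value) ? REQ_SET : REQ_INVALID;
        }
        if (*p == '?' && p[1] == '\0') {        /* varname? */
            *p = '\0';
            *name = trim(line);
            return **name ? REQ_QUERY : REQ_INVALID;
        }
    }
    return REQ_INVALID;
}
```

The name can then be hashed and looked up in the variable table the same way command names are.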
You can then add support for arithmetic expressions using variables (and optionally functions), but I normally don't bother. It, too, starts by splitting the expression into lexical elements, but there are many ways to implement the parsing and evaluation of such expressions. In practice, you'd parse the expression into a stack of tokens, each token being either an operator (like +, -, *) or a value (either a number, or a reference to a variable). RPN is easiest, because it is simply a stack of values and operators. The shunting yard algorithm is quite simple for normal math notation, but there are many other operator-precedence parsers you might use.
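As a taste of how little code the shunting yard approach needs, here is a sketch that evaluates an infix expression directly with a value stack and an operator stack, instead of first storing the tokens. It only handles non-negative integer literals, the four basic left-associative operators, and parentheses; variable references are left out, and eval_infix() and STACK_MAX are names I made up for this example.

```c
#include <stddef.h>

#define STACK_MAX 16    /* Hypothetical depth limit for both stacks */

static long   vals[STACK_MAX];
static char   ops[STACK_MAX];
static size_t nvals, nops;

/* Operator precedence; 0 means "not a binary operator". */
static int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : (op == '+' || op == '-') ? 1 : 0;
}

/* Pop one operator and two values, push the result. Returns 0 on success. */
static int apply_top(void)
{
    if (nops < 1 || nvals < 2)
        return -1;
    char op = ops[--nops];
    long b  = vals[--nvals];
    long a  = vals[--nvals];
    switch (op) {
    case '+': a += b; break;
    case '-': a -= b; break;
    case '*': a *= b; break;
    case '/': if (!b) return -1; a /= b; break;
    default:  return -1;                        /* E.g. an unmatched '(' */
    }
    vals[nvals++] = a;
    return 0;
}

/* Evaluate s; returns 0 and stores the value in *result, or -1 on error.
   Note: it does not check that values and operators strictly alternate. */
int eval_infix(const char *s, long *result)
{
    nvals = nops = 0;
    for (; *s; s++) {
        if (*s == ' ')
            continue;
        if (*s >= '0' && *s <= '9') {
            long v = 0;
            while (*s >= '0' && *s <= '9')
                v = 10 * v + (*s++ - '0');
            s--;                                /* The for loop advances again */
            if (nvals >= STACK_MAX) return -1;
            vals[nvals++] = v;
        } else if (*s == '(') {
            if (nops >= STACK_MAX) return -1;
            ops[nops++] = '(';
        } else if (*s == ')') {
            while (nops > 0 && ops[nops - 1] != '(')
                if (apply_top()) return -1;
            if (nops == 0) return -1;           /* Unbalanced ')' */
            nops--;                             /* Discard the '(' */
        } else if (prec(*s)) {
            /* Apply stacked operators of equal or higher precedence first */
            while (nops > 0 && prec(ops[nops - 1]) >= prec(*s))
                if (apply_top()) return -1;
            if (nops >= STACK_MAX) return -1;
            ops[nops++] = *s;
        } else {
            return -1;                          /* Unknown character */
        }
    }
    while (nops > 0)
        if (apply_top()) return -1;
    if (nvals != 1)
        return -1;
    *result = vals[0];
    return 0;
}
```

To support variables, the number branch would also accept a name and push the looked-up value instead of a literal.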