As the RP2040 is a dual-core 32-bit ARM Cortex-M0+ with ample memory for small event queues, we can use 32-bit unsigned integers to hold the event descriptors in a reasonably sized queue. These need not be just keypresses: all sorts of events fit, even X-Y coordinate pairs for touch events on touchscreens.
To protect against concurrent access, the queues need a spin lock (one that works across cores and interrupts). Since no queue function will access another queue, a single spin lock across all queues will suffice.
    spin_lock_t *queue_lock;
It must be initialized at runtime by allocating one of the very few available spinlocks via
    queue_lock = spin_lock_init(spin_lock_claim_unused(true));
The Pico SDK implementation not only blocks the other core, but also temporarily disables interrupts, so the push and pop functions can be used on the same queue on either core and from interrupt handlers.
To get some polymorphism from plain C99 or later code (no templates), we can declare one or more queues as plain uint32_t arrays with one extra element (the first word, which holds the queue state). The length must be between 4 and 2048, inclusive, and preferably even (or you'll waste four bytes). For example,
    volatile uint32_t example_queue[2048];
Each such queue is initialized at run time via
    queue_init(example_queue, sizeof example_queue);
which sets the first word to reflect an empty queue of that particular size.
Bits 31..22 of the first word contain m; the size of the queue is 2m+1.
Bits 21..11 contain the index of the oldest element in the queue.
Bits 10..0 contain the number of elements in the queue.
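The three fields listed above can be unpacked with a few shifts and masks. The helper names below (queue_capacity, queue_oldest, queue_count) are hypothetical and not part of the queue code itself; this is only a sketch of the layout, assuming the bit positions just described.

```c
#include <stdint.h>

/* Hypothetical helpers: unpack the three fields of the first queue word. */

/* Capacity in events: bits 31..22 hold m, and the capacity is 2m+1. */
static inline uint32_t queue_capacity(uint32_t key) {
    return ((key >> 22) << 1) | 1u;
}

/* Index of the oldest element: bits 21..11. */
static inline uint32_t queue_oldest(uint32_t key) {
    return (key >> 11) & 0x7FFu;
}

/* Number of elements currently in the queue: bits 10..0. */
static inline uint32_t queue_count(uint32_t key) {
    return key & 0x7FFu;
}
```

For a 2048-word array, the empty-queue header is 0xFFC00000, which these helpers decode as capacity 2047, oldest index 0, count 0.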
    static inline void queue_init(volatile uint32_t *const q, uint32_t bytes) {
        // TODO: assert that bytes is between 12 and 8192, inclusive. If not a multiple of eight, some bytes are wasted.
        q[0] = ((bytes - 8) & 0x1ff8) << 19;
    }

    /* Push a (nonzero) event to a queue.
     * Returns event if successful, zero to indicate queue is full.
    */
    static inline uint32_t queue_push(volatile uint32_t *const q, uint32_t ev) {
        uint32_t saved = spin_lock_blocking(queue_lock);
        const uint32_t key = q[0];
        const uint32_t have = key & 0x7FF;
        const uint32_t base = (key >> 11) & 0x7FF;
        const uint32_t size = (key >> 21) | 0x001; // Always odd
        if (have >= size) {
            // Queue is full
            ev = 0;
        } else {
            uint32_t i = base + have;
            if (i >= size)
                i -= size;
            q[i + 1] = ev;
            q[0] = key + 1; // In q[0]: have++;
        }
        spin_unlock(queue_lock, saved);
        return ev;
    }

    /* Pop a (nonzero) event from the queue.
     * Returns event if successful, zero to indicate the queue was empty.
    */
    static inline uint32_t queue_pop(volatile uint32_t *const q) {
        uint32_t saved = spin_lock_blocking(queue_lock);
        const uint32_t key = q[0];
        const uint32_t have = key & 0x7FF;
        uint32_t base = (key >> 11) & 0x7FF;
        const uint32_t size = (key >> 21) | 0x001; // Always odd
        uint32_t ev;
        if (have == 0) {
            // Queue is empty
            ev = 0;
        } else {
            ev = q[++base];
            if (base >= size) {
                // In q[0]: have--, base = base - size, with base incremented by one
                q[0] = key + 0x7FF - (size << 11);
            } else {
                // In q[0]: have--, with base incremented by one
                q[0] = key + 0x7FF;
            }
        }
        spin_unlock(queue_lock, saved);
        return ev;
    }
Above, the buffer is circular. The math applied to the first word is "tricky" (in the bad, not easily maintained sense), because instead of repacking it, we can apply the operations directly to it using a single addition or an addition and a subtraction. The simple idea is that if
    val = ((size & 0x7FE) << 21) | ((base & 0x7FF) << 11) | (have & 0x7FF);
incrementing base and then repacking val is the same as adding (1<<11)=0x800 to val. Similarly, incrementing have is the same as incrementing val. size never changes.
Note that size always being odd means we don't need to store its least significant bit, which allows us to store three 11-bit values in a single 32-bit value. This corresponds to a queue structure whose size is a multiple of eight bytes. The maximum queue length is then 2047 events.
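Since the packed arithmetic is the hard-to-maintain part, it can be checked by brute force on a host machine. The sketch below uses hypothetical helper names (pack_key, key_after_push, key_after_pop, check_size) and compares the single-addition updates against a full repack for every reachable state of a given odd size. Note that 0x7FF is 0x800 - 1, so adding it increments base and decrements have in one step.

```c
#include <stdint.h>

/* Pack the three fields as in the formula above (size must be odd). */
static inline uint32_t pack_key(uint32_t size, uint32_t base, uint32_t have) {
    return ((size & 0x7FEu) << 21) | ((base & 0x7FFu) << 11) | (have & 0x7FFu);
}

/* Header after a successful push: have++. */
static inline uint32_t key_after_push(uint32_t key) {
    return key + 1u;
}

/* Header after a successful pop: have--, base++ (wrapping to 0 at size). */
static inline uint32_t key_after_pop(uint32_t key) {
    const uint32_t size = (key >> 21) | 1u;
    const uint32_t base = ((key >> 11) & 0x7FFu) + 1u;
    if (base >= size)
        return key + 0x7FFu - (size << 11);
    return key + 0x7FFu;
}

/* Returns 1 if the shortcuts match a full repack for every state, else 0. */
static inline int check_size(uint32_t size) {
    for (uint32_t base = 0; base < size; base++) {
        for (uint32_t have = 0; have <= size; have++) {
            const uint32_t key = pack_key(size, base, have);
            if (have < size &&
                key_after_push(key) != pack_key(size, base, have + 1u))
                return 0;
            if (have > 0) {
                const uint32_t next = (base + 1u < size) ? base + 1u : 0u;
                if (key_after_pop(key) != pack_key(size, next, have - 1u))
                    return 0;
            }
        }
    }
    return 1;
}
```

Calling check_size() for the smallest and largest sizes (1, 3, and 2047) exercises both the wraparound branch and the widest field values.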
On bare metal on single-core ARM Cortex-M, queue_lock is not needed, and
    uint32_t saved = spin_lock_blocking(queue_lock);
is replaced by something like
    uint32_t saved = __get_PRIMASK(); __disable_irq();
and
    spin_unlock(queue_lock, saved);
is replaced with something like
    __set_PRIMASK(saved);
or
    if (!saved) __enable_irq();
(PRIMASK is nonzero when interrupts are already masked, so they are re-enabled only if they were enabled on entry.) The exact details vary a bit between M0, M4, M7, etc.
On AVRs, queue_lock is not needed, and
    uint32_t saved = spin_lock_blocking(queue_lock);
is replaced with
    unsigned char saved = SREG; cli();
and
    spin_unlock(queue_lock, saved);
is replaced with
    SREG = saved;
In general, on single-core microcontrollers saved=spin_lock_blocking() needs to disable interrupts that may use queue functions, returning the previous interrupt enable/disable state, which spin_unlock(saved) then needs to restore.
Not really. A queue means overhead, and using an interrupt for something as non-critical as scanning buttons adds more complexity than necessary.
I like my buttons reliable and precise, with software debouncing and accelerating autorepeat. I hate devices that occasionally miss keypresses, because you are "too fast". Varying autorepeat rate also feels very clunky and amateurish to me.
If the buttons are directly connected (not a matrix), the periodic scanning interrupt can normally be disabled, with only a state change interrupt active; that interrupt re-enables the periodic scanning interrupt on the first detected state transition. (On many Cortex-M implementations, there is only one hardware input pin interrupt per GPIO bank, so this is quick and easy to do.) The periodic scanning interrupt also counts the number of iterations with no events and no keys being pressed, and when that exceeds some configured limit, disables itself and enables the state change interrupt. This saves resources.
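The hand-over between the two interrupts can be sketched as a small piece of bookkeeping. Everything here is hypothetical (struct scan_ctl, scan_on_pin_change, scan_on_tick); the real enabling and disabling of interrupts is hardware specific and is only indicated in comments.

```c
#include <stdbool.h>
#include <stdint.h>

struct scan_ctl {
    bool     scanning;    /* true: periodic scan interrupt is the active one */
    uint32_t idle;        /* consecutive quiet scans so far */
    uint32_t idle_limit;  /* configured limit of quiet scans */
};

/* Called from the (hypothetical) pin state change interrupt. */
static inline void scan_on_pin_change(struct scan_ctl *c) {
    c->idle = 0;
    c->scanning = true;   /* real code: enable periodic IRQ, disable pin IRQ */
}

/* Called at the end of each periodic scan.
 * Returns true while periodic scanning should stay enabled. */
static inline bool scan_on_tick(struct scan_ctl *c, bool any_key_down, bool any_event) {
    if (any_key_down || any_event) {
        c->idle = 0;
    } else if (++c->idle > c->idle_limit) {
        c->idle = 0;
        c->scanning = false;  /* real code: disable periodic IRQ, enable pin IRQ */
    }
    return c->scanning;
}
```

With idle_limit set to 2, periodic scanning stops on the third consecutive quiet scan.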
When the input events include combined button presses and/or doublepresses, it greatly simplifies the input handling, because the state machine only notices verified events, and can run at varying intervals, without having to consider elapsed time between iterations. Instead of counting milliseconds, I count scanning loop iterations, and can tune the scan rate to something suitable.
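Counting scan iterations instead of milliseconds might look something like the following for a single button. This is only a sketch under assumed tuning constants; struct button, button_scan, and the EV_* codes are hypothetical names. The returned nonzero codes are what would be pushed to the event queue.

```c
#include <stdint.h>

#define DEBOUNCE_SCANS  5u   /* assumed: scans a new raw state must persist */
#define REPEAT_FIRST   50u   /* assumed: scans before the first autorepeat */
#define REPEAT_MIN      5u   /* assumed: shortest autorepeat interval */
#define REPEAT_STEP     5u   /* assumed: interval reduction per repeat */

enum { EV_NONE = 0, EV_PRESS = 1, EV_REPEAT = 2, EV_RELEASE = 3 };

struct button {
    uint8_t  down;      /* debounced state: 1 = pressed */
    uint8_t  stable;    /* consecutive scans raw state differed from 'down' */
    uint16_t wait;      /* scans left until the next autorepeat */
    uint16_t interval;  /* current autorepeat interval, in scans */
};

/* Call once per scan iteration with the raw pin state. Returns an event code. */
static inline uint32_t button_scan(struct button *b, int raw_down) {
    if ((raw_down != 0) != (b->down != 0)) {
        /* Raw state disagrees with debounced state: count towards a change. */
        if (++b->stable < DEBOUNCE_SCANS)
            return EV_NONE;
        b->stable = 0;
        b->down = (raw_down != 0);
        b->interval = REPEAT_FIRST;
        b->wait = REPEAT_FIRST;
        return b->down ? EV_PRESS : EV_RELEASE;
    }
    b->stable = 0;
    if (!b->down)
        return EV_NONE;
    if (--b->wait > 0)
        return EV_NONE;
    /* Autorepeat fires; shorten the interval until it reaches REPEAT_MIN. */
    if (b->interval >= REPEAT_MIN + REPEAT_STEP)
        b->interval -= REPEAT_STEP;
    else
        b->interval = REPEAT_MIN;
    b->wait = b->interval;
    return EV_REPEAT;
}
```

Because everything is measured in scans, changing the scan rate rescales the debounce time and the autorepeat timing together.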
As a particular example, consider a capacitive touchscreen on a small 240×320 display (2.8", 3.2", or 3.5" size). You can use an event queue with 32-bit entries to record the touch tracks (id:pressure:X:Y), and even support multitouch (one id per concurrent touch). The rate at which you poll the touchscreen determines how many waypoints you get for gestures and such. If you poll at specific regular intervals, the waypoints are regularly spaced in time, and delta between consecutive points is directly proportional to velocity. Otherwise, your waypoints are basically randomly polled along the track the user dragged.
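One possible way to fit id:pressure:X:Y into a 32-bit event for a 240×320 panel is sketched below. The field widths are my assumption, not anything fixed by the queue: X needs 8 bits (0..239), Y needs 9 bits (0..319), leaving 10 bits for pressure and 4 for the touch id. The top bit is always set so that a touch event is never zero, since zero is what queue_pop returns for an empty queue. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit 31 = 1, bits 30..27 id, bits 26..17 pressure,
 * bits 16..9 X, bits 8..0 Y. */
static inline uint32_t touch_pack(uint32_t id, uint32_t pressure, uint32_t x, uint32_t y) {
    return 0x80000000u
         | ((id       & 0x00Fu) << 27)
         | ((pressure & 0x3FFu) << 17)
         | ((x        & 0x0FFu) << 9)
         |  (y        & 0x1FFu);
}

static inline uint32_t touch_id(uint32_t ev)       { return (ev >> 27) & 0x00Fu; }
static inline uint32_t touch_pressure(uint32_t ev) { return (ev >> 17) & 0x3FFu; }
static inline uint32_t touch_x(uint32_t ev)        { return (ev >>  9) & 0x0FFu; }
static inline uint32_t touch_y(uint32_t ev)        { return  ev        & 0x1FFu; }

/* With regular polling, the per-sample delta is proportional to velocity. */
static inline int32_t touch_dx(uint32_t prev, uint32_t next) {
    return (int32_t)touch_x(next) - (int32_t)touch_x(prev);
}
static inline int32_t touch_dy(uint32_t prev, uint32_t next) {
    return (int32_t)touch_y(next) - (int32_t)touch_y(prev);
}
```

If other event types share the same queue, the tag bit also distinguishes touch waypoints from, say, small integer key event codes.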
The code is not hard to craft nor use; it is really only the RAM used by such queues that can be a limiting factor.
Consider my example code earlier in this message: the state machine simply calls
    uint32_t ev = queue_pop(input_events);
whenever it wants to see if there are pending input events that would affect the state machine. The corresponding queue_push(input_events, new_ev) calls can occur in interrupt service routines, or even on another core (or in an interrupt service routine on another core) on Raspberry Pi Pico (aka RP2040).
If your gadget provides a serial port or USB serial for commands (so it can be remotely operated instead of using the physical buttons), your serial command parser can generate the corresponding input events too, without added complexity.
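To illustrate how the consumer side stays the same regardless of where events come from, here is a minimal sketch of a state machine step function plus a serial command mapper. The event codes, states, and command letters are all made up for the example; on the Pico, each nonzero value returned by queue_pop(input_events) would be passed to ui_handle(), and the serial parser would pass the result of serial_to_event() to queue_push().

```c
#include <stdint.h>

/* Hypothetical event codes and states, for illustration only. */
enum { EV_UP = 1, EV_DOWN = 2, EV_SELECT = 3 };
enum ui_state { ST_IDLE, ST_EDIT };

struct ui {
    enum ui_state state;
    int32_t       value;
};

/* Apply one input event to the state machine. */
static inline void ui_handle(struct ui *ui, uint32_t ev) {
    switch (ui->state) {
    case ST_IDLE:
        if (ev == EV_SELECT)
            ui->state = ST_EDIT;
        break;
    case ST_EDIT:
        if (ev == EV_UP)
            ui->value++;
        else if (ev == EV_DOWN)
            ui->value--;
        else if (ev == EV_SELECT)
            ui->state = ST_IDLE;
        break;
    }
}

/* Map a (hypothetical) one-letter serial command to the same event codes
 * the buttons generate. Returns 0 for unknown commands (nothing to push). */
static inline uint32_t serial_to_event(char c) {
    switch (c) {
    case 'u': return EV_UP;
    case 'd': return EV_DOWN;
    case 's': return EV_SELECT;
    default:  return 0;
    }
}
```

The state machine cannot tell whether an event came from a physical button or from the serial port, which is the whole point.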
My point here is to show that such event queues can be used on microcontrollers, even AVRs, and that they can be used in various ways with finite state machines. Whether they fit a specific use case obviously depends on that use case, with the memory they need being their most limiting factor.
I don't always use them in my UI projects, either: after all, they're just another tool one can use when appropriate.
No, I have a better solution! All of the above is bad.
:'(