EEVblog Electronics Community Forum
Electronics => Projects, Designs, and Technical Stuff => Topic started by: NivagSwerdna on May 14, 2021, 10:45:30 am
-
[attachimg=1]
I'm using ATSAMC21J for a particular retro project which is 5V based (hence the ATSAMC21 fits rather nicely); the project is quite generic... basically it is meant to be a test circuit that you drop into a 40 pin DIP socket and then you can twiddle the values. Because it is generic I cannot easily control the physical mapping of function to a pin (since it differs for different target devices)...
This leaves me with a dog's breakfast of addressing in the PORT registers PORTA and PORTB.... each is 32 bits.
So... if I want to change the value of all the A's in green.... A0..A15... the target address bus.. I need to twiddle bits... each bit is going to take a few instructions until I get a value that I can write.. indeed the change can never be atomic since it is spread across 2 registers....
All this bit twiddling is going to take a lot of time (probably manageable but not great) and for instance guarantees that I would never be able to emulate the underlying device at device speeds (my clock is 48MHz; theirs is 3MHz).
Can anyone suggest a 5V part, 64 QFN or similar where I could map the physical pins into a more logical structure? Or any other clever tricks for that matter!
Thanks in advance
-
There are software techniques that can be used to make bit remapping like this faster (versus one bit at a time anyway), but they depend on how convoluted the map is and are probably not fast enough to run a 3MHz bus (is it actually 3MHz, or is that the CPU clock and bus is slower?) from a 48MHz MCU.
I know you said the device is meant to be generic, but would an adapter board that remaps the pins as needed per application be a problem? What you're doing looks a lot like emulating an old micro bus, so if that's the case you can probably cover a ton of target boards with just a handful of adapters for the most common parts.
Ultimate flexibility would be to use an FPGA to provide the pin remapping, but not sure how many 5V options there are these days.
-
Aside, I don't know why or how the fuck they make those nonsense pinouts... ::)
Given that that's the state of things, the intended solution is the PCB, that's what it's there for. Perhaps you should consider a second riser board on top of the DIP40, to configure for different MCU pinouts?
You may also be interested in an MCU with an external bus (it's, uh, EBI in AVR XMEGAs, not sure if same in SAMs; see also FSMC in STM32s). If the timing matches (or can be fitted with a few logic gates), you can end up with the peripheral bus mapped into physical memory, vastly improving throughput and simplifying access.
The upscale AVR XMEGAs (and probably newer MEGAs?) have EBI, generally have logical pinouts, and are available in QFNs, but maybe don't have the memory or CPU power you need here, I don't know. They're also 3.3V devices.
3.3V isn't a hard stop: it's TTL compatible. That may be retro-relevant. For example I have an XMEGA directly wired to a GPIB bus, it works fine. It's not a good idea if 5V CMOS devices are present, however.
Tim
-
ATSAMC21 has PORTx.OUT, PORTx.OUTSET, PORTx.OUTCLR, and PORTx.OUTTGL registers to set the output pin states, and PORTx.DIR, PORTx.DIRSET, PORTx.DIRCLR, and PORTx.DIRTGL to set the pin direction. You need bit operations only when reading pin states. SAMC21 is a Cortex-M0+ with a single-cycle multiplier, so we can do something pretty crafty:
#include <stdint.h>
/* TODO: Verify these addresses! */
#define PORTA_DIRCLR (*(volatile uint32_t *)0x60000004)
#define PORTA_DIRSET (*(volatile uint32_t *)0x60000008)
#define PORTA_DIRTGL (*(volatile uint32_t *)0x6000000C)
#define PORTA_OUTCLR (*(volatile uint32_t *)0x60000014)
#define PORTA_OUTSET (*(volatile uint32_t *)0x60000018)
#define PORTA_OUTTGL (*(volatile uint32_t *)0x6000001C)
#define PORTA_IN (*(volatile uint32_t *)0x60000020)
#define PORTB_DIRCLR (*(volatile uint32_t *)0x60000084)
#define PORTB_DIRSET (*(volatile uint32_t *)0x60000088)
#define PORTB_DIRTGL (*(volatile uint32_t *)0x6000008C)
#define PORTB_OUTCLR (*(volatile uint32_t *)0x60000094)
#define PORTB_OUTSET (*(volatile uint32_t *)0x60000098)
#define PORTB_OUTTGL (*(volatile uint32_t *)0x6000009C)
#define PORTB_IN (*(volatile uint32_t *)0x600000A0)
/* Pin configuration, 640 bytes. */
uint32_t pin_mask[40][2];
uint32_t pin_mult[40][2];
/* PA00 = 0, PA31 = 31, PB00 = 32, PB31 = 63. */
int pin_define(const int pin, const int num)
{
    /* Safety check */
    if (pin < 0 || pin >= 40 || num < 0 || num >= 64)
        return -1;
    if (num < 32) {
        pin_mask[pin][0] = ((uint32_t)1) << num;
        pin_mask[pin][1] = 0;
        pin_mult[pin][0] = ((uint32_t)1) << (31 - num);
        pin_mult[pin][1] = 0;
    } else {
        pin_mask[pin][0] = 0;
        pin_mask[pin][1] = ((uint32_t)1) << (num - 32);
        pin_mult[pin][0] = 0;
        pin_mult[pin][1] = ((uint32_t)1) << (63 - num);
    }
    return 0;
}
void pin_mode_in(const int pin) { PORTA_DIRCLR = pin_mask[pin][0]; PORTB_DIRCLR = pin_mask[pin][1]; }
void pin_mode_out(const int pin) { PORTA_DIRSET = pin_mask[pin][0]; PORTB_DIRSET = pin_mask[pin][1]; }
void pin_mode_tgl(const int pin) { PORTA_DIRTGL = pin_mask[pin][0]; PORTB_DIRTGL = pin_mask[pin][1]; }
void pin_mode(const int pin, const int mode)
{
    if (mode & 1) {
        pin_mode_out(pin);
    } else {
        pin_mode_in(pin);
        /* TODO: pullups etc., per additional mode bits */
    }
}
void pin_out_set(const int pin) { PORTA_OUTSET = pin_mask[pin][0]; PORTB_OUTSET = pin_mask[pin][1]; }
void pin_out_clr(const int pin) { PORTA_OUTCLR = pin_mask[pin][0]; PORTB_OUTCLR = pin_mask[pin][1]; }
void pin_out_tgl(const int pin) { PORTA_OUTTGL = pin_mask[pin][0]; PORTB_OUTTGL = pin_mask[pin][1]; }
void pin_out(const int pin, const int state)
{
    if (state) {
        pin_out_set(pin);
    } else {
        pin_out_clr(pin);
    }
}
uint32_t pin_in(const int pin)
{
    return !!(((PORTA_IN * pin_mult[pin][0]) | (PORTB_IN * pin_mult[pin][1])) & 0x80000000);
}
which using arm-gcc 5.4.1 (-Wall -Os -mcpu=cortex-m0plus -mthumb) generates
.syntax unified
.cpu cortex-m0plus
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.thumb
.syntax unified
.file "ops.c"
.text
.align 1
.global pin_define
.code 16
.thumb_func
.type pin_define, %function
pin_define:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
push {r4, r5, r6, r7, lr}
cmp r0, #39
bhi .L5
cmp r1, #63
bhi .L5
ldr r2, .L7
lsls r3, r0, #3
ldr r5, .L7+4
cmp r1, #31
bgt .L3
movs r4, #1
movs r0, r4
lsls r0, r0, r1
str r0, [r2, r3]
movs r0, #0
adds r2, r2, r3
str r0, [r2, #4]
movs r2, #31
subs r1, r2, r1
lsls r4, r4, r1
str r4, [r5, r3]
adds r3, r5, r3
str r0, [r3, #4]
b .L2
.L3:
movs r4, #1
movs r6, r1
movs r7, r4
movs r0, #0
subs r6, r6, #32
lsls r7, r7, r6
str r0, [r2, r3]
adds r2, r2, r3
str r7, [r2, #4]
movs r2, #63
subs r1, r2, r1
lsls r4, r4, r1
str r0, [r5, r3]
adds r3, r5, r3
str r4, [r3, #4]
b .L2
.L5:
movs r0, #1
rsbs r0, r0, #0
.L2:
@ sp needed
pop {r4, r5, r6, r7, pc}
.L8:
.align 2
.L7:
.word pin_mask
.word pin_mult
.size pin_define, .-pin_define
.align 1
.global pin_mode_in
.code 16
.thumb_func
.type pin_mode_in, %function
pin_mode_in:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L10
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L10+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L10+8
@ sp needed
str r2, [r3]
bx lr
.L11:
.align 2
.L10:
.word pin_mask
.word 1610612740
.word 1610612868
.size pin_mode_in, .-pin_mode_in
.align 1
.global pin_mode_out
.code 16
.thumb_func
.type pin_mode_out, %function
pin_mode_out:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L13
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L13+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L13+8
@ sp needed
str r2, [r3]
bx lr
.L14:
.align 2
.L13:
.word pin_mask
.word 1610612744
.word 1610612872
.size pin_mode_out, .-pin_mode_out
.align 1
.global pin_mode_tgl
.code 16
.thumb_func
.type pin_mode_tgl, %function
pin_mode_tgl:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L16
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L16+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L16+8
@ sp needed
str r2, [r3]
bx lr
.L17:
.align 2
.L16:
.word pin_mask
.word 1610612748
.word 1610612876
.size pin_mode_tgl, .-pin_mode_tgl
.align 1
.global pin_mode
.code 16
.thumb_func
.type pin_mode, %function
pin_mode:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
lsls r3, r1, #31
bpl .L19
bl pin_mode_out
b .L18
.L19:
bl pin_mode_in
.L18:
@ sp needed
pop {r4, pc}
.size pin_mode, .-pin_mode
.align 1
.global pin_out_set
.code 16
.thumb_func
.type pin_out_set, %function
pin_out_set:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L22
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L22+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L22+8
@ sp needed
str r2, [r3]
bx lr
.L23:
.align 2
.L22:
.word pin_mask
.word 1610612760
.word 1610612888
.size pin_out_set, .-pin_out_set
.align 1
.global pin_out_clr
.code 16
.thumb_func
.type pin_out_clr, %function
pin_out_clr:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L25
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L25+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L25+8
@ sp needed
str r2, [r3]
bx lr
.L26:
.align 2
.L25:
.word pin_mask
.word 1610612756
.word 1610612884
.size pin_out_clr, .-pin_out_clr
.align 1
.global pin_out_tgl
.code 16
.thumb_func
.type pin_out_tgl, %function
pin_out_tgl:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L28
lsls r0, r0, #3
ldr r1, [r0, r3]
ldr r2, .L28+4
adds r0, r3, r0
str r1, [r2]
ldr r2, [r0, #4]
ldr r3, .L28+8
@ sp needed
str r2, [r3]
bx lr
.L29:
.align 2
.L28:
.word pin_mask
.word 1610612764
.word 1610612892
.size pin_out_tgl, .-pin_out_tgl
.align 1
.global pin_out
.code 16
.thumb_func
.type pin_out, %function
pin_out:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
cmp r1, #0
beq .L31
bl pin_out_set
b .L30
.L31:
bl pin_out_clr
.L30:
@ sp needed
pop {r4, pc}
.size pin_out, .-pin_out
.align 1
.global pin_in
.code 16
.thumb_func
.type pin_in, %function
pin_in:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
movs r2, r0
ldr r3, .L34
push {r4, lr}
ldr r4, [r3]
ldr r3, .L34+4
ldr r1, .L34+8
ldr r0, [r3]
lsls r3, r2, #3
ldr r2, [r3, r1]
adds r3, r1, r3
ldr r3, [r3, #4]
muls r2, r4
muls r0, r3
orrs r0, r2
lsrs r0, r0, #31
@ sp needed
pop {r4, pc}
.L35:
.align 2
.L34:
.word 1610612768
.word 1610612896
.word pin_mult
.size pin_in, .-pin_in
.comm pin_mult,320,4
.comm pin_mask,320,4
.ident "GCC: (GNU Tools for ARM Embedded Processors) 5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]"
(exact output, only whitespace modified for easier reading).
Essentially, pin_mask[logical][bank] contains the bit mask – only one bit set per logical pin! – used with clear/set/toggle registers (when setting pin direction or output state); and pin_mult[logical][bank] contains a multiplier that shifts the desired bit to the most significant position, or zero if the bank does not affect the logical pin state, for use when reading the pin states. You have 40 logical pins, and the SAMC21 has GPIO pins in two logical banks, so these lookup tables do take 640 bytes of SRAM.
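To see why the multiplier does the job, here is a small host-side sketch. It is not SAMC21 code: plain values stand in for the PORTA_IN and PORTB_IN registers, and the helper names mult_for(), read_logical() and multiplier_selftest() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiplier that moves bit `bit` (0..31) up to bit 31. */
static uint32_t mult_for(unsigned bit)
{
    return (uint32_t)1 << (31u - bit);
}

/* Same expression as pin_in(), but on plain values instead of the IN
   registers. A multiplier of zero makes that bank contribute nothing. */
static uint32_t read_logical(uint32_t in_a, uint32_t mult_a,
                             uint32_t in_b, uint32_t mult_b)
{
    return ((in_a * mult_a) | (in_b * mult_b)) >> 31;
}

/* Self-check covering one pin in each bank. */
static void multiplier_selftest(void)
{
    /* Logical pin on PA05: only bit 5 of bank A matters, bank B is ignored. */
    assert(read_logical(0x00000020u, mult_for(5), 0xFFFFFFFFu, 0u) == 1u);
    assert(read_logical(0xFFFFFFDFu, mult_for(5), 0xFFFFFFFFu, 0u) == 0u);
    /* Logical pin on PB31: the multiplier is 1, bit 31 is already in place. */
    assert(read_logical(0xFFFFFFFFu, 0u, 0x80000000u, mult_for(31)) == 1u);
    assert(read_logical(0xFFFFFFFFu, 0u, 0x7FFFFFFFu, mult_for(31)) == 0u);
}
```

Calling multiplier_selftest() from a test main() on a PC is enough to convince oneself that the unused bank really drops out when its multiplier is zero.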
Is this fast enough for you? I doubt you can get 3 MHz with a 48 MHz MCU (that is 1:16, so only 16 MCU clock cycles per bus cycle), but it is not that much work.
A similar approach (a bit mask per pin per bank used with the CLR/SET/TGL registers, plus a multiplier that "shifts" the input bit to the highest bit position while still being able to clear the result to zero) works on many other ARMs as well. Note that instead of a multiplier, you can use a shift count, but only if the shift instruction supports clearing the entire register (shift down by 32), or if you do not use the pin corresponding to bit 0 in any bank (so that you can add an explicit additional shift right). But when you have a single-cycle 32×32 multiplication instruction, it makes sense to use it to one's advantage.
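One possible reading of the shift-count variant, as a host-side sketch: store (bit - 1) as the shift count and always add one more shift right, so that a stored count of 31 yields zero for a bank that does not contribute. This is an interpretation rather than tested SAMC21 code, and the names SHIFT_UNUSED, shift_for() and read_logical_shift() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (x >> 31) >> 1 is always zero, so 31 marks "this bank is not used". */
#define SHIFT_UNUSED 31u

/* Hypothetical helper: stored shift count for a pin on bit 1..31.
   Bit 0 cannot be used, because its stored count would have to be -1. */
static uint32_t shift_for(unsigned bit)
{
    return bit - 1u;
}

/* Read one logical pin using shift counts instead of multipliers. */
static uint32_t read_logical_shift(uint32_t in_a, uint32_t sh_a,
                                   uint32_t in_b, uint32_t sh_b)
{
    return (((in_a >> sh_a) >> 1) | ((in_b >> sh_b) >> 1)) & 1u;
}

static void shift_selftest(void)
{
    /* Logical pin on PA05; bank B is all ones but must be ignored. */
    assert(read_logical_shift(0x00000020u, shift_for(5), 0xFFFFFFFFu, SHIFT_UNUSED) == 1u);
    assert(read_logical_shift(0xFFFFFFDFu, shift_for(5), 0xFFFFFFFFu, SHIFT_UNUSED) == 0u);
}
```

In C the extra shift is also what keeps this well defined, since shifting a 32-bit value by 32 in a single step is undefined behaviour.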
If I was doing this, I'd prototype it using Teensy 4.1 with a similar scheme (with four GPIO banks, not two). It has an i.MX RT1062 running at up to 600 MHz (~960 MHz if overclocked and well cooled). That would at least tell you how much computing power is actually needed, even if you didn't end up using that particular processor.
-
That looks interesting!
-
Similarly, if you have sufficient SRAM for the lookup tables, you can create N-bit output "buses" by using a pre-populated lookup table,
uint32_t bus_map[MAXVAL+1][banks];
where MAXVAL is (uint32_t)((1 << N)-1), i.e. 2^N-1. To write value v to the bus, you write bus_map[v & MAXVAL][bank] to the OUTSET register of each bank, and bus_map[(~v) & MAXVAL][bank] to the OUTCLR register of each bank.
If the bus pins happen to be in the same bank and the bus width is a compile time constant, this is just one lookup, a XOR, and two 32-bit writes.
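As a rough illustration of that single-bank case, here is a host-side sketch with a made-up 4-bit bus on PA07, PA02, PA12 and PA09. The variable sim_out and the functions outset_write() and outclr_write() are stand-ins for the real pin states and the PORTA_OUTSET/PORTA_OUTCLR registers; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-ins: sim_out models the pin states of one bank. */
static uint32_t sim_out;
static void outset_write(uint32_t mask) { sim_out |= mask; }
static void outclr_write(uint32_t mask) { sim_out &= ~mask; }

/* Assumed example wiring: bus bits 0..3 on PA07, PA02, PA12, PA09. */
static const unsigned bus_pin[4] = { 7, 2, 12, 9 };
#define BUS_PINS ((1u << 7) | (1u << 2) | (1u << 12) | (1u << 9))

static uint32_t bus4_map[16];

/* Pre-populate the table: for each value, OR together the pins of its set bits. */
static void bus4_init(void)
{
    for (uint32_t v = 0; v < 16; v++) {
        uint32_t mask = 0;
        for (unsigned b = 0; b < 4; b++)
            if (v & (1u << b))
                mask |= (uint32_t)1 << bus_pin[b];
        bus4_map[v] = mask;
    }
}

static void bus4_write(uint32_t v)
{
    const uint32_t set = bus4_map[v & 15u];  /* one lookup */
    outset_write(set);                       /* rising edges */
    outclr_write(set ^ BUS_PINS);            /* XOR yields the pins to clear */
}

static void bus4_selftest(void)
{
    bus4_init();
    sim_out = 0xFFFFFFFFu;
    bus4_write(0x5u);                 /* bits 0 and 2: PA07, PA12 stay high */
    assert(sim_out == 0xFFFFFDFBu);   /* PA02 and PA09 cleared, rest untouched */
}
```

Because BUS_PINS is a compile-time constant, the clear mask costs only the one XOR, and pins outside the bus are never touched.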
Technically, all the pins in the same bank do change at the same time, but either rising or falling slopes are always in the same order (depends on which write you do first); in different banks, there is that small latency, but still, the pin change order is always the same, and should suffice for a bus.
If N is large, I recommend splitting it into multiple lookup tables. It does slow things down a bit (because v must be split into groups of bits), but it dramatically reduces the size of the lookup tables needed.
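To put numbers on that trade-off, here is a small sketch that computes the table size for an N-bit output bus split into equal groups. The function name bus_table_bytes() is made up, and it assumes N is divisible by the number of groups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of lookup table for an nbits-wide output bus split into `groups`
   equal groups, with one 32-bit mask per bank for every table entry.
   Assumes nbits is divisible by groups. */
static size_t bus_table_bytes(unsigned nbits, unsigned groups, unsigned banks)
{
    const unsigned bits_per_group = nbits / groups;
    return ((size_t)1 << bits_per_group) * groups * banks * sizeof(uint32_t);
}

static void table_size_selftest(void)
{
    /* One table indexed by all 16 bits, two banks: 65536 * 2 * 4 bytes. */
    assert(bus_table_bytes(16, 1, 2) == 524288u);
    /* Four 4-bit groups, two banks: 16 * 4 * 2 * 4 bytes. */
    assert(bus_table_bytes(16, 4, 2) == 512u);
}
```

A single table for a 16-bit bus would need half a megabyte, far more SRAM than a SAMC21 has, while every extra split costs only a few more shifts, masks and ORs per write.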
For example, if you have a 16-bit bus on a SAMC21, you only need 4096 bytes for two lookup tables (2048 bytes each), and you can use any GPIO pins in whatever order you choose for the bus:
uint32_t bus_outmap[256][2][2]; /* [value][group][bank] */
void bus_write16(uint32_t value)
{
    const uint32_t val0 = value & 255;
    const uint32_t val1 = (value >> 8) & 255;
    const uint32_t not0 = val0 ^ 255;
    const uint32_t not1 = val1 ^ 255;
    /* Rising edges. */
    PORTA_OUTSET = bus_outmap[val0][0][0] | bus_outmap[val1][1][0];
    PORTB_OUTSET = bus_outmap[val0][0][1] | bus_outmap[val1][1][1];
    /* Falling edges. */
    PORTA_OUTCLR = bus_outmap[not0][0][0] | bus_outmap[not1][1][0];
    PORTB_OUTCLR = bus_outmap[not0][0][1] | bus_outmap[not1][1][1];
}
Unfortunately, reading from a bus requires a rather large lookup table. The logic is the same, except inverted, and that now you have a bus width of 2×32=64 bits (on SAMC21; twice that on many other ARMs); note that this lookup table size does not depend on the bus width at all, only on the number of GPIO bits on the hardware. For example, using 8192 bytes of lookup tables, regardless of the bus size:
uint32_t bus_inmap[256][4][2]; /* [value][group][bank] */
uint32_t bus_read(void)
{
    const uint32_t val0 = PORTA_IN;
    const uint32_t val1 = PORTB_IN;
    return bus_inmap[ val0        & 255][0][0] | bus_inmap[ val1        & 255][0][1]
         | bus_inmap[(val0 >>  8) & 255][1][0] | bus_inmap[(val1 >>  8) & 255][1][1]
         | bus_inmap[(val0 >> 16) & 255][2][0] | bus_inmap[(val1 >> 16) & 255][2][1]
         | bus_inmap[(val0 >> 24) & 255][3][0] | bus_inmap[(val1 >> 24) & 255][3][1];
}
Again, if the pins are all in the same bank, you can save half (three quarters on those ARMs that have four GPIO banks) on lookup table size.
Setting up the lookup tables is simple, but not very fast (there are two loops of (MAXVAL+1) per "bus" bit). I would clear the lookup tables first, then set the lookup bits for each individual pin:
#include <string.h> /* memset() */
void bus_init(void)
{
    memset(bus_outmap, 0, sizeof bus_outmap);
    memset(bus_inmap, 0, sizeof bus_inmap);
}
/* addrbit: bus bit, 0..15.
   pin: PA00=0, PA01=1, .., PA31=31, PB00=32, PB01=33, .., PB31=63.
*/
void bus_define(const uint32_t addrbit, const uint32_t pin)
{
    /* Out-of-range arguments would index past the lookup tables. */
    if (addrbit >= 16 || pin >= 64)
        return;
    /* Output mapping: every table value that has this bus bit set
       must drive the pin. */
    {
        const uint32_t value = (uint32_t)1 << (addrbit & 7);
        const uint32_t group = addrbit >> 3;
        const uint32_t pinmask = (uint32_t)1 << (pin & 31);
        const uint32_t bank = pin / 32;
        for (uint32_t val = 0; val < 256; val++)
            if (val & value)
                bus_outmap[val][group][bank] |= pinmask;
    }
    /* Input mapping: every table value that has this pin set
       must report the bus bit. */
    {
        const uint32_t value = (uint32_t)1 << (pin & 7);
        const uint32_t group = (pin >> 3) & 3;
        const uint32_t addrmask = (uint32_t)1 << addrbit;
        const uint32_t bank = pin / 32;
        for (uint32_t val = 0; val < 256; val++)
            if (val & value)
                bus_inmap[val][group][bank] |= addrmask;
    }
    /* TODO: Set the actual pin properties, like default direction,
       driving strength etc., perhaps based on an additional
       parameter to this function. */
}
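Before trusting tables like these on hardware, it may be worth checking them on a PC. The following is a self-contained host-side sketch, not SAMC21 code: sim_port[] stands in for the pin states of both banks, the pin assignment is arbitrary, and every sim_ name is made up. It builds the tables by testing each value from 0 to 255 for the relevant bit, then writes and reads back every 16-bit value.

```c
#include <stdint.h>
#include <string.h>

static uint32_t sim_outmap[256][2][2];  /* [value][group][bank] */
static uint32_t sim_inmap[256][4][2];   /* [value][byte of IN][bank] */
static uint32_t sim_port[2];            /* simulated pin states, banks A and B */

/* Map bus bit `addrbit` (0..15) to pin 0..63 (PA00=0 .. PB31=63). */
static void sim_define(uint32_t addrbit, uint32_t pin)
{
    const uint32_t bank = pin / 32;
    for (uint32_t val = 0; val < 256; val++) {
        if (val & (1u << (addrbit & 7)))
            sim_outmap[val][addrbit >> 3][bank] |= (uint32_t)1 << (pin & 31);
        if (val & (1u << (pin & 7)))
            sim_inmap[val][(pin >> 3) & 3][bank] |= (uint32_t)1 << addrbit;
    }
}

/* Models the OUTSET write followed by the OUTCLR write for each bank. */
static void sim_write16(uint32_t value)
{
    const uint32_t v0 = value & 255, v1 = (value >> 8) & 255;
    for (int bank = 0; bank < 2; bank++) {
        sim_port[bank] |=  (sim_outmap[v0][0][bank] | sim_outmap[v1][1][bank]);
        sim_port[bank] &= ~(sim_outmap[v0 ^ 255][0][bank] | sim_outmap[v1 ^ 255][1][bank]);
    }
}

/* Models reading both IN registers and translating them back to a bus value. */
static uint32_t sim_read(void)
{
    uint32_t result = 0;
    for (int bank = 0; bank < 2; bank++)
        for (int byte = 0; byte < 4; byte++)
            result |= sim_inmap[(sim_port[bank] >> (8 * byte)) & 255][byte][bank];
    return result;
}

/* Scatter 16 bus bits over arbitrary pins and check all 65536 values.
   Returns 1 if every value survives the round trip, 0 otherwise. */
static int sim_roundtrip_ok(void)
{
    static const uint8_t pins[16] = { 5, 40, 17, 3, 62, 9, 33, 28,
                                      0, 47, 12, 55, 21, 36, 31, 63 };
    memset(sim_outmap, 0, sizeof sim_outmap);
    memset(sim_inmap, 0, sizeof sim_inmap);
    memset(sim_port, 0, sizeof sim_port);
    for (uint32_t bit = 0; bit < 16; bit++)
        sim_define(bit, pins[bit]);
    for (uint32_t v = 0; v < 65536; v++) {
        sim_write16(v);
        if (sim_read() != v)
            return 0;
    }
    return 1;
}
```

Calling sim_roundtrip_ok() from a test main() exercises both tables together, which catches table-construction mistakes that are easy to miss when probing pins one at a time.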
I recently discovered that on Teensy 4.0 I can use this to implement a pretty darn efficient 16-bit "bus" using the GPIO1 pins (labeled 0, 1, and 14-27 on the Teensy pinout, https://www.pjrc.com/teensy/pinout.html). Note that while the pads labeled 24-33 look odd, one can solder an SMD 2×5 pin header with 0.1" spacing to them, so that a carrier or break-out board only needs pins in the standard 0.1" spacing.
(Teensy 4's i.MX RT1062 has four GPIO banks, each with a data set (DR_SET), clear (DR_CLEAR), and toggle (DR_TOGGLE) registers, so in that regard quite similar to SAMC21.)
I am still investigating exactly how I could use an 18-bit bus (3×6 bits for RGB data, and 8 bits for register stuff) efficiently for use with parallel display modules (ILI9341 and the like), without using the pins for a UART, SPI, or I2C. The lookup tables take up to 3200 bytes, but the i.MX RT1062 on it has lots of RAM (a megabyte total).
Obviously, I cannot use DMA here, but since I just want this as a USB-controlled embedded Linux "framebuffer" with a few buttons/encoders, that should be okay; note that its native USB is HS (max. 480 Mbit/s), not FS/LS (12/1 Mbit/s). I probably could even have a 32-bit true color framebuffer with automatic 8-bit (via lookup) overlays, combined during output updates. Might be useful for error/status messages for an appliance, without disturbing the application-accessible true color framebuffer... Obviously, an indexed color (paletted) framebuffer would be easy to support, since the "palette" can directly refer to the GPIO port bit masks. ("Clear" is always the binary inverse of the "set", applied to the "bus" bits only. The output toggle register is useful for the write strobe, too.)
Without using the six oddball SD card pads on the bottom, there are 34 GPIO pins on the Teensy 4.0: the abovementioned 16 pins in bank 1; 9 pins (labeled 6-13, 32) in bank 2; 3 pins (labeled 28, 30, 31) in bank 3; and 6 pins (labeled 2-5, 29, 33) in bank 4. Bank 4 has five pins (labeled 2, 3, 4, 33, 5) in consecutive bits so no lookup table is needed for those. Four bank 1 pins (labeled 19, 18, 14, 15) are also in consecutive bits, as are six pins labeled 17, 16, 22, 23, 20, 21. Unfortunately, pins labeled 16-19 (bank 1) have two of the three I2C buses, and pins labeled 11-13 (and 10 for chip select, bank 2) have the easiest-to-access SPI bus. There is another SPI bus in bank 1 using pins labeled 0, 1, 26, 27; and a third one using the oddball SD card pads labeled 34-39, but I don't like using the oddball pads as they are hard to break out. (I haven't included the six oddball SD card pads in this list anywhere, except mentioning they are labeled 34 through 39.)
Can you tell I've looked at various 32-bit microcontrollers, and their datasheets to see which ones have "nice" GPIO banks? ;D
It is a darned pity the i.MX RT1062 is only available in MAPBGA. If I wasn't just an uncle bumblefuck hobbyist, I'd do my own i.MX RT1062 board, with a different set of pins, optimized for interfacing parallely-bussy thingies to high-speed USB 2.0 (max. 480 Mbit/s). Even a TQFP scares me a bit..
(Even with the slow tty layer in Linux when using USB Serial, I can get 20+ Mbits/s ping-pong (https://forum.pjrc.com/threads/59587-Help-understanding-odd-USB-Serial-test-results-(Teensy-4-0)?p=230013#post230013) and 200+ Mbits/s one-way (https://forum.pjrc.com/threads/60622-Serial-Communication-Problems-Teensy-4-0?p=237636#post237636) with a Teensy 4.0, so it should have no issues whatever even with 60Hz full updates to a 32-bit 320×240 framebuffer (147,456,000 bits per second).)
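For anyone wanting to redo that bandwidth estimate for a different panel or refresh rate, the arithmetic is just width × height × bits per pixel × frames per second. A minimal sketch, with made-up function names:

```c
#include <assert.h>
#include <stdint.h>

/* Raw framebuffer bandwidth in bits per second, ignoring protocol overhead. */
static uint64_t framebuffer_bits_per_second(uint32_t width, uint32_t height,
                                            uint32_t bits_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bits_per_pixel * fps;
}

static void framebuffer_example(void)
{
    /* 320x240, 32 bits per pixel, 60 full updates per second. */
    assert(framebuffer_bits_per_second(320, 240, 32, 60) == 147456000u);
}
```

At roughly 147 Mbit/s this sits below the 200+ Mbit/s one-way figure quoted above, with USB and tty overhead not accounted for.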