Implementing ICSP (programming a Microchip MCU) on a Linux SBC

JPortici:
It's an idea I have been toying with from time to time. I have on my to-do list to build a product programmer/tester. The product's main feature is CAN bus, which is also used to update the firmware.
Currently the program/test workflow is: program the bootloader, attach a CAN interface to a PC, download the firmware, run the test suite. The aim is to condense everything into one device: press one button and run all tasks.

I had plans to do everything with a suitable microcontroller. However, writing the firmware in a way that is easy to extend/adapt has not been easy, whereas the PC software was a breeze to write, which made me think: what if I designed a board running embedded Linux?

I say designed, as it would be a great learning experience:
- designing a board for an ARM9/A5/A7 that is not "impossible" to route and hand assemble (jaycarlson's website comes to mind)
- getting Linux to boot
- optimizing Linux so it hopefully starts in seconds, not minutes
- understanding what to do in order not to brick the filesystem when the user inevitably removes power without shutting down (because realistically this is going to happen: connect board to power, sit waiting until booted, program one device, remove power and toss away)

But what I haven't figured out is how to program the bootloader into the device, in other words how to implement ICSP (the target boards have a Microchip microcontroller).
I have already written the code to implement ICSP in a microcontroller in the past, so it could program a secondary controller on board. It's almost trivial on today's devices, as you can just use SPI and logic-level signals.
But I wouldn't know where to start on Linux. I think I should write a driver, so I can call, say, the function "ReadDeviceID" and the driver would enable the output, toggle MCLR, clock out the command, disable the output, clock in the response...  But how to do it?

Unfortunately, I can't use an ARM device with a microcontroller coprocessor. I haven't found one that I could call suitable (I'm probably going to use an ATSAMA9X60; I was also eyeing the ATSAMA75, which would have 4+ CAN buses, so I could do 4 boards at once instead of 2, but it seems much more difficult to design a PCB for).
I also don't want to use a separate controller. That would be easy (a separate controller that runs the ICSP code, controlled via UART), but I would prefer to use one single programmable part.

I also don't want to use an existing SBC + Hat/Cape/Whatever; I would rather do everything with a microcontroller, as unpleasant as that is.

Nominal Animal:
I do something similar with many embedded Linux appliances – devices like routers, TV boxes, et cetera.  Instead of ICSP, I typically add small displays describing operational state to nontechnical users, and buttons to trigger specific things (especially custom stuff on OpenWRT routers).

On embedded boards, from tiny ones like Milk-V Duo and Ox64 to larger ones like Odroids, the GPIO pins and UART/SPI/I²C buses are typically easily accessible.  Sometimes a small Device Tree Overlay (a compiled structure describing attached hardware to kernel drivers, read at boot time) is needed to enable them, if the default peripheral for those specific pins differs.  The problem is always timing.  In Linux, userspace has no guarantees of being able to respond within any specific interval, so only pins whose state change or state change acknowledgement may be delayed (for up to several hundred milliseconds in some cases) can really be controlled from userspace.
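
To get a feel for the timing problem on a given board, one can measure how late a userspace process wakes up from short sleeps. This is only a rough sketch, assuming plain Python on the target; the function name and default values are made up for illustration, and the numbers it reports depend entirely on the board and its load.

```python
import time

def worst_wakeup_delay(iterations=200, interval=0.001):
    """Return the largest observed oversleep, in seconds.

    Each iteration asks to sleep for `interval` seconds and measures how
    much longer than that the process actually stayed asleep.
    """
    worst = 0.0
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(interval)
        late = (time.monotonic() - start) - interval
        if late > worst:
            worst = late
    return worst
```

Calling worst_wakeup_delay() once while the board is idle and again while it is busy (for example during a large file copy) shows how far the delay can stretch.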

My preferred solution is to use a microcontroller with native USB to handle the timing, and control the microcontroller via e.g. USB serial.  When there is bulk data to be transferred, I like to use dual USB serial, i.e. two pairs of USB endpoints on the same device, and use one for bulk data and one for control messages.  (That single USB device then provides two TTY ACM character devices, and you can control their access modes and create a suitable symlink in /dev using suitable udev rules.)  For Teensy microcontrollers, the Teensyduino integration to the Arduino environment means that one only needs to select that option from the menus, and the Serial (main USB Serial) and SerialUSB1 (secondary USB Serial) will automatically be available.  The cores even expose whether the corresponding tty device is opened on the host by the object evaluating to True, and to False otherwise.  Teensy 4.0 is way overkill for something like this, but they aren't that expensive, so I like to use them.

Full-Speed USB 2.0 transfers bulk data in packets of up to 64 bytes of payload, and High-Speed in packets of up to 512 bytes, so the MCU-Host protocol is best designed around such packets, even when using USB Serial.  Use termios on the host side (forget libusb, it does not do all error checks, only most of those that occur often enough).  You will end up using tcdrain(devfd) after each command to ensure it is sent to the device, but assuming you do check errors properly (including short reads and writes with read() and write()), the end result is extremely robust.  I say this from experience, having often tested my Teensy communications with a few gigabytes of data.  (Teensy 4.x can easily sustain 200+ Mbit/s over USB serial in one direction.)  Using "raw" termios, either from C or Python, works extremely well for me, and the data is explicitly binary, so there is no need to limit yourself to ASCII or text commands and responses.
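
The short read/write handling mentioned above can be sketched like this in Python. The helper names (write_all, read_exact, send_command) are made up for this example; os.read(), os.write() and termios.tcdrain() are the standard library wrappers around the corresponding system calls.

```python
import os
import termios

def write_all(fd, data):
    """Write every byte of `data`, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)   # may accept fewer bytes than offered
        view = view[written:]

def read_exact(fd, count):
    """Read exactly `count` bytes, retrying after short reads."""
    buf = bytearray()
    while len(buf) < count:
        chunk = os.read(fd, count - len(buf))
        if not chunk:
            raise EOFError("device closed before the full reply arrived")
        buf += chunk
    return bytes(buf)

def send_command(fd, packet):
    """Send one packet to a tty and wait until it has left the host."""
    write_all(fd, packet)
    termios.tcdrain(fd)   # only valid on tty devices, e.g. a ttyACM node
```

Errors from the system calls surface as OSError exceptions, so nothing is silently ignored; a real application would decide per command whether to retry or give up.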

The main difficulty is designing the Host-MCU communications in an easily maintained manner, without generalizing it too much.

For displays, I like to limit to a small number of "commands", the most used one being a bulk data update of a rectangular region in the display, which often is the whole display, and is something many display controllers like ILI9488, ST7789, et cetera support.  While they also support scrolling operations, I've discovered I don't really need those in practice.  The optimum complexity level for me is when I can replace the display, reflash the microcontroller to match the display controller, and in the host software at most change the display width and height in pixels, but no other functional changes.  In fact, it is the MCU alone that responds to the "identify display" request with display dimensions and approximate frame rate.  It has taken me several years of hobby-level playing with this to arrive at this conclusion/opinion, though, so expect others to disagree.  I particularly like asynchronous binary commands, with one byte reserved for ID, so that when the command completes, the MCU will respond with the ID and the status.  I imagine that for ICSP, a synchronous one (one query at a time, responded to before the next query is worked on) is more suitable, but I could be wrong there.

That is also why I cannot suggest a specific ICSP command set: it should be based on your experience, and your view of how it might change in future.  Extending is easy and perfectly okay, but making things behave differently for the same command depending on the interface version is horribly nasty.  Thus, reserving an extra parameter or parameters that you specifically initialize to all zeros can be useful for some commands you expect you may need to vary for later hardware.  I warmly recommend the idea of implementing timing and target-device-specific complexity in the MCU, keeping the host software pretty generic, even though I know very well that host software development is much easier and much faster.  It is all about managing complexity, and keeping the volume of sensitive/difficult code minimal and not diffused among generic code: here, keeping the low-level ICSP stuff in the MCU, and letting the host software work on a more abstract level, ignoring device-specific details.  I've found that this just makes things work more reliably, without requiring superhuman coding skills.
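
As an illustration of a one-byte request ID plus a reserved, zero-initialized field, here is a minimal sketch of a packet layout. The layout, field order and names are all hypothetical, not an existing protocol; the 64-byte limit assumes one full-speed USB packet per request.

```python
import struct

# Hypothetical request header: opcode, request ID, payload length, and a
# reserved field that is always sent as zero in this protocol version.
REQUEST_HEADER = struct.Struct("<BBHI")   # 8 bytes, little-endian
RESPONSE = struct.Struct("<BB")           # request ID, status
MAX_PACKET = 64                           # one full-speed USB bulk packet

def pack_request(opcode, request_id, payload=b""):
    if len(payload) > MAX_PACKET - REQUEST_HEADER.size:
        raise ValueError("payload does not fit in one packet")
    return REQUEST_HEADER.pack(opcode, request_id, len(payload), 0) + payload

def unpack_request(packet):
    opcode, request_id, length, reserved = REQUEST_HEADER.unpack_from(packet)
    if reserved != 0:
        # Refuse rather than guess what a newer sender meant.
        raise ValueError("reserved field is not zero")
    payload = packet[REQUEST_HEADER.size:REQUEST_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return opcode, request_id, payload

def pack_response(request_id, status):
    # The ID echoes the request, so replies can be matched out of order.
    return RESPONSE.pack(request_id, status)
```

Rejecting a nonzero reserved field is one possible choice; it means old firmware fails loudly instead of silently ignoring a parameter it does not understand.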

Apologies for the long wall of text.

JPortici:

--- Quote from: Nominal Animal on September 05, 2024, 04:42:53 pm ---Apologies for the long wall of text.

--- End quote ---

No worries!
So that you know, ICSP is "In Circuit Serial Programming", which is the protocol to program the firmware inside PIC MCUs. In current-generation devices it has been simplified, so at the hardware level it is just SPI (albeit with MOSI/MISO on the same pin, so you tie both together, and MOSI is tristated when reading data from the device).
As I said, I could of course add a small MCU and have it talk to the SBC via UART (or USB, of course), but I'm trying to see if it's feasible to do directly from Linux.

Maybe I have a misconception of what a driver does, but in my mind it's a small program that reads/writes from/to the peripheral and GPIO, but running at a higher "priority", and when required it either returns from the function, or calls a callback so the software in userspace can go on.
I guess my question would be whether I understood it correctly, and what I should do to implement such a driver.

Nominal Animal:

--- Quote from: JPortici on September 05, 2024, 08:00:24 pm ---So that you know, ICSP is "In Circuit Serial Programming", which is the protocol to program the firmware inside PIC MCUs
--- End quote ---
I know, and I actually know most of the ICSP commands various models of PICs use, too.  Nevertheless, as I haven't made my own PIC ICSP dongle, I don't know the optimal abstract-command set.  You mentioned you have, so you'd be better able to define that.


--- Quote from: JPortici on September 05, 2024, 08:00:24 pm ---Maybe I have a misconception of what a driver does, but in my mind it's a small program that reads/writes from/to the peripheral and GPIO, but running at a higher "priority", and when required it either returns from the function, or calls a callback so the software in userspace can go on.
--- End quote ---
Yes.  In Linux, the timing requirements mean you'd need to create a driver (specifically a character device driver), in the form of a kernel module.

Linux Device Drivers, 3rd Edition describes the implementation and background, although the in-kernel interfaces have changed somewhat.  The idea is that the driver exposes a character device under /dev.  A userspace process (with suitable privileges; the requirements are configured via udev rules, as the default is root-only) opens the device as normal and operates on it, with the driver implementing the operations.

The userspace-facing interface has several mechanisms, but two are simple and recommended: read()/pread() and write()/pwrite() to read or write data from/to the device, starting at a specific offset or position (default is current file position); and ioctl() messages with a single userspace memory structure as an input/output parameter.  (This obviously suggests using read/pread/write/pwrite for Flash/RAM access, noting the offsets are 64 bit even on 32-bit architectures, and ioctl()s for commands and requests, but again, not having implemented a PIC ICSP dongle myself, I don't presume to claim that is the best option.  In fact, the Linux event device interface –– exactly this kind of a character device, but one that reads/writes only complete struct input_event structures, containing a kernel timestamp, unsigned 16-bit type, unsigned 16-bit code, and signed 32-bit value –– uses reads and writes only, ignoring "file position", and although some idiots tried to move it to an ioctl(), the read/write model has been found best for Human Interface Device messages.  A set of messages that occur at the same time are always separated by a single .type=EV_SYN,.code=SYN_REPORT,.value=0 message.)
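
To make the "complete structures only" read model concrete, here is a sketch that decodes a buffer of struct input_event records and groups them by SYN_REPORT. It assumes the 64-bit Linux layout (two 64-bit timestamp fields, 24 bytes per event); the function names are made up, while EV_SYN, SYN_REPORT and EV_KEY are the values from the kernel's input event headers.

```python
import struct

# Assumed 64-bit layout: seconds, microseconds, __u16 type, __u16 code,
# __s32 value; 24 bytes per event.
INPUT_EVENT = struct.Struct("=qqHHi")
EV_SYN = 0x00
EV_KEY = 0x01
SYN_REPORT = 0

def parse_events(data):
    """Decode every complete event in `data` into a tuple."""
    events = []
    usable = len(data) - len(data) % INPUT_EVENT.size
    for offset in range(0, usable, INPUT_EVENT.size):
        sec, usec, etype, code, value = INPUT_EVENT.unpack_from(data, offset)
        events.append((sec + usec / 1e6, etype, code, value))
    return events

def split_reports(events):
    """Group events into sets, each terminated by EV_SYN/SYN_REPORT."""
    reports, current = [], []
    for event in events:
        _, etype, code, _ = event
        if etype == EV_SYN and code == SYN_REPORT:
            reports.append(current)
            current = []
        else:
            current.append(event)
    return reports
```

A custom ICSP character device could follow the same pattern, with each read() or write() carrying a whole number of fixed-size records.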

It is doable, yes.  I recommend using Bootlin Elixir Cross Referencer for investigating the Linux kernel source tree for specific kernel versions, especially when reading LDD3 (2.6.39.4 tree), as it makes all identifiers links with cross references.


Compare that to using a microcontroller with native full-speed (12 Mbit/s) or high-speed (480 Mbit/s) USB, implementing one or more USB Serial (USB CDC) endpoints.  Again, my preference is Teensy 4.0 with Teensyduino (although bare metal is quite possible, as the bootloader is on a separate chip; it only means you don't really have access to JTAG/SWD pins on the i.MX RT1062), with TXU0102 (UART) or TXU0304 (SPI) voltage translators, or ISO6721/7721 (UART) or ISO6741/7741 (SPI) isolators, both only needing a couple of 100nF supply bypass capacitors.

You implement a request-response (or command-response) protocol, noting that with high-speed USB, each data packet can have up to 512 bytes of payload data: that much data can appear "atomically" in your receive buffers.  I recommend splitting longer data transfers into smaller packets, so you won't need larger buffers, although Teensy 4.x do have 1Mbyte of RAM (in two 512k parts).

On the host computer, you configure a udev rule to set the proper owner, group, and mode for the device node, and create a symlink (nobody wants to scan which /dev/ttyACMN it might be!), say /dev/usb-icsp-interface.  You write a userspace application in whatever language you want (but I do recommend using termios and not the serial libraries; they're all crappy in my opinion), but note that as explained in man 2 read and man 2 write, each such call can return a smaller value ("short count") than requested, in which case you simply need to retry with the rest of the data.  In Python, you use open(devicepath, "r+b", buffering=0) to get raw read/write I/O, so that the object will be of io.RawIOBase class or a derivative; and the termios module to set it to raw mode.  That is almost always done by obtaining the current settings, saving them so they can be restored just before closing the device, and using a copy of them as the basis of the new settings.  I can show you exact code if you have decided what programming language you'll be using.
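
The save, modify, restore pattern for raw mode could look roughly like this in Python. The function names are invented for this sketch, and the flag choices follow what cfmakeraw() is documented to do in the termios man page; the device path passed to open_raw is whatever symlink your udev rule creates.

```python
import os
import termios

def raw_attrs(attrs):
    """Return a copy of tcgetattr()-style settings switched to raw mode."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = attrs
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK
               | termios.ISTRIP | termios.INLCR | termios.IGNCR
               | termios.ICRNL | termios.IXON)
    oflag &= ~termios.OPOST
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON
               | termios.ISIG | termios.IEXTEN)
    cflag &= ~(termios.CSIZE | termios.PARENB)
    cflag |= termios.CS8
    cc = list(cc)             # copy, so the saved settings stay untouched
    cc[termios.VMIN] = 1      # read() blocks until at least one byte
    cc[termios.VTIME] = 0     # no inter-byte timeout
    return [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]

def open_raw(path):
    """Open a tty in raw mode; returns (fd, settings to restore later)."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    saved = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, raw_attrs(saved))
    return fd, saved

def close_restoring(fd, saved):
    """Restore the original settings just before closing the device."""
    termios.tcsetattr(fd, termios.TCSANOW, saved)
    os.close(fd)
```

This version uses a plain file descriptor with os.read() and os.write() instead of a file object; the termios calls accept either a descriptor or an object with fileno().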

In essence, by moving the "icsp driver" code from the kernel to a separate microcontroller, you simplify the programming interface, as you aren't restricted to the kernel interfaces, nor do you need to consider any code running in parallel at all.  You have full control and essentially full isolation.  Depending on your needs, you can use a microcontroller from any number of families, from the cheap WCH CH55x to the NXP i.MX RT1062 used on Teensy 4, including many Microchip ones.

The main difference is that you need to convert from function calls to serialized command structures or requests and their responses, passed along a bidirectional pipe, and on the MCU, implement the functions from the commands/requests received serially via USB.
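
A toy illustration of that conversion, with both ends in one Python file so it can be run without hardware. Everything here is hypothetical: the opcodes, the status codes and the device ID are invented, and on a real MCU the handle_request side would be firmware (typically C or C++), not Python.

```python
import struct

# Hypothetical opcodes, status codes and device ID, for illustration only.
OP_READ_DEVICE_ID = 0x01
OP_BULK_ERASE = 0x02
STATUS_OK = 0
STATUS_UNKNOWN_COMMAND = 1
FAKE_DEVICE_ID = 0x1234

def handle_request(request):
    """MCU side: turn one request packet into one response packet."""
    opcode = request[0]
    if opcode == OP_READ_DEVICE_ID:
        # Real firmware would toggle MCLR and clock the command out here.
        return struct.pack("<BH", STATUS_OK, FAKE_DEVICE_ID)
    if opcode == OP_BULK_ERASE:
        return struct.pack("<B", STATUS_OK)
    return struct.pack("<B", STATUS_UNKNOWN_COMMAND)

class Programmer:
    """Host side: each method serializes a call and decodes the reply."""

    def __init__(self, exchange):
        # `exchange` sends request bytes and returns the response bytes;
        # with real hardware it would write to and read from the tty.
        self._exchange = exchange

    def _call(self, opcode):
        reply = self._exchange(bytes([opcode]))
        if reply[0] != STATUS_OK:
            raise OSError(f"command 0x{opcode:02x} failed, status {reply[0]}")
        return reply[1:]

    def read_device_id(self):
        (device_id,) = struct.unpack("<H", self._call(OP_READ_DEVICE_ID))
        return device_id

    def bulk_erase(self):
        self._call(OP_BULK_ERASE)
```

Programmer(handle_request).read_device_id() then behaves like an ordinary function call, even though a request and a response were serialized in between.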


The Linux userspace API for SPI devices is documented in the kernel's spidev documentation and in the <linux/spi/spidev.h> header file, both part of the Linux kernel source tree.  It is based on 'spidev' character devices, and is rather limited.  In particular, read() and write() are half-duplex, with the other side discarded, although that should be fine for a shared MOSI/MISO/DI/DO pin (which I only realized after writing the above).

If and only if that interface is sufficient and implemented for the SBC you want, you can use the SPI interface from a userspace application.  Obviously, you can also use GPIO pins in conjunction, but with the timing caveat: you must allow for any operation to be delayed by up to several hundred milliseconds when the SBC is otherwise busy.  The same applies to separate transfers over SPI, unless you use the ioctl() interface with one or more struct spi_ioc_transfer structures, in which case they are almost continuous and the chip select is kept asserted for all transfers in the same ioctl command.  Changing the userspace process and I/O priority can shrink that to maybe two dozen milliseconds under load (less than that when not loaded, of course), but then a bug causing a busy loop in that process can make the machine nearly unusable; as in, terminating that process can take a couple of minutes.
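
For reference, a sketch of issuing two back-to-back transfers (command out, reply in) in a single ioctl from Python, without any third-party SPI library. The structure layout is the 32-byte struct spi_ioc_transfer from <linux/spi/spidev.h>; the request number calculation assumes the generic ioctl encoding used on ARM and x86 (some other architectures differ). The function names and the default clock speed are made up, and whether the data pin is actually released during the read phase depends on the SPI controller and wiring.

```python
import ctypes
import fcntl
import struct

# struct spi_ioc_transfer: tx_buf, rx_buf, len, speed_hz, delay_usecs,
# bits_per_word, cs_change, tx_nbits, rx_nbits, word_delay_usecs, pad.
SPI_IOC_TRANSFER = struct.Struct("=QQIIHBBBBBB")
SPI_IOC_MAGIC = ord("k")

def spi_ioc_message(count):
    """ioctl request number equivalent to SPI_IOC_MESSAGE(count)."""
    size = count * SPI_IOC_TRANSFER.size
    return (1 << 30) | (size << 16) | (SPI_IOC_MAGIC << 8)

def write_then_read(fd, command, reply_length, speed_hz=1_000_000):
    """Clock out `command`, then clock in `reply_length` bytes.

    `fd` is an open spidev device descriptor. Both transfers go in one
    ioctl, so chip select stays asserted between them.
    """
    tx = ctypes.create_string_buffer(bytes(command))
    rx = ctypes.create_string_buffer(reply_length)
    transfers = bytearray(
        SPI_IOC_TRANSFER.pack(ctypes.addressof(tx), 0, len(command),
                              speed_hz, 0, 8, 0, 0, 0, 0, 0)
        + SPI_IOC_TRANSFER.pack(0, ctypes.addressof(rx), reply_length,
                                speed_hz, 0, 8, 0, 0, 0, 0, 0))
    fcntl.ioctl(fd, spi_ioc_message(2), transfers)
    return rx.raw
```

The fd would come from something like os.open("/dev/spidev0.0", os.O_RDWR), where the exact device node name depends on the board and its device tree.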

Personally, I'd test the SPI interface on the SBC you consider using first, using a microcontroller as the SPI slave.  For bulk data, I like to use the 32 high bits of the Xorshift64* pseudo-random number generator with userspace-specified 64-bit seed (any nonzero 64-bit state is okay); here, one test command would read the current seed and error counter, another set the seed and reset error counter, one send a sequence of data that the MCU would check against Xorshift64*, and another receive a sequence of data (userspace comparing to Xorshift64*).  This is one of the rare generators that passes all BigCrush statistical tests for randomness, which not even Mersenne Twister does. The period is 2⁶⁴-1, which is sufficient.  Plus, it is extremely fast on microcontrollers, consisting of (64-bit) bit shifts and exclusive-ors, with a final 64-bit multiplication (of which only 32 most significant bits are actually used) for mixing.  This is what I used for USB Serial testing on Teensy 4.x, getting 200+ Mbit/s sustained indefinitely.
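
A Python sketch of the host side of such a test, using the commonly published Xorshift64* constants (shifts 12, 25, 27 and multiplier 0x2545F4914F6CDD1D) and keeping only the high 32 bits of each output. The class and function names are invented for this example; the MCU would run the same generator with the same seed.

```python
MASK64 = (1 << 64) - 1

class Xorshift64Star:
    """Xorshift64* generator returning the 32 high bits of each output."""

    def __init__(self, seed):
        if (seed & MASK64) == 0:
            raise ValueError("seed must be a nonzero 64-bit value")
        self.state = seed & MASK64

    def next32(self):
        x = self.state
        x ^= x >> 12
        x ^= (x << 25) & MASK64
        x ^= x >> 27
        self.state = x
        return ((x * 0x2545F4914F6CDD1D) & MASK64) >> 32

def generate(seed, count):
    """The sequence one side sends."""
    gen = Xorshift64Star(seed)
    return [gen.next32() for _ in range(count)]

def count_errors(seed, received):
    """The check the other side performs on what it received."""
    gen = Xorshift64Star(seed)
    return sum(1 for word in received if word != gen.next32())
```

Because both ends only need the seed, arbitrarily long transfers can be verified without storing the test data anywhere.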

Only after such testing would I trust the SPI and spidev implementation for ICSP use.  (It is not commonly enough used to be automatically trustworthy, in my opinion, you see.)

DiTBho:
Indeed. Many, many years ago, when BDM was implemented in kernel space (on top of the parallel port, actually a hack), it never worked well and it broke with every new kernel source version. Above all, since nobody uses the parallel port anymore, nobody was interested in the thing anymore, not even if we could have replaced the PC parallel port with GPIOs.

Instead, the next step was exactly the approach proposed above: a USB MCU connected to the GNU/Linux SBC. It takes on the load of the commands, passed from an app in userspace (talking bulk-only), and translates them into the timed signals to be fed to the target to be debugged and/or programmed.

For the last 10 years, this worked significantly better, and has proven to be much easier to maintain and update, than the previous solution.
