Oh, wow, this is stupid. So there are two devices on the same JTAG port. One is ARM-compatible with an IR length of 4. This is the one that has the ARM-specific access registers, but no standard test registers. And then there is another one that is the actual device, with an IR length of 5.
On a positive note, this will let me test multiple chained devices.
The IR length 4 one is the actual ARM core, and the IR length 5 one is the boundary scan of the chip itself implemented in the GPIO controllers.
I understand that. It just makes no sense to me. And having that 9-bit IR will slow everything down further. JTAG loses big on speed here.
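To put a number on the 9-bit IR: with two TAPs in series, every IR scan has to carry an instruction for both of them, and the TAP you are not talking to gets BYPASS (all ones). Below is a rough sketch of composing such a scan. The names chain_ir() and chain_ir_bits() are made up for illustration, and the ordering convention (index 0 is the TAP nearest TDO, so its bits end up in the least significant bits) is an assumption you should check against your chain.

```c
#include <stdint.h>

// Build the combined IR value for a chain of TAPs.
// Assumption: ir_len[0] belongs to the TAP nearest TDO, so its instruction
// occupies the least significant bits (the bits shifted in first).
// Every TAP except 'target' gets BYPASS, which is all ones.
// Assumes each IR is shorter than 32 bits and the total fits in 32 bits.
uint32_t chain_ir(const int *ir_len, int count, int target, uint32_t instr)
{
  uint32_t value = 0;
  int shift = 0;

  for (int i = 0; i < count; i++)
  {
    uint32_t mask = (1u << ir_len[i]) - 1;
    uint32_t field = (i == target) ? (instr & mask) : mask;

    value |= field << shift;
    shift += ir_len[i];
  }

  return value;
}

// Total number of bits shifted in one IR scan of the whole chain.
int chain_ir_bits(const int *ir_len, int count)
{
  int bits = 0;

  for (int i = 0; i < count; i++)
    bits += ir_len[i];

  return bits;
}
```

With ir_len = {4, 5} and the ARM DPACC instruction (0b1010) aimed at TAP 0, chain_ir() gives 0x1FA, and every IR scan costs 9 bits instead of 4.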
This dynamic reconfiguration would make a lot of sense to me. There is almost no scenario when I want to access boundary scan. Yet I want to debug and program the device a lot.
I was just looking in detail at how the JTAG AP/DP are implemented in the ARM part. And it also sucks. I'm not entirely sure why they did it that way, but a lot of debug accesses would require IR modification. For some reason they kept the DAP register address and the read/write bit as part of the DR access. But the AP/DP selection is implemented as separate IR instructions.
I'm also not clear why they needed a 5-bit IR for 4 instructions. And why ARM needed a 4-bit IR for 4 registers.
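For reference, here is a small sketch of that split as I read the ADIv5 JTAG-DP description: DPACC and APACC are IR instructions, and the 35-bit DR carries RnW in bit 0, A[3:2] in bits 2:1 and the data in bits 34:3. The function names are hypothetical, not from any existing tool.

```c
#include <stdint.h>

// JTAG-DP instructions for the 4-bit IR.
enum
{
  JTAG_DP_ABORT  = 0x8,
  JTAG_DP_DPACC  = 0xA,
  JTAG_DP_APACC  = 0xB,
  JTAG_DP_IDCODE = 0xE,
  JTAG_DP_BYPASS = 0xF,
};

// Pack the 35-bit DR used by both DPACC and APACC.
// rnw:  1 for read, 0 for write
// addr: register byte address (only bits 3:2 are used)
uint64_t jtag_dap_dr(int rnw, uint8_t addr, uint32_t data)
{
  return ((uint64_t)data << 3) |
         ((uint64_t)((addr >> 2) & 0x3) << 1) |
         (uint64_t)(rnw & 1);
}

// Pick the IR instruction for an access. The caller only needs a new IR scan
// when this differs from the instruction currently loaded.
uint8_t jtag_dap_ir(int is_ap)
{
  return is_ap ? JTAG_DP_APACC : JTAG_DP_DPACC;
}
```

So alternating between DP and AP registers costs an extra IR scan each time, while consecutive accesses to the same port only need DR scans.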
I just set up OpenOCD for the first time to test the JTAG implementation. I'm now a bit more familiar with the output.
In your case the error is basically the first real communication over SWD failing. The reason for this may lie in the debugger firmware itself: it does not return the correct status code even for the simplest pin toggles. The failures you see are OpenOCD trying to send the JTAG-to-SWD switch sequence, followed by a read of the IDCODE register. All of those things fail. And there is no real reason for the first part to fail at all. You need to have a closer look at the implementation of the dap_swj_sequence (0x12) command.
I don't think it is an OpenOCD issue. Something is wrong with the debugger itself.
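As a point of comparison, a minimal sketch of what a DAP_SWJ_Sequence handler has to do is below. The request is the command byte 0x12, a bit count (0 means 256), then the bits LSB first; the response is the command byte followed by a status of 0x00 for OK. The pin routine swj_clock_bit() is a stand-in that only counts bits; in real firmware it would drive SWDIO/TMS and pulse SWCLK/TCK. All names here are illustrative, not taken from your code.

```c
#include <stdint.h>

#define ID_DAP_SWJ_SEQUENCE  0x12
#define DAP_OK               0x00

// Counters used by the stand-in pin routine below.
int swj_bits_clocked = 0;
int swj_ones_clocked = 0;

// Stand-in for the GPIO code: real firmware would set SWDIO/TMS to 'value'
// and generate one SWCLK/TCK pulse here.
void swj_clock_bit(int value)
{
  swj_bits_clocked++;
  swj_ones_clocked += value;
}

// Handle one DAP_SWJ_Sequence request.
// req[0] = 0x12, req[1] = bit count (0 encodes 256), req[2..] = data, LSB first.
// Writes the 2-byte response and returns its length.
int dap_swj_sequence(const uint8_t *req, uint8_t *resp)
{
  int count = req[1] ? req[1] : 256;
  const uint8_t *data = &req[2];

  for (int i = 0; i < count; i++)
    swj_clock_bit((data[i / 8] >> (i % 8)) & 1);

  resp[0] = ID_DAP_SWJ_SEQUENCE; // echo the command ID
  resp[1] = DAP_OK;              // there is nothing here that can fail

  return 2;
}
```

There is no target feedback in this command at all, so if the host sees an error status here, the problem is in how the response is built or delivered, not in the target.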
I would try my tool with your hardware: https://github.com/ataradov/edbg. It does not matter which MCU you actually select. Even if your MCU is not supported, just select any MCU and see if the tool can read information about the debugger and do basic initialization.
I suspect you have some sort of integration issue. And from the OpenOCD logs, it does not even look like it reads any information about the debugger, it just jumps right into the business.
For a "real" debugger, you definitely want to use USB HS. USB FS + HID limits you to 64 KBytes/second. This is OK for basic stuff, but if you really want fast debugging, you need to run it over USB HS.
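The 64 KBytes/second figure is just one 64-byte report per 1 ms USB FS frame. A tiny sketch of that arithmetic, with a made-up helper name, is below. The HS number in the note assumes one 1024-byte packet per 125 us microframe, which is a best case for illustration and not a measured figure.

```c
#include <stdint.h>

// Upper bound on one-way throughput when one packet of 'packet_size' bytes
// is transferred every 'interval_us' microseconds.
uint32_t usb_bytes_per_second(uint32_t packet_size, uint32_t interval_us)
{
  return packet_size * (1000000u / interval_us);
}
```

usb_bytes_per_second(64, 1000) gives 64000 for FS HID, while usb_bytes_per_second(1024, 125) gives 8192000 under the HS assumption above. Since CMSIS-DAP is request-response, the usable rate is lower in both cases.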
Currently I have an 8-slot ring buffer for the incoming requests. The USB interrupt copies the request into the ring buffer, then the main loop, now released from WFI, goes in and executes things, and finally another USB interrupt causes the results to be picked up. Maybe I need to get rid of this ring buffer and just respond to things in the USB interrupt? Or at least not rely on the ring buffer and USB interrupt for outgoing data?
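For anyone following along, the arrangement described above looks roughly like the sketch below: a single-producer, single-consumer ring where the USB interrupt only calls ring_put() and the main loop only calls ring_get(). This is a generic illustration with hypothetical names, not the actual firmware, and it assumes a single core where 32-bit index loads and stores are atomic.

```c
#include <stdint.h>
#include <string.h>

#define RING_SLOTS   8   // must be a power of two for the index math below
#define PACKET_SIZE  64

typedef struct
{
  uint8_t data[RING_SLOTS][PACKET_SIZE];
  volatile uint32_t head; // advanced only by the producer (USB interrupt)
  volatile uint32_t tail; // advanced only by the consumer (main loop)
} ring_t;

// Producer side. Returns 1 on success, 0 if the ring is full.
int ring_put(ring_t *r, const uint8_t *packet)
{
  if (r->head - r->tail == RING_SLOTS)
    return 0;

  memcpy(r->data[r->head % RING_SLOTS], packet, PACKET_SIZE);
  r->head++; // publish the slot only after the copy is complete

  return 1;
}

// Consumer side. Returns 1 on success, 0 if the ring is empty.
int ring_get(ring_t *r, uint8_t *packet)
{
  if (r->head == r->tail)
    return 0;

  memcpy(packet, r->data[r->tail % RING_SLOTS], PACKET_SIZE);
  r->tail++; // release the slot only after the copy is complete

  return 1;
}
```

The ring itself is rarely the problem. What matters is whether more than one request can ever be pending, and that depends on what the firmware reports to the host.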
I wonder if using USB HS/SS + CMSIS-DAP v2 Bulk mode + FPGA with hard MCU cores + implement dap_swd_transfer and dap_jtag_transfer using HDL = some really REALLY fast JTAG and SWD action?
CMSIS-DAP is a request-response protocol. There will be nothing in the buffer after the first packet until you respond to it. CMSIS-DAP has the capability to buffer multiple requests, but you need to advertise that capability explicitly through the DAP_INFO_PACKET_COUNT response.
Most of the time tools just send one request and wait for the response, since this is the most compatible configuration.
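A minimal sketch of how the firmware advertises this is below. DAP_Info is command 0x00, with info ID 0xFE for the packet count (one byte) and 0xFF for the packet size (two bytes, little-endian); the response is the command byte, a length byte, then the data. The function name and the FW_* values are placeholders for illustration.

```c
#include <stdint.h>

#define ID_DAP_INFO            0x00
#define DAP_INFO_PACKET_COUNT  0xfe
#define DAP_INFO_PACKET_SIZE   0xff

// Placeholder capabilities: one request in flight, 64-byte packets.
#define FW_PACKET_COUNT  1
#define FW_PACKET_SIZE   64

// Handle the two buffer-related DAP_Info queries.
// req[0] = 0x00, req[1] = info ID. Returns the response length in bytes.
int dap_info(const uint8_t *req, uint8_t *resp)
{
  resp[0] = ID_DAP_INFO;

  switch (req[1])
  {
    case DAP_INFO_PACKET_COUNT:
      resp[1] = 1; // payload length
      resp[2] = FW_PACKET_COUNT;
      return 3;

    case DAP_INFO_PACKET_SIZE:
      resp[1] = 2; // payload length
      resp[2] = FW_PACKET_SIZE & 0xff;
      resp[3] = FW_PACKET_SIZE >> 8;
      return 4;

    default:
      resp[1] = 0; // no information available for this ID
      return 2;
  }
}
```

Reporting a packet count of 1 is the conservative choice: the host then never sends a second request before it has read the response to the first.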
It is hard to tell what may be wrong without debugging the code. Sometimes APIs require the first byte (the command ID) to be removed from the input buffer, and sometimes they need it kept. So this is one thing to check.
Again, run my tool and see if it can at least identify your debugger and print information about it, since OpenOCD does not seem to do that.
Not really. In practice SWD ports on many parts are limited to some pretty low clock, like 16 MHz or so. And this you can pretty well bit-bang from a fast MCU without FPGA.
SS is definitely overkill. HS does improve things a lot, mostly due to the ability to send 512 or 1024 byte packets.
Also, there are MCUs with hardware SWD/JTAG drivers, although they are not well documented. Nuvoton M480 series devices have USB HS and an SWD peripheral, but Nuvoton does not publish any documents about it. I'm sure it is possible to figure it out with some effort. I never really cared to do that, since optimized bit-banging is plenty fast.
The built-in speed limit is a bummer... I wonder if an advanced MCU/MPU can allow fast SWD/JTAG at ~100MHz speed after the main core itself is clocked up.
I think Nuvoton used that chip in their Nu-Link debug pods. Thus IMO that hardware is pretty much internal use only and the only reason they kept it in the publicly available chips undocumented is just so they don't have to create new masks.
Faster MCUs/CPUs will probably have a higher limit on the clock. At the same time, an actual debug session does not consume a lot of bandwidth. As long as you are not dumping massive arrays of data, the debugging part is pretty efficient.
One thing that is irritating - why did they mention SWD host in the pin multiplexing table? But that's just a general documentation issue. Their documentation leaves a lot to be desired.
$ ./edbg -lr
Attached debuggers:
63C8AF92 - SushiBits Innovative SushiBits One with CMSIS-DAP
static void target_select(target_options_t *options)
{
  uint32_t chip_id, chip_exid;

  dap_reset_target_hw(1);
  dap_reset_link();

  // Stop the core
  dap_write_word(DHCSR, DHCSR_DBGKEY | DHCSR_DEBUGEN | DHCSR_HALT);
  dap_write_word(DEMCR, DEMCR_VC_CORERESET);
  dap_write_word(AIRCR, AIRCR_VECTKEY | AIRCR_SYSRESETREQ);

  uint32_t id = dap_read_word(0xe0042000);

  verbose("ID = 0x%08x\n", id);
  verbose("--- done ---\n");

  exit(1);
}
Debugger: Alex Taradov Generic CMSIS-DAP Adapter F4F56A90 v0.5 (SJ)
Clock frequency: 16.0 MHz
ID = 0x10016438
--- done ---
Info : CMSIS-DAP: SWD Supported
Info : CMSIS-DAP: JTAG Supported
Info : CMSIS-DAP: FW Version = v0.5
Info : CMSIS-DAP: Serial# = F4F56A90
Info : CMSIS-DAP: Interface Initialised (JTAG)
Info : SWCLK/TCK = 1 SWDIO/TMS = 1 TDI = 1 TDO = 1 nTRST = 0 nRESET = 1
Info : CMSIS-DAP: Interface ready
Info : clock speed 1000 kHz
Info : cmsis-dap JTAG TLR_RESET
Info : cmsis-dap JTAG TLR_RESET
Info : JTAG tap: stm32f3x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd.), part: 0xba00, ver: 0x4)
Info : JTAG tap: stm32f3x.bs tap/device found: 0x06438041 (mfg: 0x020 (STMicroelectronics), part: 0x6438, ver: 0x0)
Info : stm32f3x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f3x.cpu on 3333
Info : Listening on port 3333 for gdb connections
{name stm32f1x base 0 size 0 bus_width 0 chip_width 0}
0xe0042000: 10016438
Info : 134 3881 cmsis_dap.c:788 cmsis_dap_get_caps_info(): CMSIS-DAP: SWD Supported
Info : 135 3881 cmsis_dap.c:790 cmsis_dap_get_caps_info(): CMSIS-DAP: JTAG Supported
Info : 136 3882 cmsis_dap.c:768 cmsis_dap_get_version_info(): CMSIS-DAP: FW Version = 1.0
Info : 137 3886 cmsis_dap.c:881 cmsis_dap_swd_open(): CMSIS-DAP: Interface Initialised (SWD)
Debug: 138 3888 cmsis_dap.c:953 cmsis_dap_init(): CMSIS-DAP: Packet Size = 64
Debug: 139 3890 cmsis_dap.c:966 cmsis_dap_init(): CMSIS-DAP: Packet Count = 8
Debug: 140 3890 cmsis_dap.c:969 cmsis_dap_init(): Allocating FIFO for 3 pending packets
Info : 141 3892 cmsis_dap.c:809 cmsis_dap_get_status(): SWCLK/TCK = 1 SWDIO/TMS = 1 TDI = 1 TDO = 1 nTRST = 1 nRESET = 1
Info : 142 3902 cmsis_dap.c:1023 cmsis_dap_init(): CMSIS-DAP: Interface ready
$ ./edbg -b -t samv71 -r -f temp.bin
Error: invalid response received
Error: invalid response received
int dbg_dap_cmd(uint8_t *data, int resp_size, int req_size)
{
  uint8_t cmd = data[0];
  int res;

  verbose("---\n");
  for (int i = 0; i < 16; i++)
    verbose("0x%02x ", data[i]);
  verbose("\n");

  memset(hid_buffer, 0xff, report_size + 1);

  hid_buffer[0] = 0x00; // Report ID
  memcpy(&hid_buffer[1], data, req_size);

  res = write(debugger_fd, hid_buffer, report_size + 1);
  if (res < 0)
    perror_exit("debugger write()");

  res = read(debugger_fd, hid_buffer, report_size + 1);
  if (res < 0)
    perror_exit("debugger read()");

  check(res, "empty response received");

  for (int i = 0; i < 16; i++)
    verbose("0x%02x ", hid_buffer[i]);
  verbose("\n");

  check(hid_buffer[0] == cmd, "invalid response received");

  res--;
  memcpy(data, &hid_buffer[1], (resp_size < res) ? resp_size : res);

  return res;
}
Unfortunately, OpenOCD just uses JTAG_SEQUENCE for everything, including the ARM-specific stuff. So implementing the code for the JTAG transfers would be hard, as there is no software to test it with.
Provide the complete output and we'll see what exactly fails, since it seems to work at least somewhat.
$ ./edbg -b -t samv71 -r -f temp.bin
---
0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0xf0 0x4f 0x00 0xe9 0xa4 0x7f 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
---
0x00 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xff 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
---
0x01 0x00 0x00 0xf0 0x09 0xb7 0xeb 0xfe 0x7f 0x00 0x00 0xb1 0x16 0x09 0x04 0x01
0xff 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
---
0x03 0xf0 0x09 0xb7 0xeb 0xfe 0x7f 0x00 0x00 0xb6 0x16 0x09 0x04 0x01 0x00 0x00
0xff 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Error: invalid response received