Well, I'm running into the first problem. I'm using a CD74HCT393 to generate an incrementing 8-bit pattern. Given that the counter is slow, I also slowed the system clock down to 8 MHz using the lowest possible PLL output and the highest possible PLL divider. I kept all other bus prescalers the same, so the overall timing relationships did not change.
As expected, I get a 4 MHz clock on the timer output. With this clock the DMA struggles to receive the data, and some values are just missing. Here is an example capture:
01 03 04 07 09 0a 0d 0f 10 13 15 16 19 1b 1c 1f 21 22 25 27 28 2b 2d 2e 31 33 34 37 39
This is not a counter problem. I verified that by switching the clock back, toggling the clock pin manually, and capturing the data. In this mode the capture works reliably up to 6.4 MHz. At that point propagation delays within the counter start to be too big, and the MSB gets corrupted. So 6.4 MHz is a realistic maximum clock for the CD74HCT393 in 8-bit mode.
4 MHz is well below that. And the failure does not look like a bit sampling issue; it looks more like a capture or bandwidth issue.
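To put a number on "some values are just missing", here is a rough sketch that counts the skipped counter values in a capture buffer. The helper name count_missing and the array names are made up for illustration; the data is just the two captures from this post. Differences are taken modulo 256, since the counter is 8 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* First capture from this post (timer output at 4 MHz). */
static const uint8_t capture_4mhz[] = {
    0x01, 0x03, 0x04, 0x07, 0x09, 0x0a, 0x0d, 0x0f, 0x10, 0x13,
    0x15, 0x16, 0x19, 0x1b, 0x1c, 0x1f, 0x21, 0x22, 0x25, 0x27,
    0x28, 0x2b, 0x2d, 0x2e, 0x31, 0x33, 0x34, 0x37, 0x39
};

/* Second capture from this post (timer output at 8/3 = 2.667 MHz). */
static const uint8_t capture_2667khz[] = {
    0x18, 0x1a, 0x1c, 0x1e, 0x20, 0x22, 0x24, 0x26, 0x28, 0x2a,
    0x2c, 0x2e, 0x30, 0x32, 0x34, 0x36, 0x38, 0x3a, 0x3c, 0x3e,
    0x40, 0x42, 0x44, 0x46, 0x48, 0x4a, 0x4c, 0x4e, 0x50
};

/* Hypothetical helper: count counter values skipped between
   consecutive samples. The subtraction wraps modulo 256, so a
   rollover from 0xff to 0x00 counts as a normal step of 1. */
static unsigned count_missing(const uint8_t *buf, size_t len)
{
    unsigned missing = 0;

    for (size_t i = 1; i < len; i++) {
        uint8_t step = (uint8_t)(buf[i] - buf[i - 1]);

        if (step > 1)
            missing += step - 1u;  /* a step of N skips N-1 values */
    }

    return missing;
}
```

Called as count_missing(capture_4mhz, sizeof capture_4mhz), this gives 28 skipped values for 29 captured samples, and the same count for the 2.667 MHz capture. So in both cases roughly every second counter value is lost. In the 4 MHz capture the steps repeat as 2, 1, 3, while in the 2.667 MHz capture every step is 2.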
So this is the first problem to be solved.
EDIT 1: BTW, a pin write on this device is truly one cycle. The following code outputs 4 MHz with the CPU running at 8 MHz.
while (1)
{
    HAL_GPIO_CLK_A_set();
    HAL_GPIO_CLK_A_clr();
    /* ......... same stuff many times */
    HAL_GPIO_CLK_A_set();
    HAL_GPIO_CLK_A_clr();
}
EDIT 2: And if I divide the timer output by 3 to get 8/3 = 2.667 MHz, every other sample is missing:
18 1a 1c 1e 20 22 24 26 28 2a 2c 2e 30 32 34 36 38 3a 3c 3e 40 42 44 46 48 4a 4c 4e 50
Only dividing by 4 produces the correct result.
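For reference, here is the arithmetic behind the three cases as a small sketch. It assumes the timer output is simply the 8 MHz system clock divided by an integer, which matches the 8/3 = 2.667 MHz figure above; the function and macro names are made up for illustration.

```c
#include <stdint.h>

#define SYSCLK_HZ 8000000u  /* system clock used in this post */

/* Assumption: timer output = system clock / integer divider.
   Integer division truncates, so divider 3 gives 2666666 Hz. */
static uint32_t timer_out_hz(uint32_t divider)
{
    return SYSCLK_HZ / divider;
}

/* System clock cycles available between two sample clock edges.
   Under the assumption above this is just the divider itself. */
static uint32_t cycles_per_sample(uint32_t divider)
{
    return divider;
}
```

With divider 2 (4 MHz) there are 2 system clock cycles per sample and values go missing, with divider 3 (2.667 MHz) there are 3 cycles and every other sample is missing, and with divider 4 (2 MHz) there are 4 cycles and the capture is correct. That is only what these three tests show; it does not say which part of the transfer path is the limit.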
So it looks like something else needs to be overclocked as well.