I'm new, so the following will likely be wrong in many ways.
I see a lot going on here, so I want to try to unroll as much as needed to satisfy the itch in the back of my brain.
Yes, there is a clock domain crossing. It's easy to spot and needs fixing, but it does nothing to explain the bit errors.
I think the problem is more complex. The pattern after 0x200000 has my full attention; it's a red flag for a timing violation.
```verilog
reg [63:0] data;
// ...
assign miso = data[bit_to_send];
```
I think this line is basically guaranteed to cause a timing violation, because a variable bit-select like this is slow in hardware.
I also think the violation won't show up in any timing report: the destination of this signal is an output pin, not a flip-flop, so unless `miso` has an output-delay constraint, the timing analyzer has no requirement to check it against.
When I see this, I see a 64:1 multiplexer being inferred.
A quick search finds this datasheet, which says your part has no dedicated mux hardware, only 4-input LUTs.
A 4-input LUT can implement a 2:1 mux, and if you connect 63 of them into a binary tree six levels deep, you get the 64:1 mux needed.
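For reference, here is a hand-written sketch of roughly what `data[bit_to_send]` turns into: a binary tree of 2:1 muxes, 32 + 16 + 8 + 4 + 2 + 1 = 63 of them, six levels deep. The module and port names (`mux64_tree`, `sel`, `out`) are made up for illustration; the synthesizer builds its own equivalent, with each 2:1 mux costing about one 4-input LUT.

```verilog
// Hypothetical explicit equivalent of `assign miso = data[bit_to_send];`
// built as a 6-level binary tree of 2:1 muxes (63 muxes in total).
module mux64_tree (
    input  wire [63:0] data,
    input  wire [5:0]  sel,   // plays the role of bit_to_send
    output wire        out
);
    // Heap layout: node[1] is the root, node[k] has children node[2k] and
    // node[2k+1], and the leaves node[64..127] are data[0..63].
    wire [127:1] node;
    assign node[127:64] = data;

    genvar d, j;
    generate
        for (d = 0; d < 6; d = d + 1) begin : level
            for (j = 0; j < (1 << d); j = j + 1) begin : mux
                // Node index is 2^d + j; level d is steered by sel[5-d],
                // so the root uses sel[5] and the leaf level uses sel[0].
                assign node[(1 << d) + j] = sel[5 - d] ? node[2 * ((1 << d) + j) + 1]
                                                       : node[2 * ((1 << d) + j)];
            end
        end
    endgenerate

    assign out = node[1];
endmodule
```

A path from `sel[0]` to `out` passes through all six levels, which is where the six LUT delays in the critical path come from.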
This is only part of the problem.
I can't see your top module, clocks, constraints, or simulation, so I'm going to make a few generic assumptions.
First, that your logic is good and your simulations are perfect.
I'm not sure what SPI mode 0 is off the top of my head; I think it samples on the rising edge. But put that aside for now and look at the triple O's (order of operations).
```verilog
always @(negedge clk or posedge cs) begin
    if (cs) begin
        // ...
    end else begin
        bit_to_send <= bit_to_send - 1;
    end
end
```
On the falling edge of the clock, `bit_to_send` is updated.
The new `bit_to_send` then begins propagating through the 64:1 mux.
So if this is the critical path, it starts at the `bit_to_send` flip-flops (they are what changes on the falling edge) and will look a lot like:
`bit_to_send > net > lut0 > net > lut1 > net > lut2 > net > lut3 > net > lut4 > net > lut5 > net > miso`
Now the spec. analog.com defines SPI mode 0 as:

| Clock polarity in idle state | Clock phase used to sample and/or shift the data |
|---|---|
| Logic low | Data sampled on rising edge and shifted out on the falling edge |
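OK, so we have the triple O's and the spec. What doesn't match?
The order of operations says that on the falling edge we update `bit_to_send` and shift out the data, but with a huge delay: the new bit only reaches `miso` after rippling through every level of the mux, and in mode 0 it has to be stable before the master samples it on the next rising edge, half an SCK period later.
So I think the root of the error is the time it takes to propagate through the 64:1 mux.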
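As a rough budget, assuming a 50% duty-cycle SCK: the bit launched on a falling edge must settle at the master within half a period. The symbols are my own labels, not datasheet values: $t_{co}$ is the clock-to-out of the flip-flop that starts the path, $t_{LUT}$ the delay per LUT, $t_{net}$ the delay per routing hop (7 of them in the path above), $t_{out}$ the output buffer/pad delay, and $t_{su,master}$ the master's setup time; board delay is ignored.

$$
t_{co} + 6\,t_{LUT} + 7\,t_{net} + t_{out} + t_{su,master} \le \frac{T_{SCK}}{2}
\quad\Longrightarrow\quad
f_{SCK} \le \frac{1}{2\left(t_{co} + 6\,t_{LUT} + 7\,t_{net} + t_{out} + t_{su,master}\right)}
$$

Replacing the mux tree with a shift register removes the $6\,t_{LUT} + 6\,t_{net}$ part of the left-hand side.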
Quick experiment: register the output and trade the mux for a shift register.
You will have to figure out some new always blocks to handle all of this, since you're only assigning `data` at the falling edge of `cs`.
Maybe something like:
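```verilog
assign miso = data[63];   // MSB comes straight from a flip-flop
// ...
always @(negedge cs or negedge clk) begin
    if ( /*cs edge detector magic*/ ) begin
        data <= value;        // load the word to send
    end else begin
        data <= data << 1;    // move the next bit into data[63]
    end
end
```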
That should remove the 64:1 mux and replace it with a 2:1 mux in front of `data`:
input 0 would be the `value` port and input 1 would be `(data << 1)`.
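Here is one way the "edge detector magic" could be filled in, as a sketch only. It swaps in a different trick from the one above: instead of detecting the `cs` edge, a hypothetical `first` flag is set asynchronously (to a constant) while `cs` is high and cleared by the first falling `clk` edge. The module name `spi_mode0_tx` and the assumptions that `cs` is active low, bits go out MSB first, and `value` is stable from before `cs` falls until the first falling edge are mine, not from your design. It does nothing about the clock domain crossing.

```verilog
// Hypothetical mode-0 SPI transmit path: a shift register instead of the 64:1 mux.
// Assumptions: cs is active low, clk (SCK) idles low, MSB first, and `value`
// is stable from before cs falls until the first falling clk edge.
module spi_mode0_tx (
    input  wire        clk,    // SPI clock (SCK)
    input  wire        cs,     // chip select, active low
    input  wire [63:0] value,  // word to send
    output wire        miso
);
    reg [63:0] data;   // shift register; data[63] is the bit on the wire
    reg        first;  // 1 until the first falling clk edge of a transfer

    // One 2:1 mux on the output: before the first falling edge the MSB comes
    // from `value`, afterwards straight from the data[63] flip-flop.
    assign miso = first ? value[63] : data[63];

    // Set to a constant while cs is high (plain async set); cleared by the
    // first falling clk edge once cs is low.
    always @(negedge clk or posedge cs) begin
        if (cs)
            first <= 1'b1;
        else
            first <= 1'b0;
    end

    // Shift on every falling edge. On the first edge, load `value` already
    // shifted by one, because value[63] went out before that edge.
    always @(negedge clk) begin
        if (first)
            data <= {value[62:0], 1'b0};
        else
            data <= {data[62:0], 1'b0};
    end
endmodule
```

The `first` flag is there because in mode 0 the master samples the MSB on the very first rising edge, before the slave has seen any falling edge it could load on. The load itself is then the 2:1 mux (`value` vs. shifted `data`) described above, with no variable bit-select anywhere.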