Author Topic: FPGA Memory arbitration (Read 4121 times)

kd5pev · « **on:** May 12, 2013, 12:38:38 am »

Hello.
I am working on a project where I need to interface a microcontroller (AT90USB1286) with an FPGA (Spartan-6 LX9).

I currently have the AVR connected to the FPGA via the AVR's XMEM interface.
Inside the FPGA, this is connected to a dual-port 65536x8 block RAM (64 KByte).

Now, I want the majority of this RAM to be left to the AVR for its own purposes -- heap, stack, etc.
But, I want the FPGA to be more than a glorified SRAM chip; I want there to be a few memory mapped peripherals made from FPGA fabric.

Attached to this post is a general picture of what I want to do: ~48Kbyte of RAM for the AVR, and then 4 peripherals of 4Kbyte each.
My question is: how do I arbitrate access to the shared RAM to the peripherals?

Rufus · « **Reply #1 on:** May 12, 2013, 01:09:50 am »

Quote from: kd5pev on May 12, 2013, 12:38:38 am

My question is: how do I arbitrate access to the shared RAM to the peripherals?

I don't think "arbitrate" is the right word, but then it's not exactly clear what you are trying to do.

For what I think you are trying to do, you just decode the AVR address bus, providing a select signal for the RAM for addresses up to 48k and select signals for the 4k blocks above that. The select signals gate writes to the associated RAM or block, and gate data onto the AVR data bus for reads.
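As a rough illustration of that decode, here is a small behavioural model in C (not HDL). The exact map is an assumption: RAM at 0x0000..0xBFFF and the four 4k windows at 0xC000, 0xD000, 0xE000 and 0xF000. The names `decode`, `bus_write`, `bus_read` and `periph_reg` are made up for the sketch, and the peripherals are stood in for by plain byte arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed memory map: 48 KB of RAM, then four 4 KB peripheral windows. */
#define RAM_SIZE    0xC000u   /* 0x0000..0xBFFF */
#define PERIPH_SIZE 0x1000u   /* 4 KB per window */
#define NUM_PERIPH  4

typedef enum { SEL_RAM = 0, SEL_P1, SEL_P2, SEL_P3, SEL_P4 } select_t;

static uint8_t ram[RAM_SIZE];
static uint8_t periph_reg[NUM_PERIPH][PERIPH_SIZE]; /* stand-ins for peripheral registers */

/* Decode a 16-bit bus address into exactly one select line. */
select_t decode(uint16_t addr)
{
    if (addr < RAM_SIZE)
        return SEL_RAM;
    /* 0xC000 -> P1, 0xD000 -> P2, 0xE000 -> P3, 0xF000 -> P4 */
    return (select_t)(SEL_P1 + (addr - RAM_SIZE) / PERIPH_SIZE);
}

/* The select gates the write: only the selected target sees it. */
void bus_write(uint16_t addr, uint8_t data)
{
    select_t sel = decode(addr);
    if (sel == SEL_RAM) {
        ram[addr] = data;
    } else {
        assert(sel >= SEL_P1 && sel <= SEL_P4);
        periph_reg[sel - SEL_P1][addr & (PERIPH_SIZE - 1u)] = data;
    }
}

/* The select also picks which source drives the data bus on a read. */
uint8_t bus_read(uint16_t addr)
{
    select_t sel = decode(addr);
    if (sel == SEL_RAM)
        return ram[addr];
    return periph_reg[sel - SEL_P1][addr & (PERIPH_SIZE - 1u)];
}
```

In fabric the same thing comes down to comparing the upper address bits: with this assumed map, the top two bits being 11 means "peripheral", and the next two bits pick which one.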

kd5pev · « **Reply #2 on:** May 12, 2013, 01:18:42 am »

Thank you.
I think I was trying to make it more complex than necessary.
I tend to do that.

What I was thinking of doing was:

Code: [Select]

 
AVR <-> RAM <-> Black box <-> Peripheral 1 
                          <-> Peripheral 2
                          <-> Peripheral 3
                          <-> Peripheral 4

What I got from your post was this:

Code: [Select]

AVR <-> Address decoder <-> RAM
                        <-> Peripheral 1
                        <-> Peripheral 2
                        <-> Peripheral 3
                        <-> Peripheral 4

Is that a correct interpretation?
If so, I think an address decoder would be much simpler to build than what I was originally going after.

Rufus · « **Reply #3 on:** May 12, 2013, 02:08:00 am »

Quote from: kd5pev on May 12, 2013, 01:18:42 am

Code: [Select]

AVR <-> Address decoder <-> RAM
                        <-> Peripheral 1
                        <-> Peripheral 2
                        <-> Peripheral 3
                        <-> Peripheral 4

Is that a correct interpretation?
If so, I think an address decoder would be much simpler to build than what I was originally going after.

In simplistic schematic terms yes. It is just the way microprocessors have always managed memory and peripherals on external busses.

kd5pev · « **Reply #4 on:** May 12, 2013, 03:38:22 am »

Quote from: Rufus on May 12, 2013, 02:08:00 am

In simplistic schematic terms yes. It is just the way microprocessors have always managed memory and peripherals on external busses.

Okay, I think I understand that now.

Now this is more of a hypothetical question at this point:
How do I handle two (or more) components that want access to the same RAM module?

Code: [Select]

Component A <-> |
                 <=> [ ?? ] <=> RAM
Component B <-> |

marshallh · « **Reply #5 on:** May 12, 2013, 04:57:55 am »

Arbiter is the correct term. I did a design that had 4 separate DDR controllers and 6 client modules that wanted to use them (only 2 had global access).

first thing is to have combinational muxes for the inputs to the actual ddr controller proper (this is just part of the arbiter)

here is the set of muxes for the outputs to the client module

and finally a very simple FSM for each controller that handles requests in a prioritized manner

All of these use a block-based approach. You will also notice there are synchronizers, since each client was in a separate clock domain. The CL device is at the top of the chain, so it gets first priority, being the most timing critical.

This is probably more info than you wanted, but there you go. You have X physical resource and you need a system for acknowledging requests (whether it's for 8 bits, a 128-bit word, or a block transfer), then signaling completion, and giving priority to some requests over others.

Something this design lacks is pre-emption of transfers. In this case the granularity of transfers was small enough (512 bytes) that this could be overlooked. But if everything shares a single RAM and you have something that MUST have data NOW, you can extend the FSM to save the transfer state and hang up the current client while the most important one cleans up.
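To make the request/grant/completion idea concrete, here is a minimal behavioural sketch in C of a fixed-priority arbiter without pre-emption. It is not the design described above: the client count, the `req` bitmask, the `done` flag and the names `arbiter_t` and `arbiter_step` are all assumptions for illustration, with client 0 taken as the highest priority.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CLIENTS 4      /* hypothetical client count; client 0 has top priority */
#define NO_GRANT    (-1)

typedef struct {
    int owner;             /* client currently holding the RAM, or NO_GRANT */
} arbiter_t;

void arbiter_init(arbiter_t *a)
{
    assert(a != 0);
    a->owner = NO_GRANT;
}

/* One clock tick of the arbiter.
 * req:  bit i set means client i is requesting the RAM.
 * done: nonzero when the current owner has finished its transfer.
 * Returns the client that holds the grant after this tick, or NO_GRANT. */
int arbiter_step(arbiter_t *a, uint8_t req, int done)
{
    assert(a != 0);

    /* Release the RAM once the owner signals completion. */
    if (a->owner != NO_GRANT && done)
        a->owner = NO_GRANT;

    /* No pre-emption: a new grant is only issued while the RAM is idle. */
    if (a->owner == NO_GRANT) {
        for (int i = 0; i < NUM_CLIENTS; i++) {
            if (req & (1u << i)) {   /* lowest index wins */
                a->owner = i;
                break;
            }
        }
    }
    return a->owner;
}
```

With this model, a high-priority request that arrives mid-transfer waits until the current owner reports `done`, which is the behaviour that small transfer granularity makes tolerable. Adding pre-emption would mean saving the interrupted client's transfer state inside `arbiter_t`.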

free_electron · « **Reply #6 on:** May 12, 2013, 02:06:38 pm »

You can do that perfectly fine. I use it all the time. Just instantiate dual port ram.

Port 1 goes to your processor. Port 2 goes to your peripherals. The drawback is that you need to make a 'trap', meaning a detector that sees that you just touched something in the top 4k and tells the peripherals there is new data.

I use dual-port (and sometimes 3-port) memory as a passgate.

There are 2 blocks of 1 kbyte of DP RAM. The CPU has read/write on block 1 and read-only on block 2.
The PC (through a USB controller) has read/write on block 2 and read-only on block 1.

So the PC can see what the CPU is doing, and the CPU can see what the PC is doing.

Let's say you want to make a 'coprocessor'

Byte 1 and byte 2 of block 2 are data in (PC can read/write, CPU can read).
Byte 1 and byte 2 of block 1 are data out (PC can read, CPU can read/write).
Byte 1024 of block 2 is the instruction (PC can read/write).
Byte 1024 of block 1 is the status (PC can read).

There is a 'trap' on both sides that detects a write to the top byte and raises an interrupt.

So the CPU code looks like this:

pragma location 1    volatile byte d0
pragma location 2    volatile byte d1
pragma location 1024 volatile byte opcode
pragma location 1025 int result

void interrupthandler(void) handles int1
  case opcode
    0: result = d0 + d1
    1: result = d0 - d1
    2: result = d0 * d1
  endcase

And so on.
The PC writes data into d0, d1 and opcode. The trap detects the write to opcode and kicks the interrupt pin of the CPU. The CPU calculates in place. The writing of the result triggers the return trap, kicking the interrupt to the PC.
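The whole handshake can be modelled on a desktop in plain C, which may help when checking the mapping before committing it to hardware. This is only a sketch under assumptions: offsets are zero-based (operands at 0 and 1, opcode/status in the last byte of each 1 kbyte block), the result is returned as two bytes in block 1, and the return trap fires on the status write. The names `pc_write`, `cpu_write`, `cpu_irq` and `pc_irq` are invented for the model.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 1024u
#define OFF_D0     0u                  /* assumed zero-based offsets */
#define OFF_D1     1u
#define OFF_TOP    (BLOCK_SIZE - 1u)   /* opcode in block 2, status in block 1 */

static uint8_t block1[BLOCK_SIZE];     /* CPU read/write, PC read-only */
static uint8_t block2[BLOCK_SIZE];     /* PC read/write, CPU read-only */
static int cpu_irq;                    /* raised by the trap on block 2 */
static int pc_irq;                     /* raised by the trap on block 1 */

/* PC-side write port: the trap fires when the top byte is written. */
void pc_write(uint16_t off, uint8_t v)
{
    assert(off < BLOCK_SIZE);
    block2[off] = v;
    if (off == OFF_TOP)
        cpu_irq = 1;
}

/* CPU-side write port with the matching return trap. */
void cpu_write(uint16_t off, uint8_t v)
{
    assert(off < BLOCK_SIZE);
    block1[off] = v;
    if (off == OFF_TOP)
        pc_irq = 1;
}

/* CPU interrupt handler: read operands in place, write the result in place. */
void cpu_interrupt_handler(void)
{
    uint8_t d0 = block2[OFF_D0], d1 = block2[OFF_D1];
    uint16_t result = 0;

    switch (block2[OFF_TOP]) {
    case 0: result = (uint16_t)(d0 + d1); break;
    case 1: result = (uint16_t)(d0 - d1); break;
    case 2: result = (uint16_t)(d0 * d1); break;
    default: break;                    /* unknown opcode: result stays 0 */
    }
    cpu_irq = 0;                       /* acknowledge the request */
    cpu_write(OFF_D0, (uint8_t)(result & 0xFFu));  /* low byte */
    cpu_write(OFF_D1, (uint8_t)(result >> 8));     /* high byte */
    cpu_write(OFF_TOP, 1);             /* status write trips the return trap */
}
```

A typical exchange in this model is: the PC calls `pc_write` for both operands and then for the opcode, the model's `cpu_irq` goes high, `cpu_interrupt_handler` runs, and the PC reads the two result bytes from block 1 once `pc_irq` is set.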

All data is written directly in place. No need for I/O routines, no need for moving data, no need for printf, scanf or any other time-consuming stuff. It's even faster than DMA (during DMA most CPU cores are stalled as the DMA controller blocks the bus to do the transport; some DMA controllers have segmented buses, but those are an exception in small-microcontroller land).

I made a piece of software where I define incoming and outgoing data, and it produces the header files for the CPU and PC code with the correct mapping and variable names.

Works like a charm. I do this three-way: PC, ARM, FPGA logic.

