I was thinking about including a softcore (this one specifically)
That guy is sick with a terminal case of NIH syndrome (I've had some run-ins with him on other forums where he heavily promotes his stuff), so I would recommend going with something more established, like RISC-V. That way you can use the gcc/binutils toolchain to write software in C instead of using some custom stuff. You can find some really small cores for RV32I, or you can even implement your own - it's very simple, because the base instruction set only contains around 40 instructions.
I came across the linked project above which seems to support FAT16/32 and up to SDHC v2 cards.
Judging by the amount of Chinese on the project's home page, you are probably going to have to dig on your own, with little to no docs, while integrating it into your project.
FAT16 is obsolete now and I don't think it's even used anywhere these days, so I wouldn't waste my time supporting it. But FAT32 and exFAT are definitely worth supporting, even if there is a clear trend for the latter to largely replace the former. That said, since you control what kind of card is to be used, you can go either way, or even limit support to just a single one if you wish.
FAT systems are fairly simple. If you understand how linked lists work, you will feel right at home, as a FAT is just a bunch of linked lists.
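To make the linked-list point concrete, here is a minimal sketch of following one file's cluster chain through a FAT32-style table. The table contents and the `cluster_chain` helper are made up for illustration; only the end-of-chain marker range and the 28-bit entry mask come from the FAT32 format itself.

```python
# Toy FAT: index = cluster number, value = next cluster in the chain.
# Values of 0x0FFFFFF8 and above mark end-of-chain in FAT32.
FAT32_EOC_MIN = 0x0FFFFFF8
FAT32_MASK = 0x0FFFFFFF  # the top 4 bits of each FAT32 entry are reserved


def cluster_chain(fat, first_cluster):
    """Follow the linked list starting at first_cluster; return every cluster visited."""
    chain = []
    cluster = first_cluster
    while cluster < FAT32_EOC_MIN:
        # Clusters 0 and 1 are reserved; a chain longer than the table must be a loop.
        if cluster < 2 or cluster >= len(fat) or len(chain) > len(fat):
            raise ValueError("corrupt FAT chain")
        chain.append(cluster)
        cluster = fat[cluster] & FAT32_MASK
    return chain


# Hypothetical 10-entry table: one file occupying clusters 2 -> 3 -> 7 -> 5.
fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 7, 0, 0x0FFFFFFF, 0, 5, 0, 0]
print(cluster_chain(fat, 2))
```

Reading a file is then a matter of reading each cluster in the returned order; the directory entry supplies the first cluster number.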
FAT32 will go up to 4GB,
Linked to another recent thread... if you want to support FAT, then I suggest just using FatFs. It's become fairly good and it just works. It's written in portable C. http://elm-chan.org/fsw/ff/00index_e.html
Of course you'll still have to implement the low-level access to SD cards. I have the SD spec (been working with it lately) and, as far as I've read, access in SPI mode is required to be available on all card types except SDUC cards. SDUC cards are those with a capacity higher than 2TB - I doubt you'll be using them for a system such as this one, especially given their current price. But be aware that SPI mode only gives access to a small subset of the SD commands, and while you can absolutely use it to read and write the card and implement a file system, it's pretty inefficient. The end result would be pretty slow (although, compared to typical storage available at the time for 8-bit computers, that'd still be pretty "fast").
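For a feel of what the low-level side involves, here is a rough sketch of building the 6-byte command frame that SD cards accept (the same frame layout is used in SPI mode). The helper names `crc7` and `sd_command` are made up for this example; the frame layout and the CRC-7 polynomial are from the SD specification as I understand it.

```python
def crc7(data):
    """CRC-7 with polynomial x^7 + x^3 + 1, as used on SD command frames."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):  # feed bits MSB first
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ ((byte >> bit) & 1):
                crc ^= 0x09
    return crc


def sd_command(index, argument):
    """Build the frame: 0b01 + 6-bit command index, 32-bit argument, CRC7 + stop bit."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


print(sd_command(0, 0).hex())      # CMD0, GO_IDLE_STATE
print(sd_command(8, 0x1AA).hex())  # CMD8, SEND_IF_COND
```

CMD0 comes out as 40 00 00 00 00 95, which is the frame usually quoted in SPI-mode initialisation write-ups. Sending the frame and parsing the response are left out here.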
As a thought, I find it kinda odd to design a relatively complex and powerful system (graphics controller, storage management) - using subsystems (such as a 32-bit soft core) that are much more powerful than your main CPU. Interesting, certainly, but an odd project in my book.
FAT32 will go up to 4GB,
No, no, ... no.
FAT32 can support partitions up to *2 TB* (with a cluster size of 64 KB.)
The limitation is for file size. Max file size in FAT32 is 4 GB. Not at all the partition size. So unless you're going to write individual files that are larger than 4 GB (which would definitely look odd on an 8-bit system), you may never have to bother.
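The two limits come from two different 32-bit fields, which a quick calculation shows. This is a sketch assuming the usual 512-byte sector size; the variable names are only for illustration.

```python
SECTOR_BYTES = 512  # assumed sector size (the usual value on SD cards)

# A file's size is stored in a 32-bit field of its directory entry.
max_file_bytes = 2**32 - 1

# The volume's total sector count is a 32-bit field in the boot sector.
max_volume_bytes = (2**32 - 1) * SECTOR_BYTES

print(max_file_bytes / 2**30)    # in GiB, just under 4
print(max_volume_bytes / 2**40)  # in TiB, just under 2
```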
That said, if you use FatFs as I suggested, the library supports FAT12, 16, 32 and exFAT, so you'd be all covered.
Will take a look at RISC-V instead then. Thinking outside the box for a moment, if I'm going to be implementing a RISC softcore in the FPGA, how complicated would it be to incorporate a USB HID controller stack within it as well? Would that be doable, with my limited expertise and experience, I wonder?
Oh, you don't speak Mandarin? Seriously though, at first glance the project doesn't appear to be too complicated and I should be able to get my head around it - there's a couple of examples showing how to read a file by name and read a sector, so it's just a small step further to write to both as the interface will have to write to the SD card (commands, at least) to read anything from it. At the moment I see more trouble working out how to get the data from the SD interface to the buffer (i.e. where to place the buffer, how to access it etc).
I'm not saying you shouldn't use it - I'm just warning that it might not be as easy as it sounds. At the end of the day, SDIO is not a very complicated protocol, so I think you should be fine.
The 512-byte, or even 1 KB, buffer should live in a single 1 KB dual-port RAM block.
Once filled, you can copy to and from DDR3 in 128-bit chunks. This will offer maximum speed.
You can also go direct to and from DDR3, but operating in 8-bit mode means a slower transfer to DDR3.
It is up to you. The easy part about going directly to and from DDR3 is that you get the multiport, which is shared with everything else, and you get 1 less step in complexity.
Note that a single M9K block is 1 kilobyte. If you define anything smaller, the compiler will still eat the full 1 KB (9 kilobits) anyway. For larger sizes, it will eat chunks in multiples of 2.
All the access to/from the SD card is via 512-byte blocks, so the granularity of 8-bit mode for the DDR read/writes would be unnecessary. For a read, couldn't I just stack up 16 bytes from the SD card then push them directly into DDR3 in one transaction? No need to do 8-bit transfers that way, could just write 128 bits with each transaction to the DDR3 for maximum efficiency?
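As a rough software model of the "stack up 16 bytes, then write 128 bits" idea, the sketch below splits one sector into 32 words of 128 bits. The function name and the byte order inside each word (first byte in the low bits) are assumptions; the real DDR3 port may expect the opposite order.

```python
SECTOR_BYTES = 512
WORD_BYTES = 16  # 128 bits


def pack_sector(sector):
    """Split one 512-byte sector into 32 integers of 128 bits each.

    Assumption: the first byte received lands in the lowest 8 bits of the word.
    """
    if len(sector) != SECTOR_BYTES:
        raise ValueError("expected exactly one 512-byte sector")
    return [
        int.from_bytes(sector[i:i + WORD_BYTES], "little")
        for i in range(0, SECTOR_BYTES, WORD_BYTES)
    ]


sector = bytes(range(256)) * 2  # dummy data: 0..255 repeated twice
words = pack_sector(sector)
print(len(words), hex(words[0]))
```

In hardware the same thing is a 16-byte shift register plus a counter that raises a write strobe on every 16th byte.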
I guess writes to the SD card will need that 16-byte buffer to hold the data from DDR3 whilst it's written. I need to read up on the writing process before I go too deep in planning it, but I'm guessing there's no issue making the write process wait a few clocks whilst another 16 bytes are retrieved from DDR3 to fill the buffer again? Or I could make a 32-byte buffer and fill one half while the other is being read?
LOL, the SD looks so god damn slow compared to the DDR3. I won't waste my time doubling the width of my above illustration, as it is already ridiculous enough, but it is true for a 50 MB/s SD card.
I still think a dual-port RAM block used as a 512-byte or 1024-byte buffer is the best way to go. If the SD card routine has to pause and wait after every 16 bytes, other access cycles will land in between, and we do not want to generate a plethora of activates and precharges with all their associated delays. Let my inner DDR3 cache accumulate the data, then burst it out. Do the transfer in a single shot.
Just a quick note: getting a sustained 50 MBytes/s read from an SD card will be challenging and will be doable only with the fastest cards. (And if you add FAT handling on top of that, it's probably going to be all the harder.) One thing for sure (I can say this because, as I mentioned, I've been working with SD cards lately): you'll need to switch the card to 1.8V mode (not all cards support it) and then use one of the highest clock rates supported. The fastest you can get in SDIO mode, 4-bit, at 3.3V is 50 MHz - which will give you a max throughput of 25 MB/s minus any overhead. And supporting 1.8V mode is not trivial - your circuit needs to support powering SD cards at both 3.3V and 1.8V, and be able to switch between them. That makes the hardware more involved. Oh, and the initialization phase for an SD card is not trivial either - do not expect to implement all commands and init sequences purely in HDL - that would be pure madness. That part too will need to be done in software. Only the low-level part of the SDIO bus can reasonably be handled in HDL.
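The 25 MB/s figure is just clock rate times bus width. A quick sketch of that arithmetic, ignoring command, response, CRC and busy-time overhead (the function name is only for illustration):

```python
def bus_mb_per_s(clock_mhz, data_lines):
    """Raw payload rate in MB/s: one bit per data line per clock, 8 bits per byte."""
    return clock_mhz * data_lines / 8


print(bus_mb_per_s(50, 4))    # 4-bit SD bus at 50 MHz
print(bus_mb_per_s(25, 4))    # 4-bit SD bus at 25 MHz
print(bus_mb_per_s(12.5, 1))  # single data line (SPI) at 12.5 MHz
```

Real sustained rates will be lower than these raw numbers, how much lower depends on the card and on how many blocks are moved per command.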
Crikey - it's not until I see a visual representation that the difference in clock speeds makes sense. Okay, no worries, I'll set up a dual-port 1KB (is that enough?) block RAM buffer in the FPGA, then transfer data from that buffer one block (512 bytes) at a time to the DDR3.
Yeah, I'm happy with 25MB/sec. I haven't got any justification for all the added complexity trying to squeeze every last drop of speed out of the SD interface - even a 12.5MHz SPI connection would be vastly faster than any 'historically accurate' storage device for the Z80, let alone a 25MHz 4-bit SDIO interface. The biggest file I have in CP/M currently sits at just over 16KB.
Here's a meta-question, though - should I move this line of discussion regarding setting up an SD interface to another thread? It IS relevant to this one in that there's discussion around using the DDR3 and it's housed in the same FPGA GPU project, but that's about it.
Use the megafunction tool to test-generate a dual-port, dual-clock RAM. It will tell you how many M9K blocks will be used. I think no matter what you choose, you may get stuck with 4 as a minimum because of the 128-bit-wide side B. Go dual clock just in case, as you can always just tie the 2 clocks together. Make it dual port, with each port being a read & write port. Side A should be 4 or 8 bits wide for the SD card and side B should be 128 bits wide for the DDR3. I would choose at least 512 bytes' worth, but if the minimum M9K count is 4, choosing a 4 KB buffer will still use the same number of M9K blocks. Only go up to 4 KB if you can make use of it, for example to transfer 8 consecutive 512-byte blocks; otherwise there is no benefit in doing so. I don't know much about FAT32.
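A back-of-envelope way to guess the block count before running the tool is sketched below. It rests on an assumption worth checking against the device handbook: that one M9K can present at most 32 data bits per port in simple dual-port mode and at most 16 data bits when both ports can read and write. The function is hypothetical, and the megafunction report remains the authority.

```python
import math

M9K_DATA_BITS = 8192  # 9216 bits per block; 8192 usable at 8/16/32-bit widths


def m9k_blocks(buffer_bytes, widest_port_bits, max_bits_per_block):
    """Estimate blocks needed: enough for the widest port and enough for the capacity."""
    for_width = math.ceil(widest_port_bits / max_bits_per_block)
    for_depth = math.ceil(buffer_bytes * 8 / M9K_DATA_BITS)
    return max(for_width, for_depth)


# Assumed per-block width limits: 32 bits (simple dual-port), 16 bits (true dual-port).
for size in (512, 1024, 4096, 8192):
    print(size, m9k_blocks(size, 128, 32), m9k_blocks(size, 128, 16))
```

Under those assumptions the 128-bit side sets the floor: 4 blocks in simple dual-port mode, or 8 blocks when both ports are read/write, regardless of whether the buffer is 512 bytes or several KB.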
Here's what I've produced with the megafunction in Quartus. Hopefully it's not far from the mark. Takes up 8 M9K blocks, apparently. If it's okay, I'll tidy it up tomorrow and have a think about how I'm going to connect it to the SDInterface module.
Looks ok. Only 1 feature is not needed: NEW_DATA_WITH_NBE_READ
Check the megafunction setting for read-during-write = new data. We only require 'Don't Care' or 'old data'. What's going on here is that you have instructed the compiler to handle a collision, where you write to a location while the second port reads that same location, by adding extra logic outside of the M9K block to pass the new data through instantly on the same clock. We don't need this, as that delay is only 2 clock cycles max and we won't be writing to a location while simultaneously reading the same RAM byte on the second port. It's a waste of gates. The amount is small, but the feature still has a cost associated with it, though I doubt we will ever reach that point as we are running the DP RAM at 100MHz, not the top-end 300MHz.
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
I'm a little confused about the address width in the dual_port_block_cache.v for some reason (line 56). It's set up with a 7-bit address bus for port a, which is only 128 bytes? Have I made a mistake in the setup here, or am I just misunderstanding the values?
Port A should have a 10-bit address if you are reserving 1 KB and have the port set to 8 bits wide. Maybe it's just a typo made when using the megafunction.
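The address width follows from the number of words the port sees, which differs per port when the two sides have different widths. A small sketch of that relationship (the function name is made up):

```python
def addr_bits(buffer_bytes, port_width_bits):
    """Address bits needed for a port of the given width to reach the whole buffer."""
    words = buffer_bytes * 8 // port_width_bits
    return (words - 1).bit_length()


print(addr_bits(1024, 8))    # 8-bit port over 1 KB
print(addr_bits(1024, 128))  # 128-bit port over the same 1 KB
print(addr_bits(2048, 128))  # 128-bit port over 2 KB
```

A 7-bit address corresponds to 128 words, which is 128 bytes on an 8-bit port but 2 KB on a 128-bit port, so it may be worth checking which port the generated file is describing.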
...
Double-check the M9K usage. They are precious, as we need just as many for each maggie layer. So only reserve the number of KB that gives the minimum M9K count.
Also, check the erase block size. I don't know how big it is, but, to edit data within a block, if I remember correctly, you need to read that block, edit the bytes you want to change inside that read buffer, then erase that block, then, write that block with your edited buffer.
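The read, edit, write-back sequence can be sketched against a stand-in card as below. `FakeCard` and `patch_bytes` are hypothetical names for this example. The sketch has no explicit erase step, on the assumption that the card erases internally when a 512-byte block is written; that is worth confirming in the SD spec.

```python
BLOCK_BYTES = 512


class FakeCard:
    """Stand-in for an SD card: block-addressed, whole-block reads and writes only."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_BYTES) for _ in range(num_blocks)]

    def read_block(self, lba):
        return self.blocks[lba]

    def write_block(self, lba, data):
        if len(data) != BLOCK_BYTES:
            raise ValueError("writes must be exactly one block")
        self.blocks[lba] = bytes(data)


def patch_bytes(card, lba, offset, new_bytes):
    """Change a few bytes inside a block: read it, edit the buffer, write it back."""
    if offset < 0 or offset + len(new_bytes) > BLOCK_BYTES:
        raise ValueError("patch must fit inside one block")
    buffer = bytearray(card.read_block(lba))
    buffer[offset:offset + len(new_bytes)] = new_bytes
    card.write_block(lba, buffer)


card = FakeCard(num_blocks=4)
patch_bytes(card, lba=1, offset=10, new_bytes=b"Z80")
print(card.read_block(1)[8:15])
```

The same shape applies with the block RAM cache as the buffer: fill it from the card, let the host change bytes in place, then push the whole block back.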
For the DDR3 interface, you will use the CMD_CLK. It is 100MHz. Check the SD controller, it may be written in a way where you can use the 100MHz and it will divide the output SD clock for you. Otherwise, you will be stuck with using a PLL or CLK_IN which will generate timing report errors and we will need to fix the .sdc file to fix those.
The project should now read a sector/512 bytes from an SD card and write it to the cache RAM. RD_RDY goes high when the data has been read (and written to the cache RAM as that happens automatically). Now I just need to write those 512 bytes to DDR3, so I'm going to need...... and a little later...
- wr_ena(x)
- (PORT_ADDR_SIZE)'(addr(x))
- (PORT_CACHE_BITS)'(wdata(x))
.. is that right? x and y will be the port numbers... x=3 and y=5?
- read_req[y]
- read_ready[y]
- (128)'read_data[y]
See if you can find a 'micro SD card verilog simulation model', in the same way that I have been using Micron's DDR3 model to test my controller. This way, you can simulate your interface design with a virtual SD card in ModelSim first.