FPGA VGA Controller for 8-bit computer

#2975 Reply
Posted by SiliconWizard on 24 Nov, 2021 18:25
Linked to another recent thread... if you want to support FAT, then I suggest just using FatFs. It's become fairly good and it just works. It's written in portable C. http://elm-chan.org/fsw/ff/00index_e.html

Of course you'll still have to implement the low-level access to SD cards. I have the SD spec (been working on that lately) and, as far as I've read in it, SPI mode is required to be available on all card types except SDUC cards. SDUC cards are those with a capacity above 2 TB - I doubt you'll be using them for a system such as this one, especially given their current price. But be aware that SPI mode only gives access to a small subset of the SD commands, and while you can absolutely use it to read and write the card and implement a file system, it's pretty inefficient. The end result would be pretty slow (although, compared to typical storage available at the time for 8-bit computers, that'd still be pretty "fast").

As a thought, I find it kinda odd to design a relatively complex and powerful system (graphics controller, storage management) - using subsystems (such as a 32-bit soft core) that are much more powerful than your main CPU. Interesting, certainly, but an odd project in my book.

#2976 Reply
Posted by nockieboy on 24 Nov, 2021 19:38
Quote from: asmi on 24 Nov, 2021 17:58
Quote from: nockieboy on 24 Nov, 2021 16:43
I was thinking about including a softcore (this one specifically)
That guy is sick with a terminal case of NIH syndrome (had some run-ins with him on other forums where he heavily promotes his stuff), so I would recommend to go with something more established, like RISC-V, as this way you can utilize gcc/binutils toolchain for writing software in C instead of using some custom stuff. You can find some really small cores for RV32I, or you can even implement your own - it's very simple because this instruction set only contains around 40 opcodes.

Well, if nothing else I've learned a new term today - never heard of NIH syndrome before! Thanks for the heads-up, though. Will take a look at RISC-V instead then. Thinking outside the box for a moment, if I'm going to be implementing a RISC softcore in the FPGA, how complicated would it be to incorporate a USB HID controller stack within it as well? Would that be doable, with my limited expertise and experience, I wonder?

Quote from: asmi on 24 Nov, 2021 17:58
Quote from: nockieboy on 24 Nov, 2021 16:43
I came across the linked project above which seems to support FAT16/32 and up to SDHC v2 cards.
Judging by the amount of Chinese on the project's home page, you are probably going to have to dig on your own with little to no docs while integrating it into your project.

Oh, you don't speak Mandarin? Seriously though, at first glance the project doesn't appear to be too complicated and I should be able to get my head around it - there's a couple of examples showing how to read a file by name and read a sector, so it's just a small step further to write to both as the interface will have to write to the SD card (commands, at least) to read anything from it. At the moment I see more trouble working out how to get the data from the SD interface to the buffer (i.e. where to place the buffer, how to access it etc).

Quote from: asmi on 24 Nov, 2021 17:58
FAT16 is obsolete now and I don't think it's even used anywhere these days, so I wouldn't waste my time supporting it. But FAT32 and exFAT are definitely worth supporting, even if there is a clear trend for latter to largely replace the former. That said, since you control what kind of card is to be used, you can go either way, or even limit support to just a single one if you wish.
FAT systems are fairly simple, if you understand how linked lists work then you will feel right at home with it as it's just a bunch of linked lists.
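
To make the "bunch of linked lists" remark concrete, here is a minimal Python sketch of following one file's cluster chain through a FAT32 table. The table contents and the function name are invented for illustration; the only format details relied on are that FAT32 entries use their low 28 bits and that values of 0x0FFFFFF8 and above mark the end of a chain.

```python
END_OF_CHAIN_MIN = 0x0FFFFFF8   # FAT32 entries at or above this value end a chain
ENTRY_MASK = 0x0FFFFFFF         # only the low 28 bits of a FAT32 entry are used


def cluster_chain(fat, first_cluster):
    """Follow the linked list starting at first_cluster and return every cluster in it."""
    chain = []
    cluster = first_cluster
    while cluster < END_OF_CHAIN_MIN:
        chain.append(cluster)
        if len(chain) > len(fat):
            # A healthy chain can never be longer than the table itself.
            raise ValueError("loop detected in FAT chain")
        cluster = fat[cluster] & ENTRY_MASK
    return chain


# Hypothetical 10-entry table: a file starting at cluster 2 occupies 2 -> 3 -> 7 -> 5.
example_fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 7, 0, 0x0FFFFFFF, 0, 5, 0, 0]
print(cluster_chain(example_fat, 2))
```

The directory entry supplies the first cluster; everything after that is just repeated table lookups until an end-of-chain marker turns up.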

FAT32 will go up to 4GB, but like you said previously, those size SD cards are only going to get less and less available as time goes by and technology marches on. I always have the option of ignoring the FAT16/32/exFAT format altogether and just going to the metal and reading/writing a proprietary format at the sector level like my uCOM does currently on the CF card, but I had no hand in the creation of that system (thanks to Grant Searle and his work there), so at best I'd just be copying and tweaking what he's done there and applying it to the SD card format. Whether or not that would work is a question I don't have enough knowledge to answer at this point, but in any case, I want to go the exFAT/FAT32 route if at all possible.

#2977 Reply
Posted by SiliconWizard on 24 Nov, 2021 19:57
Quote from: nockieboy on 24 Nov, 2021 19:38
FAT32 will go up to 4GB,

No, no, ... no.

FAT32 can support partitions up to *2 TB* (with a cluster size of 64 KB.)
The limitation is for file size. Max file size in FAT32 is 4 GB. Not at all the partition size. So unless you're going to write individual files that are larger than 4 GB (which would definitely look odd on an 8-bit system), you may never have to bother.

That said, if you use FatFs as I suggested, the library supports FAT12, 16, 32 and exFAT, so you'd be all covered.
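
For anyone wanting to see where the two limits come from, here is a rough Python sketch. It assumes 512-byte sectors and relies on FAT32 storing both the file size and the volume's total sector count in 32-bit fields; the function names are just for illustration. It derives the 2 TB figure from the sector count rather than from the cluster size mentioned above, so treat it as one way of arriving at the same ballpark.

```python
SECTOR_BYTES = 512             # assumed logical sector size (typical for SD cards)
FILE_SIZE_FIELD_BITS = 32      # file size field in a FAT32 directory entry
TOTAL_SECTORS_FIELD_BITS = 32  # total sector count field in the FAT32 boot sector


def max_file_bytes():
    """Largest file size a 32-bit size field can describe."""
    return 2**FILE_SIZE_FIELD_BITS - 1


def max_volume_bytes(sector_bytes=SECTOR_BYTES):
    """Largest volume a 32-bit sector count can describe at the given sector size."""
    return (2**TOTAL_SECTORS_FIELD_BITS - 1) * sector_bytes


print(max_file_bytes() / 2**30, "GiB per file")
print(max_volume_bytes() / 2**40, "TiB per volume")
```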

#2978 Reply
Posted by nockieboy on 24 Nov, 2021 20:02
Quote from: SiliconWizard on 24 Nov, 2021 18:25
Linked to another recent thread... if you want to support FAT, then I suggest just using FatFs. It's become fairly good and it just works. It's written in portable C. http://elm-chan.org/fsw/ff/00index_e.html

Ah, thanks SiliconWizard, I'll take a look at that later. The only requirement I have for supporting any of the FAT varieties is that I can access it from Windows 10 with no middle-man software, otherwise I might as well not bother, but FATfs should be okay I guess.

Quote from: SiliconWizard on 24 Nov, 2021 18:25
Of course you'll still have to implement the low-level access to SD cards. I have the SD spec (been working on that lately), as far as I've read in it, access in SPI mode is required to be available in all card types except SDUC cards. SDUC cards are those with higher capacity than 2TB - I doubt you'll be using them for a system such as this one, and also given their current price. But be aware that the SPI mode only gives access to a small subset of the SD commands, and while you can absolutely use it to read and write the card and implement a file system, it's pretty inefficient. The end result would be pretty slow (although, compared to typical storage available at the time for 8-bit computers, that'd still be pretty "fast" ).

I'm not looking to implement an SPI interface to the SD card - I'm going full 4-bit SDIO for the reasons you've identified.

Quote from: SiliconWizard on 24 Nov, 2021 18:25
As a thought, I find it kinda odd to design a relatively complex and powerful system (graphics controller, storage management) - using subsystems (such as a 32-bit soft core) that are much more powerful than your main CPU. Interesting, certainly, but an odd project in my book.

Well, you have to remember that this project is a bit of a Frankenstein's monster. It started out with me learning about electronics whilst building a simple 7-chip Z80 computer on a breadboard. I've wanted to push my limits constantly since starting the project and, in a great example of feature creep (or project bloat, if you prefer), I now have a stackable, modular Z80 computer made with SMD components on custom PCBs and with up to 4MB of memory running CP/M from a CF card and with a hardware-accelerated GPU providing further IO opportunities (like PS/2, USB HID, HDMI output, etc). It certainly isn't a well-planned project, just a hobby that I'm following to its conclusion (when the help runs out or I just can't progress any further).

The features I'm adding to the FPGA are for my uCOM, but equally can apply to anyone else wanting to add one of these FPGA GPUs to their DIY system - be it an 8-bit system or something beefier. I'd certainly like to move on to the Motorola 68000-series when I'm done with this Z80.

Quote from: SiliconWizard on 24 Nov, 2021 19:57
Quote from: nockieboy on 24 Nov, 2021 19:38
FAT32 will go up to 4GB,

No, no, ... no.

FAT32 can support partitions up to *2 TB* (with a cluster size of 64 KB.)

Ah - my misunderstanding there. Okay, so FAT32 can be used on SD cards up to 2 TB in size?! That makes the argument for supporting exFAT or FATfs less compelling in my opinion, but only if the complexity of supporting those more modern filesystems is significant, I suppose.

Quote from: SiliconWizard on 24 Nov, 2021 19:57
The limitation is for file size. Max file size in FAT32 is 4 GB. Not at all the partition size. So unless you're going to write individual files that are larger than 4 GB (which would definitely look odd on an 8-bit system), you may never have to bother.

That said, if you use FatFs as I suggested, the library supports FAT12, 16, 32 and exFAT, so you'd be all covered.

Indeed - it does weaken the argument for messing around trying to support greater than FAT32. What I might do is stick to FAT32 initially, then upgrade to FATfs once I know more about what I'm doing.

#2979 Reply
Posted by SiliconWizard on 24 Nov, 2021 20:23
From a couple comments you made on FatFs, I'm under the impression you may have misunderstood what it was.
FatFs is a portable library written in C that lets you access FAT partitions. It's not a filesystem in itself. It's just a well-tested library that is widely used on embedded systems to support FAT12/16/32/exFAT. So instead of implementing FAT access yourself, you can use this nice library. It supports all FAT versions so you can use it with disks formatted in any of them.

#2980 Reply
Posted by asmi on 24 Nov, 2021 20:55
Quote from: nockieboy on 24 Nov, 2021 19:38
Will take a look at RISC-V instead then. Thinking outside the box for a moment, if I'm going to be implementing a RISC softcore in the FPGA, how complicated would it be to incorporate a USB HID controller stack within it as well? Would that be doable, with my limited expertise and experience, I wonder?
Everything is doable with enough effort. The USB stack leans heavily on the software side, but if you are not afraid to dive head-first into the USB specs then - sure. You will learn a lot during this, that's for sure. And USB HID is one of the simplest flavors of USB. Hardware-wise you will only need to implement a USB ULPI PHY interface (to an external USB PHY) and a DMA engine to transfer data back and forth. You can skim through the EHCI specification to get an idea, but keep in mind that this spec is for a fully-featured USB 2.0 host controller implementation as used on PCs, so you can likely cut many corners to simplify things.

Quote from: nockieboy on 24 Nov, 2021 19:38
Oh, you don't speak Mandarin? Seriously though, at first glance the project doesn't appear to be too complicated and I should be able to get my head around it - there's a couple of examples showing how to read a file by name and read a sector, so it's just a small step further to write to both as the interface will have to write to the SD card (commands, at least) to read anything from it. At the moment I see more trouble working out how to get the data from the SD interface to the buffer (i.e. where to place the buffer, how to access it etc).
I'm not saying you shouldn't use it - I'm just warning that it might not be as easy as it sounds. At the end of the day, SDIO is not a very complicated protocol, so I think you should be fine.

#2981 Reply
Posted by nockieboy on 25 Nov, 2021 22:09
Quote from: asmi on 24 Nov, 2021 20:55
Quote from: nockieboy on 24 Nov, 2021 19:38
Oh, you don't speak Mandarin? Seriously though, at first glance the project doesn't appear to be too complicated and I should be able to get my head around it - there's a couple of examples showing how to read a file by name and read a sector, so it's just a small step further to write to both as the interface will have to write to the SD card (commands, at least) to read anything from it. At the moment I see more trouble working out how to get the data from the SD interface to the buffer (i.e. where to place the buffer, how to access it etc).
I'm not saying you shouldn't use it - I'm just warning that it might not be as easy as it sounds. At the end of the day, SDIO is not a very complicated protocol, so I think you should be fine.

I've had a closer look at the HDL today and there's nothing in the demo code or supplied modules to cater for writing to the SD card, so I'm going to have to modify the state machine to allow for writes as well. I've also got to add in direction controls for the level converter on the DECA board. The project I'm using as a template makes no consideration for anything but reading from the SD card, so there's no direction control for the CMD line or DAT0 and DAT1-3 lines. I'm hoping to have this done by the end of play tomorrow (or maybe over the weekend if I'm too busy tomorrow), but don't foresee any major issues just yet. (Hoping that's not famous last words...)

#2982 Reply
Posted by BrianHG on 25 Nov, 2021 23:10
Quote from: nockieboy on 25 Nov, 2021 22:37
Quote from: BrianHG on 25 Nov, 2021 22:10
The 512byte, or even 1kbyte should be in a single 1kbyte dual ram block.
Once filled, you can copy to and from DDR3 in 128 bit chunks. This will offer maximum speed.

You can also go direct to and from DDR3, but operating in 8 bit mode means a slower transfer to DDR3.
It is up to you. The easy part about going to and from DDR3 is that you get the multiport which is shared with everything else and you get 1 less step in complexity.

Note that a single M9K block is 1 kilobyte. If you define anything smaller, the compiler will still eat the 1kbyte or 9kilobit anyways. For larger sizes, it will eat chunks in multiples of 2.

All the access to/from the SD card is via 512-byte blocks, so the granularity of 8-bit mode for the DDR read/writes would be unnecessary. For a read, couldn't I just stack up 16 bytes from the SD card then push them directly into DDR3 in one transaction? No need to do 8-bit transfers that way, could just write 128 bits with each transaction to the DDR3 for maximum efficiency?

I guess writes to the SD card will need that 16-byte buffer to hold the data from DDR3 whilst it's written. I need to read up on the writing process before I go too deep in planning it, but I'm guessing there's no issue making the write process wait a few clocks whilst another 16 bytes are retrieved from DDR3 to fill the buffer again? Or I could make a 32-byte buffer and fill one half while the other is being read?

Ok, before I answer your question, how fast does the SD card read at in megabytes per second?

#2983 Reply
Posted by BrianHG on 26 Nov, 2021 00:20
Look at the attached illustration. You will need to open it in a normal window and show at 100% size, and scroll to see what is going on...

I assumed that your SD card was capable of reading at 50 Megabytes per second. The simulation timing is accurate. Both the upper and lower sims transfer 512 bytes.

Sending 128-bit chunks or 8-bit chunks won't make a difference as my controller already stacks the source 8-bit reads/writes into its smart cache. It is the overall speed of the SD card which is the bottleneck, as seen in my top illustration.

Discuss with asmi & SiliconWizard the ramifications of selecting the 2 possible paths.

Path #1 (top in illustration) allows other commands in between to change memory rows, causing numerous slow-downs and penalties at multiple junctures, as you will have up to 16 video layers being drawn on the display, a geometry unit drawing things, a Z80 reading and writing data at its slow pace (we reserved this one case knowing it is a clunker speed-wise), and an audio system.

Path #2 crunches everything into a single point smaller in size than a single Z80 read op-code instruction. Also, when reading & writing the 512 bytes from the SD card (because of the M9K structure, you will have up to 4k instead of 512 bytes), reading and writing can be done directly into the dual-port RAM without any individual wait states.

Note that with a 25 megabyte/sec SD card, the top #1 chart would be twice as wide with tiny DDR3 bursts being spaced at twice the distance from each other.

#2984 Reply
Posted by SiliconWizard on 26 Nov, 2021 00:50
Just a quick note: getting a sustained 50 MBytes/s read with an SD card will be challenging and will be doable only with the fastest cards. (And if you add FAT handling on top of that, it's probably going to be all the harder.) One thing for sure (I can say this because, as I mentioned, I've been working with SD cards lately): you'll need to switch the card to 1.8V mode (not all cards support it) and then select one of the highest clock rates supported. The fastest you can get in SDIO mode, 4-bit, at 3.3V is 50 MHz - which will give you a max throughput of 25 MB/s minus any overhead. And supporting 1.8V mode is not trivial - your circuit needs to support powering SD cards at both 3.3V and 1.8V, and be able to switch between them. That makes the hardware more involved. Oh, and the initialization phase for an SD card is not trivial either - do not expect to implement all commands and init sequences purely in HDL - that would be pure madness. That part too will need to be done in software. Only the low-level part of the SDIO bus can reasonably be handled in HDL.
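
As a back-of-the-envelope check on those figures, here is a small Python sketch of the raw bus arithmetic: one bit per data line per clock, with no allowance for command, CRC or busy-time overhead. The function names are made up for illustration and MB here means 10^6 bytes.

```python
def bus_throughput_mb_s(clock_mhz, data_lines):
    """Raw peak transfer rate in MB/s: one bit per data line per clock, no overhead."""
    return clock_mhz * data_lines / 8


def block_time_us(block_bytes, clock_mhz, data_lines):
    """Microseconds needed to clock one block's payload bits across the bus."""
    return block_bytes * 8 / (clock_mhz * data_lines)


print(bus_throughput_mb_s(50, 4))    # 4-bit bus at 50 MHz
print(bus_throughput_mb_s(25, 4))    # 4-bit bus at 25 MHz
print(bus_throughput_mb_s(12.5, 1))  # single data line (SPI-style) at 12.5 MHz
print(block_time_us(512, 50, 4))     # one 512-byte block on the 4-bit bus at 50 MHz
```

Real sustained rates will sit below these numbers once protocol overhead and the card's own latency are included.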

#2985 Reply
Posted by BrianHG on 26 Nov, 2021 00:59
LOL, the SD looks so god damn slow compared to the DDR3. I won't waste my time doubling the width of my above illustration as it already is ridiculous enough but true for a 50 MB/sec SD card.

I still think the dual-port RAM block as a 512-byte or 1024-byte buffer is the best way to go. We do not want to generate a plethora of activates and precharges, with all their associated delays in between due to other access cycles, where the SD card routine would need to pause and wait after every 16 bytes while my inner DDR3 cache accumulates the data and then bursts out the response. Do the transfer in a single shot.

#2986 Reply
Posted by SiliconWizard on 26 Nov, 2021 01:03
Quote from: BrianHG on 26 Nov, 2021 00:59
LOL, the SD looks so god damn slow compared to the DDR3.

Definitely.

#2987 Reply
Posted by nockieboy on 26 Nov, 2021 08:55
Quote from: BrianHG on 26 Nov, 2021 00:59
LOL, the SD looks so god damn slow compared to the DDR3. I won't waste my time doubling the width of my above illustration as it already is ridiculous enough but true for a 50 MB/sec SD card.

I still think the dual-port RAM block as a 512-byte or 1024-byte buffer is the best way to go. We do not want to generate a plethora of activates and precharges, with all their associated delays in between due to other access cycles, where the SD card routine would need to pause and wait after every 16 bytes while my inner DDR3 cache accumulates the data and then bursts out the response. Do the transfer in a single shot.

Crikey - it's not until I see a visual representation that the difference in clock speeds makes sense. Okay, no worries, I'll set up a dual-port 1KB (is that enough?) block RAM buffer in the FPGA, then transfer data from that buffer one block (512 bytes) at a time to the DDR3.
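
To picture that hand-off from the byte-wide SD side to the 128-bit DDR3 side, here is a Python sketch that packs one 512-byte block into 32 words of 128 bits and back again. The byte order (first byte received lands in the least significant byte of the word) is an assumption for illustration, as are the function names; the real ordering depends on how the buffer and DDR3 controller ports are wired.

```python
WORD_BYTES = 16  # one 128-bit DDR3-side word


def pack_sector(sector, word_bytes=WORD_BYTES):
    """Split a sector into fixed-size words, first byte in the least significant position."""
    if len(sector) % word_bytes:
        raise ValueError("sector length must be a multiple of the word size")
    return [int.from_bytes(sector[i:i + word_bytes], "little")
            for i in range(0, len(sector), word_bytes)]


def unpack_words(words, word_bytes=WORD_BYTES):
    """Inverse of pack_sector: turn the 128-bit words back into a byte stream."""
    return b"".join(w.to_bytes(word_bytes, "little") for w in words)


sector = bytes(range(256)) * 2   # dummy 512-byte block
words = pack_sector(sector)
print(len(words), "words of", WORD_BYTES * 8, "bits")
```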

Quote from: SiliconWizard on 26 Nov, 2021 00:50
Just a quick note: getting a sustained 50 MBytes/s read with an SD Card will be challenging and will be doable only with the fastest cards. (And if you add FAT handling on top of that, it's probably going to be all the much harder.) One thing for sure (can say because as I mentioned I'm working with SD cards lately): you'll need to switch the card to 1.8V mode (not all cards support it) and then one of the highest clock rates supported. The fastest you can get in SDIO mode, 4-bit, at 3.3V is 50 MHz - which will give you a max throughput of 25 MB/s minus any overhead. And supporting 1.8V mode is not as trivial - your circuit needs to support powering SD cards at both 3.3V and 1.8V, and be able to switch between them. Makes the hardware more involved. Oh and the initialization phase for an SD card is not trivial either - do not expect implementing all commands and init sequences purely in HDL - that would be pure madness. That part too will need to be done in software. Only the low-level part of the SDIO bus can be handled reasonably in HDL.

Yeah, I'm happy with 25MB/sec. I haven't got any justification for all the added complexity trying to squeeze every last drop of speed out of the SD interface - even a 12.5MHz SPI connection would be vastly faster than any 'historically accurate' storage device for the Z80, let alone a 25MHz 4-bit SDIO interface. The biggest file I have in CP/M currently sits at just over 16KB.

Here's a meta-question, though - should I move this line of discussion regarding setting up an SD interface to another thread? It IS relevant to this one in that there's discussion around using the DDR3 and it's housed in the same FPGA GPU project, but that's about it.

#2988 Reply
Posted by BrianHG on 26 Nov, 2021 09:11
Quote from: nockieboy on 26 Nov, 2021 08:55
Quote from: BrianHG on 26 Nov, 2021 00:59
LOL, the SD looks so god damn slow compared to the DDR3. I won't waste my time doubling the width of my above illustration as it already is ridiculous enough but true for a 50 MB/sec SD card.

I still think the dual-port RAM block as a 512-byte or 1024-byte buffer is the best way to go. We do not want to generate a plethora of activates and precharges, with all their associated delays in between due to other access cycles, where the SD card routine would need to pause and wait after every 16 bytes while my inner DDR3 cache accumulates the data and then bursts out the response. Do the transfer in a single shot.

Crikey - it's not until I see a visual representation that the difference in clock speeds makes sense. Okay, no worries, I'll set up a dual-port 1KB (is that enough?) block RAM buffer in the FPGA, then transfer data from that buffer one block (512 bytes) at a time to the DDR3.

Use the megafunction tool to test generate a dual port, dual clock ram. It will tell you how many M9K blocks will be used. I think no matter what you choose, you may get stuck with 4 as a minimum because of the 128 bit wide side B. Dual clock just in case as you can always just tie the 2 clocks together. Dual port with each being a read & write port. Side A should be 4 or 8 bit for the SD-Card and side B should be 128bit for the DDR3. I would at least choose 512bytes worth, but if the minimum M9K count is 4, actually choosing a 4kbyte buffer will still use the same amount of M9K blocks. Only go up to 4K if you can make use of it, for example, transfer 8 consecutive 512 byte blocks, otherwise, there is no plus in doing so. I don't know much about FAT32.
Quote from: nockieboy on 26 Nov, 2021 08:55
Quote from: SiliconWizard on 26 Nov, 2021 00:50
Just a quick note: getting a sustained 50 MBytes/s read with an SD Card will be challenging and will be doable only with the fastest cards. (And if you add FAT handling on top of that, it's probably going to be all the much harder.) One thing for sure (can say because as I mentioned I'm working with SD cards lately): you'll need to switch the card to 1.8V mode (not all cards support it) and then one of the highest clock rates supported. The fastest you can get in SDIO mode, 4-bit, at 3.3V is 50 MHz - which will give you a max throughput of 25 MB/s minus any overhead. And supporting 1.8V mode is not as trivial - your circuit needs to support powering SD cards at both 3.3V and 1.8V, and be able to switch between them. Makes the hardware more involved. Oh and the initialization phase for an SD card is not trivial either - do not expect implementing all commands and init sequences purely in HDL - that would be pure madness. That part too will need to be done in software. Only the low-level part of the SDIO bus can be handled reasonably in HDL.

Yeah, I'm happy with 25MB/sec. I haven't got any justification for all the added complexity trying to squeeze every last drop of speed out of the SD interface - even a 12.5MHz SPI connection would be vastly faster than any 'historically accurate' storage device for the Z80, let alone a 25MHz 4-bit SDIO interface. The biggest file I have in CP/M currently sits at just over 16KB.

Uninterrupted 25 megabytes a sec will just about give you 24fps with 480p uncompressed digital video. If you want sound, you'll need a bit more. Compressed video, well you can pull off 1080p using MJPEG2000.
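
A quick Python sanity check of that uncompressed-video figure. The frame sizes (640x480 and 720x480) and 3 bytes per pixel are assumptions picked for illustration, since the post only says "480p"; audio and any container overhead are ignored.

```python
def video_rate_mb_s(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in MB/s (10^6 bytes per second)."""
    return width * height * bytes_per_pixel * fps / 1_000_000


print(video_rate_mb_s(640, 480, 3, 24))  # assumed 640x480, 24-bit colour
print(video_rate_mb_s(720, 480, 3, 24))  # assumed 720x480, 24-bit colour
```

Both cases land under 25 MB/s, the wider one only just, which fits the "just about" wording.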
Quote from: nockieboy on 26 Nov, 2021 08:55
Here's a meta-question, though - should I move this line of discussion regarding setting up an SD interface to another thread? It IS relevant to this one in that there's discussion around using the DDR3 and it's housed in the same FPGA GPU project, but that's about it.

Keep the SD card stuff here. It should be small compared to the video stuff and it is needed for playing back video from the SD-Card.

#2989 Reply
Posted by BrianHG on 26 Nov, 2021 09:42
Quote from: nockieboy on 26 Nov, 2021 08:55
" Crikey - "

#2990 Reply
Posted by nockieboy on 26 Nov, 2021 12:50
Okay, I've had a little time (but not as much as I'd hoped) and have tidied up the source code a little for the SD interface, making it a little more readable at least.

There's some artefacts left in the code from the example source code - artificially driving RD_REQ in SDInterface.sv, for example, as the original example code was set up to perform a read of Sector 0 after reset/powerup, which I'll remove once I start working on the connections to the Z80_Bridge and DDR3_Controller.

There's also no way to write data to the SD card just yet - again, I'll work on this as my understanding of the code and protocol increases. What I'm trying to work out at the moment, however, is how to control the level translator on the DECA board, which the example code doesn't account for. The DECA (wisely) has an SN74AVCA406L chip between the FPGA and SD socket, providing static discharge protection and voltage conversion between the FPGA and SD card. I'll likely use this setup for my SD socket on the GPU card when I can design and build it (much later).

As a result, I have four signals that the example code doesn't cater for:
- SD_CMD_DIR
- SD_D0_DIR
- SD_D123_DIR
- SD_SEL
If you look in SDReader.sv, lines 55 and 57-58 appear to be where the SD_CMD line is switched between input, output and tristate. I've added line 56 to set the direction of the SD_CMD line through the SN74AVCA406L accordingly (have checked the datasheet and am hoping I've got it the right way around!)

I just want to make sure I'm on the right track and haven't done anything wrong before I go too far with the changes to the code. Presumably SD_D0_DIR and SD_D123_DIR will follow the same logic and go in the same direction as SD_CMD at any given time and SD_SEL will remain low all the time any read/write is being performed?

SDInterface.sv.txt

SDReader.sv.txt

SDCmdCtrl.sv.txt

#2991 Reply
Posted by nockieboy on 26 Nov, 2021 22:36
Quote from: BrianHG on 26 Nov, 2021 09:11
Use the megafunction tool to test generate a dual port, dual clock ram. It will tell you how many M9K blocks will be used. I think no matter what you choose, you may get stuck with 4 as a minimum because of the 128 bit wide side B. Dual clock just in case as you can always just tie the 2 clocks together. Dual port with each being a read & write port. Side A should be 4 or 8 bit for the SD-Card and side B should be 128bit for the DDR3. I would at least choose 512bytes worth, but if the minimum M9K count is 4, actually choosing a 4kbyte buffer will still use the same amount of M9K blocks. Only go up to 4K if you can make use of it, for example, transfer 8 consecutive 512 byte blocks, otherwise, there is no plus in doing so. I don't know much about FAT32.

Here's what I've produced with the megafunction in Quartus. Hopefully it's not far from the mark. Takes up 8 M9K blocks, apparently. If it's okay, I'll tidy it up tomorrow and have a think about how I'm going to connect it to the SDInterface module.

SD_Buffer_RAM.v.txt
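
One possible reason for 8 blocks rather than the 4 estimated earlier, sketched in Python. The estimate takes the larger of two constraints: how many blocks are needed for capacity, and how many are needed side by side to reach the widest port. The per-block width limits used below (32 data bits per port in simple dual-port mode, 16 in true dual-port mode where both ports read and write) are my recollection of the M9K configurations and should be checked against the device handbook and the Quartus fitter report; the function name is hypothetical.

```python
import math

M9K_DATA_BITS = 8192  # usable bits per M9K when the parity bits are left unused


def m9k_estimate(total_bytes, widest_port_bits, max_bits_per_block):
    """Rough block count: the larger of the capacity and port-width requirements."""
    by_capacity = math.ceil(total_bytes * 8 / M9K_DATA_BITS)
    by_width = math.ceil(widest_port_bits / max_bits_per_block)
    return max(by_capacity, by_width)


print(m9k_estimate(1024, 128, 32))  # 1 KB buffer, assumed simple dual-port width limit
print(m9k_estimate(1024, 128, 16))  # 1 KB buffer, assumed true dual-port width limit
print(m9k_estimate(4096, 128, 32))  # 4 KB buffer, assumed simple dual-port width limit
```

Under these assumptions the 128-bit port, not the buffer size, sets the block count, which is also why growing the buffer up to a point costs no extra blocks.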

#2992 Reply
Posted by BrianHG on 26 Nov, 2021 22:48
Quote from: nockieboy on 26 Nov, 2021 22:36
Quote from: BrianHG on 26 Nov, 2021 09:11
Use the megafunction tool to test generate a dual port, dual clock ram. It will tell you how many M9K blocks will be used. I think no matter what you choose, you may get stuck with 4 as a minimum because of the 128 bit wide side B. Dual clock just in case as you can always just tie the 2 clocks together. Dual port with each being a read & write port. Side A should be 4 or 8 bit for the SD-Card and side B should be 128bit for the DDR3. I would at least choose 512bytes worth, but if the minimum M9K count is 4, actually choosing a 4kbyte buffer will still use the same amount of M9K blocks. Only go up to 4K if you can make use of it, for example, transfer 8 consecutive 512 byte blocks, otherwise, there is no plus in doing so. I don't know much about FAT32.

Here's what I've produced with the megafunction in Quartus. Hopefully it's not far from the mark. Takes up 8 M9K blocks, apparently. If it's okay, I'll tidy it up tomorrow and have a think about how I'm going to connect it to the SDInterface module.

Looks ok. Only 1 feature is not needed: NEW_DATA_WITH_NBE_READ
Check the megafunction for read during write = new data. We only require 'Don't Care' or 'old data'. What's going on here is that you have instructed the compiler to make sure that if there is a collision, where you write to a location while the second port is simultaneously reading that same location, the compiler adds extra logic outside of the M9K block to pass the new data through instantly on the same clock. We don't need this, as that delay is only 2 clock cycles max and we won't be writing to a location on one port while simultaneously reading the same RAM byte on the second port. It's a waste of gates; albeit a small amount, it still has a cost associated with the feature, though I doubt we will ever reach that point as we are running the DP RAM at 100MHz, not the top-end 300MHz.
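
To illustrate the difference between the two behaviours, here is a toy Python model of a clocked RAM where a write and a read can hit the same address on the same edge. It is a behavioural sketch only, not how altsyncram is implemented; the class name is invented, and "NEW_DATA" is used as shorthand for the pass-through option.

```python
class SyncRam:
    """Toy single-clock RAM with one write and one read sampled on the same edge."""

    def __init__(self, depth, read_during_write="OLD_DATA"):
        if read_during_write not in ("OLD_DATA", "NEW_DATA"):
            raise ValueError("unsupported mode")
        self.mem = [0] * depth
        self.mode = read_during_write

    def clock(self, write_addr=None, write_data=0, read_addr=None):
        """Apply one clock edge and return the value captured by the read (or None)."""
        old = self.mem[read_addr] if read_addr is not None else None
        if write_addr is not None:
            self.mem[write_addr] = write_data
        if read_addr is None:
            return None
        if self.mode == "NEW_DATA" and write_addr == read_addr:
            return write_data  # bypass path: the freshly written value is forwarded
        return old             # otherwise the read sees what was stored before the edge


ram = SyncRam(4, "OLD_DATA")
ram.clock(write_addr=1, write_data=0xAA)
print(hex(ram.clock(write_addr=1, write_data=0xBB, read_addr=1)))  # collision
print(hex(ram.clock(read_addr=1)))                                 # one clock later
```

The bypass branch is the part that costs extra logic in hardware; without it the new value simply shows up on a later read.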

#2993 Reply
Posted by nockieboy on 27 Nov, 2021 07:43
Quote from: BrianHG on 26 Nov, 2021 22:48
Quote from: nockieboy on 26 Nov, 2021 22:36
Here's what I've produced with the megafunction in Quartus. Hopefully it's not far from the mark. Takes up 8 M9K blocks, apparently. If it's okay, I'll tidy it up tomorrow and have a think about how I'm going to connect it to the SDInterface module.

Looks ok. Only 1 feature is not needed: NEW_DATA_WITH_NBE_READ
Check the megafunction for read during write = new data. We only require 'Don't Care' or 'old data'. What's going on here is that you have instructed the compiler to make sure that if there is a collision, where you write to a location while the second port is simultaneously reading that same location, the compiler adds extra logic outside of the M9K block to pass the new data through instantly on the same clock. We don't need this, as that delay is only 2 clock cycles max and we won't be writing to a location on one port while simultaneously reading the same RAM byte on the second port. It's a waste of gates; albeit a small amount, it still has a cost associated with the feature, though I doubt we will ever reach that point as we are running the DP RAM at 100MHz, not the top-end 300MHz.

Okay, no problem. Have changed the settings on lines 118 & 119 to:

Code:
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",

Does it need read enables at all?

#2994 Reply
Posted by BrianHG on 27 Nov, 2021 07:58
Actually, read clock enable can be useful when sending data to the DDR3, as a read has a 2 clock delay and we may want to pause the read if the DDR3 is busy. (I bloody hate that, you won't believe the extent I had to work to deal with the fact that when reading, your response comes in a number of clock cycles later. Worse, if you want to read a lot fast, you actually are sending in multiple reads as the train of responses comes out delayed.)

Everything else is ok.

Remember, if you aren't going to use the enable, you basically are just going to tie that input to a 1'b1.
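
A rough sketch of both options, using a small inferred RAM rather than the actual megafunction (which has a 2-clock read delay; this one has 1). All names here, including `ddr3_busy`, are hypothetical and only for illustration.

```systemverilog
// Hypothetical RAM with a read enable: rdata holds its value while rd_en is low.
module ram_with_rden #(
    parameter int ADDR_BITS = 6,
    parameter int DATA_BITS = 128
) (
    input  logic                 clk,
    input  logic                 rd_en,
    input  logic [ADDR_BITS-1:0] raddr,
    input  logic                 we,
    input  logic [ADDR_BITS-1:0] waddr,
    input  logic [DATA_BITS-1:0] wdata,
    output logic [DATA_BITS-1:0] rdata
);
    logic [DATA_BITS-1:0] mem [0:(1 << ADDR_BITS) - 1];

    always_ff @(posedge clk) begin
        if (we)    mem[waddr] <= wdata;
        if (rd_en) rdata      <= mem[raddr];
    end
endmodule

// Two ways of driving the enable input.
module rden_example (
    input  logic         clk,
    input  logic         ddr3_busy,     // hypothetical "DDR3 cannot accept data" flag
    input  logic [5:0]   raddr,
    output logic [127:0] rdata_always,
    output logic [127:0] rdata_paused
);
    // Enable not used: tie the input to a constant 1'b1.
    ram_with_rden u_always (
        .clk(clk), .rd_en(1'b1), .raddr(raddr),
        .we(1'b0), .waddr('0), .wdata('0), .rdata(rdata_always)
    );

    // Enable used: the read output freezes while the DDR3 side is busy.
    ram_with_rden u_paused (
        .clk(clk), .rd_en(~ddr3_busy), .raddr(raddr),
        .we(1'b0), .waddr('0), .wdata('0), .rdata(rdata_paused)
    );
endmodule
```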

#2995 Reply
Posted by nockieboy on 27 Nov, 2021 09:38
Okay, here's an update on where I am currently.

dual_port_block_cache.v is - as it says on the tin - the dual port cache for reads/writes to the SD card. This is instantiated in SDInterface.sv (which I was going to remove, but have now realised it's going to be a key part of the structure) and I've wired up the output from SDReader.sv to it, so that (in theory at least) any data read from the SD card will now be written to the M9K cache.

The cache is 1KB in size, so for the moment I'm intending to separate it into two blocks of 512 bytes - one for reads FROM the SD card, one for writes TO the SD card. Seems sensible to keep the two separate, especially as block RAM is no longer a scarce resource now we're using DDR3 for everything, but (as always) I'm open to suggestions and advice on whether this is suitable or even necessary.

Take a look at SDInterface.sv. It contains some comments on how I think the cache is going to work. I've connected clock_a to CLOCK_50 - seemed sensible to have port A of the cache clocked at the same speed as the data coming from SDReader.sv? clock_b is connected to CLK, an as-yet-unspecified clock source from the top-level module, but I'm thinking this should be one of the DDR3_CLK clocks.

I'm a little confused about the address width in the dual_port_block_cache.v for some reason (line 56). It's set up with a 7-bit address bus for port a, which is only 128 bytes? Have I made a mistake in the setup here, or am I just misunderstanding the values?

dual_port_block_cache.v.txt

SDInterface.sv.txt

SDReader.sv.txt

#2996 Reply
Posted by BrianHG on 27 Nov, 2021 10:35
Quote from: nockieboy on 27 Nov, 2021 09:38
I'm a little confused about the address width in the dual_port_block_cache.v for some reason (line 56). It's set up with a 7-bit address bus for port a, which is only 128 bytes? Have I made a mistake in the setup here, or am I just misunderstanding the values?

Port A should have a 10-bit address if you are reserving 1 KB and have it set to 8 bits wide. Maybe just a typo when using the megafunction.

Port B should have a smaller address as it has fewer words at 128 bits wide: 1 KB is only 64 words of 128 bits, so 6 address bits.
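
The address widths can be worked out with `$clog2`. A small simulation-only sketch, assuming the 1 KB cache with an 8-bit port A and a 128-bit port B discussed here (the parameter names are invented for the example):

```systemverilog
// Simulation-only sketch: derive both address widths from the cache size.
module cache_addr_widths;
    localparam int CACHE_BYTES = 1024;  // 1 KB cache, as discussed
    localparam int PORT_A_BITS = 8;     // SD card side, byte wide
    localparam int PORT_B_BITS = 128;   // DDR3 side

    localparam int WORDS_A = CACHE_BYTES * 8 / PORT_A_BITS;  // 1024 words
    localparam int WORDS_B = CACHE_BYTES * 8 / PORT_B_BITS;  // 64 words

    localparam int ADDR_A_BITS = $clog2(WORDS_A);            // 10 bits
    localparam int ADDR_B_BITS = $clog2(WORDS_B);            // 6 bits

    initial begin
        $display("port A: %0d words, %0d address bits", WORDS_A, ADDR_A_BITS);
        $display("port B: %0d words, %0d address bits", WORDS_B, ADDR_B_BITS);
    end
endmodule
```

A 7-bit port A address would only cover 128 bytes, which is why it looked wrong.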

Double check the M9K usage, they are precious as we need just as many for each maggie layer. So only reserve as many KB as will fit in the minimum number of M9K blocks.

Also, check the erase block size. I don't know how big it is, but, to edit data within a block, if I remember correctly, you need to read that block, edit the bytes you want to change inside that read buffer, then erase that block, then, write that block with your edited buffer.

For the DDR3 interface, you will use the CMD_CLK. It is 100MHz. Check the SD controller, it may be written in a way where you can use the 100MHz and it will divide the output SD clock for you. Otherwise, you will be stuck with using a PLL or CLK_IN which will generate timing report errors and we will need to fix the .sdc file to fix those.

#2997 Reply
Posted by nockieboy on 27 Nov, 2021 14:35
Quote from: BrianHG on 27 Nov, 2021 10:35
Port A should have a 10-bit address if you are reserving 1 KB and have it set to 8 bits wide. Maybe just a typo when using the megafunction.
...
Double check the M9K usage, they are precious as we need just as many for each maggie layer. So only reserve as many KB as will fit in the minimum number of M9K blocks.

Yes, was a typo. I've gone back to the megafunction to work out the M9K usage, and it appears that the 128-bit data bus on port B is ramping up the M9K usage to a minimum of 8 M9K blocks. Even if I select the minimum RAM size, 16x8 bits, it uses 8 M9K blocks. I can go right up to 8192x8 bits for the RAM and it still only uses 8 M9K blocks. So (unless you suggest changing port B's data width) instead of wasting 7.5KB of M9K RAM, I'm setting the cache RAM size to the full 8KB - if we need it later (e.g. for FAT support) then we've got it.
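
A rough way to see where the 8 comes from, as a simulation-only estimate. It assumes the RAM is built in true dual-port mode, where (as far as I know from Intel's M9K documentation) one block is at most 18 bits wide, i.e. 16 data bits and 8192 usable bits when parity is unused. The function names are made up and the Quartus fitter report is the real authority.

```systemverilog
// Simulation-only estimate of M9K usage; a simplified model, not Quartus' algorithm.
module m9k_estimate_demo;
    // Assumed figures for true dual-port mode with the parity bits unused.
    localparam int M9K_DATA_BITS     = 8192;  // usable bits per block
    localparam int M9K_MAX_TDP_WIDTH = 16;    // widest data port per block

    function automatic int ceil_div(int a, int b);
        return (a + b - 1) / b;
    endfunction

    // Blocks needed is the larger of "enough blocks side by side for the
    // widest port" and "enough blocks to hold all the bits".
    function automatic int m9k_blocks(int total_bits, int widest_port_bits);
        int by_width = ceil_div(widest_port_bits, M9K_MAX_TDP_WIDTH);
        int by_depth = ceil_div(total_bits, M9K_DATA_BITS);
        return (by_width > by_depth) ? by_width : by_depth;
    endfunction

    initial begin
        $display("16 x 8 bits    -> %0d M9K", m9k_blocks(16 * 8, 128));     // 8
        $display("1024 x 8 bits  -> %0d M9K", m9k_blocks(1024 * 8, 128));   // 8
        $display("8192 x 8 bits  -> %0d M9K", m9k_blocks(8192 * 8, 128));   // 8
        $display("16384 x 8 bits -> %0d M9K", m9k_blocks(16384 * 8, 128));  // 16
    end
endmodule
```

Under those assumptions the 128-bit port alone forces 8 blocks side by side, and those 8 blocks hold exactly 8KB, which matches what the megafunction reports.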

Quote from: BrianHG on 27 Nov, 2021 10:35
Also, check the erase block size. I don't know how big it is, but, to edit data within a block, if I remember correctly, you need to read that block, edit the bytes you want to change inside that read buffer, then erase that block, then, write that block with your edited buffer.

I'm assuming you mean the SD card's erase block size? I can find no mention of erase block size for M9Ks. It looks like the erase block size can vary from SD card to SD card - primarily based on its capacity and quality of its controller, I would imagine. A common erase block size seems to be 4KB, although there's no guarantee that would be the case for any particular SD card; this should be a moot point, however, as the SD card's controller should handle block editing/erasure as part of a write of any size to the SD card, along with wear levelling etc. Why do you ask, could this be an issue?

Quote from: BrianHG on 27 Nov, 2021 10:35
For the DDR3 interface, you will use the CMD_CLK. It is 100MHz. Check the SD controller, it may be written in a way where you can use the 100MHz and it will divide the output SD clock for you. Otherwise, you will be stuck with using a PLL or CLK_IN which will generate timing report errors and we will need to fix the .sdc file to fix those.

Ah yes, SDReader has a clock divider parameter for just that purpose. Okay, both sides of the cache RAM are being clocked at 100MHz from CMD_CLK now.
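
For illustration, a generic divider of this kind might look like the sketch below. This is not SDReader's actual code; the module and the `CLK_DIV` parameter are hypothetical.

```systemverilog
// Hypothetical divider: sd_clk = clk / (2 * CLK_DIV).
// With clk = 100MHz and CLK_DIV = 4 this gives 12.5MHz.
module sd_clk_div #(
    parameter int CLK_DIV = 4
) (
    input  logic clk,      // e.g. the 100MHz CMD_CLK
    input  logic rst,      // synchronous reset
    output logic sd_clk
);
    localparam int CNT_BITS = (CLK_DIV > 1) ? $clog2(CLK_DIV) : 1;
    logic [CNT_BITS-1:0] cnt;

    always_ff @(posedge clk) begin
        if (rst) begin
            cnt    <= '0;
            sd_clk <= 1'b0;
        end else if (cnt == CLK_DIV - 1) begin
            // Toggle the output every CLK_DIV input clocks.
            cnt    <= '0;
            sd_clk <= ~sd_clk;
        end else begin
            cnt <= cnt + 1'b1;
        end
    end
endmodule
```

Because everything stays in the one 100MHz clock domain, no extra PLL is involved.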

The project should now read a sector/512 bytes from an SD card and write it to the cache RAM. RD_RDY goes high when the data has been read (and written to the cache RAM as that happens automatically). Now I just need to write those 512 bytes to DDR3, so I'm going to need....
- wr_ena(x)
- (PORT_ADDR_SIZE)'(addr(x))
- (PORT_CACHE_BITS)'(wdata(x))
.. and a little later...
- read_req[y]
- read_ready[y]
- (128)'(read_data[y])
.. is that right? x and y will be the port numbers... x=3 and y=5?
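
A very rough sketch of the copy step, to show the shape of it: one 512-byte sector is 32 words of 128 bits. This is NOT the DDR3 controller's real interface. The `ddr3_busy` handshake (a write is taken on any clock where `wr_ena` is high and `ddr3_busy` is low), the port names, the 32-bit address and the 2-clock cache read latency are all assumptions for the example.

```systemverilog
// Hypothetical sector copier: reads 32 x 128-bit words from the cache and
// writes them to consecutive DDR3 addresses, one word at a time (not pipelined).
module sector_to_ddr3 #(
    parameter int SECTOR_BYTES  = 512,
    parameter int DDR3_BITS     = 128,
    parameter int ADDR_BITS     = 32,  // assumed DDR3 byte-address width
    parameter int CACHE_LATENCY = 2    // assumed cache read latency, must be >= 1
) (
    input  logic                 clk,
    input  logic                 rst,
    input  logic                 start,      // pulse once the sector is in the cache
    input  logic [ADDR_BITS-1:0] base_addr,  // destination byte address in DDR3
    // Cache read side (port B)
    output logic [$clog2(SECTOR_BYTES * 8 / DDR3_BITS)-1:0] cache_addr,
    input  logic [DDR3_BITS-1:0] cache_q,
    // Hypothetical DDR3 write port
    input  logic                 ddr3_busy,
    output logic                 wr_ena,
    output logic [ADDR_BITS-1:0] addr,
    output logic [DDR3_BITS-1:0] wdata,
    output logic                 done        // one-clock pulse after the last word
);
    localparam int WORDS      = SECTOR_BYTES * 8 / DDR3_BITS;  // 32
    localparam int WORD_BYTES = DDR3_BITS / 8;                 // 16

    typedef enum logic [1:0] {IDLE, FETCH, WRITE} state_t;
    state_t state;
    logic [$clog2(CACHE_LATENCY + 1)-1:0] wait_cnt;

    always_ff @(posedge clk) begin
        done <= 1'b0;
        if (rst) begin
            state      <= IDLE;
            wr_ena     <= 1'b0;
            cache_addr <= '0;
            wait_cnt   <= '0;
        end else begin
            case (state)
                IDLE: if (start) begin
                    cache_addr <= '0;
                    addr       <= base_addr;
                    wait_cnt   <= '0;
                    state      <= FETCH;
                end
                // Wait out the cache read latency, then latch the word.
                FETCH: if (wait_cnt == CACHE_LATENCY) begin
                    wdata  <= cache_q;
                    wr_ena <= 1'b1;
                    state  <= WRITE;
                end else begin
                    wait_cnt <= wait_cnt + 1'b1;
                end
                // Hold the write until the DDR3 port is not busy.
                WRITE: if (!ddr3_busy) begin
                    wr_ena <= 1'b0;
                    if (cache_addr == WORDS - 1) begin
                        done  <= 1'b1;
                        state <= IDLE;
                    end else begin
                        cache_addr <= cache_addr + 1'b1;
                        addr       <= addr + ADDR_BITS'(WORD_BYTES);
                        wait_cnt   <= '0;
                        state      <= FETCH;
                    end
                end
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

`start` would be derived from RD_RDY. The one-word-at-a-time loop is deliberately simple and trades speed for being easy to check in simulation.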

#2998 Reply
Posted by BrianHG on 27 Nov, 2021 21:51
Quote from: nockieboy on 27 Nov, 2021 14:35
The project should now read a sector/512 bytes from an SD card and write it to the cache RAM. RD_RDY goes high when the data has been read (and written to the cache RAM as that happens automatically). Now I just need to write those 512 bytes to DDR3, so I'm going to need....
- wr_ena(x)
- (PORT_ADDR_SIZE)'(addr(x))
- (PORT_CACHE_BITS)'(wdata(x))
.. and a little later...
- read_req[y]
- read_ready[y]
- (128)'(read_data[y])
.. is that right? x and y will be the port numbers... x=3 and y=5?

For now, this is ok. With the new V1.5 controller, each single port can read and write just like the DP RAM.
Remember, you need to set all the unused CMD_xxx ports to fixed values.

See if you can find a 'micro SD card verilog simulation model', like I have been using Micron's DDR3 model to test my controller. This way, you can simulate your interface design with a virtual SD card in ModelSim first. Then, tie that to my Z80 bus simulator wired into my DDR3 controller simulator to give you a full 100% test run before you even stick it into your GPU. This way, you will know if it's working first. I have given you a ton of example sim models so you can see how to set up ModelSim, but the Z80 bus will probably be the most useful.

#2999 Reply
Posted by nockieboy on 27 Nov, 2021 23:43
Quote from: BrianHG on 27 Nov, 2021 21:51
See if you can find a 'micro SD card verilog simulation model', like I have been using Micron's DDR3 model to test my controller. This way, you can simulate your interface design with a virtual SD card in ModelSim first.

Failing at this first hurdle here. I've not found anything yet. Will keep looking.
