-
Software: Ideas for fast streaming of disk data in Retro Z80 design?
Posted by
obiwanjacobi
on 02 May, 2016 07:27
-
Hi
I am building my Retro Z80 project (see blog link in signature) and am researching how best to hook up file storage. I figured offloading the Z80 CPU as much as possible would be preferable, to keep application programming as fast and simple as possible. I wouldn't mind putting a micro on the storage interface hardware to manage the details of interacting with an SD card or IDE/PATA HD, do local buffering, etc.
So instead of having the Z80 deal with drive sectors and the complexities of a file system, I would like to abstract all of that into a stream-based API. My first idea is to have a generic OpenStream function that takes a URI pointing to the resource, for example "file://[named storage]/root/directory/file.ext". Using the URI scheme has the advantage that other protocols (like http) can be accessed through the same API. The disadvantage is that the string-based nature of a URI probably introduces some overhead compared to a binary interface. However, my Z80 design has multiple 64k banks (1MB max), so that may be worth it. The Stream* that is returned could be compared with the C FILE struct, although physically it will probably be different.
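The first job of an OpenStream like this would be to split the URI into a scheme (to pick the protocol handler), the named storage and a path. A minimal sketch, assuming the bracket convention from the example URI; the names StreamUri and parse_stream_uri and the fixed buffer sizes are hypothetical, chosen so nothing needs a heap on the Z80:

```c
#include <string.h>

/* Hypothetical result of splitting "file://[named storage]/root/dir/file.ext". */
typedef struct {
    char scheme[8];    /* e.g. "file" or "http": selects the protocol handler */
    char storage[16];  /* text between '[' and ']', empty if absent */
    char path[64];     /* remainder, e.g. "/root/directory/file.ext" */
} StreamUri;

/* Copy src[0..len) into dst (capacity n) and terminate it.
   Returns 0 on success, -1 if it would not fit. */
static int copy_part(char *dst, size_t n, const char *src, size_t len)
{
    if (len >= n) return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Split uri into its parts. Returns 0 on success, -1 if malformed. */
int parse_stream_uri(const char *uri, StreamUri *out)
{
    const char *sep = strstr(uri, "://");
    const char *rest;
    if (sep == NULL) return -1;
    if (copy_part(out->scheme, sizeof out->scheme, uri, (size_t)(sep - uri)) != 0)
        return -1;
    rest = sep + 3;
    out->storage[0] = '\0';
    if (*rest == '[') {                      /* optional named storage */
        const char *close = strchr(rest, ']');
        if (close == NULL) return -1;
        if (copy_part(out->storage, sizeof out->storage, rest + 1,
                      (size_t)(close - rest - 1)) != 0)
            return -1;
        rest = close + 1;
    }
    return copy_part(out->path, sizeof out->path, rest, strlen(rest));
}
```

For "file://[root]" the path comes back empty, which a file handler could treat as "list the root of that storage". The parsing cost is a couple of linear scans, which is small next to the transfer itself.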
The file protocol handler behind OpenStream will drive the hardware interface that transfers the file content from storage. I see three options:
DMA - The storage interface hardware requests the Z80 bus and pumps a block of data directly into RAM (MMU RAM blocks are 4k). This option will probably be the fastest, but perhaps requires extra complexity in the software to manage the blocks?
Serial IO - Use I2C or SPI (UART?) to communicate the data serially. The problem here is that the Z80 has no hardware for this, so it would require extra hardware and would probably still be the slowest option.
Parallel IO - Use the Z80 input and output instructions to communicate with the storage interface hardware at a specific IO address. This would be fairly easy to do in both hardware and software.
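The parallel IO option usually boils down to a status/data handshake. A minimal sketch, assuming a hypothetical register layout (a STATUS port whose bit 0 means "data byte ready", and a DATA port that yields the next byte). On the real board the two port functions would be Z80 IN instructions (with SDCC, for example, via `__sfr __at` port declarations); here they read from a simulated device so the sketch runs on a PC:

```c
#include <stdint.h>
#include <stddef.h>

#define STATUS_READY 0x01u   /* hypothetical "byte ready" bit */

/* Simulated device: the bytes it will hand out, and how far it got. */
static const uint8_t *sim_data;
static size_t sim_len, sim_pos;

static uint8_t port_in_status(void)   /* stands in for IN A,(STATUS) */
{
    return (uint8_t)(sim_pos < sim_len ? STATUS_READY : 0);
}

static uint8_t port_in_data(void)     /* stands in for IN A,(DATA) */
{
    return sim_data[sim_pos++];       /* reading DATA advances the device */
}

/* Read up to len bytes, polling STATUS before each DATA read.
   max_polls bounds each wait so a dead device cannot hang the CPU.
   Returns the number of bytes actually read. */
size_t pio_read(uint8_t *buf, size_t len, unsigned max_polls)
{
    size_t n = 0;
    while (n < len) {
        unsigned polls = 0;
        while (!(port_in_status() & STATUS_READY)) {
            if (++polls >= max_polls) return n;   /* timed out */
        }
        buf[n++] = port_in_data();
    }
    return n;
}
```

If the interface can guarantee that a whole block is ready before the Z80 starts reading, the per-byte status poll can be dropped and the Z80's block input instruction INIR used instead, which is considerably faster than a polled loop.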
Functionally, streams also have to be seekable (for files at least), so the hardware interface must support that as well.
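Seeking only needs a small fixed-size command carrying a 32-bit offset, since the storage side does the actual work. A minimal sketch, assuming a hypothetical 8-byte layout (opcode CMD_SEEK, stream handle, origin, a reserved byte, then the offset little-endian, which is the Z80's native byte order); none of these values come from an existing device:

```c
#include <stdint.h>

enum { CMD_SEEK = 0x05 };   /* hypothetical opcode */
enum { SEEK_FROM_START = 0, SEEK_FROM_CURRENT = 1, SEEK_FROM_END = 2 };

/* Encode a seek request into out[0..8). Returns the number of bytes. */
int encode_seek(uint8_t out[8], uint8_t handle, uint8_t origin, int32_t offset)
{
    uint32_t u = (uint32_t)offset;   /* two's complement bit pattern */
    out[0] = CMD_SEEK;
    out[1] = handle;
    out[2] = origin;
    out[3] = 0;                      /* reserved, keeps the offset aligned */
    out[4] = (uint8_t)(u & 0xFF);    /* least significant byte first */
    out[5] = (uint8_t)((u >> 8) & 0xFF);
    out[6] = (uint8_t)((u >> 16) & 0xFF);
    out[7] = (uint8_t)((u >> 24) & 0xFF);
    return 8;
}
```

A signed offset lets SEEK_FROM_CURRENT and SEEK_FROM_END move backwards, mirroring what fseek offers on a C FILE.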
Finally, the question is whether this sounds reasonable, or are there better ways of doing this? If you have any experience with this, I would love to hear from you. Thanx!
-
#1 Reply
Posted by
hamster_nz
on 02 May, 2016 07:49
-
I have only limited experience with this, but it sounds very odd to put an OS-level (or even higher) layer of abstraction at the lowest level of the design.
I would have thought that you would want the lowest level to closely reflect the physical attributes and capabilities of the device being accessed (maybe block, character or packet, like in Unix land), and then build on top of that.
-
-
Why?
I know that is how it is usually done, but if you want to offload as much processing as possible from the central CPU, you need to go to a higher abstraction level. I want just enough control to be able to implement all the stream functions without any extra detail.
That would mean that low-level functions like format, defrag, etc. would indeed be implemented by the storage interface hardware and be available through a separate API.
-
#3 Reply
Posted by
hamster_nz
on 02 May, 2016 08:44
-
Why?
Enumerate the API/features you want to support, and then the different resources you want to access through them. They naturally fall into groups (with a few oddballs like I2C addressing and configuration).
Open, close, create, seek, read, write, some sort of wait for data, error detection/status, some way to enumerate and organise things into a hierarchy.
Maybe have a look at how a high-end SCSI card works, with a command queue that is used to pass requests and data to/from the target. That is a pretty generic 'talk with anything' interface.
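One way to read the command-queue suggestion: a small ring buffer of fixed-size requests that the Z80 fills and the I/O processor drains. A minimal sketch; the opcodes follow the operation list above, but their values, the Request layout and the queue depth of 8 are assumptions. It relies on a single producer, a single consumer and a simple in-order 8-bit bus; modern multi-core CPUs would additionally need memory barriers:

```c
#include <stdint.h>

/* Operations from the list above, as hypothetical one-byte opcodes. */
typedef enum {
    OP_OPEN, OP_CLOSE, OP_CREATE, OP_SEEK, OP_READ, OP_WRITE,
    OP_WAIT, OP_STATUS, OP_ENUMERATE
} Opcode;

typedef struct {
    uint8_t  op;       /* one of Opcode */
    uint8_t  handle;   /* stream the request applies to */
    uint16_t length;   /* bytes to transfer, or argument length */
} Request;

#define QUEUE_SIZE 8   /* power of two so the index can be masked */

/* head is written only by the producer (Z80), tail only by the consumer
   (I/O processor). Both are free-running 8-bit counters: 256 is a
   multiple of QUEUE_SIZE, so masking and head - tail stay correct
   across wrap-around. */
typedef struct {
    Request slot[QUEUE_SIZE];
    volatile uint8_t head;
    volatile uint8_t tail;
} CommandQueue;

/* Producer side. Returns 0 on success, -1 if the queue is full. */
int queue_push(CommandQueue *q, Request r)
{
    if ((uint8_t)(q->head - q->tail) == QUEUE_SIZE) return -1;
    q->slot[q->head & (QUEUE_SIZE - 1)] = r;
    q->head++;                 /* publish only after the slot is filled */
    return 0;
}

/* Consumer side. Returns 0 on success, -1 if the queue is empty. */
int queue_pop(CommandQueue *q, Request *r)
{
    if (q->head == q->tail) return -1;
    *r = q->slot[q->tail & (QUEUE_SIZE - 1)];
    q->tail++;
    return 0;
}
```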
How do you envisage the equivalent of directories working? Do you want to be able to give objects attributes other than just holding data? (I.e. will you have a file system at all?)
-
-
Yes, the final shape of the API is not settled yet. I am doing exactly what you said: writing down the protocols I wish to support and working out their common denominators. I didn't mention it, so as not to blur the prime objective of this question. I am also thinking of streaming audio this way, to give you an idea of where my thoughts are going...
The functions/operations I plan to support may be derived from the set of HTTP verbs: get/post/del etc. Not sure yet; whatever makes the most sense across most protocols.
I have not yet worked out the needs for metadata or header data in any detail, but my initial idea was to be able to open meta-streams. That would allow getting/setting file attributes, as well as specifying HTTP headers if it ever came to that.
Hierarchical information is retrieved on a stream-per-node basis. So "file://[root]" would return a stream listing the directory, just like a dir command. The cool thing here is that you could do "file://[root]/*.txt" to return only text files.
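The "*.txt" filter could be applied on the storage side before the listing is streamed back, so the Z80 never sees the other names. A minimal sketch of such a matcher supporting only '*' and '?'; the name wildcard_match is hypothetical. It is iterative with a single backtrack point, which keeps stack use constant on a small system:

```c
/* Return 1 if name matches pat, where '*' matches any run of characters
   (including none) and '?' matches exactly one character; else 0. */
int wildcard_match(const char *pat, const char *name)
{
    const char *star = 0;    /* most recent '*' in the pattern */
    const char *resume = 0;  /* name position that '*' currently covers up to */
    while (*name) {
        if (*pat == '*') {
            star = pat++;            /* first try letting '*' match nothing */
            resume = name;
        } else if (*pat == '?' || *pat == *name) {
            pat++;
            name++;
        } else if (star) {
            pat = star + 1;          /* let the last '*' absorb one more char */
            name = ++resume;
        } else {
            return 0;
        }
    }
    while (*pat == '*') pat++;       /* trailing stars match the empty rest */
    return *pat == '\0';
}
```

With this, a listing request for "file://[root]/*.txt" would keep "notes.txt" and drop both "notes.doc" and "a.txt.bak".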
Specific streams will have helper code for reading and writing. So a set of FileStreamReader functions would help with navigating the stream content.
(I am also thinking of adding extra support for chunk files, which I may use as a program/executable format. Files like .wav and .mid are chunk files.)
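Both formats mentioned are sequences of chunks with a 4-byte ID and a 32-bit length: .wav files are RIFF, with little-endian lengths and odd-sized payloads padded to an even size, while Standard MIDI Files (.mid) use MThd/MTrk chunks with big-endian lengths. A minimal sketch of a helper that a chunk-aware stream reader could use; find_chunk is a hypothetical name, and it says nothing about what a chunk-based executable format would contain:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Read a 32-bit length, little-endian (RIFF) or big-endian (MIDI). */
static uint32_t read_len(const uint8_t *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | p[0];
}

/* Walk chunks (4-byte id, 4-byte length, payload) in buf and return the
   payload of the first chunk whose id matches, storing its length in
   *len. pad_even = 1 for RIFF, 0 for MIDI. Returns NULL if the chunk is
   absent or the data is truncated. */
const uint8_t *find_chunk(const uint8_t *buf, size_t size, const char id[4],
                          int big_endian, int pad_even, uint32_t *len)
{
    size_t pos = 0;
    while (size - pos >= 8) {                  /* pos <= size holds here */
        uint32_t n = read_len(buf + pos + 4, big_endian);
        if (n > size - pos - 8) return NULL;   /* chunk runs past the end */
        if (memcmp(buf + pos, id, 4) == 0) {
            *len = n;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)n + (pad_even ? (n & 1u) : 0);
        if (pos > size) return NULL;           /* padding ran past the end */
    }
    return NULL;
}
```

For a .wav file you would first find the top-level "RIFF" chunk, skip its 4-byte "WAVE" form type, and then search the remainder for "fmt " and "data".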
-
#5 Reply
Posted by
stmdude
on 02 May, 2016 09:49
-
Have a look at what CBM/Commodore did with their disk drives for the VIC20/C64/C128. It seems very similar to what you want to do, so maybe you can get some ideas from there.
-
-
Great idea, thanx!
Edit: found this:
http://c64emulator.111mb.de/c64/docu/IEC_1541_info.pdf
Very low level...
-
#7 Reply
Posted by
ale500
on 03 May, 2016 04:20
-
I think you have two different concepts here that (IMHO) need to be separated.
The first one is getting data from a (any) source quickly: use DMA, burst modes and so on.
The second is a (software, OS) layer that allows you to request/send a variety of content to/from different devices.
CBM solved this (as did many other OSs) with the IEC bus (sadly, no DMA) and with its KERNAL. "Device not present" was a great feat. The PC didn't have anything like that... although copying to serial or to the printer and so on was there.
I'd not confuse the transport with the kind of data you put on the bus. Build on something that already works; you do not need to reinvent everything. When you want to test something, the best way is to have some server software on your PC and "serve" content via RS-232 (at 1 Mbit/s, for instance). It is easy to set up, debug and program. Even testing a new ROM can be done this way...
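For a feel of what 1 Mbit/s means for block transfers, assuming 8N1 framing (start bit, 8 data bits, stop bit, so 10 bits per byte) and no protocol overhead:

```latex
\frac{10^{6}\,\text{bit/s}}{10\,\text{bit/byte}} = 10^{5}\,\text{byte/s},
\qquad
\frac{4096\,\text{byte}}{10^{5}\,\text{byte/s}} \approx 41\,\text{ms per 4k block}
```

That is plenty for testing and for serving ROM images, even if it is slow compared with a parallel or DMA path.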
-
-
Yes, there is the hardware transport and the software API. But I felt that a specific choice of hardware may have consequences for how to structure the software. Since I sort of have an idea of how I want the software to look, would that have consequences for the hardware? I understand that you could see these as separate, but would that be optimal? My question here is to gather more input for making these decisions.
-
#9 Reply
Posted by
MrSlack
on 03 May, 2016 05:50
-
Another one. If you're using an MMU, you can use a virtual address space and memory-mapped files/streams. The pages don't have to be held in RAM initially. With some logic you can demand-load them in 4k blocks from whatever you are streaming from when a page fault occurs.
I don't know how complicated your MMU is going to be though.
-
-
I do not have virtual memory in the x86 sense. I do have multiple 64k banks and 20 memory address lines, organized in 4k blocks that I can put anywhere in the active (64k) memory address space. A couple of blocks will need to be fixed (page 0, stack, etc.); all others can be selected/deselected.
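The address arithmetic behind this scheme, as a minimal sketch: 20 address lines give 1MB, i.e. 256 physical 4k blocks, and the 64k logical space holds 16 slots of 4k, so the top 4 bits of a Z80 address select a slot and the low 12 bits pass through unchanged. The slot_map array and function names are hypothetical stand-ins for the real mapping registers:

```c
#include <stdint.h>

#define BLOCK_SHIFT 12        /* 4k = 2^12 bytes */
#define BLOCK_MASK  0x0FFFu

/* slot_map[i] = physical block (0..255) currently selected into
   logical slot i (0..15). */
static uint8_t slot_map[16];

/* Translate a 16-bit Z80 address to a 20-bit physical address. */
uint32_t mmu_translate(uint16_t logical)
{
    uint8_t slot = (uint8_t)(logical >> BLOCK_SHIFT);   /* top 4 bits */
    return ((uint32_t)slot_map[slot] << BLOCK_SHIFT) | (logical & BLOCK_MASK);
}

/* Select physical block 'block' into logical slot 'slot'. */
void mmu_select(uint8_t slot, uint8_t block)
{
    slot_map[slot & 0x0F] = block;
}
```

A DMA engine on the storage side would work purely in physical addresses, so it only needs to know which physical block to fill; the Z80 then selects that block into whichever slot the stream code reads from.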
Actually, I am currently working out the details of this in software (together with integrating it into SDCC/z88dk). So I do not have the final details yet, but the idea of the DMA transfer would be to use 4k blocks of memory to store (parts of) the file in, with the stream abstraction on top of that. When a block has almost been read in its entirety, a new block can be fetched (asynchronously) and reading can continue in the new 4k block. For a short time you need two 4k blocks; otherwise only one. For other protocols I hope I can use the same mechanism...?
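The "fetch the next block when the current one is almost read" idea is classic double buffering. A minimal sketch, assuming two hypothetical hooks: start() asks the device to begin filling a block (e.g. by DMA) and returns at once, and wait() blocks until that transfer is done and returns the byte count, where fewer than 4096 bytes means end of file. The 512-byte prefetch threshold is an arbitrary choice. A PC-side stand-in for the device is included so the sketch runs:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  4096u               /* one MMU block */
#define PREFETCH_AT (BLOCK_SIZE - 512u) /* assumed "almost read" point */

typedef struct {
    void   (*start)(uint8_t *dst, void *ctx);  /* hypothetical hook */
    size_t (*wait)(void *ctx);                 /* hypothetical hook */
    void    *ctx;
    uint8_t *block[2];   /* two 4k blocks selected in by the MMU */
    size_t   fill;       /* valid bytes in block[cur] */
    size_t   pos;        /* read position in block[cur] */
    int      cur;        /* index of the block being read */
    int      in_flight;  /* a fetch into block[1 - cur] was started */
} BlockStream;

void bs_open(BlockStream *s)
{
    s->start(s->block[0], s->ctx);   /* first block: nothing to overlap */
    s->fill = s->wait(s->ctx);
    s->cur = 0;
    s->pos = 0;
    s->in_flight = 0;
}

/* Return the next byte, or -1 at end of stream. */
int bs_getc(BlockStream *s)
{
    /* A full block is almost consumed: fetch the next one into the other
       buffer so the transfer overlaps with the remaining reads. */
    if (!s->in_flight && s->fill == BLOCK_SIZE && s->pos >= PREFETCH_AT) {
        s->start(s->block[1 - s->cur], s->ctx);
        s->in_flight = 1;
    }
    if (s->pos == s->fill) {
        if (!s->in_flight) return -1;   /* last block was short: done */
        s->fill = s->wait(s->ctx);      /* ideally already finished */
        s->in_flight = 0;
        s->cur = 1 - s->cur;            /* swap buffers */
        s->pos = 0;
        if (s->fill == 0) return -1;    /* file size was a multiple of 4k */
    }
    return s->block[s->cur][s->pos++];
}

/* PC-side stand-in for the device: the "file" is a byte array and a
   fetch completes immediately, so wait() never actually blocks. */
typedef struct { const uint8_t *data; size_t size, off, last; } SimFile;

static void sim_start(uint8_t *dst, void *ctx)
{
    SimFile *f = ctx;
    size_t n = f->size - f->off;
    if (n > BLOCK_SIZE) n = BLOCK_SIZE;
    memcpy(dst, f->data + f->off, n);
    f->off += n;
    f->last = n;
}

static size_t sim_wait(void *ctx) { return ((SimFile *)ctx)->last; }
```

The same reader works for any protocol that can fill a block on request, which is one answer to the closing question: the stream code only depends on start() and wait(), not on what is behind them.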
-
#11 Reply
Posted by
MrSlack
on 03 May, 2016 06:35
-
That would work. The first Unix versions used a similar method of abstracting streams. Look for Lions' Commentary on UNIX; there is a lot in it about this sort of stuff.
-
-
You mean something like this and this?
-
#13 Reply
Posted by
MrSlack
on 03 May, 2016 19:09
-
Those are the ones. Also Operating Systems Design and Implementation by Tanenbaum.
-
#14 Reply
Posted by
ale500
on 04 May, 2016 04:37
-
But I felt that a specific choice on the hardware may have consequences for how to structure the software.
Yes, but that is exactly why I suggested using a serial connection for testing: abstraction. It adds a layer, but it keeps your software/API independent of whether the transport medium is a disk that answers in bulk or a serial link that gives you one char every other moon. The Unix concept "everything is a file" may not be too far off.
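The abstraction point can be made concrete with a read function pointer: code above it cannot tell a bulk transport from a trickling one. A minimal sketch with two hypothetical demo transports reading from a byte array; in a real system each would talk to its own hardware, and the names Stream and stream_read_exact are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Stream {
    /* Read up to len bytes into buf; return bytes read (0 = end). */
    size_t (*read)(struct Stream *s, uint8_t *buf, size_t len);
    const uint8_t *src;   /* backing data for the demo transports */
    size_t size, pos;
} Stream;

/* Demo transport 1: delivers as much as asked for, like a block device. */
static size_t bulk_read(Stream *s, uint8_t *buf, size_t len)
{
    size_t n = s->size - s->pos, i;
    if (n > len) n = len;
    for (i = 0; i < n; i++) buf[i] = s->src[s->pos + i];
    s->pos += n;
    return n;
}

/* Demo transport 2: at most one byte per call, like a slow UART. */
static size_t trickle_read(Stream *s, uint8_t *buf, size_t len)
{
    return bulk_read(s, buf, len ? 1 : 0);
}

/* Transport-independent helper: call read until len bytes have arrived
   or the stream ends. Returns the number of bytes obtained. */
size_t stream_read_exact(Stream *s, uint8_t *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        size_t n = s->read(s, buf + got, len - got);
        if (n == 0) break;
        got += n;
    }
    return got;
}
```

Swapping the test transport for the real one then only means swapping the read function; the stream code above it stays untouched.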
-
#15 Reply
Posted by
Rasz
on 04 May, 2016 16:28
-
Sounds like you want a network file system.
-
-
In a sense, yes, if you replace the network with an internal bus.
-
#17 Reply
Posted by
legacy
on 06 May, 2016 16:25
-
What about a super simple NFS over a super fast serial link @ 1Mbps?
You can design it in HDL.
-
-
Ok, perhaps the situation is not clear (or I don't get what you're saying).
On one side there's the program running on the Z80, and on the other side some kind of smart IO device, in this case an HD or SD card. "Smart" means that it has the brains to raise the protocol above the level of the native device it interfaces with. I want to extend this idea to other types of IO devices as well, if at all possible.
The idea is to exchange data at an optimal rate between the Z80 program and the smart IO device. Because the Z80 is a processor and not a microcontroller, it has no on-board serial capabilities. What it can natively do is read/write 8 bits from/to the memory address space and the IO address space (both 64k).
So you see, any serial interface would require extra hardware and translation, because the Z80 is a parallel beast, not a serial one. The two best/simplest options that remain are DMA (the smart IO device reads/writes data directly from/to Z80 memory) or basic IO instructions (using the Z80 IO address space).
As for HDL, I have (more than) a couple of Altera MAX II CPLDs lying around that I want to use for this project to keep it low cost. I already use one to do all the address, IO and other decoding (enabling memory chips etc.) for the Z80 system. I know a little VHDL.
I think that DMA would be the optimal option. The only problem is that previously I only had one 'entity' able to request the bus, the System Controller (a PSoC5 that takes care of PC-related comms, including downloading the initial software; the system has no ROM), whereas now we would be moving to a scenario with possibly multiple 'masters'. But that should not be a problem. It would only require some small changes to the PSoC5 software, which currently assumes it is king...
So with the hardware transport figured out, the real challenge that remains is the software protocol/API. What would be the optimal interaction between the smart IO device and the Z80 stream functions? Of course I can implement this naively, and I am sure I can get it running. But perhaps there are existing algorithms that work better. I have looked at the Unix v6 code, but I cannot see any magic there. So perhaps there is no magic...?
-
#19 Reply
Posted by
hamster_nz
on 07 May, 2016 08:40
-
Hum. Dual-ported RAM and simple signalling would be the least effort.
A 4k block of DP RAM, an 8-bit register for ownership, an 8-bit register to indicate CMD/data, and an IRQ signal. The 4k block is logically divided into 8 512-byte buffers.
To start a transaction, the Z80 looks for a block it has ownership of, sets the command bit, then writes the command into the dual-port RAM. It then sets the ownership bit to hand the block over to the external I/O processor.
When the command is accepted, the I/O processor writes the reply into the buffer, clears the ownership bit, and pings an IRQ to the Z80.
If a data transfer is required (e.g. write or read), the Z80 fills the buffer, clears the command bit, and sets the ownership of the block back to the I/O processor. The I/O processor can then use that buffer to pass data back to the Z80.
That way you can have up to 8 I/O commands active at once.
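The handshake above, modelled in C as a minimal sketch. The SharedIo layout, the bit meanings (one bit per buffer; ownership bit set means the I/O processor owns it, command bit set means the buffer holds a command) and the function names are assumptions for illustration. The IRQ is only a comment, and z80_busy is purely Z80-side bookkeeping so that a buffer which is mid-transaction is not picked for a new command:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NBUF    8     /* 4k of dual-port RAM ... */
#define BUFSIZE 512   /* ... split into 8 x 512-byte buffers */

typedef struct {
    uint8_t buf[NBUF][BUFSIZE];
    volatile uint8_t owner;  /* register: bit i set = I/O processor owns buffer i */
    volatile uint8_t cmd;    /* register: bit i set = buffer i holds a command */
    uint8_t z80_busy;        /* Z80-local, not hardware: bit i set = transaction open */
} SharedIo;

/* Z80: pick a buffer it owns and is not using, write the command, then
   hand the buffer over. Returns the buffer index, or -1 if none is free. */
int post_command(SharedIo *io, const uint8_t *c, size_t len)
{
    int i;
    if (len > BUFSIZE) return -1;
    for (i = 0; i < NBUF; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if (!((io->owner | io->z80_busy) & bit)) {
            io->z80_busy |= bit;
            io->cmd |= bit;              /* mark as command */
            memcpy(io->buf[i], c, len);
            io->owner |= bit;            /* hand over only when complete */
            return i;
        }
    }
    return -1;
}

/* I/O processor: write the reply (len <= BUFSIZE assumed) and give the
   buffer back. On real hardware it would then pulse the IRQ to the Z80. */
void complete_command(SharedIo *io, int i, const uint8_t *reply, size_t len)
{
    memcpy(io->buf[i], reply, len);
    io->owner &= (uint8_t)~(1u << i);
}

/* Z80: data phase of a read/write, reusing the same buffer
   (len <= BUFSIZE assumed). */
void post_data(SharedIo *io, int i, const uint8_t *d, size_t len)
{
    uint8_t bit = (uint8_t)(1u << i);
    memcpy(io->buf[i], d, len);
    io->cmd &= (uint8_t)~bit;            /* now holds data, not a command */
    io->owner |= bit;
}

/* Z80: transaction finished; the buffer may carry a new command. */
void finish_transaction(SharedIo *io, int i)
{
    io->z80_busy &= (uint8_t)~(1u << i);
}
```

Because each buffer has its own ownership and command bit, the eight transactions proceed independently, which is what gives the "up to 8 commands active at once" property.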
If you put the dual-port memory in the correct address window, you could also use it to bootstrap the Z80.
-
#20 Reply
Posted by
legacy
on 07 May, 2016 10:22
-
So you see, any serial interface would require extra hardware
         -------------------
         |      super uart0 |===== NFS-like over serial on Cat5 ===== Linux SoC @ media {SD, USB HD, ...}, fs {Ext2, Ext3, VFAT, ...}
cpu ---- | FIFO super uart1 |=====
         |      super uart2 |=====
         -------------------
                 fpga
-
#21 Reply
Posted by
legacy
on 07 May, 2016 10:23
-
Hum. Dual-ported RAM and simple signalling would be the least effort.
This is a good solution; it is used in avionics.
-
#22 Reply
Posted by
C
on 07 May, 2016 15:01
-
When looking at a mass storage interface, I think you need to look at the overhead needed to get data transparency. A byte in a block of data in memory or on mass storage can have any value. Bytes in the control and status information can also have any value.
Some hardware interfaces can handle this in hardware.
You have hardware-based packet interfaces like Ethernet, HDLC and SDLC, where the hardware marks the start of a packet. A count value in the packet then keeps the three (command, status, data) separate.
SCSI uses a 9th wire alongside the 8-bit data bus to do the separation. In addition, it allows many masters on the same bus.
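Without a 9th wire or hardware framing, data transparency can also be achieved in software. Swapping in a different technique from the count-based framing mentioned above, here is a minimal sketch of the byte stuffing used by asynchronous HDLC-like framing in PPP (RFC 1662): 0x7E delimits frames, and any 0x7E or 0x7D in the payload is sent as 0x7D followed by the byte XOR 0x20. The function names are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

#define FLAG 0x7E   /* frame delimiter */
#define ESC  0x7D   /* escape: the next byte was XOR-ed with 0x20 */

/* Frame len bytes of src into dst as FLAG ... FLAG, escaping FLAG and ESC
   bytes so they can never be mistaken for a delimiter. dst needs room for
   2 * len + 2 bytes in the worst case. Returns the framed length. */
size_t frame_encode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t n = 0, i;
    dst[n++] = FLAG;
    for (i = 0; i < len; i++) {
        if (src[i] == FLAG || src[i] == ESC) {
            dst[n++] = ESC;
            dst[n++] = (uint8_t)(src[i] ^ 0x20);
        } else {
            dst[n++] = src[i];
        }
    }
    dst[n++] = FLAG;
    return n;
}

/* Undo frame_encode for one complete frame. Returns the payload length,
   or (size_t)-1 if the frame is malformed. */
size_t frame_decode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t n = 0, i;
    if (len < 2 || src[0] != FLAG || src[len - 1] != FLAG) return (size_t)-1;
    for (i = 1; i < len - 1; i++) {
        if (src[i] == FLAG) return (size_t)-1;         /* delimiter inside */
        if (src[i] == ESC) {
            if (++i >= len - 1) return (size_t)-1;     /* dangling escape */
            dst[n++] = (uint8_t)(src[i] ^ 0x20);
        } else {
            dst[n++] = src[i];
        }
    }
    return n;
}
```

Note the cost: in the worst case the payload doubles, and software has to inspect every byte, which is exactly the kind of overhead to weigh against a hardware solution.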
A UART is a very poor choice for mass storage.
Look at how many times a block of memory must be moved to/from mass storage,
and how much software time is needed.
TurboDOS
A Multiprocessor Operating System for Z-80-based computers
http://www.cpm8680.com/turbodos/TurboDOS 14 Implementers Guide.pdf
http://maben.homeip.net/static/S100/IMS/software/IMS%20Z80%20TurboDOS%2014%20Implementers%20Guide.pdf
The main limitation of TurboDOS was keeping compatibility with CP/M user programs.
-
-
Using DP RAM is an interesting idea. I am not sure it's the least effort, though, because I already have 4 banks of 64k on my board, but no DP RAM... Also, DP RAM is excellent when there is contention on the RAM addresses. Here, with a block-based transfer of data, I do not see that contention. It would save the Z80 from being interrupted by a BUSREQ, though... Is that the reason to use it?
As for serial, I still don't see how that is 'better'. It just adds hardware and translations/transformations... Or I am really not getting it.
I do not think I want to fix any parameters (packet size) in hardware; I already have a fixed memory block size of 4k. I can imagine that it might be even faster if I did, and perhaps I will never need the flexibility, but I just don't want to commit to any value(s) at this point.
-
#24 Reply
Posted by
hamster_nz
on 07 May, 2016 20:25
-
Quite a few people have offered ideas, most of which have been discounted.
Maybe we should turn the question around... how do YOU suggest YOU will implement a fast, high-level abstracted I/O interface, on an 8-bit CPU from 40 years ago, without using a serial interface, without using a shared memory space, without having an arbitrary restriction on transfer size, and without having a list of the feature set that you want to support in the first place?
WE can then discount them as being unworkable because they will require work, or because they don't allow some feature that you did not know you wanted.