Yes, technically AVR can do up to 256 pixels on X, but you need at least a 9 MHz instruction cycle. That's (let's round it) a 20 MHz clock. I knew 6 MHz is too low, no matter how many tricks you have up your sleeve. I'm not sure what's left - 10 us? - I can't find a proper specification of the CGA border and retrace. But in theory you could load 32 bytes in 100 cycles.
What's the CGA pixel clock? Over 3.8MHz, at least (at 320 pixels). So yeah, you need more clock, so what? Double would seem to be quite achievable, at least for single-cycle ISAs like AVR. And anyway, with MCUs in this class generally offering 16 to 32MHz (~= MIPS) operation, you have no reason or need to limit your clock rate, and with the extra cycles available, reading from a RAM/ROM buffer becomes a delicious prospect.
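To put rough numbers on the cycle budget, here is a small sketch. The dot clock is an assumption and not from the post: the figure usually quoted for CGA is 14.31818 MHz in 640-wide modes, so half that for 320-wide. The function names are made up for illustration, and the core is assumed to execute one instruction per clock.

```python
# Rough cycle budget per pixel and per byte. All figures are assumptions
# for illustration, not measured values.
CGA_DOT_CLOCK_320 = 14_318_180 / 2  # Hz, assumed dot clock for 320-wide modes


def cycles_per_pixel(cpu_hz, pixel_hz=CGA_DOT_CLOCK_320):
    """Instruction cycles available per pixel on a one-cycle-per-instruction core."""
    return cpu_hz / pixel_hz


def cycles_per_byte(cpu_hz, pixels_per_byte, pixel_hz=CGA_DOT_CLOCK_320):
    """Cycles available to fetch and hand off one byte of packed pixels."""
    return pixels_per_byte * cycles_per_pixel(cpu_hz, pixel_hz)


if __name__ == "__main__":
    for mhz in (6, 16, 20, 32):
        cpu_hz = mhz * 1_000_000
        print(f"{mhz:2d} MHz: {cycles_per_pixel(cpu_hz):.2f} cycles/pixel, "
              f"{cycles_per_byte(cpu_hz, 8):.1f} cycles/byte at 8 pixels/byte")
```

Under these assumptions, anything below the dot clock leaves less than one cycle per pixel, so pure bit-banging is out and the byte has to go through a shift register or SPI.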
Writing it in C, probably not, but that's pretty obvious for anything the least bit time-critical.
A 16-bit PIC is probably comparable or even better in performance. Stretching that to 320 width and more lines shouldn't be too big of a deal. Sheer bit-banging is still met with the same serialized restriction (you might as well go with an overclocked, sequential, one-bit architecture!), but you have more capability to organize and ready data until it gets shifted out, so that still helps.
Knowing your hardware intimately is the biggest help. Even on the 8-bit PICs, you can probably manage much better. Example: pump a page of RAM into the output port (maybe something like, read RAM into W, increment pointer, output W to port, shift W, output port again, etc.), then shift the page for the next line and so on. If possible, generate new lines in the background somehow (or change them progressively, or..).
I only know a little about the old PICs, so I can't be of much help there. But figuring it out is half the fun, anyway.
BTW, what do you need square pixels for? Off-square is just a transformation away. It's not a bug, it's a feature! Use it to your advantage. A power-of-2 screen width is nice for computing graphs, for example.
I still want to use an ISA card. What's holding me back for now is soldering a 60-pin connector to 5-6 latch registers. That's a ton of cables. It might be cheaper to just draw and order a two-layer PCB.
Blah, whine whine whine
Just be thankful you aren't wire-wrapping a thousand at once -- and having to patch the one wire that's buried under it all!
http://www.futurlec.com/Protoboards.shtml#ISABUSBRD
It looks like they're still around, or you might be able to find an edge connector to ribbon cable adapter.
They also have a PCI proto board (through hole), which is... kind of scary...
One thing you mentioned got my attention: making a continuous SPI. I didn't fully understand what you said, but I'll read it a few times and try to put it together in my mind. I have to think about how to make it latch at the right moment - every 8 clock cycles. The 74HC165 is the one (or one of them) you were looking for - parallel in, serial out - but I need a 3-bit counter to divide the serial clock by 8 in order to latch with no gap. And yes, I'm not sure what I can do with RAM. Some PICs have a parallel streaming port with built-in support for external RAM. You can access external RAM the same way you access the internal RAM, with the difference that you also have it on a bus where you can latch a shift register like the 74HC165 from... give me enough time and money ...
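The divide-by-8 latching described above can be sketched as a small behavioural model. This is an idealized parallel-in, serial-out register, not a timing-accurate 74HC165 (the real part's load input is asynchronous, which the model ignores), and `serialize` is a made-up name. It assumes the MSB is wired to come out first.

```python
def serialize(data):
    """Shift bytes out one bit per pixel clock, reloading every 8th clock.

    A 3-bit counter runs off the pixel clock. When it reads 0, the next
    byte is latched into the shift register, so the last bit of one byte
    is followed directly by the first bit of the next.
    """
    pending = list(data)
    bits = []
    counter = 0    # 3-bit counter clocked by the pixel clock
    shift_reg = 0  # 8-bit parallel-in, serial-out register
    for _ in range(8 * len(pending)):
        if counter == 0:
            shift_reg = pending.pop(0)         # load strobe, once per 8 clocks
        bits.append((shift_reg >> 7) & 1)      # serial output is the top bit
        shift_reg = (shift_reg << 1) & 0xFF    # shift toward the output
        counter = (counter + 1) & 0b111        # wraps 7 -> 0
    return bits


if __name__ == "__main__":
    print(serialize([0xA5, 0x0F]))
```

The catch in real hardware is that the next byte must already be sitting on the parallel inputs when the counter wraps, which is the timing problem the rest of this thread is about.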
Also, can you explain more about the 1-bit FIFO? How will it help to close the gap? And how do you implement it?
Eh, I shouldn't have phrased it that way; more specifically, a one-bit wide channel with N bits of FIFO length. Clock bits in one end and out the other.
It goes between clock domains, so it needs to have dual clocking capability. It doesn't need to be a synchronizing register as such. It probably needs to be more than that.
One possible realization (not necessarily optimal, in terms of gates, or chips, or..) would be:
Input: SCK in, MOSI in
Output: SCK in, MOSI out
Input SCK clocks a counter. It might be a simple synchronous binary counter, N bits. It selects a one-of-2^N decoder, which feeds the clock input of 2^N type-D flip-flops. All the flip-flops have D = MOSI in.
Output SCK clocks a counter. Same idea. It selects a 1-of-2^N mux, which selects one of the flip-flop outputs, which goes to MOSI out.
The counters also have master resets tied to a secondary pin.
At the beginning of a line, you strobe RST. So RST might simply be horizontal retrace. The pixel clock is independent of CPU clock, and starts advancing the output address. Right away, it's reading gibberish (actually, previous FIFO contents, in a circular loop -- since the counter overflows on its own), but that's fine because we're in retrace. Once bytes are readied and SPI starts spitting things out, sooner or later it'll connect and be on its way.
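A minimal behavioural sketch of that arrangement follows. The names `BitFifo` and `stream_line` are made up, and so are the depth of 16 bits (N = 4) and the schedule: the CPU bursts each byte somewhere inside the 8-pixel window before that byte is due. The write and read clocks are modelled as independent method calls rather than real clock domains.

```python
class BitFifo:
    """One-bit-wide circular buffer: 2^N flip-flops, two address counters."""

    def __init__(self, addr_bits=4):
        self.size = 1 << addr_bits
        self.cells = [0] * self.size  # the 2^N D flip-flops
        self.wr = 0                   # counter clocked by the input SCK
        self.rd = 0                   # counter clocked by the pixel clock

    def reset(self):
        """RST strobe (e.g. horizontal retrace): zero both counters."""
        self.wr = 0
        self.rd = 0

    def clock_in(self, bit):
        """Input SCK edge: the decoder clocks MOSI into the addressed flip-flop."""
        self.cells[self.wr] = bit & 1
        self.wr = (self.wr + 1) % self.size

    def clock_out(self):
        """Pixel clock edge: the mux presents the addressed flip-flop."""
        bit = self.cells[self.rd]
        self.rd = (self.rd + 1) % self.size
        return bit


def stream_line(data, offsets):
    """Send one line. offsets[j] (0..7) is when, inside its 8-pixel window,
    the CPU bursts byte j at full SPI speed. Returns the visible bits."""
    fifo = BitFifo(addr_bits=4)
    fifo.reset()
    out = []
    for t in range(len(data) + 2):        # 2 byte-times of lead-in = one full wrap
        for tick in range(8):
            j = t - 1                     # byte j is written one window early
            if 0 <= j < len(data) and tick == offsets[j]:
                for i in range(7, -1, -1):
                    fifo.clock_in((data[j] >> i) & 1)
            out.append(fifo.clock_out())
    return out[16:]                       # drop the stale bits read during retrace


if __name__ == "__main__":
    print(stream_line([0xA5, 0x3C, 0xFF], offsets=[0, 7, 3]))
```

In this model the burst can land anywhere in its window and the output is unchanged, because the writer is always working on the half of the buffer the reader is not in.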
You need well defined software delays to begin delivering data to this buffer, but the SPI clock can be as fast as possible and it doesn't matter, it doesn't need to be synchronized to the data. And you have much more leeway for timing between bytes.
You would have to re-time the horizontal retrace signal to the video clock domain, otherwise you'll end up with dot crawl or something like that. Keeping the processor harmonically locked to the pixel clock would also be desirable. But that's fine, nothing a PLL can't do.
The largest example of such an architecture would be, dual-port RAM the size of the frame buffer itself, in which case no refresh is even needed, you can just let the FIFO loop around and there's your frame. And the input address can then be random-access, because it's no longer having to stream an entire frame at once.
Some time ago I stumbled upon dual-access RAM. You can access it from two places at the same time, so you can read and write simultaneously. You can write the next row while the previous one is being sent. However, I don't remember where I've seen this RAM. It was made in quite small sizes - like 1k or something. Not sure if I can get 16k and buffer a full frame.
I have a few chips lying around, I believe enough to do something like this. I kind of don't care enough to put such a thing together, though... I already have the original hardware in my three PCs, so what good is it?
And the XT demo is not full-speed video. It's interlaced and you can't see fast movements very well. It's similar to TV, but I think they use more passes than just even/odd. So it's equivalent to having a slower frame rate - you still don't have the full 25 frames/s of information. But for slower videos the missing information is less noticeable. And I agree it's cool.
The demo isn't full speed, in the sense of full frame updates, but then: what can really be said to be full, anyway?
Zero video transmitted today (over any useful distance), whether by wire or through air, over the internet or via specific digital means, is uncompressed -- and all compressed video necessarily suffers from that type of artifact. How much so is a matter of degree and bitrate, nothing else. The most important sense of "full speed" is that something is happening, each and every frame, and the visual illusion is largely maintained. Complaining that it's not updating the entire frame, every frame (as if to spite the video RAM, as though we somehow don't need it anymore!), is a dumb complaint, just as dumb as using a handheld calculator to crank every cell of a spreadsheet when you could write it out in Excel instead. Work smarter, not harder! Do more with less (hardware and time, that is)!
Tim