Topic: TinyCADE: The 8-bit game system packing a chiptune synth and much more


Kozma04 (Topic starter)

  • Newbie
  • Posts: 3
  • Country: ro
Hi everyone,

For the last 2 years I have been working on TinyCADE, an Arduino-compatible game console designed to squeeze as much as possible out of an 8-bit platform. After 4 iterations, the prototype seems fairly mature.

TinyCADE is a handheld game console with three 8-bit AVR microcontrollers at its heart: an AVR128DB64 for running the games, an ATtiny1614 for making cool chiptunes by mixing up to 8 audio channels in real time, and an ATtiny212 for power management.

It also has a crisp 1.6-inch 128x64 grayscale OLED screen at 4bpp, 8 tactile switches (plus power and reset buttons), a 1W speaker, an RGB LED, SD card support, USB Type-C connectivity and a 600mAh LiPo battery providing around 5 hours of gameplay.

The chiptune synth software running on the ATtiny1614 can mix up to 8 channels, each with its own amplitude, waveform and duty cycle, at 44.1 kHz. It can also use 2 PCM waveforms and apply effects such as vibrato, sweep, ADSR envelope and more at 100 Hz.

Besides playing retro games made by others, the TinyCADE can take an "Extension Backpack" board, which essentially turns the console into an Arduino-compatible development board by breaking out the power supplies and the unused MCU pins (digital IO, DAC, programming etc.) to pin headers.

The Extension Backpack also serves as a tool for "unbricking" the TinyCADE if something goes wrong, which requires only a computer and a USB-C cable.

Right now, my biggest struggle is building a community around the TinyCADE, since I am planning to launch a crowdfunding campaign on Crowd Supply to fund the production phase.

I am really determined to have TinyCADE see the light of day. What do you think about it?

I have also posted updates and videos about the project on hackaday.io, hackster.io and Instagram:
https://hackaday.io/project/182954-tinycade
https://www.hackster.io/Kozma/tinycade-0558f6
https://www.instagram.com/tinycade_official
 

Kozma04 (Topic starter)

Also, here is a YouTube video showcasing the chiptune synthesizer playing a ported version of "Unreal Superhero 3" from FamiTracker:

https://youtu.be/KOLidY4E2uA
 
The following users thanked this post: Slh

T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21609
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Cool!

How much work was that in terms of inter-MCU communication, frameworks, what's the API like, etc.?

The new AVRs are pretty powerful on peripherals, though I'm not sure how much use they are to game design.  Gotten much value from them?

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Kozma04 (Topic starter)

Thank you very much for your contribution, Tim!

Quote
Cool!

How much work was that in terms of inter-MCU communication, frameworks, what's the API like, etc.?

The new AVRs are pretty powerful on peripherals, though I'm not sure how much use they are to game design.  Gotten much value from them?

Tim

The main MCU (AVR128DB64) communicates with the t1614 and t212 via 2 USART peripherals, by sending a few bytes as commands and expecting a response "asynchronously".
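The post does not give the wire format, so the following is only a sketch of how a "few bytes as a command, response handled asynchronously" link can be structured. The frame layout (command byte, length byte, payload, XOR checksum) and every name in it are assumptions made for illustration, not TinyCADE's actual protocol. The parser is fed one byte at a time, as it would be from a USART receive interrupt, so the caller never blocks waiting for a reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_MAX_PAYLOAD 16u /* hypothetical limit, not from the post */

typedef enum { WAIT_CMD, WAIT_LEN, WAIT_PAYLOAD, WAIT_SUM } parse_state_t;

typedef struct {
    parse_state_t state;
    uint8_t cmd;
    uint8_t len;
    uint8_t pos;
    uint8_t sum; /* running XOR of cmd, len and payload */
    uint8_t payload[FRAME_MAX_PAYLOAD];
} frame_parser_t;

static void parser_reset(frame_parser_t *p)
{
    p->state = WAIT_CMD;
    p->cmd = p->len = p->pos = p->sum = 0;
}

/* Feed one received byte (e.g. from the USART RX interrupt).
 * Returns true exactly when a complete frame with a valid checksum
 * has just been received; cmd, len and payload are then valid. */
static bool parser_feed(frame_parser_t *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_CMD:
        p->cmd = byte;
        p->sum = byte;
        p->state = WAIT_LEN;
        return false;
    case WAIT_LEN:
        if (byte > FRAME_MAX_PAYLOAD) { /* malformed: resynchronise */
            parser_reset(p);
            return false;
        }
        p->len = byte;
        p->pos = 0;
        p->sum ^= byte;
        p->state = (byte == 0) ? WAIT_SUM : WAIT_PAYLOAD;
        return false;
    case WAIT_PAYLOAD:
        p->payload[p->pos++] = byte;
        p->sum ^= byte;
        if (p->pos == p->len)
            p->state = WAIT_SUM;
        return false;
    case WAIT_SUM:
        p->state = WAIT_CMD;
        return byte == p->sum;
    }
    return false;
}

/* Encode cmd + payload into out (needs len + 3 bytes); returns frame size. */
static uint8_t frame_encode(uint8_t cmd, const uint8_t *payload, uint8_t len,
                            uint8_t *out)
{
    assert(len <= FRAME_MAX_PAYLOAD);
    uint8_t sum = cmd ^ len;
    out[0] = cmd;
    out[1] = len;
    for (uint8_t i = 0; i < len; i++) {
        out[2 + i] = payload[i];
        sum ^= payload[i];
    }
    out[2 + len] = sum;
    return (uint8_t)(len + 3);
}
```

The same parser can run on both ends of the link: the main MCU encodes a command and carries on with the game loop, and handles the reply whenever `parser_feed` reports a complete frame.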

I got good value out of the ATtiny1614's peripherals. The DAC feeds the audio amplifier with the processed signal, which is compressed from the 16-bit sum of channel values down to 8 bits using a LUT of the arctan function in PROGMEM. To control the volume I just change the DAC VREF, so the signal resolution is not affected.
The sound effect processing runs at ~100 Hz and is triggered from an RTC PIT interrupt.
The sound generation algorithm is written entirely in AVR ASM and runs inside a timer interrupt at 44.1 kHz. At the end, the LUT value is read from flash with the LD instruction, as if it were in SRAM.
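A minimal sketch of the compression step, assuming a 256-entry table indexed by the top 8 bits of the 16-bit sum and an arbitrary "drive" factor that sets how hard the curve saturates. Neither the table size nor the drive value is given in the post, and on the real device the table would be a constant in flash rather than built at start-up. To keep the sketch free of the math library, arctan is computed by numerically integrating 1/(1+t²), which is only this example's choice.

```c
#include <assert.h>
#include <stdint.h>

/* arctan(x) as the integral of 1/(1+t^2) from 0 to x (midpoint rule). */
static double atan_approx(double x)
{
    const int steps = 2000;
    double sign = 1.0;
    if (x < 0.0) {
        sign = -1.0;
        x = -x;
    }
    double h = x / steps;
    double acc = 0.0;
    for (int i = 0; i < steps; i++) {
        double t = (i + 0.5) * h;
        acc += h / (1.0 + t * t);
    }
    return sign * acc;
}

static uint8_t compress_lut[256];

/* Fill the table with an arctan-shaped curve mapping 0..255 to 0..255.
 * drive > 0 controls the amount of soft saturation (assumed parameter). */
static void build_compress_lut(double drive)
{
    assert(drive > 0.0);
    double norm = atan_approx(drive);
    for (int i = 0; i < 256; i++) {
        double x = ((i - 127.5) / 127.5) * drive; /* -drive .. +drive */
        double y = atan_approx(x) / norm;         /* -1 .. +1 */
        int v = (int)(127.5 + 127.5 * y + 0.5);   /* round to nearest */
        if (v < 0) v = 0;
        if (v > 255) v = 255;
        compress_lut[i] = (uint8_t)v;
    }
}

/* Map a signed 16-bit channel sum to an unsigned 8-bit DAC code. */
static uint8_t compress_sample(int16_t sum)
{
    uint8_t index = (uint8_t)(((uint16_t)sum ^ 0x8000u) >> 8);
    return compress_lut[index];
}
```

With a curve like this, quiet passages get more than unity gain while loud sums are squashed smoothly towards the ends of the DAC range instead of clipping hard.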

FYI, the ATtiny1614 stores 24 "instruments" in the SRAM, which contain all the parameters for the available effects and are accessible via commands through USART. Each channel can be manipulated by setting its state (on/off), 8-bit volume and 16-bit frequency.
I forgot to mention this earlier: the available waveforms are square, sawtooth, triangle, noise (pseudo-randomly generated, using the 4 additional bytes in the General Purpose I/O register memory via in/out instructions), PCM A and PCM B. Only the square and sawtooth waveforms are affected by the duty cycle.
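As a rough illustration of the per-channel state listed above (on/off, 8-bit volume, 16-bit frequency), here is a sketch of a square-wave channel driven by a 16-bit phase accumulator, plus a noise source with four bytes of state. The real generator is hand-written AVR assembly and its algorithms are not described in the post: the phase-accumulator scheme, the way duty is compared against the phase, and the use of xorshift32 for noise are all assumptions, as are the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     on;
    uint8_t  volume;    /* 0..255 */
    uint16_t phase_inc; /* added to phase once per output sample */
    uint16_t phase;
    uint8_t  duty;      /* square is high while (phase >> 8) < duty */
} channel_t;

/* Phase increment for a tone: inc = freq * 65536 / sample_rate. */
static uint16_t phase_inc_for(uint32_t freq_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0 && freq_hz < sample_rate_hz / 2);
    return (uint16_t)(((uint64_t)freq_hz << 16) / sample_rate_hz);
}

/* One sample of a square wave, scaled by the channel volume. */
static int16_t channel_square(channel_t *c)
{
    if (!c->on)
        return 0;
    int level = ((c->phase >> 8) < c->duty) ? 127 : -128;
    c->phase = (uint16_t)(c->phase + c->phase_inc);
    return (int16_t)(level * c->volume / 256);
}

/* Sum n channels into the 16-bit value that later gets compressed. */
static int16_t mix_square(channel_t *ch, int n)
{
    int16_t sum = 0;
    for (int i = 0; i < n; i++)
        sum = (int16_t)(sum + channel_square(&ch[i]));
    return sum;
}

/* xorshift32: four bytes of state, which must never be zero. */
static uint32_t noise_state = 0x2545F491u;

static uint8_t noise_next(void)
{
    uint32_t x = noise_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    noise_state = x;
    return (uint8_t)(x >> 24);
}
```

With eight channels at full volume the sum stays within about ±1016, so a 16-bit accumulator has plenty of headroom.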

The power management MCU (ATtiny212) has its code written entirely in AVR ASM (maybe it is a bit overkill, but with the Arduino core, including HardwareSerial alone ate up ~1.5 KB out of 2 KB of flash. IDK if I should port it to C). When the device is off, the MCU enters Power-Down mode, runs at 1 MHz and only the RTC PIT interrupt fires every ~1 second, incrementing a 32-bit variable in SRAM to keep track of time.
When the power button is pressed, a pin change interrupt is fired and the +5V boost converter (main power supply) is enabled.
This way, the device can only be "turned off" by pressing the power button again, sending the appropriate command via USART, or having the internal VCC voltage (LiPo battery) drop below a hard-coded threshold of 3V.
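The on/off rules described above can be sketched as a small event handler. This is a host-side C illustration only: the real firmware is AVR assembly, and the event names, the context struct and the use of millivolts are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VBAT_CUTOFF_MV 3000u /* 3 V threshold mentioned in the post */

typedef enum {
    EV_BUTTON,        /* pin change interrupt from the power button */
    EV_CMD_POWER_OFF, /* power-off command received over USART */
    EV_VBAT_SAMPLE    /* a fresh battery voltage measurement */
} power_event_t;

typedef struct {
    bool     boost_on; /* enable line of the +5V boost converter */
    uint32_t seconds;  /* bumped by the ~1 s periodic interrupt */
} power_ctx_t;

/* Called from the periodic (~1 s) interrupt to keep track of time. */
static void power_tick(power_ctx_t *c)
{
    c->seconds++;
}

/* Apply one event; returns the new state of the boost converter.
 * vbat_mv is only looked at for EV_VBAT_SAMPLE. */
static bool power_handle(power_ctx_t *c, power_event_t ev, uint16_t vbat_mv)
{
    assert(c != 0);
    switch (ev) {
    case EV_BUTTON:
        c->boost_on = !c->boost_on; /* button toggles on/off */
        break;
    case EV_CMD_POWER_OFF:
        c->boost_on = false;
        break;
    case EV_VBAT_SAMPLE:
        if (vbat_mv < VBAT_CUTOFF_MV)
            c->boost_on = false;
        break;
    }
    return c->boost_on;
}
```

Whether a button press should be allowed to switch the device on while the battery is already below the cutoff is not stated in the post, so the sketch leaves that case alone.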
Also, each user can "name" their TinyCADE by sending a 16-byte string to the t212, using the API for the main microcontroller (AVR128DB64), which is easily programmable by the user via an FT232RL, through USB. When the ATtiny212 receives the string, it waits 5 seconds for the user to confirm the command by pressing the power button (whose pin is isolated from the main MCU). If the command is confirmed, the string is stored in the EEPROM for later access. Otherwise, normal operation is resumed. I am also planning to store the number of battery charging cycles in the EEPROM, but I am still trying to figure out how to detect such a cycle reliably.

Regarding the main MCU, I am not using many of the new peripherals for game development, as this heavily relies on the CPU itself. The library is a set of heavily optimized (I am not sure if too much) C-like functions for reading input, drawing sprites (1bpp, 4bpp or "transparent" using a 1bpp sprite and another 1bpp/4bpp sprite, from either flash or SRAM), battery/synth MCU control, "tiny" 3D graphics (see YouTube video. I followed the tutorials on scratchapixel.com for that, which are great!), Mode-7 like graphics, drawing tilemaps etc.
Maybe a DMA peripheral would have been useful for sending the frames to the OLED (via 6800 interface), but the "software" method is already fast enough, taking 2-3ms. :)
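For the sprite formats mentioned above, a sketch of a packed 4bpp framebuffer (128x64 pixels, two pixels per byte, 4096 bytes in total) and a "transparent" blit that combines a 4bpp colour sprite with a 1bpp mask. The nibble order, the row strides and all function names are assumptions; the actual library is described as heavily optimized and will certainly look different.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 128
#define FB_H 64

static uint8_t framebuffer[FB_W * FB_H / 2]; /* 4bpp: two pixels per byte */

/* Write one pixel; even x lives in the high nibble (assumed layout). */
static void fb_put(int x, int y, uint8_t gray)
{
    if (x < 0 || x >= FB_W || y < 0 || y >= FB_H)
        return; /* clip against the screen edges */
    uint8_t *p = &framebuffer[(y * FB_W + x) >> 1];
    gray &= 0x0F;
    if (x & 1)
        *p = (uint8_t)((*p & 0xF0) | gray);
    else
        *p = (uint8_t)((*p & 0x0F) | (gray << 4));
}

/* Read one pixel back (0..15); x and y must be on screen. */
static uint8_t fb_get(int x, int y)
{
    uint8_t b = framebuffer[(y * FB_W + x) >> 1];
    return (x & 1) ? (uint8_t)(b & 0x0F) : (uint8_t)(b >> 4);
}

/* Draw a w x h sprite at (x0, y0). color4 is packed 4bpp, mask1 is 1bpp
 * with the leftmost pixel in bit 7; a set mask bit means "opaque". */
static void blit_masked(int x0, int y0, int w, int h,
                        const uint8_t *color4, const uint8_t *mask1)
{
    assert(w > 0 && h > 0);
    int cstride = (w + 1) / 2; /* bytes per colour row */
    int mstride = (w + 7) / 8; /* bytes per mask row */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint8_t m = mask1[y * mstride + x / 8];
            if (!((m >> (7 - (x & 7))) & 1))
                continue; /* transparent pixel: keep the background */
            uint8_t c = color4[y * cstride + x / 2];
            fb_put(x0 + x, y0 + y, (x & 1) ? (c & 0x0F) : (c >> 4));
        }
    }
}
```

A per-pixel loop like this is far slower than what a tuned library would do (whole bytes at a time, pre-shifted masks), but it shows how the two sprite planes combine.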
I also wrote a bootloader derived from Optiboot, adding functionality for flashing the MCU from a file on the SD card (using the PetitFS library). The file is selected using the buttons and a file browser menu displayed on the OLED.
I chose the AVR128DB64 due to its flash capacity, and the fact that it seems to run just fine overclocked at 32 MHz :)
Also, the ATtiny1614 is a bit overclocked as well, running at 20 MHz. But that seems to be no problem at room temperature.
All microcontrollers run from a 3.3V regulator, except the ATtiny212 which runs at the LiPo battery voltage. A logic level conversion circuit consisting of a voltage divider and a diode is used to prevent damaging the USART pins of the AVR128DB64.

I hope this post was not too long and that it answered your questions.  ;D
 

T3sl4co1l

Eh, I wouldn't mind reading a few hours about such projects.  You could probably write for days... :-DD


Quote
I got good value out of the ATtiny1614's peripherals. The DAC feeds the audio amplifier with the processed signal, which is compressed from the 16-bit sum of channel values down to 8 bits using a LUT of the arctan function in PROGMEM. To control the volume I just change the DAC VREF, so the signal resolution is not affected.
The sound effect processing runs at ~100 Hz and is triggered from an RTC PIT interrupt.
The sound generation algorithm is written entirely in AVR ASM and runs inside a timer interrupt at 44.1 kHz. At the end, the LUT value is read from flash with the LD instruction, as if it were in SRAM.

Arctan?  Oh, is that a crude implementation of mu law compression then?  (Or is that actually the function, I don't remember!)

Nice that these (well, some of them, not sure about the TINYs but probably?) have Flash mapped to RAM space, eliminating LPM instructions (and the setting-up-Z overhead, and the extra cycle or two for the slower instruction).

Oh, I should look up how to use that... I've been using avr-libc PROGMEM as usual, which just puts it in low Flash.  Haven't checked out that one update of libc (3.0), if it's got support for this sort of thing or what.  If nothing else, it'd be a new section, a linker instruction and that's about it.


Quote
FYI, the ATtiny1614 stores 24 "instruments" in the SRAM, which contain all the parameters for the available effects and are accessible via commands through USART. Each channel can be manipulated by setting its state (on/off), 8-bit volume and 16-bit frequency.
I forgot to mention this earlier: the available waveforms are square, sawtooth, triangle, noise (pseudo-randomly generated, using the 4 additional bytes in the General Purpose I/O register memory via in/out instructions), PCM A and PCM B. Only the square and sawtooth waveforms are affected by the duty cycle.

Very nice.  Which is pretty much drop-in Famitracker stuff (plus some), I mean the format is probably all different, but with a conversion tool to handle that, the sound is pretty much spot on, eh?


Quote
The power management MCU (ATtiny212) has its code written entirely in AVR ASM (maybe it is a bit overkill, but with the Arduino core, including HardwareSerial alone ate up ~1.5 KB out of 2 KB of flash. IDK if I should port it to C).

Yeah, I've found GCC 8.1.0 at least, can be rather ponderous on code size/speed, depending on what you're doing.

That said, even the instruction set itself can be rather verbose, 16 bits per instruction (mostly) while not accomplishing a whole lot (RISCy).  Tempting to write some kind of bytecode interpreter, which would be more than adequate for a lot of lighter-duty applications, or tuned for specific use cases that don't need a fully functional VM every time; but ah, that's well beyond my experience, as far as what all to put in it, and how to implement all of that without it being totally molasses...

Though on that note, there was that one guy that ran Linux on an ATMEGA1284.  By emulating ARM.  Totally nuts. ;D


Quote
I am also planning to store the number of battery charging cycles in the EEPROM, but I am still trying to figure out how to detect such a cycle reliably.

Battery management is tricky; if all you have is an ADC, with not very many bits at that (12), the terminal voltage between say 50 and 80% charge is basically 1 LSB if that.  Or maybe an even wider range, I don't know.  But it's pretty nuts, you really need something special for that, or current monitoring (and with a very accurate sense amp and charge integrator, at that).

I recently did a project with MAX17048G+T10, which seems reasonable enough so far.  Voltage sense only, some proprietary magic inside.  Don't really have any deep cycles on the project as yet, so time will tell.  But it has a pretty tight ADC inside, with ridiculously low acq rate to match its low current consumption.  So you get a measure of Vbatt, charge, and various other things, via I2C bus.  Might be worth adding something like that; there's only so much you can do with MCUs alone (granted, they can cover quite a lot :) ).
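For reference, a sketch of turning the MAX17048's raw readings into usable units. The register addresses (VCELL at 0x02, SOC at 0x04) and scale factors (78.125 µV per LSB for VCELL, 1/256 % per LSB for SOC) are quoted from memory of the datasheet and should be checked against it; the I2C transfer itself is left out, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX17048_REG_VCELL 0x02u /* cell voltage, 78.125 uV per LSB */
#define MAX17048_REG_SOC   0x04u /* state of charge, 1/256 % per LSB */

/* The device sends 16-bit registers most significant byte first. */
static uint16_t be16(const uint8_t *b)
{
    assert(b != 0);
    return (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
}

/* Raw VCELL to microvolts: 78.125 uV = 625/8 uV, so no floats needed. */
static uint32_t vcell_to_microvolts(uint16_t raw)
{
    return (uint32_t)raw * 625u / 8u;
}

/* Raw SOC to hundredths of a percent (9050 means 90.50 %). */
static uint16_t soc_to_centipercent(uint16_t raw)
{
    return (uint16_t)((uint32_t)raw * 100u / 256u);
}
```

For example, a VCELL reading of 0xC000 works out to 3.84 V and an SOC reading of 0x5A80 to 90.5 %.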


Quote
Regarding the main MCU, I am not using many of the new peripherals for game development, as this heavily relies on the CPU itself. The library is a set of heavily optimized (I am not sure if too much) C-like functions for reading input, drawing sprites (1bpp, 4bpp or "transparent" using a 1bpp sprite and another 1bpp/4bpp sprite, from either flash or SRAM), battery/synth MCU control, "tiny" 3D graphics (see YouTube video. I followed the tutorials on scratchapixel.com for that, which are great!), Mode-7 like graphics, drawing tilemaps etc.
Maybe a DMA peripheral would have been useful for sending the frames to the OLED (via 6800 interface), but the "software" method is already fast enough, taking 2-3ms. :)

Yeah, sounds about right.  About the most value you could get out of them might be something like, generating interference patterns by polling combinations of them, or using the same effect for very slow division (which is sometimes okay, like how Bresenham line algorithm is division solved incrementally).  Still, it's real time, not necessarily something you can leave to complete in the background*, so yeh, understandable.
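The Bresenham remark is easy to make concrete: the slope dy/dx is never computed, and an error term is updated by additions only until the line is done. Below is the common all-octant integer form; returning the points in an array (rather than plotting them) and the names used are just this sketch's choices.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int x, y;
} point_t;

/* Write the pixels of the line (x0,y0)-(x1,y1) into out, at most max of
 * them, and return how many were written. No division or multiplication
 * by the slope: err tracks the accumulated deviation from the ideal line. */
static int line_points(int x0, int y0, int x1, int y1, point_t *out, int max)
{
    assert(out != NULL && max > 0);
    int dx = abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    int dy = -abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;
    int n = 0;
    while (n < max) {
        out[n].x = x0;
        out[n].y = y0;
        n++;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { /* step in x */
            err += dy;
            x0 += sx;
        }
        if (e2 <= dx) { /* step in y */
            err += dx;
            y0 += sy;
        }
    }
    return n;
}
```

For the line from (0,0) to (6,3), y advances on every other step, which is the quotient 3/6 being worked out one increment at a time.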

Or if nothing else, I mean, whatever user code is loaded, can do whatever it wants.  If someone wants to get clever with it, they can.

*Ordinarily this might be the case, though you could do something like, use TCBs to trigger/gate/clock TCA, or vice versa, routed via events and gated/latched/etc. via CCL.

BTW if you haven't tried it before, you can indeed get an R-S flip-flop in a single LUT.  Kinda useless since SEQ exists, but sometimes you need a f/f in front and using SEQ would have to just set the proceeding LUTs to buffers, then loop on to further logic... wasteful.  So, that's cute.

I've also made a D-S (delta sigma) decoder / totalizer / decimator, using CCL, EVSYS, TCA and TCB.  Not sure quite how useful this would be to gaming purposes, but quite handy for the application (an isolated ADC, so, the D-S bitstream is almost trivial to handle, just use a digital isolator chip).


As for DMA and parallel display buses -- I don't think the AVR-DAs have too much to show for this at least (tho, does any DB/+ have DMA or EBI??), you might have to go back to an XMEGA, probably one of the A series.  Which, you can get with serious amounts of Flash (384k) and DMA and even EBI (external bus interface), problem is they're also like $10 each so, kinda, who cares, why do that when an ARM is seriously better value at that point.  But then it wouldn't be an 8-bit platform anymore, so yeah, kinda paying for the privilege of poorer performance, and the -Dx are a pretty attractive compromise.

I haven't looked at the DMA in detail, but they're general enough as far as accessing anything in the RAM space, so possibly can be used for self-modifying scripting sorts of trickery, which would be a handy way to deal with bit-bang external bus access -- assuming the RAM contention isn't an issue (i.e. holding up the CPU) and EBI isn't available.

The two relevant projects I've done on XMEGA are, just a cutesy raycaster:

https://imgur.com/gallery/oTkxuOY

and a reverb effect box,
https://github.com/T3sl4co1l/Reverb



As you can see from the listing, the bit-bang access to SRAM, DAC and LCD takes a good 20 cycles or whatever, and that's with hand optimization.  GCC did particularly poorly on the DSP calcs, mainly because MUL isn't inlined, and there's no ISA level cleanup (optimization is done in IR only, AFAIK).  So, lots of 16-bit accesses shuffling around and zeroing registers to no effect, and the call overhead to the _umulsi etc. library stuff.  I more than doubled performance with the hand ASM there.

Also does a heroic amount of work entirely in the ISR (up to, whatever it was, >80% CPU use?), which took a bit of shuffling around to pull off, but not actually that bad.  The actual fix was pretty simple (I forget what exactly it was, but it's one of the last in the commit history I think), just had to think about it enough to want to do it.

Quirk: XMEGA can't do voluntarily nested interrupts, because the PMIC hardware handles interrupt flags for you.  I don't think there's any way to fool it e.g. to get one long-running interrupt overlapped with short ones.  Like if you use a periodic timer interrupt to do quick stuff most of the time, but also one long housekeeping / process update run every Nth pass or whatever.  But maybe that's just not such a great thing to be doing in the first place, I don't know.

Which, is something I [mostly] pulled off once, in college, on an ATMEGA32.  Timer refresh was for, I think, updating the display (posting another char or so to it), and whatever process operations (trigger ADC, signal filtering?).  But also sometimes taking a long pass (every 10th or 100th or something) to update menu state or whatever.  There were some weird display glitches (HD44780), things getting out of sync sometimes -- I didn't figure it out at the time.  Probably could've been solved using a buffered display driver, or double buffering, or better attention to concurrency / use of mutexes / etc.

I suppose if I were doing it again today, I'd do it a lot like, well, what the reverb box has.  Not that it has to worry about concurrency, but I'd probably be better off pulling everything through the menu system, rather than trying to have text or whatever come directly from another subsystem.  (Oh yeah, that's what it was related to probably, it was a power supply, with live updated readings of course; but they were updated asynchronously to the menu system or something like that.  Not so great an idea, it turns out. :P )  I will say this: I struggled for a long time trying to puzzle out what the hell datatypes a menu item needs to refer to, and how to forward declare them all.  I recall leaving off that project sort of in the middle of a refactoring, trying to get a more general menu system set up (menus as objects (structs) in PROGMEM, linked to each other by direct pointers).  Well, it seems I figured that out over the years, the reverb menu system went together fairly easily I think.  And abuses pointers just as much as the lords Kernighan and Ritchie intended. ;D
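A sketch of the "menus as structs linked by direct pointers" idea, including the forward declarations that make mutually referencing menus possible. This is not the reverb box's code: every name is hypothetical, and the PROGMEM attribute (and the matching flash reads) that an AVR build would need is left out so the example runs on a host.

```c
#include <assert.h>
#include <stddef.h>

struct menu; /* forward declaration so items can point at menus */

typedef void (*action_fn)(void);

struct menu_item {
    const char        *label;
    const struct menu *submenu; /* NULL for a leaf item */
    action_fn          action;  /* NULL if the item only opens a menu */
};

struct menu {
    const char             *title;
    const struct menu      *parent; /* NULL for the top level */
    const struct menu_item *items;
    size_t                  count;
};

/* Declare every menu before defining any, so they can refer to each other. */
extern const struct menu main_menu;
extern const struct menu settings_menu;

static int volume = 5;

static void volume_up(void)
{
    if (volume < 10)
        volume++;
}

static const struct menu_item settings_items[] = {
    {"Volume +", NULL, volume_up},
    {"Back", &main_menu, NULL},
};

static const struct menu_item main_items[] = {
    {"Settings", &settings_menu, NULL},
};

const struct menu settings_menu = {"Settings", &main_menu, settings_items, 2};
const struct menu main_menu = {"Main", NULL, main_items, 1};

/* Activate item `index`: run its action, then return the menu to show. */
static const struct menu *menu_select(const struct menu *m, size_t index)
{
    assert(m != NULL && index < m->count);
    const struct menu_item *it = &m->items[index];
    if (it->action)
        it->action();
    return it->submenu ? it->submenu : m;
}
```

The `extern` declarations are what resolve the forward-declaration puzzle: once each menu has been declared, its address is a valid constant initializer, even before the menu itself is defined.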

Anyway, interrupts; not sure if AVR-Dx can do that either.  The CPUINT is a ton simpler than XMEGA, but more complicated than MEGA.  Maybe something like, poking the priority registers from within the ISR, would do.


Quote
I chose the AVR128DB64 due to its flash capacity, and the fact that it seems to run just fine overclocked at 32 MHz :)
Also, the ATtiny1614 is a bit overclocked as well, running at 20 MHz. But that seems to be no problem at room temperature.
All microcontrollers run from a 3.3V regulator, except the ATtiny212 which runs at the LiPo battery voltage. A logic level conversion circuit consisting of a voltage divider and a diode is used to prevent damaging the USART pins of the AVR128DB64.

Oh yeah, heard about that... haven't tested it yet myself, but it is interesting that the clock select register has those two suspiciously "reserved" options on it.  Most of the time, "reserved" means "we didn't put anything there so don't even worry about it", but you always wonder... :D

And yep, clock mainly a problem at process corners, low voltage and high temperature.  Nice to be able to run 32MHz without needing an XMEGA, just that little edge.  Nice too that it runs from 5V supply; the core I think is isolated from IOVCC (hence SLPCTRL.VREGCTRL), also why they can offer so much hardware around it, guessing a fine pitch (<<100nm?) low voltage (1-1.8V?) or whatever sort of core, then level shifters to all the IOs.  Really nice that they pulled that off without a VCORE bypass cap wasting a pin (or several!).

And the IOs aren't anything special, pretty much like classic ATMEGAs, the whatever, 5-15ns risetime (with a slew rate option!), about what you'd expect at 5V.  (Kind of dangerously relevant, compared to the ~20ns pulse width of CLK_PER; this was almost an issue in my S-D totalizer, which also goes through a good 30 or 40ns of delays in the ADC and isolators, so is basically a whole clock out by the time it gets back on chip.  Managed to solve that by merely enabling the CCL synchronizer.  Tested with hot air on each component (ADC, iso, MCU), the count is rock steady with it on. :) )

Tim
 

mariush

  • Super Contributor
  • ***
  • Posts: 4983
  • Country: ro
Wouldn't it be easier to use 48000 Hz for audio?  It seems easier to work with, as it divides more evenly by 8, by 100...
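One way to put numbers on the two rates, assuming the audio interrupt comes from a timer counting a 20 MHz clock with no prescaler (the clock speed is from earlier in the thread; the timer setup and the function names are assumptions). The sketch computes the nearest integer timer period for a target sample rate and how far the resulting rate is off, in parts per million.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest integer number of timer counts per audio sample. */
static uint32_t timer_period(uint32_t f_clk, uint32_t f_sample)
{
    assert(f_sample > 0 && f_clk >= f_sample);
    return (f_clk + f_sample / 2) / f_sample;
}

/* Error of the achieved sample rate versus the target, in ppm
 * (negative means the achieved rate is slightly low). */
static int32_t rate_error_ppm(uint32_t f_clk, uint32_t f_sample)
{
    uint64_t period = timer_period(f_clk, f_sample);
    uint64_t ratio_ppm = (uint64_t)f_clk * 1000000u / (period * f_sample);
    return (int32_t)((int64_t)ratio_ppm - 1000000);
}
```

Under these assumptions neither rate divides the clock exactly: 44.1 kHz gives a period of 454 counts and 48 kHz gives 417, and both end up roughly 0.1% slow. At 44.1 kHz that leaves about 454 CPU cycles per sample, or roughly 56 per channel with 8 channels.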

Did you look at micros like the DSPIC33CH?  https://www.digikey.com/en/products/detail/microchip-technology/DSPIC33CH128MP203-I-M5/9357071
You get two cores, one up to 180 MHz, one up to 200 MHz, 152 KB of flash storage and 20 KB of RAM (16 KB for the first core, 4 KB for the second)... You could easily use one of the cores for the audio processing part, or for the video part...

 

T3sl4co1l

I think you're misunderstanding the purpose of 8-bit systems in projects like this... it's a restriction challenge.  At least, I'm assuming that's relevant here.

Like, why put in basically a 486 or Pentium CPU?  You could run Doom or Quake on that thing even, probably (well, not without serious modification, with so little RAM, but some kind of stripped down graphics demo based on that, perhaps).  Why not a whole ass ARM64 and build a PC?  It's not like it costs much more! ;)

Tim
 

