Besides optimization and language / library differences, ponder this:
A Cortex-M4 is very roughly on par with the CPU of a Pentium PC, or a Nintendo 64.
The memory is very different: much less ROM and RAM on a stock micro -- but similar amounts can be added when external interfaces are available. The peripherals are much more limited as well: on those older machines, a truly huge amount of bandwidth is dedicated to visual output, whether it's a PC's 1MB+ SVGA video card or the N64's RCP.
Ponder this as well:
When I was growing up, I had a 486 PC (25MHz, 32 bits -- slower than a typical M4, but of similar capability otherwise). Among other things, I taught myself 2D and 3D graphics programming, in QBasic.
Now, QBasic runs in 16-bit mode, and is interpreted (even when compiled to an EXE, it works by calling the same run-time subroutines). The run-time library is very heavyweight: every function call is FAR type (a 32-bit segment:offset address, with dozens of clock cycles of overhead to execute the CALL instruction), and even a basic arithmetic expression might consume dozens of these function calls. (The floating point support is particularly slow: the run-time doesn't detect the FPU once up front, so every operation begins with a half-dozen CALLs just to test whether it can use the hardware FPU, or whether it has to branch to another part of the library that calculates the result in software!)
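To make that concrete, here's a rough analogy in C (the real QBasic run-time was 16-bit x86, and rt_add/rt_mul are made-up names -- the function pointers below just stand in for its FAR CALL dispatch):

#include <stdio.h>

static double rt_add(double a, double b) { return a + b; }
static double rt_mul(double a, double b) { return a * b; }

/* Indirect "run-time library" entry points: the compiler can't inline
   through these, so each operation costs a genuine call -- and the
   16-bit far version of such a call was far more expensive still. */
double (*op_add)(double, double) = rt_add;
double (*op_mul)(double, double) = rt_mul;

/* y = 3*x*x + 2*x + 1, one CALL per primitive -- five calls in all. */
double eval_interpreted(double x)
{
    return op_add(op_add(op_mul(op_mul(3.0, x), x), op_mul(2.0, x)), 1.0);
}

/* The same expression open-coded: the math instructions issue directly. */
double eval_direct(double x)
{
    return 3.0 * x * x + 2.0 * x + 1.0;
}

int main(void)
{
    printf("%f %f\n", eval_interpreted(2.0), eval_direct(2.0));  /* 17 17 */
    return 0;
}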
The best performance I got, with a stock QBasic implementation, was something like 20 frames per second with pretty simple models. It wasn't optimal by any means -- I didn't know things in that much depth at the time -- but we're talking on the order of 100 kFLOPS (floating point operations per second) here, out of a 25MHz clock: roughly 250 clock cycles per floating point operation. That's slow! (The 486DX, by the way, completes hardware floating point operations in a few dozen cycles. Not nearly as fast as today's CPUs, but no slouch -- and a great improvement on the ~300 cycles typical of the 8087 FPU, the chip QBasic's arithmetic was designed around.)
For perspective:
I came back to these programs, here and there, over the years. On a Pentium, with fixed-point arithmetic and assembler subroutines for the most intensive functions (the innermost loop in the rendering process, and writing graphics to the video buffer), I approached screen refresh rate: 40 to 70 FPS.
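If fixed-point arithmetic is new to you, the idea is to keep numbers as scaled integers, so all the math runs on the integer unit. A minimal Q16.16 sketch in C (generic, not my original Pentium code):

#include <stdint.h>
#include <stdio.h>

typedef int32_t fix16;   /* Q16.16: integer part in the top 16 bits,
                            fraction in the bottom 16 */

#define FLT2FIX(x) ((fix16)((x) * 65536.0))
#define FIX2FLT(x) ((double)(x) / 65536.0)

/* Addition and subtraction are plain integer instructions; only
   multiply and divide need a widening intermediate and a shift. */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

static fix16 fix_div(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a << 16) / b);
}

int main(void)
{
    fix16 x = FLT2FIX(1.5), y = FLT2FIX(2.25);
    printf("%f\n", FIX2FLT(fix_mul(x, y)));   /* 3.375000 */
    printf("%f\n", FIX2FLT(fix_div(x, y)));   /* ~0.666656 */
    return 0;
}

On a CPU where floating point is slow (or absent), trading a few bits of range and precision for integer-speed math like this is an enormous win.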
Later still, I "ported" the whole thing (everything from the outer control loop to the innermost render loop) into pure assembler, this time using floating point with the calculations written out inline (not stuffed away in subroutines). This pushed performance higher still -- though it also helped that I had a more modern CPU by then: a 1666MHz AMD Duron ran it at 450 FPS.
(That fits the entire program, and all its data, entirely into the CPU caches. Only video writes and device I/O have to touch the bus. The CPU is at a disadvantage, still running in 16-bit mode -- but it's clearly capable of executing about one instruction per cycle, on average!)
So, in summary: you can always burden yourself with code that is exponentially slower, yet still semantically correct. (That is: the code still accomplishes the same exact task, but through very different sequences of instructions.) You can always add layers of abstraction, but each added layer slows down your program by some factor. Shake loose from those layers of overhead, and you can harness the pure speed of your CPU. You don't need to write in cryptic languages, like assembly, to achieve this: C does just fine. You do need to avoid excessively heavy libraries and function calls when a simple expression or loop will do.
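As one last, hypothetical illustration of that point, here are two semantically identical C functions (a modern compiler may well see through the pow() call, but the principle stands):

#include <math.h>
#include <stdio.h>

/* "Layered": a general-purpose library call to square each element. */
double sum_squares_layered(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += pow(v[i], 2.0);   /* general exponentiation, just for x^2 */
    return s;
}

/* Direct: one multiply and one add per element, no library involved. */
double sum_squares_direct(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i] * v[i];
    return s;
}

int main(void)
{
    double v[] = { 1.0, 2.0, 3.0 };
    printf("%f %f\n", sum_squares_layered(v, 3), sum_squares_direct(v, 3));
    return 0;   /* both print 14.000000 */
}

Same answer, very different instruction streams: one pays a call (and a general-purpose algorithm) per element, the other is arithmetic the compiler can keep in registers.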
Tim