Author Topic: any simple blitter around? (looking for FPGA design)  (Read 11431 times)

0 Members and 1 Guest are viewing this topic.

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
any simple blitter around? (looking for FPGA design)
« on: August 21, 2018, 03:51:43 pm »
The name - blitter - comes from the bit blit operation of the 1973 Xerox Alto for a coprocessor dedicated to the rapid movement and modification of data within a computer's memory;  in fact, a blitter can do a lot of useful tasks in parallel, and it can draw patterned lines with a variety of textures, and can even draw them in a special way for simple area fill

I am interested in these two features:
- to draw a line
- to fill an area



ummmm maybe the project "Amiga Minimig" is a reference, it includes the Amiga-500's blitter, the code is opensource and it's written in Verilog.

any other blitter around?  :popcorn:
« Last Edit: August 21, 2018, 07:57:08 pm by legacy »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: any simple blitter around? (looking for FPGA design)
« Reply #1 on: August 21, 2018, 09:28:01 pm »
This had me thinking, when was the big shift from bit-plane graphics (which really need a blitter to work) to the current "pixel per word"  graphics. 

I guess that happened when memory chip bandwidth exceeded the required display bandwidth, and 32-bit address spaces became the norm, rather than the likes of a 64k window at memory segment 0xA000...
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #2 on: August 21, 2018, 10:24:54 pm »
64k

I'd like to add more features to my VDU (video display unit); currently, it's only a text display (able to scroll), and I am not willing to implement an external DDR1 video memory controller; DRAM is huge large and cheap, but very complex to be handled, therefore I'd like to use an external static ram (asynchronous bus) for both the text memory and the video memory, and the chip happens to be limited to 128Kbyte with a granularity of 8bit.

Besides, I'd like to have a hardware unit that draws lines and fill areas  :popcorn:
 

Offline Kleinstein

  • Super Contributor
  • ***
  • Posts: 14196
  • Country: de
Re: any simple blitter around? (looking for FPGA design)
« Reply #3 on: August 22, 2018, 06:02:58 am »
This had me thinking, when was the big shift from bit-plane graphics (which really need a blitter to work) to the current "pixel per word"  graphics. 

I guess that happened when memory chip bandwidth exceeded the required display bandwidth, and 32-bit address spaces became the norm, rather than the likes of a 64k window at memory segment 0xA000...
Those separate bit planes were not very popular  more like a specialty of the Amiga. There might have been a few really old ones before 1985. Other computers sometimes had something like a 4 color mode with 4 pixels packed in 1 byte. The main point to give up bit planes was when memory got cheap enough to have 1 byte per pixel.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #4 on: August 22, 2018, 06:25:25 am »
Atari 8 bit computers, Commodore 64, Apple II, Macintosh, Next.  Any computer before the age or minimum 256 colors per pixel.  This even includes PCs 16 colors and 2 color graphics modes which all had multiple pixels compacted into each byte.

The newer full color Next computers had a special graphics processor which allowed mixed color bit plane modes and non-palleted byte/pixel 8/16/24 bit modes in each individual window allowing backwards compatibility with the first Next B&W grey modes with newer color applications.

I would say for simplicity sake, just do 32 bit per pixel and use the first 24 bits, or a 16 bit per pixel blitter.

As for drawing, I don't know how smart your ram controller is, but, when drawing, writing single bytes is a wasteful number of clock cycles on the memory unless your ram controller is very-very smart and has a large write cache.  Reading memory for drawing the display is easier as the reads are in straight lines and read bursts of 32-64-or-128 pixels in a single shot will be super efficient for the ram.  Drawing a vertical line will be the worst as you need to address a separate byte every time you draw 1 new pixel on the line, and, if your ram controller cant do write masking, this may include a read-modify-write cycle.

Ignore all the ram timing junk I just stated if you are working at 320x240, or, 640x480 as in these slow ass modes, you can get away with writing at any slow speed.

Drawing lines and fills are as simple as an x&y loop counters with a start, stop & increment size for each counter operating in either true floating point, or, a m.n integer counter.  (This is not for horizontal or vertical lines as you only need to increment by one for those, the m.n counters are lines at any angle!)  Creating a second A to B coordinate line counter running within the first which resets to the beginning after every pixel is drawn in the firs A-B line counter, drawing a line to the second A-B line engine allows you to render a filled any 4 sided polygon, or 3 sided one is on of the lines has a length of 0.

Creating circles and ovals gets more complicated fast, but using the m.n counters adapting the m.n values every increment with some simple clever integer math located elsewhere on this forum, you can draw perfect circles and ovals just as fast as rectangles with some effort.  (The magic of FPGA and hard coded functions wired to a task...)

I think these might go way beyond the Amiga's Agnus's blitter's capabilities as it was primarily a simple rectangle memory copy/fill device with a transparency stencil.  It's been too long since I've done development with an Amiga, but, to replicate just those rectangle functions in a 16bit/32bit per pixel environment, it would be nothing more than a page or 2 of simple Verilog coding.  It's everything else on how it's wired to your ram and how you feed it instructions which makes the difference.  You are talking about a source counter with an X1 start position, and an X2 destination counter with a Z size to copy.  If you want, you can also implement a Y loop counter, and for every Y count, you have an X1 source increment and a X2 destination increment which allows a 1 rectangle shape from a source bitmap being rendered into a destination bitmap with a different X number of pixels in width.
« Last Edit: August 22, 2018, 07:04:27 am by BrianHG »
 
The following users thanked this post: Fredderic

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21681
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: any simple blitter around? (looking for FPGA design)
« Reply #5 on: August 22, 2018, 09:36:09 am »
Those separate bit planes were not very popular  more like a specialty of the Amiga. There might have been a few really old ones before 1985. Other computers sometimes had something like a 4 color mode with 4 pixels packed in 1 byte. The main point to give up bit planes was when memory got cheap enough to have 1 byte per pixel.

EGA and VGA (except for mode 0x13) did bit planes as well, 16 color palette from 64 or 262k possible choices.  No hardware blitting, aside from bytewise access with logic operations (for masking and such).

Other crazy ideas: compositing layers, sprites, transforms (e.g., SNES mode 7), etc.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #6 on: August 22, 2018, 10:00:10 am »
Drawing lines and fills are as simple as an x&y loop counters with a start, stop & increment size for each counter operating in either true floating point, or, a m.n integer counter


this is the prototype in C to draw a line between two points on the display, just to see how the algorithm goes

Code: [Select]
private uint32_t quanto = 1;

/*
 * draw a line between two points on the display
 */
void DDA_doline
(
    uint32_t x1,
    uint32_t y1,
    uint32_t x2,
    uint32_t y2
)
{
    sint32_t dx;
    sint32_t dy;
    sint32_t stepx;
    sint32_t stepy;
    sint32_t fraction;
    uint32_t screen_x;
    uint32_t screen_y;
    uint32_t response;

    /*
     * calculate differential form
     *
     *  dy   y2 - y1
     *  -- = -------
     *  dx   x2 - x1
     *
     */

    /*
     * take differences
     */
    dy       = y2 - y1;
    dx       = x2 - x1;

    screen_x = x1;
    screen_y = y1;

    /*
     * dy is negative
     */
    if (dy < 0)
    {
        dy    = -dy;
        stepy = -1;
    }
    else
    {
        stepy = 1;
    }

    /*
     * dx is negative
     */
    if (dx < 0)
    {
        dx    = -dx;
        stepx = -1;
    }
    else
    {
        stepx = 1;
    }

    dx = dx shiftLeft quanto;
    dy = dy shiftLeft quanto;

    /*
     * draw initial position
     */
    screen_pixel_xy(screen_x, screen_y);

    /*
     * draw next positions until end
     */
    if (dx > dy)
    {
        /*
         * take fraction
         */
        fraction = dy - (dx shiftRight quanto);
        while (screen_x isNotEqualTo x2)
        {
            if (fraction >= 0)
            {
                screen_y += stepy;
                fraction -= dx;
            }
            screen_x += stepx;
            fraction += dy;

            /*
             * draw calculated point
             */
            screen_pixel_xy(screen_x, screen_y);
        }
    }
    else
    {
        /*
         * take fraction
         */
        fraction = dx - (dy shiftRight quanto);
        while (screen_y isNotEqualTo y2)
        {
            if (fraction >= 0)
            {
                screen_x += stepx;
                fraction -= dy;
            }
            screen_y += stepy;
            fraction += dx;

            /*
             * draw calculated point
             */
            screen_pixel_xy(screen_x, screen_y);
        }
    }
}



Code: [Select]
test1, screen size = 39 x 39
------------------[screen]------------------
|......................................*|
|.....................................*.|
|....................................*..|
|...................................*...|
|..................................*....|
|.................................*.....|
|................................*......|
|...............................*.......|
|..............................*........|
|.............................*.........|
|............................*..........|
|...........................*...........|
|..........................*............|
|.........................*.............|
|........................*..............|
|.......................*...............|
|......................*................|
|.....................*.................|
|....................*..................|
|...................*...................|
|..................*....................|
|.................*.....................|
|................*......................|
|...............*.......................|
|..............*........................|
|.............*.........................|
|............*..........................|
|...........*...........................|
|..........*............................|
|.........*.............................|
|........*..............................|
|.......*...............................|
|......*................................|
|.....*.................................|
|....*..................................|
|...*...................................|
|..*....................................|
|.*.....................................|
|*......................................|
 

Offline dmills

  • Super Contributor
  • ***
  • Posts: 2093
  • Country: gb
Re: any simple blitter around? (looking for FPGA design)
« Reply #7 on: August 22, 2018, 12:52:36 pm »
Thats Breshenans line drawing algorithm, there is also a circle drawing variant and an improved line drawing one that exploits the symmetry.

"Graphics Gems" by Glassner (Academic Press) is IMHO the bible for CPU based graphics stuff, dated by todays standards but very, very good stuff when you lack a modern graphics processor.

Regards, Dan.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #8 on: August 22, 2018, 08:20:58 pm »
Drawing lines and fills are as simple as an x&y loop counters with a start, stop & increment size for each counter operating in either true floating point, or, a m.n integer counter



this is the prototype in C to draw a line between two points on the display, just to see how the algorithm goes

Code: [Select]
private uint32_t quanto = 1;

/*
 * draw a line between two points on the display
 */
void DDA_doline
(
    uint32_t x1,
    uint32_t y1,
    uint32_t x2,
    uint32_t y2
)
{
    sint32_t dx;
    sint32_t dy;
    sint32_t stepx;
    sint32_t stepy;
    sint32_t fraction;
    uint32_t screen_x;
    uint32_t screen_y;
    uint32_t response;

    /*
     * calculate differential form
     *
     *  dy   y2 - y1
     *  -- = -------
     *  dx   x2 - x1
     *
     */

    /*
     * take differences
     */
    dy       = y2 - y1;
    dx       = x2 - x1;

    screen_x = x1;
    screen_y = y1;

    /*
     * dy is negative
     */
    if (dy < 0)
    {
        dy    = -dy;
        stepy = -1;
    }
    else
    {
        stepy = 1;
    }

    /*
     * dx is negative
     */
    if (dx < 0)
    {
        dx    = -dx;
        stepx = -1;
    }
    else
    {
        stepx = 1;
    }

    dx = dx shiftLeft quanto;
    dy = dy shiftLeft quanto;

    /*
     * draw initial position
     */
    screen_pixel_xy(screen_x, screen_y);

    /*
     * draw next positions until end
     */
    if (dx > dy)
    {
        /*
         * take fraction
         */
        fraction = dy - (dx shiftRight quanto);
        while (screen_x isNotEqualTo x2)
        {
            if (fraction >= 0)
            {
                screen_y += stepy;
                fraction -= dx;
            }
            screen_x += stepx;
            fraction += dy;

            /*
             * draw calculated point
             */
            screen_pixel_xy(screen_x, screen_y);
        }
    }
    else
    {
        /*
         * take fraction
         */
        fraction = dx - (dy shiftRight quanto);
        while (screen_y isNotEqualTo y2)
        {
            if (fraction >= 0)
            {
                screen_x += stepx;
                fraction -= dy;
            }
            screen_y += stepy;
            fraction += dx;

            /*
             * draw calculated point
             */
            screen_pixel_xy(screen_x, screen_y);
        }
    }
}



Code: [Select]
test1, screen size = 39 x 39
------------------[screen]------------------
|......................................*|
|.....................................*.|
|....................................*..|
|...................................*...|
|..................................*....|
|.................................*.....|
|................................*......|
|...............................*.......|
|..............................*........|
|.............................*.........|
|............................*..........|
|...........................*...........|
|..........................*............|
|.........................*.............|
|........................*..............|
|.......................*...............|
|......................*................|
|.....................*.................|
|....................*..................|
|...................*...................|
|..................*....................|
|.................*.....................|
|................*......................|
|...............*.......................|
|..............*........................|
|.............*.........................|
|............*..........................|
|...........*...........................|
|..........*............................|
|.........*.............................|
|........*..............................|
|.......*...............................|
|......*................................|
|.....*.................................|
|....*..................................|
|...*...................................|
|..*....................................|
|.*.....................................|
|*......................................|

You got the basics of it, in fact, in verilog/vhdl, just use a huge single 32 or 48 bit x and y counter.  Use the upper 10-12 bits to point to your pixel position and the lower bits for the fractional position.  The screen pointers to those top 10-12 bits are nothing more than wires into the right place.  An yes, that same chunk of C code will be about the same size in Verilog.

For the verilog code, I would have an input instruction & data port which allows you to set each variable register (leave instruction space to select fill colors and source/destination memory type fills as you expand your blitter, I figure 8 bits address space for 256 commands) and output port to point to ram address, output wire for busy and ready.  This will allow your onboard cpu to instruct your blitter.  Also, with a busy and ready line out, for instructing your blitter, you may add a simple FIFO and cache blitter instructions opening up the cpu to fill a bunch of line commands in advance while the blitter is busy drawing.  Don't forget the fill color register in the command list and output color.

(Further future optimization) More sophisticated techniques may be to have a 2 stage parallel bank of setup registers inside your blitter so while you are drawing, the FIFO can still fill all of the next set of source line registers, then once the current line is drawn, in a single clock, that next prepared bank of line drawing registers will all transferred to the active ones in a single clock and get drawing immediately, no wait state.  This also allows you to edit only some of the registers in the setup command instructions for your blitter retaining the un-changed registers as you instruct new line geometries making repetitious line feature updates even faster.

Other thing, a blitter like this can be made to hardware decompress a pre-fabricated RLE compressed 16/24bit color image.  Any non-picture based graphics with wide repetitious lines and some random pixels all over the place will decode at near video playback speeds with this.  At 640x480, perhaps full 16 bit playback at video speeds.
 
« Last Edit: August 22, 2018, 08:35:10 pm by BrianHG »
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #9 on: August 23, 2018, 03:08:20 pm »
how to draw & fill a triangle? hints?
 

Offline dave j

  • Regular Contributor
  • *
  • Posts: 128
  • Country: gb
Re: any simple blitter around? (looking for FPGA design)
« Reply #10 on: August 23, 2018, 04:00:27 pm »
how to draw & fill a triangle? hints?

Drawing a triangle is just drawing three lines.

Search for "triangle scan conversion". There are many ways to do it and it's something that is covered on computer graphics courses so there are loads of resources describing them. I don't know much about FPGAs so can't advise which method would be most suitable for you (try searching with FPGA as well).
I'm not David L Jones. Apparently I actually do have to point this out.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #11 on: August 23, 2018, 07:15:21 pm »
Drawing a triangle is just drawing three lines.

drawing is easy, but the problem is how to fill it with an efficient hardware algorithm.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #12 on: August 23, 2018, 08:39:35 pm »
drawing is easy, but the problem is how to fill it with an efficient hardware algorithm.

That's the magic thing.
Here is an easy un-optomized approach to create your own:
Can you expand your line code to hold 3 coordinates?

In an outer loop, work out the coordinates for a line from coordinates A to C, step 1 pixel.
Inside that line algorithm, draw a line beginning at the current pixel coordinates of the outer line generator to position B.  (Don't erase the position for the outer line algorithm)
Loop back to continue to the next pixel in line A to C, then draw the next inner line from the next coordinates to position B.

This will draw a filled triangle.  It however will be 1 color, or, you can create a simple 3 coordinate smooth fill blend color algorithm with 3 separate RGB values at each end point A,B, & C in the triangle.  This however wont fill a graphic texture.  That feature is beyond my knowledge and the size of your FPGA.

I prefer holding 4 coordinates & drawing 4 sided filled polygons instead of triangles.  It only requires 1 more register coordinates, and 1 more line generating counter in the outer loop algorithm & you can now draw rectangles at any angle, or, make coordinates C & D at the same point, and you have a triangle.

Though inefficient for triangles, (if you use 4 coordinates polygons, rectangles are completely efficient) this hardware triangle rendering will be done so fast that it would run circles around anything you can program the 68000 to do manually by software, especially if you include RGB faded transitioning.

Example, a system clock of 200Mhz means your algorithm will fill or draw 200 million pixels a second if your ram controller can keep up.
I suspect you will get 25-50 million RGB pixels per second.

Actually, if you always make the beginning of both lines in the 4 coordinate, 2 line filled polygon the center corner in your triangle, the filled triangle should be filled pretty efficiently.  I haven played with filling algorithms in years, so you may need some fiddling around to ensure there are no empty pixels inside the fill when doing the 4 side polygon trick.
« Last Edit: August 23, 2018, 09:32:40 pm by BrianHG »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: any simple blitter around? (looking for FPGA design)
« Reply #13 on: August 23, 2018, 09:55:52 pm »
Clipping to the visible screen is also a pain.

For filled objects I used to have one primitive shape, and to oversimplify it, each shape was defined by six numbers - x_left_start, x_left_end, x_right_start, x_right_end, y_top, y_bottom. Any triangle can be decomposed into at most two of these objects - Just cut across horizontally from the middle point in the Y value.

To un-simplify it a bit..

In the filling routines the x values were actually composite values, containing the four values needed to define the edge, including any DDA error values - the x_value,  x_whole_step, x_error_step, x_accumulated_error, max_error.  The variables x_left_end and x_right_end are  not helpful when actually drawing.

https://www.tutorialspoint.com/computer_graphics/line_generation_algorithm.htm

Also, one important thing is decide "where" your are sampling your shapes to get your pixels, and how your filling starts and end at the edges. This may seem like a very stupid statement, but back in the '80s, when writing 3D graphics involved assembler it was very, very important to get this right before you coded anything.

For example, using an image from the link above, the circles may be the sampling points used for get pixel values, and are aligned with the integer coordinates:


If you your pixels are aligned with your drawing coordinates (as in this picture), then drawing 1-pixel wide lines is easy. The toughest it gets is deciding if you draw the first or last pixel on a line. But if you fill shapes, and two shapes of different colours share the same vertical or horizontal boundary, then the shape that is drawn last will 'own' the pixels on the boundary causing all sorts of weird effects. A common solution was to offset your sampling by half a pixel.

Think about it like sketching a bitmap icon on graph paper - you take the colour at the center of the square not the average colour at where the lines cross, which might be the intersection of up to four different colours - you are offsetting the sampling point by (+0.5, +0.5) from the integer coordinates.

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: BrianHG

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #14 on: August 24, 2018, 02:22:31 am »
https://www.tutorialspoint.com/computer_graphics/line_generation_algorithm.htm

It's been so long, but for efficient fast single color, single angled line generation, I remember playing with something like this on my Atari 800 way, way, way back.  This is still something you can easily implement in Verilog with registers, adders and look ahead carry if statements and get 1 new pixel out every system clock cycle.

Now we just need to bug 'Legacy' to shove 10 of these home made modules into his FPGA and optimize it to  200Mhz and he will have a line rendering algorithm which can render 2 billion pixels a second.  ;)
How 'Legacy' decides to pipe those to his video memory now becomes the miracle...
How 'Legacy' gets his onboard 68000 to keep filling the registers of those 10 line generators will be another miracle in itself...
He will probably need to wire the command ports to his onboard dram, prepare in advance an entire string of line commands in sequence in that dram, and DMA in all of them to keep the line generators continuously active.
« Last Edit: August 24, 2018, 02:28:18 am by BrianHG »
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #15 on: August 24, 2018, 10:19:39 am »
in Verilog

VHDL

200Mhz ... 68000

Spartan3 @ 50Mhz at the moment, it will be switched to a Spartan6 @ 100Mhz
but it's not for 68000, it's for my own made Softcore Arise-v2 (sort of RISC-design without pipeline)

The above pic with 68SEC000 & FPGA was used just to quote a project that shows lines of Verilog about a documented (by Amiga) blitter.
 
need to wire the command ports to his onboard dram, prepare in advance an entire string of line commands in sequence in that dram, and DMA in all of them to keep the line generators continuously active.

currently, this is a big design problem. I have two FPGAs on my project, and they are not on the same PCB.

  • one FPGA is used for the softcore Arise-V2, the debugger Otaku-v3, PSX's Mouse and matrix keyboard controller(1) and basic devices (UART, timer, COPs, etc) ... COP0 is for exceptions, COP1 is for DMA, COP3 is for Cordic, COP4 is for DSP fixed-point saturated math ... this FPGA is 89% full
  • one FPGA is used for the VDU, the PHY (VGA, it will be replaced by LVDS for LCDs), font-ROM, video text-ram (by BRAM), interface to the external static-RAM for the framebuffer ram (up to 4 chip of 512Kbyte means 2Mbyte, but this is the max I can do) and here is where the blitter & cooper stuff will be implemented

(1) I can free up to 1.4% of resources by moving the Playstation1's PAD/Mouse (it uses a proprietary protocol, made by SONY, basically it's like SPI) and the matrix keyboard controller into an external CPLD, or it could be reimplemented in firmware, by a little MPU, like PIC/AVR8; connecting then it by serial port to the FPGA. Well, PC-Keyboards and mouses are connected via serial-PS/2, so ... 

Code: [Select]
____________             ____________
|            |           |            |_____
|            |  link1    |            |     |
|            |==========>|            | PHY |==VGA== LCD
|    SoC     |  link2    |    VDU     |_____|
|  Arise-v2  |<=========>|            |
|            |  IRQ      |            |
|            |<----------|            |
|____________|           |____________|
   |      |                 |      |
   | dbug |                 | SRAM |
   |______|                 |______|

Therefore the problem is the isolation between FPGA1 (the SoC) and FGPA2 (the VDU). Currently, these two FPGAs communicate by two super fast serial links that are "fast", but ... not so fast for video stuff.

Links are synchronous full duplex serial at 32bit data-size 5Mbps manchester-encoded

typedef struct
{
color_t color;
uint32_t x;
uint32_t y;
} point_t;

Via the first link, the SoC sends commands like
- outchar(char,row,col,color)
- scroll_x
- scroll_y
- draw_line(Point1,Point2)
- draw_rectangle(Point1, Point2, Point3, Point4, is_filled)

These commands are put into a queue and served as soon as the engine is able to do it.
And an interrupt can be issued to tell the CPU the queue has become empty, thus ready again to process new commands.

The second link is used by the CPU to directly access the video ram, thus it's not on CPU's bus, everything needs to go through the serial line, that is n-times slower (say, approximately 5Mbps vs 500Mbs, it's 100 times slower at least) than the local bus.

I have a few problems to solve, i.e. how to connect the framebuffer ram to the CPU in a better way, this implies how to put the two FPGAs on the same PCB, in order to use a parallel-bus approach but it's pin-resources consuming .... 32bit for the data_in, 32bit for the data_out, and 22 bit for the address, 4 bit for the control line ... not so good, but it would be faster than a serial line
« Last Edit: August 24, 2018, 06:15:27 pm by legacy »
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #16 on: August 24, 2018, 10:24:23 am »
Can you expand your line code to hold 3 coordinates?

done, implemented  :D

I am testing it on both the C prototype and VHDL and I am going to test it on the RTL simulator
There are now four mini-blitter in parallel, for the trick you suggested (thanks!)

Code: [Select]
------------------[screen]------------------
|.......................................*|
|.....................................**.|
|...................................**.*.|
|.................................**..*..|
|...............................**....*..|
|.............................**.....*...|
|...........................**.......*...|
|.........................**........*....|
|.......................**..........*....|
|.....................**...........*.....|
|...................**.............*.....|
|.................**..............*......|
|...............**................*......|
|.............**.................*.......|
|...........**...................*.......|
|.........**....................*........|
|.......**......................*........|
|.....**.......................*.........|
|...**.........................*.........|
|.**..........................*..........|
|*............................*..........|
|.*..........................*...........|
|..*.........................*...........|
|...*.......................*............|
|....*......................*............|
|.....*....................*.............|
|......*...................*.............|
|.......*.................*..............|
|........*................*..............|
|.........*..............*...............|
|..........*.............*...............|
|...........*...........*................|
|............*..........*................|
|.............*........*.................|
|..............*.......*.................|
|...............*.....*..................|
|................*....*..................|
|.................*..*...................|
|..................*.*...................|
|...................*....................|

(the solid filling part is not yet implemented, probably I will use a cooper to do it)
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #17 on: August 24, 2018, 04:40:49 pm »
Links are synchronous full duplex serial at 32bit data-size 5Mbps manchester-encoded
Why so slow? Most relatively modern FPGAs can do 600+ Mbps LVDS per diff pair, and they scale quite nicely onto multiple parallel lanes (though PCB routing becomes progressively harder as you add more lanes). If that isn't fast enough, a lot of FPGAs have versions with MGTs which can do 3-6Gbit/s per lane. Both options are trivial to implement if you have control over both sides.

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #18 on: August 24, 2018, 05:27:14 pm »
Why so slow?

the FPGA is clocked @ 50 Mhz. Also, the PCB and the debugger stuff (i.e. LA) are problematic for me.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: any simple blitter around? (looking for FPGA design)
« Reply #19 on: August 24, 2018, 05:34:38 pm »
Code: [Select]
------------------[screen]------------------
|........................................|
|........................................|
|........................................|
|..+++++++++++++++++++++++++++++++++++*..|
|..++++++++++++++++++++++++++++++++++*+..|
|..++++++++++++++++++++++++++++++++***+..|
|..+++++++++++++++++++++++++++++++*.*++..|
|..++++++++++++++++++++++++++++++*..*++..|
|..+++++++++++++++++++++++++++++*..*+++..|
|..++++++++++++++++++++++++++++*...*+++..|
|..++++++++++++++++++++++++++**...*++++..|
|..+++++++++++++++++++++++++*.....*++++..|
|..++++++++++++++++++++++++*.....*+++++..|
|..+++++++++++++++++++++++*......*+++++..|
|..++++++++++++++++++++++*......*++++++..|
|..++++++++++++++++++++**.......*++++++..|
|..+++++++++++++++++++*........*+++++++..|
|..++++++++++++++++++*.........*+++++++..|
|..+++++++++++++++++*.........*++++++++..|
|..++++++++++++++++*..........*++++++++..|
|..++++++++++++++**..........*+++++++++..|
|..+++++++++++++*............*+++++++++..|
|..++++++++++++*............*++++++++++..|
|..+++++++++++*.............*++++++++++..|
|..++++++++++*.............*+++++++++++..|
|..++++++++**..............*+++++++++++..|
|..+++++++*...............*++++++++++++..|
|..++++++*................*++++++++++++..|
|..+++++*................*+++++++++++++..|
|..++++*.................*+++++++++++++..|
|..++**.................*++++++++++++++..|
|..+*...................*++++++++++++++..|
|..**..................*+++++++++++++++..|
|..++***...............*+++++++++++++++..|
|..+++++***...........*++++++++++++++++..|
|..++++++++**.........*++++++++++++++++..|
|..++++++++++***.....*+++++++++++++++++..|
|..+++++++++++++***..*+++++++++++++++++..|
|..++++++++++++++++**++++++++++++++++++..|
|........................................|

so, using the cooper-unit to fill the triangle/rectangle is not as simple and efficient as it seems ... 
I need to re-think about it
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #20 on: August 24, 2018, 06:18:42 pm »
the FPGA is clocked @ 50 Mhz.
That doesn't really matter - that's what PLL/MCMM is for. For MGTs you would need a dedicated jow-jitter clock source, but for LVDS it's not important.

Also, the PCB and the debugger stuff (i.e. LA) are problematic for me.
You can use ILA IP core for debugging. As for PCB traces - differential traces are quite forgiving as long as you stay below 1 Gbps area because of termination. Just about any 4+ layer stackup will do just fine, even without controlled impedance.

Offline Gribo

  • Frequent Contributor
  • **
  • Posts: 629
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #21 on: August 24, 2018, 07:07:10 pm »
Your fill rate is probably limited by the VDU's clock. You can replicate the rasterizer block 3 times (the LVDS/VGA PHY is the 4th) and divide the work between them at the expanse of a more complicated SRAM interface. Have the PHY access the SRAM in a round robin manner, and the rasterizers to use the other devices.
Older VGA cards usually used dual port RAM for this, but it is very expansive.


I am available for freelance work.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: any simple blitter around? (looking for FPGA design)
« Reply #22 on: August 24, 2018, 08:22:21 pm »
    currently, this is a big design problem. I have two FPGAs on my project, and they are not on the same PCB.

    • one FPGA is used for the VDU, the PHY (VGA, it will be replaced by LVDS for LCDs), font-ROM, video text-ram (by BRAM), interface to the external static-RAM for the framebuffer ram (up to 4 chip of 512Kbyte means 2Mbyte, but this is the max I can do) and here is where the blitter & cooper stuff will be implemented

    The entire command port cache (say 32 bit command serial port connection with fifo as well), and the parallel buffer structure (4x2x3 x color x fill ) + active  buffer (4x2x3 + color + filled (around 410 bits)), nothing more than a if (ready to draw next triangle & go command) (triangle structure active(410 bits)) = (triangle structure filled from serial port(410 bits)) registers for the triangle filing engine are all on the same chip, so, yes, you can cache and buffer the geometric engine like this so while a slow filled triangle is being drawn, you are still able to fill the current triangle.  Once filled, 1 clock to shift over those 410 bits and continue immediately.  Only the command port fifo, optional on the GPU FPGA, may be whatever size you like since it might take multiple words to fill and instruct that 410bit structure.  It might be easier to skip the fifo and a 1 single prior 410bit structure which you actively fill, then shift along when ready.  The idea is the CPU shouldn't have to wait for a triangle to be filled as it sends out commands, then processes the next triangle.

    Like you said, filling is slow.  You need to paint each pixel.  There should be no problem rendering a line at a time if you simultaneously draw 2 sides of the triangle at the same time, but, this get really complicated when rendering the bottom line of the triangle.  I doubt you will be able to fit this in a small fpga achieving full clock speeds.  This exceeds my knowledge on optimal triangle engines.

    Another possible optimal way may be to do a paint style fill, but you need to examine the display memory contents and it wont work right if there are other items on the screen.
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #23 on: August 24, 2018, 09:10:04 pm »
    I've searched around, and, a different easier approach may be a 3D tracing rendered.  Basically, you scan a rectangle box of the area on the screen where any part of the triangle you want to fill is located (using the outer coordinates of the triangle) and as you scan that rectangle, for each pixel as you go, all you are trying to solve is if the current X,Y coordinate is located inside the triangle, if so, draw the pixel, otherwise don't.  The advantage her is in the future, if you want to go 3D, it is easier to have triangles larger than your screen coordinates and you don't need to scan of the screen active area when filling.  The waste here is a slim huge 45 degree triangle wastes a lot of time scanning the large rectangle which has mostly empty pixels.

    « Last Edit: August 24, 2018, 09:12:14 pm by BrianHG »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #24 on: August 27, 2018, 10:28:54 am »
    Basically, you scan a rectangle box of the area on the screen where any part of the triangle you want to fill is located (using the outer coordinates of the triangle)

    Amiga does a trick with the blitter by a simple circuit that does some bit manipulation when it copies the bit-plane into the DMA-buffer ready to go to the video memory, but this trick is ... not so good and full of defectives. E.g. it's not able to fill a triangle if one of the borders has even points  :palm:

    I have recently bought "Amiga System Programmer's Guide", it shows this kind of problem and tricks to mitigate it, but I believe the problem needs a radically different approach.

    i.e. it needs to think about what "fill a triangle" means, which well ... for a given triangle of vertexes { V1(x,y),V2(x,y),V3(x,y) }, and a point P(x,y), it's almost like the finding "sort-of-weights" (what?) that tell us how much of P's X coordinate is made of { V1, V2, V3 }, and also the same for P's Y coordinate:

    function( P(x,y), V1(x,y),V2(x,y),V3(x,y) ) ----> { W1, W2, W3 }

    Three numbers to tell:

    W1: how much of P(x,y)'s coordinates is made of V1(x,y) ?
    W2: how much of P(x,y)'s coordinates is made of V2(x,y) ?
    W3: how much of P(x,y)'s coordinates is made of V3(x,y) ?
    (with "sort-of-weights" I mean this)

    I don't know it can be generalized for a generic polygon, but for a triangle it's like solving a linear system of three equations, whose denominators of the first two equations are the same, whose expression of the third equation is the sum of W1,W2,W2, and whose solution { W1,W2,W2 } has an intriguing property:

    if P(x,y) is actually inside of the triangle, then
    range( W1 ) is { 0 .. 1 }
    range( W2 ) is { 0 .. 1 }
    range( W3 ) is { 0 .. 1 }

    if P(x,y) is actually outside of the triangle, then at least one of { W1, W2, W3 } will be negative!

    This can be used as an easy check to see if a point lies inside or outside of a triangle  :D







    All good, and exciting, except ...
    1) computing { W1,W2,W2 } require six dot-multiplications(1), and two divisions
    2) this must be applied to all the points for the rectangle that contains the triangle

    life is complex :palm: :palm: :palm:


    (1) e.g. dot(v1, v2) = (v1.x * v2.x + v1.x * v2.y) + (v1.y * v2.x + v1.y * v2.y)
    « Last Edit: August 27, 2018, 03:36:05 pm by legacy »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #25 on: August 31, 2018, 04:48:13 pm »
    Code: [Select]
    |........................................|
    |.....................................*..|
    |....................................**..|
    |..................................***...|
    |.................................*..*...|
    |................................*...*...|
    |...............................*...*....|
    |..............................*....*....|
    |............................**.....*....|
    |...........................*......*.....|
    |..........................*.......*.....|
    |.........................*........*.....|
    |........................*.........*.....|
    |......................**.........*......|
    |.....................*...........*......|
    |....................*............*......|
    |...................*............*.......|
    |..................*.............*.......|
    |................**..............*.......|
    |...............*...............*........|
    |..............*................*........|
    |.............*.................*........|
    |............*.................*.........|
    |..........**..................*.........|
    |.........*....................*.........|
    |........*....................*..........|
    |.......*.....................*..........|
    |......*......................*..........|
    |....**.......................*..........|
    |...*........................*...........|
    |..**........................*...........|
    |....****....................*...........|
    |........****...............*............|
    |............****...........*............|
    |................****.......*............|
    |....................****..*.............|
    |........................***.............|
    |........................................|

    Code: [Select]
    |........................................|
    |.....................................*..|
    |.................................... *..|
    |..................................* *...|
    |.................................   *...|
    |................................    *...|
    |...............................    *....|
    |..............................     *....|
    |............................*      *....|
    |...........................       *.....|
    |..........................        *.....|
    |.........................         *.....|
    |........................          *.....|
    |......................*          *......|
    |.....................            *......|
    |....................             *......|
    |...................             *.......|
    |..................              *.......|
    |................*               *.......|
    |...............                *........|
    |..............                 *........|
    |.............                  *........|
    |............                  *.........|
    |..........*                   *.........|
    |.........                     *.........|
    |........                     *..........|
    |.......                      *..........|
    |......                       *..........|
    |....*                        *..........|
    |...                         *...........|
    |..                          *...........|
    |....                        *...........|
    |........                   *............|
    |............               *............|
    |................           *............|
    |....................      *.............|
    |........................  *.............|
    |........................................|

    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |..-----------------------------------+..|
    |..----------------------------------+-..|
    |..---------------------------------++-..|
    |..----------------------------------+-..|
    |..------------------------------------..|
    |..-----------------------------+---+--..|
    |..----------------------------+----+--..|
    |..---------------------------+--@-----..|
    |..-----------------------------@--+---..|
    |..----------------------------@@--+---..|
    |..-----------------------+---@@@------..|
    |..----------------------+--@@@@-------..|
    |..---------------------+--@@@@@--+----..|
    |..-----------------------@@@@@@-------..|
    |..----------------------@@@@@@@-------..|
    |..-----------------+---@@@@@@@--+-----..|
    |..----------------+--@@@@@@@@@--------..|
    |..---------------+--@@@@@@@@@@--------..|
    |..-----------------@@@@@@@@@@--+------..|
    |..----------------@@@@@@@@@@@--+------..|
    |..-----------+--@@@@@@@@@@@@@---------..|
    |..----------+--@@@@@@@@@@@@@--+-------..|
    |..---------+--@@@@@@@@@@@@@@--+-------..|
    |..-----------@@@@@@@@@@@@@@@----------..|
    |..----------@@@@@@@@@@@@@@@--+--------..|
    |..-----+--@@@@@@@@@@@@@@@@@--+--------..|
    |..----+--@@@@@@@@@@@@@@@@@@-----------..|
    |..---+----@@@@@@@@@@@@@@@@------------..|
    |..------------@@@@@@@@@@@@--+---------..|
    |..-+--------------@@@@@@@@------------..|
    |..----++--------------@@@-------------..|
    |..---------+---------------+----------..|
    |..------------++----------------------..|
    |..----------------++------------------..|
    |..--------------------++--+-----------..|
    |..------------------------------------..|
    |........................................|

    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |..------------------------------------..|
    |..----------------------------------2-..|
    |..---------------------------------22-..|
    |..--------------------------------222-..|
    |..-------------------------------222--..|
    |..-----------------------------22222--..|
    |..----------------------------+22222--..|
    |..---------------------------+++@++---..|
    |..--------------------------+++@+++---..|
    |..-------------------------+++@@+++---..|
    |..-----------------------++++@@@++----..|
    |..----------------------+++@@@@+++----..|
    |..---------------------+++@@@@@+++----..|
    |..--------------------+++@@@@@@++-----..|
    |..-------------------+++@@@#@@@++-----..|
    |..-----------------++++@@@##@@+++-----..|
    |..----------------+++@@@###@@@++------..|
    |..---------------+++@@@####@@@++------..|
    |..--------------+++@@@#####@@+++------..|
    |..-------------+++@@@#####@@@+++------..|
    |..-----------+++@@@@######@@@++-------..|
    |..----------+++@@@########@@+++-------..|
    |..---------+++@@@########@@@+++-------..|
    |..--------+++@@@#########@@@++--------..|
    |..-------+++@@@##########@@+++--------..|
    |..-----11+@@@@@@@#######@@@+++--------..|
    |..----111@@@@@@@@@@@@###@@@++---------..|
    |..---111++@@@@@@@@@@@@@@@@+++---------..|
    |..--1111++++++@@@@@@@@@@@@+++---------..|
    |..-11111++++++++++@@@@@@@@+0----------..|
    |..----1+++++++++++++++@@@+00----------..|
    |..---------++++++++++++++000----------..|
    |..------------++++++++++000-----------..|
    |..----------------+++++0000-----------..|
    |..--------------------00000-----------..|
    |..------------------------------------..|
    |........................................|

    Code: [Select]
    ------------------[screen]------------------
    |........................................|
    |........................................|
    |........................................|
    |..----------------------------------- ..|
    |..---------------------------------- -..|
    |..---------------------------------  -..|
    |..--------------------------------   -..|
    |..-------------------------------   --..|
    |..-----------------------------     --..|
    |..----------------------------      --..|
    |..---------------------------      ---..|
    |..--------------------------       ---..|
    |..-------------------------        ---..|
    |..-----------------------         ----..|
    |..----------------------          ----..|
    |..---------------------           ----..|
    |..--------------------           -----..|
    |..-------------------            -----..|
    |..-----------------              -----..|
    |..----------------              ------..|
    |..---------------               ------..|
    |..--------------                ------..|
    |..-------------                 ------..|
    |..-----------                  -------..|
    |..----------                   -------..|
    |..---------                    -------..|
    |..--------                    --------..|
    |..-------                     --------..|
    |..-----                       --------..|
    |..----                       ---------..|
    |..---                        ---------..|
    |..--                         ---------..|
    |..-                         ----------..|
    |..----                      ----------..|
    |..---------                 ----------..|
    |..------------             -----------..|
    |..----------------         -----------..|
    |..--------------------     -----------..|
    |..------------------------------------..|

    I have found, already implemented, simulated and synthesized an interpolation method to
    - identify points inside the triangle
    - identify points outside the triangle
    - interpolate colors among three vertices

    the implementation takes
    - 12 MULs in parallel
    - 3 DIVs in parallel
    - 80 clock's ticks to compute a point
    « Last Edit: August 31, 2018, 09:32:24 pm by legacy »
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #26 on: September 01, 2018, 11:30:10 pm »
    - 3 DIVs in parallel
    - 80 clock's ticks to compute a point

    What are the divs for?  You should be able to do everything with add/sub/multiply.
    You may have a delay of a few clocks, like 4 to 8, until the first point, but then every single clock should then create a new point.

    With divs, this should only add around 10 more clocks until the first new point, but then, once again, a new point should be coming out every single clock.

    Waiting 80 clocks for a point, the another 80 for the next in an FPGA?  You would do much better with a cheap DS-Pic MCU.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #27 on: September 02, 2018, 01:54:48 am »
    What are the divs for?

    to solve the last stage of the system of three linear equations to interpolate a point(x,y) inside/outside the triangle, you need to normalize the barycentric vector in order to find the correct RGB color of the final; the dividend to do that is not a precomputable constant, hence you need div.
    « Last Edit: September 02, 2018, 02:18:49 am by legacy »
     

    Offline Bruce Abbott

    • Frequent Contributor
    • **
    • Posts: 627
    • Country: nz
      • Bruce Abbott's R/C Models and Electronics
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #28 on: September 02, 2018, 03:58:16 am »
    I have found, already implemented, simulated and synthesized an interpolation method to
    - identify points inside the triangle
    - identify points outside the triangle
    - interpolate colors among three vertices
    What exactly do the 5 ascii diagrams represent?
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #29 on: September 02, 2018, 11:34:14 am »
    What exactly do the 5 ascii diagrams represent?

    It's the local buffer memory x-y of the pixel plane with a poor resolution of color since I am only representing white, black, and a couple of tone of grey. In a real device, it would be DMA-copied into the framebuffer as soon as the whole "le'ts trace and fill a triangle with interpolation" process ends.

    It's used in the HDL simulation, and it's one of the outputs from the test-bench, but I am willing to develop a module for GNU-plot to have x11-support and colors.

     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #30 on: September 02, 2018, 12:07:01 pm »
    Now I have a different circuit, definitely simpler and smaller, but also able unable to interpolate color-pixels, it just fills a triangle with a solid color as fast as it can.

    it's composed by
    - a circuit that calculates the min(x,y) set of points of the triangle
    - three line-fillers in parallel and each line-filler fills a table
    - a horizontal line-filler that use the table as input


    accordingly:
    - you give the vertices v1(x,y), v2(x,y), v3(x,y) of the triangle, coordinates are positive numbers)
    - the first stage calculates y.min as min({ v1.y, v2.y, v3.y })
     - the line-filler1 draws line { v1-v2 }, for each calculated point {x,y} it fills table[y-y.min]={x,_,_}
    - the line-filler2 draws line { v2-v3 }, for each calculated point {x,y} it fills table[y-y.min]={_,x,_}
    - the line-filler3 draws line { v3-v1 }, for each calculated point {x,y} it fills table[y-y.min]={_,_,x}
    - line-fillers work in paralel, and this draws the outline of the triangle
    - once completed, the last circuit scans the table, finds x_start,x_stop, and it draws horizontal lines

    each table line or is empty, or it always only shows these configurations
    Code: [Select]
    table[i]={x'1,x'2,__} <--- v1-v2: x_start=x'1, x_stop=x'2, y=i+y.min
    table[i]={x'1,__,x'3} <--- v3-v1: x_start=x'1, x_stop=x'3, y=i+y.min
    table[i]={__,x'2,x'3} <--- v3-v1: x_start=x'2, x_stop=x'3, y=i+y.min

    x'1 means x-point calculated by the line-filler1
    x'2 means x-point calculated by the line-filler2
    x'3 means x-point calculated by the line-filler3


    pro: damn faster, it takes a max of 3 clocks per pixel, and the implementation takes less area
    con: it requires an internal buffer for the table, and circuits to clear/set it, it doesn't interpolate pixel-colors
    « Last Edit: September 03, 2018, 06:03:29 am by legacy »
     

    Offline Gribo

    • Frequent Contributor
    • **
    • Posts: 629
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #31 on: September 02, 2018, 03:14:39 pm »
    Can't you do the color interpolation in the line drawing step? I assume all 3 lines have the same color.
    I am available for freelance work.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #32 on: September 02, 2018, 04:53:59 pm »
    I assume all 3 lines have the same color.

    correct  :D

    indeed this algorithm is simpler and faster, but can't do color-interpolation; vertices can only be of the same color and the triangle can only be filled with a solid color
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #33 on: September 02, 2018, 08:10:35 pm »
    You can think of it as 2 different steps.  Use your existing line filler.  Feed that output coordinates to a colorizer module which uses a blend of the set colors at outer coordinates of the triangle based on location to create a final paint color.

    I still don't see why this step needs any divides inside the filler, they are just needed 1 shot in the setup to create the X&Y transition/vs/position factors.  (You may be approaching the colorizing math in a different way than I am...)  I created a full 2D video scaling system with bilinear and bicubic video scaling and blending without any divides in any of the color processing.  The only divides where used in the setup, computed once by a CPU, to initialize the geometry for the graphics rendering system.  The rendering system was totally based on multiply, add, and, bit shifting.

    « Last Edit: September 02, 2018, 08:26:50 pm by BrianHG »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #34 on: September 02, 2018, 09:09:46 pm »
    Feed that output coordinates to a colorizer module which uses a blend of the set colors at outer coordinates of the triangle based on location to create a final paint color

    in my previous algorithm, a vertex was defined by position (x,y), and color.

    the line-filler is not aware of the color, it's only able to calculate line's pixels. To describe the pixel's color you need to know the propagation of the color made by each vertex, hence you need an interpolator.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #35 on: September 02, 2018, 10:04:00 pm »
    the interpolator needs to solve a system of three linear equations
    how do you solve A x = b ?

    - by Crammer?
    - by Gauss-Jordan Elimination?
    - by Gauss-Seidel ?
    - by Jacobi?
    - by LU-Factorization?
    - ...?
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #36 on: September 02, 2018, 10:54:04 pm »
    Feed that output coordinates to a colorizer module which uses a blend of the set colors at outer coordinates of the triangle based on location to create a final paint color

    in my previous algorithm, a vertex was defined by position (x,y), and color.

    the line-filler is not aware of the color, it's only able to calculate line's pixels. To describe the pixel's color you need to know the propagation of the color made by each vertex, hence you need an interpolator.

    I was thinking of a linear blend with 4 RGB/YUV references from 4 coordinates on the screen.  One method, the position X/Y sent from your vertex algorithm, of the screen's coordinates determines what weight of each of the 4 screen coordinates reference colors to blend together.  When using a triangle, the first 3 are taken from the triangles corners and the 4th, finalizing a rectangle of that triangle, is interpreted.  You can also deliberately mess with the RGB values in the fourth imaginary point to create an illusion of a shadow or light source towards the center of the triangle.

    All the math would be geared so that your fractions results are first multiplied, then divided by 65536, either before or after the summing depending on the number of bits you want your adders to be.  All fractions are actually also integers / 65536.  The goal is to get everything done in 1-2 clock operations avoiding true division at all costs.  If you are using a huge FPGA with nice pipelined fixed 5-6 clock floating point multiply/divide, then, divide away!

    However, this is a simple fast implementation for a small FPGA.  You will not get fancy non-linear filtering effects this way unless you implement curved lookup-tables, or, apply full floating point mathematics.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #37 on: September 03, 2018, 05:00:53 am »
    what weight of each of the 4 screen coordinates reference colors to blend together

    and to do that, you need to solve a system of linear equations, hence an interpolator whose implementation intrinsically needs a division unit.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #38 on: September 03, 2018, 06:34:54 am »
    nice pipelined fixed 5-6 clock floating point multiply/divide

    currently, I have implemented my own algorithm to multiply and divide fixed-point 32bit numbers with saturation. This algorithm does never overflow but it's not pipelined, it takes 1 clock cycle per bit, hence 32 clocks to complete the MUL/DIV. They are both required to solve the system of linear equations, for whose implementation I am using a modified version of Jacobi.

    The simplicity of the MUL-DIV unit (in term of VHDL code) is both good and bad: good, because it is relatively easy to understand and simulate; bad, because it is not typically good performant in practice if compared to faster MUL-DIV unit, e.g. a pipelined version (the link offers a good example of 16 bit pipelined MUL for Xilinx FPGAs)

    I have two FGPAs in my system, the first implements a Softcore, the second is dedicated to graphics, but a good point, I could recycle the Jacobi unit currently used for the blitter, making it available for the Softcore to compute a generic(1) system of linear equations.

    This needs a redesign of resources, but it would be nice  :D

    (1) generic? well ... not so confident it could be used for a general purpose algebra engine ... the original Jacobi's prerequisite for the convergence is that the matrix A is a diagonally dominant matrix for the problem A x = b. This is a sufficient but not necessary condition for the method to converge, hence I have modified the algorithm to estimate the best initial condition possible, and then to accelerate the convergence (in term of the number of interactions), and practically, for the interpolation problem, it seems converging very fast even if the Jacobi's prerequisite conditions are not satisfied. The interpolator has some particular characteristic in its matrix A, and the trick works, but the algorithm might blow up for other kinds of matrix A  :-//
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #39 on: September 03, 2018, 08:06:21 am »
    what weight of each of the 4 screen coordinates reference colors to blend together

    and to do that, you need to solve a system of linear equations, hence an interpolator whose implementation intrinsically needs a division unit.
    Are you talking about working out the necessary weights based on relative XY drawing coordinates inside the triangle?

    Though there is a bit of trickery to generate the 4 weights, it can be done using the fractional integer m.n math trick.

    Since I didn't generate your triangle fill algorithm,  I cannot see if your code can keep track of the fractional location inside the triangle going from 0/65535 (at triangles first left edge coordinates) to 65535/65535 (at the triangle's right edge coordinates).  (same thin for top and bottom coordinates)  Since every time I move a pixel on my raster scan fill, I can subtract either part of my main coordinate counters, or as part of separate fractional counters creating a floating point subtract dot by dot, or, line by line, which is exactly what a divide is, yet, no wasteful divide function.  These 4 fractional counters hold the 'weights' and using the top 8 bits of them when mixed with the appropriate RGB references at the ends of each line once again only requires an 8bit X 8bit integer number multiply, then use the top 8 bits of the 16 bit result.

    (This is an example, I'm using RGB as a 3 8 bit integer matrix, the weight# would be identical for each color channel)
    (1 clock example, slow FMAX)
    Code: [Select]
    RGB_out <= ((weight1 * RGB_point1) << 8) + ((weight2 * RGB_point2) << 8) + ((weight3 * RGB_point3) << 8) + ((weight4 * RGB_point4) << 8);
    To make the FMAX top speed, like 200Mhz, each weight multiply with RGB would first be stored in a temp register, then next, you would do a 4 way add of the temps to get a final color.  Or even faster, you would have 2 by 2 way adds to 2 new temps, then a final add of those 2 temps to your results.
    « Last Edit: September 03, 2018, 08:12:13 am by BrianHG »
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #40 on: September 03, 2018, 08:10:52 am »
    The first triangle-interpolator drawing algorithm was to look at every pixel in a bounding box around the triangle, then for each pixel, it calculated a vector to check if the point lies inside or outside of a triangle, and if inside, of which color it should be.

    This is neat and nice, but it wastes a lot of cycles for all the pixels not lining inside the triangle  :palm:

    Hence, an improvement to this comes by combining the three line-fillers in parallel to fill a table and the horizontal line-filler that use the table as input, with the interpolator.

    - a circuit that calculates the min(x,y) set of points of the triangle
    - three line-fillers in parallel and each line-filler fills a table
    - a horizontal line-filler that use the table as input

    vertices A,B,C ---> line fillers ---> table filled with the points of the triangle's outlines
    triangle's outlines ---> horizontal line-filler ---> table filled with triangle's internal points
    table of triangle's internal points ---> interpolator ---> bitplane(x,y,color)

    Ohhhhh, you are not rendering a filled triangle beginning on each new scan line by working out the the beginning and ending coordinates of the 2 opposing edges of the triangle.  You are doing things the hard way...  Including the use of placing items into a table then reading that data back work out the fill.  And even the hassle of using multiple line fillers in parallel.  Your work must be turning into a monster...
    « Last Edit: September 03, 2018, 08:16:06 am by BrianHG »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #41 on: September 03, 2018, 08:12:02 am »
    The first triangle-interpolator drawing algorithm was to look at every pixel in a bounding box around the triangle, then for each pixel, it calculated a vector to check if the point lies inside or outside of a triangle, and if inside, of which color it should be.

    This is neat and nice, but it wastes a lot of cycles for all the pixels not lining inside the triangle  :palm:

    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |..------------------------------------..|
    |..----------------------------------2-..|
    |..---------------------------------22-..|
    |..--------------------------------222-..|
    |..-------------------------------222--..|
    |..-----------------------------22222--..|
    |..----------------------------+22222--..|
    |..---------------------------+++@++---..|
    |..--------------------------+++@+++---..|
    |..-------------------------+++@@+++---..|
    |..-----------------------++++@@@++----..|
    |..----------------------+++@@@@+++----..|
    |..---------------------+++@@@@@+++----..|
    |..--------------------+++@@@@@@++-----..|
    |..-------------------+++@@@#@@@++-----..|
    |..-----------------++++@@@##@@+++-----..|
    |..----------------+++@@@###@@@++------..|
    |..---------------+++@@@####@@@++------..|
    |..--------------+++@@@#####@@+++------..|
    |..-------------+++@@@#####@@@+++------..|
    |..-----------+++@@@@######@@@++-------..|
    |..----------+++@@@########@@+++-------..|
    |..---------+++@@@########@@@+++-------..|
    |..--------+++@@@#########@@@++--------..|
    |..-------+++@@@##########@@+++--------..|
    |..-----11+@@@@@@@#######@@@+++--------..|
    |..----111@@@@@@@@@@@@###@@@++---------..|
    |..---111++@@@@@@@@@@@@@@@@+++---------..|
    |..--1111++++++@@@@@@@@@@@@+++---------..|
    |..-11111++++++++++@@@@@@@@+0----------..|
    |..----1+++++++++++++++@@@+00----------..|
    |..---------++++++++++++++000----------..|
    |..------------++++++++++000-----------..|
    |..----------------+++++0000-----------..|
    |..--------------------00000-----------..|
    |..------------------------------------..|
    |........................................|

    points marked with '-' belong to the bounding box around the triangle but they lie outside the triangle, and the interpolator doesn't know it until it has wasted precious cycles to calculate it.

    Hence, an improvement to this comes by combining the interpolator with the three line-fillers in parallel to fill a table and the horizontal line-filler that uses the table as input for the interpolation.

    - a circuit that calculates the min(x,y) set of points of the triangle
    - three line-fillers in parallel and each line-filler fills a table
    - a horizontal line-filler that use the table as input

    vertices A,B,C ---> line fillers ---> table filled with the points of the triangle's outlines
    triangle's outlines ---> horizontal line-filler ---> table filled with triangle's internal points
    table of triangle's internal points ---> interpolator ---> bitplane(x,y,color)
    « Last Edit: September 03, 2018, 08:45:42 am by legacy »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #42 on: September 03, 2018, 08:26:27 am »
    are you talking about working out the necessary weights based on relative XY drawing coordinates inside the triangle?

    given
    - three vertices { Vertex1(x,y) is color1, Vertex2(x,y) is color2, Vertex3(x,y) is color3 }
    - an arbitrary point, P(x,y), inside of the triangle
    I am talking about the problem of how do you mathematically define its color
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #43 on: September 03, 2018, 08:32:07 am »
    Since you are not rasterizing a filled triangle in real time, you cannot perform a divide by subtracting a small integer as each dot/line is drawn, reseting those counters after reaching the edge of the triangle, and using those results as blend weights to interpolate the final color using my aforementioned matrix 4sum 8x8 multiply, shift by 8.  Ignore my recommendations unless you re-write your triangle rasterizer render fill directly like this from the beginning.

    Basically, as you move either across, or down the screen during rasterization, following the line p1(x,y) to p2(x,y), to p3(x,y), plus imaginary p4(x,y) when drawing, you have a 16 bit counter following the relative position between each possible X,Y vertices, beginning at 0 when closest to P1, then 65535 at P2, (being a matrix, there are 2 counters for x&y).  You may need more than 16 bits for really high resolutions.  Then use these 4x2 integers, the top 8 bits which go from 0-255, and feed them to the color mixer.
    « Last Edit: September 03, 2018, 08:41:52 am by BrianHG »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #44 on: September 03, 2018, 08:39:14 am »
    you cannot perform a divide by subtracting a small integer as each dot/line is drawn, reseting those counters after reaching the edge of the triangle, and using those results as blend weights to interpolate the final color using my aforementioned matrix 4sum 8x8 multiply, shift by 8.  Ignore my recommendations unless you re-write your triangle rasterizer render fill directly like this from the beginning.

    frankly, it not clear what you are talking about, and I haven't understood *HOW* you calculate the weight  :-//

    Code: [Select]
    RGB_out <= ((weight1 * RGB_point1) << 8)
             + ((weight2 * RGB_point2) << 8)
             + ((weight3 * RGB_point3) << 8)
             + ((weight4 * RGB_point4) << 8);
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #45 on: September 03, 2018, 08:55:46 am »
    Sorry, but if your triangle rasteriser isn't designed to loop and fill/draw the lines using a fractional counter, which I though you were, this becomes difficult to not possible.

    Basically, as you move either across, or down the screen during rasterization, following the line p1(x,y) to p2(x,y), to p3(x,y), plus optional imaginary p4(x,y) when drawing, you have a 16 bit counter following the relative position between each possible X,Y vertices, beginning at 0 when closest to P1, then 65535 at P2, (being a matrix, there are 2 counters for x&y).  You may need more than 16 bits for really high resolutions.  Then use these 4x2 integers, the top 8 bits which go from 0-255, and feed them to the color mixer.

    In other words, before you begin to draw, you solve once the how many times the number of X and Y screen pixels in each line of the triangle can divide into 65535 depending on X&Y screen pixels drawn.  When rendering the triangle, as you advance the X,Y painting position, you add/subtract those X&Y figures to the relative X&Y location counters.

    The 65535 is using multiple 16 bit counters and should deliver good results up to 640x480.  Anything larger and I would increase that to 20 or 24 bits.
    « Last Edit: September 03, 2018, 08:58:32 am by BrianHG »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #46 on: September 03, 2018, 09:01:01 am »
    You are doing things the hard way

    I am aware of cases where all the simplified intuitive algorithms I looked at were for likely nicer because easy to implement, but, under certain conditions, they tend to expose their bad side effects, e.g. when I shifted my triangle a bit, I saw the major problem with a simplified approach, specifically when the arbitrary point P lied directly on the straight line between two vertices and colors were all messed up  :palm: :palm: :palm:
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #47 on: September 03, 2018, 09:33:07 am »
    Damn, too bad you cannot always slice up a source triangle into right angle triangles with a vertical and horizontal face.  That would simplify the algorithm and life...
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #48 on: September 03, 2018, 02:22:46 pm »
    That would simplify the algorithm and life

    maybe, but I don't see how it solves the color-interpolation problem, haven't seen any mathematical evidence of your "tricks", and you haven't yet told how you calculate the weight  :-//


    hence I am going ahead with my solution, that at least is working on the paper, on the simulator with a rich test-cases, and on the FPGA  :popcorn:
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #49 on: September 03, 2018, 02:34:42 pm »
    you are not rendering a filled triangle beginning on each new scan line

    the interpolator can do it, but it's not convenient since it wastes 80 cycles for nothing when the pixel is not inside the triangle. By scanning a line the interpolator doesn't know if the pixel belongs to the triangle until it has calculated it.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #50 on: September 04, 2018, 02:09:18 pm »
    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |.....................................x..|
    |....................................x*..|
    |..................................*xx...|
    |.................................*xxx...|
    |................................*xxx*...|
    |...............................xxxxx....|
    |..............................xxxxxx....|
    |............................*xxxxxx*....|
    |...........................*xxxxxxx.....|
    |..........................*xxxxxxxx.....|
    |.........................xxxxxxxxx*.....|
    |........................xxxxxxxxxx*.....|
    |......................*xxxxxxxxxxx......|
    |.....................*xxxxxxxxxxx*......|
    |....................*xxxxxxxxxxxx*......|
    |...................xxxxxxxxxxxxxx.......|
    |..................xxxxxxxxxxxxxx*.......|
    |................*xxxxxxxxxxxxxxx*.......|
    |...............*xxxxxxxxxxxxxxxx........|
    |..............*xxxxxxxxxxxxxxxxx........|
    |.............xxxxxxxxxxxxxxxxxx*........|
    |............xxxxxxxxxxxxxxxxxxx.........|
    |..........*xxxxxxxxxxxxxxxxxxxx.........|
    |.........*xxxxxxxxxxxxxxxxxxxx*.........|
    |........*xxxxxxxxxxxxxxxxxxxxx..........|
    |.......xxxxxxxxxxxxxxxxxxxxxxx..........|
    |......xxxxxxxxxxxxxxxxxxxxxxx*..........|
    |....*xxxxxxxxxxxxxxxxxxxxxxxx*..........|
    |...*xxxxxxxxxxxxxxxxxxxxxxxxx...........|
    |..xxxxxxxxxxxxxxxxxxxxxxxxxx*...........|
    |....**xxxxxxxxxxxxxxxxxxxxxx*...........|
    |........**xxxxxxxxxxxxxxxxxx............|
    |............**xxxxxxxxxxxxx*............|
    |................**xxxxxxxxx*............|
    |....................**xxxxx.............|
    |........................**x.............|
    |........................................|

    so, digging into the GE of the consoles made in the 90s, I 've found another way to fill the triangle by a module able to determine whether a point is "on", "above" or "below" a line, defined by 2 points; The module uses two multipliers, several adders, and combinational logic, and three of these modules are used to determine which side of the line formed by 2 of the vertices of the triangle the third point is on. By doing this for all three vertices, a set of three sides is used by the line finders in the drawing stage to check if the processed pixel is also on the same side of the three lines as the third point, and, If this is true for all the vertices of the triangle, the point is within it.

    bits market as '*' represent the correct outline of the triangle
    bits market as 'x' represent the output of the new module

    pro: it doesn't require a table, and it's fast
    con: it doesn't interpolate, and it's a bit defective on borders since it loses some bits.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #51 on: September 04, 2018, 02:14:52 pm »
    bits market as '*' represent the correct outline of the triangle
    bits market as 'x' represent the output of the new module

    I have pre-drawn the outline of the triangle with the correct method, and then over the same bitplane, I allowed the new module to redraw, and you can see it eats some bits, but not all the bits of the outline.

    I have to investigate for the reason for this  :-//
     

    Offline BrianHG

    • Super Contributor
    • ***
    • Posts: 7733
    • Country: ca
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #52 on: September 05, 2018, 01:22:30 am »
    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |.....................................x..|
    |....................................x*..|
    |..................................*xx...|
    |.................................*xxx...|
    |................................*xxx*...|
    |...............................xxxxx....|
    |..............................xxxxxx....|
    |............................*xxxxxx*....|
    |...........................*xxxxxxx.....|
    |..........................*xxxxxxxx.....|
    |.........................xxxxxxxxx*.....|
    |........................xxxxxxxxxx*.....|
    |......................*xxxxxxxxxxx......|
    |.....................*xxxxxxxxxxx*......|
    |....................*xxxxxxxxxxxx*......|
    |...................xxxxxxxxxxxxxx.......|
    |..................xxxxxxxxxxxxxx*.......|
    |................*xxxxxxxxxxxxxxx*.......|
    |...............*xxxxxxxxxxxxxxxx........|
    |..............*xxxxxxxxxxxxxxxxx........|
    |.............xxxxxxxxxxxxxxxxxx*........|
    |............xxxxxxxxxxxxxxxxxxx.........|
    |..........*xxxxxxxxxxxxxxxxxxxx.........|
    |.........*xxxxxxxxxxxxxxxxxxxx*.........|
    |........*xxxxxxxxxxxxxxxxxxxxx..........|
    |.......xxxxxxxxxxxxxxxxxxxxxxx..........|
    |......xxxxxxxxxxxxxxxxxxxxxxx*..........|
    |....*xxxxxxxxxxxxxxxxxxxxxxxx*..........|
    |...*xxxxxxxxxxxxxxxxxxxxxxxxx...........|
    |..xxxxxxxxxxxxxxxxxxxxxxxxxx*...........|
    |....**xxxxxxxxxxxxxxxxxxxxxx*...........|
    |........**xxxxxxxxxxxxxxxxxx............|
    |............**xxxxxxxxxxxxx*............|
    |................**xxxxxxxxx*............|
    |....................**xxxxx.............|
    |........................**x.............|
    |........................................|

    so, digging into the GE of the consoles made in the 90s, I 've found another way to fill the triangle by a module able to determine whether a point is "on", "above" or "below" a line, defined by 2 points; The module uses two multipliers, several adders, and combinational logic, and three of these modules are used to determine which side of the line formed by 2 of the vertices of the triangle the third point is on. By doing this for all three vertices, a set of three sides is used by the line finders in the drawing stage to check if the processed pixel is also on the same side of the three lines as the third point, and, If this is true for all the vertices of the triangle, the point is within it.

    bits market as '*' represent the correct outline of the triangle
    bits market as 'x' represent the output of the new module

    pro: it doesn't require a table, and it's fast
    con: it doesn't interpolate, and it's a bit defective on borders since it loses some bits.
    If it is significantly faster than 4x the previous method, you can operate internally at double X&Y resolution, then output.  Though, if you are painting ontop of other graphics, it now becomes valuable to maintain an at least 2 bit depth stencil so when you draw ontop of other items, you can have soft edges.
     

    Offline KE5FX

    • Super Contributor
    • ***
    • Posts: 1891
    • Country: us
      • KE5FX.COM
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #53 on: September 05, 2018, 01:30:43 am »
    Thats Breshenans line drawing algorithm, there is also a circle drawing variant and an improved line drawing one that exploits the symmetry.

    "Graphics Gems" by Glassner (Academic Press) is IMHO the bible for CPU based graphics stuff, dated by todays standards but very, very good stuff when you lack a modern graphics processor.

    Regards, Dan.

    Also there's Mike Abrash, whose older VGA-era books are now free

     
    The following users thanked this post: edavid, hamster_nz, BrianHG

    Offline hamster_nz

    • Super Contributor
    • ***
    • Posts: 2803
    • Country: nz
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #54 on: September 05, 2018, 03:33:45 am »
    Thats Breshenans line drawing algorithm, there is also a circle drawing variant and an improved line drawing one that exploits the symmetry.

    "Graphics Gems" by Glassner (Academic Press) is IMHO the bible for CPU based graphics stuff, dated by todays standards but very, very good stuff when you lack a modern graphics processor.

    Regards, Dan.

    Also there's Mike Abrash, whose older VGA-era books are now free.

    Those books (and the series of Dr Dobbs articles) were great. For those who wan't there:

    - It was the at the time that Desktop OSs went from 16-bit ot 32-bit - 32-bit DOS extenders were required to run 32-but code

    - It was also the time of the 486 -> Original Pentium transition, where super-scalar CPUs first hit the desktop.

    - It was also the time when Graphics cards were starting to escape the VGA 320x200 8-bit graphics mode (enabled by larger address space of the 32-bit OSs)

    - It was the time that First Person Shooter games first appeared (Wolf3d gave way to Doom and Descent)

    Optimization from the "pretty bland" 486 code that compilers generated to the foibles of the Pentium's architecture allowed for great improvements in speed. Compilers took a while to catch up.

    The rate of change was amazing to look back on.

    Ah, nostalgia isn't what it used to be.
    Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #55 on: September 05, 2018, 10:38:58 am »
    Quote
    "Graphics Gems" by Glassner (Academic Press) is IMHO the bible for CPU based graphics stuff, dated by todays standards but very, very good stuff when you lack a modern graphics processor.

    Quote
    Mike Abrash

    I still have to buy the Graphics Gems, I have already downloaded Mike Abrash, and I have found good hints in this book:

    Quote
    Real-Time Collision Detection - Christer Ericson - ott 2004

    available in Kindle format, which I prefer  :D

    it's funny to know the Sony Playstation's GTE implemented a trick, consisting of three modules in parallel of two multipliers each, plus adders, for a total of six multipliers (all is fixed-point).

    But it's not clear to me how the GTE did color interpolation.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #56 on: September 05, 2018, 02:30:49 pm »
    Code: [Select]
    |........................................|
    |........................................|
    |........................................|
    |.....................................*..|
    |....................................**..|
    |..................................***...|
    |.................................*..*...|
    |................................*...*...|
    |...............................*...*....|
    |..............................*....*....|
    |............................**.....*....|
    |...........................*......*.....|
    |..........................*.......*.....|
    |.........................*........*.....|
    |........................*.........*.....|
    |......................**.........*......|
    |.....................*...........*......|
    |....................*............*......|
    |...................*............*.......|
    |..................*.............*.......|
    |................**..............*.......|
    |...............*...............*........|
    |..............*................*........|
    |.............*.................*........|
    |............*.................*.........|
    |..........**..................*.........|
    |.........*....................*.........|
    |........*....................*..........|
    |.......*.....................*..........|
    |......*......................*..........|
    |....**.......................*..........|
    |...*........................*...........|
    |..**........................*...........|
    |....****....................*...........|
    |........****...............*............|
    |............****...........*............|
    |................****.......*............|
    |....................****..*.............|
    |........................***.............|
    |........................................|

    well, the triangle filler trick is fast, nice and neat, but it is defective at drawing just the edge, hence I am back to the pipe of with three line-filler-modules in parallel; each module computes one edge of the triangle, and the whole doesn't fill anything, it just draws the triangle's edge.

    I have modified the HDL simulator to show the bitplane at each step, also showing if a blitter unit has completed the job, and the whole process stops when all the blitters have completed their job.

    and here it is the output of the simulator, you can see how it evolves
    « Last Edit: September 05, 2018, 10:03:48 pm by legacy »
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #57 on: September 06, 2018, 12:13:09 am »
    Code: [Select]
    Benchmark data: Pentium III 450Mhz, Visual C++ 6.0 Release Build, Optimizations on...

    Algorithm         Lines Drawn (x1000)   Seconds     Lines Per Second
    DDA               1600                  529.101      3023.997
    Bresenham         1600                   83.220     19226.147
    Wu                1600                   75.949     21066.767
    EFLA Variation D  1600                   65.914     24274.053


    so, on google group, there was this discussion on the world's fastest line algorithm, and it seems EFLA beats the Wu and Bresenham algorithms  :popcorn:
     
    The following users thanked this post: BrianHG

    Offline T3sl4co1l

    • Super Contributor
    • ***
    • Posts: 21681
    • Country: us
    • Expert, Analog Electronics, PCB Layout, EMC
      • Seven Transistor Labs
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #58 on: September 06, 2018, 01:50:38 am »
    Funny, doing it with fixed point ("EFLA") is exactly equivalent to Bresenham's, the latter is just normally written as a conditional check.

    Apparently changing a branch instruction to a carry instruction counts as novel enough to license a half page of code?  Well, I bet they aren't selling many, anyway...

    Tim
    Seven Transistor Labs, LLC
    Electronic design, from concept to prototype.
    Bringing a project to life?  Send me a message!
     

    Offline KE5FX

    • Super Contributor
    • ***
    • Posts: 1891
    • Country: us
      • KE5FX.COM
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #59 on: September 06, 2018, 04:05:19 am »
    Funny, doing it with fixed point ("EFLA") is exactly equivalent to Bresenham's, the latter is just normally written as a conditional check.

    They aren't actually 100% equivalent.  A straightforward fixed-point solution will never write more than one pixel at each coordinate on the major axis, so it won't add an extra pixel at the stairstep when the direction changes.  This will result in a less "solid"-appearing line compared to what you get from a classical Bresenham implementation. 

    It doesn't usually matter that much, but a naive polygon-fill routine might rely on that extra pixel to fill in holes at T-junctions.

    Quote

    Apparently changing a branch instruction to a carry instruction counts as novel enough to license a half page of code?  Well, I bet they aren't selling many, anyway...

    Tim

    Yeah, that's pretty funny.  Somebody hasn't read their Abrash.
     

    Offline T3sl4co1l

    • Super Contributor
    • ***
    • Posts: 21681
    • Country: us
    • Expert, Analog Electronics, PCB Layout, EMC
      • Seven Transistor Labs
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #60 on: September 06, 2018, 04:25:23 am »
    I've never seen a plain line drawing algorithm implemented with doubling up of pixels at minor-axis steps, of course if you want to do it that way you can.

    Whether poly fills will "leak" simply depends on if they use a von Neumann / Moore neighborhood.

    Tim
    Seven Transistor Labs, LLC
    Electronic design, from concept to prototype.
    Bringing a project to life?  Send me a message!
     

    Offline KE5FX

    • Super Contributor
    • ***
    • Posts: 1891
    • Country: us
      • KE5FX.COM
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #61 on: September 06, 2018, 05:18:53 am »
    I've never seen a plain line drawing algorithm implemented with doubling up of pixels at minor-axis steps, of course if you want to do it that way you can.

    The distinction is known as "4-connected" versus "8-connected", although I never encountered that terminology before I Googled it just now.

    Quote
    Whether poly fills will "leak" simply depends on if they use a von Neumann / Moore neighborhood.

    More terms I had to look up :) -- but I think we're referring to the same thing. 



    Of course, relying on 8-connectivity to fix T-junctions is going to cause endless grief when you introduce alpha blending into the equation.  AFAIK there is really no substitute for fixing T-junctions at the geometry stage rather than the rasterizer.
     

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #62 on: September 06, 2018, 09:51:12 am »
    So, yesterday night I wanted to check if better solutions exist for drawing a line, and I ended up by confirming the Bresenham's algorithm because it's the best (in terms of simplicity and efficiency) to be rewritten in VHDL.

    as proof of concept, this is the last C prototype

    Code: [Select]
    void gfx_draw_line_bresenham
    (
        uint32_t x0,
        uint32_t y0,
        uint32_t x1,
        uint32_t y1,
        color_rgb_t color
    )
    {
        sint32_t  dx;
        sint32_t  dy;
        uint32_t  x;
        uint32_t  y;
        sint32_t  sx;
        sint32_t  sy;
        sint32_t  err;
        sint32_t  magic;
        boolean_t is_done;

        /*
         * bounding box's width and height
         * set the loop's sign
         */
        dx = x1 - x0;
        dy = y1 - y0;
        if (dx < 0)
        {
            dx = -dx;
            sx = -1;
        }
        else
        {
            sx = 1;
        }
        if (dy > 0)
        {
            dy = -dy;
            sy = 1;
        }
        else
        {
            sy = -1;
        }

        magic = 0;
        err   = dx + dy;
        x     = x0;
        y     = y0;

        is_done = False;
        while (is_done isEqualTo False)
        {
            /*
             * plot(x, y,color);
             */
            pixel_set_xy(x, y,color);

            /*
             * Reached the end
             */
            is_done = ((x isEqualTo x1) logicalAnd(y isEqualTo y1));

            /*
             * Loop carried
             */
            magic = err shiftLeft 1;
            if (magic > dy)
            {
                err += dy;
                x   += sx;
            }
            if (magic < dx)
            {
                err += dx;
                y   += sy;
            }
        }
    }


    This algorithm is good, neat, fast, and it doesn't use any MUL or DIV for the setup, and it can be implemented in three or four units to calculate points in parallel.

    This algorithm is good, neat, fast, and it doesn't use any MUL or DIV for the setup, and it can be implemented in three or four units to calculate points in parallel.

    The SONY Playstation's GTE has four units of this kind (it's not specified which algorithm is used) to draw up to four lines in parallel once given vertices.

    A defect of this algorithm (or of this implementation) is that it cannot handle any cases where the line endpoints do not lie exactly on integer points of the pixel grid: in this case, either it eats the pixel or it adds an extra pixel. This little bad effect has been shown by the edges calculated by the previous method I used where the algorithm tests directly the points using a test-in-polygon. There are marginal differences on the edges, but it's acceptable.

    Definitively: approved  :D
     

    Offline DJohn

    • Regular Contributor
    • *
    • Posts: 103
    • Country: gb
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #63 on: September 06, 2018, 01:07:50 pm »
    You should read Fabian Giesen's series "A Trip Through the Graphics Pipeline (2011)" https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/.  And definitely follow the links.  This is how it's done in real hardware.
     
    The following users thanked this post: oPossum

    Offline legacyTopic starter

    • Super Contributor
    • ***
    • !
    • Posts: 4415
    • Country: ch
    Re: any simple blitter around? (looking for FPGA design)
    « Reply #64 on: September 10, 2018, 05:43:50 am »
    I have done a ton of personal research for a new algorithm, and I have developed my own, again with pro and con.

    pro: it interpolates, and it's extremely faster since it removes MULs during the loop, it only uses ADDs, that speeds up a lot
    con: it adds 12 MULs and 3 DIVs for the initialization of the context, and it's numerically weak

    I am still with the C prototype, I haven't implemented it yet on HDL, and I am with fixedpoint16.16, but this is critical

    Code: [Select]
            /*
             * step2
             * 1/2 is added as a workaround
             * to compensate the numerical error
             * caused by the loss of bits that are
             * amplified by step3 and during the loop
             */
            color_v[0] = fx1616_add_fp(color_v[0], 0.5);
            color_v[0] = fx1616_add_fp(color_v[1], 0.5);
            color_v[0] = fx1616_add_fp(color_v[2], 0.5);
            color_v[0] = fx1616_div(color_v[0], scale);
            color_v[1] = fx1616_div(color_v[1], scale);
            color_v[2] = fx1616_div(color_v[2], scale);


    Code: [Select]
            /*
             * step3
             */
            magic_TW_0 = fx1616_mul(magic_TW_0, color_v[0]);
            magic_TW_1 = fx1616_mul(magic_TW_1, color_v[1]);
            magic_TW_2 = fx1616_mul(magic_TW_2, color_v[2]);
            magic_U_12 = fx1616_mul(magic_U_12, color_v[0]);
            magic_U_20 = fx1616_mul(magic_U_20, color_v[1]);
            magic_U_01 = fx1616_mul(magic_U_01, color_v[2]);
            magic_W_12 = fx1616_mul(magic_W_12, color_v[0]);
            magic_W_20 = fx1616_mul(magic_W_20, color_v[1]);
            magic_W_01 = fx1616_mul(magic_W_01, color_v[2]);

    life is complex  :palm: :palm: :palm:
     


    Share me

    Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
    Smf