Author Topic: No bitbanging necessary, or How to Drive a VGA Monitor on a PSoC 5LP w/Verilog  (Read 22645 times)

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5549
  • Country: us
Beforehand: I won't take any responsibility if you follow this tutorial and fry your VGA monitor. There is always a risk that wrong frequency values could destroy your monitor's controller, so look at the signals and voltages to make sure everything is within spec before connecting anything to your monitor.

I use this type of cable to check it on my scope first.

I will update the reserved posts as the project progresses. Take into account we are near the holidays and I'll be out of town, so I might only get to the 2nd part of the project before the break and will follow up next year with the rest.

With all that out of the way, let me start:


In this project I want to show how to drive a VGA monitor at 800x600 60Hz (Edit: and other resolutions and frequencies) using the PSoC hardware implemented in Verilog and driven by the system clock.

For that purpose I'm going to use this cheap R2R VGA module. It only has 3 bits per color, but that is more than sufficient.

Also I'm using this prototyping kit available from Cypress for $10:

Alternatively you could build your own R2R DAC, but I'm not covering that, and if you put the wrong voltages into your VGA monitor who knows what can happen to the monitor's controller.


Initially I was going to use existing components (counters, PWMs, comparators, etc.). Then I started to look at using the UDBs and even the Datapaths directly, but that would be too elaborate and confusing. I might still do another tutorial that way, especially if video RAM gets added and resources become scarce with non-optimized Verilog code.

I'm not going to optimize the Verilog code either; I'll let the toolchain do it for me. It's possible to map the usage of the UDB accumulators, registers and FIFOs in Verilog, but that would be a more advanced implementation. Then again, depending on how this tutorial goes and whether I can get the video RAM I ordered to interface with this board, I might revisit the Verilog implementation to make better use of the resources.

Edit: I will include the project as it progresses per post in a zip file. At least that's the plan.

RGB signals are from 0V to 0.7V with 75 Ohm termination.

The exception is the Green channel, which can range from 0.3V to 1.0V if the sync signal is present on that channel, but we are not going down that path.

HSync (Horizontal Sync) and VSync (Vertical Sync) are TTL level signals, 0V to 5V, although 0V to 3.3V CMOS is compatible as well: as long as we reach the monitor's 2.0V input-high threshold (VIH) we are good to go.


3.3V CMOS:

Next we need to get the timings:
http://tinyvga.com/vga-timing/800x600@60Hz

I selected that, because the 40MHz needed for the pixel clock is easy to achieve using the PSoC5LP prototype board IMO clock.

I was thinking of doing a 640x480@60Hz example but I found conflicting information on the timings. Apparently VGA was supposed to be double of NTSC-M, and the vertical frequency was a bit lower than 30 Hz per interlaced field; regardless, we will stick with the tinyvga timings. As a side note, some FPGA examples place the pixel frequency at 25MHz, which is good enough for monitors to sync to, but the nominal value is 25.175MHz. Most monitors will tolerate the difference, though maybe not some older analog monitors. Anyway, I could get close to 25.175MHz using the PLL and it worked, but since I can get bang on 40MHz for the pixel clock I'm going to drive the monitor at 800x600.

Using the PSoC5LP internal clock (IMO) is not going to be precise anyway; our best hope is 1% error using the internal oscillator. So even if the pixels are pretty stable, I've noticed some slight horizontal banding when displaying images with high-frequency changes (as in each pixel having a different color than all its neighbours).

I will add an external oscillator via the DSI (Digital System Interconnect) later on in the project to make the clock very stable, using a 26MHz OCXO (Oven Controlled Crystal Oscillator) that I have at hand. I trimmed it with passives, after using a pot to get as close as I could to 26MHz, but I don't have a good reference clock so it's going to be a bit off.

An alternative would be to add a XTAL (4MHz to 25MHz) on P15_2 and P15_3 if you could bodge the caps to a nearby ground, but I'm not going to try that.

So for now we will stick with the jittery but still very functional internal clock.

Code: [Select]
SVGA Signal 800 x 600 @ 60 Hz timing
Screen refresh rate 60 Hz
Vertical refresh    37.878787878788 kHz
Pixel freq.         40.0 MHz

Horizontal timing (line)
Polarity of horizontal sync pulse is positive.
Scanline part   Pixels      Time [µs]
Visible area     800          20
Front porch       40           1
Sync pulse       128           3.2
Back porch        88           2.2
Whole line      1056          26.4

Vertical timing (frame)
Polarity of vertical sync pulse is positive.
Frame part      Lines       Time [ms]
Visible area     600          15.84
Front porch        1           0.0264
Sync pulse         4           0.1056
Back porch        23           0.6072
Whole frame      628          16.5792

Note that the Front and Back Porch are in reference to the Sync pulse, otherwise they seem backwards.
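As a sanity check on the table, the line and frame rates can be derived from the pixel clock and the totals. The snippet below is only a simulation sketch (the module name svga_timing_check is made up for illustration and is not part of the attached project). It also shows that the 37.88 kHz figure the table labels "Vertical refresh" is really the horizontal line rate.

```verilog
// Simulation-only sketch: derive line and frame rates from the timing table.
module svga_timing_check;
    // Values copied from the 800 x 600 @ 60 Hz table above.
    localparam integer H_VISIBLE = 800, H_FRONT = 40, H_SYNC = 128, H_BACK = 88;
    localparam integer V_VISIBLE = 600, V_FRONT = 1,  V_SYNC = 4,   V_BACK = 23;

    localparam integer H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK; // whole line
    localparam integer V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK; // whole frame

    real pixel_hz, line_hz, frame_hz;

    initial begin
        pixel_hz = 40.0e6;               // pixel clock from the table
        line_hz  = pixel_hz / H_TOTAL;   // one line every H_TOTAL pixel clocks
        frame_hz = line_hz / V_TOTAL;    // one frame every V_TOTAL lines
        $display("H total = %0d pixels, V total = %0d lines", H_TOTAL, V_TOTAL);
        $display("line rate  = %f Hz", line_hz);
        $display("frame rate = %f Hz", frame_hz);
        $finish;
    end
endmodule
```

Run in a Verilog simulator, this should report totals of 1056 pixels and 628 lines, a line rate of about 37.88 kHz and a frame rate of about 60.3 Hz.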

We can start the signals based on the Visible area, both Horizontal and Vertical as shown in here:

The DMA transfer that will be introduced in a future update could make our timing constraints a bit too tight, with no chance to offset our visible area to meet the VGA constraints. So we are going to need a signal within the horizontal timings to tell us when we can start the per-line DMA transfer (once we implement that functionality, that is). We might as well start on the Front Porch for both the Horizontal and Vertical signals instead of waiting for the next frame to catch things up.

So the plan is making a component with pixel values for horizontal timings and line values for vertical timings. This way we can just plug in values from http://tinyvga.com/vga-timing as long as we change the input pixel clock to match those entries for other resolutions.

Note also that the sync pulses could be positive or negative, I'm not sure if that matters because I've been using just negative pulses for other modes, but I'll add controls to accommodate for that as well.
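To make the polarity control concrete, here is a minimal sketch of one way to select the active level of a sync output with a parameter. The module and signal names (sync_polarity, in_sync, sync_out) are made up for illustration; the actual component does the equivalent with a single assign on its state register.

```verilog
// Sketch: choose the active level of a sync pulse with a parameter.
module sync_polarity #(
    parameter PulsePositive = 1        // 1: active-high pulse, 0: active-low pulse
) (
    input  wire in_sync,               // 1 while the timing FSM is in its sync state
    output wire sync_out
);
    // Drive the active level during the pulse and the opposite level otherwise.
    assign sync_out = in_sync ? (PulsePositive != 0) : (PulsePositive == 0);
endmodule

// Tiny testbench showing both polarities side by side.
module sync_polarity_tb;
    reg  in_sync;
    wire pos, neg;

    sync_polarity #(.PulsePositive(1)) u_pos (.in_sync(in_sync), .sync_out(pos));
    sync_polarity #(.PulsePositive(0)) u_neg (.in_sync(in_sync), .sync_out(neg));

    initial begin
        in_sync = 1'b0; #1 $display("idle : pos=%b neg=%b", pos, neg);
        in_sync = 1'b1; #1 $display("pulse: pos=%b neg=%b", pos, neg);
        $finish;
    end
endmodule
```

With a positive pulse the line idles low and goes high during sync; with a negative pulse it is the other way around.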

Also, there are ways to add visualization tabs and custom UIs for the module, but I'm not going to even consider that since the parameter list should be more than adequate for this demo. I'll leave the 800x600@60Hz parameters as the default values.

This PSoC 5LP only has 64KB of ram, so there are going to be some limitations as for the size of the frame buffer, but we might expand it using external memory.
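To put a number on that limitation, here is a back-of-the-envelope sketch. The one byte per pixel and the reduced 200x150 buffer (each stored pixel repeated 4x4 on screen) are assumptions for illustration only, not decisions made in this project.

```verilog
// Simulation-only sketch: does a frame buffer fit in the 64KB of SRAM?
module framebuffer_budget;
    localparam integer SRAM_BYTES      = 64 * 1024;  // total SRAM, before anything else uses it
    localparam integer BYTES_PER_PIXEL = 1;          // assumption: one byte stored per pixel
    localparam integer FULL_BYTES      = 800 * 600 * BYTES_PER_PIXEL;
    localparam integer REDUCED_BYTES   = 200 * 150 * BYTES_PER_PIXEL; // hypothetical 4x4 pixel repeat

    initial begin
        $display("SRAM available      : %0d bytes", SRAM_BYTES);
        $display("800x600 frame buffer: %0d bytes, fits = %0d",
                 FULL_BYTES, (FULL_BYTES <= SRAM_BYTES) ? 1 : 0);
        $display("200x150 frame buffer: %0d bytes, fits = %0d",
                 REDUCED_BYTES, (REDUCED_BYTES <= SRAM_BYTES) ? 1 : 0);
        $finish;
    end
endmodule
```

A full 800x600 buffer at one byte per pixel needs 480,000 bytes, far more than the 65,536 available, which is why a reduced buffer or external memory comes into play.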

All this comes from a previous (successful, I might add) attempt to drive a CGA monitor directly, and I started with VGA output, so I know it's possible to do.

It is also to show that to learn Verilog you don't need an expensive FPGA (although you can find capable cheap ones). This $10 prototyping board has tons more, including a built-in ARM Cortex-M3 processor. There are modules to do pretty much anything you'll need, from communications (USB, SPI, I2C, CAN...) to external memory and LCD displays (even graphical ones). It can supply constant current and/or voltage, and it has a ton of programmable analog stuff in there as well (ADCs, DACs, OpAmps).

So as elaborate as making a Verilog VGA controller might seem, this is just scratching the surface of what can be done. Granted, it doesn't have as many digital block resources as an FPGA, and it's even shy when compared with low-end CPLDs, but that is still plenty for a lot of tasks.

And No, I don't get a commission from Cypress, not even free kits.
« Last Edit: December 21, 2015, 10:42:31 PM by miguelvp »

Offline miguelvp

Well, I had a work emergency so I'll split this 1st part into two updates. I might modify the post if I find any errata or gross misspellings, since I did rush this post.


I was thinking of making a library that you could bring into your project, but to keep it simple I'm just going to create the module inside a project.

It's not that hard to create a library and that information can be found on Cypress website, so I will proceed to just create the Verilog based module within the project that will use it.

I'm using PSoC Creator 3.3 (the latest) and they did change the project wizard. Also I'm going to target the PSoC 5LP prototyping kit (CY8CKIT-059), since I'm not optimizing the code I don't think this will fit in a PSoC 4 and as implemented I could barely fit it in this kit.

So, to create a Creator project, launch Creator 3.3 and, once it's up, select File->New->Project

On the dialog select Design project->Target hardware->Kit:CY8CKIT-059(PSoC 5LP) and hit Next.

Select Empty Schematic and hit Next.

On the Workspace pulldown select "Create new workspace"
On the Workspace name type: "VideoWorkspace" it can be anything you want.
On the Location I left the default.
For the Project Name I used: "PSoC5LPVGA"

Click on Finish when you are satisfied with your selections.

Once the new project opens click on the components tab:

Right click on the "Project 'PSoC5LPVGA' [CY8C5888LTI-LP097]" and select "Add Component Item"
Scroll to the Symbol section and select "Symbol Wizard" (you could select Implementation -> Verilog file but this is easier)
On the Component name I used: "VideoCtrl_v1_0" Cypress recommends the version number being part of the name.
Click: "Create New"

On the symbol creation wizard fill in the following:
Symbol Label: Video Controller
Terminal Name, type:
    reset, Digital Input (This will handle the component reset)
    clock, Digital Input (This is the pixel clock)
    line_cnt[9:0], Digital Output (This is the visible vertical line counter 9:0 means bits 9 to 0 for a total of 10 bits so we can handle 1024 vertical lines max)
    line_dma, Digital Output (This will signal to initiate a line DMA)
    blank_n, Digital Output (This indicates if we are on a blanking state, I use _n to indicate negative logic, 0 blank, 1 not blanking)
    hsync, Digital Output (This will be the Horizontal Sync signal)
    vsync, Digital Output (This will be the Vertical Sync signal)
Click OK to proceed.

Right click on the empty canvas (not on the symbol) and select "Properties", we need to setup a couple of fields in there.
In the symbol section change the Doc.APIPrefix to VideoCtrl and the Doc.DefaultInstanceName to VideoCtrl as well.
The Doc.Placement (Collection) determines where the component will be available at. Not changing it will leave it to a new default tab in the project.
Click OK to proceed.

Right click on the empty canvas again (not on the symbol) and select "Symbol Parameters" In here we define configurable parameters.
The Hardware value will expose the parameter to Verilog so when you use the component you can override the default parameters.
We will use the default values to be the ones defined here for our default mode:
http://tinyvga.com/vga-timing/[email protected]
Code: [Select]
Name                Type    Value   Misc: Hardware  Description
HorizVisibleArea    uint16  800     true            Horizontal Visible Area
HorizFrontPorch     uint16  40      true            Horizontal Front Porch
HorizSyncPulse      uint16  128     true            Horizontal Sync Pulse
HorizBackPorch      uint16  88      true            Horizontal Back Porch
HorizPulsePositive  bool    true    true            Horizontal Pulse Positive
VertVisibleArea     uint16  600     true            Vertical Visible Area
VertFrontPorch      uint16  1       true            Vertical Front Porch
VertSyncPulse       uint16  4       true            Vertical Sync Pulse
VertBackPorch       uint16  23      true            Vertical Back Porch
VertPulsePositive   bool    true    true            Vertical Pulse Positive
Do make sure the Misc Hardware value is set to true. The Description is not important, but the Hardware bool must be true if you want to access these values from Verilog.
Click OK to proceed
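For readers coming from plain Verilog: a parameter exposed this way behaves like an ordinary module parameter with a default that each instance can override. In Creator the override comes from the instance's Configure dialog; the sketch below shows the plain Verilog equivalent, using a stand-in module (timing_totals is a made-up name, not the generated component) and the commonly published 640x480 @ 60 Hz values as the alternative mode.

```verilog
// Stand-in module with the same style of parameter declarations as the component.
module timing_totals (
    output [10:0] h_total,
    output [10:0] v_total
);
    // Defaults: 800x600 @ 60 Hz.
    parameter HorizVisibleArea = 800;
    parameter HorizFrontPorch  = 40;
    parameter HorizSyncPulse   = 128;
    parameter HorizBackPorch   = 88;
    parameter VertVisibleArea  = 600;
    parameter VertFrontPorch   = 1;
    parameter VertSyncPulse    = 4;
    parameter VertBackPorch    = 23;

    assign h_total = HorizVisibleArea + HorizFrontPorch + HorizSyncPulse + HorizBackPorch;
    assign v_total = VertVisibleArea + VertFrontPorch + VertSyncPulse + VertBackPorch;
endmodule

module timing_totals_tb;
    wire [10:0] h_svga, v_svga, h_vga, v_vga;

    // Instance using the defaults.
    timing_totals u_svga (.h_total(h_svga), .v_total(v_svga));

    // Instance overriding every parameter with 640x480 @ 60 Hz values.
    timing_totals #(
        .HorizVisibleArea(640), .HorizFrontPorch(16), .HorizSyncPulse(96), .HorizBackPorch(48),
        .VertVisibleArea(480),  .VertFrontPorch(10),  .VertSyncPulse(2),   .VertBackPorch(33)
    ) u_vga (.h_total(h_vga), .v_total(v_vga));

    initial begin
        #1;
        $display("default : %0d x %0d total", h_svga, v_svga);
        $display("override: %0d x %0d total", h_vga, v_vga);
        $finish;
    end
endmodule
```

The default instance totals 1056 x 628 and the overridden one 800 x 525. Remember that a different mode also needs a different pixel clock.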

Edit: Vert Pulse Positive is supposed to say true on that image, not sure how it became just 1. The attached code has the problem but I will update it in future post releases.

Right click on the empty canvas once more and select "Generate Verilog", This will create the Verilog template that we can add our code to drive our output signals based on the input ones. On the dialogue box just click Generate (nothing to change there, so I'm not including the picture)

This will generate a template called VideoCtrl_v1_0.v. There is a block in there where you can insert your code, and it won't be affected if you "Generate Verilog" from the symbol again (for example if you need to add or remove parameters or signals).
So anything you add in between the //`#start body` and the //`#end` will be preserved.
Here is the generated template.
Code: [Select]
//`#start header` -- edit after this line, do not edit this line
// ========================================
// All Rights Reserved
// WHICH IS THE PROPERTY OF your company.
// ========================================
`include "cypress.v"
//`#end` -- edit above this line, do not edit this line
// Generated on 12/21/2015 at 01:39
// Component: VideoCtrl_v1_0
module VideoCtrl_v1_0 (
output  blank_n,
output  hsync,
output [9:0] line_cnt,
output  line_dma,
output  vsync,
input   clock,
input   reset
);
parameter HorizBackPorch = 88;
parameter HorizFrontPorch = 40;
parameter HorizPulsePositive = 1;
parameter HorizSyncPulse = 128;
parameter HorizVisibleArea = 800;
parameter VertBackPorch = 23;
parameter VertFrontPorch = 1;
parameter VertPulsePositive = 1;
parameter VertSyncPulse = 4;
parameter VertVisibleArea = 600;

//`#start body` -- edit after this line, do not edit this line

//        Your code goes here

//`#end` -- edit above this line, do not edit this line
endmodule
//`#start footer` -- edit after this line, do not edit this line
//`#end` -- edit above this line, do not edit this line

Now it would be a good time to do a "Save All" with File->Save All (Ctrl+Shift+S)

Now let's put in our Verilog code by substituting the following line (and only that line; the 20,000 character post limit won't let me include both this and the fully implemented version):
//        Your code goes here

With this block (in part two of this post I will try to explain what this code does, although it is fully commented, so I might leave it at that unless someone has particular questions):

Code: [Select]
    // Horizontal and Vertical states for Finite State Machine (FSM).
    localparam STATE_FP   = 2'd0;
    localparam STATE_SYNC = 2'd1;
    localparam STATE_BP   = 2'd2;
    localparam STATE_VIS  = 2'd3;
    // Horizontal and Vertical state registers.
    reg [3:0] h_state_r;
    reg [3:0] v_state_r;

    // Current Horizontal and Vertical counters.
    // These values count the pixels on each current state and don't carry the overall line or pixel count.
    reg [9:0] h_count_r;
    reg [9:0] v_count_r;

    // Total line count output register to make it accessible to native code, this will only show the last visible line.
    reg [9:0] line_cnt_r;
    // One-clock pulse raised at the end of the visible part of each line; it steps the vertical section.
    reg newline;
    // Horizontal section
    always @(posedge clock) begin
        // We need to see how we can force a reset even if the reset pin is wired to a digital constant of value 0
        // Maybe a _Start routine via the API could be used to initialize the module as well as the reset.
        // Otherwise make sure the state machine will eventually get to the right state no matter what the random initial values are.
        if (reset) begin
            // Initialize horizontal registers with initial state set to Front Porch
            h_state_r <= STATE_FP;
            h_count_r <= 10'd1;
            newline <= 1'b0;
        end else begin
            // Horizontal FSM
            case (h_state_r)
                STATE_FP: begin
                    newline <= 1'b0;
                    // Check if we are still within the Horizontal Front Porch
                    if (h_count_r == HorizFrontPorch) begin
                        // If we reached the end of the Front Porch.
                        // Reset the counter and jump into the Sync State.
                        h_count_r <= 10'd1;
                        h_state_r <= STATE_SYNC;
                    end else begin
                        // Increment the horizontal pixel count
                        h_count_r <= h_count_r + 10'd1;
                    end
                end
                STATE_SYNC: begin
                    newline <= 1'b0;
                    // line_dma can only be asserted during the Back Porch, so there is nothing to do for it here.
                    // Check if we are still within the Horizontal Sync Pulse
                    if (h_count_r == HorizSyncPulse) begin
                        // If we reached the end of the Sync Pulse.
                        // Reset the counter and jump into the Back Porch State.
                        h_count_r <= 10'd1;
                        h_state_r <= STATE_BP;
                    end else begin
                        // Increment the horizontal pixel count
                        h_count_r <= h_count_r + 10'd1;
                    end
                end
                STATE_BP: begin
                    newline <= 1'b0;
                    // Check if we are still within the Horizontal Back Porch
                    if (h_count_r == HorizBackPorch) begin
                        // If we reached the end of the Back Porch.
                        // Reset the counter and jump into the Visible State.
                        h_count_r <= 10'd1;
                        h_state_r <= STATE_VIS;
                    end else begin
                        // Increment the horizontal pixel count
                        h_count_r <= h_count_r + 10'd1;
                    end
                end
                STATE_VIS: begin
                    // Check if we are still within the Horizontal Visible state
                    if (h_count_r == HorizVisibleArea) begin
                        // If we reached the end of the Visible state.
                        // Reset the counter and jump into the Front Porch State.
                        h_count_r <= 10'd1;
                        h_state_r <= STATE_FP;
                        newline <= 1'b1;
                    end else begin
                        // Increment the horizontal pixel count
                        h_count_r <= h_count_r + 10'd1;
                        newline <= 1'b0;
                    end
                end
                default: begin
                    // We should never end up here since our state machine is only two bits and we have 4 states covered.
                    // Set initial state set to Front Porch
                    h_state_r <= STATE_FP;
                end
            endcase
        end
    end

    // Vertical section
    always @(posedge clock) begin
        if (reset) begin
            // Initialize vertical registers with initial state set to Front Porch
            v_state_r <= STATE_FP;
            v_count_r <= 10'd1;
        end else if (newline) begin
            // Deal with the line counter register here, only increment it
            // if we are on the vertical visible state, otherwise reset it to zero
            if (v_state_r == STATE_VIS) begin
                line_cnt_r <= line_cnt_r+10'd1;
            end else begin
                line_cnt_r <= 10'd0;
            end

            // Vertical FSM
            case (v_state_r)
                STATE_FP: begin
                    // Check if we are still within the Vertical Front Porch
                    if (v_count_r == VertFrontPorch) begin
                        // If we reached the end of the Front Porch.
                        // Reset the counter and jump into the Sync State.
                        v_count_r <= 10'd1;
                        v_state_r <= STATE_SYNC;
                    end else begin
                        v_count_r <= v_count_r+10'd1;
                    end
                end
                STATE_SYNC: begin
                    // Check if we are still within the Vertical Sync Pulse
                    if (v_count_r == VertSyncPulse) begin
                        // If we reached the end of the Sync Pulse.
                        // Reset the counter and jump into the Back Porch State.
                        v_count_r <= 10'd1;
                        v_state_r <= STATE_BP;
                    end else begin
                        v_count_r <= v_count_r+10'd1;
                    end
                end
                STATE_BP: begin
                    // Check if we are still within the Vertical Back Porch
                    if (v_count_r == VertBackPorch) begin
                        // If we reached the end of the Back Porch.
                        // Reset the counter and jump into the Visible State.
                        v_count_r <= 10'd1;
                        v_state_r <= STATE_VIS;
                    end else begin
                        v_count_r <= v_count_r+10'd1;
                    end
                end
                STATE_VIS: begin
                    // Check if we are still within the Vertical Visible state
                    if (v_count_r == VertVisibleArea) begin
                        // If we reached the end of the Visible state.
                        // Reset the counter and jump into the Front Porch State.
                        v_count_r <= 10'd1;
                        v_state_r <= STATE_FP;
                    end else begin
                        v_count_r <= v_count_r+10'd1;
                    end
                end
                default: begin
                    // We should never end up here since our state machine is only two bits and we have 4 states covered.
                    // Just in case we add more bits to the state, reset the module as if a reset had occurred.
                    // Initialize vertical registers with initial state set to Front Porch
                    v_state_r <= STATE_FP;
                    v_count_r <= 10'd1;
                end
            endcase
        end
    end

    // Assign the current registers to the actual outputs
    // Horizontal and Vertical Blanking are only off in Visible state, blank signal has negative logic (0 blanking, 1 non-blanking)
    assign blank_n = (h_state_r == STATE_VIS) & (v_state_r == STATE_VIS);
    // hsync will be low or high during the horizontal sync pulse state depending on the HorizPulsePositive value.
    //       If pulse positive is true the hsync will be low (zero) when not active, otherwise it will be high (one) when not active.
    assign hsync = (h_state_r == STATE_SYNC)?HorizPulsePositive: ~HorizPulsePositive;
    // vsync will be low or high during the vertical sync pulse state depending on the VertPulsePositive value.
    //       If pulse positive is true the vsync will be low (zero) when not active, otherwise it will be high (one) when not active.
    assign vsync = (v_state_r == STATE_SYNC)?VertPulsePositive: ~VertPulsePositive;
    // line_cnt will allow the module to know what current visible line needs to be fetched.
    assign line_cnt = line_cnt_r;

    // line_dma will go high when we need to fetch another line, vertical blank state will be checked so no dma is requested on non visible lines.
    // HorizDMAAdjust should never exceed HorizBackPorch since line_dma would only be triggered during the Horizontal STATE_BP (Back Porch) state.
    assign line_dma = (h_state_r == STATE_BP)&(h_count_r+HorizDMAAdjust == HorizBackPorch)&(v_state_r == STATE_VIS);

Do a Save All again. File->Save All (Ctrl+Shift+S) to save the current progress.

Now you can go to the Source Tab and open the TopDesign.cysch (Schematic) by double clicking on it, Select the Default tab on the Component Catalog and drag an instance of your newly created module

File->"save all" again, after you added the component to the schematic view.
Ignore the errors on the right bottom of the screen, that's because we didn't hook up any inputs (2 errors for each input terminal) then again I didn't have a chance to test this particular module although I did make a test one before that I hooked up and worked.

Sorry for leaving this so halfway for now. I'm including what I have so far in a zip file linked below.
I might need to revisit it if there are problems going from my proof of concept that worked to generating this project. I will address more details and add notes if people have questions and need some clarifications on part 2 of this post.

Otherwise, I will continue with post 3 on using this module to display some generic video pattern. I will however revisit this post to explain the code by sections in a part 2 of this post.

To Be Continued on the next post. Do feel free to ask questions if something is not clear. I'll do my best to clarify things.

« Last Edit: December 22, 2015, 08:06:50 AM by miguelvp »

Offline miguelvp

In part 2 I was going to talk about the source code, but I think the comments in the code explain what it does, so let's get some signals out to make sure the component works as expected.

We only need two output pins for the HSync and VSync at TTL levels.
We also need a pixel clock running at 40MHz for the 800x600@60Hz mode.
And a Logic Low constant to keep the reset pin low.

Adding HSync and VSync:
In the Component Catalog on the right, select the Cypress tab, expand the "Ports and Pins" and drag two "Digital Output Pins" into the TopDesign.cysch schematic attaching them to our component's hsync and vsync ports.

(Optional) If you don't want to attach them directly to the ports you can always use a wire to connect them. Once connected if you move the pin a wire will show up as well.

Right click on the pin connected to hsync and select configure.
(alternatively you can double click on the pin, but sometimes it might move the pin if you are not quick and you tend to move the mouse while double clicking)

We only need to change the name to something sensible; HSYNC will do.
We are going to leave the defaults for the output pin as Digital output, with Hardware Connection (allows you to connect it on the schematic otherwise it would be only software driven).

On the Output tab (pictured below on the vsync pin) we are going to leave the defaults as well: Fast Slew rate, Vddio Drive level, 4mA source, 8mA sink Current and Transparent Output mode. Click OK to proceed.

Right click on the pin connected to vsync, select configure, and change the name to VSYNC.

Optional: If you want to learn more than you'll ever need about the pin configuration, click on that Datasheet button, but expect to spend hours on that doc, includes everything related to pins even clock pin modes, inputs, outputs, analog, bidirectional, how to access them by software, etc...

Forcing reset to low
For that we just need to connect a Logic Low to the reset pin.
It's located in the component catalog under Cypress->Digital->Logic
Alternatively you can type "logic low" in the search and it will filter the components matching that search.
Same as before, drag it into the canvas and connect it to the reset signal.

Connecting the Pixel clock
The clock component is located under Cypress->System.

The value defaults to 12MHz and we need 40MHz. Also we want to change the name to PIXEL_CLOCK.
There is one problem, and that is that by default the PSoC clock would be set to 24MHz for both the Master clock and the Bus clock (drives the CPU).

Let's save what we have so far with File->Save All

The system clock settings are accessible by opening the .cydwr file on your project (PSoC5LPVGA.cydwr in our case)
Double click on the file to open it and since we are in there let's assign the physical pins to HSYNC and VSYNC.

In my case I'm using P1[4] for HSYNC and P1[6] for VSYNC, note that when you select them from the dropdown port selector it checks the Lock tick mark by itself.

Also I'm overlaying a picture of the physical board to show where the actual pins are in the board. That part is not the UI doing it for you :)

Edit: I forgot, you need to hook the two grounds from the VGA module to two grounds on the board if you want to see any signals. One ground might do, but might as well hook both.
Don't worry about the VCC pins on the VGA module, that's for the PS2 connector.

Save often.

If you look at that picture you'll notice there is a Clocks tab under it. That's the one we want in order to change the board clock.

Select that tab and double click on the yellow area to open the "Configure System Clock" dialog window

Let's use the 24 MHz IMO even if it has 4% accuracy. Also the PLL should have the input set to IMO and we want the PLL to produce 40MHz.

Set it up and click OK

Save files, you should also click on the pins tab for when we need to go back to the pin settings in the cydwr file.

Now go back to the schematic view (.cysch tab, TopDesign.cysch)

Right click on the clock so we can configure it to 40MHz and change the name of the clock

We are ready to test it (well not really because I forgot to add one parameter in the component symbol)
Click on the Build Icon and you'll see what I mean:

I guess the DMA gets no love on this tutorial, first I forgot the pin, then the parameter!

This shows I'm doing this a bit pre-prepared but really on the fly, So let's fix that:
Open the component symbol file (VideoCtrl_v1_0.cysym)
Right click on an empty place in the symbol canvas and select "Symbol Parameters"

Add the missing parameter:
Name: HorizDMAAdjust
Type: uint16
Value: 16
Edit: Don't forget to set hardware to true and add a description if you are so inclined.

This is the number of pixels before the horizontal back porch ends to allow for time to start a DMA transfer so we have data buffered by the time we are on the 1st displayed pixel. It's going to be handy when we actually transfer the video from a frame buffer.
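To see where that lands in the line, here is a small simulation-only sketch that applies the same comparison as the line_dma assign to every back porch count. The module name line_dma_lead is made up for illustration, and the 25 ns per pixel assumes the 40 MHz pixel clock.

```verilog
// Simulation-only sketch: at which back porch count does line_dma fire?
module line_dma_lead;
    localparam integer HorizBackPorch = 88;   // default back porch, in pixels
    localparam integer HorizDMAAdjust = 16;   // value added to the symbol above
    localparam integer PIXEL_NS       = 25;   // one pixel at 40 MHz

    integer h_count;

    initial begin
        // The component counts each state from 1 up to its length.
        for (h_count = 1; h_count <= HorizBackPorch; h_count = h_count + 1) begin
            // Same comparison as in the line_dma assign.
            if (h_count + HorizDMAAdjust == HorizBackPorch)
                $display("line_dma is high at back porch count %0d, %0d counts (%0d ns) before the last one",
                         h_count, HorizBackPorch - h_count, (HorizBackPorch - h_count) * PIXEL_NS);
        end
        $finish;
    end
endmodule
```

With the defaults the signal fires at count 72, which leaves 16 pixel clocks (400 ns at 40 MHz) before the back porch finishes.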

Right click on the empty canvas and select "Generate Verilog" click Generate on the dialog box that comes up.
And say OK to overwrite the existing VideoCtrl_v1_0.v file.
Our previous code is safe within the //`#start body` and //`#end` block.

Save All and build!


Let's program the board and check how the signals look in the scope.

Hook up the kit to your computer (I use a usb extender, actually a couple so I can disconnect both extenders not wearing out the contacts of the board nor the PC).

And Debug->Program (Ctrl+F5)

And let's check the output on the scope:

Hmm I forgot to turn off the inverse image on my scope, kind of like it. Also I have both channels set to X1.
Top signal is HSync, bottom one is VSync.

Note that the pulses are positive, as we specified in the instance configuration. Also note that the VSync pulse spans 4 HSync periods, as expected from the 4-line vertical sync pulse.
37765.7 Hz divided by 628 lines for the full frame is ~60.14 Hz
The spec:
37878.8 Hz/628 gives ~60.32 Hz

If you want to play with the values, open your TopDesign.cysch. Right click on the VideoCtrl_1 instance and select Configure.

You can get to other modes with values found here:
But don't forget to change your pixel clock (the PLL has no problem with 40MHz but might not get close enough to other clock frequencies). Also, a PLL signal based on the IMO can't go above 67MHz or so, I don't quite recall. With an XTAL or OCXO it will allow you to go to the full 80MHz that it is rated at.

In the PSoC5LPVGA.cydwr on the clock tab, you can see the desired requested clock and the nominal value (what it will give you) so you can see how close you can get to other pixel clocks for other resolutions.

Also note that the tinyvga timings page shows whether the sync pulses are positive or negative. Maybe the monitors don't care, but you can specify them independently.

My jumper wires to the VGA connector for HSYNC and VSYNC are next to each other so we can see some crosstalk

Also we are getting ~37.763 kHz in the HSYNC with the 24 MHz 4% IMO. The specs showed 37.87878788 kHz.

Change the IMO to 6 MHz 2% (nothing else since the PLL will still generate the 40 MHz based on that clock)
Save and program the device and the output I get is ~37.82 kHz, much better.
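To put those two readings in perspective, the relative error against the nominal line rate can be worked out as below. This is a simulation-only sketch with a made-up module name (hsync_error); the two measured values are simply the scope readings quoted above.

```verilog
// Simulation-only sketch: relative HSYNC frequency error for both IMO settings.
module hsync_error;
    real spec_hz, meas_imo24_hz, meas_imo6_hz;

    initial begin
        spec_hz       = 40.0e6 / 1056.0;  // nominal line rate: pixel clock / whole line
        meas_imo24_hz = 37765.7;          // scope reading with the 24 MHz IMO
        meas_imo6_hz  = 37820.0;          // scope reading with the 6 MHz IMO (approximate)

        $display("nominal    : %f Hz", spec_hz);
        $display("24 MHz IMO : %f %% off", 100.0 * (meas_imo24_hz - spec_hz) / spec_hz);
        $display(" 6 MHz IMO : %f %% off", 100.0 * (meas_imo6_hz - spec_hz) / spec_hz);
        $finish;
    end
endmodule
```

Both readings come out a fraction of a percent low (roughly 0.3% and 0.16%), well inside the rated accuracy of either IMO setting.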

Btw my monitor syncs to both, so it is close enough.

The attached project has the 6 MHz 2% IMO, also I revised the verilog comments a bit and changed the horizontal and vertical states to use 2 bits instead of 4 bits. (had [3:0] should be [1:0])
Code: [Select]
    // Horizontal and Vertical state registers.
    reg [1:0] h_state_r;
    reg [1:0] v_state_r;

I'll add more scope pictures in the thread not in the tutorial section, showing for example the HSYNC vs the line_dma by moving my VSYNC pin to that port, and also with the blank_n vs HSYNC.

Next thing to do is to connect the VGA pins and display some kind of pattern in a monitor.
If you plug it now as is, the monitor will sync and show you the resolution but nothing will show on the screen.

Careful using random values, some monitors don't handle out of range signals well.

« Last Edit: December 22, 2015, 09:39:35 PM by miguelvp »

I didn't have much time today and it's already past midnight, so I will keep it simple. Capturing screenshots and documenting the steps can take me at least a couple of hours (actually over 3, not counting breaks) for something that really takes just minutes to do.

So I'll do the VGA ports and the pattern generator for now to at least display something on the screen.

I'll see if I can change the values to a different resolution that uses negative sync pulses.

Let's start creating the RGB output pins.
The module I have only supports 3 bits per color (VGAX0, VGAX1, VGAX2) where X is either R for red, G for green or B for blue.

You can see in the following picture that there are single R, G & B pins. Those carry the combined resulting analog signals, so they are of no use to us.

So let's create the pins for the Red channel. We could just create separate output pins like we did for the Sync signals, but Creator allows you to make groups of pins.

First we drop a digital output pin into the canvas, but don't connect it to anything.
The Digital Output Pin is in the Component Catalog under Cypress->Ports and Pins (you can refer to the previous post where we placed the HSync pin).

Right click on the pin and select "configure".
Change the name to VGA_R, and this time also change the number of pins from 1 to 3.
Don't click OK yet.

Select the Mapping Tab and clear the Contiguous check box.
If that is selected, it will use adjacent pins when mapping them to the physical pins on the device.
That might be OK, but we'd rather have the flexibility to split them if we want.
Now you can click OK.

Repeat the same thing for VGA_G and VGA_B. Alternatively, right click on VGA_R, select Copy, and paste two copies. They will already have the Contiguous checkbox unchecked and the number of pins set to 3, so you only need to configure them to change the names to VGA_G and VGA_B respectively. You should end up with something like this:

Since we are limited by the memory we have on the chip (64KB) we can't have a huge frame buffer and use 9 bits per pixel. So I'm going to choose just 4 bits per pixel, which means we will have an rgb_out[3:0] with:
rgb_out[3] holding an intensity bit.
rgb_out[2] holding Red
rgb_out[1] holding Green
rgb_out[0] holding Blue
This is similar to how systems in the 80s dealt with the lack of resources in the computers of the time. We have the same limitations with current MCUs, so the methods used back then apply to these mid-range MCUs.

We will get back to the rgb_out wires in a bit. Since we are going to display a pattern, we need to control when to let the colors through and when not (because during blanking we don't want to output any colors).

We have the blank_n signal that is low (0) when we are not supposed to display a thing, and goes high when we are supposed to display colors during the visible area.

To achieve the blanking we are going to use a Mux (Multiplexer). A Mux has multiple inputs (a power of 2, so: 2, 4, 8, 16, 32, etc.) but only one output. The input that gets passed to the output is selected via a control line that has as many bits as that selection needs.

So for 2 inputs you just need one bit (values 0 or 1),
for 4 inputs you need 2 bits (00b, 01b, 10b, 11b). Well, in Verilog they would be (2'b00, 2'b01, 2'b10, 2'b11 <format is: bits'base(b:binary,d:decimal,...)value in that base>) but we are digressing. Back to the Mux.

We need only two inputs, one for blanking and one that will let the colors go through.
So let's go to the library under Cypress->Digital->Logic, select the Multiplexer and drop it in the canvas.

Right click and configure it.
We don't need to change the name and if you leave the default one no name will be displayed.
But let's change the number of input terminals to two and the width to 4 (since our rgb_out values are going to be 4 bits per pixel). After we are done, you can experiment with the project and use all 9 bits to display all the colors possible, but I don't want to deviate from the goal of using the limited frame buffer.

Click OK, also don't forget to save all.

On the output of the Mux (which is 4 bits wide) we need to put a floating wire (actually a bus).
In order to do that, select the wire tool, click on the canvas where the wire should start, and click again where it should end. Once done it will automatically change to a bus because the output port is 4 bits wide.

The reason we start away from the pin to create the floating wire, is that if you did it the other way around, you can't click on the empty canvas to end it.

We are not done with that wire. We need to name it because later on we need independent wires of that bus to connect to our rgb outputs.

You can double click on the wire, or right click on it and select "Edit Name and Width"

On the dialog, uncheck the "Use computed name and width"
Change the name to "rgb_out"
We can leave the rest alone.

I'm going to speed this up a bit for the following wires: create floating wires for the VGA pins.
You can move the Mux out of the way. You can also click on the rgb_out[3:0] wire name and move the label from its default location.
Also go ahead and connect the Mux control to the blank_n signal.

Now we are going to rename those single wires.
This is an image of the first one, we are going to select a single line from the rgb_out.
bit 2 corresponds to red.

Do the rest so it looks like this (you can also select the 1st one, copy and paste then rename them as needed):
The three pins of each channel end up wired as RIR, GIG and BIB: the intensity bit (I) is shared by all three channels to boost the color.
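
To see what that wiring does to the levels, here is a minimal C model of the RIR / GIG / BIB mapping from the 4 bit rgb_out value to the three 3 bit channels. It is only a sketch: the struct and function names are made up, and I'm assuming the color bit drives the highest and lowest pin of each channel with the intensity bit in the middle.

```c
#include <stdint.h>

/* Three 3-bit channel values as they would reach the R2R DAC inputs. */
typedef struct { uint8_t r, g, b; } rgb3_t;

/* Build one channel as {color, intensity, color}.
 * The bit order is an assumption made for this illustration. */
static uint8_t channel3(uint8_t color, uint8_t intensity)
{
    return (uint8_t)((color << 2) | (intensity << 1) | color);
}

/* Expand rgb_out[3:0] (bit 3 = I, bit 2 = R, bit 1 = G, bit 0 = B). */
rgb3_t irgb_to_rgb3(uint8_t irgb)
{
    uint8_t i = (irgb >> 3) & 1u;
    rgb3_t out;
    out.r = channel3((irgb >> 2) & 1u, i);
    out.g = channel3((irgb >> 1) & 1u, i);
    out.b = channel3(irgb & 1u, i);
    return out;
}
```

Under that assumption plain red (0x4) drives the red channel to 5 out of 7, bright red (0xC) drives it to 7 and also lifts green and blue to 2, and intensity on its own (0x8) gives a dark gray.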

On the Mux, when the value is 0 we want to blank the rgb_out, so we need to add a 4bit wide 0 digital constant:

It's 8 bits wide by default with the value 0xff (255). We need to change the width to 4 bits and the value to 0.
Right click on the constant and select configure.
Change the value to 0 and the width to 4.
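
In software terms the blanking Mux with its constant boils down to the following C sketch (the function name is made up for illustration; the schematic does this in hardware, there is no such function in the project):

```c
#include <stdint.h>

/* 2-input, 4-bit-wide mux: input 0 is the 4-bit constant 0, input 1 is the
 * color. blank_n is low (0) during blanking and high (1) in the visible area,
 * so it can drive the select line directly. */
uint8_t blanking_mux(uint8_t blank_n, uint8_t color4)
{
    return blank_n ? (uint8_t)(color4 & 0x0Fu) : (uint8_t)0u;
}
```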

Let's add a floating wire to the line_cnt[9:0] pin (remember to start away from the pin). Then click on the wire and select "Edit Name and Width", uncheck "Use computed name and width" and change the name to "line_count".

Remember you can select the label and move it around to make it look better.
Save all! (I shouldn't say this anymore but it's important to save your work).

Everything we have done so far we are going to keep as the project progresses.

But just to display something we are just going to select 4 bits from the vertical line counter. Not too exciting since the line value remains constant through the whole line, but since it's temporary it would be easy to do and undo and we get to see some video on our VGA monitor :)

So add a floating wire to the remaining input pin in the mux.
Rename it line_count[3:0] (so it's 4 bits wide as needed). This will change the color every line since we are using the lower 4 bits. If you instead use line_count[4:1] it will change color every 2 lines,
[5:2] (still 4 bits wide) will change color every 4 lines, etc. until [9:6], which will change colors every 2^6 lines (64).
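
The bus slice is just a shift and a mask. A small C sketch of line_count[lsb+3:lsb], with a made up function name, shows why the color band height doubles with every step of the lower index:

```c
#include <stdint.h>

/* Equivalent of the schematic slice line_count[lsb+3:lsb]:
 * drop the lowest 'lsb' bits, then keep 4 bits. */
uint8_t line_color(uint16_t line_count, unsigned lsb)
{
    return (uint8_t)((line_count >> lsb) & 0x0Fu);
}
```

With lsb = 0 the value changes on every line and wraps after 16 lines; with lsb = 1, lines 2 and 3 both give color 1; with lsb = 6 the color only changes every 64 lines.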

You can copy and paste the 10 bit one from the device, connect it, move the label out of the way and change the indices on the copy.

We are ready for some video, but first let's look at least at the red channel on the scope.
Save all and compile it.

(Oooops) I misspelled my Mux out wire, change the name to rgb_out. And compile it again.

Also we need to assign the pins for the VGA RGB pins. So go to the PSoC5LPVGA.cydwr and assign them there.

Note that I'm using pins P15[0] and P15[1] in my selection. Those two are important because if you want to use an XTAL you have to connect it there, so you would have to move your Green signal to some other pins.
Also P15[2] and P15[3] are used for the low frequency real time XTAL if you need an accurate 32.768kHz RTC clock for other features.

I do have that VGA to BNC adapter so it's easy for me to probe the signals.
On my scope I'm going to trigger on VSYNC (Black Cable of the VGA to BNC) via the external trigger (I only have a two channel scope), Then Channel 1 is going to have the HSYNC (White BNC cable) and Channel 2 is going to have the Red channel (well... Red cable).

Program the device and let's see what we get on the scope. I have the trigger delayed by 1.2ms so we clear all the blanking area. Channel 1 for the HSYNC is set to 2V per division (since it's TTL 0V-5V) and Channel 2 for the red is at 500mV per division, since the range is 0V - 0.7V. I do have my ratios set to X1 and the timing to 200us per division.

The actual pattern doesn't correspond to the final output, but the voltage levels are right, I had the pins mixed up and it's late so I'm not going to take another set of scope captures.

Things look good, so let's take a close up of the 4 red levels. We are well under 1V, which is considered safe for VGA signals, and that is without a 75 Ohm terminator.

Here it is with a 75 Ohm terminator.

I did also check the Green and Blue signal and it's all within spec, so let's see what we get on the screen.

I did find some small timing violation that I'll investigate later, but, Yeah! we have video!
I added the Zip file of the project so far to this point.

External clock
Optional: Using DSI Signal as a precise Clock Input

The internal oscillator is not the most precise way to generate a pixel clock. The problem with that is that on the video you'll see some horizontal banding and some video tearing depending on the video resolution and what is displayed on the screen. Meaning if every pixel is a different value you will see more noise in the output image.

PSoC 5LP supports two alternatives to the IMO (Internal Main Oscillator) as a clock source.

The first one will be the use of the MHz ECO (MHz External Crystal Oscillator) supporting both crystal resonators and ceramic resonators in the 4-25MHz range.
There is also a kHz ECO for a kHz clock domain but that's more for a real time clock (RTC) which we don't use.

More details about this first option that we are not going to use here can be found in this cypress app-note:

The second option is to use the Digital Signal Interconnect (DSI) in the 0-33MHz range. This is the optional method we are going to implement, and more details can be found on this app-note:

I do have a 26MHz OCXO that I'm going to use. In its simplest configuration you give it power/ground and it will spew a clock. You can trim the clock but that's beyond the scope of this tutorial. What we want is to show how to make use of an external clock signal (it could come from another MCU or anything else that can provide a clock in that range).

The trimming is pretty much a resistor divider and an inductor to clean the signal.
In my implementation I'm taking the external clock output of the OCXO into p3[0] of the dev board. But it could be any other port since the DSI clock is not restricted to any given pin.

I did choose this, because I didn't want to deal with using the XTAL, but the 1st method using a Crystal resonator is the best option because the OCXO consumes too much power and it's not cheap. A Ceramic Resonator like this one:
This will be better than the internal one and easy to configure without needing load capacitors. I pointed to a 20MHz resonator because our target is 40MHz, but if you wanted to achieve other frequencies you would have to select an XTAL that the PLL could derive your target frequenc(y/ies) from.

As for precision:
Very low PPM: use an OCXO or other temperature controlled oscillator, or any other external clock via DSI.
< 100 PPM: use a Crystal Resonator, adding load capacitors, via XTAL.
< 5000 PPM: use a Ceramic Resonator with built in load capacitors via XTAL.
> 10000 PPM (1%): using the internal main oscillator (IMO) will be fine.

You can have one or more implemented and select the input clock later on. On this tutorial we are going to define the DSI clock at 26MHz, because that is the OCXO I have around, but I will leave the project to use the IMO.

Ok, time to bring that DSI clock into our project. For that we just need to bring a digital input pin into our schematic (to be found in the component catalog under Cypress->Ports and Pins).

Name it DSI_CLK_IN by right clicking on the input pin and selecting configure. Also go to the Input tab and set the Sync mode to be Transparent, because you can't sync on the input clock with the master clock that is derived from it. Note: I left the Drive mode under the General Tab, to be the default "High impedance digital".

Create a wire to it and name the wire dsi_clock_in

Also go to the .cydwr tab and assign the DSI_CLK_IN pin to the right port (in my case P3[0] on pin 29).

Let's configure the Digital Signal. For that you need to open the .cydwr file, click on the Clocks tab, double click on the yellow area to bring up the system clocks configuration, then enable the Digital Signal check mark. Open the "..." to select the source clock, select the dsi_clock_in signal and type the external signal frequency. I'm leaving the Accuracy at 0% (as in perfect). Click OK.

Now on the PLL you can select the input from the IMO to the Digital Signal. I will leave the 2nd zip of this post to use the IMO but you can change the input clock for the PLL in here, this will change the Master Clock and the Bus clock as well but our desired frequency remains unchanged.

This is a capture using the IMO at 3 MHz with just 1% accuracy 17.6 ms after the vertical sync trigger with infinite persistence on to show the jitter (12us) just one video frame after the trigger.

This is a capture using the external OCXO after 60 frames (almost a full second after trigger) with infinite persistence showing no jitter at all.

---------------------------- Non optional update -----------------------------------------------------------------

Since I'm updating the code, I'm changing the h_count_r in VideoCtrl_v1_0.v to have 11 bits:
    reg [10:0] h_count_r;

The reason is that if we try to use a mode with 1024 visible horizontal pixels, 10 bits are not enough, so with this change we can handle up to 2047 visible horizontal pixels.
The code for the counter is 1 based, to optimize detecting when we reach the count, because I didn't want the module to require entering 0 based pixel values. I could have added a constant but that would eat resources that we don't have.
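
The register width follows directly from the largest value the 1 based counter has to reach. A small C sketch (the helper name is made up for illustration):

```c
/* Number of bits needed to hold every value from 0 up to max_count inclusive. */
unsigned bits_needed(unsigned max_count)
{
    unsigned bits = 0;
    while (max_count != 0u) {
        bits++;
        max_count >>= 1;
    }
    return bits;
}
```

bits_needed(800) is 10, which is why the original register was enough for 800x600, while bits_needed(1024) is 11. The largest count that fits in 11 bits is 2047.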

Also we are going to change the temperature constraints (thanks to Stephen AKA skench for finding this out) from the Industrial to the Commercial range. That could have been resolved by optimizing the Verilog mapping to the system resources, but for this tutorial this is a good enough solution. It means we can't operate from -40C to 0C now, oh well.

Updated code on Archive05 zip file attached.
Support images for this part of the update are going to be on the post with the update, since there is a limit of 25 files per post.
« Last Edit: December 28, 2015, 07:00:53 PM by miguelvp »

Back from the break, so I had time this weekend to work on this 4th part.

Let's implement a frame buffer and set up the DMA transfer to a control register that can be hooked to our VGA output.

-- Boring research stuff (to some, not to me):
The PSoC 5LP has 64KB of SRAM; the actual implementation is two 32KB blocks (this will be important later on).
By default the .ram section is defined as:
    ram (rwx) : ORIGIN = 0x20000000 - (65536 / 2), LENGTH = 65536
Essentially going from 0x1FFF8000 to 0x20007FFF.

This file is located in your Project under the Generated Source->PSoC5->cy_boot->cm3gcc.ld linker script file after you compile the project.

The lower 32KB 0x1FFF8000-0x1FFFFFFF is part of the code block or c-bus, and it's also present to the DMA controller in 0x20008000 - 0x2000FFFF. The mapping is handled automatically by the API so the lower portion is used when setting the DMA address.

The upper part 32KB 0x20000000 - 0x20007FFF in the sram block or s-bus is common for both the CPU and the DMA controller.
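
The two views of the lower block differ only by a fixed offset. The following C sketch shows that arithmetic using the address ranges quoted above. It is not Cypress code (as mentioned, the API takes care of this for you) and the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define CODE_SRAM_BASE 0x1FFF8000u /* lower 32KB as seen by the CPU (c-bus) */
#define CODE_SRAM_END  0x1FFFFFFFu
#define DMA_ALIAS_BASE 0x20008000u /* the same 32KB as seen by the DMA controller */

/* Return the address the DMA controller would use for a given CPU address.
 * Addresses in the upper 32KB (0x20000000 - 0x20007FFF) are common to both
 * and are returned unchanged. */
uint32_t cpu_to_dma_addr(uint32_t cpu_addr)
{
    if (cpu_addr >= CODE_SRAM_BASE && cpu_addr <= CODE_SRAM_END) {
        return cpu_addr - CODE_SRAM_BASE + DMA_ALIAS_BASE;
    }
    return cpu_addr;
}
```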

More info here (Direct Memory Access component datasheet): http://www.cypress.com/file/130631/download

More DMA info here:

When we declare a buffer, by default it will be placed in the code section of the SRAM (0x1FFF8000). You can find where it was placed after we compile by looking at the Results tab under CortexM3->ARM_GCC_493->Debug->PSoc5LPVGA.map, provided the buffer was declared and used.

The proper way to split the 64KB would be to use a custom linker script, reducing the ram section to 32KB and adding a 2nd 32KB section; more information can be found here:
-- Hopefully end of boring research stuff.

The memory split is important because we want to avoid contention between the CPU and the DMA controller, since by default both are accessing the same memory block. Ideally we want to declare a CPU frame buffer in the lower 32KB and a duplicate DMA frame buffer in the upper 32KB, or vice-versa. The CPU will write to its frame buffer, and the DMA interrupt will copy that buffer to the DMA buffer during the vertical sync, so that the line DMA transfer occurs between the upper memory block and the control register component. Also, in order to know what line the DMA needs to transfer we need access to line_count, more on that later.

We are going to cheat: instead of splitting the memory with a custom linker script we are going to define another section with the linker options. The problem with this approach is that the sections might overlap in memory, so you want to make sure that your variables declared in the SRAM code section don't go past 32KB, because I'm not sure if the linker will catch the overlap.

So let's declare the upper 32KB section.

On the workspace top menu, under "Project" select "Build Settings...". Once that opens navigate to:
PSoc5LPVGA->ARM GCC 4.9-2015-q1-update->Linker->Command Line
and under Custom Flags type: -Wl,--section-start=.ram2=0x20000000
This tells gcc that we have a section named ram2 starting at 0x20000000 (The upper 32KB memory block)

Now that we have that we need to declare the buffers, but for that we need to know the resolution stated in the component. We only need the visible area, so let's create an API header to define those.

Go to the components tab, right click on the VideoCtrl_v1_0 and select API->API Header File.
Name it VideoCtrl.h and click on Create New

In the template add the following code:

Code: [Select]
#include <cytypes.h>
#include <cyfitter.h>
#include <CyLib.h>

#if !defined(`$INSTANCE_NAME`_H)
#define `$INSTANCE_NAME`_H

#define `$INSTANCE_NAME`_H_RES `$HorizVisibleArea`
#define `$INSTANCE_NAME`_V_RES `$VertVisibleArea`
// We could also declare these ones if we needed to access them.
// #define `$INSTANCE_NAME`_H_FP `$HorizFrontPorch`
// #define `$INSTANCE_NAME`_H_HP `$HorizSyncPulse`
// #define `$INSTANCE_NAME`_H_BP `$HorizBackPorch`
// #define `$INSTANCE_NAME`_V_FP `$VertFrontPorch`
// #define `$INSTANCE_NAME`_V_HP `$VertSyncPulse`
// #define `$INSTANCE_NAME`_V_BP `$VertBackPorch`
// #define `$INSTANCE_NAME`_H_DMA `$HorizDMAAdjust`
// #define `$INSTANCE_NAME`_H_PP `$HorizPulsePositive`
// #define `$INSTANCE_NAME`_V_PP `$VertPulsePositive`

#endif // `$INSTANCE_NAME`_H

`$INSTANCE_NAME` will be substituted with your instance name, so in our case it will be replaced with VideoCtrl_1, so with this our main.c can access VideoCtrl_1_H_RES and VideoCtrl_1_V_RES to obtain the actual needed resolution.

I didn't add the rest of the parameters because we don't need them, so I did leave them commented out. The preprocessor will change the values but will leave them commented out.

So, our target is 800x600 at the moment, which is 480,000 pixels. We don't have anywhere near the 480,000 bytes needed for 8 bits per pixel; even with our 4 bits per pixel that would require 240,000 bytes per frame buffer. The max we have is 32KB per frame buffer, so we are going to use 1 bit per pixel and duplicate the lines.

480,000 divided by 8 is 60,000. That divided by 2 is 30,000 which is as much as we can spare for a raw frame buffer. Of course we could take another approach and just buffer 8 lines at a time with 4 buffers, so while the DMA is transferring 8 lines, the CPU is preparing the next 8 lines based on a character buffer, but I'm going to keep it simple for this project.

So we are going to end up with a 100x300 byte buffer, each byte will hold 8 pixels and eventually the colors will be determined by another buffer that is just 100x38 that will hold a 4bit foreground color and a 4 bit background color per 8x16 cell (8x8 with duplicate lines).

--- Side note ---
Well, I think I see a problem already so we might need to go eventually to 640x480. 30,000 + 3800 is bigger than our 32KB buffer. No worries, we do have Flash memory that we could use, but it's a bit slower (not by much). I think when the time comes I'll switch to 640x480, maybe 720x400 or perhaps 768x576 or I'll never finish this tutorial :). Then again maybe using smaller line buffers and flipping between them is still an option. We'll see.
--- End side note ---
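
The buffer sizes discussed above are easy to recompute for other resolutions. Here is a minimal C sketch of that arithmetic; the function names are made up for illustration and are not part of the project.

```c
#include <stddef.h>

/* Bytes needed for a 1 bit-per-pixel frame buffer where every stored line
 * is shown line_repeat times on the screen. */
size_t fb_bytes(unsigned h_res, unsigned v_res, unsigned line_repeat)
{
    return (size_t)(h_res / 8u) * (v_res / line_repeat);
}

/* Bytes needed for the color attribute buffer: one byte (4-bit foreground
 * plus 4-bit background) per cell, rounding the number of cell rows up. */
size_t attr_bytes(unsigned h_res, unsigned v_res, unsigned cell_w, unsigned cell_h)
{
    unsigned cols = h_res / cell_w;
    unsigned rows = (v_res + cell_h - 1u) / cell_h;
    return (size_t)cols * rows;
}
```

For 800x600, fb_bytes(800, 600, 2) is 30,000 and attr_bytes(800, 600, 8, 16) is 3,800, so together they need 33,800 bytes, which is more than 32KB (32,768). For 640x480 the same calls give 19,200 + 2,400 = 21,600 bytes, which fits.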

Let's stick with 800x600 for now since we are not going to introduce colors yet so, black and white, or any fixed foreground and background will do.

So we need to add a DMA component and an interrupt, also a Control Register for where the DMA transfers to and a couple of Status Registers to get the line counter.
(I've been trying to figure out how to get it from the verilog file and even declaring the line counter as a register but it doesn't show up in the fitter, this can be done by defining datapaths within verilog but it's a bit too advanced and it will take me a long time to go through that. Used to be that just declaring "reg" on a verilog output it will bring it automatically to cyfitter.h without having to declare datapaths, I might be missing something but I can't make it work that way, or maybe that was never the case).

Anyways, let's move on.

On the Schematic view let's drop a DMA and attach an Interrupt to it.
Also drop a Control Register (where the DMA will output data to the schematic).
And finally two Status Registers (because we need 10 bits and they can only handle 8 bits max each).

We are going to move things around after we configure all of these modules.

Right click on the DMA_1 component and select configure.
We are just going to change the name to simply DMA and set the hardware request (drq) to Rising Edge. This will be driven by the line_dma signal of the video controller. Remember we can adjust when this occurs with the video controller HorizDMAAdjust to fine tune when the data will be available. Btw I did change the value to 0 from the default 16 as a starting point.

Right click on the Interrupt component (isr_1) and select configure.
We are just going to change the name to SCANLINE so we can handle it better on our code, since that is more descriptive. That interrupt will happen at the end of every scanline so we can setup the next DMA transfer (as in selecting the new line), also on the last line (during vertical retrace) we can place code in there to copy the current CPU frame buffer into the video frame.

Next is the Control Register (Control_Reg_1) configure it to change the name to DMA_OUT and change it to display as a bus instead of individual bits, this will make it more convenient to use in the schematic.

This will hold the next byte of the DMA transfer (8 pixels, each either on or off). The output is not latched, so later we are going to need to feed this to a D Flip-Flop that is clocked at one eighth of the pixel clock.

Now let's address the line count problem. Since there is no easy way to get the line_cnt from our component to be registered and appear in the cyfitter.h file, we are using two 8 bit status registers (we only need 10 of the 16 bits). So let's configure one of them as 8 bits and name it LINE_CNT_LO, and the other one with just two bits and name it LINE_CNT_HI. It doesn't matter which of the two instances you configure as which.
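
On the code side the two register values get stitched back into one 10 bit number. A minimal C sketch of that step, with a made up function name and plain bytes standing in for the two status register reads:

```c
#include <stdint.h>

/* Combine the two status register bytes into the 10-bit line count.
 * hi carries bits 9:8 of the count in its two lowest bits,
 * lo carries bits 7:0. */
uint16_t line_from_status(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)(hi & 0x03u) << 8) | lo);
}
```

For example, line 600 (0x258) arrives as hi = 0x02 and lo = 0x58.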

I first tried setting the modes to Sticky so the registers would stay set to the last value until we read them from the code (the easiest way to do that is to select Sticky from the drop down and click Set all modes).

Note: Sticky didn't work, so leave them as Transparent. The register should hold the value anyway, and Transparent also lets us read the line counter more than once per change.

Since our DMA transfer is 8 bits at a time and we need to serialize those bits, we need a clock that is 8 times slower than our pixel clock. Right click on the PIXEL_CLK and select Copy, then paste the new clock instance in the schematic and configure it so that the divider is 8 instead of 1. Let's also name this clock CHAR_CLK.

Forgot to mention, keep on saving your project :)

Next we are going to use a D Flip Flop to retain the DMA_OUT value (the 8 pixels to be displayed). Let's also reorganize the canvas and wire what we have so far.

Let's configure that flip flop so it's 8 bits wide. Right click, configure:

We don't need presets or resets, but I changed MultiPresetReset to false so that if we need to add one, it will be a single bit instead of a bus as wide as the array.

We'll get to coding soon, but now we need the output of that flip flop to be serialized and fed into our blanking Mux.

Let's add an output wire from the flip flop and call it pixel_bits with a range from 7 to 0. This is how the schematic should look so far:

To serialize those pixel bits we are going to use a 3 bit counter that drives a mux which selects the current pixel to be displayed.

So let's drop a basic counter and a mux into the canvas.

Configure the counter to be a 3 bit counter (so it rolls over every 8 counts) that will drive the mux to select the current pixel.

Configure the mux to have 8 input terminals (one per pixel to select) with a terminal width of 1. We want the highest pixel first for the mux at count 0.

The output of this mux will be 1 if we want a lit pixel, or 0 if not. We will address colors way later.
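
The counter plus 8 input mux is the hardware version of picking one bit out of a byte, highest bit first. A small C sketch of the same selection (the function name is made up for illustration):

```c
#include <stdint.h>

/* Select one pixel out of a byte the way the 8-input mux does:
 * count 0 picks bit 7 (the first pixel to be displayed),
 * count 7 picks bit 0. The count wraps like a 3-bit counter. */
uint8_t pixel_at(uint8_t pixel_bits, uint8_t count)
{
    return (uint8_t)((pixel_bits >> (7u - (count & 7u))) & 1u);
}
```

So for the byte 0xA5 (binary 10100101) the pixels come out in the order 1, 0, 1, 0, 0, 1, 0, 1 as the counter goes from 0 to 7.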

So this mux will control yet another mux that has a terminal width of 4 and has two input terminals. I'm not doing individual steps because by now you should be able to configure those components.

This is the final wiring for the schematic.

Now we just need to write the code in main.c
I did put comments in the code.
I couldn't include it because of the 20000 characters limit so I'll attach it to the post, but it's in the project as well.

There are several problems to address at a later date.
1) The line counter is falling behind so I had to adjust it in the interrupt.
2) I had to use my OCXO; with the IMO the monitor didn't want to lock. I think it's time to specify datapaths in Verilog and use the UDBs to help the timings.
3) There seems to be a problem with the byte prior to the last value and it shows a double line at the end.
4) Our flip-flop clock needs to be 1/8th of the pixel clock, so the DMA adjust can only adjust in steps of 8 pixel clocks.

The attached code will try to use the IMO but I don't think it will lock because it's too jittery. I recommend that you get an external oscillator or an XTAL.

At least we do have the frame buffer mostly working, except that double line on the right :(

The main.c code is on the attachments named main_c.txt
« Last Edit: January 04, 2016, 02:10:01 PM by miguelvp »

Let's get a Character-Set so we can display something better than just lines.

We are going to define the glyphs needed for a character set and load them into the PSoC EEPROM.

We have 2KB of EEPROM available which is exactly the size needed to put the original IBM PC CGA character set.

I found a picture of the character set via google images and used a paint editor (old Jasc Paint Shop Pro that I still use) to organize it as a 2048x8 pixel png 2 bit black and white image.

Then I exported it as a Sun Raster Image (.ras), fed it to Notepad++ (which has a hex editor plug-in) to convert it to hex values, and finally edited that to create the Comma Separated Value (.csv) file needed to import the hex values into the EEPROM.

I'm attaching both the png file (cgafont.png) and the csv file (cgafont.csv.txt that needs to be renamed to cgafont.csv once downloaded).

The EEPROM implementation is really straight forward, so this post should not take too long.

First we want to drop an EEPROM component instance into the schematic canvas.

Then right click and configure it to change the name to just EEPROM

To import the csv, open the PSoC5LPVGA.cydwr file and select the E2 EEPROM tab on the bottom.

The Import... button will be disabled until you check the "Initialize EEPROM in HEX file"

Once that is checked, you should be able to click the import and look for your CSV file. I did place mine in the current PSoC5LPVGA.cydsn project folder, but it could be anywhere I guess.

Click Open and that should be it to initialize the EEPROM, once programmed the datasheet states that it will hold the values for at least 20 years.

I decided to store it as all the top rows contiguously, then all the second rows, and so on. It could have been organized as 8x2048 instead, with the first 8 bytes defining the full first character. The way I did it, you have to jump 256 bytes (0x0100) to get to the second row, as illustrated for the 2nd character (smiley face).

That's all there is to program the EEPROM.
To use it we need to add the instance name plus _Start() in our main code.


And to access the contents for a given character you can use:
    CY_GET_REG8(CYDEV_EE_BASE + index + (y%8)*256);
    index   Is the sprite to be displayed from 0 to 255.
    y       Is the current frame buffer line.
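
To make the addressing concrete, here is a C sketch of the same offset calculation against a plain byte array standing in for the EEPROM contents. The names are made up for illustration; on the device the read goes through CY_GET_REG8 and CYDEV_EE_BASE as shown above.

```c
#include <stdint.h>

#define FONT_ROW_STRIDE 256u /* one byte per character in each of the 8 rows */

/* Offset of the byte holding row (y % 8) of character 'index'
 * in the row-by-row layout described above. */
uint16_t glyph_offset(uint8_t index, unsigned y)
{
    return (uint16_t)(index + (y % 8u) * FONT_ROW_STRIDE);
}

/* Read one glyph row from a 2048-byte font image with that layout.
 * 'font' is a stand-in for the EEPROM contents. */
uint8_t glyph_row(const uint8_t *font, uint8_t index, unsigned y)
{
    return font[glyph_offset(index, y)];
}
```

For the smiley face (index 1) the top row is at offset 1 and the second row at offset 257, which is the 256 byte jump mentioned earlier. The last row of the last character lands at offset 2047, the end of the 2KB.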

So let's add a test code block to display the character set.

By the way, I did find the cause of the vertical sync problem: it seems the memory copy (memcpy) done by the CPU to copy the CPU frame buffer into the DMA frame buffer takes longer than the vertical refresh time. I wish the BUS_CLK (CPU) was derived from the PLL instead of the MASTER_CLK. That way we could run the CPU at 80MHz while the schematic was still at 40MHz in our current case.

Oh well, In order to fix the problem we will need to add a second DMA memory to memory transfer doing 32 bits per clock so we can fill the DMA buffer quicker. We will address that on a 2nd part of this post.

I'm blanking the last line since it would otherwise be repeated while we are finishing the frame refresh, corrupting the upper portion of the display. Instead, the top portion will be blank until the 2nd DMA is implemented.

With the current 800x600 timings we have 0.7392 ms to transfer the 30,000 byte buffer as 7,500 transfers of 4 bytes each. At 40MHz, and assuming one 32 bit (four byte) transfer per clock, that will take 7,500 / 40,000,000 s = 0.1875 ms, so I think we are going to be all right with over 0.5 ms to spare.
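
Here is a minimal C sketch for redoing this time budget with other timings. The function names are made up, and the number of bus clocks each 32 bit transfer really costs is something you have to supply yourself; I have not measured it, so treat any value you plug in as an assumption.

```c
/* Duration of the vertical blanking interval in milliseconds:
 * blank lines times pixel clocks per line, divided by the pixel clock. */
double vblank_ms(unsigned blank_lines, unsigned pixels_per_line, double pixel_clk_hz)
{
    return (double)blank_lines * pixels_per_line / pixel_clk_hz * 1e3;
}

/* Time in milliseconds to move 'bytes' using 32-bit transfers, given how
 * many bus clocks each transfer costs (an assumption supplied by the caller). */
double copy_ms(unsigned bytes, double bus_clk_hz, double clocks_per_transfer)
{
    return (bytes / 4u) * clocks_per_transfer / bus_clk_hz * 1e3;
}
```

For 800x600 there are 628 - 600 = 28 blank lines of 1056 pixel clocks each (the standard timing), so vblank_ms(28, 1056, 40e6) gives the 0.7392 ms budget. The copy fits as long as copy_ms() for the 30,000 byte buffer stays below that.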

But we will implement that second cpu buffer to dma buffer transfer later in a second part to this post with a memory to memory DMA transfer doing full 4 bytes at a time.

This is the code that will display the characters.
Code: [Select]
#include <project.h>

// Get the resolution from the Video Controller instance.
#define VGA_RES_X VideoCtrl_1_H_RES
#define VGA_RES_Y VideoCtrl_1_V_RES
// Our buffer will be one bit per pixel so we only need 1/8th for our horizontal dimension.
#define VGA_X_FACTOR 8
// We don't have enough memory so we are going to duplicate the vertical lines
// to save on memory requirements.
#define VGA_Y_FACTOR 2
// These are the final dimensions of our frame buffers.
#define VGA_X_BYTES (VGA_RES_X / VGA_X_FACTOR)
#define VGA_Y_BYTES (VGA_RES_Y / VGA_Y_FACTOR)

// Define our frame buffers making sure the X dimension is contiguous in memory.
// CPU frame, by default this will be at 0x1FFF8000 in Code SRAM space (i.e. section .ram)
uint8 cframe[VGA_Y_BYTES][VGA_X_BYTES];
// DMA frame, declare the DMA video frame to go in our new section .ram2 located at 0x20000000
uint8 dframe[VGA_Y_BYTES][VGA_X_BYTES] __attribute__ ((section(".ram2")));

// Declare our DMA channel and our DMA Transaction Descriptor.
uint8 dmaCh, dmaTd;

// Set up a refresh signal so the CPU can refresh the DMA buffer.
volatile int refresh = 1;

// ScanLine Interrupt
// This gets called every time our DMA transfer is done, so we can set up the next line,
// or, if we are on the last line, signal a refresh of the DMA frame from the current CPU frame.
// NOTE: the handler name ScanLine is assumed; the declaration line is missing from the post.
CY_ISR(ScanLine)
{
    // Get our line count from both status registers
    // LINE_CNT_HI holds the upper 2 bits
    // LINE_CNT_LO holds the lower 8 bits
    uint16 line = (uint16) ((LINE_CNT_HI_Status << 8) | LINE_CNT_LO_Status);

    // We don't want to change anything unless we are past line 0.
    if (line)
    {
        // Check if we are within the visible area.
        if (line < VGA_RES_Y)
        {
            // Update the next DMA transfer for the next line,
            // adjusting the line by the Y skip factor.
            if ((line % VGA_Y_FACTOR) == 0)
            {
                CY_SET_REG16(CY_DMA_TDMEM_STRUCT_PTR[dmaTd].TD1, LO16((uint32) dframe[line / VGA_Y_FACTOR]));
            }
        }
        else if (line == VGA_RES_Y)
        {
            // On the last line, since we are going to enter vertical sync,
            // point the DMA back at the first line and
            // tell the CPU that it's OK to refresh the screen.
            CY_SET_REG16(CY_DMA_TDMEM_STRUCT_PTR[dmaTd].TD1, LO16((uint32) dframe[0]));
            refresh = 1;
        }
    }
}

int main()
{
    CyGlobalIntEnable; /* Enable global interrupts. */

    // DMA setup
    // Allocate a transaction descriptor.
    dmaTd = CyDmaTdAllocate();
    // Initialize the DMA channel to transfer from the dframe base address to the control base address.
    // This indicates the high 16 bit address that will apply to the low addresses set on the TD.
    dmaCh = DMA_DmaInitialize(1, 0, HI16((uint32) dframe), HI16(CYDEV_PERIPH_BASE));
    // Configure the transaction descriptor for the first transfer.
    // Transfer VGA_X_BYTES with auto increment, signalling the end of the transfer.
    // Use the single transaction descriptor as our next TD as well.
    CyDmaTdSetConfiguration(dmaTd, VGA_X_BYTES, dmaTd, DMA__TD_TERMOUT_EN | TD_INC_SRC_ADR);
    // Set the destination address to be our DMA_OUT control register in the schematic.
    CyDmaTdSetAddress(dmaTd, LO16((uint32) dframe[0]), LO16((uint32) DMA_OUT_Control_PTR));
    // Set the channel transaction descriptor that we just configured.
    CyDmaChSetInitialTd(dmaCh, dmaTd);
    // Finally enable the DMA channel.
    // This will start the first transfer and call the interrupt after every line.
    CyDmaChEnable(dmaCh, 1);

    // Interrupt setup.
    // Set our interrupt for SCANLINE to the interrupt function declared above.
    // NOTE: this call is missing from the post and is assumed from the comment above.
    SCANLINE_StartEx(ScanLine);

    // Initialize EEPROM
    // NOTE: instance name assumed; use your EEPROM component's instance name plus _Start().
    EEPROM_Start();
    // Character set resides here and it is accessible with the following expression:
    //      CY_GET_REG8(CYDEV_EE_BASE + index + (y%8)*256);
    // Where:
    //      index   Is the sprite to be displayed from 0 to 255.
    //      y       Is the current frame buffer line.

    // Let's just set up something to display in here.
    // For now just set up a border to see if we get it all in frame.
    // Let's set it as a define, so we can easily compile out the code
    // when we don't need this test.
    // The first test takes priority if the rest are defined.
#define TEST_CHAR_SET 1
#define TEST_BORDER 0
#if TEST_CHAR_SET
    int x = 0, y = 0;
    for (y = 0; y < VGA_Y_BYTES; y++)
    {
        for (x = 0; x < VGA_X_BYTES; x++)
        {
            // Leave blanks in between characters to place graphical character separators
            int index = ((y/16)*(VGA_X_BYTES/2)+x/2)%256;
            // Temporary vertical timing fix.
            // Apparently the memory copy done by the CPU takes longer than the vertical retrace,
            // so to fix this we need to add a second DMA (to be implemented later).
            // Leave the last line blank until we add the 2nd DMA.
            // Otherwise that last line will appear on the top of the screen
            // when the memory copy is done and be repeated until we enable the DMA.
            if (y == (VGA_Y_BYTES-1))
            {
                index = 0x00;
            }
            else if ((y%16)/8 == 0)
            {
                if ((x%2) == 1)
                {
                    // Separator in cross spaces '+'
                    index = 0xc5;
                }
                else
                {
                    // Separator between vertical characters '-'
                    index = 0xc4;
                }
            }
            else if ((x%2) == 1)
            {
                // Separator between horizontal characters '|'
                index = 0xb3;
            }
            // Fill the current frame buffer with the selected character
            // row of pixels.
            cframe[y][x] = CY_GET_REG8(CYDEV_EE_BASE + index + (y%8)*256);
        }
    }
#elif TEST_BORDER
    int x = 0, y = 0;
    for (y = 0; y < VGA_Y_BYTES; y++)
    {
        for (x = 0; x < VGA_X_BYTES; x++)
        {
            // On the first and last line we are setting all the pixels on.
            if ((y == 0) || (y == (VGA_Y_BYTES-1)))
            {
                cframe[y][x] = 0xff;
            }
            else
            {
                // On the rest of the lines.
                if (x == 0)
                {
                    // Set the highest bit for the left border.
                    cframe[y][x] = 0x80;
                }
                else if (x == (VGA_X_BYTES-1))
                {
                    // Set the lowest bit for the right border.
                    cframe[y][x] = 0x01;
                }
                else
                {
                    // The rest is all blank (or background color).
                    cframe[y][x] = 0x00;
                }
            }
        }
    }
#endif

    // We could update the CPU frame buffer (cframe) within the for loop,
    // for example to implement a Pong game.
    // The DMA interrupt and hardware will take care of updating the DMA frame buffer.
    for (;;)
    {
        // Refresh the screen when the interrupt sets the refresh flag.
        if (refresh)
        {
            // Apparently the memory copy takes longer than the vertical retrace,
            // so to fix this we need to add a second DMA (to be implemented later).
            // Disable the DMA channel
            CyDmaChDisable(dmaCh);
            // Copy the CPU frame buffer into the DMA frame buffer
            memcpy(dframe, cframe, VGA_Y_BYTES * VGA_X_BYTES);
            // Enable the DMA channel
            CyDmaChEnable(dmaCh, 1);
            // We are done refreshing so reset refresh to 0
            refresh = 0;
        }
        else
        {
            // Here we can put code that modifies the CPU frame when we are not busy updating
            // the DMA buffer.
        }
    }
}

And the result:

Edit: resized image

Well, we did get something, but not quite what we expected.

So one more problem: it seems our characters are shifted, and that makes sense. If we look at the schematic, the PIXEL_SELECT 3-bit counter is free running, and even though it's synced with the PIXEL_CLK it's quicker than the Video Controller, so it's off by what seems to be a couple of pixels.

Easy fix: we hook the counter reset to the line_dma and use the DMA Adjust value to sync it.
To do that I'll just name the wire going to the DMA transfer so we can replace the constant on the counter reset pin with that signal.

While we are at it, I'm going to add a 2nd flip-flop on the DMA output control register to help with the vertical timings, even though we will eventually implement the memory-to-memory 2nd DMA.

And for fun I'll change the background color from black to blue.

Also on the VideoCtrl_1 we need to change the HorizDMAAdjust to 27 pixels. With the 2nd flip-flop, this is a band-aid to get the full screen at this particular resolution.

The only thing left is to sync the CHAR_CLK with the line_dma, maybe using a Frequency Divider instead.
We will do that later.

The result:

As you can see, the vertical timing issue went away, but that's no substitute for the 2nd DMA to free the CPU from any heavy lifting.
Edit: also, since we are skipping every other vertical line, at the current 800x600 video mode we get 300 lines of vertical resolution. That divided by 8 is 37.5 character rows, which is why the last row only shows half a character, so that's on purpose.
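
The same row count as a tiny sketch, in case you try other resolutions or skip factors. The function name and parameters are made up for illustration; it only assumes 8-line-tall characters as in the character set above.

```c
/* Split the frame buffer height into full character rows plus leftover lines.
 * visible_lines: video mode height, y_factor: line duplication factor,
 * glyph_h: character height in frame buffer lines. */
static void text_rows(unsigned visible_lines, unsigned y_factor, unsigned glyph_h,
                      unsigned *full_rows, unsigned *leftover_lines)
{
    unsigned buffer_lines = visible_lines / y_factor;

    *full_rows = buffer_lines / glyph_h;
    *leftover_lines = buffer_lines % glyph_h;
}
```

For 600 lines with a factor of 2 and 8-line glyphs this gives 37 full rows and 4 leftover lines, i.e. the half character at the bottom.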

Since the CHAR_CLK is not synced to the line_dma, we can't fine-adjust the 1st character, so it's chopped off.

I also attached the current state of the project. This time it is using the Digital Signal clock because the IMO can't cope with the timings yet; I'll address that in the next full post.

-- Continuation

To fix the memcpy latency I decided to only update 1/10th of the screen per frame, so that puts us at 6fps for a full update, even though the video is still refreshing at 60fps.
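
The idea can be sketched like this on the host. This is not the code from the attached main.c, just a minimal illustration of copying one slice per frame; copy_slice and its parameters are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy slice number `slice` (0-based) out of `slices` equal parts from src to dst.
 * The last slice also picks up any remainder bytes. */
static void copy_slice(uint8_t *dst, const uint8_t *src, size_t frame_bytes,
                       unsigned slices, unsigned slice)
{
    size_t chunk  = frame_bytes / slices;
    size_t offset = chunk * slice;
    size_t len    = (slice == slices - 1u) ? frame_bytes - offset : chunk;

    memcpy(dst + offset, src + offset, len);
}
```

Calling it once per vertical blank with slice cycling from 0 to 9 refreshes the whole DMA buffer every 10 frames, which is where the 6fps comes from.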

I implemented the 2nd DMA transfer, but it's #ifdef'd out at the moment while I find out why it's not working as expected.

The code is in the last archive (08) but I'm attaching the main.c as a separate attachment.

I did replace CHAR_CLK with a Frequency Divider component that syncs the pixel stream with line_dma, and added a couple of delays using D-Flip-Flops for more fine adjustments.

If I were to revise the project I would add more sync signals to the video controller component.

The memcpy via DMA is coded, and I did place comments in there for those who want to know the progress of that attempt. I'll fix it, and hopefully that will raise our screen update rate to something better than 6 frames per second.

In the main loop I'm flipping the characters in the CPU buffer (not the grid characters) at the current max frame rate of 6fps.
You can update it faster, but since only 1/10th of the screen is copied to the DMA buffer per frame, visually you will see them flipping in unison at 6fps.

--- Fixed DMA and we are at 60 fps  :)
Archive 09 has the latest project
DMAmain.c is the final code using the 2nd DMA channel for the memcpy
And I was able to change the DMAAdjust value to 18 and still meet the clock requirements, so that's one less flip-flop.

Edit: so this means that we have at least 633,600 CPU cycles left when it's not refreshing (600 visible lines of 1,056 clocks each at 40 MHz), because the DMA copy probably takes less than the 28 blanking lines at 800x600 at 60 Hz.
That means we have at least 15.84 ms left out of the roughly 16.58 ms per frame to do things to the CPU frame buffer.
« Last Edit: January 11, 2016, 08:15:14 PM by miguelvp »

Offline miguelvp

6) Optimizing the Verilog code to use PSoC UDBs and Datapaths to make better use of the resources available.

I'm not going to release the full project from now on, but I'll describe all the necessary steps to get you there. You can refer to the component creation at the beginning if you need help, and based on the pictures you'll be able to reproduce this. The reason is that just opening the zip file and looking around doesn't teach you much, so now you'll have to work at it. Plus, this will hopefully end up with the same functionality, just optimized to use fewer resources.

There is a lot of information in Creator itself, under Help->Documentation->Component Author Guide.
Under Documentation you'll also find the Warp Verilog Reference Guide, the PSoC Creator Datapath Configuration Tool User Guide and many other documents.

There is also good information on the Cypress website and YouTube channel. Also look up the "PSoC Sensei Blogs", oldies but goodies.

PSoC UDB Description

A PSoC UDB contains a few things: two PLD blocks (where the synthesized Verilog lives), an 8-bit Datapath where the ALU lives (more on this later), an 8-bit Control register that allows CPU/DMA input to be passed to the UDB, an 8-bit Status register that allows the UDB to pass information back to the CPU/DMA, and a dedicated 7-bit down counter that can be used if the ALU is not enough for the task or to balance the total use of resources (I don't think we need it for our task). There are also two clock buses, a user clock and a bus clock, but we are not going to get into any details on the clocks in this tutorial.

Here are two different block diagrams that show one UDB:

Each of those two PLDs has 12 inputs, 8 product terms (P-terms) and 4 outputs. There is an AND array and an OR array that are configured to synthesize the Verilog code; this is also where the UDB state machines reside.

Since the PSoC 5LP has 24 UDBs, that means it has 48 PLDs, and with 4 outputs each that's a total of 192 Macrocells and 384 product terms.
All of that is not that important since we will never code the Macrocells directly, but the thing is, that there are not many of them and you can run out of them very fast.

This shows the PLD architecture and structure:

For our current implementation, you can check the Results tab of the project: there is a file named PSoC5LPVGA.rpt that is generated after compiling, and it shows our current UDB usage:

Code: [Select]
Resource Type                 : Used : Free :  Max :  % Used
UDB                           :      :      :      :       
  Macrocells                  :  103 :   89 :  192 : 53.65 %
  Unique P-terms              :  180 :  204 :  384 : 46.88 %
  Total P-terms               :  191 :      :      :       
  Datapath Cells              :    0 :   24 :   24 :  0.00 %
  Status Cells                :    2 :   22 :   24 :  8.33 %
    Status Registers          :    2 :      :      :       
  Control Cells               :    1 :   23 :   24 :  4.17 %
    Control Registers         :    1 :      :      :       
So we are using 103 Macrocells and only have 89 left over. The Verilog also made use of 180 unique P-terms, only 11 of them seem to be reused (191 total), and no Datapaths.
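
The maximums and percentages in that report follow directly from the PLD numbers given earlier. Here is a small sketch of the arithmetic; the constant and function names are made up, and the rounding to two decimals is my guess at how the report rounds.

```c
enum {
    UDB_COUNT          = 24, /* PSoC 5LP, as stated above */
    PLDS_PER_UDB       = 2,
    MACROCELLS_PER_PLD = 4,  /* one per PLD output */
    PTERMS_PER_PLD     = 8
};

static unsigned total_macrocells(void)
{
    return UDB_COUNT * PLDS_PER_UDB * MACROCELLS_PER_PLD;
}

static unsigned total_pterms(void)
{
    return UDB_COUNT * PLDS_PER_UDB * PTERMS_PER_PLD;
}

/* Usage in hundredths of a percent, rounded to nearest (5365 means 53.65 %). */
static unsigned usage_centi_percent(unsigned used, unsigned max)
{
    return (used * 10000u + max / 2u) / max;
}
```

So 103 of 192 Macrocells is 53.65 % and 180 of 384 unique P-terms is 46.88 %, matching the report.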

We are using 2 Status Registers (on the schematic, for the line counter) and one Control Register (on the schematic as well, feeding the line DMA with the screen buffer). All in all we are well within the available resources, but we can do way better. After all, since any MCU is so limited in resources, it's a good thing to simplify so the design won't require as much power and leaves room for extensions later on.

Datapath and ALU:

The Datapath is 8 bits wide and can do simple arithmetic and logic operations. Datapaths can be chained into 16, 24 and 32-bit wide processors.

The Datapath has a small configuration RAM addressed by 3 bits that select the current instruction, so up to 8 instructions per Datapath can be configured. I'm not covering the rest of the control registers, except that we will use one of them later on.

Then we have the actual ALU that can do any of the following 8 functions:
Increment, Decrement, Add, Subtract, AND, OR, XOR and NOP (Pass through)

The Datapath has 8-bit working registers (chainable). We have two accumulators (A0 & A1), two data registers (D0 & D1) and two 4-byte deep FIFOs (F0 & F1). The FIFOs can be configured to act as extra registers (we will make use of that feature). There are two comparators that can perform "less than" and "equal to" operations between the accumulator and a data register or the other accumulator. The accumulators also have a zero detect and an all-ones detect. Lastly there is a shifter and a mask (we won't be using those).

There are two ways to make use of the Datapaths:
1) There is the Datapath Configuration tool that will modify your Verilog and offers the most flexibility,
2) The new (since Creator 3.0) UDB Editor that is more graphical and does generate Verilog as well, but you can't modify the generated Verilog.

We are going to use the former because we do want to be able to modify the Verilog, but the other one will work as well since you can create state machines using that tool.

We are going to upgrade our component. We could modify the existing one, but instead we are going to create a new version, so when we are done we just need to update the component. If we leave the shape and connections alone we won't need to rewire anything.

On the Component tab right click on the Project and add a new Component Item. Select the Empty Symbol (under Symbol) and name it VideoCtrl_v1_1.

Instead of clicking on "Create New", click on the arrow next to it and choose "Add Existing" from the drop-down.

On the left you'll see 3 dots; click them, navigate to where the previous version is and open VideoCtrl_v1_0.cysym. In my case it's located here:

C:\Users\Miguel\Documents\PSoC Creator\VideoWorkspace\PSoC5LPVGA.cydsn\VideoCtrl_v1_0

Double click on the title and rename it from VideoCtrl_v1_0 to VideoCtrl_v1_1. The symbol properties and the parameters should have come over with all the right settings, like the APIPrefix and the parameter's hardware flags. This is the easiest way to create a new version of the component.

Right click on the empty part of the canvas and Generate Verilog, click on Generate on the dialog and save your progress.

In order to interface with the UDBs we are going to need a different kind of a state machine. One will drive the state, and based on that state another part is going to drive the instruction out of 8 total to be executed at that state.

Also, since the UDB is only 8 bits wide, we need to count the visible area in a different way, because 800 for example doesn't fit in 8 bits. Since all resolutions are divisible by 8, we are going to cycle through the Visible state 8 times, so it will take 8 times the number in the count. We could have chained two UDBs to make it 16 bits wide, but why waste resources and use 4 UDBs when we can do it with just 2 of them?

So our horizontal and vertical states now need to be 4 bits wide instead of 2, since we ended up with 11 states.

A Datapath only has 2 inputs (the data registers) since the accumulators are going to be used internally, but it also has 2 FIFOs that we can set up as single-entry buffers.

We are going to use two UDBs, one for the Horizontal and one for the Vertical. For each UDB we will use the following registers/FIFOs adjusting them by 1 because we are going to count until we reach 0.
So for 88 it will go from 87 until 0. The Visible part will be divided by 8 (shift 3 to the right) and then adjusted by 1.

D0 for the Front Porch - 1
D1 for the Sync Pulse - 1
F0 for the Back Porch - 1
F1 for the Visible part>>3 - 1
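
As a quick host-side check of those adjustments, here is a sketch that computes the four values and rejects timings that would not fit in 8 bits. The struct and function names are made up; the 40/128/88 and 1/4/23 numbers mentioned below are the standard 800x600 at 60 Hz porch/sync values, which I'm assuming match the component parameters.

```c
#include <stdint.h>

typedef struct {
    uint8_t d0; /* front porch - 1 */
    uint8_t d1; /* sync pulse - 1 */
    uint8_t f0; /* back porch - 1 */
    uint8_t f1; /* (visible >> 3) - 1 */
} dp_counts;

/* Returns 1 and fills *out if every adjusted count fits in 8 bits, else 0. */
static int dp_counts_from_timing(unsigned visible, unsigned front_porch,
                                 unsigned sync_pulse, unsigned back_porch,
                                 dp_counts *out)
{
    unsigned vis8 = visible >> 3;

    /* Visible must be a non-zero multiple of 8; every count must be 1..256. */
    if ((visible & 7u) != 0u || vis8 == 0u || vis8 > 256u)
        return 0;
    if (front_porch == 0u || front_porch > 256u ||
        sync_pulse == 0u || sync_pulse > 256u ||
        back_porch == 0u || back_porch > 256u)
        return 0;

    out->d0 = (uint8_t)(front_porch - 1u);
    out->d1 = (uint8_t)(sync_pulse - 1u);
    out->f0 = (uint8_t)(back_porch - 1u);
    out->f1 = (uint8_t)(vis8 - 1u);
    return 1;
}
```

For the horizontal 800/40/128/88 this gives D0 = 39, D1 = 127, F0 = 87 and F1 = 99; for the vertical 600/1/4/23 it gives 0, 3, 22 and 74.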

For the AdjustDMA that drives the line_dma signal, since we don't have more inputs, we'll use a Count7 from one of those UDBs and activate it when we enter the Horizontal Back Porch.

Total usage should be 2 UDBs and a Count7 instance, that should reduce our macrocell usage.

Let's put the following Verilog code in the modifiable code section:

Code: [Select]
Partial code attached in post, no space due to post size limit.
filename PartialVideoCtr_v1_1_v.txt

This has the logic for the state machines, but the actual counting is done by the Datapaths with two identical configurations.

Save your Verilog file (important) and open the Datapath Config Tool (in the Tools menu).
Say OK to the warning about it being an advanced tool for experts who have studied the technical reference manual, or go study it. Actually I would recommend you do, because I can't explain everything in this post.

Anyway, go to File->Open and locate your Verilog file. Mine is under: C:\Users\Miguel\Documents\PSoC Creator\VideoWorkspace\PSoC5LPVGA.cydsn\VideoCtrl_v1_1

One thing: do make sure your Verilog file is saved. Both Creator and the Datapath Config Tool modify the same file, so you only want to be changing it with one tool at a time, not both.

And you'll be greeted with this scary interface full of overwhelming settings, but don't worry, we are just going to use very simple capabilities.

It's empty right now, so let's add a configuration by selecting Edit->New Datapath...

Set the instance name to HorizDP, select cy_psoc3_dp8 as the type (don't worry about the psoc3, it's just a legacy name they still use) and click OK.

Now it will have all the default values, all the functions are set to PASS (meaning NOP), so now we want to add our instructions to match the operations we need in the order we specified in the code.
Code: [Select]
    localparam EXEC_NOP       = 3'd0;   // Pass
    localparam EXEC_LOAD_FP   = 3'd1;   // Load FP in D0 into A0
    localparam EXEC_LOAD_SYNC = 3'd2;   // Load Sync in D1 into A1
    localparam EXEC_LOAD_BP   = 3'd3;   // Load BP in F0 into A0
    localparam EXEC_LOAD_VIS  = 3'd4;   // Load Vis in F1 into A1
    localparam EXEC_DEC_A0    = 3'd5;   // Decrement A0
    localparam EXEC_DEC_A1    = 3'd6;   // Decrement A1
As shown in this image:

The Reg value has to match our localparam value.
The comment could be the description from the code comment, but I decided to use the last part of the localparam names.

Reg0 does nothing (PASS): it just passes A0 through and doesn't store anything anywhere. We just put the comment NOP.
Reg1 keeps the PASS function, but it moves D0 into A0 as shown by "A0 WR SRC" on top, and we changed the comment to LOAD_FP.
Reg2 keeps the PASS function, but it moves D1 into A1 as shown by "A1 WR SRC" over it, comment LOAD_SYNC.
Reg3 keeps the PASS function, but it moves F0 into A0, comment LOAD_BP.
Reg4 keeps the PASS function, but it moves F1 into A1, comment LOAD_VIS.
Reg5 has a DEC function with SRCA being A0 and writes the output of the ALU into A0, decrementing A0, comment DEC_A0.
Reg6 is the same, but SRCA is A1 and it writes the ALU result into A1, decrementing A1, comment DEC_A1.
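
If it helps to see those seven entries as plain code, here is a small behavioral model in C. It is only a sketch of what each instruction does to the registers: it ignores clocking, the routing and everything else in the real Datapath, and the struct and function names are made up.

```c
#include <stdint.h>

/* Instruction numbers, matching the localparam values above. */
enum dp_op {
    EXEC_NOP       = 0,
    EXEC_LOAD_FP   = 1, /* D0 -> A0 */
    EXEC_LOAD_SYNC = 2, /* D1 -> A1 */
    EXEC_LOAD_BP   = 3, /* F0 -> A0 */
    EXEC_LOAD_VIS  = 4, /* F1 -> A1 */
    EXEC_DEC_A0    = 5,
    EXEC_DEC_A1    = 6
};

/* Hypothetical software model of one 8-bit Datapath. */
typedef struct {
    uint8_t a0, a1; /* accumulators */
    uint8_t d0, d1; /* data registers */
    uint8_t f0, f1; /* FIFOs used as single registers */
} datapath_model;

/* Execute one configured instruction on the model. */
static void dp_exec(datapath_model *dp, enum dp_op op)
{
    switch (op) {
    case EXEC_LOAD_FP:   dp->a0 = dp->d0; break;
    case EXEC_LOAD_SYNC: dp->a1 = dp->d1; break;
    case EXEC_LOAD_BP:   dp->a0 = dp->f0; break;
    case EXEC_LOAD_VIS:  dp->a1 = dp->f1; break;
    case EXEC_DEC_A0:    dp->a0 = (uint8_t)(dp->a0 - 1u); break;
    case EXEC_DEC_A1:    dp->a1 = (uint8_t)(dp->a1 - 1u); break;
    case EXEC_NOP:
    default:             break;
    }
}

/* Zero-detect outputs: true when the matching accumulator is 0. */
static int dp_z0(const datapath_model *dp) { return dp->a0 == 0u; }
static int dp_z1(const datapath_model *dp) { return dp->a1 == 0u; }
```

Loading F0 = 87 and then issuing DEC_A0 87 times makes z0 go true, which is the countdown the state machine waits for during the back porch.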

That's it for the horizontal one. Told you there was nothing to worry about; we even have a spare entry that we didn't use.

We could just leave it at that and reuse the configuration for the vertical Datapath, but to keep it simple let's create a second Datapath for the vertical with its own configuration.
In any event, even if we did reuse the Datapath configuration, we would still need two Datapaths.

First let's copy this Datapath with Edit->Copy Datapath
Then create one with Edit->New Datapath...
Name it VertDP, select the type to be a cy_psoc3_dp8 click OK.

The current configuration should have VertDP_a( 8 ) selected.
So do: Edit->Paste Datapath.

And we are done with the vertical as well, not that scary after all, huh?

Save it because we are going back to the Verilog editor.
It will ask you to reload the file because it has changed, so click OK.
Close the Datapath Configuration Tool. We are done with the Datapath, and otherwise it will keep asking whether you want to reload the Verilog file every time you change it.

The Datapaths are at the end of the Verilog file, now we just need to hook up our signals.
On HorizDP modify the following lines so they read like this:

        /*  input                   */  .clk(udb_clock),
        /*  input   [02:00]    */  .cs_addr(h_cs_addr),
        /*  output                  */  .z0(h_count_z0),
        /*  output                  */  .z1(h_count_z1),

On VertDP same thing with this:

        /*  input                   */  .clk(udb_clock),
        /*  input   [02:00]    */  .cs_addr(v_cs_addr),
        /*  output                  */  .z0(v_count_z0),
        /*  output                  */  .z1(v_count_z1),

clk is just our synced clock (see the comments in the portion of the code attached).
cs_addr selects which instruction is going to be executed; both the horizontal and vertical state machines select when to do a NOP or the appropriate Load or Decrement instruction.
z0 and z1 are outputs that will be true if A0 or A1 is zero (which will happen when the countdown is over).

And that's it for the Verilog part of it (unless there is a bug, which is likely).

Now, we need to write our API, so go to the component and Add Component Item.
Select API Header File, don't fill in the Item name, and use Add Existing. Navigate to our previous VideoCtrl.h file and click on Add Existing.

We are going to need to modify our API to initialize the DataPaths. So this component is still not ready to be used.

If you close the project and later open it and it prompts you to upgrade your component, just cancel out of it, since this one won't work just yet. It will compile and run, but you won't see anything on the screen (don't even try to hook it up, because it acts randomly for now).

I did update mine just to check the usage and to make sure I didn't screw up the new Verilog implementation (well, I might have, but it does compile). The rest of the resources stay the same, but this is what we get now in the UDB section:
v1.1 (new one)
Code: [Select]
UDB                           :      :      :      :       
  Macrocells                  :   68 :  124 :  192 : 35.42 %
  Unique P-terms              :  134 :  250 :  384 : 34.90 %
  Total P-terms               :  142 :      :      :       
  Datapath Cells              :    2 :   22 :   24 :  8.33 %
  Status Cells                :    3 :   21 :   24 : 12.50 %
    Status Registers          :    2 :      :      :       
    Routed Count7 Load/Enable :    1 :      :      :       
  Control Cells               :    2 :   22 :   24 :  8.33 %
    Control Registers         :    1 :      :      :       
    Count7 Cells              :    1 :      :      :       
vs v1.0 (older one)
Code: [Select]
UDB                           :      :      :      :       
  Macrocells                  :  103 :   89 :  192 : 53.65 %
  Unique P-terms              :  180 :  204 :  384 : 46.88 %
  Total P-terms               :  191 :      :      :       
  Datapath Cells              :    0 :   24 :   24 :  0.00 %
  Status Cells                :    2 :   22 :   24 :  8.33 %
    Status Registers          :    2 :      :      :       
  Control Cells               :    1 :   23 :   24 :  4.17 %
    Control Registers         :    1 :      :      :       

We are using one extra Status Cell and one extra Control Cell (because of the Count7 used for the line_dma).
But that's OK, because the two Datapaths we used didn't need their Status or Control registers.

We are also using 2 Datapath Cells out of the 24 available.
Our Macrocell and P-term counts dropped considerably (from 103 to 68 and from 180 to 134), though maybe not enough to fit on a PSoC4 without further optimization, and we now have plenty of resources left.

Maybe drawing the state machine in the UDB Editor would have been a better option, since it also has a way to use F0 and F1 as buffers instead of FIFOs (still to come). But the exercise at hand is more about showing how things hook up and giving an introduction to the Datapaths.

Next we will initialize our Datapaths and configure them via the API. So remember, this is not ready to be used yet: reject the upgrade notice when you open the project again. We will do the upgrade when it's finished in the next post.

As is, it might not hurt your monitor, but it surely won't display a thing since it's just driving the horizontal sync at 357.148kHz.

I'm reaching the limit of the post length, so I'll finish this hopefully tomorrow on the next post.
If I find bugs on the yet untested new Verilog implementation I'll add the changes on the next post as well.
« Last Edit: January 18, 2016, 06:18:08 AM by miguelvp »

Offline miguelvp

So we need to expand our API to do the following:

1) Set the FIFO into single buffer mode.
2) Setup Macros to access the two datapaths data registers and FIFOs to set the adjusted values.

Information about FIFO single buffer mode:

FIFO single buffer mode:

TRM page 187 FIFO Control Bits, FIFO0 CLR & FIFO1 CLR bits:

In order to change the FIFOs into single buffer mode we need to access the Auxiliary Control Register of each UDB.

Open the new component's header (VideoCtrl.h) and let's put in some definitions and a function prototype.
Make sure you are in the right VideoCtrl.h for this version, since you probably still have the v1_0 one open.
I recommend you go to the top menu Window->Close All Documents, then open it from the Component tab for version v1_1.
Code: [Select]
#include <cytypes.h>
#include <cyfitter.h>
#include <CyLib.h>

#if !defined(`$INSTANCE_NAME`_H)
#define `$INSTANCE_NAME`_H

// Don't adjust the existing resolution parameters definitions
// used in main.c so this updated version is compatible with the previous one.
#define `$INSTANCE_NAME`_H_RES `$HorizVisibleArea`
#define `$INSTANCE_NAME`_V_RES `$VertVisibleArea`

// Instead make a new VIS definition for the UDB configuration
// The API will use these defines internally to set the UDBs
// Use the parameters as unsigned values.
// Adjust the visible parameters by dividing them by 8 minus 1 since it's 0 based
// This is done so we can fit them into the 8 bit single buffer FIFO
#define `$INSTANCE_NAME`_H_VIS ((`$HorizVisibleArea`u>>3u)-1u)
#define `$INSTANCE_NAME`_V_VIS ((`$VertVisibleArea`u>>3u)-1u)
// The rest will fit in the 8 bits registers/FIFOs so just adjust them to be 0 based.
#define `$INSTANCE_NAME`_H_FP  (`$HorizFrontPorch`u-1u)
#define `$INSTANCE_NAME`_H_SP  (`$HorizSyncPulse`u-1u)
#define `$INSTANCE_NAME`_H_BP  (`$HorizBackPorch`u-1u)
#define `$INSTANCE_NAME`_V_FP  (`$VertFrontPorch`u-1u)
#define `$INSTANCE_NAME`_V_SP  (`$VertSyncPulse`u-1u)
#define `$INSTANCE_NAME`_V_BP  (`$VertBackPorch`u-1u)

// We don't need these ones yet.
//#define `$INSTANCE_NAME`_H_DMA `$HorizDMAAdjust`
//#define `$INSTANCE_NAME`_H_PP `$HorizPulsePositive`
//#define `$INSTANCE_NAME`_V_PP `$VertPulsePositive`

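// Define the Horizontal and Vertical access to the various UDB registers and FIFOs.
// Add defines to access them both directly or as a pointer.
// All of these are in cyfitter.h but we want to give them more readable names.

// Define the access to the UDB Auxiliary Control Register needed to set the FIFOs in single buffer mode.
// NOTE: these four defines are missing from the post. The names on the right are assumed
// to follow the same cyfitter.h pattern as the defines below.
#define `$INSTANCE_NAME`_H_DP_AUX_CTL_REG   (*(reg8 *) `$INSTANCE_NAME`_HorizDP_u0__DP_AUX_CTL_REG)
#define `$INSTANCE_NAME`_H_DP_AUX_CTL_PTR   ( (reg8 *) `$INSTANCE_NAME`_HorizDP_u0__DP_AUX_CTL_REG)
#define `$INSTANCE_NAME`_V_DP_AUX_CTL_REG   (*(reg8 *) `$INSTANCE_NAME`_VertDP_u0__DP_AUX_CTL_REG)
#define `$INSTANCE_NAME`_V_DP_AUX_CTL_PTR   ( (reg8 *) `$INSTANCE_NAME`_VertDP_u0__DP_AUX_CTL_REG)

// Define access to the Front Porch, Sync Pulse, Back Porch and Visible.
// So they match where our ALU expects those to be, as stated in our Verilog implementation.
//    localparam EXEC_LOAD_FP   = 3'd1;   // Load FP in D0 into A0
//    localparam EXEC_LOAD_SYNC = 3'd2;   // Load Sync in D1 into A1
//    localparam EXEC_LOAD_BP   = 3'd3;   // Load BP in F0 into A0
//    localparam EXEC_LOAD_VIS  = 3'd4;   // Load Vis in F1 into A1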
#define `$INSTANCE_NAME`_H_FP_REG           (*(reg8 *) `$INSTANCE_NAME`_HorizDP_u0__D0_REG)
#define `$INSTANCE_NAME`_H_FP_PTR           ( (reg8 *) `$INSTANCE_NAME`_HorizDP_u0__D0_REG)
#define `$INSTANCE_NAME`_H_SP_REG           (*(reg8 *) `$INSTANCE_NAME`_HorizDP_u0__D1_REG)
#define `$INSTANCE_NAME`_H_SP_PTR           ( (reg8 *) `$INSTANCE_NAME`_HorizDP_u0__D1_REG)
#define `$INSTANCE_NAME`_H_BP_REG           (*(reg8 *) `$INSTANCE_NAME`_HorizDP_u0__F0_REG)
#define `$INSTANCE_NAME`_H_BP_PTR           ( (reg8 *) `$INSTANCE_NAME`_HorizDP_u0__F0_REG)
#define `$INSTANCE_NAME`_H_VIS_REG          (*(reg8 *) `$INSTANCE_NAME`_HorizDP_u0__F1_REG)
#define `$INSTANCE_NAME`_H_VIS_PTR          ( (reg8 *) `$INSTANCE_NAME`_HorizDP_u0__F1_REG)
#define `$INSTANCE_NAME`_V_FP_REG           (*(reg8 *) `$INSTANCE_NAME`_VertDP_u0__D0_REG)
#define `$INSTANCE_NAME`_V_FP_PTR           ( (reg8 *) `$INSTANCE_NAME`_VertDP_u0__D0_REG)
#define `$INSTANCE_NAME`_V_SP_REG           (*(reg8 *) `$INSTANCE_NAME`_VertDP_u0__D1_REG)
#define `$INSTANCE_NAME`_V_SP_PTR           ( (reg8 *) `$INSTANCE_NAME`_VertDP_u0__D1_REG)
#define `$INSTANCE_NAME`_V_BP_REG           (*(reg8 *) `$INSTANCE_NAME`_VertDP_u0__F0_REG)
#define `$INSTANCE_NAME`_V_BP_PTR           ( (reg8 *) `$INSTANCE_NAME`_VertDP_u0__F0_REG)
#define `$INSTANCE_NAME`_V_VIS_REG          (*(reg8 *) `$INSTANCE_NAME`_VertDP_u0__F1_REG)
#define `$INSTANCE_NAME`_V_VIS_PTR          ( (reg8 *) `$INSTANCE_NAME`_VertDP_u0__F1_REG)

// Function prototype to initialize the UDBs
// We will follow the convention of using _Init as other components do
// And make it PSoC3 compatible (for Keil)
void  `$INSTANCE_NAME`_Init(void) `=ReentrantKeil($INSTANCE_NAME . "_Init")`;

#endif /* `$INSTANCE_NAME`_H */

The first set of defines are to get the adjusted values for 0 based counts, also making the visible count fit in the 8 bit register by dividing it by 8 with shift operation. Every shift to the right divides the value by two, so three shifts divides it by 8.

Then we added definitions to access the UDB registers and FIFOs, we could use the names generated in cyfitter.h directly but this way when we write the code it will be more readable. Those registers allows the CPU to be able to configure the UDBs during the component initialization.

Last we added a function prototype that will initialize the module and the UDBs.
When used by a program, `$INSTANCE_NAME` will be substituted with your instance name and everything should work fine.

Now we need to add the implementation of the _Init function where all the magic happens.
So on the Component Tab right click on the VideoCtrl_v1_1 component and select "Add Component Item".

Make sure you change it from Use Existing to Create New, since it remembers the last setting.
Select "API C File" and name it VideoCtrl.c, since after all it is the code implementation for VideoCtrl.h

The code to put in there is fairly simple
Code: [Select]
// Include the header of our instance, all the needed definitions will be there.
#include "`$INSTANCE_NAME`.h" 

// Initialize the component
void  `$INSTANCE_NAME`_Init(void) `=ReentrantKeil($INSTANCE_NAME . "_Init")`
{
    // Store the current interrupt state here.
    uint8 interruptState;

    // Set both the Horizontal and Vertical FIFOs (F0 & F1) in single buffer mode.
    // Pretty much this just stops the FIFOs from changing to the next entry
    // making it act just like a register.
    // Enter critical section
    interruptState = CyEnterCriticalSection();
    // Set the clear bits for both FIFOs (lower two bits)
    // On both the Horizontal and Vertical DataPaths.
    `$INSTANCE_NAME`_H_DP_AUX_CTL_REG |= (0x03);
    `$INSTANCE_NAME`_V_DP_AUX_CTL_REG |= (0x03);
    // Exit critical section
    CyExitCriticalSection(interruptState);
    // Setup all the UDB Data registers and FIFOS
    // For both the Horizontal and Vertical UDBs
    // (one assignment per _REG define from the header, using the adjusted values)
}

In order to change the auxiliary control register, we have to do that in a critical section, as the link about setting the FIFO in single buffer mode explains.

We need to set both F0 and F1 for both the Horizontal and Vertical Datapaths.
Then thanks to all those defines we move the parameters into the UDB instances with our adjusted values.
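To make the "move the parameters into the UDB instances" step concrete, here is a minimal host-side sketch. Everything in it is an assumption for illustration: the instance name VideoCtrl_1, the fake register variables that stand in for the real UDB addresses, and the value macros, which use the standard 800x600 at 60Hz horizontal timing (128 pixel sync pulse, 88 pixel back porch). The real header may name and adjust the values differently.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the UDB registers so the sketch runs on a PC.
   On the PSoC these would be the reg8 defines from the header. */
typedef volatile uint8_t reg8;
reg8 fake_h_d1, fake_h_f0, fake_h_f1;

#define VideoCtrl_1_H_SP_REG   (*(reg8 *) &fake_h_d1)
#define VideoCtrl_1_H_BP_REG   (*(reg8 *) &fake_h_f0)
#define VideoCtrl_1_H_VIS_REG  (*(reg8 *) &fake_h_f1)

/* Hypothetical adjusted values for 800x600 at 60Hz. */
#define H_SYNC_PULSE   (128u - 1u)   /* 0-based sync pulse width   */
#define H_BACK_PORCH   (88u - 1u)    /* 0-based back porch width   */
#define H_VISIBLE      (800u >> 3)   /* visible width divided by 8 */

/* Copy the adjusted horizontal timing into the (fake) registers. */
void load_horizontal_timing(void)
{
    assert(H_VISIBLE <= 255u);
    VideoCtrl_1_H_SP_REG  = (uint8_t)H_SYNC_PULSE;
    VideoCtrl_1_H_BP_REG  = (uint8_t)H_BACK_PORCH;
    VideoCtrl_1_H_VIS_REG = (uint8_t)H_VISIBLE;
}
```

The vertical datapath would get the same treatment through the V_ defines.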

And that's all there is to it, now time to test it, NOT on the display but by looking at the scope to see if we goofed.

Updating Component
So let's update our component to the new version.
On the Source Tab right click on your Project PSoC5LPVGA and select "Update Components..."

You'll notice that it shows that there is a new version, so click Next.

The second dialog has a checkbox, "Create workspace archive before updating", which is checked. You can uncheck it if you don't want extra archives around, but it's there so that if something goes wrong you still have a backup of your previous version. Click Finish and we now have our upgraded version.

Only one thing left to do: we need to call the _Init function in our main.c file.
So anywhere in main.c before the for loop add a call to VideoCtrl_1_Init();
Then compile the code. If you are curious you can look at Project->Generated_Source->VideoCtrl_1 and you will see the API for our particular instance, both the header and the C file, with all the final values.

Testing the new UDB based component

Well, it didn't work, I just get a 108.994kHz HSync with a positive pulse of 3.240us

I have to make a Verilog test harness to debug what is going on. So I think for now we'll go back to v1_0
And remove the VideoCtrl_1_Init() from main.c

But at least (other than not working) I hope this shows how the UDBs can be used.
Also how a component can be updated to a new version, so it's not all lost.

I might revisit this at a later date.

I think also I'm going to skip talking with another MCU and the external memory for now, I'll shift things around.
Maybe I'll do the analog input to make a simple scope or the USB HID, or better yet a color buffer.

Edit: I forgot to start the Count7 in the critical section, that's not the whole problem but just noting that there is a bit that enables the Count7 and has to be set in there.

7) Future expansion, Using available pins to communicate with other MCUs, Parallel or serial.
« Last Edit: January 19, 2016, 06:27:12 PM by miguelvp »

Offline miguelvp

I think this is the end of the line for this project. Reason being that I have to re-organize it all.

Let me explain.

The line DMA transfer takes 7 cycles of setup plus one cycle per byte transferred.
Since we are transferring just one byte at a time, it takes 8 cycles per byte, or 1 cycle per pixel at 8 pixels per byte. That is all good so far, as long as you don't do other DMA transfers that interfere, and that's where we are at.
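The arithmetic can be sketched as below. This is only a simple model of the numbers quoted in this post (7 setup cycles plus 1 cycle per byte); the 8 pixels per byte figure is an assumption inferred from "8 cycles per byte or 1 cycle per pixel", and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_SETUP_CYCLES     7u  /* setup cost per transfer, from the post */
#define DMA_CYCLES_PER_BYTE  1u  /* cost per byte moved, from the post     */
#define PIXELS_PER_BYTE      8u  /* assumption: 1 bit per pixel            */

/* Total cycles for one DMA transfer of the given size. */
uint32_t dma_transfer_cycles(uint32_t bytes)
{
    assert(bytes > 0u);
    return DMA_SETUP_CYCLES + DMA_CYCLES_PER_BYTE * bytes;
}

/* Cycles spent per pixel, scaled by 1000 to avoid floating point. */
uint32_t dma_millicycles_per_pixel(uint32_t bytes)
{
    return (dma_transfer_cycles(bytes) * 1000u) / (bytes * PIXELS_PER_BYTE);
}
```

With one byte per transfer this gives 8 cycles and exactly 1 cycle per pixel, which is why there is no slack left for other DMA channels.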

More info on DMA usage:
and on the TRM:

So, on the current configuration the only window to do any other DMA work is on the horizontal retrace before the next DMA transfer at the end of the Back Porch or whatever time we have at the end of the vertical retrace.

Meaning I don't have enough bandwidth for the color byte transfer without interfering with the line DMA, yes I could put it in the Flash memory space but that's not the right solution.

I could try to circumvent it using free spokes and it might be possible, but the right way to do this is to redesign the whole project and make better use of the line DMA transfer. After all, it can transfer 32 bits per cycle, and I could bump up the clock frequency to double the bandwidth on top of that.

I could use a UDB with its two 4-byte FIFOs to transfer 8 bytes in 11 cycles, or even use a Shift Register (ShiftReg) component directly with DMA transfers, without the need for a single 8-bit control register. If you look at the DMA memory copy that happens during vertical refresh, I'm transferring 64 bytes per burst, so I could do the same and buffer them so that other concurrent DMA transfers don't interfere. Also, there is a tab for setting up DMA priorities in the .cydwr file.

Without the redesign I don't have enough bandwidth with the SRAM and the UDB spoke, so I can't even do the scope demo, since the WaveDAC component for generating the signal uses DMA transfers too and that's causing all kinds of problems. I needed that to do a sawtooth wave to drive the horizontal time base at the vertical refresh frequency.

I could do the External memory since that spoke is unused and the ADC/DAC/Peripheral one is free too, but there is no real purpose on bodging it just to make something work.

So the only use it has would be to do a serial communication terminal since the USB spoke is free as well, or use it as is with the CPU updating the frame buffer without extra features, but again, there is no reason of following this path since we can do better.

The original purpose was to show the use of Verilog on this chip, and that was accomplished with the CPU free for 95.5% so it's still useful for many purposes.

The main thing is that this is a normal development cycle, you come up with a concept and if you hit a wall like this, you use what you learned to iterate. Of course documenting all of the iterations will take too much of my time, so I'm done with this tutorial.

I might update the Simple scope entry just using the component for the needed timing signals and using Analog components, but without the DMA frame buffer, to show how you could draw directly on the screen by pure hardware using the analog programmable blocks. They even have an analog Mixer and you can put Hysteresis in the analog comparators.

I hope you got enough information about this pretty amazing chip.
« Last Edit: January 22, 2016, 02:42:47 PM by miguelvp »

Offline miguelvp

9) Future expansions USB HID module to control it.
« Last Edit: January 22, 2016, 02:15:42 PM by miguelvp »

Offline miguelvp

10) Future expansions Adding Analog inputs for a simple scope.

I will do this, but without the frame buffer because of my under use of the DMA capabilities.
Timing synchronous DMA transfers on the same spokes is not a simple task, but with the lessons learned we can look more into how to achieve optimal performance of the available resources. But that's an iterative process. Which is what a development cycle really is.
« Last Edit: January 22, 2016, 02:20:23 PM by miguelvp »

Offline miguelvp

11) Future expansion, whatever I can think after doing all this.
« Last Edit: January 22, 2016, 02:16:17 PM by miguelvp »

Offline miguelvp

One more just in case, whatever else doesn't fit I will reply on the thread and put links on the OP to those.

Offline miguelvp

Well, Part 2 is up in here:

I did reserve a couple of extra posts just in case but I think post 2 is the biggest post I'll need on this project.

If anyone has any questions about details left behind (hopefully none) let me know and I'll do my best to clarify what is going on.

Part 3 (doing a simple video pattern based on that new video controller module) might happen tomorrow, not sure.

Offline nickarsow

  • Newbie
  • Posts: 3
  • Country: bg
Hi Miguel,
Is it possible to use the project as an OSD....capture composite video signal and add the overlay on it?
Maybe a sync separator like LMH1881 or 1981 has to be used? Any thoughts?
Thanks in advance with best regards

Offline miguelvp

Quote from: nickarsow
Hi Miguel,
Is it possible to use the project as an OSD....capture composite video signal and add the overlay on it?
Maybe a sync separator like LMH1881 or 1981 has to be used? Any thoughts?
Thanks in advance with best regards

I thought I replied last night. I don't think that is possible. This chip can't do video capture, since the only ADCs it has are in the 1 megasample per second range, which is not enough to capture composite video.

For that you need an FPGA with dedicated hardware, such as a good ADC that can handle the acquisition, meaning it has to be able to sample probably 5 times faster than the color burst signal.
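To put a number on that, here is a small sketch of the estimate. The NTSC color burst frequency (3.579545 MHz) is an assumption about the video standard in use, the factor of 5 is the rough rule of thumb from this reply, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NTSC_COLOR_BURST_HZ  3579545u  /* assumption: NTSC source          */
#define OVERSAMPLE_FACTOR    5u        /* rule of thumb from this reply    */
#define PSOC_ADC_SPS         1000000u  /* roughly 1 megasample per second  */

/* Sample rate needed to capture a signal with the given burst frequency. */
uint32_t required_sample_rate(uint32_t burst_hz)
{
    assert(burst_hz > 0u);
    return burst_hz * OVERSAMPLE_FACTOR;
}

/* Returns 1 if the ADC is fast enough for that burst frequency, else 0. */
int adc_fast_enough(uint32_t adc_sps, uint32_t burst_hz)
{
    return adc_sps >= required_sample_rate(burst_hz);
}
```

Under these assumptions the capture would need close to 18 megasamples per second, far beyond what the on-chip ADCs offer.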

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 9977
  • Country: gb
    • Mike's Electric Stuff
Quote from: miguelvp
And No, I don't get a commission from Cypress, not even free kits.

Write it up as an appnote and they might...
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs

Offline nickarsow

Hi Miguel,
Some misunderstanding.....a typo mistake. I mean to capture just the sync from the video signal by LMH1981, then use the sync by the PSoC for synchronization.
The video signal will be then mixed outside the PSoC.
« Last Edit: December 22, 2015, 04:20:25 AM by nickarsow »

Offline miguelvp

Quote from: nickarsow
Hi Miguel,
Some misunderstanding.....a typo mistake. I mean to capture just the sync from the video signal by LMH1981, then use the sync by the PSoC for synchronization.
The video signal will be then mixed outside the PSoC.

Check out this thread:

Not for the capture of the video, but I did put some links relating to usage of the LM1881 if the sync signals were not available. But I was able to capture the sync from the device. I also did look into the LMH1981 and was planning to use that, but once I was able to get the video I did put the project to sleep. Hopefully I'll revive it one of these days because it's all in a little box that keeps staring at me.

What I was doing in there is taking the sync signal and generating a HSync and VSync to convert the output to a VGA monitor.

Anyways, looking at this particular post

with the FPGA running at 50MHz (I think, not sure) I was getting a 50ns delay on the signal. I think this chip could perform the same if it is clocked by an XTAL or some precise oscillator, not the internal one, because at high frequencies it has a lot of jitter.


Offline miguelvp

Quote from: mikeselectricstuff
Quote from: miguelvp
And No, I don't get a commission from Cypress, not even free kits.
Write it up as an appnote and they might...

Thanks! didn't even think about that.

Offline nickarsow

Hi Miguel,
I'm the designer of a lot of flight controllers for multicopters and planes ( http://arsovtech.com/?page_id=1502   and   https://pixhawk.org/modules/pixracer ) but we suffer from lack of a good OSD for our video transmissions . All we have as an OSD is the old MAX7456....horrible, high current consuming chip, with just a character set. I'm thinking of a graphical OSD, which will give us more capabilities.
We have a frozen project using STM32F4xx MCU, but I think PSoCs are more flexible for the purpose.

Offline skench

  • Contributor
  • Posts: 16
  • Country: gb
Hi Miguel,

Great work.

I have been religiously following your instructions but I think you have missed out a line in the following.

Symbol Label: Video Controller
Terminal Name, type:
    line_dma, Digital Output


Offline miguelvp

Thanks, added it, couldn't put a longer description since that post is already at 20000 characters :)

I guess it's a good thing I added images so that could be caught.

Also I need to expose all the values via a C API so we can get the line counter or even change the video mode on the fly. I'll do that next (maybe tonight, if I have time for that). I'll also add more comments to the code; I did rush it towards the end and, for example, I don't explain what the newline register does (it allows the vertical FSM to trigger).

Also my vertical and horizontal state registers only need 2 bits, I made them 4 wide by mistake, still should work but no need to waste a total of 4 bits. I'll update that as well.
« Last Edit: December 22, 2015, 08:13:34 AM by miguelvp »

Offline miguelvp

Added another part (3rd post). We can run it now and look at the HSync and VSync signals, modify the parameters, and we are very close to displaying something.

Hooking it up to a VGA monitor in this state will make the monitor sync so you can see the resolution and frame rate using the OSD of your monitor, but nothing more since we are not driving the actual RGB pins yet.


I guess adding more output pins to check on the blank_n and dma signals would be nice. Maybe even the line counter to see how it changes.

Next up I'll use the line counter and those unused signals to display some colors, also I will add the RGB pins (3 per color).
« Last Edit: December 22, 2015, 10:15:42 PM by miguelvp »

Offline miguelvp

Played with the DMA Adjustment value after I did wire the VSync pin to the line_dma output.

The default I had was 16, at 25ns per pixel (inverse of 40MHz) it will potentially trigger the DMA 400ns before the line becomes visible.

Here are some captures (top signal is the hsync out, bottom is the line_dma out).
First one with the adjust at 0, showing the full Horizontal Back Porch and allowing just 25ns on the rising edge to do the DMA, or 0ns on the falling edge.
25ns*88 = 2.2us so it's spot on.

Second one with the adjust at 44 (half of the Back Porch) and it's bang on 1.1us as expected.

The DMA pulse is just one pixel clock wide (the capture should really say 25ns, but with all that ringing it's not measuring right). I blame the jumper cables between the board and the VGA R2R DAC being long and next to each other, which I can't avoid right now.

Anyways, I can move the DMA adjustment in 25ns increments (or in pixel clock increments, to be more precise) and decide whether to trigger the DMA on the rising or falling edge. The max value can't exceed the back porch value in pixels, since the way I wrote the code it's active only in that region. But I doubt the first pixel takes longer than 2.2us to get from memory to the register, so it should be sufficient.
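The timing numbers above can be reproduced with a small sketch like this one. The 40MHz pixel clock and the 88 pixel back porch come from this post; the function names are assumptions for illustration and are not part of the component.

```c
#include <assert.h>
#include <stdint.h>

#define PIXEL_CLOCK_HZ   40000000u  /* 40MHz pixel clock     */
#define H_BACK_PORCH_PX  88u        /* back porch in pixels  */

/* One pixel clock period in nanoseconds (25ns at 40MHz). */
uint32_t pixel_period_ns(void)
{
    return 1000000000u / PIXEL_CLOCK_HZ;
}

/* How long before the visible area the DMA trigger fires. */
uint32_t dma_lead_time_ns(uint32_t adjust_px)
{
    assert(adjust_px <= H_BACK_PORCH_PX);
    return adjust_px * pixel_period_ns();
}

/* How long after the end of the sync pulse the DMA trigger fires. */
uint32_t dma_delay_after_hsync_ns(uint32_t adjust_px)
{
    assert(adjust_px <= H_BACK_PORCH_PX);
    return (H_BACK_PORCH_PX - adjust_px) * pixel_period_ns();
}
```

An adjust of 16 gives a 400ns lead, an adjust of 0 puts the trigger 2200ns after the sync pulse, and an adjust of 44 puts it at 1100ns, matching the captures.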

line_cnt seems to work as well, at least bit 0 and bit 1 do:

26.40us per bit 0 toggle tells me the lines are counting at 37.87878788 kHz, spot on. Of course these are scope measurement values; there is no way I'm getting the exact frequency of the VGA specs with the internal oscillator.
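As a cross-check of that figure, here is a minimal sketch that derives the nominal line period and line rate. It assumes the standard 800x600 at 60Hz total of 1056 pixel clocks per line (800 visible plus front porch, sync and back porch); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PIXEL_CLOCK_HZ   40000000u  /* 40MHz pixel clock                    */
#define PIXELS_PER_LINE  1056u      /* assumed total for 800x600 at 60Hz    */

/* Nominal line period in nanoseconds. */
uint32_t line_period_ns(uint32_t pixel_clock_hz, uint32_t pixels_per_line)
{
    assert(pixel_clock_hz > 0u);
    return (uint32_t)(((uint64_t)pixels_per_line * 1000000000u) / pixel_clock_hz);
}

/* Line rate in Hz, rounded down, from a line period in nanoseconds. */
uint32_t line_rate_hz(uint32_t period_ns)
{
    assert(period_ns > 0u);
    return 1000000000u / period_ns;
}
```

With these inputs the period comes out at 26400ns and the rate at 37878Hz, in line with the scope reading.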

Looking at when it doesn't count, I get a total of 768 us of vertical blanking, while the spec is 739.4 us. The difference is 28.6 us, which is about one line period: since the counter either starts at 0 or ends at 0, we have to subtract 26.4 us from the measurement. And since the 768 us is a visual measurement using cursors, the real value is probably closer than that.

Strange thing is that after deleting the generated files to make the archive, I'm getting a timing violation now. It's not apparent on the output, which gives me the full 40MHz I want, but it says I'm above the limit of 39.917 MHz. It wasn't happening before I got rid of the original generated code in order to make the zip file smaller on the 3rd post.
Edit: Never mind, the timing violation happened when I connected the line counter to the outputs. Putting them back to the hsync and vsync makes the timing violation go away.

« Last Edit: December 22, 2015, 08:55:38 PM by miguelvp »
