Author Topic: Which DMA microcontroller engines you like the most?  (Read 13839 times)

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13761
  • Country: gb
    • Mike's Electric Stuff
Re: Which DMA microcontroller engines you like the most?
« Reply #25 on: January 17, 2017, 12:11:50 am »
That's odd - write speed will probably be affected more by variations in memory devices, but I'm usually only interested in reads. That speed looks a bit disappointing, but if it's anything like the library for the older versions, there's probably a lot of scope for optimisation.
The old SD card code was doing stupid stuff like checking for end of cluster on every single byte transferred, and not using the SPI in 32-bit mode during sector reads, but it was easy to optimise without needing to understand much of their code. USB may take a bit more work, however.

 

Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline bktemp

  • Super Contributor
  • ***
  • Posts: 1616
  • Country: de
Re: Which DMA microcontroller engines you like the most?
« Reply #26 on: January 17, 2017, 06:46:58 am »
I ran some tests on an STM32F469 @ 180MHz to compare HS-USB with a 4-bit SD card interface running at 24MHz:
Code:
SD read test...
reading 1 bytes...1 bytes done in  0.000s ( 2.224kByte/s)
reading 2 bytes...2 bytes done in  0.000s ( 5.710kByte/s)
reading 4 bytes...4 bytes done in  0.000s (11.322kByte/s)
reading 8 bytes...8 bytes done in  0.000s (22.843kByte/s)
reading 16 bytes...16 bytes done in  0.000s (45.687kByte/s)
reading 32 bytes...32 bytes done in  0.000s (91.374kByte/s)
reading 64 bytes...64 bytes done in  0.000s (182.215kByte/s)
reading 128 bytes...128 bytes done in  0.000s (363.372kByte/s)
reading 256 bytes...256 bytes done in  0.000s (716.332kByte/s)
reading 512 bytes...512 bytes done in  0.000s (1457.726kByte/s)
reading 1024 bytes...1024 bytes done in  0.000s (2506.266kByte/s)
reading 2048 bytes...2048 bytes done in  0.000s (4065.042kByte/s)
reading 4096 bytes...4096 bytes done in  0.001s (5882.355kByte/s)
reading 8192 bytes...8192 bytes done in  0.001s (7561.440kByte/s)
reading 16384 bytes...16384 bytes done in  0.002s (8834.902kByte/s)
reading 32768 bytes...32768 bytes done in  0.004s (7540.060kByte/s)
reading 65536 bytes...65536 bytes done in  0.008s (8323.583kByte/s)
reading 131072 bytes...131072 bytes done in  0.015s (8772.535kByte/s)
reading 262144 bytes...262144 bytes done in  0.028s (9023.303kByte/s)
reading 524288 bytes...524288 bytes done in  0.057s (9036.202kByte/s)
reading 1048576 bytes...1048576 bytes done in  0.112s (9160.118kByte/s)
reading 2097152 bytes...2097152 bytes done in  0.222s (9223.775kByte/s)
reading 4194304 bytes...4194304 bytes done in  0.442s (9257.087kByte/s)
reading 8388608 bytes...8388608 bytes done in  0.884s (9268.472kByte/s)
reading 16777216 bytes...16777216 bytes done in  1.766s (9275.356kByte/s)

USB read test...
reading 1 bytes...1 bytes done in  0.001s ( 1.247kByte/s)
reading 2 bytes...2 bytes done in  0.001s ( 2.488kByte/s)
reading 4 bytes...4 bytes done in  0.001s ( 4.901kByte/s)
reading 8 bytes...8 bytes done in  0.001s ( 9.990kByte/s)
reading 16 bytes...16 bytes done in  0.001s (20.006kByte/s)
reading 32 bytes...32 bytes done in  0.001s (40.064kByte/s)
reading 64 bytes...64 bytes done in  0.001s (79.617kByte/s)
reading 128 bytes...128 bytes done in  0.001s (159.235kByte/s)
reading 256 bytes...256 bytes done in  0.001s (315.258kByte/s)
reading 512 bytes...512 bytes done in  0.001s (636.132kByte/s)
reading 1024 bytes...1024 bytes done in  0.001s (1269.036kByte/s)
reading 2048 bytes...2048 bytes done in  0.001s (2395.210kByte/s)
reading 4096 bytes...4096 bytes done in  0.001s (3960.398kByte/s)
reading 8192 bytes...8192 bytes done in  0.001s (5869.408kByte/s)
reading 16384 bytes...16384 bytes done in  0.002s (8815.431kByte/s)
reading 32768 bytes...32768 bytes done in  0.003s (12061.822kByte/s)
reading 65536 bytes...65536 bytes done in  0.006s (10555.835kByte/s)
reading 131072 bytes...131072 bytes done in  0.011s (11769.038kByte/s)
reading 262144 bytes...262144 bytes done in  0.021s (12391.100kByte/s)
reading 524288 bytes...524288 bytes done in  0.040s (12781.792kByte/s)
reading 1048576 bytes...1048576 bytes done in  0.079s (12970.898kByte/s)
reading 2097152 bytes...2097152 bytes done in  0.157s (13033.883kByte/s)
reading 4194304 bytes...4194304 bytes done in  0.312s (13107.332kByte/s)
reading 8388608 bytes...8388608 bytes done in  0.624s (13121.314kByte/s)
reading 16777216 bytes...16777216 bytes done in  1.248s (13131.705kByte/s)


USB write test...
writing 1 bytes...1 bytes done in  0.001s ( 1.856kByte/s)
writing 2 bytes...2 bytes done in  0.001s ( 3.727kByte/s)
writing 4 bytes...4 bytes done in  0.001s ( 7.440kByte/s)
writing 8 bytes...8 bytes done in  0.001s (14.880kByte/s)
writing 16 bytes...16 bytes done in  0.001s (30.339kByte/s)
writing 32 bytes...32 bytes done in  0.001s (59.523kByte/s)
writing 64 bytes...64 bytes done in  0.001s (119.047kByte/s)
writing 128 bytes...128 bytes done in  0.001s (237.642kByte/s)
writing 256 bytes...256 bytes done in  0.001s (472.590kByte/s)
writing 512 bytes...512 bytes done in  0.001s (452.898kByte/s)
writing 1024 bytes...1024 bytes done in  0.001s (903.342kByte/s)
writing 2048 bytes...2048 bytes done in  0.001s (1792.115kByte/s)
writing 4096 bytes...4096 bytes done in  0.001s (3246.754kByte/s)
writing 8192 bytes...8192 bytes done in  0.001s (5449.594kByte/s)
writing 16384 bytes...16384 bytes done in  0.004s (4539.009kByte/s)
writing 32768 bytes...32768 bytes done in  0.004s (7191.014kByte/s)
writing 65536 bytes...65536 bytes done in  0.015s (4258.152kByte/s)
writing 131072 bytes...131072 bytes done in  0.016s (8192.004kByte/s)
writing 262144 bytes...262144 bytes done in  0.030s (8404.745kByte/s)
writing 524288 bytes...524288 bytes done in  0.060s (8464.493kByte/s)
writing 1048576 bytes...1048576 bytes done in  0.120s (8547.369kByte/s)
writing 2097152 bytes...2097152 bytes done in  0.241s (8502.092kByte/s)
writing 4194304 bytes...4194304 bytes done in  0.509s (8052.676kByte/s)
writing 8388608 bytes...8388608 bytes done in  0.995s (8235.363kByte/s)
writing 16777216 bytes...16777216 bytes done in  1.991s (8229.559kByte/s)

SD write test...
writing 1 bytes...1 bytes done in  0.001s ( 1.547kByte/s)
writing 2 bytes...2 bytes done in  0.001s ( 3.066kByte/s)
writing 4 bytes...4 bytes done in  0.001s ( 6.132kByte/s)
writing 8 bytes...8 bytes done in  0.001s (12.131kByte/s)
writing 16 bytes...16 bytes done in  0.001s (24.801kByte/s)
writing 32 bytes...32 bytes done in  0.001s (48.981kByte/s)
writing 64 bytes...64 bytes done in  0.001s (97.809kByte/s)
writing 128 bytes...128 bytes done in  0.001s (193.798kByte/s)
writing 256 bytes...256 bytes done in  0.001s (393.700kByte/s)
writing 512 bytes...512 bytes done in  0.190s ( 2.633kByte/s)
writing 1024 bytes...1024 bytes done in  0.130s ( 7.706kByte/s)
writing 2048 bytes...2048 bytes done in  0.006s (357.653kByte/s)
writing 4096 bytes...4096 bytes done in  0.006s (699.423kByte/s)
writing 8192 bytes...8192 bytes done in  0.007s (1175.261kByte/s)
writing 16384 bytes...16384 bytes done in  0.014s (1158.497kByte/s)
writing 32768 bytes...32768 bytes done in  0.007s (4348.419kByte/s)
writing 65536 bytes...65536 bytes done in  0.013s (4863.963kByte/s)
writing 131072 bytes...131072 bytes done in  0.069s (1859.141kByte/s)
writing 262144 bytes...262144 bytes done in  0.047s (5484.737kByte/s)
writing 524288 bytes...524288 bytes done in  0.284s (1800.256kByte/s)
writing 1048576 bytes...1048576 bytes done in  0.187s (5467.634kByte/s)
writing 2097152 bytes...2097152 bytes done in  0.372s (5505.334kByte/s)
writing 4194304 bytes...4194304 bytes done in  0.999s (4101.966kByte/s)
writing 8388608 bytes...8388608 bytes done in  1.779s (4605.802kByte/s)
writing 16777216 bytes...16777216 bytes done in  3.090s (5302.262kByte/s)

These are single measurements, but the results were fairly constant when I repeated the test. The irregularities during writing are probably caused by wear leveling.
The speed varies a lot between different SD cards and USB drives, but the most important parameter is the block size: you need to write many sectors at once, otherwise it will be really slow. Modern flash devices with large write pages in particular need reads/writes of at least 4-32 kBytes at a time to get close to the maximum possible speed.
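To make the block-size effect concrete, here is a minimal sketch of a fixed-overhead model: each request costs a constant setup time plus the time needed to move the payload at the sustained rate. The function name and the model itself are assumptions for illustration, not something taken from the test code; 1 kByte is taken as 1024 bytes, which appears consistent with the figures in the log.

```c
#include <assert.h>

/* Hypothetical model: effective throughput in kByte/s (1 kByte = 1024 bytes)
 * for a single request of `bytes` bytes, assuming a fixed per-request
 * overhead and a constant sustained transfer rate. */
double effective_kbytes_per_s(double bytes, double overhead_s,
                              double sustained_bytes_per_s)
{
    assert(bytes > 0.0 && overhead_s >= 0.0 && sustained_bytes_per_s > 0.0);

    /* Total time = fixed cost of issuing the request + payload time. */
    double seconds = overhead_s + bytes / sustained_bytes_per_s;

    return bytes / seconds / 1024.0;
}
```

Calling it with an assumed overhead of 0.8 ms and an assumed sustained rate of 13.4e6 bytes/s, first for 512 bytes and then for 32768 bytes, shows how small requests are dominated by the fixed cost, while large ones approach the sustained rate.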
I haven't used the recent versions of Microchip's MSD code, but the older versions didn't make efficient use of the multiple-sector read commands.
Another catch when writing data to USB drives: you need to send the SYNCHRONIZE CACHE command before unplugging, otherwise the most recently written data may be lost, because it could still be sitting in the write buffer of the thumb drive. Neither the code provided by Microchip nor the code from ST implemented this correctly.
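As a rough illustration of what has to go over the wire, the sketch below builds a Bulk-Only Transport Command Block Wrapper carrying a SCSI SYNCHRONIZE CACHE (10) command (opcode 0x35) with no data phase. The function name is hypothetical, and how the 31 bytes are actually sent depends on the host stack in use; this is not the Microchip or ST API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_SIZE                  31u   /* Bulk-Only Transport CBW length */
#define SCSI_SYNCHRONIZE_CACHE_10 0x35u /* SCSI opcode */

/* Hypothetical helper: fill `cbw` with a Command Block Wrapper that asks the
 * device to flush its write cache. LBA = 0 and block count = 0 request a
 * flush of the whole medium. */
void build_sync_cache_cbw(uint8_t cbw[CBW_SIZE], uint32_t tag, uint8_t lun)
{
    assert(cbw != NULL && lun <= 15u);

    memset(cbw, 0, CBW_SIZE);

    /* dCBWSignature 0x43425355, little-endian ("USBC") */
    cbw[0] = 0x55u;
    cbw[1] = 0x53u;
    cbw[2] = 0x42u;
    cbw[3] = 0x43u;

    /* dCBWTag, little-endian; echoed back by the device in the CSW */
    cbw[4] = (uint8_t)(tag & 0xFFu);
    cbw[5] = (uint8_t)((tag >> 8) & 0xFFu);
    cbw[6] = (uint8_t)((tag >> 16) & 0xFFu);
    cbw[7] = (uint8_t)((tag >> 24) & 0xFFu);

    /* Bytes 8..11 dCBWDataTransferLength = 0 and byte 12 bmCBWFlags = 0:
     * the command has no data phase (already zeroed by memset). */

    cbw[13] = lun;                        /* bCBWLUN */
    cbw[14] = 10u;                        /* bCBWCBLength: 10-byte CDB */
    cbw[15] = SCSI_SYNCHRONIZE_CACHE_10;  /* CDB byte 0: opcode */
    /* Remaining CDB bytes stay zero. */
}
```

After sending the wrapper on the bulk OUT endpoint, the host still has to read the Command Status Wrapper and check its status before telling the user the drive can be removed.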
If you need to write a continuous stream of data to flash devices, don't use the newest, largest devices; go for older, smaller ones. I had many issues with 8 and 16GB SD cards. They often paused writing for a couple of seconds (!) to do wear leveling. 1-4GB cards seem to be the best choice for embedded systems with a limited amount of memory for the write buffer.
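The reason these pauses hurt is simple arithmetic: while the card is stalled, the incoming stream has to be held in RAM. A minimal sketch of that calculation follows; the function name and the example numbers are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of buffer needed to absorb a write stall of
 * `stall_ms` milliseconds at a constant stream rate, rounded up. */
uint64_t stall_buffer_bytes(uint32_t stream_bytes_per_s, uint32_t stall_ms)
{
    assert(stream_bytes_per_s > 0u);

    /* 64-bit product so that large rates and long stalls cannot overflow. */
    uint64_t product = (uint64_t)stream_bytes_per_s * stall_ms;

    return (product + 999u) / 1000u;
}
```

For instance, an assumed stream of 100000 bytes/s and an assumed stall of 2000 ms give `stall_buffer_bytes(100000, 2000) == 200000`, which is already more RAM than many small microcontrollers can spare.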

The separately selectable data width for DMA is a nice feature in STM32 for transferring unaligned data packets to a 32-bit target. But otherwise I prefer PIC24/PIC32, because their DMA is much easier to understand yet still very powerful thanks to the many trigger sources.
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5320
  • Country: gb
Re: Which DMA microcontroller engines you like the most?
« Reply #27 on: January 17, 2017, 08:18:44 am »
That's odd - write speed will probably be affected more by variations in memory devices, but I'm usually only interested in reads. That speed looks a bit disappointing, but if it's anything like the library for the older versions, there's probably a lot of scope for optimisation.
The old SD card code was doing stupid stuff like checking for end of cluster on every single byte transferred, and not using the SPI in 32-bit mode during sector reads, but it was easy to optimise without needing to understand much of their code. USB may take a bit more work, however.

FWIW, when I was fiddling about with the device code (my application is a streaming device, so I needed to optimise device-to-host performance), I found you could increase the speed fairly significantly by making the buffer quite large, much larger than the 4096 bytes I used in this example.

When I've used the LPC4370 as a streaming device in the past (204MHz Cortex M4F), their USB device library managed single endpoint speeds of ~250Mbps, and I could get it up to about 320Mbps by bonding endpoints.
 

Offline Carl47D

  • Newbie
  • Posts: 9
Re: Which DMA microcontroller engines you like the most?
« Reply #28 on: January 17, 2017, 05:31:07 pm »
I particularly like the DMA of the PSoC 5LP. They even make it easy to set up with a DMA wizard. Another neat thing is that the DMA component has status pins, which makes it possible for the programmable logic to trigger DMA reads and writes completely outside of the software. (This allows you to do cool stuff like, say, move data from an I2C sensor to SPI flash when the sensor pulls an IRQ line, all with basically no CPU intervention.)
Second that. Being able to trigger DMA with any signal you can come up with in hardware (even being able to create new hardware signals in custom logic), and then getting a DMA transfer signal in hardware which you can use to drive other things in hardware, is really powerful. And IMHO the API for it is easy to understand, and Cypress has some really good application notes. They even explain the tricky things, such as self-modifying DMA: having one DMA channel modify another DMA channel's registers is quite powerful, e.g. to be able to use multiple target buffers without any software intervention.

I like the PSoC4 DMA API more than the 5LP's, and DMA on the PSoC4 works in sleep mode IIRC.
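For readers who haven't seen the self-modifying DMA trick, here is a small host-side simulation of the idea in plain C. It is only a software model under stated assumptions: the structs stand in for hardware registers, every name is hypothetical, and none of this is the Cypress API. A "reload" channel rewrites the destination register of a "data" channel before each block, so successive blocks land in different target buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software model only: stands in for the data channel's registers. */
typedef struct {
    uint8_t *dst;    /* destination address "register" */
    size_t   count;  /* bytes moved per trigger */
} data_channel_t;

/* Stands in for a second channel that copies the next table entry
 * into the data channel's destination register. */
typedef struct {
    uint8_t *const *targets;   /* table of destination buffers */
    size_t          n_targets;
    size_t          next;      /* index of the next entry to load */
} reload_channel_t;

/* One triggered transfer: copy a block to the current destination. */
static void data_channel_fire(const data_channel_t *ch, const uint8_t *src)
{
    memcpy(ch->dst, src, ch->count);
}

/* Chained transfer: overwrite the data channel's destination register
 * and advance (with wrap-around) through the buffer table. */
static void reload_channel_fire(reload_channel_t *rl, data_channel_t *ch)
{
    ch->dst = rl->targets[rl->next];
    rl->next = (rl->next + 1u) % rl->n_targets;
}

/* Move n_blocks blocks from src, rotating through the target buffers.
 * In real hardware the loop body would run without the CPU. */
void run_chained_transfers(data_channel_t *ch, reload_channel_t *rl,
                           const uint8_t *src, size_t n_blocks)
{
    assert(ch != NULL && rl != NULL && src != NULL && rl->n_targets > 0u);

    for (size_t i = 0; i < n_blocks; i++) {
        reload_channel_fire(rl, ch);
        data_channel_fire(ch, src + i * ch->count);
    }
}
```

With two 4-byte target buffers and an 8-byte source, two blocks end up split across the two buffers, which is the ping-pong arrangement usually wanted for streaming.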
 

Offline Lukas

  • Frequent Contributor
  • **
  • Posts: 412
  • Country: de
    • carrotIndustries.net
Re: Which DMA microcontroller engines you like the most?
« Reply #29 on: January 17, 2017, 06:18:48 pm »
The ultimate question: Can you build a Turing machine using just the DMA peripheral DMA'ing into its own control registers?
 

Offline rea5245

  • Frequent Contributor
  • **
  • Posts: 581
  • Country: us
Re: Which DMA microcontroller engines you like the most?
« Reply #30 on: August 29, 2020, 07:58:59 pm »
The ultimate question: Can you build a Turing machine using just the DMA peripheral DMA'ing into its own control registers?

These guys claim that the PIC32's DMA system is Turing-complete: http://people.ece.cornell.edu/land/courses/ece4760/PIC32/index_DMA_weird_machine.html
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14513
  • Country: fr
Re: Which DMA microcontroller engines you like the most?
« Reply #31 on: August 31, 2020, 08:07:38 pm »
I second the greatness of the PIC32 DMA.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2227
  • Country: 00
Re: Which DMA microcontroller engines you like the most?
« Reply #32 on: September 01, 2020, 06:08:43 am »
The problem is the tools. They use ancient GCC compilers (XC16/32) that are forked and aren't very well optimized.
(As opposed to sending their modifications upstream and using newer versions.)
 

Online coppice

  • Super Contributor
  • ***
  • Posts: 8684
  • Country: gb
Re: Which DMA microcontroller engines you like the most?
« Reply #33 on: September 01, 2020, 09:25:25 am »
The problem is the tools. They use ancient GCC compilers (XC16/32) that are forked and aren't very well optimized.
(As opposed to sending their modifications upstream and using newer versions.)
Recent versions of GCC work well for complex cores, like x86-64 ones. However, GCC 10 often produces slower code than GCC 3.2 (a high spot for its simple core performance) with simpler cores. Maybe they have a good reason to stick with ancient versions of GCC.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2227
  • Country: 00
Re: Which DMA microcontroller engines you like the most?
« Reply #34 on: September 01, 2020, 10:09:04 am »
The problem is the tools. They use ancient GCC compilers (XC16/32) that are forked and aren't very well optimized.
(As opposed to sending their modifications upstream and using newer versions.)
Recent versions of GCC work well for complex cores, like x86-64 ones. However, GCC 10 often produces slower code than GCC 3.2 (a high spot for its simple core performance) with simpler cores. Maybe they have a good reason to stick with ancient versions of GCC.

Yes, the reason is, they (Microchip) crippled GCC (specifically the optimization settings) and they charge you for the non-crippled version.
 

