Topic: How are math.h library functions like sqrt, asinf, tanf implemented in CoIDE? (Read 17814 times)



westfw

  • Super Contributor
  • Posts: 4199
  • Country: us
Quote
FreeBSD math library
It looks to me like the newlib-nano libraries and the FreeBSD libraries are essentially the same old Sun code.
 

kody (Topic starter)

  • Contributor
  • Posts: 34
  • Country: ca
@dannyf - could you please expand on how reading the disassembly listing and counting the instruction cycles can be done?

Quote
Is it not possible if we only have the compiler?

Yes, by looking at the disassembly listing and counting up the instruction cycles.

Quote
I mean, without the actual hardware/microcontroller.

A far better approach is to get the actual hardware.
 

tggzzz

  • Super Contributor
  • Posts: 19507
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Quote
@dannyf - could you please expand on how reading the disassembly listing and counting the instruction cycles can be done?

Especially if the processor has a cache :(
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

andersm

  • Super Contributor
  • Posts: 1198
  • Country: fi
There are several options:
- Use a high-frequency timer/counter. This is the simplest option and works on any chip; on many MCUs a timer can tick at the CPU frequency or f/2.
- Use tracing. Both the MCU and your development tools have to support this.
- Use performance counters. Many MCUs have these nowadays, and they can count instruction cycles, memory access cycles and many other events.

The ARMv7-M DWT (Data Watchpoint and Trace unit) has both a cycle counter and performance counters that can be read by software running on the MCU. Although it is optional, I would expect all Cortex-M4 MCUs to have it.
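A minimal sketch of the DWT approach, assuming an ARMv7-M part (e.g. a Cortex-M4) that actually implements the optional cycle counter. The register addresses and bit positions (DEMCR.TRCENA, DWT_CTRL.CYCCNTENA, DWT_CYCCNT) are from the ARMv7-M architecture; the macro and function names are hypothetical. If your project includes the CMSIS core header, the same registers are also available as CoreDebug->DEMCR, DWT->CTRL and DWT->CYCCNT.

```c
#include <stdint.h>

/* ARMv7-M debug/trace registers (addresses from the architecture manual).
 * The uintptr_t cast keeps the integer-to-pointer conversion clean. */
#define DEMCR               (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DEMCR_TRCENA        (1u << 24) /* enables the DWT and ITM units */
#define DWT_CTRL            (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CTRL_CYCCNTENA  (1u << 0)  /* enables the cycle counter */
#define DWT_CYCCNT          (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

/* Target only: start the free-running core clock cycle counter. */
static inline void dwt_cycle_counter_init(void)
{
    DEMCR |= DEMCR_TRCENA;          /* power up the trace/DWT block first */
    DWT_CYCCNT = 0;                 /* reset the count */
    DWT_CTRL |= DWT_CTRL_CYCCNTENA; /* start counting CPU cycles */
}

/* Target only: current cycle count (wraps every 2^32 cycles). */
static inline uint32_t dwt_cycles(void)
{
    return DWT_CYCCNT;
}

/* Cycles between two readings. Unsigned 32-bit subtraction stays correct
 * across one wrap of the counter, so no special case is needed. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the target, call dwt_cycle_counter_init() once, then read dwt_cycles() immediately before and after the code being measured (for example a tanf() call) and pass both readings to cycles_elapsed(). The reads themselves cost a few cycles, so it is worth timing an empty region once and subtracting that overhead. Some cores (Cortex-M7 is the usual example) may also require the DWT lock access register to be unlocked first.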

0xdeadbeef

  • Super Contributor
  • Posts: 1576
  • Country: de
Quote
Especially if the processor has a cache :(
Obviously, a real measurement will always differ from the estimate due to interrupts, pipeline effects, branch prediction, cache, wait states, DMA blocking the bus or RAM, etc.
Still, looking at the code makes it possible to estimate the number of cycles needed more accurately. As stated above, even looking at the C code for a "fast" tangent implementation lets you say it will need more than 100 cycles. With the actual source code the prediction will be better, and with the assembly code it can be quite accurate, as long as you ignore the complex runtime effects discussed above.
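To illustrate this kind of estimate, here is a hypothetical "fast" single-precision tangent. fast_tanf is an invented name, and this is not newlib's tanf, which derives from the Sun code mentioned earlier in the thread and handles huge arguments and special values far more carefully. The sketch assumes a common structure: range reduction to |r| <= pi/4, truncated Taylor polynomials for sin and cos, and one division. It is only intended for moderate arguments (roughly |x| < 200). Counting operations in the C alone already gives around two dozen floating-point multiplies and adds, a float-to-int conversion and a division, before call overhead, argument checks or the more elaborate reduction a real library needs.

```c
/* Hypothetical fast tangent, for illustration only (not a library tanf).
 * Valid for roughly |x| < 200; no handling of NaN/Inf or huge arguments. */
float fast_tanf(float x)
{
    const float TWO_OVER_PI = 0.636619772f;
    /* pi/2 split in two (Cody-Waite style): PIO2_HI has only 17 significant
     * bits, so kf * PIO2_HI is exact for |k| < 128; PIO2_LO is the rest. */
    const float PIO2_HI = 1.5707855225e+00f;
    const float PIO2_LO = 1.0804334124e-05f;

    /* 1. Range reduction: x = k*(pi/2) + r with |r| <= pi/4 (k rounded). */
    int k = (int)(x * TWO_OVER_PI + (x >= 0.0f ? 0.5f : -0.5f));
    float kf = (float)k;
    float r = (x - kf * PIO2_HI) - kf * PIO2_LO;
    float r2 = r * r;

    /* 2. Truncated Taylor series for sin(r) and cos(r), in Horner form. */
    float s = r * (1.0f + r2 * (-1.0f / 6 + r2 * (1.0f / 120
              + r2 * (-1.0f / 5040 + r2 * (1.0f / 362880)))));
    float c = 1.0f + r2 * (-0.5f + r2 * (1.0f / 24 + r2 * (-1.0f / 720
              + r2 * (1.0f / 40320 + r2 * (-1.0f / 3628800)))));

    /* 3. Reconstruction: tan(r + k*pi/2) is tan(r) for even k and
     *    -cot(r) for odd k. Either way it costs one division. */
    return (k % 2 != 0) ? -c / s : s / c;
}
```

Splitting pi/2 this way keeps most of the precision of r without falling back to double arithmetic, which matters on a single-precision FPU like the Cortex-M4F's.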
Trying is the first step towards failure - Homer J. Simpson
 

tggzzz

  • Super Contributor
  • Posts: 19507
  • Country: gb
Quote
Especially if the processor has a cache :(
Obviously, a real measurement will always differ from the estimate due to interrupts, pipeline effects, branch prediction, cache, wait states, DMA blocking the bus or RAM, etc.
Still, looking at the code makes it possible to estimate the number of cycles needed more accurately. As stated above, even looking at the C code for a "fast" tangent implementation lets you say it will need more than 100 cycles. With the actual source code the prediction will be better, and with the assembly code it can be quite accurate, as long as you ignore the complex runtime effects discussed above.

Even 20 years ago, on an i486 with its tiny cache and doing nothing else, there was a measured 10:1 difference between mean and maximum execution times. Modern processors have much faster clocks, but DRAM latency hasn't changed much. Processors have much bigger caches and are more dependent on them to reduce the average memory latency. Naturally, caches cannot reduce the maximum latency.

Hence the maximum:mean ratio has increased significantly and predicted execution times are even less valid than before.

Remember the truism "cache is the new RAM, RAM is the new disk"
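Because the worst case is what a single predicted figure misses, a useful habit is to time many runs and look at the spread rather than one number. A minimal sketch, assuming the samples (e.g. cycle counts from a counter like the DWT one) have already been captured into an array; timing_stats, summarize_timings and max_to_mean_ratio are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of n repeated timing samples (e.g. cycle counts). */
typedef struct {
    uint32_t first; /* first sample: often the cold-cache case */
    uint32_t min;
    uint32_t max;
    double mean;
} timing_stats;

/* Summarize n >= 1 samples. Run this after the measurement loop so the
 * bookkeeping itself is not part of what is being timed. */
timing_stats summarize_timings(const uint32_t *samples, size_t n)
{
    timing_stats st;
    uint64_t sum = 0; /* 64-bit so many large samples cannot overflow */

    st.first = st.min = st.max = samples[0];
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < st.min) st.min = samples[i];
        if (samples[i] > st.max) st.max = samples[i];
        sum += samples[i];
    }
    st.mean = (double)sum / (double)n;
    return st;
}

/* Worst case relative to the average (the post cites 10:1 on an i486). */
double max_to_mean_ratio(const timing_stats *st)
{
    return st->mean > 0.0 ? (double)st->max / st->mean : 0.0;
}
```

The first sample is kept separately because on a part with a flash cache it is often the call that pays for the cache misses. Even the observed maximum is only the worst case seen in that run, not a guaranteed bound.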
 

0xdeadbeef

  • Super Contributor
  • Posts: 1576
  • Country: de
The situation is a little different on microcontrollers. Some still don't have any cache at all, some have very simple implementations, and only the high-end controllers have complex ones.
The internal SRAM is usually not cached; cache is mainly needed to improve performance when running from flash. Note that fetching instructions from flash is a bottleneck for most faster microcontrollers. They usually use a burst read to fill a whole cache line, but if there are a lot of branches and/or poor branch prediction, a cache miss can be a big performance hit.
 

mikerj

  • Super Contributor
  • Posts: 3240
  • Country: gb
Quote
@dannyf - Is it not possible if we only have the compiler? I mean, without the actual hardware/microcontroller.

You could use a simulator if one is available, but I don't think CoIDE includes one. You could use the demo version of Keil etc. if your code fits within its size limitations.
 

tggzzz

  • Super Contributor
  • Posts: 19507
  • Country: gb
Quote
The situation is a little different on microcontrollers. Some still don't have any cache at all, some have very simple implementations, and only the high-end controllers have complex ones.
The internal SRAM is usually not cached; cache is mainly needed to improve performance when running from flash. Note that fetching instructions from flash is a bottleneck for most faster microcontrollers. They usually use a burst read to fill a whole cache line, but if there are a lot of branches and/or poor branch prediction, a cache miss can be a big performance hit.

Yes, as I implied in my first response.

OTOH, many microcontrollers have already surpassed the i486 in terms of cache. The microcontroller I am currently using, in a Zynq FPGA, is a dual-core Arm Cortex-A9, each core having 32K+32K I+D cache. (IIRC, the cheapest ARM costs <$1.) That trend will continue, although there will always be some MCUs that don't have or need cache.

More interestingly, some actively avoid cache due to its "poor" behaviour in hard realtime systems, e.g. the very small and cheap XMOS processors with 2-10 cores. http://www.digikey.co.uk/product-search/en/integrated-circuits-ics/embedded-microcontrollers/2556109?k=xmos

Those XMOS processors are the only ones I know of where the compiler/IDE guarantees the execution time. With all other processors, all bets are off.
 

gmb42

  • Frequent Contributor
  • Posts: 294
  • Country: gb
In a post here, Red Hat explain how they've improved the performance of some math functions in glibc. I suppose these changes will eventually filter down to newlib, newlib-nano etc.
 

