Hello, I'd like some comments on my questions about managing complexity and cost in this age of multi-core ARM and multi-core DSPs. I have co-workers who have spent months working on the software for one of today's most powerful multi-core ARM+DSP processors. They tell me of the problems they have getting things working, the lack of documentation, and their frustration when the processor vendor relies on third-party tools to configure the device because there are thousands of registers that need to be set up. So we can get everything done on one chip, but at an extremely high cost, and with the risk that only a few engineers can keep the device running. I almost jumped on board and started selecting one of these devices, but stopped short at the increased life-cycle and support costs that I would pass on to the customer. My conclusion is to keep things manageable by using two or more simpler devices and linking them together with their built-in high-speed links.
Has anyone else come across this, or thought about the increased cost of using these kinds of devices?
Thanks,
Joe
I assume it's made by TI. Who else is pushing heavy-duty DSP + ARM these days?
The tool chain is more important than the processor, when it comes to minimising complexity.
You need a tool chain that is optimised from the ground up for parallel processing and DSP. Have a look at the XMOS xCORE devices running xC and the xTIME IDE; they are the culmination of 30 years of improving toolsets for parallel processors. A principal commercial use for them is DSP, particularly for audio.
Up to 32 cores and 4000 MIPS per chip (and chips can be chained together), hard realtime.
See https://www.digikey.co.uk/en/supplier-centers/x/xmos
xCORE architecture https://www.digikey.co.uk/en/pdf/x/xmos/xcore-architecture
xC parallel programming https://www.xmos.com/support/tools/programming?component=17653
xTIMEComposer IDE: https://www.xmos.com/support/tools
Quote: "I assume it's made by TI. Who else is pushing heavy-duty DSP + ARM these days?"
What about the top two smartphone chip makers, Qualcomm and MediaTek? And Nvidia as well, if you count GPGPU as DSP.
I assumed the OP was referring to an actual DSP, as opposed to a fast processor capable of doing DSP. I am not aware of DSPs in modern smartphones, except for maybe a highly specialized core here and there, e.g. in the audio subsystem.
In the Snapdragon S4 (MSM8960 and newer) there are three QDSP cores: two in the Modem subsystem and one Hexagon core in the Multimedia subsystem. The Modem cores are programmed by Qualcomm only, and only the Multimedia core is open for users to program.
Quote: "The tool chain is more important than the processor, when it comes to minimising complexity. ... Have a look at the XMOS xCORE devices running xC and the xTIME IDE; they are the culmination of 30 years of improving toolsets for parallel processors."
I think you're right that tools are very important. Having worked with XMOS MCUs for a few years now, I can say that the tools are very good, but only when they work. There are so many ways to break them, especially the compiler, that I don't think I'd ever recommend them for professional use outside of projects already built on them, or audio projects based heavily on XMOS's own code. I personally have reported 3 or 4 compiler bugs they didn't know about. They did fix them, but it took months; in the meantime it was on me to find and implement a workaround. When you get vague errors like "compiler error, exiting..." it can be tough to find out what the problem is, and it takes a lot of time. The best way to make an expensive project more expensive is to draw it out with bugs that aren't even yours.
However, I do like their hardware, and it's fairly unique. There are a lot of drawbacks though, and depending on the application they can be total showstoppers.
Quote: "The tool chain is more important than the processor, when it comes to minimising complexity. ... Have a look at the XMOS xCORE devices running xC and the xTIME IDE; they are the culmination of 30 years of improving toolsets for parallel processors."
Quote: "Having worked with XMOS mcus for a few years now I can say that the tools are very good but only when they work. ... There are alot of drawbacks though, and depending on the application they can be total showstoppers."
Interesting. How long ago was that? What caused the problems to manifest themselves?
The reason I ask is that I've "kicked the tyres" with them on a small project, and I was amazed at how well the hw+sw worked. I had zero problems apart from where I misused the facilities. As far as I could tell they very simply and transparently "just did what it says on the tin" - unlike most other toolchains!
Before anybody makes a strawman argument: yes, of course they aren't the solution to all problems. But they do have a unique set of advantages, can move into FPGA territory, and give the lie both to the presumed need for an RTOS and to the presumption that high-performance software must mean latency and unpredictability.
Quote: "The tool chain is more important than the processor, when it comes to minimising complexity. ... Have a look at the XMOS xCORE devices running xC and the xTIME IDE; they are the culmination of 30 years of improving toolsets for parallel processors."
Quote: "Having worked with XMOS mcus for a few years now I can say that the tools are very good but only when they work. ... There are alot of drawbacks though, and depending on the application they can be total showstoppers."
Quote: "Interesting. How long ago was that? What caused the problems to manifest themselves? ... I had zero problems apart from where I misused the facilities."
The projects were robotics-focused; I started using their MCUs in 2011, I think. After resolving most of the issues we stopped updating the tools, after trying new versions and getting all-new issues, probably a year or two ago. I'm using an XLF-216-512 now for another project. The most recent issue was the compiler being unable to determine the stack size for a function, which then had to be specified explicitly. At least that gave me data I could use, although I had to dig through some build files to find where the function was from.
This is no strawman; I do like the hardware, it's the software that I think needs work. On the other hand, when I requested they add some functions to their DSP library, they did, specifically henk. The I2C library was busted and I had to write it myself when I started; they eventually fixed that too.
I think they still have an issue with [[notification]]: when placed in an order the compiler apparently doesn't like, it simply doesn't function (this is one I first reported). We had an issue, which still remains, where for some reason two tasks were unable to communicate normally. One would actually become unresponsive for about 4 hours, then respond normally, wait 4 hours, and respond again. The response I got was that this wasn't possible, even if it were a timer issue, because the timers loop every 41 seconds. They were unable to help with the issue, and it required creating another thread just to facilitate data transfers back and forth. I never sent in a report on that, because I didn't have time then to narrow down the cause, and adapted the project to use the extra task.
There was a stretch of about two years where it would fail to define functions, and you had to do it yourself in assembly or it would fail to compile and give you a page of error messages. They fixed this after 14.2.4, I think, but none of our projects would compile on later versions without more work. They recommend -Os for everything because the dual-issue optimizations are poor at best.
You may have had no issues, and that's great, but they do have issues, despite the fact that a lot of their work goes back to the transputer days. I think they have a lot of talented people, but unless you're specifically doing an audio-based project, where you can reuse a lot of the code they take care to get right, I don't think the cost is worth it to start a large or serious project. In time or in money.
The software is a recurring cost: debugging, maintenance, and upgrades. Unending costs...
Hmph. The replies so far are disappointing, ranging from "well duh!" to "your platform sucks, use mine instead." Speaking of which, I see no reason why you wouldn't call out the specific part in question. I assume it's made by TI. Who else is pushing heavy-duty DSP + ARM these days?
To some extent, the "well duh!" people are correct: there is a certain unavoidable complexity that comes with a part, and a project, such as you describe. The real issue is unnecessary complexity, and most complex projects have that in spades, especially when vendor-proprietary tools and/or source code are involved.
Unnecessary complexity is a paradox: avoiding it requires a LOT of effort, often eschewing the "easy" tools and code provided by the vendor.
Quote: "The projects were robotics focused, I started using their MCUs in 2011 I think. ... Most recent issues was the compiler unable to determine the stack size for a function and needed it specified explicitly."
I wonder what caused that.
Was the function (or anything it called) recursive? No compiler can determine the recursion depth without knowing the worst-case inputs.
Was the function's source code available?
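On the recursion point, the limitation is fundamental: a recursive function's stack depth depends on its runtime input, so no static analyser can bound it in general. A minimal C illustration (the function name is invented for this sketch):

```c
/* count_down's stack usage is roughly (n + 1) frames, one per call.
 * A static analyser cannot bound this without knowing the worst-case
 * runtime value of n, which is exactly why such tools give up and
 * ask for an explicit stack size. */
static int count_down(int n)
{
    if (n <= 0)
        return 0;                      /* deepest frame */
    return 1 + count_down(n - 1);      /* one extra frame per level */
}
```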
Quote: "This is no strawman, I do like the hardware it's the software I think needs work. ... The I2C library was busted and I had to write it myself when I started, and they eventually fixed that."
Don't care; crap libraries are omnipresent. Improving tools cannot help; you have to improve the developer.
Quote: "I think they still have an issue with [[notification]] ... It would actually become unresponsive for about 4 hours then respond normally, wait 4 hours and respond again. ... it required creating another thread just to facilitate data transfers back and forth."
Strange and concerning.
It is entirely possible (and easy) to get livelock in multiprocess systems.
Were the source and receiver processes on the same or different tiles?
Were you using the synchronous or asynchronous comms mechanisms?
Could asynchronous source and receiver processes be operating at the same loop interval, and at exactly the wrong phase, so that there was a memory clash of some sort?
Was it dependent on optimisation level?
Could anything "odd" be seen in the machine code instructions related to interprocessor comms?
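Livelock is worth spelling out, since it differs from deadlock: both sides keep executing, but neither makes progress. A deterministic toy model in C (all names are invented; this is not xC and not the actual comms mechanism):

```c
/* Two over-polite tasks each back off whenever the other also wants
 * the channel.  Both stay "live" (the loop keeps running) but neither
 * ever transfers data -- the signature of livelock, as opposed to
 * deadlock where both sides are blocked. */
#define MAX_STEPS 1000

static int run_polite_pair(void)
{
    int a_wants = 1, b_wants = 1;      /* both want the channel */
    int transfers = 0;

    for (int step = 0; step < MAX_STEPS; step++) {
        if (a_wants && b_wants) {
            /* each defers to the other: state never changes */
            continue;
        }
        transfers++;                   /* one side would send here */
    }
    return transfers;                  /* 0 => livelocked throughout */
}
```

In a real system the "defer" branch would be a randomised or asymmetric back-off, which is precisely how livelocks are usually broken.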
Quote: "There was about 2 years time where it would fail to define functions ... They recommend -Os for everything because the dual issue optimizations are poor at best."
My experience is with 14.3.0 and the single issue processors, so I wouldn't have seen that.
There are so many problems with C optimisations for any processor/standard/compiler that I'm not surprised about there being problems with xC - since it is built using traditional C compilers.
Quote: "You may have had no issues and that's great but they do have issues ... I don't think the cost is worth it to start a large or serious project. Time or money."
That's always a key decision!
Thanks for your points.
Quote: "The projects were robotics focused, I started using their MCUs in 2011 I think. ... Most recent issues was the compiler unable to determine the stack size for a function and needed it specified explicitly."
Quote: "I wonder what caused that. Was the function (or anything it called) recursive? ... Was the functions's source code available?"
Yes, the source code was available; it was written by us. It wasn't recursive either. I suspect it had to do with some C/xC interfacing issue, but it was easier to throw stack space at it, since there aren't going to be any issues with that. No matter what we do, the analyzer tools completely fail on these sections of code: they eat any memory we give them and then hit the heap limit. Their version of Eclipse fails to start beyond a 2GB heap.
Quote: "This is no strawman, I do like the hardware it's the software I think needs work. ... The I2C library was busted and I had to write it myself when I started, and they eventually fixed that."
Quote: "Don't care; crap libraries are omnipresent. Improving tools cannot help; you have to improve the developer."
Regardless of whether you care, it is something they tout, seeing as there is no hardware support for common peripherals.
Quote: "I think they still have an issue with [[notification]] ... It would actually become unresponsive for about 4 hours then respond normally, wait 4 hours and respond again. ... it required creating another thread just to facilitate data transfers back and forth."
Quote: "Strange and concerning. It is entirely possible (and easy) to get livelock in multiprocess systems. Were the source and receiver processes on the same or different tiles? Were you using the synchronous or asynchronous comms mechanisms? ... Was it dependent on optimisation level?"
This was an issue with synchronous communication across tiles using interfaces, the same as it is now but without the extra thread. I went back and looked, and I had it slightly wrong: it was 4 hours to respond, and then it'd work normally. As usual, the data was passed by reference and memcpy'd, since the compiler optimizes memcpy. This is now, and always has been, the proper way to do it.
It was totally independent of optimization level. Trying to debug it, we were spitting data out over a couple of UARTs that we monitored and time-stamped. We got the message that one thread was making the call, and 4 hours later the confirmation from the other thread. Neither thread was executing ANY code during that time. There were no other tasks trying to communicate with the two threads (during debugging we had it cut down to just those two at one point), and the thread that took a sixth of a day to respond was busy in roughly 800 ns chunks every 10 ms. That's it. The assembly didn't look abnormal, and there were no warnings with -Wall either. I suspect it was an issue with the scheduler, but it was impossible at the time to verify. Unfortunately we don't get paid to find their bugs; we could have made some money charging by the hour.
Quote: "My experience is with 14.3.0 and the single issue processors, so I wouldn't have seen that. There are so many problems with C optimisations for any processor/standard/compiler that I'm not surprised about there being problems with xC - since it is built using traditional C compilers."
Except that in all their materials the numbers are inflated, since you essentially only get dual issue when hand-writing assembly; compiled C almost never takes advantage of it. So 4000 MIPS is really 2000 MIPS, and that's significant. The XS1 parts had a slight advantage here for lower-core-count designs, since they could execute instructions at up to 125 MHz per "thread"; the XS2 parts are limited to 100 MHz max.
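The arithmetic behind that claim can be made explicit. Under the poster's assumption that compiled code fills only one of the two issue slots, the headline figure scales down by the unused issue width (a sketch; the function name is invented, and the numbers are the thread's, not vendor-verified):

```c
/* If only `slots_used` of the `slots_rated` issue slots are filled
 * per cycle, the delivered instruction rate is the headline figure
 * scaled by slots_used/slots_rated.  E.g. a dual-issue "4000 MIPS"
 * part running mostly single-issue compiled code delivers about
 * half its headline rate. */
static int effective_mips(int headline_mips, int slots_used, int slots_rated)
{
    return headline_mips * slots_used / slots_rated;
}
```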
Quote: "That's always a key decision! Thanks for your points."
This wasn't even a large project: ~30k lines of code. These MCUs are really neat, somewhat unique, and relatively unknown, but it's important not to look too deep into the marketing materials. We have no plans to stop using them, but you can really find yourself in trouble if you are waiting for them to fix their tools while you're on a deadline, without enough experience to know what to look at. That said, not everyone will have issues, just as you had none. We've had projects with no real issues, but we've also had designs that had to be totally redesigned because of tool issues, and that costs A LOT of time.
You can't memcpy across tile boundaries, since tiles have separate memories. memcpy only works between cores on the same tile.
memcpy is optimized for the xCORE MCUs: http://www.xcore.com/viewtopic.php?t=5585
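The tile restriction follows from the memory model: memcpy is just loads and stores within one address space, and each tile has its own private RAM, so crossing tiles requires an explicit transfer over a link. A C model of the distinction (the type and function names are invented for illustration; on real xCORE parts the cross-tile transfer would go over xC channels or interfaces):

```c
#include <string.h>

/* Each tile has private RAM: a plain memcpy can only move bytes
 * within one tile's address space. */
typedef struct {
    unsigned char ram[64];             /* this tile's private memory */
} tile_mem;

/* Within a tile: an ordinary memcpy over one shared address space. */
static void copy_within_tile(tile_mem *t, int dst, int src, int n)
{
    memcpy(&t->ram[dst], &t->ram[src], (size_t)n);
}

/* Across tiles: no shared address space exists, so the bytes must
 * travel over a link; the loop below stands in for channel I/O. */
static void copy_between_tiles(tile_mem *dst, const tile_mem *src, int n)
{
    for (int i = 0; i < n; i++)
        dst->ram[i] = src->ram[i];
}
```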
Anyway, I just wanted to mention these things because, unlike PIC, AVR, ARM, or PSoC, there aren't many people really familiar with the xCORE MCUs, and even fewer outside of audio projects.