Also consider deadlock/livelock that isn't related to the messaging mechanism, but to the applications themselves. That is a user design error, but if different companies produce different parts of the application, then no one person can comprehend the entire system.
Yes; I skipped that in an effort to keep my responses from being so darn long. I know some people hate my walls of text. Sorry!
Deadlock and livelock situations inherently involve more than one lock, contended by the same parties (the code doing the locking). It is an interesting topic, and very important in practical parallel applications and services, but not directly related to asynchrony. "Grab", "try", "hold", and "release" are clear and easily understood concepts when dealing with mutexes; others are needed for rwlocks, semaphores, and condition variables. But, as a whole, nontrivial locking is a subject that can be handled separately.
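To make those mutex concepts concrete in Python terms (since Python comes up again below), here is a minimal sketch of "grab", "hold", "try", and "release" using threading.Lock; the structure is mine, not from any particular codebase:

```python
import threading

lock = threading.Lock()

# "Grab": block until the lock is ours, then "hold" it.
lock.acquire()
try:
    pass  # critical section: we hold the lock here
finally:
    lock.release()  # "release"

# "Try": attempt to grab without blocking; do something else on failure.
got_it = lock.acquire(blocking=False)
if got_it:
    try:
        pass  # we hold the lock
    finally:
        lock.release()
```

The try/finally shape matters: a "grab" that can leak on an exception is exactly how a process ends up holding a lock forever.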
For example, when writing multi-threaded Python code -- and you will want to if you use GTK+ or Qt and do any kind of communication or significant computation in the same process without freezing the user interface -- you can use Queue, a synchronized queue class for passing messages between threads. Now, the most common Python interpreters only execute one thread at a time, but that does not mean threads have no benefits in Python. Using blocking I/O with Python threads (the threading module, or GTK+ or Qt threads) makes a lot of sense, for both code simplicity and efficiency, as Python threads release the interpreter lock for the duration of the blocking call; essentially, such code works as if multiple Python threads did execute at the same time. For heavy computation, you'll want to use separate processes -- the multiprocessing module -- to make use of more than one core for concurrent computation. With this approach, your code may not need any explicit locks at all, and avoiding ordering issues is just a matter of not requiring a specific message order in the Queues. (Combining Arduino and Python to create a user interface for a USB-connected microcontroller is one of my long-term projects; I'd like to write a tutorial about it.)
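A minimal sketch of that pattern (the names and the trivial stand-in "work" are my own illustration): a worker thread blocks on a Queue, and while it waits, the interpreter lock is released so the other thread keeps running:

```python
import queue
import threading

jobs = queue.Queue()     # main thread -> worker
results = queue.Queue()  # worker -> main thread

def worker():
    while True:
        item = jobs.get()      # blocks; releases the GIL while waiting
        if item is None:       # sentinel: shut down cleanly
            break
        results.put(item.upper())  # stand-in for real blocking I/O or work

t = threading.Thread(target=worker, daemon=True)
t.start()

jobs.put("hello")
jobs.put(None)  # tell the worker to exit
t.join()
out = results.get()
print(out)  # → HELLO
```

Note there is not a single explicit lock in that code; the Queues do all the synchronization, which is exactly the point.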
I personally do not have experience in teaching how to avoid deadlock/livelock in multi-lock, multi-party systems. I'm stupid enough to distrust locking schemes that I don't understand, so I tend to rework the structure to avoid the problem entirely. I am aware of the locking issues in the Linux kernel, and how they evolved (from the time when there was just one Big Kernel Lock), but other than how to analyze such schemes with the help of tools like Graphviz graphs, and how to simplify such locking schemes, I don't know much. (I know nothing about current CS research about locking schemes, for example; and haven't used an FPGA yet.)
A dual-core (heterogeneous crossover) processor like the i.MX RT1170 is not something I'd consider so complex that locking would be particularly susceptible to deadlock/livelock, especially since the two cores' tasks would be rather separate. In particular, I believe most use cases would involve atomics instead; and that one of the cores would in most use cases never hold more than one lock at a time, which is one of the cases where lock analysis and avoiding deadlock/livelock is easy.
The only resolution (not solution) to that is liberal use of timeouts when a reply is required or a transmit queue might become full.
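In Python's Queue, for instance, both directions take a timeout directly; a sketch (the timeout values are arbitrary):

```python
import queue

q = queue.Queue(maxsize=1)
q.put("occupied")  # fill the queue so the next put must wait

# Sender side: don't block forever if the transmit queue stays full.
try:
    q.put("message", timeout=0.1)
except queue.Full:
    print("transmit queue full; log, drop, or recover")

reply = q.get()  # drain the slot

# Receiver side: don't wait forever for a reply that may never come.
try:
    q.get(timeout=0.1)
except queue.Empty:
    print("no reply within timeout; assume the peer is stuck")
```

The important part is that both exception handlers exist and do something sensible; a timeout that just crashes the program has not resolved anything.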
Well, you need those to detect unexpected problems, which can always occur in real life. I consider them complementary to a robust design; a catch clause for "but what if a situation occurs where these assumptions do not hold".
Did you really mean complementary? Dealing with the unexpected is at the very heart of creating robust designs.
I was referring to liberal timeouts, the ones that do not have an immediately obvious purpose. (Remember, the context is avoiding deadlock/livelock: things that should not happen if the system is working as designed.)
Timeouts, and checking whether a buffer has enough space, are an integral, low-level part of all robust designs, yes.
Even a robust design makes assumptions about the context, and about what should happen in what order. Typically, this is reflected in how some problems can be worked around while others are fatal. These assumptions are inherent in the design. Remember, a computer program cannot really deal with anything truly unexpected: all you can do is check whether each operation succeeds or fails, and be prepared for failure. (And, obviously, one has to remember that just because a send() succeeded does not mean the data has reached the other end, or is even on the wire yet; and so on.)
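That send() caveat is concrete even at the API level: on a stream socket, a successful send() may have accepted only part of the buffer, and even full acceptance only means the bytes are in the local kernel buffer. A sketch of the standard loop (socketpair is POSIX-only; the demo is my own):

```python
import socket

def send_all(sock, data):
    """send() may accept only part of the buffer; loop until done.

    Even then, success only means the bytes sit in the local kernel
    buffer -- nothing is known about the other end yet.
    """
    view = memoryview(data)
    while view:
        sent = sock.send(view)
        view = view[sent:]

# Demo with a local connected socket pair.
a, b = socket.socketpair()
send_all(a, b"hello")
print(b.recv(5))  # → b'hello'
```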
As an analog for liberal timeouts, consider a C function checking whether a pointer is NULL, even when calling that function with a NULL pointer makes absolutely no sense and you're sure your code never does. Similarly, a networked service might have a timeout for the entire handshake-and-authorization process, so that if it does not complete within, say, half a minute, the connection is dropped (or moved to a tarpit as hostile), even though no error has occurred.
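A sketch of such a whole-operation deadline (the helper class and the half-minute figure are my own illustration; per-step socket timeouts would be derived from remaining()):

```python
import time

class Deadline:
    """Overall time budget for a multi-step operation, e.g. handshake+auth."""

    def __init__(self, seconds):
        self._end = time.monotonic() + seconds

    def expired(self):
        return time.monotonic() > self._end

    def remaining(self):
        # Suitable for per-step timeouts, e.g. sock.settimeout(d.remaining())
        return max(0.0, self._end - time.monotonic())

# Usage sketch: every step of the handshake checks the same deadline.
d = Deadline(30.0)
# for each protocol step:
#     sock.settimeout(d.remaining())
#     ... recv/send one handshake message ...
#     if d.expired(): drop (or tarpit) the connection
print(round(d.remaining()))  # → 30
```

The point is that no individual step has failed when the deadline fires; the timeout guards an assumption ("a legitimate peer finishes the handshake promptly") rather than an operation.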
The "complementary" bit I mentioned is writing defensive code that has extra checks and liberal timeouts, with the purpose of catching errors in the assumptions inherent in the design. Typically, they catch programming errors, and design errors like a deadlock/livelock-susceptible locking scheme. (Unfortunately, POSIX pthreads does not have a function to check whether the current thread is already holding a mutex without affecting the state of that mutex. If there were, sprinkling such checks into code that is not supposed to be called with certain locks held would belong in this category too.) For example, the most commonly used code path might check twice that the connection handle is still valid, and many programmers feel that the second check is unnecessary. However, I may have put it there deliberately, because I believe it likely that a secondary code path will be added in the future that shares the latter check, and I want to catch the most likely errors the implementors of that secondary code path will make. (A typical reason for this is that I've seen similar cases lead to exactly that sort of bug.)
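Python's standard locks don't expose "is this held by the current thread" either, but for debug builds one can sketch a wrapper that tracks the owner (the class and names are my own; a debugging aid, not production-grade):

```python
import threading

class CheckedLock:
    """A mutex that remembers its owner, for defensive assertions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # thread ident of the current holder, if any

    def acquire(self):
        self._lock.acquire()
        # Only the holder ever writes _owner, so this is safe enough
        # for debug assertions.
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()

    def held_by_me(self):
        return self._owner == threading.get_ident()

lock = CheckedLock()

def must_not_hold_lock():
    # Sprinkled into code that must not be entered with `lock` held.
    assert not lock.held_by_me(), "called with lock held: design error"

must_not_hold_lock()   # fine: we don't hold it
lock.acquire()
assert lock.held_by_me()
lock.release()
```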
Others call it paranoia, but I've found it a very useful approach in security-sensitive situations.