Author Topic: Debugging low temperature crashes of a 400 MHz ARM Microcontroller  (Read 11051 times)


Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26751
  • Country: nl
    • NCT Developments
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #25 on: January 07, 2017, 11:49:53 pm »
I wouldn't rule out other parts though. I had a similar problem with a design, and it turned out a new batch of decoupling capacitors was much worse than before (tested with DC bias + LCR meter). Fortunately it was fixable in software, but on the next board revision I specced different capacitors and added some extra.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline gperoniTopic starter

  • Contributor
  • Posts: 38
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #26 on: January 08, 2017, 01:02:36 am »
If you don't know why, how do you know it won't crop up again?
And this is why, after replacing the DDR yesterday, I spent the whole of today looking at timing parameters, and I managed to improve the situation with the Micron chips. I wish my understanding of those problems was deeper.

The suggestion that bad capacitors might be causing the Micron DDR to misbehave, while Samsung is less susceptible, is a very good point. I should probably just try new chips on the old board or new capacitors on the new one.
 

Offline gperoniTopic starter

  • Contributor
  • Posts: 38
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #27 on: July 21, 2017, 10:09:04 am »
So, an update six months on. The problem discussed in this thread was solved by changing the DDR, replacing it with one that has the same timing parameters but, I guess, looser requirements. Since it's now time to respin the board, I'm wondering if I can get your opinion on whether or not the DDR layout of the board is decent, or if it should be improved.

This is the DDR layout of a reference design board, a Leopardboard DM368:
http://imgur.com/a/YymoU

This is our DDR layout:
http://imgur.com/a/LVveI

Thanks a lot!
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26751
  • Country: nl
    • NCT Developments
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #28 on: July 21, 2017, 04:23:28 pm »
Can't you route all the data traces on the top layer? The address lines are way less critical than the data lines. I'd also put the data signals from the same lane on the same layer.
 

Offline gperoniTopic starter

  • Contributor
  • Posts: 38
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #29 on: July 21, 2017, 07:41:05 pm »
I think the constraint on routing all of the data lines on the top layer is space. This is quite a small board and that's all the space there is to route the DDR; we can't move the DDR further from the IC. I asked the designer about keeping data and address lines on different layers, and he says that yes, that would be an improvement, but he isn't sold on it: he considers it mostly a waste of time, since the board worked fine for two (prototype) revisions before that batch.

I sort of agree with him. Please don't take this the wrong way, I really appreciate people making suggestions here for free (!!!), but ultimately he is the guy in charge. If there is nothing blatantly wrong for a 400 MHz DDR design, I think we shouldn't move the data/address lines to different layers, as this would take a lot of time.

However, one possible improvement we just spotted is moving the DDR VREF trace further back from the signal lines (not depicted above, so here is a screenshot; it is the fat trace on the right).

Thank you!
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26751
  • Country: nl
    • NCT Developments
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #30 on: July 21, 2017, 08:52:18 pm »
The problem with using many layers is that you need to have ground layers, or very good decoupling between power and signal layers. In a DDR400 design I did, I routed all data lines on the top layer using 0.1mm traces / 0.15mm spacing. Yes, this is a lot of work but I think it will help signal integrity. This probably needs swapping data lines (but keep lanes together) to avoid needing vias. Perhaps reducing the layers used to the top and bottom layers, with a solid ground plane in between, will already help. It is important that each trace has the same impedance/capacitance, and I don't see that happening when using several inner layers.
The fact that the board didn't work with some chips shows the design is marginal.
« Last Edit: July 21, 2017, 08:59:42 pm by nctnico »
 

Offline gperoniTopic starter

  • Contributor
  • Posts: 38
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #31 on: July 22, 2017, 12:29:24 am »
Ok, fine. It's after 2 AM here and I can't sleep. That's because I looked at another design I was about to manufacture, which has the same DDR/IC, and I got super scared when I realised the DDR there is again routed on 4 signal layers, but those are all stacked together with no ground/power plane in between, just adjacent to the first and fourth signal layers. I saw another post of yours, nctnico, where you say impedance only starts becoming a factor when traces are several cm long and running at hundreds of MHz, and you suggested focusing on trace lengths. Do you think I can get away with that impedance-control-free, 4-stacked-signal-layers design, since the maximum trace length is around 1 cm? (Still DDR400.)

The stackup of the board discussed in this topic is better: ground, signal, signal, ground (repeat). Unfortunately, the two signal layers are not separated by half a mm as I saw in some application notes from Micron; the gap is just ~100 um. Plugging the numbers into a calculator, I get 73 Ohm for a microstrip and 58 Ohm with the asymmetric stripline model, which basically ignores the second signal layer stacked between the two ground/power planes. I guess that makes sense and suggests the impedance in there is not that far off?
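
For anyone wanting to redo this kind of estimate, here is a minimal Python sketch using the closed-form approximations commonly quoted from IPC-2141 for a surface microstrip and an asymmetric stripline. The geometry and dielectric constant are hypothetical placeholders (the thread does not give the real trace width, copper thickness or Er), so the output is not expected to reproduce the 73 Ohm / 58 Ohm figures. These formulas are rough and ignore the neighbouring signal layer, so a field solver or the board house's numbers should take precedence.

```python
import math


def microstrip_z0(w, h, t, er):
    """Surface microstrip impedance in ohms (IPC-2141 style approximation).

    w: trace width, h: dielectric height to the plane, t: copper thickness,
    all in the same length unit. er: relative permittivity of the dielectric.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))


def asymmetric_stripline_z0(w, h_near, h_far, t, er):
    """Asymmetric stripline impedance in ohms (IPC-2141 style approximation).

    h_near: dielectric height to the nearer plane,
    h_far:  dielectric height to the farther plane.
    """
    base = 80.0 / math.sqrt(er) * math.log(1.9 * (2.0 * h_near + t) / (0.8 * w + t))
    return base * (1.0 - h_near / (4.0 * h_far))


# Hypothetical geometry in millimetres; none of these values come from the thread
# except the ~0.1 mm gap between the two signal layers.
WIDTH = 0.10        # assumed trace width
COPPER = 0.035      # assumed copper thickness (1 oz)
TO_PLANE = 0.10     # assumed dielectric height from trace to nearest plane
LAYER_GAP = 0.10    # gap between the two stacked signal layers (~100 um)
ER = 4.3            # assumed FR-4 relative permittivity

z_microstrip = microstrip_z0(WIDTH, TO_PLANE, COPPER, ER)

# The far plane sits beyond the gap, the other signal layer's copper,
# and that layer's own dielectric to its plane.
far_height = LAYER_GAP + COPPER + TO_PLANE
z_stripline = asymmetric_stripline_z0(WIDTH, TO_PLANE, far_height, COPPER, ER)

print(f"microstrip:           {z_microstrip:.1f} ohm")
print(f"asymmetric stripline: {z_stripline:.1f} ohm")
```

With the same trace width, the buried trace comes out lower than the surface trace, which matches the direction of the two figures quoted above.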

If that's the case, for this board, I believe the only improvements possible are the ones you suggested.

(I feel like a baby for being unable to sleep over this, but the idea of spending all that time debugging this DDR again scares me, and you convinced me there is work to do here, nctnico.)

Btw, thank you for your time, again.
 

Offline Rerouter

  • Super Contributor
  • ***
  • Posts: 4694
  • Country: au
  • Question Everything... Except This Statement
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #32 on: July 24, 2017, 02:39:47 am »
Your layout does look short on return paths for signals, with a high risk of crosstalk. But yes, for these distances you only need to be within an order of magnitude on impedance, so focus on it last.
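
The "short traces are forgiving" argument can be sanity-checked with a rule of thumb often used in signal-integrity work: a trace behaves roughly as a lumped load when its one-way delay is under about one sixth of the signal rise time. The sketch below is only a rough estimate. The effective dielectric constant and the 500 ps rise time are assumed values, not figures from this board or its DDR datasheet; only the ~1 cm trace length comes from the thread.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per picosecond


def one_way_delay_ps(length_mm, er_eff):
    """Propagation delay of a trace, given its effective dielectric constant."""
    return length_mm * math.sqrt(er_eff) / C_MM_PER_PS


def is_electrically_short(length_mm, rise_time_ps, er_eff, fraction=1.0 / 6.0):
    """Rule of thumb: 'short' if the one-way delay is below a fraction of the rise time."""
    return one_way_delay_ps(length_mm, er_eff) < fraction * rise_time_ps


TRACE_MM = 10.0   # maximum trace length mentioned in the thread (~1 cm)
ER_EFF = 4.0      # assumed effective dielectric constant for inner-layer FR-4
RISE_PS = 500.0   # assumed signal rise time; check the real driver datasheet

delay = one_way_delay_ps(TRACE_MM, ER_EFF)
print(f"one-way delay: {delay:.1f} ps")
print(f"electrically short: {is_electrically_short(TRACE_MM, RISE_PS, ER_EFF)}")
```

A faster edge shrinks the margin quickly, so the conclusion depends heavily on the assumed rise time; it says nothing about crosstalk or return paths, which are separate problems.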

Your stackup should be OK, likely with the data running between the 2 planes, and remember to connect the planes with vias near the source and the DDR.

You will not always be able to, but larger radius meanders have less velocity factor shift, so your matching will be closer to what you expect.

Giving your signals a ground reference will remove your current crosstalk issues. I would say you can definitely do it on 4 layers; you may just have to get creative.

 

Offline VALTERLED

  • Newbie
  • Posts: 4
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #33 on: January 20, 2021, 09:25:11 pm »
Hi all,
I am working on a similar issue at the moment, but it's a fault that developed on an existing board from a Wandel Goltermann machine, the SPM-19.

The CPU board is built around an Intel 8085.
The problem is that at a room temperature of 16-18C the board does not boot. If I warm ANY IC, and I mean ANY, whether an EPROM, a RAM or the CPU, the circuit boots correctly. 20C is enough to make it work.
Once it boots, the board works fine forever, and the small amount of heat produced after 1 minute or so of operation keeps the problem away until it cools down again.
I tried removing every single EPROM, one at a time, and the CPU from their sockets, warming each one (20C is enough) off the board and then re-plugging it, and each time the board booted perfectly. So I must rule out the PCB as the fault, since any of the components, if removed and heated, makes the board work. What's strange is that it is enough to heat just one of the components sharing the address bus and data bus for the board to boot.

I will focus on the address and data buses and see what happens there. I will check the pull-ups and everything else that is involved and common to the CPU and associated components.
Do you have any clue of what to look for?
Thank you for any reply,
Valter
 

Offline srb1954

  • Super Contributor
  • ***
  • Posts: 1085
  • Country: nz
  • Retired Electronics Design Engineer
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #34 on: January 21, 2021, 06:43:40 am »
Did you try cooling the chips with a freezer spray while it is running correctly to see if it stops again?
 

Offline VALTERLED

  • Newbie
  • Posts: 4
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #35 on: January 21, 2021, 07:43:05 am »
Hi,
thank you for the reply,
not yet systematically. I could do it by blowing cool air with a hair dryer, as I have done until now to cool the board (it's around 13-15C in my lab), but I would prefer to buy a can of freezer spray and do it this coming Saturday.

Thanks for the advice, I'll keep you posted on the result.
Ciao Valter
 

Offline srb1954

  • Super Contributor
  • ***
  • Posts: 1085
  • Country: nz
  • Retired Electronics Design Engineer
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #36 on: January 21, 2021, 08:24:41 am »
Easier to use a can of freezer spray or if you don't have this you can use a compressed air duster can held upside down (with the nozzle to the bottom).

These can quickly cool chips down to much lower temperatures than a fan can. It is also possible to pin-point a faulty chip more precisely because the area cooled by the spray is small compared to that of a fan.
 

Offline VALTERLED

  • Newbie
  • Posts: 4
  • Country: it
Re: Debugging low temperature crashes of a 400 MHz ARM Microcontroller
« Reply #37 on: January 21, 2021, 11:18:00 am »
I'll certainly do that.

What's really strange is that from a "non-working state", heating ANY of the components on the data/address bus to around 20C cures the boot failure. There is no specific component responsible for the problem; any one of them (and I have replaced almost all of them, as they are socketed) can make the board work.
I am thinking of some physical effect that must have appeared after many years (the machine is 30 years old), such as the temperature dependence of the reverse saturation current of diodes, something that involves all the components sitting on the bus... I don't know, I will investigate more during the weekend.
In over 40 years of electronics I have never met such an issue.

Thank you for the support,
Valter
 

