Hmm, so no inner copper pour/plane? And there's ground on the outer layers, but only local to the controller. I don't understand the inner layer polys for the switching nodes; heat dissipation possibly, but there aren't many vias in them, and one of them doesn't seem to do anything at all electrically (no via under M1). I would like to see bigger polys around the drain pads, to improve heat dissipation, and more vias to get better heat sinking and lower ESL.
Without inner planes, your switching loops are, more or less: Coutx2-D38-M2, and C1nx2-M1-D36-Rsense1. And those...aren't great themselves: the polys are oddly shaped, and few thermal spokes connect to component pads.
I don't have a great sense of scale here, but I'm guessing the loop inductance is on the order of 10nH?
The transistors are awfully fast, that is, they're quite small: Qg = 2.4nC, much less than the 7nC load the controller was tested at, where it's rated for 14ns gate edges. So you can expect somewhat faster edges still; probably not half the time, but 10ns might well be possible. And that's the gate waveform; the drain waveform transitions in a fraction of that, so 5ns switching edges are entirely within the realm of possibility here. This is hot stuff!*
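For a feel of where the floor is: if gate edge time scaled strictly in proportion to gate charge (an idealization; driver resistance and gate loop inductance won't let it), the 14ns-at-7nC rating would scale as below. The function name and the linear model are my own illustration, not anything from the datasheet.

```python
def scaled_edge_ns(rated_edge_ns, rated_qg_nc, actual_qg_nc):
    """Edge time if it scaled linearly with gate charge (optimistic floor)."""
    return rated_edge_ns * actual_qg_nc / rated_qg_nc

# 14 ns rated at 7 nC, transistor Qg = 2.4 nC (figures quoted above)
floor_ns = scaled_edge_ns(14.0, 7.0, 2.4)
print(f"idealized gate edge: {floor_ns:.1f} ns")
```

That comes out a bit under 5ns, which is the never-gonna-happen limit; something nearer 10ns is the more believable guess for the gate itself.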
*As silicon goes; with GaN entering into the market, 1ns (and below) is getting to be ordinary. Extremely tight layout is mandatory there!
I'm not real clear on what current range this is supposed to be. You say you tested at a bit over an ampere. There's 10A worth of diodes in there, but also a 7A transistor, but also but also, minimal footprint so maybe more like 4A continuous? (And then, actual rating will depend on input and output voltages, because duty cycle and stuff.) It's a bit all over the place.
The two diodes in parallel have almost 2nF of capacitance at zero bias. This is way more than the transistor's 500pF at Vds=0, which doesn't mean anything by itself, but does mean the hard-switching loss would be about quadruple that of a synchronous version (using transistors in place of diodes). I'm not sure if the intention was current capacity, or low voltage drop. If drop: diodes just don't do much in parallel; the cure is worse than the disease, so to speak. I think you'll find total losses here will be much worse with both in parallel than with just one; and if the current is actually more like 2 or 3A, a B340 might be better still, despite the somewhat higher Vf.
When the transistor switches on, it yanks the voltage up/down, turning off the diode and charging its capacitance. The switching loop overcharges to some peak current above the load current, and rings down through circuit losses. As the voltage swings and the transistor turns fully on, the capacitances all shrink (the diodes each range from ~900pF at zero bias, to ~100pF at rated voltage), which raises the resonant frequency of the LC network thus formed (switching loop inductance combined with the off-state semiconductor's capacitance). So we would expect to see spectral peaks around 1 / (2 pi sqrt(L C)) = 112MHz (taking L = 10nH and C of about 200pF, i.e. both diodes at ~100pF each), or spikes or bursts with a period of around 9ns. Also the diodes are placed rather far apart, adding inductance between them, giving a double-peaked resonance somewhere around there.
When the transistor switches off, load current pulls the voltage down/up, charging the transistor's capacitance and discharging the diode. As the voltage settles down, the rising diode capacitance "cushions" the swing (dV/dt slows down), which is actually nice, as the more gradual transition softens the transfer of load current from transistor to diode. Still, the transistor turning off pings whatever inductance is on its side; with a capacitance on the order of 100pF, and using the same 10nH loop inductance figure, it would resonate at 160MHz (6ns period). That period is longer than the expected drain transition time (though not by much), so I would expect the ringing to be visible.
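To put numbers on the two ringing cases, here's a quick sketch of the LC resonance arithmetic. The 10nH is my guess from above; the 200pF for turn-on is an assumption (two diodes at ~100pF each), chosen because it reproduces the 112MHz figure.

```python
import math

def resonant_freq_hz(l_henry, c_farad):
    """Resonant frequency of an ideal LC loop: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

LOOP_L = 10e-9  # guessed switching loop inductance, henries

f_turn_on = resonant_freq_hz(LOOP_L, 200e-12)   # both diodes, assumed ~200 pF total
f_turn_off = resonant_freq_hz(LOOP_L, 100e-12)  # transistor side, ~100 pF

for name, f in (("turn-on", f_turn_on), ("turn-off", f_turn_off)):
    print(f"{name}: {f / 1e6:.1f} MHz, period {1e9 / f:.1f} ns")
```

Swap in whatever L and C you actually measure or extract; the frequency only moves as the square root, so even a 2x error in the inductance guess keeps you in the same neighborhood.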
Oh, wow, and that probably won't happen because M1 at least has a huge loop in its gate drive path: from U7 pin 20, through Cboot2, to L3 pad -- polygon picks up load current, wraps around D36 and D37, M1, then the gate returns under Rsense1. All that shared load current will slow down turn-on and turn-off, increasing switching loss in boost mode operation. Yeah, that needs to be a Kelvin connection at the source, run a separate trace for it, vias are cheap!
Also, there are no footprints for external gate resistors, so this thing is running full throttle, as fast as it can go, and you have to cut traces if you want to dampen things out.
As for switching loss, the energy stored in that 10nH loop, at say 5A peak and switching at 300kHz, is only equivalent to 37mW, so it may not be all that interesting. The diode capacitances store, ehh, hard to calculate, but in the ballpark of 350mW, a large part of the total if I'm not mistaken.
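The loop inductance figure is just 1/2 L I^2 times switching frequency, assuming the stored energy is dumped once per cycle. A minimal sketch with the same numbers (the function name is mine):

```python
def inductor_switching_power_w(l_henry, i_peak_a, f_sw_hz):
    """Power lost if 0.5*L*I^2 is dissipated once per switching cycle."""
    energy_j = 0.5 * l_henry * i_peak_a ** 2
    return energy_j * f_sw_hz

# 10 nH loop, 5 A peak, 300 kHz
p_loop = inductor_switching_power_w(10e-9, 5.0, 300e3)
print(f"loop inductance loss: {p_loop * 1e3:.1f} mW")
```

The diode capacitance loss doesn't reduce to a one-liner like this, since the capacitance varies strongly with voltage; you'd have to integrate the C(V) curve from the datasheet.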
Adding RC dampers/snubbers isn't going to do very much, because the R and C will have maybe 3nH ESL each (depends on size and placement), so can only act in parallel with part of the 10nH loop inductance to reduce or dampen it modestly. (Typical values would be around 220pF to 2.2nF and 2.2-10 ohms.) The problem is more fundamental, the large gaps between nodes in the switching loop and the lack of ground plane.
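For what it's worth, a common rule of thumb (not something specific to this board) is to start the snubber resistor near the characteristic impedance of the ringing loop, sqrt(L/C). A rough sketch using the guessed 10nH and assumed node capacitances of 200pF and 100pF:

```python
import math

def characteristic_impedance_ohm(l_henry, c_farad):
    """Z0 = sqrt(L/C) of an ideal LC loop; a usual starting point for damping R."""
    return math.sqrt(l_henry / c_farad)

LOOP_L = 10e-9  # guessed loop inductance

z0_200p = characteristic_impedance_ohm(LOOP_L, 200e-12)  # assumed diode-side C
z0_100p = characteristic_impedance_ohm(LOOP_L, 100e-12)  # assumed transistor-side C
print(f"Z0 at 200 pF: {z0_200p:.1f} ohm, at 100 pF: {z0_100p:.1f} ohm")
```

Both land inside the typical resistor range above, but that only tells you what value to try; it doesn't get around the snubber's own ESL.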
Now, all of this only addresses the fastest switching edge trash; it doesn't do anything about the slower ripple between pulses, of which it looks like there are two: a faster one on the switching waveform tops, and a slower one on the output.
Speaking of the waveform, is the edge really ~50ns? Is your scope or probe set to full bandwidth..? Or was this at light load or something?
What are the can capacitors? Electrolytic, polymer? (Have to keep guessing, there's no BOM and no PNs on the schematic.) If electrolytic, I worry they aren't big enough to handle the ripple current; if polymer, I wonder if they're resonating with the smaller ceramics. Model the equivalent circuit, using rough figures for ESR (you can get mfg models for the chips, maybe not the cans though?) and ESL (include stray inductance on the board!). Then simulate it in LTspice or what have you and see what works.
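If you don't feel like firing up a simulator, the same check can be roughed out in a few lines: treat each capacitor as a series R-L-C, put the two in parallel, and look for an impedance peak between their self-resonant frequencies. Every component value below is a placeholder I made up for illustration (I don't know what's on the board); substitute datasheet ESR/ESL plus your estimate of trace inductance.

```python
import math

def branch_z(f_hz, c, esr, esl):
    """Impedance of one capacitor modeled as series ESR + ESL + C."""
    w = 2 * math.pi * f_hz
    return complex(esr, w * esl - 1.0 / (w * c))

def parallel(z1, z2):
    return z1 * z2 / (z1 + z2)

def srf_hz(c, esl):
    """Self-resonant frequency of a single capacitor."""
    return 1.0 / (2 * math.pi * math.sqrt(esl * c))

# PLACEHOLDER values, not taken from the design under review
CAN = {"c": 100e-6, "esr": 0.020, "esl": 5e-9}  # hypothetical polymer can
CER = {"c": 1e-6, "esr": 0.005, "esl": 1e-9}    # hypothetical ceramic chip

# Sweep log-spaced points between the two self-resonances,
# which is where the anti-resonance peak sits.
f_lo = srf_hz(CAN["c"], CAN["esl"])
f_hi = srf_hz(CER["c"], CER["esl"])
n = 400
freqs = [f_lo * (f_hi / f_lo) ** (i / (n - 1)) for i in range(n)]
mags = [abs(parallel(branch_z(f, **CAN), branch_z(f, **CER))) for f in freqs]

peak_mag = max(mags)
peak_f = freqs[mags.index(peak_mag)]
print(f"anti-resonance near {peak_f / 1e6:.2f} MHz, |Z| = {peak_mag * 1e3:.0f} mohm")
```

If the peak impedance comes out several times higher than either ESR and lands near a harmonic of the switching frequency, that's a candidate for the ripple you're seeing. More ceramic capacitance pushes the peak down in both frequency and height.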
Most likely, a good solution will simply be to use bigger ceramics, enough in parallel. 1-10uF at 25-50V are common enough values.
Finally, an LC outside of the main filter caps can be a good idea. In this configuration, both sides (input and output) are expected to be noisy at one condition or another, so tacking on an extra say 0.2-1uH and 10uF may help.
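As a sanity check on those values, here's the corner frequency of the extra LC and its ideal second-order attenuation (40dB/decade above the corner) at a 300kHz switching frequency. This ignores ESR, ESL and damping entirely, so treat the dB numbers as best-case; the function names are mine.

```python
import math

def lc_corner_hz(l_henry, c_farad):
    """Corner frequency of an ideal LC low-pass."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def ideal_attenuation_db(f_hz, corner_hz):
    """Asymptotic 2nd-order rolloff, valid well above the corner."""
    return 40.0 * math.log10(f_hz / corner_hz)

F_SW = 300e3   # switching frequency used in the loss estimate above
C_FILT = 10e-6

results = {}
for l_filt in (0.2e-6, 1e-6):
    fc = lc_corner_hz(l_filt, C_FILT)
    results[l_filt] = (fc, ideal_attenuation_db(F_SW, fc))
    print(f"L = {l_filt * 1e6:.1f} uH: corner {fc / 1e3:.0f} kHz, "
          f"about {results[l_filt][1]:.0f} dB at {F_SW / 1e3:.0f} kHz")
```

Note the filter itself is a resonant circuit, so it wants some damping (the ESR of a can capacitor often does the job) to avoid peaking at the corner.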
Tim