As far as I know, KiCad has no special features for breaking out BGAs.
Tutorials on how to do this do not need to be KiCad specific.
A tip I once heard was to start with the center of the BGA. If you do the outer ball rows first, there is no room left to do the breakout of the center.
I don't know if a 100 pin BGA can be done on a four layer board without sacrificing too much of the power planes. I don't have the experience to advise on that, except the obvious: too many layers make the board more expensive, and with too few layers you won't get it routed.
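To get a rough feel for the numbers, here is a small Python sketch that counts the balls in each concentric ring of a full square grid and estimates the signal layers needed. The `rings_per_layer=2` default is purely an assumed rule of thumb (one trace fitting between adjacent pads or vias), and the function names are made up for illustration. It treats every ball as a signal, so it is a worst case: power and ground balls that drop straight to a plane through a via reduce the count, and the real answer depends on ball pitch and your fab's design rules.

```python
def ring_sizes(n):
    """Balls in each concentric ring of a full n x n grid, outermost first."""
    sizes = []
    side = n
    while side > 0:
        # A ring with side length s has 4*s - 4 balls; a lone center ball counts as 1.
        sizes.append(1 if side == 1 else 4 * side - 4)
        side -= 2
    return sizes


def signal_layers_needed(n, rings_per_layer=2):
    """Worst-case signal layer estimate. rings_per_layer is an ASSUMED rule of thumb."""
    rings = len(ring_sizes(n))
    return -(-rings // rings_per_layer)  # ceiling division


print(ring_sizes(10))            # [36, 28, 20, 12, 4]
print(signal_layers_needed(10))  # 3
```

For a 10 x 10 array the inner three rings hold only 36 of the 100 balls, which fits the tip above: the center is a small part of the job, but it is the part that gets boxed in first.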
Perhaps such questions are better asked in the general EDA forum:
https://www.eevblog.com/forum/eda/
There are links to other projects made with KiCad on KiCad's website. Studying some of those boards may help you.
https://kicad-pcb.org/made-with-kicad/
For example, the OLinuXino A64 is a complete KiCad project with an Allwinner A64, DDR RAM and such, which you can clone from GitHub to study how they did it.
Decoupling cap placement for BGAs is no different from other chip packages. Short wires and 100nF caps are generally enough for low to medium speed logic (up to 50MHz or so). For higher speed logic you need to use different sized capacitors to get a low impedance over a wide frequency range.
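To see why a single 100nF cap stops helping at higher frequencies, you can model each capacitor as a series R-L-C and look at the impedance over frequency. The sketch below does that in Python. The ESR (20 milliohm) and ESL (1nH, including the mounting) are assumed ballpark values, not datasheet figures, and the function names are just for illustration. Take the real numbers from the capacitor manufacturer's data.

```python
import math


def cap_impedance(f_hz, c_farad, esr_ohm=0.02, esl_henry=1e-9):
    """Complex impedance of one capacitor modelled as series R-L-C.

    esr_ohm and esl_henry are ASSUMED ballpark values for a small ceramic cap.
    """
    w = 2 * math.pi * f_hz
    return complex(esr_ohm, w * esl_henry - 1 / (w * c_farad))


def parallel_impedance(f_hz, caps):
    """Impedance of several capacitors (values in farad) in parallel."""
    return 1 / sum(1 / cap_impedance(f_hz, c) for c in caps)


def self_resonance(c_farad, esl_henry=1e-9):
    """Frequency above which the capacitor behaves like an inductor."""
    return 1 / (2 * math.pi * math.sqrt(c_farad * esl_henry))


print(f"100nF self-resonance: {self_resonance(100e-9) / 1e6:.1f} MHz")
for f in (1e6, 10e6, 100e6, 500e6):
    single = abs(cap_impedance(f, 100e-9))
    mixed = abs(parallel_impedance(f, [100e-9, 10e-9, 1e-9]))
    print(f"{f / 1e6:6.0f} MHz  100nF: {single:.3f} ohm  100n+10n+1n: {mixed:.3f} ohm")
```

With these assumed values the 100nF cap resonates at roughly 16MHz and looks inductive above that, which is why the mounting inductance (short connections) matters as much as the capacitance value. Be aware that mixing values can also create anti-resonance peaks between the individual resonances, so it is worth checking the whole curve and not just a few points.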