langwadt is correct.
While it might be unintuitive, it is easy to see how this happens if we apply a bit of math here. (Lemme know if you'd rather I didn't post this kind of thing, BTW.)
To describe the noise, we need to know two of its parameters: its mean (which we assume is zero here), and its standard deviation σ. The noise distribution does not need to be Gaussian, as we only use σ as a measure of the noise magnitude. (For Gaussian noise, about 68.3% of the noise samples are within σ of the mean; for other distributions the fraction differs.)
When we average samples, zero-mean noise tends to cancel itself. With an N-sample average, and noise samples that are uncorrelated with each other, the standard deviation of the noise is σ/√N.
To work this out, let xi be the noiseless samples, and yi be the corresponding sampled noise, so that the actual measurement i is (xi + yi). When you take the average of those N actual measurements, you can split the sum into two, and get the mean of the noiseless samples plus the mean of the noise samples. For uncorrelated noise the variances add, so the sum of N noise samples has variance Nσ² and standard deviation σ√N; dividing the sum by N then gives σ√N/N = σ/√N. (If you do the same keeping the standard deviation labeling, using any notation you want, you'll see that averaging over N samples, the standard deviation of the noise is indeed σ/√N.)
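If you'd rather see this numerically than algebraically, here is a minimal sketch that averages N samples of zero-mean noise many times over and measures the spread of the averages. The function name averaged_noise_std, the trial count and the seed are my own choices for illustration, and the noise is assumed to be uniform and uncorrelated from sample to sample (picked on purpose, because it is not Gaussian).

```python
import random
import statistics


def averaged_noise_std(sigma, n, trials=20000, seed=1):
    """Estimate the standard deviation of an n-sample average of zero-mean noise."""
    rng = random.Random(seed)
    # Uniform noise on [-a, a] has standard deviation a/sqrt(3),
    # so this choice of a gives noise with standard deviation sigma.
    a = sigma * 3 ** 0.5
    means = [
        sum(rng.uniform(-a, a) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    sigma = 1.0
    for n in (1, 4, 16, 64):
        measured = averaged_noise_std(sigma, n)
        predicted = sigma / n ** 0.5
        print(n, round(measured, 3), round(predicted, 3))
```

The measured and predicted columns should agree closely for each N. If the noise samples are correlated with each other (say, mains hum sampled at a fixed rate), the variances no longer simply add, and averaging helps less than this.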
(In fact, if we know both the mean and the standard deviation of the noise, and the original signal is sufficiently stable (low change rate), the noise can be compensated to give a similar result, i.e. a 1/√N reduction in the magnitude of the noise, if the dependency of the noise parameters on the original signal is such that every unique measurement mean corresponds to a specific original signal mean. But you might get stuff like differential equations that have to be numerically solved to do that. It is useful with e.g. very sensitive low-temperature sensors, where the thermal noise can be well characterized and thus compensated for.)
My own post above attempted to point out that if you use integer math (and with microcontrollers, you usually prefer integer math, unless your microcontroller has fast hardware floating point capabilities at sufficient precision), you'll want to do hysteresis and other similar analysis on the sum, prior to the division. Although the noise in the sum is larger in absolute terms, the added precision allows correspondingly more precise control. My example shows that if your noise is ±0.5 bits (-1-0 or 0+1, or something in between) in the averaged value, with a bit of smart hysteresis in the sum you can keep the averaged value stable and noise-free.
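To make the "hysteresis in the sum" idea concrete, here is a rough sketch using only integer math. It is not necessarily the exact scheme from my earlier post: the class name SumHysteresis, the margin parameter and the rounding choice are assumptions made for this illustration. The reported value only changes when the N-sample sum leaves the rounding band of the current value by more than margin (in sum units, i.e. 1/N of an output step each).

```python
class SumHysteresis:
    """Report a rounded N-sample average, with hysteresis applied to the sum."""

    def __init__(self, n, margin):
        self.n = n            # number of samples in each sum
        self.margin = margin  # extra dead band, in units of the sum
        self.out = None       # last reported averaged value

    def update(self, total):
        half = self.n // 2
        if self.out is None:
            # First sum: plain round-to-nearest integer division.
            self.out = (total + half) // self.n
            return self.out
        center = self.out * self.n
        # Only re-evaluate when the sum is outside the widened band.
        if total < center - half - self.margin or total >= center + half + self.margin:
            self.out = (total + half) // self.n
        return self.out


if __name__ == "__main__":
    n = 16
    sums = [197, 203, 198, 202, 197, 203, 240]
    filt = SumHysteresis(n, margin=4)
    for total in sums:
        plain = (total + n // 2) // n
        print(total, plain, filt.update(total))
```

With these made-up sums, plain rounding after the division flips back and forth between two adjacent values while the sum jitters by only a few counts, whereas the hysteresis version holds steady and still follows the final, genuine change. The same logic carries over directly to C on a microcontroller; the margin would be tuned to the observed noise in the sum.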