Window width is defined as the width of the nonzero samples in the window. Adding zeros outside the window does not make it wider.
I think it does.
No, you're conflating "DFT width" and "window width". The PDF you linked to, uses terms like "a window function to be used with DFT of length
N". That length refers to the DFT size, and not the window function size.
Discrete Fourier transform is defined as
$$Y_k = \sum_{n=0}^{N-1} s_n e^{-i 2 \pi k n / N} \tag{1}\label{1}$$
and windowed discrete Fourier transform is
$$Z_k = \sum_{n=0}^{N-1} w_n s_n e^{-i 2 \pi k n / N} \tag{2}\label{2}$$
By definition, zero \$w_n\$ contribute nothing to the sum.
It is important to realize that above, \$N\$ is the width of the DFT.
(It is possible to implement a DFT that incorporates the window function in its coefficients, at computational cost \$O(M \log N)\$, I believe, where \$M\$ is the number of nonzero values in \$w_n\$.)
(Also note that when generating spectrograms or power spectrums, the magnitude at bin \$k\$ is squared and then divided by the square of the sum of window function weights, \$(\sum w_n)^2\$, and the result multiplied by 2 because for real inputs, the FFT is symmetric, and we only consider the positive frequencies. The width of the DFT, \$N\$, is not important, as it only determines the precision and range in frequencies.)
Like I wrote, triangle filters can be defined in at least three different ways. For large \$N\$, they all converge to the same.
Even the linked PDF states on page 8,
"This window function starts near or at zero, then increases to a maximum at the center of the time series and decreases again."Because zero \$w_n\$ coefficients lose information about the original time-domain signal, you normally want the windowing function to be nonzero within the DFT window, and zero elsewhere; this maximizes the amount of information you can obtain from the time-domain signal. The cases where you do use zeros are when you have a DFT implementation that restricts the DFT length, and want to use a smaller window; or when you want more precision in frequencies.
Also, window overlap is NOT calculated in terms of the DFT length (for example, half the DFT length), but in terms of the window function width.
The reason DFT of the same windowed signal varies as the DFT size varies, is because of two things: spectral leakage, and the center frequencies of the bins vary as a function of the DFT length.
As a practical example, consider window 0.2 0.4 0.6 0.8 1.0 0.8 0.6 0.4 0.2, and signal \$s_n\$ = 0.0872 0.9962 0.2588 -0.9063 -0.5736 0.7071 0.8192 -0.4226 -0.9659, via A=realFFT(0.0174 0.3985 0.1553 -0.7250 -0.5736 0.5657 0.4915 -0.1690 -0.1932), and B=realFFT(0.0174 0.3985 0.1553 -0.7250 -0.5736 0.5657 0.4915 -0.1690 -0.1932 0 0 0 0 0 0 0 0 0).
You'll find out that A[0] = B[0], and A[k] = B[2k]. In other words, by increasing the DFT length \$N\$ (by adding zeroes to the window function), you only increased the frequency resolution. Using numpy.fft.rfft(),
n Re B[n] Im B[n] |B[n]| Re C[n] Im C[n] |C[n]| Re A[n] Im A[n] |A[n]|
0 -0.0036 0.0000 0.0036 -0.0036 0.0000 0.0036 -0.0036 0.0000 0.0036
1 0.0018 0.0165 0.0166 0.0073 0.0149 0.0166 0.0329 0.0826 0.0889
2 0.0329 0.0826 0.0889 0.0783 0.0421 0.0889 0.0196 -0.2755 0.2762
3 0.2152 -0.0192 0.2161 0.0910 -0.1960 0.2161 -0.0342 0.0839 0.0906
4 0.0196 -0.2755 0.2762 -0.2679 -0.0672 0.2762 -0.0078 -0.0069 0.0105
5 -0.2079 -0.0449 0.2127 -0.0081 0.2125 0.2127
6 -0.0342 0.0839 0.0906 0.0898 -0.0123 0.0906
7 0.0092 0.0060 0.0110 -0.0032 -0.0105 0.0110
8 -0.0078 -0.0069 0.0105 0.0050 0.0092 0.0105
9 -0.0192 0.0000 0.0192 0.0192 0.0000 0.0192
If you look at what happens if you shift the samples in B by moving some of the zeros to the front but keeping the FFT the same size –– above, C=DFT(0 0.0174 0.3985 0.1553 -0.7250 -0.5736 0.5657 0.4915 -0.1690 -0.1932 0 0 0 0 0 0 0 0) ––, you'll see that the magnitude of each bin stays the same, |A[0]| = |C[0]|, |A[k]| = |C[2k]|, but the phase in each bin in C is different, as expected.
Small changes in DFT length \$N\$ can be confusing, because spectral leakage affects only signals whose wavelength is not an exact multiple of \$N\$. The signal that yielded a perfectly crisp peak at DFT length \$N\$ will be spread out in DFT length \$N+1\$ or \$N-1\$.