Would a Fourier transform or a wavelet transform be better for characterizing the sound of tapping on different surfaces? I'm trying to train a machine learning model to identify different tapping sounds. So far I've used a canned FFT function on raw audio data and have been able to train a model that tells a desk tap from a keyboard key press almost every time, although taps on similar-sounding surfaces have proven more difficult. Would wavelets be better? I know that Fourier transforms work best for stationary signals, whose frequency content doesn't change over time. I can't imagine these sounds have unwavering frequencies, but perhaps they are brief enough that it doesn't matter?

A tap is more than one frequency. It's usually a delicious combo, with each frequency having its own distinct decay time.
I'd assume the frequencies themselves wouldn't change, but you're being vague about what you're tapping. A table? A cymbal? Something else?
My approach would be to look at particular FFT values by frequency, amplitude, phase, and rate of decay. Wavelets are possible but sound like a lot more work.
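
To make the "amplitude and rate of decay per frequency" idea concrete, here is a rough sketch of how those features might be pulled out of a single recorded tap. It is only one possible reading of the suggestion: the function name `tap_features`, the frame and hop sizes, and the choice of 8 coarse bands are all assumptions made for illustration, and phase is left out for brevity.

```python
import numpy as np

def tap_features(signal, sample_rate, frame_len=256, hop=128, n_bands=8):
    """Per-band peak level (dB) and decay rate (dB/s) for one tap.

    Hypothetical helper: all parameter defaults are illustrative guesses.
    Returns a vector of length 2 * n_bands: peak levels, then decay rates.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Short-time FFT: one windowed frame per row.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Group FFT bins into coarse frequency bands and convert to dB.
    bands = np.array_split(power, n_bands, axis=1)
    band_db = np.stack(
        [10 * np.log10(b.sum(axis=1) + 1e-12) for b in bands], axis=1
    )  # shape: (n_frames, n_bands)

    times = np.arange(n_frames) * hop / sample_rate
    peak_db = band_db.max(axis=0)

    # Decay rate: slope of a straight line fitted to level vs. time,
    # starting from the frame where each band is loudest.
    decay = np.zeros(n_bands)
    for k in range(n_bands):
        start = int(band_db[:, k].argmax())
        if n_frames - start >= 2:
            decay[k] = np.polyfit(times[start:], band_db[start:, k], 1)[0]

    return np.concatenate([peak_db, decay])
```

The resulting vector could be fed to the classifier in place of (or alongside) the raw FFT values. Short frames keep some time resolution, which is the part a single whole-clip FFT throws away.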

I recommend Dynamic Time Warping (DTW).
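
For anyone unfamiliar with it, DTW measures how different two sequences are after allowing one to be stretched or compressed in time to line up with the other. A minimal sketch of the classic dynamic-programming version is below; the name `dtw_distance` and the use of absolute difference as the local cost are assumptions for illustration, not something from a specific library.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two 1-D sequences.

    Hypothetical helper; local cost is the absolute difference of samples.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j] = best alignment cost of a[:i] against b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])

    return cost[n, m]
```

One way this might be applied here is to compare the amplitude envelope of a new tap against stored example envelopes for each surface and pick the closest match. The loop is quadratic, so it would probably be run on downsampled envelopes rather than raw audio.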

