| Hardware Frequency detector (for audio) |
| RJSV:
Yes, you're right, about 5 ICs is a good enough build size for me. Since Radio Shack's demise, I can still cruise to Fry's and pick up those longer proto strips, which can handle maybe that much logic. By way of answer, I stopped adding channels at the 'T11' point because the step size gets out of hand when dividing by an integer below about 10, whereas going from dividing by 17 to dividing by 16 is not such a huge jump. But I get the point about a 'bottom heavy' system, reporting way too much on only slightly different places. I suppose I could claim the 20-sample capacitor store is 'too crude' to suffer regular analysis (partially true!). It can help to view the capacitor audio buffer as a time-line, left to right. Starting on the left is the oldest sample, T20, at 2.2 milliseconds. For a run of ten capacitors, each has an assigned channel, ending on (channel tap) T11 at the lowest. The remaining capacitors, (channel tap) T10 downward, continue the time-line (and indeed the audio sample storage continues uninterrupted) but are functionally different: essentially, T10 down to delay T1 are used as the progression goes through each possible phase (of detection). When the hardware has finished the discrete interval of T1, that is the point in time at which the 220 Hz audio frequency reaches one half cycle time (2.2 msec). At that point (after T1), there are 20 samples complete and stored, ready for the processing half (simple subtraction). The set of ten frequency detection points covers a piano half-tone range of ten notes, and so obviously is going to fall short... My logic-design skills obviously exceed my writing skills; I find clear writing a task! As a micro-code writer, it is 'relaxing' to put various designs together. Maybe someday.... hmm |
| RJSV:
OK, here is a really cool discovery: the ratio of 'step size' between integers, when dividing by numbers around 20, happens to closely match the ratio between 'perfect' calculated piano notes, which is, I believe, the 12th root of two. That turns out to be 1.059463. The 'detection' frequencies are 220 Hz (A), 232 Hz (A#), etc. (244, 259, 275, 293), each detection off by less than 3 Hz. |
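The step-ratio observation can be checked with a few lines of arithmetic. This is only a sketch: the function names `step_ratio` and `closest_divider` are made up for illustration, and the search range of 10 to 30 is an arbitrary assumption.

```python
# Ratio between adjacent integer dividers n and n - 1, compared with the
# equal-temperament semitone ratio 2**(1/12).
SEMITONE = 2 ** (1 / 12)  # about 1.059463


def step_ratio(n):
    """Frequency ratio when the divider drops from n to n - 1."""
    return n / (n - 1)


def closest_divider(lo=10, hi=30):
    """Divider in [lo, hi] whose step ratio is nearest to one semitone.

    The range limits are illustrative assumptions, not from the thread.
    """
    return min(range(lo, hi + 1), key=lambda n: abs(step_ratio(n) - SEMITONE))


if __name__ == "__main__":
    for n in range(22, 14, -1):
        print(f"{n:2d} -> {n - 1:2d}: ratio {step_ratio(n):.4f}  (semitone {SEMITONE:.4f})")
    print("divider with the closest step:", closest_divider())
```

The step from 18 to 17 is the closest match to a semitone; steps get smaller than a semitone above that and larger below it, which is why the match only holds for dividers near 17 or 18.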
| DaJMasta:
So you're trying to detect a fundamental frequency of a live sound signal for the purposes of pitch shifting? I'm still trying to make sense of what you're really trying to do - is there a reference tone that you are comparing the input to, or are you trying to discern the fundamental from the incoming signal and compare that to the rest of the data? Your capacitor bank explanation sounds like an analog version of an ADC with very little storage memory, is that right? Even if you want to end up doing this in analog hardware, I would start with a PC, a sound card, and a spectrogram program (like the freeware Spectrum Lab), to actually visualize the frequency content of the kinds of signals you want to see. I'd also take a look at programs like the freeware Audacity, which let you zoom in on the waveform itself, so you can visualize what these extraneous noises from playing an instrument or singing or whatnot actually look like as a waveform. Because it's a tricky problem to figure out the signal vs. the noise in an electronics sense, I would try to come up with methods of processing the incoming signal on the computer, where it has plenty of grunt to try things out; then, when you have a detection algorithm in mind and want to build it up, you have a better understanding of what's happening and a working comparison point. I've done some work on a tuner frontend for a frequency counter, and just manipulating a real signal to arrive at the played pitch quickly is tricky. You have to auto-gain control it so that articulation noises or volume fluctuations aren't a big issue, and you have to tweak its settings so that it doesn't respond too fast to big changes but responds fast enough to catch moving notes - much easier in digital, but needs a fair bit of processing.
If you think of looking at the problem with FFTs, you'll eventually see that because bins are typically linear in size, you have to pick your bin size to be ideal for the range you're looking at, because you will lose detail quickly on the low end. One technique to get around this and get very small bin size (high frequency resolution) is to run several FFTs on the same input signal, but with different bin sizes and different bands of interest, so that you have something closer to a logarithmic sweep and similar detail over all the ranges without having hundreds of thousands of bins. A 1 Hz difference at the bottom range of a bass could easily be an entire semitone on a musical scale, but several octaves up, it could just be a small tuning glitch from the primary tone. It's also worth emphasizing that instruments and voices get their unique timbre mostly from their frequency content when playing a tone and from their unique articulation sounds, and the frequency content will change a LOT with different ranges. For example, woodwind instruments in their lowest ranges may have a fairly small fundamental with a lot of higher frequency content, while in the central range of the instrument the tone tends to be more spectrally pure (pure being relative, and this not being the case for every kind of instrument), and then high in the range it often becomes dominated by overtones again, but in a different proportion than in the other ranges. Actually identifying an instrument from a signal would depend on characterizing its various ranges, then normalizing the incoming signal to compare it with those spectral fingerprints - picking an instrument out of a group of them playing could be done in a similar way, but the combinatorial nature of all the overtones could get very processing heavy, very fast. Not something I'd want to try in analog unless I could isolate a very specific sort of difference to look for that would give me what I was looking for.
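To put rough numbers on the linear-bin problem, here is a small sketch comparing FFT bin width with the spacing between adjacent semitones. The 44100 Hz sample rate, the 4096-point FFT length, and the function names are all assumptions picked for illustration.

```python
SEMITONE = 2 ** (1 / 12)


def bin_width(sample_rate, n_points):
    """Width in Hz of one FFT bin (the same at every frequency)."""
    return sample_rate / n_points


def semitone_spacing(freq):
    """Distance in Hz from freq to the next semitone up."""
    return freq * (SEMITONE - 1)


def bins_per_semitone(freq, sample_rate, n_points):
    """How many FFT bins fit between freq and the next semitone."""
    return semitone_spacing(freq) / bin_width(sample_rate, n_points)


if __name__ == "__main__":
    fs, n = 44100, 4096  # assumed values, not from the thread
    for f in (41.2, 110.0, 220.0, 440.0, 1760.0):
        print(f"{f:7.1f} Hz: {bins_per_semitone(f, fs, n):5.2f} bins per semitone")
```

With these assumed settings a semitone near 41 Hz is narrower than a single bin, while near 1760 Hz it spans several bins, which is the motivation for running separate FFTs with different bin sizes per band.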
As a side note, the standard 12 note equal temperament tuning system does use the 12th root (440*2^(n/12), where 440 is the reference tone and n is the number of semitones away (positive, zero, or negative) from the reference), but that's not the only system used - there are "pythagorean" and "just" tuning systems among others (and this is just in western music) where fundamentals are determined slightly differently, and in some cases the tones line up to sound much more "in tune" than with an equal temperament system. The classic example is a major chord, like the notes C, E and G on a piano: struck together they will sound slightly out of tune, but if you adjust the pitch of the E about 14 cents down (cents are hundredths of a semitone, still a logarithmic scale, so you can calculate with the original formula but n/1200), the chord's harmonics line up much better and it sounds much more in tune. There's a lot to look into in the field, but if you can narrow down what you're trying to detect or manipulate, it should simplify the problem quite a bit. |
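The equal temperament formula and the cents scale are easy to try out. The sketch below assumes that "in tune" for the major third means the just-intonation ratio 5:4; the function names are illustrative.

```python
import math


def equal_tempered(n, ref=440.0):
    """Frequency n semitones away from the reference tone: ref * 2**(n/12)."""
    return ref * 2 ** (n / 12)


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


if __name__ == "__main__":
    print("A3:", round(equal_tempered(-12), 2), "Hz")
    et_third = cents(equal_tempered(4) / equal_tempered(0))  # four semitones
    just_third = cents(5 / 4)  # assumed 'in tune' ratio for a major third
    print(f"equal-tempered major third: {et_third:.2f} cents")
    print(f"just major third:           {just_third:.2f} cents")
    print(f"difference:                 {et_third - just_third:.2f} cents")
```

Under that assumption the just major third comes out a little under 14 cents below the equal-tempered one.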
| RJSV:
Thanks for the recent reply. I am not a PRO in signal processing and appreciate those careful comments. First, this project is for understanding FFT and to get a practical view of what convolution is, in common language(!). For a complete module, an accurate set of 9 adjacent musical notes can be generated/sensed (synchronously) using integer clock dividers of 17 or nearby. Those being: divide by (22, 21, 20, 19, 18, 17, 16, 15 and 14). In this first module, it is 8800 Hz divided by 22 (and then divided by 2) for detecting 200 Hz (I believe near the 'G' note) as the lowest in the run of detection points. Engineering often calls for knowing what kind of sensitivity 'band' is in operation (using a rectangular sample window of 114 microseconds duration). The corresponding notes are: G, G#, A, A#, B, C, C#, D, and D# at (200 Hz, 210 Hz, 220 Hz, 232 Hz, 244 Hz, 259 Hz, 275 Hz, 293 Hz, and 314 Hz). Now, for the CD4017 sequential outputs: channel T-20, for example, has a Modulo-20 counter. Let's assume the channel T20 counter is in 'LOCK' mode (locked with the audio input). This simply means the counter's Q0 output clocking state has the best phase for detection (and thus the sample/hold circuit uses Q0 as a gate trigger). The 20 counts represent a full circle of analog phase (360 degrees), so a superimposed 'SINE WAVE' actually stretches along with 5 counts for every 90 degrees of waveform. Thus the expected sine function is 90 degrees phase-shifted, starting at minus 5 counts (that is count 15 for the 220 Hz channel T20), going to the SINE peak at count=0 (phase = 90 degrees) and then going through 180 degrees of phase at count=5. The shifted SINE function ends (360 degrees) at count=15. Readers should start to see: the counter plays a simple dual role, both in synchronous input actions and in synchronous output actions. Pulses at Q15 and at Q5 are brought out, representing 'zero-crossing' times (zero and 180 degrees).
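The divider list above can be tabulated against the nearest equal-tempered notes. This is a sketch only: it assumes A = 220 Hz as the reference, and the helper names `detect_freq` and `nearest_note` are invented for the example.

```python
import math

CLOCK_HZ = 8800
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def detect_freq(divider, clock_hz=CLOCK_HZ):
    """Detection frequency: clock divided by the divider, then by two."""
    return clock_hz / divider / 2


def nearest_note(freq, ref=220.0):
    """Nearest equal-tempered note name and the error in cents.

    ref is the assumed frequency of the note 'A' used as the anchor.
    """
    semitones = 12 * math.log2(freq / ref)
    nearest = round(semitones)
    return NOTE_NAMES[nearest % 12], 100 * (semitones - nearest)


if __name__ == "__main__":
    for divider in range(22, 13, -1):
        f = detect_freq(divider)
        name, err = nearest_note(f)
        print(f"divide by {divider:2d}: {f:6.1f} Hz  ~ {name:<2s} ({err:+5.1f} cents)")
```

The note names agree with the list in the post; the error is largest at the ends of the run, where the integer steps drift furthest from a true semitone.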
It is a textbook matter from there to generate an appropriate analog SINE wave: first, the Q5 and Q15 outputs connect to clock a flip-flop (CD4013). Now you have a square wave with a 50/50 duty cycle. Referring to the work of Forrest M. Mims III, page 80 shows a function generator circuit (see Mini-Notebook Series Volume I). A square wave is converted to triangular, then filtered to make an approximate SINE wave output. It is perhaps helpful to push the 'pulse stream' for earlier output, using Q4 and Q14, as there are analog delays in converting towards an analog SINE output. (Maybe it does not matter, tho.) Now, for the T16 channel (a multiple of 4), each 90 degrees of analog phase travel is represented by 4 counts, and so the pulse outputs will be Q4 and Q12, neatly. However, what about T19, or T17? WELL... never stopped me before: channel 19 simply 'pretends' to be approximately an even divisor. Thus the pulse outputs for channel 19 are placed at Q5 and Q15, and the resulting waveform is not entirely symmetrical, at about 55/45 duty cycle. Astute readers might note: it is not trivial to 'chain' together two CD-4017 decade counters, as both ICs will have 'live' outputs. One method uses a 'RANGE' high/low flip-flop to switch out Q0 from the 'high' counter IC (and suppress counting). That way a RESET can clear both counter ICs and clear the 'RANGE' flip-flop. The modulo-20 counters can simply roll over to zero. The others (MOD19, 18, 17, 16, 15, and 14) use the classic method of connecting 'last count+1' to the IC RESET. ERRATA: I will discuss in another post, thanks for your time! |
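A quick way to see the asymmetry for the odd channels is to count how long the flip-flop stays in each state. This sketch assumes the flip-flop goes high at the first tap and low at the second, and that the counter wraps at the channel modulus; `duty_cycle` is a made-up helper name.

```python
def duty_cycle(modulus, rise_count, fall_count):
    """Fraction of one counter cycle spent high.

    Assumes the output goes high at rise_count, low at fall_count, and
    the counter wraps to zero after modulus counts.
    """
    high_counts = (fall_count - rise_count) % modulus
    return high_counts / modulus


if __name__ == "__main__":
    for modulus, rise, fall in ((20, 5, 15), (16, 4, 12), (19, 5, 15)):
        d = duty_cycle(modulus, rise, fall)
        print(f"mod {modulus}, taps Q{rise}/Q{fall}: {100 * d:.1f}/{100 * (1 - d):.1f}")
```

Under these assumptions channel 19 comes out near 53/47, in the same ballpark as the rough 55/45 figure, while the mod-20 and mod-16 channels are exactly 50/50.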
| RJSV:
OOPS! I'm off by a factor of two, sorry. What is needed is a 'POLARITY' flip-flop, because my previous explanation should have had 90 degrees of analog waveform represented by 10 counts (not 5). So a 'zero-crossing' pulse also needs to use the polarity (positive or negative half of the waveform) to identify which zero-cross it is. Another source of error was a misplaced mention of a divide by two: the proper formula is 8800 Hz divided by the integer delay count, then divided by two... The biggest error is ignorance of the 'opposite polarity' type of detection, that being use of the negative-going waveform, along with a reversal of the subtraction process. So actual detection, or the first fully scanned wave, is going to take 5 milliseconds or more at 220 Hz (cycle period of about 4.5 milliseconds). If that's OK, then a latency of 7 milliseconds is possible... T10, or the channel with divide by 10, is not looked for. (That would be 440 Hz.) And a 'burst' mode will not work; what is needed is an analog front end that continuously chugs in samples, at 114 microseconds per sample, which is 8800 Hz. Note that an output pulse at Q10 can be issued early, relative to the whole phase tracked by the channel 20 counter. For example, issuing the pulse from the Q9 connection will be early by 9 degrees. This is because there are 20 steps or phases in 180 degrees of analog waveform. This easy availability of outputs from a CD-4017 shows a flexible aspect of that un-coded counter IC. |
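The timing figures in the corrected description follow from the 8800 Hz sample clock. Here is a minimal sketch of that arithmetic, assuming one counter pass covers one half cycle (180 degrees) as described; the function names are invented for the example.

```python
SAMPLE_RATE_HZ = 8800


def sample_period_us(rate_hz=SAMPLE_RATE_HZ):
    """Time per sample in microseconds."""
    return 1e6 / rate_hz


def full_cycle_ms(divider, rate_hz=SAMPLE_RATE_HZ):
    """Period of the detected tone: two counter passes of divider samples."""
    return 2 * divider * 1e3 / rate_hz


def degrees_per_count(divider):
    """Phase per count when divider counts span 180 degrees."""
    return 180 / divider


def early_by_degrees(nominal_tap, actual_tap, divider):
    """Phase lead gained by taking the pulse from an earlier output."""
    return (nominal_tap - actual_tap) * degrees_per_count(divider)


if __name__ == "__main__":
    print(f"sample period: {sample_period_us():.1f} us")
    print(f"220 Hz cycle:  {full_cycle_ms(20):.2f} ms")
    print(f"Q9 instead of Q10 on channel 20: {early_by_degrees(10, 9, 20):.0f} degrees early")
```

For channel 20 this gives about 113.6 microseconds per sample, a full 220 Hz cycle of about 4.55 ms, and 9 degrees of phase per count.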