In theory, yes, but in practice it was not that easy.
The persistence of the CRT phosphor is long, tens of ms, and varies from one TV to another. Home computers were usually sold without a monitor, and the buyer used whatever TV they happened to have at home.
Then, the CPUs back then were clocked at only a couple of MHz. For example, the ZX Spectrum from the video used a Z80, an 8-bit microprocessor clocked at only 3.5 MHz. The same processor handled the reads/writes to the video RAM, plus the execution of whatever code was running.
The total screen was 384 pixels wide, of which 256 pixels were the useful video image and the rest was border. A typical TV line lasts 64 µs, of which about 12 µs or so (I don't recall exactly) are sync pulses. The fastest CPU instruction took 4 clocks at 3.5 MHz, and during that single instruction the electron spot traveled across about 8 image pixels. Not to mention that most instructions were 2-4 times longer than that minimum, so 16-32 pixels.
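To put some numbers on it, here is a rough back-of-the-envelope check using only the approximate figures above (the exact ZX Spectrum timings differ slightly):

```c
#include <stdio.h>

int main(void)
{
    /* Rough numbers from the paragraph above (not exact ZX Spectrum timing). */
    const double line_us        = 64.0;   /* one TV line, in microseconds          */
    const double sync_us        = 12.0;   /* approx. sync/blanking per line        */
    const double visible_pixels = 384.0;  /* image + border pixels per line        */
    const double cpu_mhz        = 3.5;    /* Z80 clock                             */
    const double min_clocks     = 4.0;    /* shortest Z80 instruction, in T-states */

    double visible_us      = line_us - sync_us;           /* ~52 us of visible line    */
    double pixels_per_us   = visible_pixels / visible_us; /* ~7.4 pixels every us      */
    double instr_us        = min_clocks / cpu_mhz;        /* ~1.14 us per instruction  */
    double pixels_per_inst = pixels_per_us * instr_us;    /* ~8 pixels per instruction */

    printf("beam speed : %.1f pixels/us\n", pixels_per_us);
    printf("fastest op : %.2f us -> beam moves ~%.0f pixels\n",
           instr_us, pixels_per_inst);
    return 0;
}
```

So even the shortest instruction costs you about 8 pixels of horizontal resolution before you can react to anything.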
The timing was very tight and everything was heavily optimized, so not much room was left to detect the light pen position. There were no dedicated hardware counters/timers yet, like there are in today's microcontrollers, so all the timing had to be done in software. If exact timing was needed, you wrote each machine code mnemonic of your loop on a piece of paper and counted the total number of clocks needed to execute one pass of the loop.
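As an illustration of that pen-and-paper counting, here is a minimal sketch in C that just tabulates the well-known T-state counts of a classic Z80 busy-wait loop (`LD B,n` / `DJNZ`); the real routine would of course be hand-written assembly, this only shows the arithmetic you did on paper:

```c
#include <stdio.h>

/* Cycle-counting a classic Z80 busy-wait loop, the same way it was done on
 * paper: add up the T-states of every instruction executed.
 *
 *          LD   B, n       ; 7 T-states
 *   loop:  DJNZ loop       ; 13 T-states while B != 0, 8 on the final pass
 */
int main(void)
{
    const double t_state_us = 1.0 / 3.5;   /* one clock at 3.5 MHz, ~0.286 us */

    for (int n = 1; n <= 255; n += 50) {
        long t_states = 7                  /* LD B,n                 */
                      + 13L * (n - 1)      /* DJNZ taken (n-1 times) */
                      + 8;                 /* DJNZ not taken (last)  */
        printf("n = %3d  ->  %4ld T-states  =  %6.1f us\n",
               n, t_states, t_states * t_state_us);
    }
    return 0;
}
```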
Or else, add dedicated hardware (for example a Z80 CTC, another chip on the same data and address buses, very similar to the PWM/counter peripherals of today's MCUs). But adding hardware extensions was expensive.
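For comparison, with a CTC the delay becomes a simple register calculation instead of counted instructions. A rough sketch of the timer-mode arithmetic only (prescaler of 16 or 256 and an 8-bit time constant, per the CTC datasheet; the port addresses and setup code are omitted since they depend on the particular machine):

```c
#include <stdio.h>

/* Z80 CTC in timer mode: the system clock is divided by a prescaler (16 or
 * 256) and then by an 8-bit time constant (1..256).  The achievable periods
 * at 3.5 MHz therefore range from a few microseconds to ~19 ms per channel. */
int main(void)
{
    const double f_clk = 3.5e6;            /* Z80 / CTC clock, Hz */
    const int prescalers[] = { 16, 256 };

    for (int i = 0; i < 2; i++) {
        int p = prescalers[i];
        double min_us = p * 1.0   / f_clk * 1e6;  /* time constant = 1   */
        double max_us = p * 256.0 / f_clk * 1e6;  /* time constant = 256 */
        printf("prescaler %3d: period %.2f us ... %.0f us\n", p, min_us, max_us);
    }
    return 0;
}
```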