There are many, many ways to do this.
Your assumed way of "directly" timing the echo (direct time-of-flight) is either low-resolution, very expensive, or both. It is used in very high-end LiDARs. Pros:
- No "slightly false" readings due to reflections, multipaths etc.
- Motion during measurements can't mess up the reading.
- Possibility of getting multiple echoes from one beam (partially occluded targets)
Cons:
- Given cheap timing hardware, the actual resolution is poor (see the sketch after this list)
- If high resolution is needed, the timing hardware becomes ridiculously expensive
- The actual light packet (pulse) needs massive peak power. Even if the total energy used is competitive, it must be delivered in a very short time.
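To put numbers on the resolution problem, here is a small Python sketch relating timer resolution to distance resolution (the clock figures are illustrative assumptions, not any particular product):

```python
# Direct time-of-flight: d = c * t / 2 (round trip),
# so distance resolution is delta_d = c * delta_t / 2.

C = 299_792_458.0  # speed of light, m/s

def distance_resolution(timer_resolution_s: float) -> float:
    """Smallest distance step resolvable with a given timer resolution."""
    return C * timer_resolution_s / 2

# Illustrative timer resolutions (assumed values, not specific hardware):
for name, dt in [("10 ns (cheap 100 MHz counter)", 10e-9),
                 ("1 ns (1 GHz counter)", 1e-9),
                 ("10 ps (high-end TDC)", 10e-12)]:
    print(f"{name}: {distance_resolution(dt) * 100:.1f} cm per timer tick")
```

A cheap 100 MHz counter gives you steps of about 1.5 m; to get down to millimeters you need picosecond-class time-to-digital converters.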
Instead, essentially all low-cost systems use some form of phase detection. There are many variants. Here is an example using one of the ideas:
Blink the laser at, say, 10 MHz (50 ns on, 50 ns off). Have a single photodiode site receive the light, but steer the generated charge into two alternating wells, flipping that switch at the same 10 MHz, in sync with the laser: 50 ns to well A, 50 ns to well B.
Now what happens? If your obstacle is right up close, there is no delay, and during the 50 ns the laser is on, all the light lands in well A; well B gets nothing. What if your obstacle is 1 meter away? The light has to travel 2 meters, causing a delay of 6.67 ns. So now well A sees 43.33 ns of light and well B sees 6.67 ns; they collect charge in those proportions, and you can calculate the distance from their ratio.
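In code, that arithmetic looks like this (a minimal sketch; the names are mine, and it ignores noise and ambient light):

```python
C = 299_792_458.0    # speed of light, m/s
HALF_PERIOD = 50e-9  # laser-on time per cycle: 50 ns at 10 MHz

def distance_from_wells(charge_a: float, charge_b: float) -> float:
    """Recover distance from the charge ratio of the two wells.

    Valid for delays up to one half-period (here: up to ~7.5 m),
    assuming all collected charge comes from the laser return.
    """
    delay = HALF_PERIOD * charge_b / (charge_a + charge_b)
    return C * delay / 2  # halve it: light travels out and back

# The 1 m example from the text: 43.33 ns into A, 6.67 ns into B.
print(distance_from_wells(43.33, 6.67))  # ~1.0 m
```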
Now comes the nice part: nothing prevents you from running this for longer than one cycle. You can run it for thousands of cycles (for example, a millisecond-long exposure repeats 10,000 of them). The wells keep collecting photons, maybe just a few per cycle, but they add up, so you don't need one massive pulse of power. Both wells end up with a significant number of converted photons, the signal-to-noise ratio improves, and you can make very accurate amplitude measurements and do the math back to distance with high accuracy. This way, even with a modest modulation frequency and low peak power, you still get the accuracy.
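A quick simulation shows why integrating helps (the photon count per cycle is an assumed value, and only Poisson shot noise is modeled):

```python
import numpy as np

rng = np.random.default_rng(0)

C, HALF_PERIOD = 299_792_458.0, 50e-9
TRUE_DELAY = 6.67e-9     # the 1 m target from above
PHOTONS_PER_CYCLE = 5.0  # assumed mean return: a few photons per cycle

def measure(n_cycles: int) -> float:
    """Accumulate Poisson photon counts over n cycles, return a distance."""
    frac_b = TRUE_DELAY / HALF_PERIOD
    a = rng.poisson(PHOTONS_PER_CYCLE * (1 - frac_b) * n_cycles)
    b = rng.poisson(PHOTONS_PER_CYCLE * frac_b * n_cycles)
    if a + b == 0:       # no photons at all during this exposure
        return float("nan")
    return C * HALF_PERIOD * b / (a + b) / 2

for n in (1, 100, 10_000):  # 10,000 cycles = one 1 ms exposure at 10 MHz
    trials = [measure(n) for _ in range(1000)]
    print(f"{n:>6} cycles: spread ~ {np.nanstd(trials):.3f} m")
```

Shot noise averages down roughly as the square root of the photon count, so a 1 ms exposure turns a meter-scale spread into centimeter-scale.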
The minus side? You don't actually know which elements caused the ratio you see in the wells. Say your sensor's front glass is dirty, or there is a thin haze of fog in the air. Your obstacle is 1 meter away and produces photons in a 43.33:6.67 ratio between wells A and B. But the fog also reflects light, say from 1 cm away, and produces photons in almost a 50:0 ratio. The charges get mixed in the wells, so your calculated distance drifts closer than the true one, and you can't detect this situation.
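You can see the drift numerically by continuing the sketch above (the relative strength of the near return is an assumed value):

```python
C, HALF_PERIOD = 299_792_458.0, 50e-9

def wells_for(distance_m: float, strength: float) -> tuple[float, float]:
    """Charge that a single return at `distance_m` deposits into wells A, B."""
    frac_b = (2 * distance_m / C) / HALF_PERIOD
    return strength * (1 - frac_b), strength * frac_b

# Target at 1 m plus a weak near return at 1 cm (fog / dirty glass;
# the 0.3 relative strength is an assumed value).
a1, b1 = wells_for(1.00, strength=1.0)
a2, b2 = wells_for(0.01, strength=0.3)

a, b = a1 + a2, b1 + b2                        # the wells just sum the charge
apparent = C * HALF_PERIOD * b / (a + b) / 2
print(f"apparent distance: {apparent:.2f} m")  # ~0.77 m, pulled below 1 m
```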
A real "echo" pulse LiDAR system would give you two separate "pings", first 1cm, then 1m, given that both exceed some energy limit.
Actual phase detectors tend to be somewhat more complicated (and there are many different subtypes), mainly to compensate for electro-optical inaccuracies (such as nonlinearity or drift in the photodiode, the wells, or the ADCs), but the basic principle is what I explained.
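For instance, many continuous-wave ToF sensors sample the return at four phase offsets (0°, 90°, 180°, 270°) and recover the phase with an arctangent, which cancels a constant ambient offset and the overall amplitude. A minimal sketch of that idea (bucket values and offsets here are synthetic):

```python
import math

C = 299_792_458.0
F_MOD = 10e6  # modulation frequency, 10 MHz as in the example above

def distance_four_phase(q0: float, q90: float, q180: float, q270: float) -> float:
    """Classic four-bucket CW demodulation.

    Subtracting opposite buckets cancels any constant (ambient) offset,
    and atan2 divides out the overall signal amplitude.
    """
    phase = math.atan2(q90 - q270, q0 - q180) % (2 * math.pi)
    return C * phase / (4 * math.pi * F_MOD)

# Synthetic buckets for a target at 1 m, modeled as
# Q(theta) = (A/2) * cos(phi - theta) + ambient, with A = 1, ambient = 0.5.
phi = 4 * math.pi * F_MOD * 1.0 / C
q = [0.5 * math.cos(phi - t) + 0.5 for t in
     (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
print(distance_four_phase(*q))  # ~1.0 m
```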
Hope this helps.