Honestly, I've tried some of these text-to-speech voice cloning programs.
I've noticed that my own voice already changes when I'm reading something aloud, probably more than it does with the TTS.
I tried one that worked with something like a 20-second sample, when you'd imagine it would need hours of samples to train. And it got the tone quite right. I think once we get frameworks that can process hours of recordings and learn speech differences and dialects, it will be very difficult to tell the clone from the real voice. For now, the TTS pronounced some words quite differently than I do, because I'm not a native speaker and such.


