Use a 3-terminal connector ("TRS").
Do not use a 4-terminal connector ("TRRS").
That way you will not have to work out which terminal is the ground/return node.
On a TRS connector it will always be the "sleeve" ("S") terminal.

Then use the "tip" ("T") for your audio signal.
That will always be the "left" channel of the stereo audio.
In the case of a voice call, the tip/left and ring/right will carry the same signal (i.e. "monaural").
Even so, it is probably not a good idea to connect the tip/left terminal directly to the ring/right terminal, since that ties the two output drivers together.
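
If you ever do want both channels on one wire, a common alternative to a direct connection is to join tip and ring through a series resistor on each. The sketch below is only an illustration of why that is gentler on the outputs: it treats each channel as an ideal voltage source, and the 1 kΩ values and the function names are assumptions chosen for the example, not values taken from any particular device.

```python
def mixed_voltage(v_left, v_right, r_left=1000.0, r_right=1000.0):
    """Unloaded voltage at the junction of the two series resistors.

    Assumes ideal voltage sources; resistor values are illustrative.
    """
    g_left = 1.0 / r_left    # conductance of the tip/left resistor
    g_right = 1.0 / r_right  # conductance of the ring/right resistor
    return (v_left * g_left + v_right * g_right) / (g_left + g_right)


def cross_current(v_left, v_right, r_left=1000.0, r_right=1000.0):
    """Current (amps) pushed from the left output into the right output."""
    return (v_left - v_right) / (r_left + r_right)


if __name__ == "__main__":
    # Identical (mono) signals: no current flows between the outputs.
    print(cross_current(0.5, 0.5))
    # Different signals: the resistors limit the current between outputs.
    print(cross_current(1.0, -1.0))
    print(mixed_voltage(1.0, 0.0))
```

With a direct connection the two resistances are effectively zero, so any moment where left and right differ makes the two outputs drive current into each other, limited only by their own output impedance.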