TVS diodes and Zeners are there to give the ICs a long life by protecting them from ESD zaps, as you mentioned. Dropping them from your design may cost you the IC, which would make the device totally useless as an audio device. As for signal integrity, they can have an effect that is measurable with a good audio analyser, but if they do, there is probably something else wrong in the circuit too.
In practice, the effect of those protection components is inaudible.
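
To put a rough number on that, here is a small sketch of the low-pass filter that a protection diode's capacitance forms with the source resistance driving it. The 1 kΩ source resistance and 100 pF diode capacitance are assumed values picked only for illustration, not figures from your circuit or from any datasheet, so substitute your own.

```python
import math

def corner_frequency_hz(r_ohm, c_farad):
    """-3 dB corner of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def loss_db(f_hz, fc_hz):
    """Magnitude response of a first-order low-pass at f_hz, in dB (<= 0)."""
    return -10.0 * math.log10(1.0 + (f_hz / fc_hz) ** 2)

# Assumed, illustrative values only (not taken from the question):
R_SOURCE = 1_000.0   # ohms, resistance driving the protected node
C_TVS = 100e-12      # farads, protection diode capacitance

fc = corner_frequency_hz(R_SOURCE, C_TVS)
print(f"corner frequency: {fc / 1e6:.2f} MHz")
print(f"loss at 20 kHz:   {loss_db(20e3, fc):.4f} dB")
```

With those assumed values the corner lands around 1.6 MHz, far above the audio band, and the loss at 20 kHz is well under 0.001 dB. This only models the linear capacitive loading; it says nothing about leakage or the voltage dependence of the diode capacitance, which is the kind of thing a good analyser might pick up.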