TensorFlowTTS FS2 + MBMelgan speech sounds much worse when exported to TFLite on Android


I have trained an FS2 model and fine-tuned my MBMelgan model. Here is a sample of the speech produced in Python before exporting to TFLite: Normal FS2 and MBMelgan.

Once converted to TFLite and used on Android (based on TensorFlowTTS/examples/android/), the models sound much worse:

  1. Using my FS2 model + the repo's pretrained multiband_melgan.v1_24k, both exported as TFLite. Speech sample

  2. Using my FS2 model + my MB MelGAN, both exported as TFLite. Speech sample

  • In 1 and 2 the speech sounds lower quality: the voice is less lifelike and sounds less like my speaker. It is also much lower pitched, even though an f0_ratio of 1.0 is passed in both cases.

  • When using my vocoder in (2), you can hear low-frequency crackling that is not present before converting to TFLite.

Is it normal to expect the model to sound worse once exported to TFLite?

Why do you think my vocoder adds crackling noise but the repo's pre-trained one does not?



โœ”๏ธAccepted Answer

Hi @OscarVanL, I observed the same thing with a German model I trained. I obtained good results after disabling the converter optimizations:

converter.optimizations = []
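To show where that setting sits, here is a minimal sketch of a conversion with optimizations disabled. It assumes TensorFlow 2.x; `ToyVocoder` is a hypothetical stand-in for the real MB-MelGAN inference function (a single matmul over a fixed 50 mel frames), and the `SELECT_TF_OPS` fallback is an assumption that TTS graphs often need, not something stated in this thread. With `optimizations = []` the float TFLite output should match the TensorFlow output to within floating-point noise.

```python
import numpy as np
import tensorflow as tf


class ToyVocoder(tf.Module):
    """Hypothetical stand-in for MB-MelGAN: 50 mel frames (80 bins) -> 256 values per frame."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([80, 256], stddev=0.1, seed=0))

    @tf.function(input_signature=[tf.TensorSpec([50, 80], tf.float32)])
    def inference(self, mels):
        return tf.tanh(tf.matmul(mels, self.w))


model = ToyVocoder()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.inference.get_concrete_function()]
)
# Allow fallback to TensorFlow ops for anything TFLite builtins cannot express.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
# The fix from this answer: no post-training optimizations, so weights stay float32.
converter.optimizations = []
tflite_model = converter.convert()

# Sanity check: run the TFLite model and compare it with the TensorFlow output.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
mels = np.random.default_rng(0).standard_normal((50, 80)).astype(np.float32)
interpreter.set_tensor(interpreter.get_input_details()[0]["index"], mels)
interpreter.invoke()
tflite_out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
tf_out = model.inference(tf.constant(mels)).numpy()
print("max abs difference:", np.abs(tflite_out - tf_out).max())
```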

The default optimizations do not noticeably hurt Tacotron 2, but Multi-band MelGAN quality degrades quickly with them. So I applied them to the former but not to the latter.
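One way to see the trade-off behind applying optimizations per model is to convert the same function both ways and measure file size and numerical drift against the TensorFlow output. This is a sketch under assumptions: `ToyVocoder`, `to_tflite` and `run_tflite` are hypothetical helpers, and `tf.lite.Optimize.DEFAULT` here amounts to post-training dynamic-range quantization of the weights, which is what default optimizations do when no representative dataset is supplied.

```python
import numpy as np
import tensorflow as tf


class ToyVocoder(tf.Module):
    """Hypothetical stand-in for a vocoder; one float32 weight matrix."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([80, 256], stddev=0.1, seed=0))

    @tf.function(input_signature=[tf.TensorSpec([50, 80], tf.float32)])
    def inference(self, mels):
        return tf.tanh(tf.matmul(mels, self.w))


def to_tflite(concrete_fn, optimize):
    """Convert one model, optionally with the default (quantizing) optimizations."""
    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn])
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    converter.optimizations = [tf.lite.Optimize.DEFAULT] if optimize else []
    return converter.convert()


def run_tflite(model_bytes, x):
    """Run a single-input, single-output TFLite model on x."""
    interpreter = tf.lite.Interpreter(model_content=model_bytes)
    interpreter.allocate_tensors()
    interpreter.set_tensor(interpreter.get_input_details()[0]["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(interpreter.get_output_details()[0]["index"])


model = ToyVocoder()
fn = model.inference.get_concrete_function()
mels = np.random.default_rng(0).standard_normal((50, 80)).astype(np.float32)
reference = model.inference(tf.constant(mels)).numpy()

results = {}  # optimize flag -> (model size in bytes, max abs error vs TensorFlow)
for optimize in (False, True):
    model_bytes = to_tflite(fn, optimize)
    err = float(np.abs(run_tflite(model_bytes, mels) - reference).max())
    results[optimize] = (len(model_bytes), err)
    print(f"optimize={optimize}: {len(model_bytes)} bytes, max abs error {err:.2e}")
```

With the real models you would pass `optimize=True` for the acoustic model and `optimize=False` for MB-MelGAN, as the answer describes. A small numerical error on a toy graph does not predict audible artifacts, so listening to the output remains the deciding test.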

