2. Language

💡 Tips for Best Results:

  • Clean Audio: Use a reference audio clip without background noise or music.
  • Length: A reference clip of 3 to 10 seconds is usually the sweet spot.
  • Language Match: Make sure the selected language matches the text you typed!
  • First Run: The first generation may take a few extra seconds while the models allocate memory.
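
The length tip above can be checked before uploading a clip. The following is a minimal sketch, assuming the reference audio is an uncompressed WAV file; the helper names `clip_duration_seconds` and `is_good_length` are hypothetical and not part of this tool. It uses only Python's standard `wave` module.

```python
import wave

# Bounds taken from the "Length" tip: 3 to 10 seconds.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_good_length(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """Check whether a duration falls inside the suggested range."""
    return low <= seconds <= high
```

For example, `is_good_length(clip_duration_seconds("reference.wav"))` would report whether a clip named `reference.wav` (a placeholder file name) falls in the suggested range. Other formats such as MP3 would need a different reader.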