Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

zhrtvc

zhrtvc

Chinese Real Time Voice Cloning

tips: 中文或汉语的语言缩写简称是zh

关注【啊啦嘻哈】微信公众号,回复一个字【】,小萝莉有话对你说哦^v^

啊啦嘻哈

语音合成工具箱

pip install -U ttskit
  • 快速使用:
import ttskit

# 合成语音
ttskit.tts('这是个样例', audio='24')
  • 网页界面:

index

# 执行python函数部署
from ttskit import http_server

http_server.start_sever()
# 打开网页:http://localhost:9000/ttskit
# 安装ttskit后,命令行部署网页界面

tkhttp

usage: tkhttp [-h] [--device DEVICE] [--host HOST] [--port PORT]

optional arguments:
  -h, --help       show this help message and exit
  --device DEVICE  设置预测时使用的显卡,使用CPU设置成-1即可
  --host HOST      IP地址
  --port PORT      端口号
  • 命令行:
tkcli

usage: tkcli [-h] [-i INTERACTION] [-t TEXT] [-s SPEAKER] [-a AUDIO]
             [-o OUTPUT] [-m MELLOTRON_PATH] [-w WAVEGLOW_PATH] [-g GE2E_PATH]
             [--mellotron_hparams_path MELLOTRON_HPARAMS_PATH]
             [--waveglow_kwargs_json WAVEGLOW_KWARGS_JSON]

版本

v1.5.6

使用说明和注意事项详见README

  • 注意事项:

    • 这个说明是新版GMW版本的语音克隆框架的说明,使用ge2e(encoder)-mellotron-waveglow的模块(简称GMW),运行更简单,效果更稳定和合成语音更加优质。

    • 基于项目Real-Time-Voice-Cloning改造为中文支持的版本ESV版本的说明见README-ESV,该版本用encoder-synthesizer-vocoder的模块(简称ESV),运行比较复杂。

    • 需要进入zhrtvc项目的代码子目录【zhrtvc】运行代码。

    • zhrtvc项目默认参数设置是适用于data目录中的样本数据,仅用于跑通整个流程。

    • 推荐使用mellotron的语音合成器和waveglow的声码器,mellotron设置多种模式适应多种任务使用。

  • 中文语料

中文语音语料zhvoice,语音更加清晰自然,包含8个开源数据集,3200个说话人,900小时语音,1300万字。

  • 中文模型

扫描上面的二维码,关注**【啊啦嘻哈】微信公众号**,回复:中文语音克隆模型走起,获取百度网盘的模型文件。

  • 合成样例

合成语音样例的目录

目录介绍

zhrtvc

代码相关的说明详见zhrtvc目录下的readme文件。

models

预训练的模型在百度网盘下载,下载后解压,替换models文件夹即可。

data

语料样例,包括语音和文本对齐语料。

注意:

该语料样例用于测试跑通模型,数据量太少,不可能使得模型收敛,即不会训练出可用模型。

在测试跑通模型情况下,处理自己的数据为语料样例的格式,用自己的数据训练模型即可。

学习交流

【AI解决方案交流群】QQ群:925294583

点击链接加入群聊:https://jq.qq.com/?_wv=1027&k=wlQzvT0N

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to make an issue for that too). Mostly I would recommend giving a quick look to the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that allows to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Owner
Kuang Dada
让世界多一点AI。 Move on, work hard, do what you do.
Kuang Dada
Similar Resources

TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretrained models, tools for measuring dataset quality and already used in 20+ languages for products and research projects.

Aug 1, 2022

Command Line Text-To-Speech using Google TTS

Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

Dec 27, 2021

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

VAENAR-TTS - PyTorch Implementation PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Jul 8, 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Aug 1, 2022

A multi-voice TTS system trained with an emphasis on quality

TorToiSe Tortoise is a text-to-speech program built with the following priorities: Strong multi-voice capabilities. Highly realistic prosody and inton

Aug 8, 2022

Repository for the paper: VoiceMe: Personalized voice generation in TTS

🗣 VoiceMe: Personalized voice generation in TTS Abstract Novel text-to-speech systems can generate entirely new voices that were not seen during trai

Jul 27, 2022

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Aug 6, 2022

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Jul 27, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

Dec 28, 2021
vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin 史上训练最简单,音质最好的语音合成系统

May 27, 2022
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Jul 26, 2022
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning
 Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

Aug 1, 2022
Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS ?? green green gradio app.py false Ukrainian TTS ?? ?? Ukrainian TTS (text-to-speech)

Jul 11, 2022
A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型,适用于英语、普通话/中文、日语、韩语、俄语和藏语(当前已测试)。

简体中文 | English 并行语音合成 [TOC] 新进展 2021/04/20 合并 wavegan 分支到 main 主分支,删除 wavegan 分支! 2021/04/13 创建 encoder 分支用于开发语音风格迁移模块! 2021/04/13 softdtw 分支 支持使用 Sof

Aug 1, 2022
A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.
A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

Aug 2, 2022
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

Jul 11, 2022
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features ?? Chinese supported mandarin and tested with

Jul 31, 2022
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

English | 中文 Features ?? Chinese supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc. ?

Aug 3, 2022
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

Jul 6, 2022