This model is the collection of models and resources that the automatic labeling tool for personalized text-to-speech (TTS) relies on.
Usage:
pip install -U tts-autolabel -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
Note that the tts-autolabel package depends on onnxruntime. By default, onnxruntime >= 1.11 is installed, which requires numpy >= 1.21. The default ModelScope image ships with tensorflow 1.15, which requires numpy <= 1.18, so the two requirements conflict. If you run into this, adjust the conflicting package versions before installing.
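The original note does not spell out the concrete fix. One common workaround (an assumption on my part, not verified against every ModelScope image) is to pin onnxruntime below 1.11 before installing, so its numpy requirement stays within what tensorflow 1.15 allows:

```shell
# Assumed workaround: pin onnxruntime to a pre-1.11 release so it does not
# pull in numpy >= 1.21 (tensorflow 1.15 in the default image needs numpy <= 1.18).
pip install "onnxruntime<1.11"
pip install -U tts-autolabel -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
```

Alternatively, installing tts-autolabel into a fresh virtual environment that does not contain tensorflow 1.15 avoids the conflict entirely.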
Scope of use:
Target scenarios:
See the code examples below.
If using this in a Notebook, first install the tts-autolabel dependency in a code cell:
# Run this cell to install tts-autolabel
import sys
!{sys.executable} -m pip install -U tts-autolabel -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
The example below uses /tmp/test_wavs as the input audio directory and /tmp/output_dir as the labeling output directory. The input directory /tmp/test_wavs is structured as follows:
./test_wavs
├── 100001.wav
├── 100002.wav
├── 100003.wav
├── 100004.wav
├── 100005.wav
├── 100006.wav
├── 100007.wav
├── 100008.wav
├── 100009.wav
├── 100010.wav
├── 100011.wav
├── 100012.wav
├── 100013.wav
├── 100014.wav
├── 100015.wav
├── 100016.wav
├── 100017.wav
├── 100018.wav
├── 100019.wav
└── 100020.wav
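Before running the labeler, it can be useful to sanity-check the input audio (channel count, sample rate, duration). A minimal sketch using only Python's standard library, assuming uncompressed PCM wav files; the helper name is illustrative, not part of tts-autolabel:

```python
import os
import wave

def audio_summary(wav_dir):
    """Map each wav filename to (channels, sample_rate, duration_seconds)."""
    info = {}
    for name in sorted(os.listdir(wav_dir)):
        if not name.endswith(".wav"):
            continue
        with wave.open(os.path.join(wav_dir, name), "rb") as w:
            rate = w.getframerate()
            info[name] = (w.getnchannels(), rate, w.getnframes() / rate)
    return info
```

For example, `audio_summary('/tmp/test_wavs')` would report one entry per wav file, making it easy to spot stereo files or unexpected sample rates before labeling.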
Call the interface to run automatic labeling. Labeling 20 recordings takes roughly 30 seconds (actual time depends on machine performance):
from modelscope.tools import run_auto_label

input_wav = '/tmp/test_wavs'   # input wav audio directory
work_dir = '/tmp/output_dir'   # labeling output directory

ret, report = run_auto_label(
    input_wav=input_wav,
    work_dir=work_dir,
    resource_revision='v1.0.7',
)
print(report)
After automatic labeling finishes, the output directory /tmp/output_dir looks like this:
./output_dir
├── interval
│ ├── 100001.interval
│ ├── 100002.interval
│ ├── 100003.interval
│ ├── 100004.interval
│ ├── 100005.interval
│ ├── 100006.interval
│ ├── 100007.interval
│ ├── 100008.interval
│ ├── 100009.interval
│ ├── 100010.interval
│ ├── 100011.interval
│ ├── 100012.interval
│ ├── 100013.interval
│ ├── 100014.interval
│ ├── 100015.interval
│ ├── 100016.interval
│ ├── 100017.interval
│ ├── 100018.interval
│ ├── 100019.interval
│ └── 100020.interval
├── log
│ ├── badlist.txt
│ ├── log_info.txt
│ └── stdout.log
├── prosody
│ └── prosody.txt
├── result.json
└── wav
├── 100001.wav
├── 100002.wav
├── 100003.wav
├── 100004.wav
├── 100005.wav
├── 100006.wav
├── 100007.wav
├── 100008.wav
├── 100009.wav
├── 100010.wav
├── 100011.wav
├── 100012.wav
├── 100013.wav
├── 100014.wav
├── 100015.wav
├── 100016.wav
├── 100017.wav
├── 100018.wav
├── 100019.wav
└── 100020.wav
Here prosody holds the text annotation, interval the phoneme time alignments, wav the audio files, and log the run logs.
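As a quick consistency check on this layout, you can pair each labeled wav with its alignment file (clips the tool rejected are listed in log/badlist.txt). The helper below is an illustrative sketch based on the directory names shown above, not part of the tool's API:

```python
import os

def unmatched_stems(work_dir):
    """Return stems of output wav files that lack a matching .interval alignment."""
    wavs = {os.path.splitext(f)[0]
            for f in os.listdir(os.path.join(work_dir, "wav"))
            if f.endswith(".wav")}
    intervals = {os.path.splitext(f)[0]
                 for f in os.listdir(os.path.join(work_dir, "interval"))
                 if f.endswith(".interval")}
    return sorted(wavs - intervals)
```

An empty result from `unmatched_stems('/tmp/output_dir')` means every retained recording received a phoneme alignment.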
The training data produced by this tool conforms to the DAMO Academy TTS training-data standard and can be used for training with the KAN-TTS source code or the ModelScope TTS model trainer.
Labeling of Chinese, English, and mixed Chinese-English audio is currently supported; other languages will be added in later releases.
Linux only; built against gcc 4.8.5. Systems with a very old glibc may be incompatible.