[Speech Signal Processing 16] - TTS2

따옹 2023. 6. 16. 21:26

Concatenative Methods (2)

• TD-PSOLA(time-domain pitch synchronous overlap and add)

The waveform itself is cut into segments and spliced together.

▫ Diphones are concatenated pitch-synchronously.

Uses the diphone approach as is.

▫ The alignment to pitch periods

▪ Vary the timing of repeats for each waveform type.

▫ Simple but discontinuous at the overlap points

The points where segments are cut and spliced may sound slightly unnatural.

-> Mitigating this requires further processing of the signal.

• LP-PSOLA

▫ Store linear predictive coefficients to represent a segment.

Deconvolve the speech signal using the linear predictive coefficients -> obtain the excitation signal (discontinuous)

▫ Provide prosodic modifications by operating on the excitation signal.

▫ Linear smoothing is applied to the LP coefficients.

Apply linear smoothing using the LP coefficients -> produces a speech signal with the discontinuities removed

▫ In its simplest form, the input is a voiced-unvoiced choice.

▫ MPLPC-, CELP-, and RELP-PSOLA
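The LP steps above (deconvolution into an excitation signal, prosodic changes on the excitation, linear smoothing of the coefficients) can be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: the function names, the filter values in `true_a`, and the pitch periods 50 and 35 are all made up, and the autocorrelation method with Levinson-Durbin is just one common way to estimate LP coefficients.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return a[1..order] so that x[n] is predicted as sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def inverse_filter(x, coeffs):
    """Deconvolution: excitation e[n] = x[n] - sum_k a[k] * x[n-k]."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesize(e, coeffs):
    """All-pole synthesis: x[n] = e[n] + sum_k a[k] * x[n-k]."""
    p = len(coeffs)
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(coeffs[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0))
    return x

def interpolate_coeffs(a_left, a_right, steps):
    """Linear smoothing: interpolate two coefficient sets over `steps` >= 2 frames."""
    return [[(1 - s / (steps - 1)) * l + (s / (steps - 1)) * r
             for l, r in zip(a_left, a_right)] for s in range(steps)]

# Demo on a synthetic "voiced" signal (hypothetical filter, impulse-train excitation).
true_a = [1.3, -0.6]
excitation = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
speech = synthesize(excitation, true_a)

est_a = levinson_durbin(autocorr(speech, 2), 2)   # stored per segment
residual = inverse_filter(speech, est_a)          # excitation estimate

# Simplest prosodic modification: drive the same filter with a shorter pitch period.
higher_pitch = synthesize([1.0 if n % 35 == 0 else 0.0 for n in range(400)], est_a)
```

Note that interpolating LP coefficients directly, as `interpolate_coeffs` does, is not guaranteed to give a stable filter at every intermediate step; practical systems often interpolate another representation of the same filter, such as reflection coefficients or line spectral pairs.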

See also: LPC (Linear Prediction Coding) and formant estimation
https://sanghyu.tistory.com/41

An Example of TD-PSOLA
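As a rough illustration of TD-PSOLA, the sketch below changes the pitch of a signal by windowing two-period segments around the pitch marks and overlap-adding them at a new spacing. It assumes a constant, known pitch period (real systems need pitch-mark detection), and `td_psola`, `period`, and `pitch_scale` are names chosen here for illustration.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def td_psola(x, period, pitch_scale):
    """Pitch-modify x, assuming a constant pitch period given in samples.

    pitch_scale > 1 raises the pitch (segments are placed closer together and
    some are repeated); pitch_scale < 1 lowers it (some segments are skipped).
    """
    win = hann(2 * period)
    # Analysis pitch marks: one per period, leaving room for a full window.
    marks = list(range(period, len(x) - period, period))
    new_period = max(1, round(period / pitch_scale))
    y = [0.0] * len(x)
    t = period                      # first synthesis pitch mark
    while t < len(x) - period:
        # Pick the analysis segment whose mark is nearest to the synthesis mark.
        m = min(marks, key=lambda a: abs(a - t))
        for i in range(2 * period):
            y[t - period + i] += win[i] * x[m - period + i]
        t += new_period
    return y

# Demo: an impulse train with period 40 samples, raised by one octave.
signal = [1.0 if n % 40 == 0 else 0.0 for n in range(400)]
octave_up = td_psola(signal, period=40, pitch_scale=2.0)
```

With `pitch_scale=1.0` the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged; any other value changes the spacing of the repeats, which is where the discontinuities at the overlap points can appear.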

Concatenative Methods (3)

• Sinusoidal (harmonic) modeling

Assumes the input signal is a sum of sinusoids.

▫ Combination of sinusoidal signals

▫ Accomplished by taking a DFT of the segment

Take the DFT of each segment -> frequency components -> concatenate

▫ Voiced signals : harmonics of the fundamental frequency

A pitch frequency exists, so it is used.

▫ Unvoiced sounds : a group of sinusoids (not necessarily harmonic)

There is no suitable pitch reference, so segments are overlapped and joined.

• ABSOLA (analysis-by-synthesis overlap-add): the same as the sinusoidal approach above, but using analysis-by-synthesis
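For the voiced case, the idea of describing a segment by the harmonics of its fundamental can be sketched as below. This is a simplified illustration under stated assumptions: the pitch period is known, the segment spans a whole number of periods (so each harmonic falls exactly on a DFT bin), and the number of harmonics stays below half the period. The function names are hypothetical.

```python
import cmath
import math

def harmonic_analysis(seg, period, n_harm):
    """Estimate (amplitude, phase) of harmonics 1..n_harm from DFT bins.

    Assumes len(seg) is a whole number of pitch periods and n_harm < period / 2.
    """
    n = len(seg)
    cycles = n // period            # harmonic h sits at DFT bin h * cycles
    params = []
    for h in range(1, n_harm + 1):
        k = h * cycles
        X = sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        params.append((2.0 * abs(X) / n, cmath.phase(X)))
    return params

def harmonic_synthesis(params, period, length):
    """Rebuild a signal of any length as a sum of harmonic cosines."""
    return [sum(a * math.cos(2.0 * math.pi * h * t / period + ph)
                for h, (a, ph) in enumerate(params, start=1))
            for t in range(length)]

# Demo: a two-harmonic "voiced" segment with a 20-sample pitch period.
segment = [1.0 * math.cos(2.0 * math.pi * t / 20 + 0.3)
           + 0.5 * math.cos(2.0 * math.pi * 2 * t / 20 - 1.0) for t in range(80)]
params = harmonic_analysis(segment, period=20, n_harm=3)
rebuilt = harmonic_synthesis(params, period=20, length=80)
```

Because the segment is now a small set of amplitudes and phases, it can be resynthesized at any length or with a different `period`, which is what makes joining and modifying segments easier than in the raw waveform. Unvoiced sounds have no such harmonic structure, so this particular sketch does not apply to them.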

▫ A recent modification of the sinusoidal approach

▫ Smoothing between concatenated segments using representations of the spectral envelope

• MBR-PSOLA(multiband resynthesis PSOLA)

▫ A TD-PSOLA data base is analyzed and resynthesized for voiced frames using a multiband analysis.

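The signal is split into frequency bands, each band is concatenated separately, and the bands are then recombined into the full-band signal.

cf) TD-PSOLA works on the full-band waveform.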

▫ Better concatenation properties than TD-PSOLA

Even if a discontinuity arises in one band, it is diluted by the signals in the other bands.

Diphone Versus Formant Synthesis

• Diphone synthesis

▫ Uses real speech.

▫ Much less computation

▫ Much easier to create the synthesizer

▫ Ignores allophonic variations except for extreme cases.

Recording multiple diphones makes it possible to cover cases that share the same phoneme sequence but are pronounced differently.

▪ Multiple diphones can be recorded for a particular phoneme pair.

• Formant synthesis

▫ Uses rules to convert phonemes into sound.

A phonemes -> sound conversion step is required.

▫ Very hard to create the synthesizer

▫ Each phoneme is created independently and as a function of its entire context.

Because the formants are synthesized through these rules, it can produce sound close to real speech.

▫ Allophonic rules can easily be included if they are known.

Speculation

• Synthesized speech quality

▫ Still not as good as that of natural speech

These days it has reached the point where people cannot tell it apart from natural speech.

▫ Suboptimal technique : the use of segment instances

• Natural speech

▫ Corpus-based models of speech segments

▪ Derive parameters from a large speech corpus so that speech waveforms could be generated based on statistical properties.

• Generating good prosodic contours (pitch, duration, and amplitude): these must be handled well to synthesize speech that sounds real.

▫ A critical aspect of the process although we have focused on the phonetic and spectral aspects.

▫ Develop better statistical models for these prosodic quantities
