I was interested in neural style transfer and wondered whether it could be applied to voice. The input and filter voices are taken from the Ryerson emotion database.
The generated audio below is based on the approach described in "Audio texture synthesis and style transfer".
Input female neutral baseline:
Emotion female angry filter:
Output female angry voice:
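
For readers curious how a spectrogram-based style transfer can compare two voices, here is a minimal sketch of the style statistic only. It is not the exact code used for the clips above. It assumes the audio is already loaded as a 1-D NumPy array, and the function names (`log_spectrogram`, `random_conv_features`, `gram_matrix`, `style_loss`), the filter count, and the filter width are all illustrative choices of mine. The actual optimization of the output spectrogram and the conversion back to a waveform are left out.

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=128):
    """Log-magnitude STFT with shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

def random_conv_features(spec, filters):
    """1-D convolution over time with frequency bins as channels, then ReLU.

    spec: (channels, frames); filters: (n_filters, channels, width).
    """
    width = filters.shape[2]
    n_out = spec.shape[1] - width + 1
    # Sliding windows over the time axis: (n_out, channels, width)
    windows = np.stack([spec[:, t : t + width] for t in range(n_out)])
    feats = np.einsum("tcw,fcw->ft", windows, filters)
    return np.maximum(feats, 0.0)

def gram_matrix(feats):
    """Time-averaged feature correlations, shape (n_filters, n_filters)."""
    return feats @ feats.T / feats.shape[1]

def style_loss(spec_a, spec_b, filters):
    """Squared distance between the Gram matrices of two spectrograms."""
    ga = gram_matrix(random_conv_features(spec_a, filters))
    gb = gram_matrix(random_conv_features(spec_b, filters))
    return float(np.sum((ga - gb) ** 2))

# Synthetic stand-ins for the neutral input and the angry filter voice
# (assumption: real clips would be loaded from audio files instead).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
neutral = np.sin(2 * np.pi * 220 * t)
angry = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(t.size)

spec_neutral = log_spectrogram(neutral)
spec_angry = log_spectrogram(angry)

# Untrained random filters: 32 filters, each 11 frames wide (assumed sizes).
filters = rng.standard_normal((32, spec_neutral.shape[0], 11)) * 0.01

print(style_loss(spec_neutral, spec_angry, filters))
```

Because the Gram matrix averages over time, the input and filter clips do not need to have the same length. A full transfer would start from the input spectrogram and adjust it by gradient descent until this loss (plus a content term) is small.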