TY - GEN
T1 - Towards direct speech synthesis from ECoG
T2 - 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2016
AU - Herff, Christian
AU - Johnson, Garett
AU - Diener, Lorenz
AU - Shih, Jerry
AU - Krusienski, Dean
AU - Schultz, Tanja
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/10/13
Y1 - 2016/10/13
N2 - Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time. In this pilot study with one participant, we demonstrate that electrocorticographic (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.
AB - Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time. In this pilot study with one participant, we demonstrate that electrocorticographic (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.
UR - http://www.scopus.com/inward/record.url?scp=85009089427&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009089427&partnerID=8YFLogxK
U2 - 10.1109/EMBC.2016.7591004
DO - 10.1109/EMBC.2016.7591004
M3 - Conference contribution
AN - SCOPUS:85009089427
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 1540
EP - 1543
BT - 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 August 2016 through 20 August 2016
ER -