• Journal of Terahertz Science and Electronic Information Technology
  • Vol. 21, Issue 9, 1163 (2023)
CHEN Junqi and ZHANG Xiaolei
    DOI: 10.11805/tkyda2021247
    CHEN Junqi, ZHANG Xiaolei. Multi-agent ad-hoc speech recognition[J]. Journal of Terahertz Science and Electronic Information Technology, 2023, 21(9): 1163

    Abstract

    Speech perception is an important component of unmanned systems. Most existing work focuses on the speech perception of a single agent, whose performance is degraded by factors such as noise and reverberation and therefore has an inherent upper limit. It is thus necessary to study multi-agent speech perception, in which performance is improved through the self-organization and cooperation of multiple agents. Under the assumption that each agent outputs one channel of speech stream, a multi-agent ad-hoc speech system is proposed that aims to make full use of all channels to improve perception performance. Taking speech recognition as an example, a channel selection method that can handle large-scale multi-agent speech recognition is proposed. Specifically, a stream attention mechanism based on the Sparsemax operator is proposed for end-to-end speech recognition; it forces the weights of noisy channels to zero, so that stream attention also performs channel selection. However, Sparsemax is overly aggressive and sets the weights of many channels to zero. Scaling Sparsemax is therefore proposed, which penalizes channels more mildly by setting only the weights of strongly noisy channels to zero. In addition, a multilayer stream attention structure is proposed to effectively reduce computational complexity. Experimental results in an unmanned-system environment with up to 30 agents, using the Conformer speech recognition architecture, show that in test scenarios with a mismatched number of channels, the Word Error Rate (WER) of the proposed Scaling Sparsemax is more than 30% lower than that of Softmax on simulated data sets and more than 20% lower on semi-real data sets.
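    As a hedged illustration of how Softmax, Sparsemax and a scaled Sparsemax differ as channel weighting functions, the sketch below implements the standard Sparsemax projection (Martins and Astudillo, 2016) in plain Python. The abstract does not specify how Scaling Sparsemax chooses its scale, so `scale` here is a hypothetical fixed divisor applied to the per-channel attention scores before Sparsemax; the function names, the fusion helper and the example scores are likewise illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Standard softmax: every channel always receives a nonzero weight."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z: Sequence[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex.

    Channels whose score is at or below the threshold tau get exactly zero weight.
    """
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_z, cumsum_kz = 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # k belongs to the support while 1 + k * z_(k) > sum of the top-k scores
        if 1 + k * zk > cumsum:
            k_z, cumsum_kz = k, cumsum
    tau = (cumsum_kz - 1.0) / k_z
    return [max(zi - tau, 0.0) for zi in z]


def scaled_sparsemax(z: Sequence[float], scale: float) -> List[float]:
    """Hypothetical scaled variant: divide scores by a positive scale, then apply Sparsemax.

    A scale above 1 shrinks the gaps between scores, so fewer channels are pushed to zero.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return sparsemax([zi / scale for zi in z])


def fuse_streams(streams: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Stream attention fusion: weighted sum of per-channel feature vectors."""
    dim = len(streams[0])
    return [sum(w * s[d] for w, s in zip(weights, streams)) for d in range(dim)]


if __name__ == "__main__":
    scores = [3.0, 1.0, 0.2, -1.0]  # illustrative per-channel attention scores
    print("softmax:         ", [round(w, 3) for w in softmax(scores)])
    print("sparsemax:       ", sparsemax(scores))
    print("scaled (s = 4.0):", [round(w, 3) for w in scaled_sparsemax(scores, 4.0)])
```

    With these example scores, plain Sparsemax keeps only the first channel, whereas dividing the scores by 4 keeps three channels and zeroes only the lowest-scoring one; Softmax gives every channel a nonzero weight. This mirrors the qualitative trade-off described in the abstract: Sparsemax selects channels harshly, and scaling the scores makes the selection milder.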