Real-time speech acquisition, compression and wireless transmission solution on resource-constrained embedded systems
DOI: https://doi.org/10.54939/1859-1043.j.mst.109.2026.35-46
Keywords: Wireless; Speech-to-text; STM32F4; Codec2; RF; Embedded Systems.
Abstract
Resource-constrained embedded systems are electronic systems designed to perform specific tasks with minimal hardware and software resources. They are widely used and are essential for building compact, efficient, low-cost devices. This paper presents an embedded system architecture for real-time speech acquisition, compression, and wireless transmission in intelligent embedded devices. The platform uses an STM32F411CEU6 (ARM Cortex-M4) microcontroller paired with an INMP441 MEMS microphone and employs the Codec2 encoder at a rate of 3.2 kbps. An optimised processing scheme, which receives audio data over the I2S interface and sends voice frames over the UART interface, is implemented with CMSIS-DSP acceleration on the computationally constrained STM32F4 series together with NRF24L01 modules and COBS encoding. The system runs in real time with a latency of 2.31 ms/frame and a power consumption of 50.23-51.7 mW at 3.3 V, showing that the design combines low-latency real-time transmission with low power consumption. The proposed architecture is potentially suitable for next-generation speech-centric applications such as responsive speech-to-text, real-time command recognition, and compact on-device language translation modules.
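The abstract names COBS (Consistent Overhead Byte Stuffing) as the framing scheme but does not give implementation details. The following is a minimal sketch of generic COBS framing, not the authors' firmware. It assumes that each Codec2 frame at 3.2 kbps carries 64 bits (8 bytes) per 20 ms of speech and that a single 0x00 byte is used as the frame delimiter on the serial link. The function names `cobs_encode`, `cobs_decode`, and `frame_packet` are hypothetical.

```python
def cobs_encode(data: bytes) -> bytes:
    """Return a COBS-encoded copy of data that contains no 0x00 bytes."""
    out = bytearray([0])   # placeholder for the first code byte
    code_idx = 0           # index of the code byte currently being filled
    code = 1               # offset from the code byte to the next zero
    for byte in data:
        if byte == 0:
            # Close the current block; its code byte stands in for the zero.
            out[code_idx] = code
            code_idx = len(out)
            out.append(0)
            code = 1
        else:
            out.append(byte)
            code += 1
            if code == 0xFF:
                # A block holds at most 254 data bytes; start a new one.
                out[code_idx] = code
                code_idx = len(out)
                out.append(0)
                code = 1
    out[code_idx] = code
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode; raise ValueError on a malformed frame."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside COBS frame")
        block = encoded[i + 1 : i + code]
        if len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS frame")
        out += block
        i += code
        # A code below 0xFF marks a zero in the original data,
        # except after the final block of the frame.
        if code != 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)


def frame_packet(payload: bytes) -> bytes:
    """COBS-encode payload and append the 0x00 frame delimiter."""
    return cobs_encode(payload) + b"\x00"
```

Under these assumptions, an 8-byte voice frame becomes 10 bytes on the wire (one COBS code byte plus the delimiter). A receiver can resynchronise after a corrupted frame by discarding bytes until the next 0x00.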
