Spectrogram Window Comparison: Cough Sound Recognition using Convolutional Neural Network


Dzikri Rahadian Fudholi(1*), Muhammad Auzan(2), Novia Arum Sari(3)

(1) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


Cough is one of the most common symptoms of disease, especially respiratory disease. Quick cough detection can be key in the current COVID-19 pandemic. Good cough recognition relies on non-intrusive tools such as a mobile phone microphone, which, unlike stick-on sensors, does not hinder daily activities. For sound-only detection, this work uses the Convolutional Neural Network (CNN), currently the best-performing Deep Learning method. However, a CNN expects two-dimensional image input, whereas sound is a one-dimensional signal. An extra step is therefore needed: converting the sound data to image data with a spectrogram. When building a spectrogram, the question arises of which size is best. This research compares spectrogram sizes, called the Spectrogram Window, by their recognition performance. The result is that a 4-second window achieves the highest F1-score, 92.9%. Therefore, a window of around 4 seconds is expected to perform better for sound recognition problems.
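The conversion step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses NumPy only, a synthetic noise signal in place of real cough recordings, and assumed values for the sample rate (16 kHz), the short-time FFT frame length (512), and the hop length (256). The 1- and 2-second durations are illustrative companions to the 4-second window reported in the abstract, and the function names are hypothetical.

```python
import numpy as np


def split_into_windows(signal, sample_rate, window_seconds):
    """Cut a 1-D signal into non-overlapping windows of equal duration.

    Samples left over at the end that do not fill a whole window are dropped.
    """
    samples_per_window = int(round(window_seconds * sample_rate))
    n_windows = len(signal) // samples_per_window
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window)


def magnitude_spectrogram(window, frame_length=512, hop_length=256):
    """Turn one 1-D window into a 2-D array (frequency bins x time frames).

    Each frame is Hann-weighted and transformed with a real FFT.
    frame_length and hop_length are assumed values, not taken from the paper.
    """
    n_frames = 1 + (len(window) - frame_length) // hop_length
    hann = np.hanning(frame_length)
    frames = np.stack(
        [
            window[i * hop_length : i * hop_length + frame_length] * hann
            for i in range(n_frames)
        ]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


sample_rate = 16000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
signal = rng.standard_normal(sample_rate * 12)  # 12 s of noise as a stand-in

# Map each window duration to (number of windows, shape of one spectrogram).
shapes = {}
for window_seconds in (1, 2, 4):
    windows = split_into_windows(signal, sample_rate, window_seconds)
    spec = magnitude_spectrogram(windows[0])
    shapes[window_seconds] = (windows.shape[0], spec.shape)
    print(window_seconds, "s window:", windows.shape[0], "windows, image shape", spec.shape)
```

A longer window yields fewer training images per recording but a wider image for each one, which is the trade-off that a comparison of Spectrogram Window sizes explores.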


Keywords: Spectrogram Window; Convolutional Neural Network; Cough Sound; Deep Learning



DOI: https://doi.org/10.22146/ijccs.75697


Copyright (c) 2022 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs
