Abstract

Objectives: We analyzed Sasang constitution case reports using text mining and designed a machine learning classification algorithm in order to select a model suitable for determining Sasang constitution prescriptions from text data.
Methods: Case reports on Sasang constitution published from January 1, 2000, to December 31, 2023, were collected. A total of 360 papers and 483 cases were identified, from which text was extracted for 253 cases. The extracted texts were preprocessed and tokenized into morphemes using the Python-based KoNLPy package, and each morpheme was vectorized using its TF-IDF value. To select the most suitable classification model for diagnosing Sasang constitution, five models (Random Forest Classifier, XGBoost, LightGBM, SVM, and Logistic Regression) were evaluated on accuracy and F1 score.
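The TF-IDF vectorization step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the texts have already been split into morpheme tokens (the study used KoNLPy for this), the sample tokens are hypothetical, and the weighting is the textbook form tf × ln(N / df). Libraries such as scikit-learn apply a smoothed variant, so their values would differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook TF-IDF weights for a list of tokenized documents.

    docs: list of token lists (assumed to come from a morphological analyzer).
    Returns (vocab, matrix), where matrix[i][j] is the weight of vocab[j]
    in document i.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each token appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    matrix = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the relative count within the document.
        row = [(counts[term] / total * idf[term]) if total else 0.0
               for term in vocab]
        matrix.append(row)
    return vocab, matrix


# Hypothetical, already-tokenized case texts (placeholders, not study data).
docs = [
    ["constipation", "fever", "thirst"],
    ["indigestion", "cold", "fatigue"],
    ["fever", "sweating", "fatigue"],
]
vocab, matrix = tf_idf(docs)
for term, weight in zip(vocab, matrix[0]):
    print(f"{term}: {weight:.3f}")
```

Each row of `matrix` is the feature vector for one case. Tokens that appear in fewer cases receive a larger weight than tokens shared across many cases.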
Results: The highest accuracy was achieved by Random Forest Classifier (0.57037), followed by SVM (0.544444), Logistic Regression (0.518519), LightGBM (0.481481), and XGBoost (0.474074). The F1 score was also highest for Random Forest Classifier (0.528), followed by SVM (0.52039), Logistic Regression (0.500861), XGBoost (0.45866), and LightGBM (0.446349).
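For readers unfamiliar with the two metrics reported above, the following sketch computes accuracy and a multiclass F1 score from predicted and true labels. The abstract does not state how F1 was averaged across classes, so macro averaging is an assumption here. The label values are hypothetical placeholders, not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical prescription classes "A", "B", "C" (placeholders).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

Macro averaging weights every class equally, so it can fall below accuracy when rare classes are predicted poorly. A weighted or micro average would give different values.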
Conclusions: This study is the first to analyze Sasang constitution prescription decisions by applying text mining and machine learning to case reports, and it provides a concrete research model for follow-up studies. Among the five models compared on case-report text data, Random Forest Classifier was the most suitable for determining Sasang constitution prescriptions.