JKM, Volume 46(1), 2025
Yunseo, Eunsu, Roa, Sohyun, Jinseok, Hyeonseo, and Jun-sang: Selection of Machine Learning Models for Prescription Decision-Making Based on Text Mining - Focusing on Case Studies of Single Prescriptions in Sasang Constitutional Medicine

Abstract

Objectives

We analyzed Sasang constitution case reports using text mining and designed a classification algorithm using machine learning to select a model suitable for determining Sasang constitution prescriptions based on text data.

Methods

Case reports on Sasang constitution published from January 1, 2000, to December 31, 2023, were collected. A total of 360 papers and 483 cases were identified, and text was extracted from the 253 cases that used a single prescription. The extracted texts were preprocessed and tokenized using the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. To select the most suitable classification model for determining Sasang constitution prescriptions, the performance of five models (Random Forest Classifier, XGBoost, LightGBM, SVM, and Logistic Regression) was evaluated based on accuracy and F1 score.

Results

The highest accuracy was achieved by Random Forest Classifier (0.57037), followed by SVM (0.544444), Logistic Regression (0.518519), LightGBM (0.481481), and XGBoost (0.474074). The F1 score was highest for Random Forest Classifier (0.528), followed by SVM (0.52039), Logistic Regression (0.500861), XGBoost (0.45866), and LightGBM (0.446349).

Conclusions

This study is the first to analyze Sasang constitution prescription decisions by applying text mining and machine learning to case reports, providing a concrete research model for follow-up studies. Based on case reports and text data, the most suitable machine learning model for determining Sasang constitution prescriptions is Random Forest Classifier.

\[
\mathrm{tf\text{-}idf}(t,d) = \mathrm{tf}(t,d) \times \mathrm{idf}(t), \qquad
\mathrm{idf}(t) = \log\frac{1 + n_d}{1 + \mathrm{df}(d,t)} + 1
\]
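The smoothed TF-IDF weighting given above can be illustrated with a small standard-library sketch. It assumes that tf(t,d) is the raw count of term t in document d, that n_d is the number of documents, that df(d,t) is the number of documents containing t, and that the logarithm is natural; the function name `smoothed_tfidf` and the toy tokens are hypothetical and are not taken from the study's code.

```python
import math
from collections import Counter


def smoothed_tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumptions: tf is the raw term count
    and the logarithm is natural (neither is stated in the source).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # idf(t) = log((1 + n_d) / (1 + df(d, t))) + 1
    idf = {
        term: math.log((1 + n_docs) / (1 + freq)) + 1
        for term, freq in doc_freq.items()
    }
    return [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical, already-tokenized toy documents
docs = [
    ["insomnia", "vertigo"],
    ["insomnia", "cold sweat", "cold sweat"],
]
weights = smoothed_tfidf(docs)
```

With this smoothing, a term that appears in every document still receives an idf of 1 rather than 0, so it is down-weighted but not discarded.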

Fig. 1. Study flow of text mining and machine learning
Fig. 2. Flow chart of literature searches and screening results
Fig. 3. Performance comparison of different algorithms
Table 1. Data Refining Criteria
| Category | Criteria | Example (before) | Example (after) |
| Compound words | Cases where a compound word is perceived as separate components | Cold, Sweat | Cold sweat |
| Compound words | Cases where a compound word is perceived as separate components | Hyung, Geumji, Pose | Hyunggeumjipose |
| Compound words | Cases where multiple words should be considered as a single phrase | Abdominal, Bloating | Abdominal bloating |
| Compound words | Cases where multiple words should be considered as a single phrase | Nocturnal, sleep, disorder | Nocturnal sleep disorder |
| Synonyms | Cases with the same or similar meanings but different spellings | Feel dizzy, Dizziness, Lightheadedness | Vertigo |
| Synonyms | Cases where a single word represents or encompasses other words | Sleep disorder, Difficulty falling asleep, Nocturnal sleep disorder, Insomnia, Sleep difficulties, Difficulty falling asleep | Sleep disorder |
| Stop words | Not a key variable, and used conventionally | Above-mentioned, Opinion, Usually, And, When, Time, Patient | Delete |
Table 2. English Word Translation Exclusion Criteria
| Translation exclusion criteria | Example |
| Words written in English in most of the research papers | VAS, QSCC II |
| Words that represent a unit | kg, cm |
| Name of the medicine | trolac, NSAID |
Table 3. Hyperparameter Settings
| Algorithm | Hyperparameter | Input values |
| Random Forest Classifier | n_estimators | 50, 200, 500, 1000 |
| XGBoost | learning_rate | 0.01, 0.1, 0.2 |
| XGBoost | n_estimators | 100, 200 |
| LightGBM | learning_rate | 0.01, 0.1, 0.2 |
| LightGBM | n_estimators | 100, 200 |
| Support Vector Machine | C | 0.1, 1, 10, 20 |
| Support Vector Machine | kernel | 'linear', 'rbf', 'sigmoid', 'poly' |
| Support Vector Machine | degree | 2, 3, 4 |
| Logistic Regression | C | 0.1, 1, 10, 20 |
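The hyperparameter settings in Table 3 can be read as a search space per algorithm. The sketch below enumerates every combination, assuming an exhaustive grid search over the full Cartesian product; the source reports a best cross-validated F1 score per algorithm but does not name the search procedure here, and the helper `expand_grid` is a hypothetical name.

```python
from itertools import product

# Candidate values copied from Table 3
param_grids = {
    "Random Forest Classifier": {"n_estimators": [50, 200, 500, 1000]},
    "XGBoost": {"learning_rate": [0.01, 0.1, 0.2], "n_estimators": [100, 200]},
    "LightGBM": {"learning_rate": [0.01, 0.1, 0.2], "n_estimators": [100, 200]},
    "Support Vector Machine": {
        "C": [0.1, 1, 10, 20],
        "kernel": ["linear", "rbf", "sigmoid", "poly"],
        "degree": [2, 3, 4],
    },
    "Logistic Regression": {"C": [0.1, 1, 10, 20]},
}


def expand_grid(grid):
    """List every combination of values as a {parameter: value} dict."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]


# Number of candidate settings per algorithm under the exhaustive-search assumption
candidate_counts = {
    model: len(expand_grid(grid)) for model, grid in param_grids.items()
}
```

Under this assumption the Support Vector Machine grid is by far the largest, with 48 candidates. In scikit-learn's SVC, for instance, `degree` only affects the 'poly' kernel, so many of those candidates would be functionally identical.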
Table 4. Best Hyperparameter Settings
| Algorithm | Best hyperparameters | Best CV F1 score |
| Random Forest Classifier | n_estimators = 500 | 0.508679386 |
| XGBoost | learning_rate = 0.1; n_estimators = 200 | 0.490174846 |
| LightGBM | learning_rate = 0.1; n_estimators = 200 | 0.456298434 |
| Support Vector Machine | C = 10; kernel = 'sigmoid'; degree = 2 | 0.484772635 |
| Logistic Regression | C = 20 | 0.482295398 |
Table 5. Number of Cases Based on Use of Single Prescription by Sasang Constitution
| Constitution | Case studies with a single prescription | Case studies with compound prescriptions | Total case studies |
| Taeyangin | 14 | 4 | 18 |
| Taeeumin | 73 | 81 | 154 |
| Soyangin | 111 | 105 | 216 |
| Soeumin | 55 | 40 | 95 |
| Total | 253 | 230 | 483 |
Table 6. Frequency of Prescriptions by Sasang Constitution (Single Prescription)
| Taeyangin | n | Taeeumin | n | Soyangin | n | Soeumin | n |
| Mihudeungsikjang-tang | 11 | Yeoldahanso-tang | 18 | Yanggyeoksanhwa-tang | 21 | Gwakhyangjeonggi-san | 14 |
| Ogapijangcheok-tang | 2 | Cheongsimyeonja-tang | 16 | Hyeongbangjihwang-tang | 18 | Sibimigwanjung-tang | 10 |
| Yeoldahanso-tang | 1 | Taeumjowi-tang | 9 | Hyeongbangdojeok-san | 18 | Hyangbujapalmul-tang | 3 |
|  |  | Jowiseungcheong-tang | 8 | Hyeongbangsabaek-san | 17 | Palmulgunja-tang | 3 |
|  |  | Galgeunhaegi-tang | 7 | Dojeokganggi-tang | 7 | Seungyangikgibuja-tang | 3 |
|  |  | Gwache | 3 | Dokhwaljihwang-tang | 7 | Geopung-san | 2 |
|  |  | Joripyeowon-tang | 2 | Yangdokbaekho-tang | 6 | Hyangsayangwi-tang | 2 |
|  |  | Cheongpyesagan-tang | 2 | Gamsumal | 5 | Cheongunggyeji-tang | 2 |
|  |  | Geonyuljeotang-tang | 1 | Hyeongbangpaedok-san | 4 | Osuyubujaijung-tang | 2 |
|  |  | Mahwangjeongcheon-tang | 1 | Yukmijihwang-tang | 1 | Doksampalmulgunja-tang | 2 |
|  |  | Mankgeummunmu-tang | 1 | Jeoryeongchajeon-tang | 1 | Bojungikgi-tang | 2 |
|  |  | Seunggeumjowi-tang | 1 | Jihwangbaekho-tang | 1 | Oryeong-san | 1 |
|  |  | Seunggijowi-tang | 1 | Palmulgunja-tang | 1 | Seonghyangjeonggi-san | 1 |
|  |  | Cheongrijagam-tang | 1 | Sukjihwanggosam-tang | 1 | Seungyangikgi-tang | 1 |
|  |  | Cheonghyeolgangih-tang | 1 | Saenghwa-tang | 1 | Samgyepalmul-tang | 1 |
|  |  | Handayeolso-tang | 1 | Ganghwajihwang-tang | 1 | Doksamgwangyebujaijung-tang | 1 |
|  |  |  |  | Gamijihwang-tang | 1 | Dangguibaekhaoogwanjung-tang | 1 |
|  |  |  |  |  |  | unggihyangso-san | 1 |
|  |  |  |  |  |  | Gunggichiseup-tang | 1 |
|  |  |  |  |  |  | Gwangyebujaijung-tang | 1 |
|  |  |  |  |  |  | Hwanggyeogyeji-tang | 1 |
| Total | 14 | Total | 73 | Total | 111 | Total | 55 |
Table 7. Average Accuracy, F1 Score, Precision, and Recall of Algorithms
| Algorithm | Average accuracy | Average F1 score | Average precision | Average recall |
| Random Forest Classifier | 0.57037 | 0.528 | 0.548862 | 0.557407 |
| XGBoost | 0.474074 | 0.45866 | 0.477778 | 0.482407 |
| LightGBM | 0.481481 | 0.446349 | 0.454074 | 0.478704 |
| Support Vector Machine | 0.544444 | 0.52039 | 0.548003 | 0.546296 |
| Logistic Regression | 0.518519 | 0.500861 | 0.533915 | 0.52037 |
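For readers unfamiliar with the four metrics in Table 7, the following sketch computes them for a multiclass prediction task using only the standard library. It assumes macro averaging, meaning each prescription class counts equally; how the reported averages were actually formed is not stated here. The function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (zero denominators) are scored as 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical toy labels: true vs. predicted prescriptions for four cases
y_true = ["Yeoldahanso-tang", "Yeoldahanso-tang", "Taeumjowi-tang", "Taeumjowi-tang"]
y_pred = ["Yeoldahanso-tang", "Taeumjowi-tang", "Taeumjowi-tang", "Taeumjowi-tang"]
scores = macro_scores(y_true, y_pred)
```

Because the macro F1 score averages per-class F1 values, it can fall below both the averaged precision and the averaged recall, which is the pattern visible for several algorithms in Table 7.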
