ORIGINAL ARTICLE
J Korean Med. 2024;45(3):193-210. doi: https://doi.org/10.13048/jkm.24049
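Text-mining analysis of Sasang constitution case reports based on natural language processing, and selection of a machine-learning model for constitution diagnosis
김진석1, 박소현1, 정로아1, 이은수1, 김윤서1, 성현동2, 유준상3,4
1Department of Korean Medicine, College of Korean Medicine, Sangji University
2Department of Computer Science and Engineering, College of Engineering, Sogang University
3Department of Sasang Constitutional Medicine, College of Korean Medicine, Sangji University
4Research Institute of Korean Medicine, Sangji University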
 
Application of text-mining technique and machine-learning model with clinical text data obtained from case reports for Sasang constitution diagnosis: a feasibility study
Jinseok Kim1, So-hyun Park1, Roa Jeong1, Eunsu Lee1, Yunseo Kim1, Hyundong Sung2, and Jun-sang Yu3,4
1Department of Korean Medicine, College of Korean Medicine, Sangji University
2Department of Computer Science and Engineering, College of Engineering, Sogang University
3Department of Sasang Constitutional Medicine, College of Korean Medicine, Sangji University
4Research Institute of Korean Medicine, Sangji University
Corresponding Author: Jun-sang Yu, Tel: +82-33-741-9203, Email: hiruok@sangji.ac.kr
Received: August 2, 2024; Revised: August 28, 2024; Accepted: August 28, 2024.
ABSTRACT
Objectives: We applied text mining to Sasang constitution case reports to carry out network analysis, and built machine-learning classifiers to identify a model suitable for classifying Sasang constitution from text data.
Methods: We searched for case reports on Sasang constitution published from January 1, 2000, to December 31, 2022, and selected 343 papers, which yielded 454 cases. The extracted texts were preprocessed and tokenized with the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. Word cloud visualization and centrality analysis were used to identify the keywords mainly used for classifying Sasang constitution in clinical practice. To select the most suitable classification model for diagnosing Sasang constitution, the performance of five models (XGBoost, LightGBM, SVC, Logistic Regression, and Random Forest Classifier) was evaluated using accuracy and F1-Score.
Results: Word cloud visualization and centrality analysis identified specific keywords for each constitution. Logistic regression showed the highest accuracy (0.839416) and the random forest classifier the lowest (0.773723). By F1-Score, XGBoost scored the highest (0.739811) and the random forest classifier the lowest (0.643421).
Conclusions: This is the first study to analyze constitution classification by applying text mining and machine learning to case reports, and it provides a concrete research model for follow-up research. The keywords selected through text mining were confirmed to effectively reflect the characteristics of each Sasang constitution type. Based on text data from case reports, logistic regression and XGBoost were the most suitable of the five machine learning models evaluated for diagnosing Sasang constitution.
Keywords: Data Mining | Machine Learning | Case reports | Sasang Constitutional Medicine
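
The two quantitative steps named in the Methods, TF-IDF vectorization and evaluation by accuracy and F1-Score, can be illustrated with a minimal standard-library sketch. This is not the authors' code. The token lists (`w1` to `w4`), the example labels and predictions, the unsmoothed `tf * log(N / df)` weighting, and the macro averaging of F1 are all assumptions made for illustration; the abstract does not state which TF-IDF variant or F1 averaging was used, and the KoNLPy tokenization step is replaced here by pre-tokenized placeholder input.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {token: weight} dict per document using tf * log(N / df).

    Assumption: plain (unsmoothed) TF-IDF; the paper's exact variant is unknown.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical pre-tokenized documents (placeholders, not real morphemes).
docs = [["w1", "w2", "w1"], ["w2", "w3"], ["w3", "w4", "w4", "w1"]]
vectors = tfidf(docs)

# Hypothetical true and predicted constitution labels for four cases.
y_true = ["Soeumin", "Soeumin", "Taeeumin", "Soyangin"]
y_pred = ["Soeumin", "Taeeumin", "Taeeumin", "Soyangin"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

Because accuracy counts every case equally while macro F1 averages over classes, the two metrics can rank models differently, which is consistent with the Results naming a different best model for each metric.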