Pilot Development of a ‘Clinical Performance Examination (CPX) Practicing Chatbot’ Utilizing Prompt Engineering

Article information

J Korean Med. 2024;45(1):200-212
Publication date (electronic) : 2024 March 1
doi : https://doi.org/10.13048/jkm.24013
1Department of Physiology, College of Korean Medicine, Gachon University
2Division of Humanities and Social Medicine, School of Korean Medicine, Pusan National University
3Department of Sasang Constitutional Medicine, College of Korean Medicine, Gachon University
Correspondence to: Chang-Eop Kim, Tel: +82-10-4186-2985, E-mail: eopchang@gachon.ac.kr
Correspondence to: Ji-Hwan Kim, Tel: +82-10-7159-4375, E-mail: jani77@gachon.ac.kr
Received 2024 February 5; Revised 2024 February 13; Accepted 2024 February 16.

Abstract

Objectives

In the context of competency-based education emphasized in Korean Medicine, this study aimed to develop a pilot version of a CPX (Clinical Performance Examination) Practicing Chatbot utilizing large language models with prompt engineering.

Methods

A standardized patient scenario was acquired from the National Institute of Korean Medicine and transformed into text format. Prompt engineering was then conducted using role prompting and few-shot prompting techniques. The GPT-4 API was employed, and a web application was created using the gradio package. An internal evaluation criterion was established for the quantitative assessment of the chatbot’s performance.

Results

The chatbot was implemented and evaluated based on the internal evaluation criterion. It demonstrated relatively high correctness and compliance. However, there is a need for improvement in confidentiality and naturalness.

Conclusions

This study successfully piloted the CPX Practicing Chatbot, revealing the potential for developing educational models using AI technology in the field of Korean Medicine. Additionally, it identified limitations and provided insights for future developmental directions.

Fig. 1

Flowchart for developing CPX Practicing Chatbot. CPX; clinical performance exam, LLM; large language model.

Overview of CPX Practicing Chatbot Prompt.

Examples of CPX Practicing Chatbot Screens

Examples of five criteria on a chatbot evaluation form

References

1. Gwak D, Kim M, Kwon J, Yin C. Trends in Korean Medical education research from 2003 to 2022. The Journal of Korean Medical History 2023;36:89–98.
2. Jeong S-H, Kim J-P, Kang Y-J, Jeong HI, Kim KH. A Survey of Recognitions and Satisfaction with Education in Traditional Korean Medicine. Journal of Society of Preventive Korean Medicine 2020;24:49–56.
3. Cho E, Kim Y, Ha J, Park J, Jung H-J. Developing a manual for clinical practice on real patients in Korean Medicine. J Kor Med Edu 2023;1:15–22.
4. Han SY, Lee SH, Chae H. Developing a best practice framework for clinical competency education in the traditional East-Asian medicine curriculum. BMC Med Educ 2022;:22.
5. Lim C, Han H, Hong J, Kang Y. 2016 Competency Modeling for Doctor of Korean Medicine & Application Plans. Journal of Korean Medicine 2016;37:101–13.
6. Han C-Y, Kang D-W, Park J-G, Kim B-H, Kim K-S, Kim Y-B, et al. An Analysis of Clerkship Satisfaction in College of Korean Medicine : Focusing on Doctor-patient Role-play and mock CPX. J Korean Med Ophthalmol Otolaryngol Dermatol 2020;33:12–24.
7. Jo H, Min S. The current status and future operations of Clinical Performance Evaluation (CPX) in the nationwide colleges (graduate schools) of Traditional Korean Medicine. The Journal of Korean Medical History 2020;33:9–21.
8. Jo H-J, Roh J-D, Sung HK, Park J-S. A Survey on Students’ Perception of Clinical Performance Examination (CPX) in College of Korean Medicine Using Student Standardized Patients. Journal of Society of Preventive Korean Medicine 2020;24:1–13.
9. Park S, Kim C. Enhancing Korean Medicine Education with Large Language Models: Focusing on the Development of Educational Artificial Intelligence. Journal of Physiology & Pathology in Korean Medicine 2023;37:134–8.
10. Chen B, Zhang Z, Langrené N, Zhu S. Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review 2023;
11. Renze M, Guven E. The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models 2024;
12. Institute of Medical Education and Training. Revised Clinical Diagnosis 2nd editionth ed. Seoul: Seoul National University Press; 2015.
13. Bae H, Lee S, Lee CY, Kim CE. A Novel Framework for Understanding the Pattern Identification of Traditional Asian Medicine From the Machine Learning Perspective. Front Med (Lausanne) 2022;8:763533.
14. Jang D, Yun T-R, Lee C-Y, Kwon Y-K, Kim C-E. GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors. PLOS Digital Health 2023;2:e0000416.
15. Banerjee D, Singh P, Avadhanam A, Srivastava S. Benchmarking LLM powered Chatbots: Methods and Metrics 2023;

Article information Continued

Fig. 1

Flowchart for developing CPX Practicing Chatbot. CPX; clinical performance exam, LLM; large language model.

Table 1

Overview of CPX Practicing Chatbot Prompt.

Contents
Instruction 1
  • - Explanation of the Role of the CPX Practicing Chatbot (Role prompting)

  • - Brief description of the contents and usage methods of Context 1 and Context 2.

Context 1
  • - Standardized Patient Scenario (Chatting Format)

Context 2
  • - Five Examples of Q&A between the Chatbot and Student Doctor (Few-shot Prompting)

Instruction 2
  • - Eight Guidelines for Appropriate Responses

Instruction 3
  • - Five Procedures (Steps) to Refer to for Appropriate Responses

Instruction 4
  • - Reminder of the Role of the CPX Practicing Chatbot

  • - Request for a summary of the situational instructions to initiate the mock consultation.

Total Token Count 3,853 tokens

Table 2

Examples of CPX Practicing Chatbot Screens

Chatbot Screen Example 1. Scenario Guidelines Chatbot Screen Example 2. Dialogue Window
A page that guides scenario guidelines and precautions for using the chatbot to practice CPX. An example of actual usage where the user is practicing CPX.

Table 3

Examples of five criteria on a chatbot evaluation form

correctness hallucination confidentiality naturalness compliance
Q1 Hello, patient. I’m student KMD Jundong Kim.
It must have been hot and tough getting here, right?
(“안녕하세요 환자분. 학생 한의사 김준동이라고 합니다.
오시느라 덥고 힘드셨죠?”)
1 0 1 1 1 Chatbot No, it’s okay. I came to see you, student KMD.
(“아니요, 괜찮아요. 학생 한의사님 만나러 왔으니까요.”)
Q2 Could you please tell me your name and date of birth?
(“환자분 성함과 생년월일을 말씀해주세요.”)
1 0 1 1 1 Chatbot Hi, my name is Park Minyoung. I was born on June 3, 1976.
(“네 안녕하세요. 이름은 박민영이구요. 생년월일은 1976 년 6 월 3 일입니다.”)
Q3 Alright. May I ask what brings you in today?
(“예. 혹시 어디가 불편해서 오셨나요?”)
1 0 1 1 1 Chatbot I recently had a health checkup at work, and it showed that my blood pressure was high.
(“얼마 전 직장에서 건강검진을 했는데, 혈압이 높게 나와서 왔습니다.”)
Q4 Can you tell me more about it?
(“예 조금 더 자세히 말씀해주시겠어요?”)
1 0 0 1 1 Chatbot When I measured it at home recently, my blood pressure was often high too.
Especially in the last six months, I’ve been under a lot of stress at work. I’ve been having headaches and dizziness for the past month, so I came here.
(“얼마 전에 집에서 측정해 볼 때도 혈압이 종종 높아요
특히 최근 6 개월동안 직장에서 스트레스가 심했거든요.
한달 전부터는 머리도 아프고 어지러워서 왔어요.”;)
Q5 Have you ever measured your blood pressure at home, not just at the hospital?
(“병원이 아닌 집에서도 혈압을 재본 적 있습니까?”)
1 0 1 1 1 Chatbot Yes, I’ve occasionally checked it at home with a blood pressure monitor, and it’s generally high, usually in the range of 130 to 140 mmHg.
(“네, 가끔씩 집에서 혈압기로 측정해봤는데요, 대체적으로 130~140mmHg 범위로 높게 나옵니다.”)

The actual conversations with the chatbot were conducted in Korean, and these real Korean conversations are indicated within quotation marks (" "). In instances where the scenario was excessively disclosed to a student KMD in bold text, it was considered a breach of confidentiality, resulting in a score of 0 being awarded.