Named Entity Recognition (NER) is a fundamental task in information extraction that locates mentions of named entities in unstructured text and classifies them (e.g., as person, organization or location). NER has traditionally been treated as a sequence labeling problem, in which entity boundaries and category labels are jointly predicted. Various methods have been proposed for this problem, including Hidden Markov Models (HMM) (Ponomareva et al., 2007), Maximum Entropy Markov Models (MEMM) (Chieu and Ng, 2003) and Conditional Random Fields (CRF) (Wei et al., 2015). Recently, neural networks have been shown to achieve impressive results. State-of-the-art results for English NER have been achieved with Long Short-Term Memory (LSTM)-CRF based networks (Chiu and Nichols, 2016; Lample et al., 2016; Ma and Hovy, 2016; Liu et al., 2018).
Chinese NER is more difficult than English NER. The Chinese language is logographic and provides no conventional features such as capitalization. In addition, because there are no delimiters between characters, Chinese NER is closely tied to word segmentation: named entity boundaries are also word boundaries. Incorrectly segmented entity boundaries therefore cause error propagation in NER. For example, in a particular context, the disease entity “思覺失調症” (schizophrenia) may be incorrectly segmented into three words: “思覺” (thinking and feeling), “失調” (disorder) and “症” (disease). Consequently, character-based methods have been shown to outperform word-based approaches for Chinese NER (He and Wang, 2008; Li et al., 2014; Zhang and Yang, 2018).
In the digital era, users seeking healthcare information usually search and browse web content in click-through trails before making a doctor’s appointment for diagnosis and treatment. Web texts such as health-related news, digital health magazines and medical question/answer forums are valuable sources of healthcare information. Domain-specific healthcare information includes many proper names, mainly in the form of named entities. For example, “三酸甘油酯” (triglyceride) is a chemical found in the human body; “電腦斷層掃描” (computed tomography; CT) is a medical imaging procedure that uses computer-processed combinations of X-ray measurements to produce tomographic images of specific areas of the human body; and “靜脈免疫球蛋白注射” (intravenous immunoglobulin; IVIG) is a kind of treatment for avoiding infections. In summary, Chinese healthcare NER is an important and essential task in natural language processing: it automatically identifies healthcare entities such as symptoms, chemicals, diseases, and treatments for machine reading and understanding.
A total of 10 entity types are defined for Chinese healthcare named entity recognition; they are described in Table I together with some examples. In this task, participants are asked to predict the named entity boundaries and categories for each given sentence. We use the common BIO (Beginning, Inside, and Outside) format for NER tasks, with one tag per character. The B- prefix before a tag indicates that the character is the beginning of a named entity, and the I- prefix indicates that the character is inside a named entity. An O tag indicates that the character belongs to no named entity. Two example sentences are shown below.
Input: 修復肌肉與骨骼最重要的便是熱量、蛋白質與鈣質。
Output: O, O, B-BODY, I-BODY, O, B-BODY, I-BODY, O, O, O, O, O, O, O, O, O, B-CHEM, I-CHEM, I-CHEM, O, B-CHEM, I-CHEM, O
Input: 如何治療胃食道逆流症?
Output: O, O, O, O, B-DISE, I-DISE, I-DISE, I-DISE, I-DISE, I-DISE, O
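To make the BIO format concrete, the following minimal Python sketch groups character-level tags back into entity spans, using the first example sentence above. The function name `bio_to_entities` and the decision to ignore a stray I- tag that does not follow a matching B- tag are assumptions made for illustration; they are not part of the task definition.

```python
def bio_to_entities(sentence, tags):
    """Group character-level BIO tags into (start, end, type, text) spans.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    if len(sentence) != len(tags):
        raise ValueError("exactly one tag per character is required")
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.append((start, i, etype, sentence[start:i]))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # Assumption: an I- tag with no matching open entity is ignored.
    if start is not None:
        entities.append((start, len(tags), etype, sentence[start:]))
    return entities


sentence = "修復肌肉與骨骼最重要的便是熱量、蛋白質與鈣質。"
tags = ("O, O, B-BODY, I-BODY, O, B-BODY, I-BODY, O, O, O, O, O, O, O, O, O, "
        "B-CHEM, I-CHEM, I-CHEM, O, B-CHEM, I-CHEM, O").split(", ")
entities = bio_to_entities(sentence, tags)
for start, end, etype, text in entities:
    print(start, end, etype, text)
```

For this sentence the sketch recovers two BODY entities (“肌肉”, “骨骼”) and two CHEM entities (“蛋白質”, “鈣質”).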
| Entity Type | Description | Examples |
|---|---|---|
| Body (BODY) | The whole physical structure that forms a person or animal, including biological cells, tissues, organs and systems. | “細胞核” (nucleus), “神經組織” (nerve tissue), “左心房” (left atrium), “脊髓” (spinal cord), “呼吸系統” (respiratory system) |
| Symptom (SYMP) | Any feeling of illness or physical or mental change that is caused by a particular disease. | “流鼻水” (rhinorrhea), “咳嗽” (cough), “貧血” (anemia), “失眠” (insomnia), “心悸” (palpitation), “耳鳴” (tinnitus) |
| Instrument (INST) | A tool or other device used for performing a particular medical task such as diagnosis and treatments. | “血壓計” (blood pressure meter), “達文西手臂” (DaVinci Robots), “體脂肪計” (body fat monitor), “雷射手術刀” (laser scalpel) |
| Examination (EXAM) | The act of looking at or checking something carefully in order to discover possible diseases. | “聽力檢查” (hearing test), “腦電波圖” (electroencephalography; EEG), “核磁共振造影” (magnetic resonance imaging; MRI) |
| Chemical (CHEM) | Any basic chemical element typically found in the human body. | “去氧核糖核酸” (deoxyribonucleic acid; DNA), “糖化血色素” (glycated hemoglobin), “膽固醇” (cholesterol), “尿酸” (uric acid) |
| Disease (DISE) | An illness of people or animals caused by infection or a failure of health rather than by an accident. | “小兒麻痺症” (poliomyelitis; polio), “帕金森氏症” (Parkinson’s disease), “青光眼” (glaucoma), “肺結核” (tuberculosis) |
| Drug (DRUG) | Any natural or artificially made chemical used as a medicine. | “阿斯匹靈” (aspirin), “普拿疼” (acetaminophen), “青黴素” (penicillin), “流感疫苗” (influenza vaccination) |
| Supplement (SUPP) | Something added to something else to improve human health. | “維他命” (vitamin), “膠原蛋白” (collagen), “益生菌” (probiotics), “葡萄糖胺” (glucosamine), “葉黃素” (lutein) |
| Treatment (TREAT) | A method of behavior used to treat diseases. | “藥物治療” (pharmacotherapy), “胃切除術” (gastrectomy), “標靶治療” (targeted therapy), “外科手術” (surgery) |
| Time (TIME) | An element of existence measured in minutes, days or years. | “嬰兒期” (infancy), “幼兒時期” (early childhood), “青春期” (adolescence), “生理期” (menstrual period), “孕期” (pregnancy) |
The dataset includes 30,692 sentences with a total of around 1.5 million characters or 91.7 thousand words. After manual annotation, it contains 68,460 named entities across 10 entity types: body, symptom, instrument, examination, chemical, disease, drug, supplement, treatment, and time.
At least 3,000 Chinese sentences will be provided for system performance evaluation.
The policy of this shared task is an open test. Participating systems are allowed to use other publicly available data for this shared task, but the use of other data should be specified in the final system description paper.
Performance is evaluated by examining the difference between machine-predicted labels and human-annotated labels. We adopt standard precision, recall, and F1-score, which are the most typical evaluation metrics for NER systems, at the character level. A character in a test instance is regarded as correctly recognized if its predicted tag, which must be one of the defined BIO tags, is identical to the gold-standard tag. Precision is defined as the percentage of named entities found by the NER system that are correct. Recall is the percentage of named entities present in the test set that are found by the NER system. The F1-score is the harmonic mean of precision and recall.
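As an illustration of character-level scoring, the sketch below computes precision, recall, and F1 from a gold and a predicted tag sequence. It rests on one interpretation that the text does not spell out: only characters carrying a non-O tag are counted, and a predicted tag is correct when it is identical to the gold tag at the same position. The function name `char_level_prf` is hypothetical, and this is not the official scoring script.

```python
def char_level_prf(gold, pred):
    """Character-level precision, recall and F1 over non-O BIO tags.

    Assumption: O tags are excluded from both numerator and denominators.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must align")
    # A predicted entity character is correct if its tag equals the gold tag.
    correct = sum(1 for g, p in zip(gold, pred) if p != "O" and g == p)
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold tags for "如何治療胃食道逆流症?" and a hypothetical system output
# that starts the disease entity one character too late.
gold = ["O"] * 4 + ["B-DISE"] + ["I-DISE"] * 5 + ["O"]
pred = ["O"] * 5 + ["B-DISE"] + ["I-DISE"] * 4 + ["O"]
precision, recall, f1 = char_level_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this made-up prediction, four of the five predicted entity characters match the gold tags and six gold entity characters exist, so precision is 4/5, recall is 4/6, and F1 is 8/11.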