Information Extraction from Discharge Reports: A Roadmap

Selin Şahin, Serra Çelik, Sevinç Gülseçen

Discharge reports are an essential and detailed data source in health care. For extracting information from patients' discharge reports, natural language processing methods that use the co-occurrence of words to learn the hidden meanings in a text give more reliable results than other methods. Natural language processing can be described as conveying the language people speak to machines. Information extraction, which has become increasingly important in recent years, is one of the application areas of natural language processing. Information extraction processes unstructured data so that the entities in a text are found and, at the same time, stored in a database. In this way, a text written in natural language becomes machine-processable. Applying natural language processing methods to electronic discharge notes makes highly productive descriptive studies possible, and these can guide future work. Developments in information technology allow discharge notes to be kept and processed electronically. Information extraction from patient discharge notes, which are unstructured data kept in electronic form, yields a great deal of useful information. Models built with natural language processing can help doctors classify patients and thus choose appropriate treatment methods. In Python, natural language processing studies can be carried out with three libraries: NLTK, spaCy, and Gensim. NLTK performs many operations, such as classification, extracting sentences or words from a text, and tagging. spaCy integrates easily with deep learning and includes part-of-speech tagging, named entity recognition, and convolutional neural network models. Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora.
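The idea of word co-occurrence mentioned above can be illustrated with a minimal sketch that uses only the Python standard library rather than NLTK, spaCy, or Gensim. The example sentences and the function names `tokenize` and `cooccurrence_counts` are assumptions made for illustration; they are not taken from the study.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Use the distinct tokens so a pair is counted once per sentence;
        # sorting makes every pair appear as (smaller_word, larger_word).
        words = sorted(set(tokenize(sentence)))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical discharge-note sentences, invented for illustration only.
notes = [
    "Patient admitted with chest pain.",
    "Chest pain resolved after treatment.",
    "Patient discharged in stable condition.",
]
pair_counts = cooccurrence_counts(notes)
```

Pairs that recur across sentences, such as `("chest", "pain")` here, receive higher counts. Topic models and word embeddings build on statistics of this kind, although at far larger scale and with more careful preprocessing.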
The study aims to present a roadmap for extracting information from patients' discharge notes using natural language processing methods and for revealing the hidden meanings in these texts (discharge reports and notes).
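As a rough illustration of the step in which entities are found in unstructured text and stored in a database, the following sketch uses hand-written regular expressions and an in-memory SQLite database. The patterns, the entity labels, the table layout, and the sample note are all hypothetical; the study itself points to library-based approaches (NLTK, spaCy, Gensim), which this sketch does not reproduce.

```python
import re
import sqlite3

# Hypothetical rule set: a real system would rely on a trained named entity
# recognition model or a clinical vocabulary instead of these patterns.
PATTERNS = {
    "MEDICATION": re.compile(r"\b(aspirin|metformin|insulin)\b", re.IGNORECASE),
    "DOSAGE": re.compile(r"\b\d+\s?mg\b", re.IGNORECASE),
    "SMOKING_STATUS": re.compile(
        r"\b(non-smoker|former smoker|current smoker)\b", re.IGNORECASE
    ),
}


def extract_entities(text):
    """Return (label, matched_text, start_offset) tuples in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


def store_entities(conn, note_id, entities):
    """Write the extracted entities of one note into a table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities "
        "(note_id INTEGER, label TEXT, surface TEXT, start_offset INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?)",
        [(note_id, *entity) for entity in entities],
    )
    conn.commit()


# Invented sample note, for illustration only.
note = "Former smoker. Discharged on aspirin 100 mg daily and metformin 500mg."
entities = extract_entities(note)

conn = sqlite3.connect(":memory:")
store_entities(conn, 1, entities)
rows = conn.execute(
    "SELECT label, surface FROM entities ORDER BY start_offset"
).fetchall()
```

Once the entities sit in a table, they can be queried like any structured data, for example to group patients by smoking status. Replacing `extract_entities` with a statistical model would leave the storage step unchanged.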

Keywords: Discharge notes, natural language processing, information extraction, roadmap

DOI: 10.26650/B/ET07.2023.005.11; IUP: 10.26650/B/ET07.2023.005.11

Taburcu Raporlarından Bilgi Keşfi: Bir Yol Haritası (Information Discovery from Discharge Reports: A Roadmap)

Selin Şahin, Serra Çelik, Sevinç Gülseçen

Discharge notes are a highly important and detailed data source in the field of health. For extracting information from patients' discharge notes, the natural language processing method, which considers the co-occurrence of words in order to learn the hidden meanings discussed in a text, gives more reliable results than other methods. Natural language processing can be expressed as conveying the language people speak to machines. Information extraction, a topic whose importance has grown in recent years, is one of the application areas of natural language processing. With information extraction, unstructured data are processed so that the entities in a text are found and, at the same time, stored in a database. A text written in natural language thus becomes machine-processable. If natural language processing methods are applied to electronic discharge notes, highly productive descriptive studies can be carried out, and these can guide future work. Developments in information technology make it possible to keep and process discharge notes electronically. Information extraction from patient discharge notes, which are unstructured data kept in electronic form, yields a great deal of useful information. Models built with natural language processing can enable doctors to classify patients and thereby use appropriate treatment methods. Natural language processing studies in Python can be carried out with three libraries (NLTK, spaCy, Gensim). NLTK performs many operations, such as classification, extracting sentences or words from a text, and tagging. spaCy integrates easily with deep learning and includes part-of-speech tagging, named entity recognition, and convolutional neural network models. Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora.
The study aims to set out a roadmap for extracting information from patients' discharge notes using natural language processing methods and for revealing the hidden meanings in these texts (discharge reports and notes).

Keywords: Discharge note, natural language processing, information discovery, roadmap


Chapter in: Medical Informatics III