CHAPTER


DOI: 10.26650/B/T3.2024.40.002

Natural Language Processing in Bioinformatics

Orçun Taşar, Çiğdem Selçukcan Erol

As in many other fields, the growing volume of data produced in biology and medicine has made both the data and the processes that turn it into information more complex. Bioinformatics, which offers powerful methods for transforming and organizing data in these domains, has therefore needed new tools. Over the last decade in particular, the incorporation of machine learning (ML) and deep learning (DL) methodologies has greatly enhanced bioinformatics practice. Structured data has long been used in ML and DL studies to derive predictive outcomes in clinical and biomedical settings; with the advent of natural language processing (NLP) within artificial intelligence, unstructured textual data can now be processed as well. NLP approaches can be used to automate time-consuming, costly, and error-prone routine tasks, and can even detect relationships between biomedical entities that have long been present in the literature but had gone unnoticed. Thanks to innovative bioinformatics and AI-powered NLP techniques, the fields of biology, genetics, medicine, and pharmaceuticals are rapidly transforming. This book chapter discusses use cases of NLP within bioinformatics on data from biology and medicine, illustrated with current real-life examples.



Biyoenformatikte Doğal Dil İşleme (Natural Language Processing in Bioinformatics)

Orçun Taşar, Çiğdem Selçukcan Erol

As in many other fields, the growing volume of data produced in biology and medicine has made the processes for turning that data into information increasingly complex; as a result, bioinformatics, a discipline that offers many methodologies for processing and organizing data in these fields, has come to need new tools. Accordingly, the application areas of bioinformatics have expanded considerably, especially over the last decade, with the addition of machine learning and deep learning methodologies. Structured data has long been used to obtain predictive outputs in clinical and biomedical fields, and with the development of natural language processing (NLP) methods it has become possible to use unstructured data, such as written text, to obtain similar predictions. Natural language processing approaches are used to automate costly, time-consuming, and error-prone routine operations, and they also help uncover relationships between biomedical entities that have long existed in the literature but remained hidden. Thanks to AI-supported natural language processing techniques, fields such as biology, genetics, medicine, and drug discovery are rapidly transforming. This book chapter discusses the use of natural language processing methods in bioinformatics, with examples from studies conducted in biology and medicine.









Istanbul University Press aims to contribute to the dissemination of ever-growing scientific knowledge by publishing high-quality scientific journals and books in accordance with international publishing standards and ethics. Istanbul University Press follows an open-access, non-commercial, scholarly publishing model.