Research Article


DOI: 10.26650/d3ai.003   IUP: 10.26650/d3ai.003

Analysis of Word Similarities in Tax Laws Using the Word2Vec Method

Ali İhsan Özgür Çilingir

This paper describes word similarity analysis in tax law using the Word2Vec model. By similarity analysis, we mean identifying relationships between similar terms in tax terminology. The Word2Vec model represents the meanings of words with vectors and identifies the semantic relationships of words through the proximity between these vectors.
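As a minimal sketch of what "proximity between vectors" can mean, the snippet below ranks words by cosine similarity, a measure commonly used with Word2Vec embeddings. The three-dimensional vectors are hypothetical placeholders chosen for illustration; they are not taken from the paper's trained model, and the helper names `cosine_similarity` and `most_similar` are assumptions of this sketch.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (NOT real embeddings from the paper's model).
toy_vectors = {
    "mükellef": [0.9, 0.1, 0.3],
    "firma": [0.8, 0.2, 0.4],
    "dar": [0.1, 0.9, 0.2],
}


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("mükellef", toy_vectors)
for other, score in ranking:
    print(f"{other}: {score:.3f}")
```

With these placeholder values, 'firma' ranks above 'dar' because its vector points in nearly the same direction as that of 'mükellef'. A trained model would learn such vectors from the tax law corpus rather than having them set by hand.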

This article analyzes the semantic proximity of terms frequently used in tax law and visualises the relationships between these words. For example, the close relationships of the word ‘mükellef’ (taxpayer) with words such as ‘kişi’ (person), ‘tam’ (full), ‘dar’ (limited), ‘firma’ (firm), and ‘imalatçı’ (manufacturer) are represented through vectors. The paper also explains the mathematical structure of the models. It then describes the features of the Python libraries NumPy, Gensim, Scikit-learn, and Matplotlib that are used in the study. To visualise the similarity analysis, the t-SNE algorithm, which projects high-dimensional data onto a two-dimensional plane, was used.
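The projection step can be sketched as follows, assuming NumPy and scikit-learn are installed. The vectors here are random placeholders standing in for trained embeddings, and the dimensionality, perplexity, and seed are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

# Words taken from the abstract's example.
words = ["mükellef", "kişi", "tam", "dar", "firma", "imalatçı"]

# Placeholder 50-dimensional vectors (random, NOT trained embeddings).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(words), 50))

# Perplexity must be smaller than the number of samples, so a small
# value is used for this six-word toy example.
tsne = TSNE(n_components=2, perplexity=2.0, init="random", random_state=0)
coords = tsne.fit_transform(vectors)

# Each word now has an (x, y) position that could be passed to a plotting
# library such as Matplotlib.
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

Because the input is random, the resulting layout carries no meaning; with vectors from a trained model, semantically related terms would be expected to appear near each other on the plane.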

The main purpose of this paper is to enable AI systems that can be used as tax advisors to better understand tax law by modelling the conceptual relationships between the terms of tax law, thus contributing to the provision of more accurate and consistent information by AI.




Citation


APA

Çilingir, A.İ. (2025). Analysis of Word Similarities in Tax Laws Using the Word2Vec Method. Journal of Data Analytics and Artificial Intelligence Applications, 0(0), -. https://doi.org/10.26650/d3ai.003



TIMELINE


Submitted: 06.12.2024
Accepted: 03.01.2025
Published online: 20.01.2025

LICENSE


Attribution-NonCommercial (CC BY-NC)

This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.



İstanbul Üniversitesi Yayınları (Istanbul University Press) aims to contribute to the dissemination of ever-growing scientific knowledge by publishing high-quality scientific journals and books in accordance with international publishing standards and ethics. It follows an open-access, non-commercial model of scientific publishing.