A New Programming Language for Data Science: Julia
Münevver Turanlı, Ünal Halit Özden
Users working in scientific programming and data science need a fast, flexible, dynamic, high-performance programming language that makes writing and prototyping code easy. Many programming languages are used in the data science world. Some of these are very fast but difficult to learn and write, while others are very easy to write but run slowly. The relatively new Julia is a high-performance programming language that aims to overcome this problem by being both fast and easy to code. The purpose of this article is therefore to introduce Julia and to compare it with the other programming languages used in statistics and data science. The article also aims to help researchers, especially those interested in statistics and data science, learn about Julia and choose the language best suited to them.
Many programming languages are used in the world of data science. Some of these languages are very fast but difficult to learn and write, while others are very easy to write but run slowly. As a result, users often prototype in an easy language and then rewrite the performance-critical parts in a fast one. This is known in the literature as the two-language problem, and Julia is one programming language that aims to overcome it.
In data science, Python is used mostly in the business world, while R is more common in academia. Although Julia is a general-purpose programming language, it is also one of the strongest languages in the field of data science and is even considered by some to be the future of scientific computing and scientific data analysis.
Julia (Bezanson et al., 2017) is a relatively new programming language that was first released in 2012 and aims to be both easy to use and fast compared to other programming languages. Julia has been described as running like C while reading like Python (Perkel, 2019). Julia is a dynamically typed language, which means that the type of a variable need not be specified when it is declared but is instead determined at run time. Julia can process large amounts of data and computation, and creating, manipulating, and prototyping code are all very easy in Julia. Julia is known for its speed and for its ability to interface easily with other programming languages, which makes it a popular choice for data science and machine learning applications. Julia is also one of the rare programming languages grouped with C, C++, and Fortran in the Petaflop Club, a group of languages that have achieved peak performance exceeding one petaflop (10^15 floating-point operations per second).
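The dynamic typing described above can be illustrated with a minimal sketch. The variable name x and the helper function double are hypothetical names chosen for this illustration and do not come from the article.

```julia
# In Julia the type belongs to the value, not to the variable name,
# so the same name can be rebound to values of different types.
x = 42               # an integer (Int64 on a 64-bit system)
t1 = typeof(x)

x = 3.14             # the same name now refers to a Float64
t2 = typeof(x)

x = "Julia"          # and now to a String
t3 = typeof(x)

# Type annotations are optional. Here one restricts the argument to numbers,
# and Julia compiles a specialized version for each concrete type it sees.
double(v::Number) = 2v

println(t1, " ", t2, " ", t3)
println(double(21), " ", double(1.5))   # prints "42 3.0"
```

No type is declared for x at any point; each type is determined from the value when the code runs.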
Julia is an open-source programming language released under the MIT license. While it operates as a general-purpose language for use in any kind of application development, many of its features are tailored to numerical analysis and data science. In other words, Julia is a high-level programming language designed for numerical and scientific computing, and it therefore performs well on numerical workloads. In addition, Julia offers facilities for distributed (parallel) processing and a large library of mathematical functions, and its syntax is similar to that of programming languages such as Python and Ruby. Julia is also at the forefront of many other computational fields such as data science, machine learning, artificial intelligence, and statistics.
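As a small illustration of the numerical orientation mentioned above, the following sketch solves a linear system using only Julia's standard library. The matrix A and the vector b are made-up values chosen for this example.

```julia
using LinearAlgebra   # standard library; no package installation needed

# A small, made-up linear system A * x = b.
A = [4.0 1.0;
     2.0 3.0]
b = [1.0, 2.0]

# The backslash operator solves the system and picks a suitable factorization.
x = A \ b

# The residual norm should be close to zero if the solve succeeded.
residual = norm(A * x - b)

println("x = ", x)
println("residual = ", residual)
```

The notation stays close to the mathematics, which is part of what the article means when it says the language is tailored to numerical work.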
Julia is newer than most other programming languages. Because less accumulated knowledge exists and fewer people use the language, learning about Julia and its ecosystem can be difficult. In addition, newcomers may struggle to know where to start or how to install and use the different packages available for the language. For this reason, this article aims to introduce Julia as a relatively new programming language and to compare it with other programming languages used in statistics and data science. This article additionally aims to help researchers, especially those interested in statistics and data science, learn about Julia and choose the language most suitable for them.
Julia’s most important features are its speed, performance, and ease of use. It has been designed to combine the performance of C++, the general-purpose programming ability of Python, and the statistical power of R. One of Julia’s outstanding features is its ability to use code written in other programming languages such as Python, R, Java, and C with the help of packages such as PyCall, RCall, and JavaCall. In this way, code written in these languages can be referenced and executed from within Julia. Meanwhile, for those who want to learn Julia, many printed and online resources for teaching the basic features of the language exist and are being developed rapidly, although they are still not as numerous as for other languages. However, because it is a new programming language, Julia can also be said to have technical disadvantages: compared with more established high-level languages such as Python, it still needs some further development to reach the same level of stability.
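The interoperability described above can be sketched without installing any package. PyCall, RCall, and JavaCall must be installed separately, so this example swaps in Julia's built-in ccall to call the C standard library function strlen directly. It assumes a platform where the C runtime exposes strlen to the Julia process, as is typical on Linux and macOS.

```julia
# Call the C function `size_t strlen(const char *s)` from Julia.
# Arguments to ccall: the function name, the return type, a tuple of
# argument types, and then the actual arguments.
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")

# Csize_t is an unsigned integer type; convert it for readable printing.
println(Int(len))   # prints 5
```

With PyCall installed, the analogous pattern for Python is to load a module with pyimport("math") and then call its functions from Julia.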
Ultimately, programming languages are simply tools, and as tools change and develop, they may be widely used for a time before eventually losing popularity and becoming obsolete. Therefore, focusing on the fundamentals of the field (i.e., the domain knowledge of statistics and data science) and, most importantly, on how to apply them is more important than focusing too heavily on the tools used. Whether one codes in Python, R, or Julia, knowledge of coding will not do much good if one lacks proficiency in data science or statistics and does not know how to identify the problem or ask the right questions. In contrast, a good foundation in statistics and data science will make it easier to learn any programming language.