
Languages of Machines and their Makers

Welcome to thePortus! For my inaugural post, I thought I would share general thoughts on a question my students asked last semester: are programming languages anything like human languages?

Although people have long recognized that human and programming languages share a number of characteristics, scholars have only just begun to plumb how far the resemblance goes and what its full ramifications are.

Humans have human languages with which we communicate. These are fractured and split, and so not all humans can communicate with one another. Machines have their internal language of expressing commands (machine code). Finally, there are the means of bridging these two incompatible means of expression: programming languages.

How are programming languages (or math) language? You might have heard that math is the universal language. This is more than a platitude: mathematicians, linguists, and philosophers have argued that mathematics contains a full range of semantic expression. Thus computation, even on the level of machine code, is still language in some form.

  • Parsing Language
  • Language Families
  • Under the Hood
  • Interpretations
  • The Human Machine

As for programming languages, the connection is much clearer. Someone with training in formal logic would find that programming simply seems like a much more robust manner of logical expression. Formal logic ultimately traces its roots back to Aristotelian logic, which was certainly expressed in a thoroughly linguistic manner.

You can break down programming code into a series of statements, which can be parsed just like a sentence. In fact, parsing is a common term in programming, referring to any process whereby you break a larger bit of information into discrete chunks that can be analyzed (for example, a compiler parses a computer program on the way to turning it into machine code, a program parses the credit card number you entered into a series of discrete digits…).
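To make those two examples concrete, here is a minimal sketch in Python (chosen purely for illustration). The function names `parse_card_number` and `parse_statement` are hypothetical, and the standard-library `tokenize` module only stands in for the very first step of what a real compiler or interpreter does.

```python
import io
import tokenize


def parse_card_number(raw):
    """Break a typed-in card number into a list of discrete digits."""
    # Spaces and dashes are dropped; only the digit characters are kept.
    return [int(ch) for ch in raw if ch.isdecimal()]


def parse_statement(source):
    """Break one line of Python source into its meaningful tokens."""
    # Structural tokens (line ends, indentation markers) are filtered out.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in skip]


digits = parse_card_number("4111 1111-1111 1111")  # made-up test number
words = parse_statement("x = x + 1\n")
print(digits)
print(words)
```

In both cases the input arrives as one undifferentiated string and leaves as a list of small pieces that a later step can reason about, which is all that parsing means at this level.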

Though human and computer brains have very different styles of parsing, the fact remains that this is still the most fundamental way for both to interpret information. Take a look below at how we parse sentences traditionally, and how a computer attempts to break one down into semantic meaning. The first is a traditional grammar exercise of parsing and diagramming a sentence. The second is an illustration of how sentence parsing is similar to binary expression. The last image is from the Python module NLTK, the Natural Language Toolkit.

Let’s parse a bit of code here. Take a look at a bit of my own code. What you see is part of a short function (we’ll talk about that another day). This bit receives a variable ($rowvalue), which represents a row in a two-dimensional array (similar in effect to a row in a spreadsheet). It instructs the machine to take this row variable and break it up into individual cells. As it goes through each cell (foreach), it creates two variables: $cellkey, which represents the column name under which the cell would be found on a spreadsheet, and $cellvalue, which represents the actual value found in the cell.

Then, cell by cell and line by line, this function takes whatever array you send to it and spits it out as a .csv file (there is more code to this elsewhere).

PHP Snippet



1|     foreach($rowvalue as $cellkey=>$cellvalue){
2|          if($firstvalue==true){
3|               fwrite($writefile,$cellvalue);$firstvalue=false;
4|          }
5|          else{
6|               fwrite($writefile,';' . $cellvalue);
7|          }
8|      }
 

First, notice the semicolons that end lines 3 and 6. Each one indicates the end of a command statement. Although a full statement can involve far more than a single line, the semicolon signifies the end of what is, in essence, a sentence. Just like real sentences, these statements can be broken down and parsed. If you think of it like this, suddenly computers seem comfortably at home in the world of the humanities.

Line 1 contains a temporal clause, as it specifies (foreach) that the expression in the lines below is only valid at certain points along a sequence (in this case, entries in an array of values). The temporal aspect is even clearer in a cousin of the foreach clause known as a while clause (e.g. while (x < 3){x=x+1;}, meaning: while x is less than three, add one to x, causing the program to loop until x hits 3).
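The two looping clauses just described can be sketched in Python as well (used here only as a side illustration; the snippet under discussion is PHP). The names `row`, `seen`, and `x` and the sample data are hypothetical.

```python
# A "for each" loop: the body is valid once per entry in the sequence.
row = {"name": "Cicero", "born": "-106"}  # hypothetical spreadsheet row
seen = []
for cell_key, cell_value in row.items():
    seen.append((cell_key, cell_value))

# A "while" loop: the body repeats for as long as the condition holds.
x = 0
while x < 3:
    x = x + 1

print(seen)
print(x)
```

The first loop stops when it runs out of entries; the second stops the moment its condition becomes false, which is why x ends at exactly 3.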

Line 2 of the PHP snippet contains a conditional statement, if($firstvalue==true){, which could be expressed in English as ‘if the variable named firstvalue should happen to be true (a 1), then…’. Thus, the indented code on the line below is only executed if the condition is met (if so, the condition itself evaluates to true, a 1). On line 5 you can see that there is an option for an alternate branch (else), and with further conditions chained on there could be any number more.
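For comparison, here is a rough Python rendering of the same loop and conditional. It is a sketch rather than the original function: the name `write_row` and the sample row are assumptions, and an in-memory `io.StringIO` buffer stands in for the `$writefile` handle.

```python
import io


def write_row(write_file, row_value):
    """Write one row's cells to write_file, separated by semicolons."""
    first_value = True
    for cell_key, cell_value in row_value.items():
        if first_value:
            # Same role as lines 2-3 of the PHP: no separator before cell one.
            write_file.write(str(cell_value))
            first_value = False
        else:
            # Same role as lines 5-6: every later cell gets a leading ";".
            write_file.write(";" + str(cell_value))


buffer = io.StringIO()  # hypothetical stand-in for an open file
write_row(buffer, {"name": "Cicero", "born": -106})
line = buffer.getvalue()
print(line)
```

The flag exists only so that the separator goes between cells rather than in front of the first one; the grammar of the if/else is otherwise the same in both languages.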

This is just a smattering; programming languages are thoroughly flexible. Even within one language, the range of ways that one programmer might express commands (thus, the way they accomplish getting their ‘point’ across to the computer) can vary as widely as human writers. Some programmers are short and concise, others are verbose and complex. Some are elegant, others are sloppy. There are many different ways to do things, and what in one situation calls for supremely minimalistic efficiency, in another demands a complex approach. As a programmer becomes more familiar with a language, they find their own style as distinctive as any author.
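As a small illustration of two "authors" saying the same thing, here is a Python sketch with a verbose and a concise version of one task: joining cells with semicolons. Both function names and the sample data are made up for the example.

```python
def join_verbose(cells):
    """Spell out every step: a flag, a loop, and an explicit branch."""
    result = ""
    first = True
    for cell in cells:
        if first:
            result = result + str(cell)
            first = False
        else:
            result = result + ";" + str(cell)
    return result


def join_concise(cells):
    """Say the same thing in a single expression."""
    return ";".join(str(cell) for cell in cells)


cells = ["Cicero", -106, "Arpinum"]
verbose = join_verbose(cells)
concise = join_concise(cells)
print(verbose == concise)
```

Neither version is wrong. The first shows its reasoning step by step, the second trusts the reader to know what join does, and which one is better depends on who will be reading it.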

As much variation as you see between programmers within a language, the variation between programming languages is enormous. These languages give programmers the space and freedom to do easily things that would not normally be possible, but they also limit them in other ways. And, just like real languages, there is no single solution. Programming languages are not simply ‘advancing’ toward a perfect one.

While things have gotten better, different languages are good at doing different things. And most often, it is precisely what makes a language good at one task that makes it bad at another. Think of the difference between the often cutting efficiency of Latin and the enormous semantic range that a large vocabulary gave the Greek speaker, enabling a very different kind of powerful expression.

In programming, sometimes you want a language that gives you the power and flexibility to do the same thing a dozen different ways (Perl), and sometimes you want a language with Spartan-like efficiency (Python): if there is but one way to skin a cat, you don’t need many words, which makes it easier to write faster and to write less.

To give you a sense of how seriously the Python community takes this, here are a few select quotes from the Zen of Python, which reflects the aesthetic principles of the language.

Zen of Python


Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Sounds a lot like Latin to me! By the way, these principles apply both to how the language expresses itself to a computer and to how it expresses itself to a person. This can range from matters as crucial as grammar and syntax to what seem like cosmetic differences. For example, in MySQL, whitespace (meaning what you get when you hit the space key) is largely insignificant. In Python it is not: spaces and indentation matter critically. So this seemingly cosmetic difference affects functionality. By using spaces alone you can express the relationship of various statements to one another, much like adding clauses on to a sentence. This allows coders to write code that is easier for other humans to read, so that one coder can write on one side of the planet and cooperate with others on the other side of the globe.
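Here is a small sketch of what it means for indentation to matter in Python. The two hypothetical functions below differ only in how far one line is indented, yet they compute different things, because the indentation decides whether that line belongs to the loop.

```python
def total_inside(values):
    total = 0
    for value in values:
        total = total + value
        total = total * 2  # indented: part of the loop, runs on every pass
    return total


def total_outside(values):
    total = 0
    for value in values:
        total = total + value
    total = total * 2  # dedented: runs once, after the loop has finished
    return total


inside = total_inside([1, 2, 3])
outside = total_outside([1, 2, 3])
print(inside, outside)
```

In a language with braces, the same distinction would be drawn by where the closing } falls; Python makes the visual layout and the grammar one and the same.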

The chart above shows the differences in major design principles behind a number of current languages, as well as how they relate to one another through their adherence to stylistic principles. These design guidelines affect the approaches in the deep grammar of these languages, and drastically affect how programmers write their code. Many coders end up having strong opinions on different languages, often on the basis of their tastes vis-à-vis design principles.

Languages have developed over the years, and forked in almost exactly the same manner as human languages. There are branches and sub-branches, families and even superfamilies of languages. One can more easily move from C++ to JavaScript than to Perl, just as one can more easily move from Latin to Italian than to Japanese. These dialectal differences are very real, and the deep grammatical assumptions of mother languages often very much control the grammatical possibilities of their descendants, no matter how different they look on the surface (think of Indo-European).

Above are three images. The first represents the evolution of the earliest computers, from ENIAC, the earliest ancestor of modern computers, to about the 1970s. Much of this represents an era before full programming languages, forcing programmers to use punch cards to enter commands mathematically. The second is the family tree of Indo-European languages. The third is a chart of the lifespans and timelines of the major modern programming languages. As you can see, the chart is huge, and it represents only a tiny fraction of the linguistic diversity of the programming world.

There are languages that have become the mother languages of the majority of written code out there, and there are languages that were developed between a group of friends. There are even languages that were developed for the sole purpose of acting as an interpreter between two other languages, allowing programs to utilize more than one language at a time. Some languages are not used very much on their own, but primarily in conjunction with another language, because they act as a sort of lingua franca, allowing a highly specialized language to talk to scripts from others, in the same way that English (and other languages, thus the franca) allows speakers of hundreds of different languages to reach each other through a common middle ground. Just like human languages, the ‘common tongue’ in vogue at any one time has changed too.

Internally, machines have ‘machine code’, by which they operate on a very deep level. Machine code sets the state of every bit (short for binary digit). Each bit can be set to one of two states (thus the bi- in binary), most often represented as 0 and 1. Bits are the most fundamental unit of computer memory storage and the source of all the calculation power of modern computing. Raw binary is too abstract and complex for any human to process in anything even close to real time. So, those scenes in The Matrix are kind of bunk… and no one is shocked.
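To get a feel for what those 0s and 1s look like, here is a minimal Python sketch that shows the bits behind a short piece of text. It illustrates stored data rather than machine code itself; the helper name `to_bits` is hypothetical, and the sketch assumes the text is stored as UTF-8 bytes.

```python
def to_bits(text):
    """Return the text as groups of eight binary digits, one group per byte."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


bits = to_bits("Hi")
print(bits)  # prints: 01001000 01101001
```

Two letters already take sixteen binary digits, which gives some sense of why nobody reads a running program this way.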

For now, all computers run on this binary system, and they have grown faster largely by shrinking individual components so that we can simply jam more circuits per square inch on every board and chip. The famous theoretical physicist Richard Feynman posited that this was one of the most central issues facing the future of computing, and that at a certain point, the rapid developments in processor speeds would slow down, eventually reaching an upper limit that would halt further technological development….

… which is why he suggested the concept of a quantum computer. At long last, it seems that researchers are closing in on making this a reality. They have demonstrated that the concept works, but it is far from being practical. Instead of just 2 states of a circuit (on or off), the qubit (or qbit) uses the quantum states of particles such as electrons, which can be a blend of both at once, in order to make calculations that are… well… exponentially faster doesn’t even cut it. Calculations are made off of electrons that, according to quantum theory, both are and are not in a given state at exactly the same time. This same principle underlies the Quantum Multiverse theory, that differing probabilistic outcomes of these electron states are the fundamental differences which separate the different ‘dimensions’ (used loosely here) of the multiverse. Thus, using mathematics, a bit of know-how, and of course, magic, quantum computers rely on calculations involving other worlds! Don’t hit format c:\ (!)

But in order to translate the complex and infinitely flexible desires of a user into a series of binary states, machines have programming languages, which allow them to do two things.

  • (A) Communicate with human beings.
  • (B) Communicate with other computers.

Thus, we enter into somewhat of a linguistic relationship with machines (namely, by programming). That relationship is more two-way than you might think. Programmers find that they not only tend to think about programming in terms of the range of possibilities offered by the languages that they know, but that this can bleed into their approaches to everyday life. Wittgenstein long ago noted that language both gives us the range with which to think and sets limits on precisely that range.

Just as the individual characteristics of human languages offer different opportunities and limitations, so do programming languages. Thus, the range of that relationship is determined by several factors.

  • First, the programmer’s fluency in one or more languages with which to communicate with the machine.
  • Second, the ability of a language to express syntactically, semantically, and on a very deep level, grammatically, what it is the programmer wishes to code.
  • Third, the ability of a compiler or script reader to successfully parse and interpret those human commands and to translate them into machine code.
  • Fourth, the ability of all the other pieces of hardware to receive further instructions.
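The third factor in the list above can be glimpsed from inside Python. This is only a sketch under a stated assumption: CPython translates source into bytecode for its own virtual machine rather than into native machine code, so it stands in for the idea and not for what a C compiler does. The source string and the `<sketch>` filename are made up for illustration.

```python
import ast
import dis

source = "answer = 6 * 7"  # hypothetical one-line program

# Step 1: parse the human-written text into a syntax tree.
tree = ast.parse(source)
first_node = type(tree.body[0]).__name__

# Step 2: translate the tree into low-level instructions (bytecode).
code = compile(tree, "<sketch>", "exec")
op_names = [instruction.opname for instruction in dis.get_instructions(code)]

# Step 3: let the machine carry the instructions out.
namespace = {}
exec(code, namespace)

print(first_node)
print(op_names)
print(namespace["answer"])
```

The exact instruction names printed in the middle vary between Python versions, which is itself a reminder that the human-facing language and the machine-facing one are separate layers.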

One machine (your CPU) receives input from another, much more basic machine (your keyboard, which is not a computer but is a machine), and then in turn displays the results of that input on a third machine, your monitor, very likely through an installed graphics card. The graphics card is technically another computer, which receives calculation instructions from the CPU and runs the intensive mathematical calculations required for graphic displays (especially 3D), relieving the main CPU of the burden.

So, these ‘languages’ need to express a large semantic range through grammatical relationships that are nearly infinitely flexible, and that can at the same time be turned into hugely complex patterns of binary states.

Is it so strange, after all? Think of the terminology involved in coding. First off, there is code, coding, and encoding. You might first tend to think of the meaning ‘secret’. However, the term actually harkens back to the original meaning of the word, from the Latin codex. The codex was what you would today consider a book (as opposed to a papyrus scroll). Though they were around for most of antiquity, they were not really popular until the later Roman Empire.

Think of the other words involved: parsing, syntax, …. All are linguistic terms as well, and not by accident. Perhaps the most ubiquitous word in computing, the program, is simply taken from the sense meaning ‘writing’ in the Greek word programmata. We should remember that the basis for computers and mathematical knowledge is logic, from that oh-so-flexible word logos, which is Greek for… well… word (and so many, oh so many other things).

Should it really weird us out so much? After all, our brains are gigantic calculation machines. As it currently stands, no matter how impressive we think computers are (and they are), all of the data storage on all of the hard drives in the entire world does not add up to enough capacity to store the data from a cubic centimeter of the human neural network.

Our brains are very much machines, constantly running millions of calculations per second so that we can successfully walk and talk (or not, if you are me). In such a process innumerable calculations are made in the Primary Motor Cortex about the balance and placement of the body with little conscious thought, as one section of the brain receives and interprets signals and sends them out through the nervous system for the peripheral hardware (i.e. the limbs) to execute.

At the same time a different CPU on the motherboard, the Prefrontal Cortex (PFC, home of much of your conscious thought), can decide that it wishes to eat a meal, and so sends a signal to the Primary Motor Cortex (PMC) to turn around and head to the kitchen. Then again, the PFC only decided to go to the kitchen after receiving instructions from the PMC that one’s blood sugar was low.

All of this is dependent upon these different CPUs being able both to run calculations internally and to communicate vast amounts of information among one another. In addition, both brain and computer use a base level of communication (the brain stem) which, like the BIOS, is employed to send direct commands to the ‘dumb’ peripherals (legs, hands, eyes… or monitors, keyboards, and speakers).

So, despite the all too common trope that machines are the antithesis of humanity, the digital world and the human mind are intimately connected. From programming languages to the mechanical operations of the machines themselves, we should not be surprised to find that the machines reflect their human makers.


