English as a programming language

Hello from 1980!

The first programming language I ever learned was called BASIC. This is ancient history now but back when I was a kid, BASIC was the gateway drug for any aspiring computer geek. As a programming language, BASIC was quite, well, basic – it consisted of a small number of keywords like DATA, READ, LET and PRINT (yes, you were supposed to write them in uppercase), you had to number your lines (which you were recommended to do in increments of 10 so you could insert additional lines later) and I’m not sure if it could even do loops. If it could, it probably had to be done with the GOTO command followed by a line number, which even back then had the elegance of a bucket of sludge.
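For readers who never met numbered lines and GOTO, here is a rough sketch of how such a loop behaves. It is written in Python rather than BASIC, and it is only a toy: the function name `run_basic`, the sample program and the tiny subset of statements it understands (LET, PRINT, GOTO, END and an IF that only knows `<=`) are assumptions made for illustration, not a faithful model of any real BASIC dialect.

```python
def run_basic(source: str, max_steps: int = 1000) -> list[str]:
    """Run a tiny BASIC-like program and return the lines it prints."""
    # Map each line number to its statement, e.g. {10: "LET I = 1", ...}.
    program: dict[int, str] = {}
    for raw in source.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        number, statement = raw.strip().split(" ", 1)
        program[int(number)] = statement.strip()

    order = sorted(program)  # execution follows line numbers, not typing order
    variables: dict[str, int] = {}
    output: list[str] = []
    pc = 0  # position in `order` of the next line to execute

    def value(token: str) -> int:
        # A token is either a known variable name or an integer literal.
        return variables[token] if token in variables else int(token)

    for _ in range(max_steps):  # guard against accidental endless loops
        if pc >= len(order):
            break
        words = program[order[pc]].split()
        pc += 1
        keyword = words[0]
        if keyword == "LET":      # LET I = 1   or   LET I = I + 1
            result = value(words[3])
            if len(words) == 6 and words[4] == "+":
                result += value(words[5])
            variables[words[1]] = result
        elif keyword == "PRINT":  # PRINT I
            output.append(str(value(words[1])))
        elif keyword == "IF":     # IF I <= 3 THEN GOTO 20
            if value(words[1]) <= value(words[3]):
                pc = order.index(int(words[6]))
        elif keyword == "GOTO":   # GOTO 20
            pc = order.index(int(words[1]))
        elif keyword == "END":
            break
    return output


# A counting loop built from a conditional jump back to line 20.
PROGRAM = """
10 LET I = 1
20 PRINT I
30 LET I = I + 1
40 IF I <= 3 THEN GOTO 20
50 END
"""

print(run_basic(PROGRAM))  # prints ['1', '2', '3']
```

Because the interpreter sorts by line number, a line numbered 25 typed at the very end of the listing would still run between lines 20 and 30, which is exactly why leaving gaps of 10 was the recommended habit.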

I’m not saying all this to reminisce about prehistoric programming languages, though. The point is, I knew BASIC before I knew English. When I finally started studying English in secondary school, it didn’t feel so strange because, helpfully, the keywords were the same. I was delighted to discover that the English for PRINT is “print”, the English for READ is “read” – just like BASIC! Constructing a basic (!) sentence in English felt like constructing a line of programming code.

This is the complete opposite of how it must have been for kids of my age in English-speaking countries. The reason why pedagogical programming languages like BASIC were so popular with bespectacled kids was because they were deliberately designed to resemble English. Many kids found programming easy because they already knew English. I, on the other hand, found English easy because I already knew programming. Either way, it worked.

Of course, the parallels between programming languages and natural languages are deeper than just pedagogical usefulness. If you’ve studied formal linguistics, you must have come across the observation that formal grammars (including regular grammars and regular expressions, finite-state grammars and stuff like that), even though they were originally invented as an attempt to describe natural languages, are infinitely more useful in parsing and compiling programming languages. But that’s not what I want to talk about now. The deep spiritual parallels between computer languages and human languages don’t interest me now. Instead, I want to talk about some fairly superficial similarities between computer languages and one particular human language: English.

Every programming language has a vocabulary of keywords and those almost always come from English. This was the case with BASIC and it still is the case today in modern “industrial-strength” programming languages like Java and C#. Even declarative and mark-up languages like HTML and XSLT take their vocabulary from English. Query languages like SQL also take their vocabulary from English. The recent rise in demand for agile and functional programming has spawned a lot of new programming languages and those, again, tend to take their vocabulary from English. Has anyone noticed how incredibly monocultural, arbitrary and unnecessary this is?

In theory, the keywords of a programming language can come from any old natural language or from none at all. Previously invented formal notations, such as those of algebra and chemistry, have tended to use abstract symbols that are largely independent of any natural language. But in computer languages, it seems an unquestioned convention that the vocabulary of symbols must be borrowed from a natural language, and that this language must be English.

English is the default choice for two reasons, one cultural and one linguistic. The cultural reason is that, since the beginning of IT history, English has been the working language of those who drive innovation in the field. The linguistic reason is that, by pure coincidence (or perhaps not?), English has an extremely uncomplicated morphology and syntax which lends itself easily to the terseness required of a programming language.

Neither of these reasons is likely to go away any time soon, and so anglophone computer languages are likely to stay with us for the foreseeable future. That has not stopped certain people from experimenting with programming languages that take their vocabulary from elsewhere. When I was a teenager, at around the same time BASIC was all the rage, another home-made programming language called Karel was being promoted as an alternative for kids who wanted to learn programming. Karel takes its name from the Czech writer Karel Čapek who is credited with popularizing (although not inventing) the word robot. In Karel, you programmed a robot using keywords from Czech, not English. The theory was that kids would find programming easier to learn if they didn’t have to grapple with the additional obstacle of keywords in an unfamiliar language. Karel never really took off though, and it was BASIC that became the gateway drug, even for non-English speaking teenagers. There’d be enough material there for someone’s doctoral thesis in sociolinguistics to explain why!

A more recent attempt to create a non-anglophone programming language is the Babylscript project. The language is identical to JavaScript but the keywords are allowed to be from any natural language. In theory, there can be as many implementations of Babylscript as there are natural languages. There are seventeen so far. Babylscript is good fun but, like Karel, I am not sure if it will ever mature into an industrial-strength language. After all, programmers care mainly about the design principles of a programming language; where the keywords came from is viewed as inconsequential. And in its design principles, Babylscript is identical to JavaScript.
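The idea of keeping the language and swapping only the keywords can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Babylscript is actually implemented: the French keyword table is invented for the example (these are not Babylscript's real keywords), and `translate_keywords` is a hypothetical helper name.

```python
import re

# Hypothetical French-to-JavaScript keyword table, for illustration only;
# these are NOT Babylscript's actual keywords.
FRENCH_TO_JS = {
    "fonction": "function",
    "si": "if",
    "sinon": "else",
    "retourner": "return",
}

# A token is either a double-quoted string literal or a run of word characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\w+')


def translate_keywords(source: str, table: dict[str, str]) -> str:
    """Swap keywords by table lookup, leaving strings and other names untouched."""

    def swap(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith('"'):
            return token  # never translate inside string literals
        return table.get(token, token)  # identifiers and numbers pass through

    return TOKEN.sub(swap, source)


french = (
    'fonction signe(n) { si (n < 0) { retourner "si negatif"; } '
    'sinon { retourner "positif"; } }'
)
print(translate_keywords(french, FRENCH_TO_JS))
```

Note what the sketch leaves alone: the function name `signe` and the text inside the strings stay in French. Only the small, closed set of keywords changes, which is why the design principles of the language are untouched by the translation.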

There just doesn’t seem to exist an overwhelming reason why anybody should want to adopt a non-anglophone programming language. In other areas of computing, areas where non-programming users are involved, the English-centric origins of the industry have been, with more or less success, hidden from view by the software globalization and localization industry. This has been driven by consumer demand. There is no such demand from programmers. And so, for many future generations of bespectacled geeky kids, the first programming language they learn will be a derivative of English, like it was once for me.

P. S. The picture accompanying this article is a picture of me holding a copy of what had once been my first programming textbook. A textbook of BASIC, of course. I found it last summer by accident in a used-book sale in my family’s home town. I remember how excited I had been to read it as a kid, learning to program years before I ever laid hands on an actual computer. For all I know this might even be the very copy I used to own, as my mother had been clearing out my “childhood junk” (her terminology, not mine) that summer and had donated some old books to the local second-hand bookshop. Rediscovering it there is what inspired me to write this article.


5 thoughts on “English as a programming language”

  1. You know Dijkstra’s opinion of BASIC, I suppose?

    But you’re right.

  2. Some years ago my elder son and I developed a Plain English programming and development system in the interest of answering the following questions:

    1. Can low-level programs (like compilers) be conveniently and efficiently written in high level languages (like English)?

    2. Can natural languages be parsed in a relatively “sloppy” manner and still provide a stable enough environment for productive programming?

    3. Is it easier to program when you don’t have to translate your natural-language thoughts into an alternate syntax?

    We can now answer each of these three questions, from direct experience, with a resounding “Yes”.

    Our parser operates, we think, something like the human brain. Consider. A father says to his baby son:

    “Want to suck on this bottle, little guy?”

    And the kid hears,

    “blah, blah, SUCK, blah, blah, BOTTLE, blah, blah.”

    But he properly responds because he’s got a “picture” of a bottle in the right side of his head connected to the word “bottle” on the left side, and a pre-existing “skill” near the back of his neck connected to the term “suck”. In other words, the kid matches what he can with the pictures (types) and skills (routines) he’s accumulated, and simply disregards the rest. Our compiler does very much the same thing, with new pictures (types) and skills (routines) being defined — not by us, but — by the programmer, as he writes new application code.

    A typical type definition looks like this:

    A polygon is a thing with some vertices.

    Internally, the name “polygon” is now associated with a type of dynamically-allocated structure that contains a doubly-linked list of vertices. “Vertex” is defined elsewhere (before or after this definition) in a similar fashion; the plural is automatically understood.

    A typical routine looks like this:

    To append an x coord and a y coord to a polygon:
    Create a vertex given the x and the y.
    Append the vertex to the polygon’s vertices.

    Note that formal names (proper nouns) are not required for parameters and variables. This, we believe, is a major insight. My real-world chair and table are never (in normal conversation) called “c” or “myTable” — I refer to them simply as “the chair” and “the table”. Likewise here: “the vertex” and “the polygon” are the natural names for such things.

    Note also that spaces are allowed in routine and variable “names” (like “x coord”). This is the 21st century, yes? And that “nicknames” are also allowed (such as “x” for “x coord”). And that possessives (“polygon’s vertices”) are used in a very natural way to reference “fields” within “records”.

    Note, as well, that the word “given” could have been “using” or “with” or any other equivalent since our sloppy parsing focuses on the pictures (types) and skills (routines) needed for understanding, and ignores, as much as possible, the rest.

    At the lowest level, things look like this:

    To add a number to another number:
    Intel $8B85080000008B008B9D0C0000000103.

    Note that in this case we have both the highest and lowest of languages — English and machine code (albeit in hexadecimal) — in a single routine. The insight here is that (like a typical math book) a program should be written primarily in a natural language, with appropriate snippets in more convenient syntaxes as (and only as) required.

    You can get our development system here: http://www.osmosian.com/cal-3040.zip . It’s a small Windows program, less than a megabyte in size. If you start with the PDF in the “documentation” directory, before you go ten pages you’ll be recompiling the whole shebang in itself (in less than three seconds on a bottom-of-the-line machine from Walmart).

    Questions and comments should be addressed to gerry.rzeppa@pobox.com

  3. Here’s another example of a “localized” programming language: “Spiffing CSS”, a CSS preprocessor that allows you to write Cascading Stylesheets (CSS) in British spelling (“colour: grey”) instead of the W3C-standardized American spelling (“color: gray”): http://spiffingcss.com/

    This is basically the same idea as Babylscript (mentioned above) and many others (http://en.wikipedia.org/wiki/Non-English-based_programming_languages), except at a much higher level of hair-splitting!

  4. Perhaps an example of a computer language that is not anglophone (or based on any human language, for that matter) is PCRE (Perl Compatible Regular Expressions). Admittedly this is because of the need to intersperse regular expression commands with normal text, and it would be hard to describe it as a complete programming language, but it perhaps gives us a model for what a language-neutral programming language might look like.

    However, while most modern programming languages have a limited set of defined keywords these days, most real development tends to be built around one or more class libraries that provide most of the actual functionality, and it seems unlikely that these are going to stop being based around a natural language any time soon.

    Multilingual languages like Babylscript may have their uses, but an important goal for most software projects is maintainability, and the need for new developers to take over and understand existing code when one moves on. While it’s probably easy to run a Babylscript program through a filter and convert all the core commands from one language to another (or if it isn’t, it should be), that only helps a little, and if comments and variable names are written in Polish, a French developer is going to struggle to take on the code.

    There are more native Chinese speakers than native English speakers, but the advantage of English for use by programming languages is the number who speak English as a second language.

