The Lure of Graphical Programming
Every so often some programmer somewhere enters a strange mood and begins working on what they perceive to be an utterly brilliant and perhaps even original idea - a programming language unconstrained by the oppressive limits of readable text, that instead takes on a more graphical form, usually with a bunch of colored boxes connected by lines, paired with a drag-n-drop UI.
Usually it is claimed or at least implied that this will make code much easier to understand, perhaps to amateurs and non-programmers who might be intimidated by text-based coding, but anyone who tries to use these inevitably just develops a giant, messy hairball of a program that is the exact opposite of “easier to understand”.
Of course, one often-toted “definition of stupidity” is “doing the same thing repeatedly and expecting different results”. This, and similar ideas, have been tried many times, and have yet to catch on as any serious, common engineering tool.
Are there some that have found niche success? Sure. But if they were really all that revolutionary or superior to text-based programming they would have become dominant a long time ago - LabVIEW is only a year younger than C++.
Composition, Programming the Visual Cortex
There is a certain allure to the idea of graphical programming. At the very least, there seems to be an intuitive sense among people that it’s better in some way, even if it’s difficult to elucidate why.
Contrary to how many programmers often seem to think of it, the human visual system is not merely a camera with a grid of pixels. The majority of the eye’s “pixels” are focused in a tiny, very-high-resolution region called the fovea at the center of the visual field. This is sorrounded by low-resolution peripheral vision filling the rest of the visual field.
Human vision works very differently from how, say, deep neural networks accomplish similar tasks. The visual cortex invests most of its resources into categorizing little pieces of images at a time, and uses the peripheral vision as a mix of low-resolution categorization and as a tool for guiding saccades, which are tiny, rapid eye movements to scan the fovea to different points of interest.
The effect is that vision is not about big grids of pixels, symbols, and details, but rather many little grids arranged in space, read as they are demanded.
This actually makes human vision an incredibly interactive system - the eye focuses on what it needs to, and can locate and scan different parts of objects based on what is currently relevant or reduces the most ambiguity.
There are many ways to convey meaning in the composition or arrangement of details across space within an image. The visual arts have been exploring these ideas for a very long time. Things like the “rule of thirds”, various methods of arranging shapes, textures, empty space, and colors, and the introduction of leading lines have been well-understood for centuries.
I find leading lines particularly interesting, as the eye will often latch onto and follow them, settling wherever they lead - putting the most important details of an image in locations where many lines or curves converge is very effective in conveying meaning and making an image easy to intuitively understand.
I like to think of it as the visual system having a certain kind of algorithm for navigating scenes, and tools like composition are essentially a visual tool for “programming” it to produce meaning.
These are powerful, thoroughly-tested tools for conveying meaning to the human visual system. Unfortunately, modern programming makes extremely little use of them.
Indentation
We should keep in mind that text-based programming already has at least successfully conveys a little meaning spatially however - that being indentation. Granted, it’s really only used to denote scope and nesting, but I find it shocking how nearly every language makes use of it, often in almost exactly the same ways, while programming language theory seems to have paid it little to no deep attention.
If anything, indentation is an anomaly - something based in rules completely foreign to formal programming language theory, but that persists more stubbornly than almost any other feature because it just works so extremely well. It’s worth taking a deeper look at this anomaly and considering if there’s more to learn.
Syntax highlighting is another slightly interesting thing, leaning slightly on some color theory. It’s definitely an improvement over monochrome, probably by letting the low-resolution peripheral vision more easily narrow down on points of interest for upcoming saccades. I’m not convinced we’re anywhere close to using it optimally though.
I think that a few esoteric languages like AsciiDots and Befunge present an interesting idea, giving syntactic meaning to long chains of line symbols, turning code into a form of ASCII art filled with meaningful leading lines.
Do I plan to copy this idea for Bzo and make the language an ASCII art path-based language? Not exactly - AsciiDots suffers from the same hairball problem that other graphical languages have, but with an even more tedious UI than drag-n-drop. However, reasonable ways of introducing leading lines into the syntax to guide the eye to important details is something I at least will be very seriously considering once backend stuff is done and syntax is a more immediate concern.
A Mountain Through a Microscope
One other factor that I think people at least subconsciously miss that I suspect might partially drive this allure for graphical programming is the lack of any ability to “zoom out” of a codebase.
The human eye can view anything from the not-quite-microscopic if we look down at something closeby up to cosmic scales if we simply look upward to the sky.
If we want to view a mountain and get a good idea of its overall structure, the most natural way to do so is to look at it from a distance.
A remarkably stupid idea would be to walk around the mountain inspecting individual rocks with microscopes and magnifying glasses, inspecting rocks one at a time, and trying to guess at and gradually map the overall structure of the mountain that way.
Yet, when we are viewing code in a text editor, the individual characters of text correspond to individual bytes of memory. Very little exists that can meaningfully zoom out on a whole codebase at a time.
Sure, some IDEs have “minimap” features, but those tend to be limited to individual files rather than focusing on whole codebases. Congrats, you’re looking at a few boulders at a time now instead of individual rocks.
A Little Demo
I have a tool I’ve been building a little bit on the side for a couple weeks. Right now it just loads files and displays a couple visualizations of the first 256kB or so, the main visualization being a hilbert curve. Each pixel corresponds to an individual byte, colored based on the byte value and some derived properties of it. The initial goal is to have something useful for visually inspecting the binary structure of arbitrary files with (eventually) many more options than a normal text/hex editor.
It’s not a terribly impressive tool yet - I’m busy with many other things right now - but I think even in the current primitive state it gets my point across.
Hilbert curves are a pretty nifty tool, being able to map a 1D space to a 2D space, turning long, pixel-thin lines into easily-recognized blocks. This is great for visualizing large amounts of data in the form of a simple, linear sequence and getting an intuitive sense of contiguous chunks.
Files are just linear sequences of bytes, and code really isn’t any different aside from often consisting of several files.
Here is a PDF file drawn with my tool - the green spots seem to be PDF script text. The black spots are all blocks of zeroed bytes, I’m not sure why they’re there - perhaps suboptimal packing code? The teal background seems to be all the compressed data that makes up the PDF, and has some interesting structure of its own. Some regions appear to just be high-frequency noise and others, such as the lower-right quadrant, seem to have more low-frequency, longer-range structure. My guess right now is that the more structured teal regions are compressed images.
If you know what to look for, just staring at this image for long enough tells almost as much of a story as the actual PDF when properly read.
With some more work, this could probably be a fairly useful tool. Bigram plots can be really useful for categorizing data based on byte values. The tool already computes bigram-bitvector embeddings for 256-byte segments of the file, and I’ve started work on a k-means classifier, which should allow for the computer to automatically categorize distinct parts of the file that “look similar”.
I had a side project a couple years back trying to do this with large codebases - I had a similar hilbert curve plot of the entire Linux codebase, visualizing a whole gigabyte of code in one image. I’d have to do some digging through some old hard drives sometime to find that project though.
The nice thing about doing it with code rather than raw binary data is that there’s a lot of easily parsed structure in text-based code - identifiers, keywords, comments, indentation, parentheses/brackets/braces, operators, literals, and so on. A much wider and richer set of tools for deciding exactly how to color the big array of pixels should be possible that are more specific and useful to understanding code than relatively crude tools like plotting raw byte value, or k-means clustering.
If you want to zoom out on your code, you don’t need to limit yourself to just one method of doing so; much like the interactive nature of the human visual system, an interactive approach to zooming in and out and adjusting the color-coding of the map would undoubtedly be very valuable.
And of course, all of this says nothing of the many other visualization methods beyond hilbert curves.
Overall, I don’t think what is needed is any more drag-n-drop lines-n-boxes graphical programming languages. Text is a pretty robust medium and it’s far from its actual limits if we’re willing to be a little creative (and maybe take some cues from ASCII art).
Could someone actually make a graphical programming language that works? Sure. I’ve had a few rough ideas of my own in the past along those lines. For any chance of success though, something actually unique needs to be done.
The real value however is not in replacing text as a programming medium, but rather augmenting it with more visual tools to produce understandings of code that would be impractical to gain from merely reading the code.
Thanks for reading my article today. I might have some big announcements soon, so keep an eye out. If any readers are in Austin for SXSW this week, maybe I’ll see you around town - I don’t have tickets for anything, but I do have friends who know about some of the after-parties and will likely be at the FUTO event on Friday.