Questioning the Ubiquity of Text Files

Text files are universal in computing. They are found everywhere: they store our code, config files, logs, and books, and sometimes turn up in more esoteric places such as bitmap fonts, Adobe’s BDF (not PDF) format being a prime example. Whole operating systems, namely Unix and its many derivatives, are built around passing data through text streams and files.

And no wonder: a text file is extremely simple, just a string of characters. Because of this simplicity it can encode almost any information without too much overhead, and any text-based format can be opened in a text editor, though semantic mileage may vary.

Since so many formats are encoded as text, there is a great amount of universality in editing. Just take your favorite text editor, be it Vim, Emacs, Acme, Notepad, or any of the myriad others, and you can easily read from and write to a great host of files. You can write a program, compile and run said program, edit a website, make a font, write an email, even send said email if you’re using Emacs, all from within one program.

This is all fantastic, but is this really the best way of doing things?

Source Code as Syntax Trees

Take, for example, code. First of all, in an abstract sense, is code really even text? Certainly not at the structural level. When code is compiled, a parser turns it from a giant text string into what it conceptually is, a syntax tree (ST). This is a tree (no surprise there) of statements, conditions, operations, and all the constructs that make up code. So why don’t we store it as such? What would be the advantages if we did, and had a specialized ST editor?
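To make the idea concrete before going through the advantages, here is a minimal sketch using Python’s standard `ast` module, which exposes the tree that Python’s own parser builds. The sample statement and its variable names are made up for illustration; other languages structure their trees differently.

```python
import ast

# A made-up one-line program, used only for illustration.
source = "total = price * quantity + tax"

# Parse the text into Python's own syntax tree representation.
tree = ast.parse(source)

# Print the tree structure (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))

# The single statement is an assignment whose value is a nested
# binary operation: (price * quantity) + tax.
assign = tree.body[0]
```

The printed dump shows an `Assign` node whose value is an addition, with the multiplication nested as the addition’s left operand. Operator precedence lives in the shape of the tree, not in the characters.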

1. Faster compile times

As mentioned before, compiling starts with parsing each source file into a tree. That step, which can take a noticeable share of compile time depending on the language and compiler, would not be needed if the code were already stored as an ST.
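As a rough sketch of what this could look like, Python’s built-in `compile()` accepts a syntax tree object in place of source text. The example below assumes `pickle` as a stand-in storage format, which is a convenience for illustration only: it is Python-specific and not a stable on-disk format for trees.

```python
import ast
import pickle

source = "result = 6 * 7"

# Parse once, then serialize the tree itself rather than the text.
# pickle is only a stand-in here for a real ST file format.
stored = pickle.dumps(ast.parse(source))

# Later: load the tree and hand it straight to the compiler,
# without going through the text-parsing stage again.
tree = pickle.loads(stored)
code = compile(tree, filename="<stored-tree>", mode="exec")

namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```

This says nothing about how much time is saved in practice. It only shows that a compiler can start from the tree when one is available.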

2. Visual structure

Different programmers have different ways of visually structuring their code, and many of these choices are baked into the source files: bracket placement, tab length, ++ before or after variables… the list goes on. With text files, if you work on code someone else wrote, you inherit those choices, however much you may dislike them. You can use auto-formatting if your editor has it, but it can be error-prone and is a hassle to run on a whole project. And if we get into that habit, the next person to work on the code might do the same, and a great back-and-forth of reformatting could ensue.

That’s just a fun hypothetical, of course, but it shows the annoyance of having hard-coded visual structure. Were code stored as STs, this issue would be non-existent, as that information would not be encoded. Instead, editors could display code however the user configures them to, and not store those choices in the code itself.
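A small sketch of this with Python’s `ast` module: two differently formatted versions of the same (made-up) function parse to the same tree, and the tree can be rendered back to text in a single style. `ast.unparse` needs Python 3.9 or later, and its output style is fixed rather than user-configurable, so treat it as a stand-in for the configurable display described above.

```python
import ast

# The same function written in two formatting styles.
style_a = "def area(w,h):\n        return w*h\n"
style_b = "def area( w, h ):\n  return   w * h\n"

tree_a = ast.parse(style_a)
tree_b = ast.parse(style_b)

# ast.dump leaves out line and column positions by default,
# so this compares structure only.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)
print(same_structure)

# Rendering a tree back to text applies one style,
# whatever the input looked like.
print(ast.unparse(tree_a))
```

Both inputs produce the same rendered text, because none of the original spacing survives in the tree.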

3. Syntax highlighting

Most code editors have this feature, and I’m sure many programmers have seen it be inaccurate or slow to update. I don’t think this should be too surprising, though. To produce the proper colors, the editor has to slog through every character of your code to make sure it doesn’t miss any brackets, parentheses, or other characters that would affect the highlighting. Though this usually doesn’t take a huge amount of processing power, it is something to consider, especially when editing many files or very large ones.

With STs the computation needed for syntax highlighting would be drastically reduced, as the editor would only have to look at a statement’s place in the tree as opposed to parsing its meaning within an entire text file. Even discounting the time required to interpret the meanings of words and symbols in a text file, walking down a tree to one node usually touches far fewer elements than scanning linearly through the text.
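Here is a minimal sketch of highlighting driven by node types rather than characters, again using Python’s `ast` module. The `CATEGORIES` mapping and the `highlight_spans` function are hypothetical names invented for this example. Since Python stores code as text, the tree still has to be parsed here, whereas in the model described above it would already be available.

```python
import ast

# Hypothetical mapping from node type to a highlight category.
CATEGORIES = {
    ast.FunctionDef: "definition",
    ast.Name: "identifier",
    ast.Constant: "literal",
    ast.Call: "call",
}

def highlight_spans(source):
    """Return sorted (line, column, category) tuples for known node types."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        category = CATEGORIES.get(type(node))
        if category is not None:
            # Each of these nodes records where it starts in the text.
            spans.append((node.lineno, node.col_offset, category))
    return sorted(spans)

source = "def greet(name):\n    print('hello', name)\n"
spans = highlight_spans(source)
for span in spans:
    print(span)
```

The category of each span comes directly from the node’s type, so no bracket matching or keyword scanning is involved.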

4. The editing experience

Many features common to IDEs (Integrated Development Environments) and some text editors would be much easier to implement and faster to execute. Code folding, jumping in and out of code blocks, and tab completion are a few examples. There are likely many interesting navigational techniques that would be possible with STs but haven’t been implemented in current editors because of the difficulty or processing time involved. Current editors are generally just text buffer displays, but just imagine the possibilities beyond that.
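Code folding is a good example of a feature that falls out of the tree almost for free. The sketch below assumes Python 3.8 or later, where `ast` nodes carry `end_lineno`. The `fold_ranges` function and the sample class are hypothetical, and the set of foldable node types is an arbitrary choice.

```python
import ast

def fold_ranges(source):
    """Return sorted (first_line, last_line, kind) for multi-line blocks."""
    # An arbitrary selection of node types an editor might let you fold.
    foldable = (ast.FunctionDef, ast.ClassDef, ast.For,
                ast.While, ast.If, ast.With)
    ranges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, foldable) and node.end_lineno > node.lineno:
            ranges.append((node.lineno, node.end_lineno, type(node).__name__))
    return sorted(ranges)

source = (
    "class Stack:\n"
    "    def push(self, item):\n"
    "        if item is None:\n"
    "            return\n"
    "        self.items.append(item)\n"
)
ranges = fold_ranges(source)
print(ranges)
```

Each fold region is simply the line span of a node, so the editor never has to count braces or compare indentation to find where a block ends.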

Efficiency Matters

Though many of these improvements are about processing efficiency, which seems to many like a non-issue given the speed of modern computers, I contend that this is an irresponsible mindset which has led to many of the ills of the software world. Compilation speed, for example, is still quite important as our software grows larger and larger (which is an issue in itself).

Take a modern web browser, Firefox, for example. Compiling the entire program on my computer took about an hour. This is on a soldered SSD and a CPU running at 4GHz. 4GHz is 4 billion cycles per second, so an hour works out to roughly 14 trillion cycles per core, and even using only part of the CPU, that’s a ridiculous number of cycles. I’m not going to go into comparisons with older computers and how much they got done with thousandths of the power since it’s a bit cliché, but my point is that software optimization absolutely matters.

Other Generalities

Alright, enough with the code example. There was quite a lot to say for that one, which is why I used it as an example of how a non-text format can be superior to plain text. For brevity I’ll avoid getting so specific with any other file types, and instead make some generalizations.

Text files are pretty much always slower to parse than a specialized encoding and, without the help of libraries or higher-level constructs, are usually more difficult to parse as well. They also take up more space: something that could be a single enum or integer takes far more bytes when written out as a string of characters. And, as with the earlier point about efficiency, although we have massive storage capacities and speedy internet connections, they won’t feel so large or fast for long if we choose not to care about file sizes.
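The size difference is easy to see with a single number. This sketch uses Python’s standard `struct` module, and the value and the 4-byte little-endian layout are arbitrary choices for illustration.

```python
import struct

value = 1_234_567_890

# Text form: one byte per decimal digit.
as_text = str(value).encode("ascii")

# Binary form: a fixed-width 4-byte unsigned integer (little-endian).
as_binary = struct.pack("<I", value)

print(len(as_text), len(as_binary))  # 10 4

# Both forms decode back to the same number.
from_text = int(as_text)
from_binary = struct.unpack("<I", as_binary)[0]
```

The gap depends on the data. A small value like 7 is a single byte as text, so a fixed-width binary field only wins on size once values get large, or when a long keyword stands in for what could be a one-byte enum.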

The Point

This is a bit hyperbolic, but imagine if we edited images in text editors. I’m sure you can imagine the horror, so I’ll refrain from describing it. The idea that we should treat so many data types as strings of characters is limiting, and there are likely countless editing and visualizing methods that we’ve missed out on just because we’ve restricted ourselves to thinking about lines of text and little else.

As for editors, I think it is quite doable to make them extensible enough to support, at the very least, converting to and from text as an intermediary for other file formats. Vim, for example, can open folders and archive files as text listings and interactively open the files and folders inside them. Beyond that though, as stated before, I think there is much untapped potential in the realm of file editing just waiting to be uncovered, and many fascinating routes that more specialized editors could take and normalize.

I suppose the idea I’m trying to get across is to not be afraid to use more unique solutions, and to not take for granted the irreplaceability of the tools we currently use. Only by questioning what is so often thought of as natural and optimal can we improve things.

Written by Aiden Foley
