Interview Transcript – Donald Knuth [pdf] (philipkiely.com)
170 points by asicsp on Dec 31, 2020 | 38 comments


"Volume 2 of The Art of Computer Programming was the first big test case [of TeX and Metafont]. I was looking forward so much to the day when I got my first copy and I could open it up and I could see, “Oh, yes, my project has succeeded.” But it was maybe the worst day of my life because it looked awful to me...."

As the guy who ran the Alphatype CRS phototypesetter to generate the final camera-ready copy for ACP Volume 2, 2nd edition, let me add a little color to this. Pages were (slowly) generated on large sheets of special photographic paper (maybe 2ft by 3ft), at real size (so six pages per sheet) at an effective resolution of 5000dpi. The results, after being run through a normal 3-bath b&w developer and dried out, were then shipped off to the publisher. There they were photographed and processed into metal printing plates for the offset printing press, which actually applies ink to paper, creating the pages that are bound into books.

The point being that each of those subsequent steps (photography, making metal plates, applying ink, pressing into paper, plus the specs of the particular ink and paper being used) acts as a filter on the image. So the book you get back doesn't look exactly like the camera-ready copy you provided; not only might the type look lighter or darker, but there may be other effects like "fill-in" of little crevices such as the inner wedge of a "v" or where serifs are attached. Old-time punch cutters knew how to anticipate this sort of thing, but it takes experience. Of course, Knuth knew of such things and got lots of expert assistance, but it still didn't come out right the first time through that final step, which we didn't control.


Wow, thanks for that story and those details! It's always exciting to hear about those times. Knuth also mentions this "worst day of my life" incident (without much detail) in his Kyoto Prize lecture[1] and Web of Stories interview[2], and (with a few details similar to what you describe) in his 2007 CHM Oral History.[3]

Only loosely related: just a few days ago there was a historical article posted online[4] about the Digital Typography group at Stanford in the 1980s, where much of the collaboration happened that led to the final versions of those fonts.

[1] https://web.archive.org/web/20190523200235/https://www.kyoto... or https://web.archive.org/web/20201111041445/https://www.kyoto... — search for “burned with disappointment”

[2] https://www.youtube.com/watch?v=b5wb9TEHkCI&list=PLVV0r6CmEs...

[3] Pages 44–46, especially 46 at https://archive.computerhistory.org/resources/access/text/20... transcribing roughly 1:17:00 to 1:26:00 of https://www.youtube.com/watch?v=gqPPll3uDa0

[4] https://doi.org/10.1016/j.sheji.2020.08.006 (I actually suspect that the author, from the Design program at the time, may have missed/misunderstood some details, but I'm not confident enough to say what exactly, if anything, is wrong.)


Lining up this interview was a process that spanned the latter half of 2020. Professor Knuth has not maintained a public email inbox since 1990, so I typeset a letter in LaTeX, printed it out, figured out where to put a stamp on an envelope, and sent it to Professor Knuth's office. Some months later, I received an email from his assistant, saying that he would be interested in doing an interview, but wouldn't be available until December. Well, a couple of weeks ago I found myself on the phone clutching a list of questions that I had prepared in consultation with a friend of mine pursuing a Ph.D. in computer science at the University of Chicago, who could thus understand not just the pictures in The Art of Computer Programming, but also most of the words (which is where my own abilities begin to falter).


Thanks a lot for conducting this interview, transcribing it, and sharing it! It's lovely, and covers a lot of interesting/new ground; your questions are well-chosen. I've read/heard/watched a lot of Knuth interviews the last few years (there's a list of some of them at https://cs.stanford.edu/~knuth/news20.html#:~:text=Oral%20hi... and at https://tug.org/interviews/#knuth), and he tends to be remarkably consistent in his answers, but you've managed to ask interesting questions that elicit new answers. :-) And of course even the repeated stuff is still interesting.

Yes, reaching him can be hard, as joked about by Wilf on his 64th birthday (https://www2.math.upenn.edu/~wilf/website/dek.pdf). :-) If I understand correctly, this interview was conducted over the phone? If so, I also imagine that transcribing it must have been quite a bit of effort; his speech tends to have a fair bit of "backtracking" / "buffering", as I found when I tried to transcribe something a while ago. So thanks for producing something so readable.

Some notable (to me) highlights, for anyone else who has not yet read the linked interview:

- The story of how The Art of Computer Programming came about. He was approached by the publisher in 1962, when he was a recently married second-year graduate student, to write a book about how to write compilers. He thought the subject needed more background, came up with a 12-chapter outline with Compilers as the final chapter, and expected to be done before his son was born (1965); as of 2020 it's less than half done. :-) He actually had a handwritten manuscript of 3000 pages completed in early 1966 and estimated it would come to about 400 printed pages, but he was off: it would have been 2000 pages if printed. So the plan became 7 volumes. (He goes into more detail in his 2007 CHM Oral History, for example: https://archive.computerhistory.org/resources/access/text/20...)

- How he tries to make precise statements ("maximize my chance of error"), and writes programs to see for himself which methods are faster ("I can't write about something unless I've experienced it personally"). And how he does this while staying "smooth", adding technical jokes, and trying to be surprising! His motivation is always to share: "Isn't this cool?"

- On staying motivated: his simple motivation is that he enjoys learning and enjoys telling others what he's learned. His main computer is also not connected to the internet; he goes to the internet-connected computer only to upload stuff.

- He actually doesn't seem to worry at all about impact or importance; he just writes stuff that he needs or feels ought to exist. (I think his taste helps here!) He's unbothered by what others might consider failures of adoption (Metafont, literate programming, his composition Fantasia Apocalyptica?).

- His usual thoughts on literate programming, e.g. "by taking a little bit of extra time to explain what the code is doing, I get my thoughts in order and I make many fewer mistakes". (Aside: He mentions the PBRT book (http://www.pbr-book.org/) and thinks there is other software being written at Pixar/elsewhere using literate programming; does anyone know if this is actually true? Or is this program/book an exception?)

- He mentions some problem in constraint satisfaction (the section of TAOCP he's currently working on) on pp. 14–15; maybe someone can guess the details of this specific problem?

- On finding ways to enjoy what you're doing. (He's mentioned here, and elsewhere too, how he's never bored; how one can always find some way to enjoy things. Recall the example of buying janitor uniforms to make toilet-cleaning at home more fun! From an earlier article, discussed here: https://news.ycombinator.com/item?id=22889195) He's not bothered by commercialization or things being spoiled, for example; he just focuses on what's beautiful and what one can be proud of.

(A few typos I noticed in the current version of the PDF: "Apocalypitca" (p1), "wonderfull" (p9), and in a few places words are joined: it appears that (probably) "—I" (an em dash followed by the word "I") turned into just "I": "I was blown awayI" (p3), "core memoryI" (p4), "practiceI" (p4). And the phrase "from two different slaps" (p6) sounds odd; are you sure he didn't say something else? Then again, maybe that's exactly what he said, things being slapped into one's brain.)

Good luck with your book too!


> He's unbothered by what others might consider failures of adoption ...his composition Fantasia Apocalyptica...

He somehow figured out I liked Bach, and so a couple of years ago he played me some of it (a recording) and then we went over the score. He is a much better organ player than I could ever hope to be! Naturally the score was designed to reflect the Book of Revelation, so you could see musical elements keyed off that text, but it's sort of conventional and unconventional at the same time. It was fun to read, as the music is sort of witty, but I don't think I'd enjoy listening to the whole thing.

I've only known two people who had organs at home: Alan Kay and Don Knuth. Steve Jobs bought a house in Woodside that had an organ in it, but I heard he removed it. I never heard that he played music himself.


Ha! On that composition: I have no ear for Western classical music, and no familiarity with either the organ as a musical instrument or the Bible's Book of Revelation, but even so, I can guess this project is… unique. He calls it (https://cs.stanford.edu/~knuth/fant.html) a "somewhat literal translation" of the book into music. So for example he goes over each verse of the book, and (almost) every time it mentions "angel" he plays an 8-note arpeggio, every time it mentions "blood" he has "coagulated" (clustered) notes, and so on, setting up a "dictionary" of 100+ motifs (https://cs.stanford.edu/~knuth/motif-dict-utf8.html). He explains it in these videos (https://www.youtube.com/watch?v=-JvdanSygfg, https://www.youtube.com/watch?v=f1R9eY-8jL4, https://www.youtube.com/watch?v=e_1a6bHGQGo). Looking through his list of musical allusions (https://cs.stanford.edu/~knuth/rev-cite) I see things like Bruce Springsteen, and even a washing machine's power on/off cycle sound. He mentions a couple of pieces of rap music for verse Revelation 3:20; maybe the reason is the same joke as https://xkcd.com/133/? Overall, the project seems so crazy (is this the way anyone writes music?) that one can't help being impressed.

The full performance is here: https://www.youtube.com/watch?v=MjxsZa3Ylao — haven't listened to it, but online I've seen both some extremely negative and some extremely positive comments. :-)

I have heard that he is a skilled organ player as you said; just earlier while searching for something else I saw a blog post about Rajeev Motwani's memorial (http://blog.geomblog.org/2009/09/rajeev-motwani-memorial.htm...) which mentions “Don Knuth was the organist for the entire ceremony, and played pieces you didn't think could be played on a church organ”.


Asking questions that elicit new answers is the #1 thing I focus on in an interview, so I'm glad to have accomplished it here. I think that if you're doing an interview and don't cover new ground, it just wastes everyone's time.

Asking new questions, just like transcribing, just requires focused effort! I think interviewing is a really cool skill because anyone can get good at it just with practice!


that is an amazing story. once again you're showing how to hustle well beyond what most people think their limits are. well done.


Having never heard of literate programming, I started poking around.

"Physically Based Rendering"'s intro to literate programming can be found here: http://www.pbr-book.org/3ed-2018/contents.html (chapter 1.1)

And then answering my own first question about this: how is this different from copious comments? (Source: Wikipedia)

> Literate programming is very often misunderstood[12] to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation.

> This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.[12][13]
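
To make the inversion concrete, here is a minimal sketch in noweb syntax with C chunk bodies (the chunk names and code are invented for illustration, not taken from any real program). The prose leads, and the code hangs off it in named chunks that can be referenced before they are defined:

    @ We count words by streaming the input one character at a time;
    note that the loop refers to a chunk we have not written yet.

    <<*>>=
    #include <stdio.h>
    #include <ctype.h>
    int main(void) {
        int c, in_word = 0, words = 0;
        while ((c = getchar()) != EOF) {
            <<classify this character>>
        }
        printf("%d\n", words);
        return 0;
    }

    @ Only a transition from non-letter to letter starts a new word.

    <<classify this character>>=
    if (isalpha(c)) { if (!in_word) words++; in_word = 1; }
    else in_word = 0;

Running noweave over such a file typesets the document in the order written above; notangle extracts the root chunk with the pieces spliced into whatever order the compiler needs, which is exactly the inversion the quote describes.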

I was then curious if any tooling has been created for literate programming in TypeScript. I haven't given a close look yet, but here are some relevant links:

- https://github.com/microsoft/TypeScript/issues/31364

- https://johtela.github.io/litscript/

- https://mistlog.github.io/ray-tracing/index.html


I've done a few projects using literate programming, a couple of which were fairly large for a single developer.

If you use noweb, it doesn't care about the language you use. You can even have a section in the document which contains chunks of code in different languages and how they interact. For example, one chunk could be part of the render function of a React component, and the next chunk could be the Flask API endpoint which generates the data displayed by the component.

Literate programming has its problems, and it is made less necessary with modern languages and tools, but I've repeatedly had the experience where it was LP that allowed me to return to a project after a few months and know exactly what was going on and where to continue.


My earlier post inspired me to push one of my literate-programming projects, inauspicious though it is, to my public GitHub.

https://github.com/AndrewOwenMartin/nestor

This repo implements an all-but-unknown ANN called NESTOR, but you can use it as an example of how it would be relatively easy for someone who knows nothing (or has forgotten everything) about NESTOR to contribute to the project. I make no claims that it is a good example of such.


I've been deeply fascinated by the idea of literate programming, but every time I dive into the programs themselves, they leave something to be desired.

First, though, I glossed over the PBRT shoutout and thought the Academy Award mention was about lkesteloot's team's award, until I scanned back and saw that it was about PBRT. To qualify that, here's one of lkesteloot's comments from his Oscars page on his personal site:

> Our renderer and lighting tool were individually good, but lots of renderers and lighting tools are good. What made ours unusual (and it’s unusual in this respect to this day) is that they were designed and built at the same time, with each other in mind. <https://www.teamten.com/lawrence/oscar/>

It might not be obvious, but this really touches on a part of the philosophy underpinning literate programming. If you look at any of the recent discussions about Knuth being "framed" in relation to the seminal Programming Pearls guest column, one of the recurring themes is programs as monoliths. In a really spectacular post "An app can be a home-cooked meal" <https://www.robinsloan.com/notes/home-cooked-app/>, the author explains his own work and likens it to Shirky's description of "situated software". Shirky writes, "Situated software isn't a technological strategy so much as an attitude about closeness of fit between software and its group of users" <https://web.archive.org/web/20040411202042/http://www.shirky...>. This obviously sounds a lot like PDI's renderer and very much characterizes Knuth's monoliths. I always liked this quote from literateprogramming.com:

> The fundamental logic of the WEB system encourages "top-down" programming and "structured" design. Quoting from Kernighan and Plauger, 'Top-down design and successive refinement attack a programming task by specifying it in the most general terms, then expanding these into more and more specific and detailed actions, until the whole program is complete.' [...] The WEB system encourages you to work top-down by giving you the ability to break up your code into independent segments (called "sections").

Coincidentally, in addition to lkesteloot's Oscar comments, he's got a pretty great essay called "Write code top-down" which really delves into the matter <https://www.teamten.com/lawrence/programming/write-code-top-...>. Opinionated and well worth the read.

As for this interview, PBRT, and Knuth's flavor of literate programming: the PBRT book is available for free online <http://www.pbr-book.org/3ed-2018/Introduction/Literate_Progr...>, and the intro starts out with a section on literate programming and the WEB/noweb family in particular <http://www.pbr-book.org/3ed-2018/Introduction/Literate_Progr...>. This, I think, is already adequate to highlight some of the issues with the WEB/noweb implementation of literate programming. It's still too mired by constraints of C (or maybe since the language isn't technically C, and Knuth is also known for literate programming with Pascal, we should chalk it up to something akin to Sapir–Whorf?). akkartik has a good essay on the matter of Knuth's programs <http://akkartik.name/post/literate-programming>. akkartik:

> just about every literate program out there begins with cruft like this: `// Some #includes` or: `-- Don't mind these imports.` I used to think people just didn't understand Knuth's vision. But then I went and looked at his literate programs.

Indeed, I've also looked at Knuth's programs and felt a similar twinge. There's the seminal column, available from literateprogramming.com and linked from akkartik's post. Having tried to read it, a big issue there is that the problem just isn't very interesting in my opinion, at least on practical grounds. I've mentioned before that I think a literate programming advocate and true believer would do well to publish a piece that focuses on implementing support for DEFLATE and the ZIP file format, which strikes me as a much stronger hook than a program to "print the k most common words in the file in decreasing frequency" or "Dijkstra's program [that] prints a table of the first thousand prime numbers". No matter, though. There are "real" programs published by practitioners of the style, right? How about Knuth's rendition of Colossal Cave Adventure? <http://literateprogramming.com/adventure.pdf> Once again, we're really running into sharp edges of C that WEB—or at least this particular text—doesn't do a good job of softening up, for the reasons akkartik rightly points out. Just look at the second page: right away with `main` we've got unexplained a priori structure, including stuff like local variables j, k, and p.

Other programs I've looked into are lcc, and the glimpses I've gotten of Inform 7. On my first encounter with the former, I was motivated to jot down in my notes a proposed law that said, roughly, a poor/failed attempt to write a literate program will produce a piece of text that's harder to understand than if it had just been written "straight".

Still, I'm hopeful that there's something there to be found in literate programming. In fact, I'm pretty sure there is something there. I wrote a recent comment soliciting info and critiques/criticism about literate programming for pedagogy <https://news.ycombinator.com/item?id=25498685>.

And it's funny that you mention TypeScript. As I said, I think part of the inadequacy of CWEB comes down to C. C is fundamentally opposed to one's attempts to practice top-down programming—one of the first things that "every" programmer learned, back when "every programmer" meant programmers who learned C, is that when you read a C program, you're supposed to start at the bottom and kind-of-sort-of read it backwards. Although I suppose there's an argument to be made that were it not for C (or Pascal) inflicting this upon you, then Knuth may never have been sufficiently motivated to undertake his literate programming crusade in the first place.
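
For anyone who hasn't felt this first-hand, the reason is that a C compiler must see a declaration of a function before any call to it. A small, hypothetical example: with explicit forward declarations (prototypes) the file can be read top-down, and maintaining exactly this kind of reordering by hand is part of what WEB automates.

    #include <stdio.h>

    /* Forward declarations let main() come first, so the file reads
       top-down; omit them and the helpers must precede their caller. */
    static void greet(void);
    static void farewell(void);

    int main(void) {
        greet();
        farewell();
        return 0;
    }

    static void greet(void)    { printf("hello\n"); }
    static void farewell(void) { printf("goodbye\n"); }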

I've had niggling thoughts that a language like JS, with support for forward declarations in the form of function hoisting and with its dynamism at runtime, including allowance for function redefinition (or at least, if not JS, then something with similar properties), would be a better match for the top-down style associated with literate programming. TypeScript, though, might be a little too strict, causing it to suffer for reasons similar to C's. Also, the fact that JS's roots lie in a (much maligned) system for conveying documents (as opposed to an application platform) makes the fit even better. It's just that nobody seems to be noticing this feature-not-a-bug, because they've been focused on trying to recreate mobile app conventions in the browser. And I've actually sat down to experiment and play around with the affordances that could prove TBL's Web (as opposed to Knuth's WEB) a more attractive substrate for a literate programming system. More recently, I've gotten more serious about actually testing the limits of the medium while thinking of things to put together that are meant to be consumed by other people as pieces of written work, to be read from top to bottom in Knuthian style, but it comes down to a matter of having enough time.

A final note is that in Bob Nystrom's post on 'Crafting "Crafting Interpreters"' <http://journal.stuffwithstuff.com/2020/04/05/crafting-crafti...>, he more or less lays out the case for a literate programming system to help with actual book authoring.

I hope that if you or anyone else decides to take an interest and try hashing something out, you're successful.


You are right that literate programming first occurred to Knuth as a way to overcome the deficiencies of Pascal. Specifically, it was the product of a few things:

1. The first version of TeX was written in 1978 and spread like wildfire within a couple of years: ported by others to 200 programming environments, i.e., (OS, language) pairs! So when he started writing his new version in 1980, trying to be portable to all those OSes, he was basically forced to use Pascal, which was at the time in widest use at universities for teaching, and thus most widely available on computer installations. (C was still confined to Bell Labs and maybe a few other places.)

2. Tony Hoare had suggested that the source code should be published as a book. This raises questions of how best to present messy real-world code written with real-world constraints (rather than "clean" toy examples).

3. Pascal, especially the subset of Pascal he was targeting (the common denominator across Pascal runtimes/compilers in use at the time) was extremely limited and had poor support for strings, array parameters, even control-flow (no "break" and "continue" in loops, and goto "labels" had to be numeric); moreover doing anything nontrivial practically necessitated many global variables that had to be initialized at the start of the program. I've written a bit more about this at https://shreevatsa.github.io/tex/program/pooltype

4. This is a joke, but he was/is also annoyed by the anti-goto moralists championing their vision of "structured programming" (pejoratively implying anything else is unstructured), so he called his vision "literate programming" (as no one wants to be accused of writing illiterate programs).

So after some experiments he was led to the idea of a macro preprocessor, with sections that would be reordered in the way needed by the compiler, but could be written in the order the programmer wanted to write or present them.
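
In CWEB the mechanism looks roughly like this (a toy fragment of my own, not from any of Knuth's programs): each section pairs commentary with code, named modules like @<...@> can be used before they are defined, and ctangle splices everything into the order the compiler demands.

    @ A toy example. The main program is stated first, referring to a
    module that does not exist yet.

    @c
    #include <stdio.h>
    int table[128];
    int main(void)
    {
      @<Initialize the table@>@;
      printf("table[65] = %d\n", table[65]);
      return 0;
    }

    @ Initialization is dull, so it is banished to the end of the file.

    @<Initialize the table@>=
    for (int i = 0; i < 128; i++) table[i] = i;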

As for monoliths, partly that's how Knuth seems to think. He seems to have no trouble keeping in his head an amount of program detail that would overwhelm most of us, and he has always been a machine-level programmer at heart, acutely conscious of wasteful stuff like—a real concern at the time—subroutine overheads. Even in this interview he mentions things like APL versus his preference for languages closer to what a computer actually does. And partly the Pascal subset he was targeting dictated the program structure: the compiler should get all its code in a single file, the way toy programs are written, like

    var <declare lots of globals here>;
    <declare lots of functions/procedures here>;
    begin
      <initialization code>
      <rest of "main">
    end.
(In fact his earlier non-literate TeX78 in the SAIL language was written as a bunch of separate files/modules, rather than a single 25000-line file like the literate/"better" tex.web.)

So that's a bit about the genesis of literate programming. Now about some of your other comments:

- Your quote about top-down programming from literateprogramming.com is from the author of fweb, but actually Knuth himself doesn't advocate (or practise) top-down programming. One of the things he likes about literate programming is that the program can be written in a mixture of bottom-up and top-down programming without being forced to commit to either; from studying many of his programs it's clear that his preference is to build things mostly bottom-up, though each small "layer" he may write top-down and sometimes he may switch it up. (For example, in his TeX program he starts with "chapters" on string handling, arithmetic, memory allocation, data structures etc, building up to the main program as the climax of the book. Each "chapter", which is sometimes just a single function, is written top-down, more or less.)

- You linked to how the PBRT book introduces literate programming, and said it shows noweb/LP is "still too mired by constraints of C", but to be clear, it's just that the example they used for the idea of rearranging sections is one of overcoming some constraints of C using LP. (It's not really specific to C though; the same thing would apply whenever one had a lot of initialization to do in different places, like the giant object constructors one can see in many C++ codebases.)

- Tastes are subjective, but IMO the program he wrote for Bentley's question of "top k most common words" (not his choice of problem!) is actually very interesting: it's an exposition of an ingenious data structure called a (packed) hash trie that has, to the best of my knowledge, never been described anywhere else, and which he probably made up on the spot for this problem. (Packed tries are described in an exercise in TAOCP and the TeX program uses them to store hyphenation patterns—his student Liang's thesis was about this—but packing them randomly using an open-addressing hash table seems to be a twist particular to this program.) I was planning to write a blog post last year illustrating this with pictures and all that, but never got around to it. I did write a translation into (non-literate) C++ here: https://codegolf.stackexchange.com/a/197870
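
Since that data structure isn't written up elsewhere, here is a deliberately crude toy of my own (much simpler than Knuth's packed trie, which additionally squeezes nodes into one another's unused slots) to convey just one idea from that family: trie links need not be pointers or arrays, because a child can be found by hashing the pair (parent id, letter) into a single open-addressed table.

    #include <stdio.h>
    #include <ctype.h>

    #define SIZE 65536
    static int  parent[SIZE];   /* 0 means "slot free" */
    static char letter[SIZE];   /* edge label leading to this node */
    static int  count[SIZE];    /* frequency of the word ending here */
    enum { ROOT = 1 };          /* id 1 is reserved for the root */

    /* Find (or create) the child of node p along letter c. */
    static int child(int p, int c) {
        unsigned h = ((unsigned)p * 131u + (unsigned)c) % SIZE;
        for (;;) {
            if (h <= ROOT) h = ROOT + 1;   /* skip "free" and the root */
            if (parent[h] == p && letter[h] == (char)c) return (int)h;
            if (parent[h] == 0) {          /* free slot: claim it */
                parent[h] = p; letter[h] = (char)c; return (int)h;
            }
            h = (h + 1) % SIZE;            /* linear probing */
        }
    }

    int main(void) {
        int c, node = ROOT;
        while ((c = getchar()) != EOF) {
            if (isalpha(c)) node = child(node, tolower(c));
            else { if (node != ROOT) count[node]++; node = ROOT; }
        }
        if (node != ROOT) count[node]++;
        int best = ROOT;                   /* node with highest count */
        for (int i = 0; i < SIZE; i++)
            if (count[i] > count[best]) best = i;
        /* Recovering the word itself would mean walking parent links
           backwards; omitted to keep the sketch short. */
        printf("highest word frequency: %d\n", count[best]);
        return 0;
    }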

- Returning to a couple of your other comments and the phrase "top-down style associated with literate programming": I think another way of looking at this is historical. When Knuth learned to program, in machine code on an IBM 650 (I say this was the first and last generation of truly self-taught programmers) and later writing compilers in machine code (or with his own assemblers) for other machines, even the idea of subroutines (with standard calling conventions and all that) was hardly well-established, which is why TAOCP Chapter 1 has, under 1.4 "Some Fundamental Programming Techniques", an explanation of what subroutines are, followed by coroutines and interpretive routines. When "Structured Programming" took over in the 1970s (just the idea of using if/while/for and subroutines instead of arbitrary control-flow jumps), it was a revolution, and even that Knuth is still not fully on board with. (If anyone is interested I'll try to string together a blog post on all the mentions I've seen of Knuth's opinion on "goto". Some juicy bits, e.g. https://twitter.com/svat/status/885344334735745024 https://twitter.com/svat/status/913114286951505920 https://twitter.com/svat/status/1336935057877991424.) Since then, the programming / software-engineering world at large has settled on certain ideas of structuring programs, in order to keep complexity manageable: the use of abstraction, encapsulation, modules, information-hiding, objects, avoiding global variables, etc. In some sense what Knuth came up with for himself with WEB/CWEB is a wholesale alternative to all these ways of structuring programs, bypassing the mainstream. This seems to suffice for him, but for the rest of us it's a highly unfamiliar/unpalatable way of writing programs. This doesn't mean "Knuth is doing it wrong" in the sense of Kartik's post, because what you should take away as "Literate Programming" from reading his programs is not the diff w.r.t. how a modern programmer would write it, but the diff w.r.t. how he would write it without LP. That is, if you take for granted that the program is going to have very few functions, lots of global variables, etc., the question is whether LP is a better way of writing or presenting that program than non-LP. (It seems the answer is yes, though even this is debatable. Besides, Knuth does say that without LP he'd probably write more levels of functions but with LP he doesn't "have to", so you may disagree with the question too.) You should instead look at examples of programs that a modern programmer familiar with modern idioms wrote with and without LP. I imagine these days one might have better results searching for people using org-babel (rather than cweb/noweb), say. I see also there's https://github.com/AndrewOwenMartin/nestor mentioned in another thread above, or "nbdev" that came up recently (https://www.fast.ai/2019/12/02/nbdev/ , https://github.blog/2020-11-20-nbdev-a-literate-programming-...).

- But that apart, returning once again to Knuth's own literate programs. I have struggled a lot with them. Many people have easily understood much more than me, but I suspect I have struggled more than anyone :D I had some grand plans to make them more readable etc. (https://shreevatsa.net/tex/program), but eventually they started making sense. (I should update those pages…) The point is, literate programming still does not mean writing code in a vacuum for readers who know nothing. Knuth does not consider it necessary to explain basic language features (as you pointed out with the Inform example in another comment here), and moreover he also assumes familiarity with his LP conventions (the overall structure of the program, how to use the index and flip back and forth) and even, I think, his usual programming conventions. So the fact that the ADVENTURE program has the source file's outline (with #includes and everything at the top) on the second page, with a typedef of the "boolean" type and a couple of general-purpose variables declared at the top of "main" (which, I expect, will probably be (ab)used for lots of different purposes)—none of this is a serious problem for the "literate" reader. To Knuth, literate programming does not mean belabouring and explaining everything; it's just a way of structuring your program. And though he mentions reading the program like a book, he doesn't think of books as things to be read strictly word-for-word in the sequence they appear; I expect he imagines reading any book non-linearly (looking up the index, flipping back and forth, skimming, etc.) and that's how the programs are to be read too. The order of sections in the literate program is only more or less the "right" one, not strictly.

About the ADVENTURE program in particular, which is a useful case study, let me post a separate comment (as reply to this one), as this is already very long.


So with that understanding in the comment above, a case study of the ADVENTURE program (http://literateprogramming.com/adventure.pdf or download from https://github.com/shreevatsa/knuth-literate-programs/blob/m... for working hyperlinks). I think that even though the program was written in the order it's written (and he thinks best read that way), the following is the structure of the ADVENT program (and one is supposed to glean that first). The whole program is divided into numbered sections, grouped into what I'll call "chapters". They are:

- In "Introduction" (sections 1–3), the program's overall outline is given,

- In "The Vocabulary" (sections 4–17), we have data structures for storing words (the hash table), how they map to motions and objects and other responses from the program,

- In "Cave Data" (sections 18–20) there are the 100+ locations one can be in during the game,

- In "Cave Connections" (sections 21–62) there's the game's "map" data: how you get around, what messages you get, etc,

- In "Data structures for objects" (sections 63–68), there's data structures for the list of (game) objects at a given location, for groups of objects, and object properties (like its name and in-game note),

- In "Object data" (sections 69–70) these are populated,

- "Low-level input" (sections 71–73), is just asking yes/no questions and taking in commands,

- The next chapter, "The main control loop" (sections 74–91) begins with some comments very revealing about the author's programming style / thinking:

> Now we’ve got enough low-level mechanisms in place to start thinking of the program from the top down, and to specify the high-level control. […] Here is our overall strategy for administering the game. […] The execution consists of two nested loops: There are “minor cycles” inside of “major cycles.” Actions define minor cycles in which you stay in the same place and we tell you the result of your action. Motions define major cycles in which you move and we tell you what you can see at the new place.

etc. And this chapter, in turn using many of the following ones, is what populates one of the main sections that were mentioned in the outline of the program on Page 2. There's parsing of the user's input, dealing with edge cases, until we figure out where to `goto`.
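
Schematically (this is my own paraphrase in C, with hypothetical helper names and stubs, not Knuth's code), the two nested loops he describes are just:

    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical stubs so the schematic compiles and terminates. */
    typedef int command;
    static bool game_over = false;
    static int  turns = 0;
    static void describe_location(void)   { printf("You are in a maze...\n"); }
    static command read_command(void)     { return turns++; }
    static bool is_motion(command c)      { return c % 2 == 0; }
    static void perform_motion(command c) { (void)c; if (turns > 4) game_over = true; }
    static void perform_action(command c) { (void)c; printf("Nothing happens.\n"); }

    int main(void) {
        while (!game_over) {           /* major cycle: you move */
            describe_location();
            for (;;) {                 /* minor cycles: you stay put */
                command cmd = read_command();
                if (is_motion(cmd)) { perform_motion(cmd); break; }
                perform_action(cmd);   /* report the result of the action */
            }
        }
        return 0;
    }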

- The next chapter, "Simple verbs" (sections 92–103) starts with “Let’s get experience implementing the actions by dispensing with the easy cases first” i.e. it's the start of several chapters implementing the action procedures we invoke (or to which we "goto") from the previously mentioned main control loop.

- The next chapter, "Liquid assets" (sections 104–115) starts with something that I also find revealing: "Readers of this program will already have noticed that the BOTTLE is a rather complicated object, since it can be empty or filled with either water or oil" (so he expects that your reading will have noticed this already?)

- Then "The other actions" (sections 116–139) is what you'd expect: "Now that we understand how to write action routines, we’re ready to complete the set" (must say, this whole game looks a lot of fun! Thanks Crowther and Woods!)

- Then "Motions" (sections 140–153) (filling out a section referred to in the main control loop above), "Random Numbers" (sections 154–158, half a page), "Dwarf stuff" (sections 159–176), "Closing the Cave" (177–182), and "Death and Resurrection" (183–192), "Scoring" (193–199), and "Launching the program" (section 200).

Well, with my hard-won familiarity with WEB/CWEB conventions of font choice and numbering and so on (I bet you didn't expect the table of contents to be on the last page!) and with Knuth's style (one can often expect a main control loop branching off to various action procedures), the above is what I understand of the overall structure of the program after a quick reading taking about a second or two per page, maybe a bit more here and there, less than five minutes overall. After that, I feel oriented enough that I can now start reading the program, and I think literate programming (though I haven't even started actually reading it!) makes it easier to understand than if it had been written in "for the compiler" order with the same soup of functions and gotos. But to reiterate, LP in this case is about better presentation of programs written in straightforward old-style "just tell the computer what to do" idiom. It is conceivable that with modern programming conventions, with just the right abstractions and objects and whatnot, one could produce something more readable, and which did not have so many global variables and so on. But that's not the relevant comparison here, I think (and besides maybe that program too could be reordered, written literately?)

Now, the main question that arises is: why not just write out this outline at the beginning? Why is the heart of how the program works, the main control loop, described neither at the beginning nor the end but in the middle of the book, section 74 of 200? Is it fair to the reader to have so many conventions that are not explained in the program itself? Well, I have no end of frustrations with trying to read Knuth's literate programs (see my complaints and jokes in the page I linked above and in the slides), but still I think the answer would be: (1) Every house style has its conventions that must be learned (like: "variables ending with an underscore are private member variables in this codebase"), (2) It would be extra work for the programmer to write this all out when the reader can just skim and find out. And for Knuth, literate programming (and programming in general) is supposed to be fun, not drudgery. So while I think Knuth-style literate programming has problems, I think there are more interesting problems than the surface-level ones. (For example, someone's observation that there are approximately as many LP systems as programmers practising LP: no one is comfortable using others' tools and ultimately wants to write their own tool.)


Thanks for the detailed, informed response. It has given me a lot to think about wrt reading, writing, and discussing literate programs and literate programming.


I think this echoes some of my early gripes with literate programming: that it seemed very hung up on C or Pascal.

It eventually struck me as being part of the point. It is not supposed to be a new language. It is supposed to be how you would explain an implementation to another person.

That is, it is not an attempt to hide the structure of a program, so having an intro section that basically lays out the program's structure in the implementation language is sort of on key.

Seeing a literate implementation of "wc" is not supposed to give more clarity to how "wc" exists as an algorithm, per se. It is supposed to give a view of an implementation that is readable to a person, though.

And, often, the idioms of how you read a whole program should guide how you write the literate one. This is probably not line by line. But it probably is guided by your existing knowledge of the implementation language.


Thanks for the thoughtful response, but I don't think I can say I agree with your conclusion. One example I cited was Knuth's Adventure. Even knowing C well enough, I'm not struck by how Knuth's presentation makes it easier to understand in the way he desires.

Knuth very much seems to advocate for programs that can be read cover-to-cover (e.g. while, say, sitting in your favorite chair). I think the failure in his literate programs comes down to the blindspots he's developed working on his own programs, from having focused so much on the trees in their respective forests. You can see something similar at play in Graham Nelson's struggles to make Inform 7 appropriately "literate", which funnily enough errs in the opposite direction from expecting familiarity with the original implementation language. <http://inform7.com/talks/2020/06/07/narrascope-ii.html> Have a look at this slide in particular: <http://inform7.com/assets/images/NS2/slide036.jpg> Here's an excerpt:

"Once The Historian is done, the instructions are passed to Instruction::read, which parses them more fully (see below) and returns an intest_instructions object".

It goes on in that style. This is reminiscent of every bad attempt I saw during Java's heyday to ensure that a given codebase was 100% commented. People just end up writing "documentation" that consists of low-level repetition of what the code is doing, but without explaining why it's doing those things, and that why is key. It's the only real reason to be concerned with making sure things are documented. The code itself explains the how (in excruciating detail, even), the selection of identifier names more or less does the job of explaining the what. It's the role of the comments to explain the why, because (crucially) there is otherwise no substitute for this reified in the code itself, unlike the case with identifier names and procedural control flow. From my original notes upon seeing that slide from the Inform 7 talk:

> You don't need to tell us that the instructions are passed to `Instructions::read`, which returns an `intest_instructions` object, because if that's true, then we can already see that. This type of documentation contributes nothing except to people who are new to the programming language you're using, but that's not your audience.

> The ONLY way to document a program is to understand that almost everything you write needs to be an answer to the question, "What problem existed that led to this piece of code being created?" Your job is not to describe your solution. We already have your solution and can read through it. Your job is to describe the problem(s) for which your program is a solution. Namely, a problem is defined by its constraints. Detail all of those constraints [before ever thinking of telling us how it is you've decided to work with them].

What do you think of akkartik's take on the imperative to effectively communicate whole-program concerns at the global level? <http://akkartik.name/about>


I'm honestly not sure what my conclusion is, yet. :)

For the Adventure example, I think I agree with you. It did not do a great job conveying the program to me as a blank slate. My suspicion is that I need more familiarity with game programs, though. That is, it is more than just C idioms that need to be understood. One could argue that that should have been a starting section of the program. (I'll actually have to look at it again. I don't have it with me, and it's been way too long.)

I think my conclusion is that I don't know how to read full programs that well. I'm not sure anyone does. :(

Your links on top-down versus bottom-up are, I agree, spot on. I am constantly trying to get people to think more about the whole in what we do at work. To the point where I find myself sneaking in global variables, if only to anchor the places where it is vital to remember the outside picture.

But it can't be denied that bottom-up is far easier to read in pieces. With most tools, it is not possible to have a piece that cross-cuts many other pieces. Literate programming, it seems, makes that possible, but only by letting me justify what are otherwise bad practices.

This is highlighted by the index that CWEB makes. The identifiers of variables are presented as a whole, so that if you have to modify a value in a block, you can easily reference where it was defined and where else it is updated. By and large, modern style says you shouldn't have to reference this, as it should all be in the same small function.

But this gets back to the holistic critique. By not having a whole state that you can reference, you wind up with an API that is so large you can't remember it well, either. :(


Look into the Leo programming editor, if this interests you. It's an elegant solution; it doesn't scale well to teams and it's a bit quirky, but I think it embodies the concept very well.


Something over 25 years ago, I downloaded and installed some program for Solaris 2.x that was written in CWEB. I wish I could remember what the program was.

Oh, and thanks for the link to the PBR book.


This is such an insightful interview. I love reading interviews like this because they always give away a bit of the secret sauce that enables people like Knuth to be so creative and passionate about their fields. I thought this quote was especially awesome, because you get to understand that what one sees on the surface is not the whole picture, especially for someone like Knuth.

"I think that what I’m most guilty of is that I say something but I say it in such a smooth way that people don’t realize that I’m being precise. They think I’m being informal when I’m really meaning every syllable."


Well, I was like that until I realised that nobody else is. I still haven't figured out the sweet spot, so I am now too repetitive and too verbose.


"My mother had a scheduling algorithm: look for something that you can to do and do it. Instead of planning, she just said look for something that needs to be done and do it. It worked for her for ninety years."


For an interview with someone who cared so much about formatting and elegance, that PDF is kind of ugly, I just have to say. How about keeping everything at the same font size, putting the questions in italics, and indenting his responses in blocks? Even better, LaTeX?


Indeed, I was going to say exactly this, but I am glad I found your comment instead. The first question in the interview is about the reason good typesetting was important to Knuth, and the PDF sort of answers it. It is an excellent read but a bit painful to look at. Proper typesetting with TeX or LaTeX would go a long way toward alleviating that, not to mention fixing the jagged vertical spacing in the titles and elsewhere.


I started reading the title and almost had a heart attack thinking Donald Knuth died

Edit: Title was "Professor Donald Knuth on Writing and More, an Int..."


I got exactly the same reaction. I feel lucky to be in a field so young that many of the "founding fathers" are still alive.


I think this decade will be the last one where we get to say that :( Most of them are now in their 70s or 80s.


We merged some comments from https://news.ycombinator.com/item?id=25582712 into this thread, so if you notice weird timestamp mismatches, that's why. Other than that they're in the right place :)


Thanks!

I suppose you must have some sort of automation that notifies you when the same comment is posted twice, or at least posted twice on submissions of the same name & url? I submitted earlier without a ton of success, so when I saw it on the front page this evening I went and copied in my previous comment. Wondering what the method is for identifying that, unless it is too secret to share...


Nothing so fancy. We got an email asking us to merge the threads.


Dr. Knuth's contribution to academic typesetting is such a significant accomplishment, as it has aided the sharing and discourse of information among universities over the last 4 years.


Missing a '0'?


Yes :-)


"SAT solvers became a billion-dollar industry" Does anyone have a reference for this?


SAT is a key technology in electronic design automation (EDA), the industry that makes the tools for designing chips. From [1]:

"Within EDA, SAT solvers are used for such disparate tasks as test pattern generation, circuit delay computation, logic optimization, combinational equivalence checking, bounded model checking and functional test vector generation."

Improvements in SAT solving allow designing more and more complex chips.

In software it's used for formal verification, through SMT (Satisfiability Modulo Theories [2]), an extension of SAT. For an example application, Microsoft uses their Z3 SMT solver [3], now open source, for the automated Windows driver verification process.

[1] https://semiengineering.com/knowledge_centers/eda-design/ver...

[2] https://en.wikipedia.org/wiki/Satisfiability_modulo_theories

[3] https://en.wikipedia.org/wiki/Z3_Theorem_Prover
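
For anyone who hasn't met SAT directly: the problem is deciding whether some true/false assignment satisfies a boolean formula in clause form. A brute-force checker for a made-up three-variable formula fits in a few lines of C; real solvers reach millions of variables with techniques like conflict-driven clause learning rather than enumeration, and that capability is what the EDA and verification uses above are built on.

    #include <stdio.h>

    /* Exhaustively check satisfiability of the toy formula
       (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3). */
    int main(void) {
        for (int m = 0; m < 8; m++) {
            int x1 = m & 1, x2 = (m >> 1) & 1, x3 = (m >> 2) & 1;
            if ((x1 || !x2) && (x2 || x3) && (!x1 || !x3)) {
                printf("SAT: x1=%d x2=%d x3=%d\n", x1, x2, x3);
                return 0;
            }
        }
        printf("UNSAT\n");
        return 0;
    }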


The FDIV bug cost Intel hundreds of millions; formal verification is big money.


Unrelated, but: no Xmas lecture this year?



