A lightweight, extensible literate programming tool.

Overview

This tool consists of two programs, tangle and weave. Both read their input from stdin and write their output to stdout; neither takes command-line arguments or any other source of input data. The literate programming interface is agnostic to both the programming language of the source code and the formatting language of the document.

Usage

These programs are intended to be intermediary text processing filters, used as components within build automation routines or shell scripts. The build scripts for this tool itself, build.sh and gen-docs.sh, serve as example use cases.

Dependencies

For compiling the executables:

Zig (0.9.X)
binutils

For building the (freestanding) documentation:

lowdown

File Structure

A literate programming file consists of plain text sections, code sections, and output configuration commands. The top-level code section, named *, is mandatory within input to tangle. Certain configuration commands, described under Configuration Commands, are mandatory within input to weave.

Section Commands

Code sections are created as follows:

@: <section name>
<section content>
@.

Source code may also be later appended to previously defined sections:

@+ <section name>
<additional content>
@.

Code sections may contain references to other sections, written as follows:

@= <section name>

References may be preceded by any amount of whitespace, whereas all other commands must be placed at the immediate start of the line.
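
The section commands above can be illustrated with a small sketch in Python. This is not the tool's actual implementation (which is written in Zig), only one plausible reading of the rules. The function names parse_sections, expand and tangle_text are hypothetical, and the sketch assumes that the whitespace in front of a reference is carried over to every line of the expanded section, which this document does not confirm.

```python
def parse_sections(text):
    """Collect code sections from literate source, honouring @: and @+."""
    sections = {}
    current = None  # name of the section being read, or None while in plain text
    for line in text.splitlines():
        if current is None:
            if line.startswith("@: "):
                current = line[3:]
                sections[current] = []  # start a new definition
            elif line.startswith("@+ "):
                current = line[3:]
                if current not in sections:
                    raise ValueError("append to undefined section: " + current)
            # any other line is plain text and is ignored here
        elif line == "@.":
            current = None  # end of the code section
        else:
            sections[current].append(line)
    return sections


def expand(name, sections, indent="", stack=()):
    """Recursively replace '@= <name>' lines with the referenced section."""
    if name in stack:
        raise ValueError("cyclic reference: " + name)
    out = []
    for line in sections[name]:
        stripped = line.lstrip()
        if stripped.startswith("@= "):
            # Assumption: the reference's leading whitespace indents the expansion.
            lead = line[: len(line) - len(stripped)]
            out.extend(expand(stripped[3:], sections, indent + lead, stack + (name,)))
        else:
            out.append(indent + line)
    return out


def tangle_text(text):
    """Expand the mandatory top-level section, named *, into source code."""
    return "\n".join(expand("*", parse_sections(text))) + "\n"
```

Given an input whose * section contains an indented reference to a section named body, and where body is defined once with @: and extended once with @+, tangle_text returns the * section with both pieces of body spliced in at the reference, in definition order.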

Configuration Commands

These commands define the format of the code sections and references as they would appear in the output of weave, written as follows:

@start <str>
@add <str>
@end <str>
@ref <str>

The @start command defines the format of the start of a new code section.
- This configuration is mandatory.
The @add command defines the format of the start of an appended content section.
- This configuration is optional, and defaults to the value given in @start if not supplied.
The @end command defines the format of the end of a code section.
- This configuration is mandatory.
The @ref command defines the format of a reference to another code section.
- This configuration is mandatory.

The format strings may contain the following control sequences:

\n, which represents a newline character,
@@, which represents the name of the current section being defined, if it occurs within the format string for @start, @add, or @end, or the name of the section being referenced, if it occurs within the format string for @ref.
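
As a rough illustration of how these format strings might be applied, the following Python sketch expands the two control sequences and applies the documented fallback of @add to @start. The names render and resolve_config, and the dictionary layout of the configuration, are hypothetical and chosen only for this example; the tool's real behaviour may differ in details such as error reporting.

```python
def render(fmt, name):
    """Expand the two documented control sequences in a format string."""
    # "\\n" below is the two-character sequence backslash + n, as typed in the input.
    # It is replaced first so that text coming from the section name is left untouched.
    return fmt.replace("\\n", "\n").replace("@@", name)


def resolve_config(config):
    """Check the mandatory commands and let @add default to @start."""
    for key in ("start", "end", "ref"):
        if key not in config:
            raise ValueError("missing mandatory command @" + key)
    resolved = dict(config)
    resolved.setdefault("add", config["start"])
    return resolved


cfg = resolve_config({"start": "**@@**\\n<pre>", "end": "</pre>", "ref": "[@@]"})
heading = render(cfg["start"], "main")  # "**main**" followed by a newline and "<pre>"
link = render(cfg["ref"], "helpers")    # "[helpers]"
```

Because @add was not supplied in this example, cfg["add"] holds the same format string as cfg["start"].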

Literate Source Code

The source code of this tool is rendered as a set of literate program documents, listed below.

Caveats

The text generation routine performed by weave does not verify the correctness of the section commands within the input file; as such, its output upon receiving malformed input should be considered undefined behaviour. It is therefore recommended to pass newly created or modified files into tangle first, which performs the necessary input validation.
The control sequences defined for configuration format strings cannot be escaped; that is, the literal character sequences “\n” and “@@” cannot be reproduced within formatted code section commands in the output.
Attempting to use the Texinfo typesetting language in conjunction with this tool results in namespace collisions with multiple command names.

Future Work

The treatment of whitespace within formatting and configuration commands is currently rather strict, as no trimming of leading or trailing whitespace around the command contents is applied. This may be changed for the sake of convenience, at the expense of added parsing complexity.
A ‘prefix’ mode, in which every text section line and formatting command starts with a designated prefix token that is recognized and omitted from the output. If the prefix is the line comment prefix of the programming language in use, literate source code files could be edited with conventional syntax highlighting.