Diffstat (limited to 'doc/online/data.html')
-rwxr-xr-x doc/online/data.html 560
1 files changed, 560 insertions, 0 deletions
diff --git a/doc/online/data.html b/doc/online/data.html
new file mode 100755
index 0000000..9fddef4
--- /dev/null
+++ b/doc/online/data.html
@@ -0,0 +1,560 @@
+<!doctype html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<link rel="stylesheet" href="/fonts/fonts.css">
+<link rel="stylesheet" href="/css/main.css">
+<link rel="apple-touch-icon" sizes="180x180" href="/favicon/apple-touch-icon.png">
+<link rel="icon" type="image/png" sizes="32x32" href="/favicon/favicon-32x32.png">
+<link rel="icon" type="image/png" sizes="16x16" href="/favicon/favicon-16x16.png">
+<link rel="manifest" href="/favicon/site.webmanifest">
+<title>data.lp &mdash; DistressNetwork°</title>
+<style>h3,.lp-ref {font-family: neue-haas-grotesk-text, var(--fs-sans); font-size: 1rem; font-weight: normal; font-style: italic;} h3 {margin: 1rem;}</style>
+</head>
+<body>
+<div class="contentlevel">
+<main>
+<div class="leading">data.lp</div>
+<hr>
+<p>This file contains the various data processing-related constants and functions referenced by the tangling and weaving processes.</p>
+
+<h3 id="section">*:</h3>
+
+<pre><code><span class="lp-ref">(License)</span>
+
+<span class="lp-ref">(Imports)</span>
+
+<span class="lp-ref">(Processing limits)</span>
+
+<span class="lp-ref">(Formatting keywords)</span>
+
+<span class="lp-ref">(Configuration keywords)</span>
+
+<span class="lp-ref">(Data structure types)</span>
+
+<span class="lp-ref">(Error set)</span>
+
+<span class="lp-ref">(Line splitting function)</span>
+
+<span class="lp-ref">(Configuration searching function)</span>
+
+<span class="lp-ref">(Section searching function)</span>
+
+<span class="lp-ref">(Command type detection function)</span>
+
+<span class="lp-ref">(Parsing functions)</span>
+
+<span class="lp-ref">(Code generation functions)</span>
+
+<span class="lp-ref">(Text generation function)</span>
+</code></pre>
+
+<h3 id="license">License:</h3>
+
+<pre><code>// Copyright 2022 DistressNetwork° &#60;uplink@distress.network&#62;
+// This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.
+</code></pre>
+
+<h2 id="constants">Constants</h2>
+
+<p>We first import the standard library and the logging function from <code>log.zig</code>.</p>
+
+<h3 id="imports">Imports:</h3>
+
+<pre><code>const std = @import("std");
+const log = @import("log.zig").log;
+
+const Allocator = std.mem.Allocator;
+</code></pre>
+
+<p>We define a maximum input file size of 4 GiB, and a maximum recursion depth of 250 nested calls for the code generation function.</p>
+
+<h3 id="processing-limits">Processing limits:</h3>
+
+<pre><code>pub const input_max = 0x1_0000_0000;
+pub const dereference_max = 250;
+</code></pre>
+
+<p>We then define the recognized formatting keywords. These consist of the following:</p>
+
+<ul>
+<li><code>@:</code>, which begins a new code section;</li>
+<li><code>@+</code>, which appends content to a previous code section;</li>
+<li><code>@.</code>, which terminates the definition of a code section;</li>
+<li><code>@=</code>, which creates a reference to another code section;</li>
+<li><code>*</code>, which is a reserved section name representing the root level of the source code.</li>
+</ul>
+
+<h3 id="formatting-keywords">Formatting keywords:</h3>
+
+<pre><code>pub const k_start = "@: ";
+pub const k_add = "@+ ";
+pub const k_end = "@.";
+pub const k_ref = "@= ";
+pub const k_root = "*";
+</code></pre>
+
+<p>We similarly define the recognized configuration keywords, consisting of:</p>
+
+<ul>
+<li><code>@start</code>, which defines the leading formatted code delimiter when beginning new sections;</li>
+<li><code>@add</code>, which defines the leading formatted code delimiter when appending to existing sections;</li>
+<li><code>@end</code>, which defines the trailing formatted code delimiter;</li>
+<li><code>@ref</code>, which defines the format for section references;</li>
+<li><code>@@</code>, which is the escape sequence representing the current section name;</li>
+<li><code>\n</code>, which is the escape sequence representing a newline.</li>
+</ul>
+
+<h3 id="configuration-keywords">Configuration keywords:</h3>
+
+<pre><code>pub const kc_start = "@start ";
+pub const kc_add = "@add ";
+pub const kc_end = "@end ";
+pub const kc_ref = "@ref ";
+pub const kc_esc = "@@";
+pub const kc_nl = "\\n";
+</code></pre>
+
+<p>We then define the data structure used for parsing the input into code sections, described as follows:</p>
+
+<ul>
+<li>The overall structure of the file is an array of <code>Section</code>s.</li>
+<li>A <code>Section</code> consists of the section name and an array of <code>Content</code> elements.</li>
+<li>A <code>Content</code> element may be either a range of literal lines of code or a reference to another section.</li>
+<li>A <code>LineRange</code> is a pair of integers giving the (zero-based, inclusive) starting and ending line indices of a run of literal lines.</li>
+</ul>
+
+<h3 id="data-structure-types">Data structure types:</h3>
+
+<pre><code>pub const Section = struct {
+ name: []const u8,
+ content: []const Content,
+};
+
+pub const CodeType = enum { literal, reference };
+pub const Content = union(CodeType) {
+ literal: LineRange,
+ reference: []const u8,
+};
+
+pub const LineRange = struct {
+ start: u32,
+ end: u32,
+};
+</code></pre>
+
+<p>We also define the set of errors which may be encountered by the various processing functions, consisting of:</p>
+
+<ul>
+<li>Unexpected section start commands,</li>
+<li>Unexpected section end commands,</li>
+<li>Recursive dereferencing exceeding the specified depth limit,</li>
+<li>References to nonexistent section names or configuration commands.</li>
+</ul>
+
+<h3 id="error-set">Error set:</h3>
+
+<pre><code>pub const Errors = error {
+ UnexpectedStart,
+ UnexpectedEnd,
+ DereferenceLimit,
+ NotFound,
+};
+</code></pre>
+
+<h2 id="preprocessing-searching">Preprocessing &#38; Searching</h2>
+
+<p>We define the line splitting function, which collects the lines of the input file into a buffer and returns them as a slice.</p>
+
+<h3 id="line-splitting-function">Line splitting function:</h3>
+
+<pre><code>pub fn split_lines(file: []const u8, alloc: Allocator) ![][]const u8 {
+ var buffer = std.ArrayList([]const u8).init(alloc);
+ defer buffer.deinit();
+
+ <span class="lp-ref">(Split file at each newline)</span>
+
+ return buffer.toOwnedSlice();
+}
+</code></pre>
+
+<p>The function simply iteratively splits the file at each newline, and appends each resulting line to the buffer.</p>
+
+<h3 id="split-file-at-each-newline">Split file at each newline:</h3>
+
+<pre><code>var iterator = std.mem.split(u8, file, "\n");
+while (iterator.next()) |line| {
+ try buffer.append(line);
+}
+</code></pre>
+
+<p>In addition, the final empty line created by the trailing newline at the end of the file (inserted automatically by some text editors) is removed, if it exists. The check is only performed if the buffer is non-empty, to avoid out-of-bounds indexing.</p>
+
+<h3 id="split-file-at-each-newline-1">+ Split file at each newline:</h3>
+
+<pre><code>if ((buffer.items.len &#62; 0) and std.mem.eql(u8, buffer.items[buffer.items.len - 1], "")) {
+ _ = buffer.pop();
+}
+</code></pre>
+
+<p>We define the configuration command searching function, which returns a list containing the segments of the split format string. The function returns from within the <code>for</code> loop as soon as the declaration is found; otherwise it logs an error and returns <code>NotFound</code>.</p>
+
+<h3 id="configuration-searching-function">Configuration searching function:</h3>
+
+<pre><code>pub fn get_conf(lines: [][]const u8, key: []const u8, alloc: Allocator) ![][]const u8 {
+ for (lines) |line| {
+ if (std.mem.startsWith(u8, line, key)) {
+ return try fmt_conf(line, key, alloc);
+ }
+ }
+ log(.err, "config declaration '{s}' not found", .{std.mem.trimRight(u8, key, " \t")});
+ return error.NotFound;
+}
+
+<span class="lp-ref">(Auxiliary formatting function)</span>
+</code></pre>
+
+<p>If the declaration is found, its contained format string is split along instances of the section name escape sequence, and each substring has its instances of the newline escape sequence replaced with a literal newline.</p>
+
+<h3 id="auxiliary-formatting-function">Auxiliary formatting function:</h3>
+
+<pre><code>fn fmt_conf(line: []const u8, key: []const u8, alloc: Allocator) ![][]const u8 {
+ var buffer = std.ArrayList([]const u8).init(alloc);
+ defer buffer.deinit();
+
+ var iterator = std.mem.split(u8, line[(key.len)..], kc_esc);
+ while (iterator.next()) |str| {
+ try buffer.append(try std.mem.replaceOwned(u8, alloc, str, kc_nl, "\n"));
+ }
+
+ return buffer.toOwnedSlice();
+}
+</code></pre>
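+
+<p>As a standalone illustration (not part of the tangled source), the following test sketches the newline replacement step on a single segment, assuming the same <code>std.mem.replaceOwned</code> call used above. The input string is hypothetical and chosen only for the example.</p>
+
+<pre><code>const std = @import("std");
+
+test "newline escape is replaced by a literal newline" {
+    const alloc = std.testing.allocator;
+    // Hypothetical segment ending in the two-character escape `\n`.
+    const out = try std.mem.replaceOwned(u8, alloc, "code:\\n", "\\n", "\n");
+    defer alloc.free(out);
+    try std.testing.expectEqualStrings("code:\n", out);
+}
+</code></pre>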
+
+<p>We define the code section searching function, which returns the index (into the section list) of the first section with a matching name, or returns an error if none exists.</p>
+
+<h3 id="section-searching-function">Section searching function:</h3>
+
+<pre><code>fn search(list: []Section, name: []const u8) !usize {
+ for (list) |section, index| {
+ if (std.mem.eql(u8, section.name, name)) return index;
+ }
+ log(.err, "section '{s}' not found", .{name});
+ return error.NotFound;
+}
+</code></pre>
+
+<h2 id="parsing">Parsing</h2>
+
+<p>We first define a function which, for a given line, determines whether it consists of a formatting command, and which type of command it contains. This is done in order to enable the use of switch statements in later functions using this routine.</p>
+
+<h3 id="command-type-detection-function">Command type detection function:</h3>
+
+<pre><code>const CommandType = enum { start, add, end, ref, none };
+
+fn command_type(line: []const u8) CommandType {
+ if (std.mem.startsWith(u8, line, k_start)) {
+ return .start;
+ } else if (std.mem.startsWith(u8, line, k_add)) {
+ return .add;
+ } else if (std.mem.eql(u8, line, k_end)) {
+ return .end;
+ } else if (std.mem.startsWith(u8, std.mem.trimLeft(u8, line, " \t"), k_ref)) {
+ return .ref;
+ } else {
+ return .none;
+ }
+}
+</code></pre>
+
+<p>We then define the parsing functions, consisting of the main <code>parse</code> function which builds the list of <code>Section</code>s, and its auxiliary <code>parse_code</code> subroutine which builds the contents of each <code>Section</code>.</p>
+
+<h3 id="parsing-functions">Parsing functions:</h3>
+
+<pre><code>pub fn parse(lines: [][]const u8, alloc: Allocator) ![]Section {
+ var sections = std.ArrayList(Section).init(alloc);
+ defer sections.deinit();
+
+ <span class="lp-ref">(Main parsing routine)</span>
+
+ return sections.toOwnedSlice();
+}
+
+fn parse_code(lines: [][]const u8, index: u32, alloc: Allocator) !CodeReturn {
+ var content = std.ArrayList(Content).init(alloc);
+ defer content.deinit();
+
+ <span class="lp-ref">(Code parsing subroutine)</span>
+
+ return CodeReturn{ .content = content.toOwnedSlice(), .index = i + 1 };
+}
+</code></pre>
+
+<p>Like the main function, the latter function takes the list of lines and the allocator as arguments, but it is also passed the index of the line at which the code section begins. Since the main function must know where to resume parsing once the code section has been parsed, the subroutine returns a struct consisting of the contents of the code section and the index of the next line, as follows.</p>
+
+<h3 id="parsing-functions-1">+ Parsing functions:</h3>
+
+<pre><code>const CodeReturn = struct {
+ content: []const Content,
+ index: u32,
+};
+</code></pre>
+
+<p>The main parsing routine iterates over the list of lines, adding code sections where they occur, and otherwise ignoring text sections. If a section end command is encountered in the absence of a preceding starting command, an error is returned.</p>
+
+<h3 id="main-parsing-routine">Main parsing routine:</h3>
+
+<pre><code>var i: u32 = 0;
+while (i &#60; lines.len) {
+ const line = lines[i];
+ switch (command_type(line)) {
+ .start =&#62; {
+ <span class="lp-ref">(Add new section)</span>
+ },
+ .add =&#62; {
+ <span class="lp-ref">(Append to section)</span>
+ },
+ .end =&#62; {
+ log(.err, "line {d}: unexpected section end", .{i + 1});
+ return error.UnexpectedEnd;
+ },
+ else =&#62; {
+ i += 1;
+ },
+ }
+}
+</code></pre>
+
+<p>To add a new section, the name (consisting of everything after the starting token) is first retrieved from the starting command. Then the code parsing subroutine is called, beginning at the line after the starting command, and it returns the resulting code section (<code>section.content</code>) and the next line at which to resume parsing (<code>section.index</code>). The code section is appended to the section list, and the parsing routine continues at the provided index.</p>
+
+<h3 id="add-new-section">Add new section:</h3>
+
+<pre><code>const name = line[(k_start.len)..];
+log(.debug, "({d}) starting section '{s}'", .{ i + 1, name });
+
+const section = try parse_code(lines, i + 1, alloc);
+try sections.append(.{ .name = name, .content = section.content });
+
+log(.debug, "({d}) ending section '{s}'", .{ section.index, name });
+i = section.index;
+</code></pre>
+
+<p>To append to an existing section, the section name and the code section contents to be appended are retrieved as above. The index of the section is located, along with its address within the section list. Next, the new contents of the section are created by concatenating the old contents with the newly parsed code section contents. The section list is then updated to point to the new section contents, and the parsing routine continues.</p>
+
+<h3 id="append-to-section">Append to section:</h3>
+
+<pre><code>const name = line[(k_add.len)..];
+log(.debug, "({d}) appending to section '{s}'", .{ i + 1, name });
+
+const section = try parse_code(lines, i + 1, alloc);
+const index = try search(sections.items, name);
+const old = &#38;sections.items[index];
+const new = try std.mem.concat(alloc, Content, &#38;[_][]const Content{ old.*.content, section.content });
+old.*.content = new;
+
+log(.debug, "({d}) ending section '{s}'", .{ section.index, name });
+i = section.index;
+</code></pre>
+
+<p>The code parsing subroutine iterates over the list of lines similarly to the main routine. If a starting or appending command is encountered (lacking a matching ending command), an error is raised. Reference commands may be preceded with any amount of whitespace. The loop exits upon encountering an ending command. Otherwise, the line is appended as a literal element.</p>
+
+<h3 id="code-parsing-subroutine">Code parsing subroutine:</h3>
+
+<pre><code>var i = index;
+while (i &#60; lines.len) {
+ const line = lines[i];
+ switch (command_type(line)) {
+ .start, .add =&#62; {
+ log(.err, "line {d}: unexpected section start", .{i + 1});
+ return error.UnexpectedStart;
+ },
+ .ref =&#62; {
+ <span class="lp-ref">(Add reference)</span>
+ },
+ .end =&#62; {
+ break;
+ },
+ else =&#62; {
+ <span class="lp-ref">(Add literal range)</span>
+ },
+ }
+}
+</code></pre>
+
+<p>To add a reference, the name of the referenced section is retrieved, consisting of the characters following the leading whitespace and the command token. The resulting string is appended to the section contents list, and the parser continues at the next line.</p>
+
+<h3 id="add-reference">Add reference:</h3>
+
+<pre><code>const ref_name = std.mem.trimLeft(u8, line, " \t")[(k_ref.len)..];
+try content.append(.{ .reference = ref_name });
+log(.debug, "({d}) \tappended reference '{s}'", .{ i + 1, ref_name });
+i += 1;
+</code></pre>
+
+<p>To add a literal range, the parser either updates the end index of the previous literal element, or creates a new literal element if the last element added is a reference. This action of switching on the previous section element must only occur if the section contents list is non-empty, in order to prevent out-of-bounds indexing. Otherwise, the parser unconditionally appends a new literal element to the list. After either case, parsing continues at the next line.</p>
+
+<h3 id="add-literal-range">Add literal range:</h3>
+
+<pre><code>if (content.items.len &#62; 0) {
+ switch (content.items[content.items.len - 1]) {
+ .literal =&#62; |*range| {
+ range.*.end = i;
+ },
+ .reference =&#62; {
+ try content.append(.{ .literal = .{ .start = i, .end = i } });
+ log(.debug, "({d}) \tappending literal", .{i + 1});
+ },
+ }
+} else {
+ try content.append(.{ .literal = .{ .start = i, .end = i } });
+ log(.debug, "({d}) \tappending literal", .{i + 1});
+}
+i += 1;
+</code></pre>
+
+<h2 id="code-generation">Code Generation</h2>
+
+<p>We define the source code generation procedure, which is split into two functions: a wrapper function which begins code generation at (the index of) the top-level section, and the main procedure which iterates over the contents of the current section, recursively resolving section references and appending literal elements to the list of source code lines.</p>
+
+<h3 id="code-generation-functions">Code generation functions:</h3>
+
+<pre><code>pub fn codegen(lines: [][]const u8, list: []Section, alloc: Allocator) ![][]const u8 {
+ const root = try search(list, k_root);
+ return try codegen_main(lines, list, root, 0, alloc);
+}
+
+fn codegen_main(lines: [][]const u8, list: []Section, index: usize, depth: u8, alloc: Allocator) anyerror![][]const u8 {
+ var buffer = std.ArrayList([]const u8).init(alloc);
+ defer buffer.deinit();
+
+ const section = list[index];
+ log(.debug, "generating section '{s}'", .{section.name});
+ for (section.content) |content| switch (content) {
+ .literal =&#62; |range| {
+ <span class="lp-ref">(Append literal range)</span>
+ },
+ .reference =&#62; |name| {
+ <span class="lp-ref">(Resolve reference)</span>
+ },
+ };
+
+ log(.debug, "ending section '{s}'", .{section.name});
+ return buffer.toOwnedSlice();
+}
+</code></pre>
+
+<p>To append a literal range, the range of lines is simply appended to the buffer.</p>
+
+<h3 id="append-literal-range">Append literal range:</h3>
+
+<pre><code>log(.debug, "adding literal range {d}-{d}", .{ range.start + 1, range.end + 1 });
+try buffer.appendSlice(lines[(range.start)..(range.end + 1)]);
+</code></pre>
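+
+<p>The <code>+ 1</code> in the slice bound follows from the end index of a <code>LineRange</code> being inclusive. As a standalone sketch (not part of the tangled source, with made-up line contents), the slicing behaves as follows.</p>
+
+<pre><code>const std = @import("std");
+
+const LineRange = struct { start: u32, end: u32 };
+
+test "line range end is inclusive" {
+    // Hypothetical input lines, for illustration only.
+    const lines = [_][]const u8{ "a", "b", "c", "d" };
+    const range = LineRange{ .start = 1, .end = 2 };
+    // The end index is inclusive, so the slice upper bound is end + 1.
+    const selected = lines[(range.start)..(range.end + 1)];
+    try std.testing.expectEqual(@as(usize, 2), selected.len);
+    try std.testing.expectEqualStrings("b", selected[0]);
+    try std.testing.expectEqualStrings("c", selected[1]);
+}
+</code></pre>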
+
+<p>To resolve a section reference, the function must first check whether the current recursion depth has exceeded the configured limit, and return an error if this occurs. Otherwise, the index of the referenced section is retrieved, its code is recursively generated (with an incremented recursion depth), and the resulting source code lines are appended to the buffer.</p>
+
+<h3 id="resolve-reference">Resolve reference:</h3>
+
+<pre><code>if (depth &#62; dereference_max) {
+ log(.err, "section dereferencing recursion depth exceeded (max {d})", .{dereference_max});
+ return error.DereferenceLimit;
+}
+const ref = try search(list, name);
+const code = try codegen_main(lines, list, ref, depth + 1, alloc);
+try buffer.appendSlice(code);
+</code></pre>
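+
+<p>The depth check is what terminates generation when a section (directly or indirectly) references itself. The following standalone sketch, which is not part of the tangled source, models such a cycle with a hypothetical <code>resolve</code> function that always recurses once more, so that only the limit can stop it.</p>
+
+<pre><code>const std = @import("std");
+
+const dereference_max = 250;
+
+// Hypothetical stand-in for a section that references itself: every
+// call recurses once more, so only the depth check can end the chain.
+fn resolve(depth: u8) error{DereferenceLimit}!void {
+    if (depth &#62; dereference_max) return error.DereferenceLimit;
+    return resolve(depth + 1);
+}
+
+test "self-reference stops at the depth limit" {
+    try std.testing.expectError(error.DereferenceLimit, resolve(0));
+}
+</code></pre>
+
+<p>In this sketch the largest depth value ever passed is 251, which still fits in the <code>u8</code> counter.</p>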
+
+<h2 id="text-generation">Text Generation</h2>
+
+<p>Finally, we define the text generation function which iterates over the list of lines and produces the literate program text to be passed to an external document processor. In order to keep track of the name of the code section currently being formatted at any given point, the variable <code>current_name</code> is continually updated to contain the current name string. Configuration declarations are skipped, and lines which do not contain any formatting commands are appended as they are.</p>
+
+<h3 id="text-generation-function">Text generation function:</h3>
+
+<pre><code>pub fn textgen(lines: [][]const u8, alloc: Allocator) ![][]const u8 {
+ var buffer = std.ArrayList([]const u8).init(alloc);
+ defer buffer.deinit();
+
+ <span class="lp-ref">(Process configuration declarations)</span>
+
+ var current_name: []const u8 = undefined;
+ for (lines) |line| {
+ if ( std.mem.startsWith(u8, line, kc_start)
+ or std.mem.startsWith(u8, line, kc_add)
+ or std.mem.startsWith(u8, line, kc_end)
+ or std.mem.startsWith(u8, line, kc_ref)) {
+ continue;
+ } else switch (command_type(line)) {
+ .start =&#62; {
+ <span class="lp-ref">(Format starting command)</span>
+ },
+ .add =&#62; {
+ <span class="lp-ref">(Format appending command)</span>
+ },
+ .ref =&#62; {
+ <span class="lp-ref">(Format reference command)</span>
+ },
+ .end =&#62; {
+ <span class="lp-ref">(Format ending command)</span>
+ },
+ else =&#62; {
+ try buffer.append(line);
+ },
+ }
+ }
+
+ return buffer.toOwnedSlice();
+}
+</code></pre>
+
+<p>The formatting strings given by each configuration declaration are first retrieved. If the declaration of the format string for the section appending command is omitted, the format string for the section starting command is used in its place.</p>
+
+<h3 id="process-configuration-declarations">Process configuration declarations:</h3>
+
+<pre><code>const conf_start = try get_conf(lines, kc_start, alloc);
+const conf_add = get_conf(lines, kc_add, alloc) catch conf_start;
+const conf_end = try get_conf(lines, kc_end, alloc);
+const conf_ref = try get_conf(lines, kc_ref, alloc);
+</code></pre>
+
+<p>To process a section starting command, the current section name is updated, and the contents of the corresponding formatting command (that is, the segments of the split formatting string) are interspersed with copies of the current section name. The resulting string is then appended to the buffer.</p>
+
+<h3 id="format-starting-command">Format starting command:</h3>
+
+<pre><code>current_name = line[(k_start.len)..];
+try buffer.append(try std.mem.join(alloc, current_name, conf_start));
+</code></pre>
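+
+<p>Note that the roles in the <code>std.mem.join</code> call are inverted relative to its usual use: the section name acts as the separator, and the format segments are the pieces being joined. As a standalone sketch (not part of the tangled source), assuming a hypothetical declaration <code>@start ### @@:</code> whose format string splits into the two segments below, the result is as follows.</p>
+
+<pre><code>const std = @import("std");
+
+test "section name is interspersed between format segments" {
+    const alloc = std.testing.allocator;
+    // Hypothetical segments for a declaration such as `@start ### @@:`,
+    // i.e. the format string split at the `@@` escape.
+    const segments = [_][]const u8{ "### ", ":" };
+    const out = try std.mem.join(alloc, "License", &#38;segments);
+    defer alloc.free(out);
+    try std.testing.expectEqualStrings("### License:", out);
+}
+</code></pre>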
+
+<p>Processing a section appending command is performed similarly.</p>
+
+<h3 id="format-appending-command">Format appending command:</h3>
+
+<pre><code>current_name = line[(k_add.len)..];
+try buffer.append(try std.mem.join(alloc, current_name, conf_add));
+</code></pre>
+
+<p>To process a reference command, the index at which the reference keyword begins within the line is first located. Then the formatted reference string is created, to which the reference command line&#8217;s leading whitespace is prepended (to preserve indentation).</p>
+
+<h3 id="format-reference-command">Format reference command:</h3>
+
+<pre><code>const start = std.mem.indexOf(u8, line, k_ref).?;
+const ref = try std.mem.join(alloc, line[(start + k_ref.len)..], conf_ref);
+try buffer.append(try std.mem.concat(alloc, u8, &#38;[_][]const u8{ line[0..start], ref }));
+</code></pre>
+
+<p>Processing a section ending command is performed similarly to the starting and appending commands, but it does not require updating the current section name.</p>
+
+<h3 id="format-ending-command">Format ending command:</h3>
+
+<pre><code>try buffer.append(try std.mem.join(alloc, current_name, conf_end));
+</code></pre>
+</main>
+<nav>
+</nav>
+</div>
+<footer>
+<p><a href="/info">About.</a> <a href="mailto:uplink@distress.network">Contact.</a> <a href="/cw.html">Content Warning.</a> <a href="https://git.distress.network">Git.</a> <a href="/meta/sitemap">Sitemap.</a> <a href="https://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0.</a></p>
+<img src="/media/distressnetwork-w.svg" alt="">
+</footer>
+</body>
+</html>