stred: Streaming Tree Editor - Like sed but for tree data. This is the documentation and specification. For the Go implementation, see
git clone
Log | Files | Refs

counting.gmi (9380B)

      1 # Counting elements in stred
      3 In this example, we will use stred to count how many values there are in the root value of the input. Our input will be JSON, and our output will be a single number. This example is on the more complex side, so if you haven't had a look at any other examples yet it might be one to come back to. We will assume that the root of our input is either a map (an object in JSON) or an array.
      5 Because we are processing and then discarding the input, rather than making a change inside it, we will be using the '-n' flag to discard the contents of the token register after each token is processed.
      7 We will make extensive use of decomposition while we use stred. This involves looking at a large problem and breaking it down into smaller problems until each one is trivial to encode. stred will always split the input up into tokens and run our commands on every token, so we are trying to decompose in a way that fits that execution pattern. Our first large problem is this:
      9 Count how many values are in the root value of the input.
     11 To break this down, we consider what counting *is*. I've decided to decompose it into the following 3 step process.
     13 * 1 Create a counter and set it to 0
     14 * 2 Go through every child value of the root and for each one, increment the counter.
     15 * 3 Output the counter
     17 Step 2 in particular feels like a nice decomposition because stred is already going through each token for us so the looping part of the step is sort of already done.
     19 # Step 1
     21 These steps are still too large, so we will go through each of them, splitting them into smaller steps. We need to decide where the counter will be stored. stred provides 3 general purpose registers, so lets pick the X register to hold a single number which will be our counter. We know that stred will run our commands on every token in the input tree, meaning they'll be run many times. We need some way to only run step 1 at the very beginning of execution. We detect this by testing if the current token is the very first token in the input.
     23 * 1.1 Is the current token the very first token in the input?
     24 * 1.2 If it is, set the X register to 0
     26 ## Step 1.1
     28 These steps are *still* too large. How exactly can we test to see if the execution is currently on the very first token in the input? We assumed that our root would be either a map or an array, meaning the first token must be either the beginning of a map or of an array. It will also be the only one with an empty path, as only root tokens have empty paths. If and only if both of these conditions are true, the execution is currently processing the first token in the input.
     30 * 1.1.1 Is the token the beginning of a map or an array?
     31 * 1.1.2 Is the path emtpy?
     33 ### Step 1.1.1
     35 Finally, we have steps we can deal with. The current token will be stored in the token register, and we want to test to see if it is either the beginning of a map or an array. In stred, we do almost all transformations and computations with the substitute family of commands. 's' is the simplest of these and runs a subex on the token register. We want a subex that accepts if the register contains exactly the beginning of a map or array, and reject anything else. We'll do this using a range subex, written with '[]'. This will pass over unchanged exactly one atom, as long as it's one of the atoms listed inside the brackets. To pass over either the beginning of a map or of an array, we will use '[`{[`]'. This subex happens to leave the result unchanged but we wouldn't care if it destroyed it or replaced it (though we need to be careful we don't mess up the tests done by later commands!), we won't be outputting it and all we care about here is testing for what the token is. A substitution will skip to just after the first non substitution command after itself if the subex rejects (effectively, chaining subexes ANDs them), so this gives us the test we want. To use the subex in a substitution command, we just wrap it in / and put it after the 's':
     37 ```
     38 s/[`{[`]/
     39 ```
     41 ### Step 1.1.2
     43 Once again we want to do a test, but this time on the path register. No problem, the 'S' command is identical to 's' but acts on the path register. A subex that accepts only an empty path is easy as it's just nothing in itself. After this substitution we can put one command that will be skipped if either of our tests fail. We'll probably want more than one command, so we'll wrap them in '{}' so they're treated as one.
     45 ```
     46 S//
     47 ```
     49 ## Step 1.1 Revisited
     51 With our substeps completed, we can combine them into a step 1.1 that has a nice space for commands to run only on the very first token.
     53 ```
     54 s/[`{[`]/S//{ }
     55 ```
     57 ## Step 1.2
     59 Unfortunately, we can't interact directly with the X register, so we will need to decompose this step once again.
     61 * 1.2.1 Set the token register to 0
     62 * 1.2.2 Swap the token register with the X register
     64 ### Step 1.2.1
     66 Here, we will use the 's' command again but instead of to test on the token register, we'll use it to manipulate it's contents. We could use a subex to remove the existing content and add the 0, but it's a bit easier to use a 'd' command to delete the current content, and then another 's' to add the 0. The '=`0`=' subex will add a 0, and we can even use a shorthand '~0~' to abbreviate it a bit more. Even better, we can use any character to mark the start and end of a subex for a substitution and ~ are a special case that will be included in the subex when used, so we can also drop the /s from the substitution.
     68 ```
     69 ds~0~
     70 ```
     72 ### Step 1.2.2
     74 This step is really easy, the 'x' command does exactly what we want.
     76 ```
     77 x
     78 ```
     80 ## Step 1.2 Revisited
     82 ```
     83 ds~0~x
     84 ```
     86 # Step 1 Revisited
     88 Now we can put our step 1.2 commands in the curly brackets of our step 1.1 commands so they will be run only at the start of execution.
     90 ```
     91 s/[`{[`]/S//{ds~0~x}
     92 ```
     94 # Step 2
     96 stred will already iterate through every token that is a direct child of the root for us, amongst every other token, so we need a filter for only children of the root.
     98 * 2.1 Is the value a child of the root?
     99 * 2.2 If it is, increment the number in the X register
    101 ## Step 2.1
    103 There is a subtle consideration here. We want to count values, but stred is iterating tokens. For scalars, these are the same, but arrays and maps will each have two tokens for only one value. To make sure we are counting values, we will test that the token is a child of the root, and also that it is either a scalar or a beginning terminal.
    105 * 2.1.1 Is the token a child of the root?
    106 * 2.1.2 Is the token a scalar or a beginning terminal?
    108 ### Step 2.1.1
    110 A child of the root occurs exactly when the path to a token is exactly one value. This is easy to test for using the ',' subex which accepts exactly one scalar and passes over it unchanged.
    112 ```
    113 S/,/
    114 ```
    116 ### Step 2.1.2
    118 Here we again use ',' to see if we have a scalar, and '[`{[`]' as we saw earlier to check for a beginning terminal. To test for either of these, we use the '|' subex operator.
    120 ```
    121 s/,|[`{[`]/
    122 ```
    124 ## Step 2.1 Revisited
    126 We combine the substeps and add some '{}' ready for step 2.2.
    128 ```
    129 S/,/s/,|[`{[`]/{ }
    130 ```
    132 ## Step 2.2
    134 Because we can't increment the x register directly, we will need to swap the counter into the token register, increment it and then swap it back. The swaps are easy with the 'x' command, the increment will need another substitution.
    136 Subexes have a '+' operator that captures the output of the subex immediately before it. The '%' operator reads in a single number and outputs it unchanged. This would normally pass over it unchanged, but if we later use a '+' then the original number will be removed and instead of being put back the output goes into a sum. We want to also provide a 1 for that sum to achieve a single increment, so we use the subex '%~1~+'. From left to right, we read in a number and output it, then we output a 1, then we capture all the outputs so far, sum them, and output the result.
    138 Combined with the swap commands, we get:
    140 ```
    141 xs/%~1~+/x
    142 ```
    144 # Step 2 Revisited
    146 Once again, we just put step 2.2 into the brackets in step 2.1.
    148 ```
    149 S/,/s/,|[`{[`]/{xs/%~1~+/x}
    150 ```
    152 # Step 3
    154 Hopefully we're masters of decomposition by now:
    156 * 3.1 Is the current token the very last token in the input?
    157 * 3.2 If it is, output the contents of the X register
    159 ## Step 3.1
    161 This is very similar to step 1.1 but filters on the ending terminals instead of the beginning.
    163 ```
    164 s/[`}]`]/S//{ }
    165 ```
    167 ## Step 3.2
    169 This is straightforward, we can output from the token register with 'p' (think print) and so we just need to swap the X register in first with 'x'.
    171 ```
    172 xp
    173 ```
    175 # Step 3 Revisited
    177 ```
    178 s/[`}]`]/S//{xp}
    179 ```
    181 # Assembling the final set of commands
    183 It doesn't actually matter what order these three steps go in for the final command list, stred is really good at letting you type the commands in the order you think of them, so we'll do step 1, then 2, then 3, since that's the order we came up with them.
    185 ```
    186 s/[`{[`]/S//{ds~0~x} S/,/s/,|[`{[`]/{xs/%~1~+/x} s/[`}]`]/S//{xp}
    187 ```
    189 It looks really scary, even though we've seen where it came from! stred is best used when a set of commands doesn't need to be read. Just write and run. Fortunately, as we've seen, it lends itself to being written in the order that problems are thought about, so I very rarely find myself going back to a set of commands to make changes.