stred

stred: Streaming Tree Editor - Like sed but for tree data. This is the documentation and specification. For the Go implementation, see http://shtanton.xyz/git/stred-go/
git clone https://shtanton.xyz/git/stred.git
Log | Files | Refs

counting.gmi (9380B)


      1 # Counting elements in stred
      2 
      3 In this example, we will use stred to count how many values there are in the root value of the input. Our input will be JSON, and our output will be a single number. This example is on the more complex side, so if you haven't had a look at any other examples yet it might be one to come back to. We will assume that the root of our input is either a map (an object in JSON) or an array.
      4 
      5 Because we are processing and then discarding the input, rather than making a change inside it, we will be using the '-n' flag to discard the contents of the token register after each token is processed.
      6 
      7 We will make extensive use of decomposition while we use stred. This involves looking at a large problem and breaking it down into smaller problems until each one is trivial to encode. stred will always split the input up into tokens and run our commands on every token, so we are trying to decompose in a way that fits that execution pattern. Our first large problem is this:
      8 
      9 Count how many values are in the root value of the input.
     10 
     11 To break this down, we consider what counting *is*. I've decided to decompose it into the following 3 step process.
     12 
     13 * 1 Create a counter and set it to 0
     14 * 2 Go through every child value of the root and for each one, increment the counter.
     15 * 3 Output the counter
     16 
     17 Step 2 in particular feels like a nice decomposition because stred is already going through each token for us so the looping part of the step is sort of already done.
     18 
     19 # Step 1
     20 
     21 These steps are still too large, so we will go through each of them, splitting them into smaller steps. We need to decide where the counter will be stored. stred provides 3 general purpose registers, so lets pick the X register to hold a single number which will be our counter. We know that stred will run our commands on every token in the input tree, meaning they'll be run many times. We need some way to only run step 1 at the very beginning of execution. We detect this by testing if the current token is the very first token in the input.
     22 
     23 * 1.1 Is the current token the very first token in the input?
     24 * 1.2 If it is, set the X register to 0
     25 
     26 ## Step 1.1
     27 
     28 These steps are *still* too large. How exactly can we test to see if the execution is currently on the very first token in the input? We assumed that our root would be either a map or an array, meaning the first token must be either the beginning of a map or of an array. It will also be the only one with an empty path, as only root tokens have empty paths. If and only if both of these conditions are true, the execution is currently processing the first token in the input.
     29 
     30 * 1.1.1 Is the token the beginning of a map or an array?
     31 * 1.1.2 Is the path emtpy?
     32 
     33 ### Step 1.1.1
     34 
     35 Finally, we have steps we can deal with. The current token will be stored in the token register, and we want to test to see if it is either the beginning of a map or an array. In stred, we do almost all transformations and computations with the substitute family of commands. 's' is the simplest of these and runs a subex on the token register. We want a subex that accepts if the register contains exactly the beginning of a map or array, and reject anything else. We'll do this using a range subex, written with '[]'. This will pass over unchanged exactly one atom, as long as it's one of the atoms listed inside the brackets. To pass over either the beginning of a map or of an array, we will use '[`{[`]'. This subex happens to leave the result unchanged but we wouldn't care if it destroyed it or replaced it (though we need to be careful we don't mess up the tests done by later commands!), we won't be outputting it and all we care about here is testing for what the token is. A substitution will skip to just after the first non substitution command after itself if the subex rejects (effectively, chaining subexes ANDs them), so this gives us the test we want. To use the subex in a substitution command, we just wrap it in / and put it after the 's':
     36 
     37 ```
     38 s/[`{[`]/
     39 ```
     40 
     41 ### Step 1.1.2
     42 
     43 Once again we want to do a test, but this time on the path register. No problem, the 'S' command is identical to 's' but acts on the path register. A subex that accepts only an empty path is easy as it's just nothing in itself. After this substitution we can put one command that will be skipped if either of our tests fail. We'll probably want more than one command, so we'll wrap them in '{}' so they're treated as one.
     44 
     45 ```
     46 S//
     47 ```
     48 
     49 ## Step 1.1 Revisited
     50 
     51 With our substeps completed, we can combine them into a step 1.1 that has a nice space for commands to run only on the very first token.
     52 
     53 ```
     54 s/[`{[`]/S//{ }
     55 ```
     56 
     57 ## Step 1.2
     58 
     59 Unfortunately, we can't interact directly with the X register, so we will need to decompose this step once again.
     60 
     61 * 1.2.1 Set the token register to 0
     62 * 1.2.2 Swap the token register with the X register
     63 
     64 ### Step 1.2.1
     65 
     66 Here, we will use the 's' command again but instead of to test on the token register, we'll use it to manipulate it's contents. We could use a subex to remove the existing content and add the 0, but it's a bit easier to use a 'd' command to delete the current content, and then another 's' to add the 0. The '=`0`=' subex will add a 0, and we can even use a shorthand '~0~' to abbreviate it a bit more. Even better, we can use any character to mark the start and end of a subex for a substitution and ~ are a special case that will be included in the subex when used, so we can also drop the /s from the substitution.
     67 
     68 ```
     69 ds~0~
     70 ```
     71 
     72 ### Step 1.2.2
     73 
     74 This step is really easy, the 'x' command does exactly what we want.
     75 
     76 ```
     77 x
     78 ```
     79 
     80 ## Step 1.2 Revisited
     81 
     82 ```
     83 ds~0~x
     84 ```
     85 
     86 # Step 1 Revisited
     87 
     88 Now we can put our step 1.2 commands in the curly brackets of our step 1.1 commands so they will be run only at the start of execution.
     89 
     90 ```
     91 s/[`{[`]/S//{ds~0~x}
     92 ```
     93 
     94 # Step 2
     95 
     96 stred will already iterate through every token that is a direct child of the root for us, amongst every other token, so we need a filter for only children of the root.
     97 
     98 * 2.1 Is the value a child of the root?
     99 * 2.2 If it is, increment the number in the X register
    100 
    101 ## Step 2.1
    102 
    103 There is a subtle consideration here. We want to count values, but stred is iterating tokens. For scalars, these are the same, but arrays and maps will each have two tokens for only one value. To make sure we are counting values, we will test that the token is a child of the root, and also that it is either a scalar or a beginning terminal.
    104 
    105 * 2.1.1 Is the token a child of the root?
    106 * 2.1.2 Is the token a scalar or a beginning terminal?
    107 
    108 ### Step 2.1.1
    109 
    110 A child of the root occurs exactly when the path to a token is exactly one value. This is easy to test for using the ',' subex which accepts exactly one scalar and passes over it unchanged.
    111 
    112 ```
    113 S/,/
    114 ```
    115 
    116 ### Step 2.1.2
    117 
    118 Here we again use ',' to see if we have a scalar, and '[`{[`]' as we saw earlier to check for a beginning terminal. To test for either of these, we use the '|' subex operator.
    119 
    120 ```
    121 s/,|[`{[`]/
    122 ```
    123 
    124 ## Step 2.1 Revisited
    125 
    126 We combine the substeps and add some '{}' ready for step 2.2.
    127 
    128 ```
    129 S/,/s/,|[`{[`]/{ }
    130 ```
    131 
    132 ## Step 2.2
    133 
    134 Because we can't increment the x register directly, we will need to swap the counter into the token register, increment it and then swap it back. The swaps are easy with the 'x' command, the increment will need another substitution.
    135 
    136 Subexes have a '+' operator that captures the output of the subex immediately before it. The '%' operator reads in a single number and outputs it unchanged. This would normally pass over it unchanged, but if we later use a '+' then the original number will be removed and instead of being put back the output goes into a sum. We want to also provide a 1 for that sum to achieve a single increment, so we use the subex '%~1~+'. From left to right, we read in a number and output it, then we output a 1, then we capture all the outputs so far, sum them, and output the result.
    137 
    138 Combined with the swap commands, we get:
    139 
    140 ```
    141 xs/%~1~+/x
    142 ```
    143 
    144 # Step 2 Revisited
    145 
    146 Once again, we just put step 2.2 into the brackets in step 2.1.
    147 
    148 ```
    149 S/,/s/,|[`{[`]/{xs/%~1~+/x}
    150 ```
    151 
    152 # Step 3
    153 
    154 Hopefully we're masters of decomposition by now:
    155 
    156 * 3.1 Is the current token the very last token in the input?
    157 * 3.2 If it is, output the contents of the X register
    158 
    159 ## Step 3.1
    160 
    161 This is very similar to step 1.1 but filters on the ending terminals instead of the beginning.
    162 
    163 ```
    164 s/[`}]`]/S//{ }
    165 ```
    166 
    167 ## Step 3.2
    168 
    169 This is straightforward, we can output from the token register with 'p' (think print) and so we just need to swap the X register in first with 'x'.
    170 
    171 ```
    172 xp
    173 ```
    174 
    175 # Step 3 Revisited
    176 
    177 ```
    178 s/[`}]`]/S//{xp}
    179 ```
    180 
    181 # Assembling the final set of commands
    182 
    183 It doesn't actually matter what order these three steps go in for the final command list, stred is really good at letting you type the commands in the order you think of them, so we'll do step 1, then 2, then 3, since that's the order we came up with them.
    184 
    185 ```
    186 s/[`{[`]/S//{ds~0~x} S/,/s/,|[`{[`]/{xs/%~1~+/x} s/[`}]`]/S//{xp}
    187 ```
    188 
    189 It looks really scary, even though we've seen where it came from! stred is best used when a set of commands doesn't need to be read. Just write and run. Fortunately, as we've seen, it lends itself to being written in the order that problems are thought about, so I very rarely find myself going back to a set of commands to make changes.