commit b6402e6ed03371b29aefe0685774c66c9fd2b801
parent 150ee3e8084ed2e657b789094aa4baeec0efffdc
Author: Charlie Stanton <charlie@shtanton.com>
Date: Wed, 8 Sep 2021 22:22:10 +0100
Finish first draft of piping post
Diffstat:
1 file changed, 80 insertions(+), 4 deletions(-)
diff --git a/posts/better_than_stdio.gmi b/posts/better_than_stdio.gmi
@@ -13,7 +13,7 @@ I will now dissect and destroy my own arguments against the shell, hopefully sav
I was used to writing C that looked like this (only with way better style), as this is the way stdio is used most commonly in my experience:
-main.c
+main.c: writing "hello world" to file descriptor 1 (stdout)
```
#include <unistd.h>
@@ -23,7 +23,7 @@ int main() {
}
```
-receiver.c
+receiver.c: reading and printing from file descriptor 0 (stdin)
```
#include <unistd.h>
#include <stdio.h>
@@ -40,7 +40,7 @@ These can be compiled into executables and run like so:
```
./main | ./receiver
```
-which will output "hello world" to the shell. This is because the data ("hello world") in main is "piped" into receiver which reads it and then displays it. However, there really is nothing about this that stops us sending other data types over the pipe:
+which will output "hello world" to the shell. This is because the data ("hello world") in main is "piped" into receiver, which reads it and then displays it: the pipe takes stdout from main and turns it into stdin for receiver. However, there really is nothing about this that stops us sending other data types over the pipe:
main.c
```
@@ -85,11 +85,87 @@ Which can be compiled and run to output "Charlie Stanton" even though the data p
## Multiple input and output streams to/from a process
+What if in my pipeline I want to do further processing on both the first names and the last names of people, but separately? Well, file descriptors 0, 1 and 2 are set aside for stdin, stdout and stderr respectively, but it turns out we can use other file descriptors and still access them from the shell.
+
+main.c
+```
+#include <unistd.h>
+#include <string.h>
+
+#include "types.h"
+
+int main() {
+ struct Person people[2];
+
+ strcpy(people[0].first_name, "Charlie");
+ strcpy(people[0].last_name, "Stanton");
+
+ strcpy(people[1].first_name, "John");
+ strcpy(people[1].last_name, "Smith");
+
+ for (int i = 0; i < 2; i++) {
+ write(1, people[i].first_name, strlen(people[i].first_name));
+ write(1, "\n", 1);
+ write(3, people[i].last_name, strlen(people[i].last_name));
+ write(3, "\n", 1);
+ }
+ return 0;
+}
+```
+Here we are writing the first names to stdout (file descriptor 1) and the last names to file descriptor 3. These are two different output streams and so we can send each one to a different file:
+```
+./main > first_names 3> last_names
+```
+We use 3> to indicate that we want whatever the process writes to file descriptor 3 to go into last_names. We can do this with input too (using 3<), so the restriction on the number of inputs and outputs of each process is no longer a problem. This works perfectly for reading and writing files, but what if our input is coming from 2 other commands? We can use a pipe for one of them, but what about the other? Fortunately, we have fifos.
+
+A FIFO (first in first out) is a file which doesn't behave at all like a regular file. You can read and write it, but it behaves more like a queue. When you write something, it joins the end of the queue, and when you read, you take something off the front of the queue. FIFOs are incredibly versatile for piping our data around. They are also known as named pipes because they really are the same thing as a pipe, but are given a name in the filesystem so more than 2 processes can interact with them.
+To use fifos to solve our problem, we can type in the shell:
+```
+mkfifo /tmp/first_names && mkfifo /tmp/last_names
+echo Charlie > /tmp/first_names & echo Stanton > /tmp/last_names & ./main < /tmp/first_names 3< /tmp/last_names
+```
+
+This gives us the flexibility we had with files but with streams of data being passed between processes.
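For the input side, a minimal sketch of what a program reading from two descriptors might look like. The names read_line and join_names are hypothetical and not part of the post's code, and the sketch assumes one name per line with names shorter than 64 bytes.

```c
#include <string.h>
#include <unistd.h>

/* Read one line (without the trailing '\n') from fd into buf.
 * Returns the line length, or -1 at end of input or on error.
 * Reading a byte at a time keeps the sketch short; it is not fast. */
static ssize_t read_line(int fd, char *buf, size_t cap) {
    size_t len = 0;
    char c;
    while (read(fd, &c, 1) == 1) {
        if (c == '\n') {
            buf[len] = '\0';
            return (ssize_t)len;
        }
        if (len + 1 < cap) buf[len++] = c; /* silently truncate long lines */
    }
    if (len > 0) { /* final line without a newline */
        buf[len] = '\0';
        return (ssize_t)len;
    }
    return -1;
}

/* Pair each first name from first_fd with a last name from last_fd and
 * write "first last\n" to out_fd. Returns the number of pairs written,
 * or -1 if a write fails. */
int join_names(int first_fd, int last_fd, int out_fd) {
    char first[64], last[64], line[130];
    int pairs = 0;
    while (read_line(first_fd, first, sizeof first) >= 0 &&
           read_line(last_fd, last, sizeof last) >= 0) {
        size_t flen = strlen(first), llen = strlen(last);
        memcpy(line, first, flen);
        line[flen] = ' ';
        memcpy(line + flen + 1, last, llen);
        line[flen + 1 + llen] = '\n';
        if (write(out_fd, line, flen + llen + 2) < 0) return -1;
        pairs++;
    }
    return pairs;
}
```

A main that simply calls join_names(0, 3, 1) could then be run with `< /tmp/first_names 3< /tmp/last_names`, taking first names from stdin and last names from file descriptor 3.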
## Combining and duplicating data streams
+Combining data streams is somewhat complicated. It is easy to merge 2 byte streams, but there is no guarantee that a message from each won't get mangled together with interleaved bytes. I think a custom executable is needed to properly combine 2 data streams, one which can take into account the type of data being transferred. Duplicating a stream is simpler and, using a fifo and the tee utility, very straightforward.
+
+```
+mkfifo /tmp/fifo
+echo hello | tee /tmp/fifo | cat & cat /tmp/fifo
+```
+
+Here we use tee, which passes its stdin straight through to stdout, but also copies it to any files given as arguments. This allows us to duplicate the stream and provide it as an input to two different calls to cat.
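Returning to the combining case, here is a rough sketch of what such a custom executable might do if the messages are newline-terminated lines. That message format is an assumption, as are the names merge_lines, pump_lines and LINE_CAP. It waits on both inputs with poll and only ever forwards whole lines, so bytes from the two streams are not interleaved within a message.

```c
#include <poll.h>
#include <string.h>
#include <unistd.h>

#define LINE_CAP 4096 /* assumed maximum message (line) length */

struct line_buf {
    char data[LINE_CAP];
    size_t used;
};

/* Write len bytes, retrying on short writes. Returns 0, or -1 on error. */
static int write_all(int fd, const char *p, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read what is available on in_fd and forward only complete lines.
 * Returns 1 while the input is still open, 0 once it has ended.
 * Write errors are ignored to keep the sketch short. */
static int pump_lines(int in_fd, int out_fd, struct line_buf *lb) {
    ssize_t n = read(in_fd, lb->data + lb->used, sizeof lb->data - lb->used);
    if (n <= 0) {
        write_all(out_fd, lb->data, lb->used); /* flush an unterminated tail */
        lb->used = 0;
        return 0;
    }
    lb->used += (size_t)n;

    size_t start = 0;
    char *nl;
    while ((nl = memchr(lb->data + start, '\n', lb->used - start)) != NULL) {
        size_t len = (size_t)(nl - (lb->data + start)) + 1;
        write_all(out_fd, lb->data + start, len);
        start += len;
    }
    memmove(lb->data, lb->data + start, lb->used - start);
    lb->used -= start;
    if (lb->used == sizeof lb->data) { /* overlong line: pass it on as-is */
        write_all(out_fd, lb->data, lb->used);
        lb->used = 0;
    }
    return 1;
}

/* Merge two line-oriented streams into out_fd until both inputs end.
 * Returns 0 on success, -1 if poll fails. */
int merge_lines(int fd_a, int fd_b, int out_fd) {
    struct line_buf bufs[2];
    struct pollfd fds[2] = {{fd_a, POLLIN, 0}, {fd_b, POLLIN, 0}};
    int open_count = 2;
    bufs[0].used = bufs[1].used = 0;

    while (open_count > 0) {
        if (poll(fds, 2, -1) < 0) return -1;
        for (int i = 0; i < 2; i++) {
            short ready = POLLIN | POLLHUP | POLLERR | POLLNVAL;
            if (fds[i].fd >= 0 && (fds[i].revents & ready)) {
                if (!pump_lines(fds[i].fd, out_fd, &bufs[i])) {
                    fds[i].fd = -1; /* poll skips negative descriptors */
                    open_count--;
                }
            }
        }
    }
    return 0;
}
```

A different message format (fixed-size structs, length prefixes) would need a different pump function, which is why a general-purpose merge tool is hard to write.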
## Cyclical data passing
-Suppose I want the output of one process feeding into the input of another and vice versa. Perhaps my irc bot is both receiving and sending messages through my irc client. I wasn't able to find a perfect solution to this, but we have a very good friend called the FIFO who can help us out greatly. A really simple example is a script to replace the yes utility, which outputs a string to stdout repeatedly until terminated. We could design a similar script using tee, which takes input from stdin and outputs it to stdout as well as to any files provided as arguments.
+With the help of a fifo we can also create cycles in our data dependency. We could recreate the yes utility like so:
+
+```
+mkfifo /tmp/fifo
+echo hello > /tmp/fifo & tee /tmp/fifo < /tmp/fifo
+```
+
+tee is both reading and writing /tmp/fifo, and it also outputs to stdout every time it does so. This creates a very simple cyclical data loop.
+
+## Byte streams are the only form of data passing
+
+POSIX and System V (two sets of standards that many operating systems adhere to) both define 3 types of IPC:
+- Message queues
+- Shared memory
+- Semaphores
+But in the shell we only have access to byte streams. Is this a limitation?
+
+Message queues and byte streams are very similar; in fact, a byte stream could easily act as a message queue. If the messages are all the same size, they can be sent and received without any encoding at all. If the length varies, each message can be prefixed with an 8 byte length that tells the receiver how big the rest of the message is.
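The length-prefix idea can be sketched as follows. The names send_message and recv_message are hypothetical, and the length is written in the host's byte order, which assumes both ends of the pipe run on the same machine.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes. 0 on success, -1 on error or early end of input. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: an 8 byte length followed by the payload. */
int send_message(int fd, const void *msg, uint64_t len) {
    if (write_all(fd, &len, sizeof len) < 0) return -1;
    return write_all(fd, msg, (size_t)len);
}

/* Receive one message into buf (capacity cap).
 * Returns the payload length, or -1 on error or if it does not fit. */
int64_t recv_message(int fd, void *buf, uint64_t cap) {
    uint64_t len;
    if (read_all(fd, &len, sizeof len) < 0) return -1;
    if (len > cap) return -1;
    if (read_all(fd, buf, (size_t)len) < 0) return -1;
    return (int64_t)len;
}
```

The receiver never has to guess where one message ends and the next begins, which is the property a message queue provides.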
+
+Shared memory is more performant and more natural than message passing for some tasks, but in my experience there aren't that many of these and they can often be very easily translated into a message passing style. For example, if a small amount of data is being shared, then the data could be resent every time it changes; if it is a larger amount, then a message could indicate a part of the data and what to set it to. Without careful consideration, shared memory opens up a lot of potential for data races, so I don't think it belongs on the shell.
+
+Semaphores can be very easily simulated by sending a single byte one way for up and the other way for down.
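One simple variant of this, sketched under the assumption that a single pipe holds the semaphore's count: every byte sitting in the pipe is one available unit, "up" writes a byte, and "down" reads one, blocking while the pipe is empty. This is a slightly different arrangement from sending bytes in both directions, and the names pipe_sem_up and pipe_sem_down are hypothetical.

```c
#include <unistd.h>

/* "Up": add one unit by writing a single byte. 0 on success, -1 on error. */
int pipe_sem_up(int write_fd) {
    char token = 1;
    return write(write_fd, &token, 1) == 1 ? 0 : -1;
}

/* "Down": take one unit by reading a single byte. Blocks while the pipe is
 * empty. Returns 0 on success, -1 on error or once every writer has closed. */
int pipe_sem_down(int read_fd) {
    char token;
    return read(read_fd, &token, 1) == 1 ? 0 : -1;
}
```

With a fifo created by mkfifo, the same two calls would work between unrelated processes.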
+
+Byte streams aren't always the most natural or the fastest solution to a problem, but they work very well most of the time and they are intuitive, easy to reason about, versatile and robust. I don't think the shell needs anything else.
+
+## Problems
+
+I do find it annoying that the syntax for using fifos is so verbose, which is my one criticism of this system.