So glad to see Odin up on the site now! A while back I promised you a write up on my number one feature request, and I've had it sitting in my email draft for a while, so I'll go ahead and put it up right now.
An example of the problem
Iterating over doubly linked lists is not hard, but to do it in C, there is some boilerplate to the for loop that can get repeated a lot:
| for (Node *node = list.head; node != 0; node = node->next){ /* work on the node */ }
|
That is a small amount to type, but it's still annoying enough that a lot of programmers ask for a "foreach" construct in new languages. Sometimes it makes sense to make a macro that simplifies the boilerplate. Abstracting the boilerplate becomes particularly important when the loop happens in a lot of places, because if your singly linked list becomes a doubly linked list with a sentinel all the loop controls need to be rewritten:
| for (Node *node = list.sentinel.next; node != &list.sentinel; node = node->next){ /* work on the node */ }
|
Again this is not a lot to type, but if you didn't abstract out how you iterate a linked list, and the for loop boilerplate is everywhere the retyping can be a big hassle and interrupt a train of thought.
A lot of people take this particular example very seriously and spend a lot of time working out a "foreach" system which involves defining an operator/procedure/type or something else that facilitates how to customize "foreach" to everyones own little data structure. Since the only way languages think about custom control flow is through the call stack, these systems always involve something that is ultimately a callback+userdata system. Even then this still only abstracts "foreach" type control flow, and, in my limited experience, is not better than making a macro even in the foreach case.
The idea
Since the user just wants to have the code
| for (Node *node = list.head; node != 0; node = node->next){ /* work on the node */ }
|
written into their code, without using the call stack, in a way that encapsulates iteration rule for the list, I propose a language construct that acts as a typed macro replacement sort of system like the following:
| #control_flow forlist($v : ^Node; $list : expr_const) ${B} :: {
for (v = list.head; v != 0; v = v.next){
B
}
}
|
Then code that loops over a list could be written as:
| forlist (node : ^Node; list) { /* work on the node */ }
|
Here is a break down of the ideas in this example:
marks this as the declaration of a new control flow structure.
is the name of the new control flow structure.
specifies that it expects a variable declaration or the name of an existing variable on the first parameter.
specifies that it expects an expression with no side effects for the second parameter.
captures the first statement after the control flow parameter list.
The rest defines how to configure those pieces in terms of the already defined control flow structures.
A standard way to handle this would be to write a function that takes a callback and user data, and that does the loop code. That is quite similar to this, but it takes a lot more for the user to think in those terms, and I suspect the compiler probably struggles to optimize it as well as it could have optimized this.
Here is an example where control flow abstraction would allow an API writer to trust that the user will use the code correctly, when the API writer could not have made that assumption without control flow abstraction.
When providing a system for iterating through data in a chunk by chunk fashion, there are some things the author can do in the API to help the user keep their indexes straight. The bad way to do this is just report to the user the start and end positions of the chunk they are currently on, and force them to always subtract an offset from the indexes. A nice way to do it, is to provide an array pointer that is rebased so that even if the chunk ranges from indexes 900 to 1000, the user can just plug in index 950 to the array, and it has been shifted so that the absolute index lines up. The only issue left for the author is to help the user get the control flow right, and the best I have come up with in C is to transform all for loops via the following rule:
The Original:
| for (A; i < end; ++i){ /* work */ }
|
The Transformed Version:
| if (init_stream(^stream, i, end)){
still_looping := 1;
A;
do{
for (; i < stream.end; ++i){ /* work */ }
still_looping = stream_move_forward(^stream);
}while(still_looping);
}
|
This is not a trivial rewrite. The end condition has been moved to a completely different place, the pre-statement A has been relocated, there is an extra variable to declare, the whole thing is now two levels of nesting deeper than it should be, and not all of the changes are on the same side of the main work block, meaning you have to go back and forth across potentially lots of code to do this tranformation by hand.
My use for this pattern is all over my code so all this boilerplate has to be repeated because there is no way to abstract over control flow. If I try to ask end users to use this system, there will likely be many many bugs, so I am still telling everyone they should just read all of the data and handle it that way, even though the chunk system would solve their problem more efficiently, and almost just as easily if I could trust that the control flow would not become a problem.