On the Aesthetics of the Syntax of Declarations

Ginger Bill  —  3 months, 1 week ago [Edited 1 day, 8 hours later]
n.b. This is a philosophical article and not a technical article. There are no correct answers to the questions that I will pose -- only compromises.

I'm considering what the "best" declaration syntax would be. Historically, there have been two categories, which I will call qualifier-focused and type-focused.
An example of qualifier-focused would be the Pascal family. An example of type-focused would be the C family. Odin, like Jai, has been experimenting with a third category: a name-focused declaration syntax. These categories place emphasis on different aspects of a declaration.

  • Qualifier-focused places emphasis on the kind/qualifier of declaration (`var x = 123; const K = true;`)
  • Type-focused places emphasis on the type of the declaration (`int x = 123; bool const K = true;`)
  • Name-focused places emphasis on the name of the declaration and requires that the right hand side be an expression (`x := 123; K :: true;`)
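Go happens to offer both a qualifier-focused form (`var`, `const`) and a name-focused short form (`:=`), so it makes a convenient side-by-side illustration. This is a minimal Go sketch, not Odin, Jai, or Pascal code; `declareLocal` and the variable names are placeholders chosen for the example.

```go
package main

import "fmt"

// Qualifier-focused: the keyword (var or const) leads each declaration.
var x = 123

const K = true

// declareLocal uses Go's name-focused short form, where the name leads and
// the right-hand side must be an expression whose type is inferred.
func declareLocal() int {
	y := 123        // name-focused: type inferred as int
	var z int = 456 // qualifier-focused, with the type spelled out
	return y + z
}

func main() {
	fmt.Println(x, K, declareLocal()) // 123 true 579
}
```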

Some languages take a hybrid approach to declaration syntax. Python uses a form of name-focused syntax for general variable declarations but qualifier-focused syntax for function, class, and import declarations. Most modern derivatives of the C family use type-focused syntax for most declarations but qualifier-focused syntax for import declarations.

There are some issues with all three approaches.

Qualifier-Focused

  • Most forms of qualifier-focused require numerous keywords to specify the kind of declaration
  • Qualifier-focused adds verbosity to the syntax due to the extra keywords
  • Qualifier-focused declarations tend to make you define all declarations at the top of a scope and/or group declarations of the same kind together. This nudges programmers not to intermingle declarations and assignments. Depending on the programmer's views, this can be seen as a positive aspect.

Type-Focused

  • Every declaration must be associated with a type and thus be part of the type system. This is one of the reasons why `void` exists in C: if a function must specify a return type, the solution is to create a "non-type", i.e. `void`. Having a `void` type does cause issues in the type system in general, but I will not consider those in this article.
  • In the case of C++ and others, type inference must be done through a form of qualifier-focused syntax (e.g. `auto` or `var`).

Name-Focused

  • Name-focused declarations have three parts: the left hand side (lhs), the right hand side (rhs), and the middle part. The lhs holds the names of the entities being declared, the rhs holds expressions, and the middle part denotes the type, which may be optional. This means that every declaration must be assigned some form of expression.
  • In a self-contained language, having only expression-assigned declarations is not a problem. The difficulty comes from interfacing with foreign code, such as C, while keeping a consistent syntax. In C, there are two forms of function declaration: function prototypes and full functions with bodies. A function prototype is not a form of expression, whereas a function with a body can be thought of as a named lambda function.
  • An inconsistency with name-focused syntax is the variable declaration without an assignment (`x: int;`). Such a declaration has an implicit rhs expression, the zero/default value. It is also implied that the declaration must be a variable declaration (and not a constant declaration).
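Go gives the same meaning to a typed declaration with no right hand side: the implicit value is the type's zero value. A small Go sketch of that rule follows; `point` and `zeroValues` are illustrative names, not from the article.

```go
package main

import "fmt"

type point struct{ X, Y int }

// zeroValues declares variables with a type but no initializer. As with
// the `x: int;` case, each one receives the zero value of its type.
func zeroValues() (int, string, *point, point) {
	var n int    // 0
	var s string // ""
	var p *point // nil
	var q point  // point{X: 0, Y: 0}
	return n, s, p, q
}

func main() {
	n, s, p, q := zeroValues()
	fmt.Println(n, s == "", p == nil, q) // 0 true true {0 0}
}
```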

On the Aesthetics of Qualifier-Focused

I will be honest: I have a minor bias towards qualifier-focused syntax due to Pascal being one of my very first languages. When I started creating Odin, the language that I am designing and making, I started with a very Pascal-like syntax (including `begin` and `end`), but when the language became "public", the syntax changed to be closer to that of Jonathan Blow's language, Jai. I was intrigued by the name-focused syntax and its very elegant approach to type inference. However, I have had doubts about the syntax for quite a while. At one point, I struggled to find a solution to the issue of foreign procedures and foreign variables. Originally, I solved the issue for foreign procedures by replacing the procedure body with a `#foreign` tag. However, this approach cannot be applied to a foreign variable declaration and still be consistent. On an impulsive whim, I switched the entire declaration syntax to a Go-like qualifier-focused style for 2 weeks. (I have done this switch twice.) The solution for foreign entities was to allow a procedure literal/lambda without a body, by replacing the body with `---`, and to surround all foreign declarations in a `foreign` block. In Odin, a procedure without a body cannot be distinguished from a procedure type, so there needed to be a way to specify a procedure literal/lambda without a body.

foreign my_lib {
    some_var: i32;
    amazing_foo :: proc "c" (a, b: i32, c: f32) -> rawptr ---;
}
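Go runs into a milder version of the same ambiguity. Spelled without a body, `func(a, b int32) int32` is a function type; with a body, it is a function literal. (The Go specification does allow a top-level function declaration to omit its body when the function is implemented outside Go, for example in assembly, which is loosely analogous to a foreign declaration.) The sketch below only shows the type-versus-literal distinction; `binaryOp`, `add`, and `apply` are hypothetical names.

```go
package main

import "fmt"

// binaryOp is a function *type*: a signature with no body.
type binaryOp func(a, b int32) int32

// add is bound to a function *literal*: the same signature plus a body.
var add = func(a, b int32) int32 { return a + b }

// apply accepts any value of the function type.
func apply(op binaryOp, a, b int32) int32 { return op(a, b) }

func main() {
	fmt.Println(apply(add, 2, 3)) // 5
}
```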
    


There are two reasons for my conflict between qualifier-focused and name-focused syntax. The first is that name-focused syntax is elegant and terse to write compared to qualifier-focused. The second is the "ugliness" of qualifier-focused syntax in conjunction with other forms of statements.

Qualifier-focused looks "ugly" when it is combined with control statements:

for var x = 0; x < 10; x += 1 {
}

The two keywords together in `for var` look "dense and wrong" to me and make the construct much more difficult to read. However, placing an open parenthesis between the keywords reduces some of this "density":

for (var x = 0; x < 10; x += 1) {
}


It does look slightly better, but it is still "dense". The parentheses make it less "ugly" for some reason, and it is not self-evident why separating the two words with punctuation improves matters.

This is probably one reason why Go uses the `:=` operator, especially in this case:

for x := 0; x < 10; x += 1 {
}
for idx, val := range array {
}


`:=` is a pragmatic solution to this aesthetic problem with the qualifier-focused `var`. However, this is not to say that the `:=` operator is great. In Go, it has extra semantics to make Go feel more like a "dynamic language" (a `:=` in an inner scope shadows an outer variable of the same name). Even if you had both `var` and `:=`, and they did exactly the same thing, it raises the question: why have two things that do exactly the same thing?
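A short Go sketch of the two behaviours of `:=` in question, using nothing beyond the standard language: in a nested scope it declares a new, shadowing variable, while in the same scope it needs at least one new name and simply assigns to the others. The function names are made up for the example.

```go
package main

import "fmt"

// shadowDemo: := inside a nested block declares a new x that shadows the
// outer x, so the outer variable is left unchanged.
func shadowDemo() (int, int) {
	x := 1
	inner := 0
	if x > 0 {
		x := 2 // a new variable, visible only inside this block
		inner = x
	}
	return x, inner
}

// redeclareDemo: in the same scope, := needs at least one new name on the
// left; names already declared in that scope are simply assigned to.
func redeclareDemo() (int, int) {
	a, b := 1, 2
	a, c := 10, 3 // a is reassigned; only c is new
	return a, b + c
}

func main() {
	fmt.Println(shadowDemo())    // 1 2
	fmt.Println(redeclareDemo()) // 10 5
}
```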

I have been researching the topic of syntax in language design for a long time now. It's an interesting topic, and I think I should actually write up most of my findings. I hope this condensed explanation of the issues regarding declaration syntax helps others understand my predicament.

- Bill
#14662 Anikki  —  2 months, 4 weeks ago

gingerBill
Qualifier-focused looks "ugly" when it is combined with control statements:

for var x = 0; x < 10; x += 1 {
}

The two keywords together in `for var` look "dense and wrong" to me and make the construct much more difficult to read. However, placing an open parenthesis between the keywords reduces some of this "density":

for (var x = 0; x < 10; x += 1) {
}

It does look slightly better, but it is still "dense". The parentheses make it less "ugly" for some reason, and it is not self-evident why separating the two words with punctuation improves matters.

This is probably one reason why Go uses the `:=` operator, especially in this case:

for x := 0; x < 10; x += 1 {
}
for idx, val := range array {
}

`:=` is a pragmatic solution to this aesthetic problem with the qualifier-focused `var`

I think some languages get rid of the C-style for loop completely and replace it with a "for-each" loop and a "numeric range" loop.

Zig's for-each loop:

for (array) |item| {
}


Rust's "numeric range" loop:

for i in 0..5 {
}
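For comparison with the Zig and Rust loops above, Go's `range` clause binds the index and the item with `:=`, and their types are always inferred. This is a small illustrative Go sketch; `sumIndexAndItems` is a hypothetical helper.

```go
package main

import "fmt"

// sumIndexAndItems walks an array with Go's range clause, which binds the
// index and the item with := (their types are always inferred; Go has no
// syntax for annotating them in the loop header).
func sumIndexAndItems(a [3]int32) (int, int32) {
	idxSum, itemSum := 0, int32(0)
	for i, x := range a { // i is the index, x is the item
		idxSum += i
		itemSum += x
	}
	return idxSum, itemSum
}

func main() {
	fmt.Println(sumIndexAndItems([3]int32{5, 6, 7})) // 3 18
}
```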

I would personally like to be able to optionally (for the sake of readability) specify the types of the index and item being iterated over:

Imaginary syntax:

var a: [3]s32 = [0, 0, 0];

// "for-each" loop with explicit typing on the index and item
for a -> i: s32, x: s32 {
}

// "for-each" loop with type inference on the index and item
for a -> i, x {
}

// "for-each" over the items of an array
for a -> x {
}

// "numeric loop" over the indices of an array
// a .. b  -> [a, b]
// a ..^ b -> [a, b)
// a ^.. b -> (a, b]
// a ^..^ b -> (a, b)
//
for 0 ..^ a.len -> i {
}
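The four bound conventions in the comment above can be pinned down with a small helper. This Go sketch is hypothetical (the `interval` function and its two boolean flags are not part of any real language or library); `for 0 ..^ a.len -> i` corresponds to iterating over `interval(0, len(a), false, true)`.

```go
package main

import "fmt"

// interval lists the integers between a and b under the four bound
// conventions from the imaginary syntax (hypothetical helper):
//
//	a .. b   -> [a, b]  (openStart=false, openEnd=false)
//	a ..^ b  -> [a, b)  (openStart=false, openEnd=true)
//	a ^.. b  -> (a, b]  (openStart=true,  openEnd=false)
//	a ^..^ b -> (a, b)  (openStart=true,  openEnd=true)
func interval(a, b int, openStart, openEnd bool) []int {
	start, end := a, b
	if openStart {
		start++ // exclude a
	}
	if openEnd {
		end-- // exclude b
	}
	var out []int
	for i := start; i <= end; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	fmt.Println(interval(0, 3, false, false)) // [0 1 2 3]
	fmt.Println(interval(0, 3, false, true))  // [0 1 2]
	fmt.Println(interval(0, 3, true, false))  // [1 2 3]
	fmt.Println(interval(0, 3, true, true))   // [1 2]
}
```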

Anyway, I think Rosetta Code is a pretty good place to compare the syntax of different programming languages:
https://www.rosettacode.org/wiki/Loops/For
https://www.rosettacode.org/wiki/Loops/Foreach
#14803 Ginger Bill  —  2 months, 2 weeks ago [Edited 0 minutes later]
One thing to consider is that C-style for loops are very useful for linked lists and other similar constructs, especially in a language that does not have iterator types.

for node := list; node != nil; node = node.next {}
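The same linked-list walk, sketched in Go (which has the same three-clause loop); the `node` type and `sumList` function are illustrative names.

```go
package main

import "fmt"

// node is a minimal singly linked list node, defined just for this sketch.
type node struct {
	value int
	next  *node
}

// sumList walks the list with a C-style three-clause loop; no iterator
// type is needed, only the init, condition, and post statements.
func sumList(list *node) int {
	total := 0
	for n := list; n != nil; n = n.next {
		total += n.value
	}
	return total
}

func main() {
	list := &node{1, &node{2, &node{3, nil}}}
	fmt.Println(sumList(list)) // 6
}
```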


Odin already supports range-based iteration in for-loops:
for i in 0..10 {} // up to and not including 10
for i in 0...9 {} // up to and including 9
for val, idx in 20..50 {} // val is in the range [20, 50), idx is in the range [0, 30)
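To make the comment on the last line concrete, here is a Go sketch that spells out the same iteration with a three-clause loop: `val` covers the half-open range [20, 50) while `idx` counts from 0 to 29. It is a hand-written equivalent, not a description of how Odin actually lowers the loop; `rangeWithIndex` is a hypothetical name.

```go
package main

import "fmt"

// rangeWithIndex mirrors `for val, idx in lo..hi`: val runs over [lo, hi)
// while idx counts iterations from 0. It reports the iteration count and
// the final val/idx (or -1, -1 if the range is empty).
func rangeWithIndex(lo, hi int) (count, lastVal, lastIdx int) {
	lastVal, lastIdx = -1, -1
	for val, idx := lo, 0; val < hi; val, idx = val+1, idx+1 {
		count++
		lastVal, lastIdx = val, idx
	}
	return count, lastVal, lastIdx
}

func main() {
	fmt.Println(rangeWithIndex(20, 50)) // 30 49 29
}
```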


The init-statement declaration (as in `node := list` above) is not limited to for loops in Odin either:

if _, ok := x[key]; ok {}
if x := foo(); x != y {}
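Go has the same if-with-initializer form, including the comma-ok map lookup in the first line above. A minimal sketch, with `lookup` as an illustrative helper; the variables declared in the init clause are scoped to the `if` statement and its `else` branch.

```go
package main

import "fmt"

// lookup declares v and ok in the if's init clause; ok reports whether
// the key was present in the map.
func lookup(m map[string]int, key string) string {
	if v, ok := m[key]; ok {
		return fmt.Sprintf("%s=%d", key, v)
	}
	return key + " missing"
}

func main() {
	stock := map[string]int{"apples": 3}
	fmt.Println(lookup(stock, "apples")) // apples=3
	fmt.Println(lookup(stock, "pears"))  // pears missing
}
```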


---

I am personally not a fan of Zig, as I disagree with the fundamental philosophy of the language itself. Nor do I agree with many of its syntax decisions, including the loop syntax. I will not discuss my reasons here.

#14826 Andrew Kelley  —  2 months, 2 weeks ago
gingerBill
I am personally not a fan of Zig, as I disagree with the fundamental philosophy of the language itself. Nor do I agree with many of its syntax decisions, including the loop syntax. I will not discuss my reasons here.


Here's the discussion we had over email:

Subject: concurrency

From: andrewrk
1. does Odin have or plan to have any hidden memory allocations?
2. does Odin plan to do concurrency beyond kernel threads similar to pthreads?

From: gingerBill
Hi Andrew,

1. "Hidden memory allocations" is very unspecific. I already do hidden stack allocations, as most compilers do (though the optimizer should remove most of these).
However, all other allocations will use the context's allocator. The `context` is a thread local variable which allows users to "push" data onto it, such as an allocator.

This allocation system is used for the dynamic array and dynamic map types and for the built-in procedures `new` and `make`.

The main reasoning behind having an allocator system is that having the ability to allocate data in very specific ways is better for the program. I find that the simplistic model of "the stack and the heap" does not reflect how memory "flows".

The problems of memory can be split across two axes: memory size and memory lifetime.

* 90% of the time, you will know both the size needed and its lifetime. In this case, something like an arena/stack/pool/permanent allocator would be suitable.
* 9% of the time, you may know the lifetime but not the size (e.g. a dynamically growing array).
* ~1% of the time, you may not know the lifetime of the memory but you do know its size. For this, a memory management system, reference counting, or something similar may be a better option.
* <1% of the time, you may know neither the lifetime nor the size. In this case, something like compiler-side high-level ownership semantics or bog-standard garbage collection may be the solution.

I _never_ have the last problem. All of the other problems are easily solved by having the ability to control how memory is allocated and freed. Even my compiler uses arenas extensively, as I know the maximum amount of memory needed.
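As a rough illustration of the arena case from the list above (size and lifetime both known), here is a minimal bump allocator sketched in Go. It is not Odin's allocator interface; `Arena`, `NewArena`, `Alloc`, and `Reset` are hypothetical names, and because Go is garbage collected the sketch only models the bookkeeping, with `Alloc` returning `nil` when the arena is exhausted.

```go
package main

import "fmt"

// Arena is a minimal bump allocator over one fixed buffer. Illustrative
// only: Go's garbage collector still owns the backing slice.
type Arena struct {
	buf  []byte
	used int
}

// NewArena reserves capacity bytes once, up front.
func NewArena(capacity int) *Arena {
	return &Arena{buf: make([]byte, capacity)}
}

// Alloc hands out n bytes, or returns nil when the arena cannot satisfy the
// request, leaving the caller to decide how to handle the failure.
func (a *Arena) Alloc(n int) []byte {
	if n < 0 || n > len(a.buf)-a.used {
		return nil
	}
	// The three-index slice caps the capacity so an append on the result
	// cannot overwrite the next allocation.
	p := a.buf[a.used : a.used+n : a.used+n]
	a.used += n
	return p
}

// Reset releases every allocation at once, at the end of their shared
// lifetime, and zeroes the used bytes so later allocations start clean.
func (a *Arena) Reset() {
	for i := range a.buf[:a.used] {
		a.buf[i] = 0
	}
	a.used = 0
}

func main() {
	arena := NewArena(16)
	x := arena.Alloc(10)
	// Only 6 bytes remain, so the next request fails.
	y := arena.Alloc(10)
	fmt.Println(len(x), y == nil) // 10 true
	arena.Reset()
	fmt.Println(arena.Alloc(16) != nil) // true
}
```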

------------

2. I have no idea yet. I'm thinking of adding old-fashioned coroutines, as virtually all hardware can support them. LLVM has only recently added this feature; however, I'm not sure how I should expose it at the user level.

There are numerous other forms of high-level concurrency that I cannot decide upon. However, I do not want any of them to use the allocator system if possible. I don't want it to do "hidden allocations". This does however limit the expressiveness of what is possible compared to other higher-level languages but Odin is meant to be a C replacement with low-level expressiveness.

Regards,
Bill

From: andrewrk
Is it planned for Odin to work in a system where overcommit is off? E.g. what happens when memory allocation fails when doing the "append" operation to a dynamic array, in the situation where you do not know the size?


From: gingerBill
That's entirely a library feature and not a language feature. The allocator determines how that is handled, not the language.

From: andrewrk
Can you walk me through a scenario where this happens and how it would be handled?

So for example:

to_c_string :: proc(s: string) -> []u8 {
    c_str := make([]u8, len(s)+1);
    copy(c_str, cast([]byte)s);
    c_str[len(s)] = 0;
    return c_str;
}

make_window :: proc(title: string, msg, height: int, window_proc: win32.Wnd_Proc) -> (Window, bool) {
    if title[len(title)-1] != 0 {
        w.c_title = to_c_string(title);
    } else {
        w.c_title = cast([]u8)title;
    }
}

Let's say that whatever allocator you have set up here, when you call to_c_string, runs out of memory.
So the make([]u8, len(s)+1) does not have enough memory to do the make.
What happens to control flow?
More generally, what is the plan for how to deal with possible allocation failure?
Where is the allocator being used in this code?


From: andrewrk
Also, isn't this an invalid free when title[len(title)-1] == 0?

free(w.c_title);



From: gingerBill
How do you handle it in C or any other language? If you have got that problem, you have even bigger problems than that.

In general, the problem you are referring to is extremely rare, and these sorts of problems must be handled appropriately _if_ they ever happen. For this particular problem, it is probably better to just "panic" or even exit the program.

For small memory environments, this is where custom allocators will be a brilliant solution. You will have control over how that memory is used, allocated, and freed.

I want a language that allows developers to solve their problems more easily. The hardware and OS are part of their problem, not an abstract thing in the aether.


From: andrewrk
On Thu, Apr 27, 2017 at 1:06 PM, Ginger Bill wrote:

How do you handle it in C or any other language? If you have got that problem, you have even bigger problems than that.


void *memory = malloc(count);
if (!memory) {
    // clean up and return an error
}

Is there a way to detect that memory allocation failed with `make` in Odin?



In general, the problem you are referring to is extremely rare, and these sorts of problems must be handled appropriately _if_ they ever happen.


I don't understand the difference between a rare problem that must be handled if it happens, and a common problem that must be handled if it happens. Either way you need the same code, right?


For this particular problem, it is probably better to just "panic" or even exit the program.


Some applications will find it acceptable to panic in out of memory conditions. On the other hand, a robust, reusable library will clean up and return an error code in the event of an out of memory situation.



For small memory environments, this is where custom allocators will be a brilliant solution. You will have control over how that memory is used, allocated, and freed.


Sure, but there's a fundamental problem here. Example:

- Code A uses an allocator interface to allocate memory based on runtime information. The amount of memory allocated is runtime known only and may exceed some value N. Code A is defined in the standard library, maybe it's the to_c_string function.
- Code B defines an allocator and sets the allocator. The amount of memory available in this small memory environment is N - 1. Code B is defined in the programmer's application.

What happens when Code A runs using the allocator from Code B?


From: gingerBill
void *memory = malloc(count);
if (!memory) {
    // clean up and return an error
}


With `make` or `new` you would do something similar: just check whether the result is `nil` or its backing pointer is `nil`. In fact, if you know the exact allocator you are using, that allocator could store much more information to report. It may even have a logging system or more! That is up to the user to decide what they need and want.

slice := make([]Type, len);
if slice == nil {
    // handle error
}

I don't understand the difference between a rare problem that must be handled if it happens, and a common problem that must be handled if it happens. Either way you need the same code, right?


I don't want to design the language around a very, very rare case, especially when it's not a problem with the language but with the code. I want to "solve" the 80% of problems I normally have. If I wanted a very domain-specific language, that is what I would design.

Some applications will find it acceptable to panic in out of memory conditions. On the other hand, a robust, reusable library will clean up and return an error code in the event of an out of memory situation.


In those cases, you have a different problem. I'm trying to make a language that is a Swiss Army knife for everyone: it does every job, but poorly. If you don't think a `panic` is acceptable, you handle it differently. The advantage of having this amount of control over allocations is that you get to decide what is needed to solve your problem.

Sure, but there's a fundamental problem here. Example:
- Code A uses an allocator interface to allocate memory based on runtime information. The amount of memory allocated is runtime known only and may exceed some value N. Code A is defined in the standard library, maybe it's the to_c_string function.
- Code B defines an allocator and sets the allocator. The amount of memory available in this small memory environment is N - 1. Code B is defined in the programmer's application.
What happens when Code A runs using the allocator from Code B?



Have you ever actually encountered the problem you are talking about? How often have you come across it? How did you solve it? I am not that concerned about these very rare and very abstract problems. I want to solve actual real-world problems that I actually have.

---

Sorry for the rant-like tone of this email, but I don't want to be concerned with small problems like this and orient the language around them.

Concurrency is a big problem which I don't have any definite answers to yet.
"Generics"/parametric polymorphism is another problem I'm not sure what I want, if even at all (i.e. is there a better metaprogramming solution for the problem than making the language "more complex"). Are semantic type-safe macros a better option which can do more?
Metaprogramming is another problem I need to think through more. How far do I go with it? Compile Time Execution? AST modification? Compiler insertion? External code generation?


Regards,
Bill


From: andrewrk
On Thu, Apr 27, 2017 at 2:32 PM, Ginger Bill wrote:

Sure, but there's a fundamental problem here. Example:

- Code A uses an allocator interface to allocate memory based on runtime information. The amount of memory allocated is runtime known only and may exceed some value N. Code A is defined in the standard library, maybe it's the to_c_string function.
- Code B defines an allocator and sets the allocator. The amount of memory available in this small memory environment is N - 1. Code B is defined in the programmer's application.
What happens when Code A runs using the allocator from Code B?



Have you ever actually encountered the problem you are talking about? How often have you come across it? How did you solve it? I am not that concerned about these very rare and very abstract problems. I want to solve actual real-world problems that I actually have.


I don't think this is a rare and abstract problem. I think this is the most common problem that all code faces. You want to allocate memory, and that allocation can fail.



---

Sorry for the rant-like tone of this email, but I don't want to be concerned with small problems like this and orient the language around them.

Concurrency is a big problem which I don't have any definite answers to yet.


Right, so I'm not just jerking your chain around. I asked about memory and hidden (non-stack) memory allocation because I think it is tightly coupled with concurrency. For example LLVM coroutines require a memory allocation (see http://llvm.org/docs/Coroutines.html#llvm-coro-alloc-intrinsic). For Zig, I'm not sure how this would work, because we don't have hidden allocations, and also we require explicitly handling allocation failure. So even calling a coroutine could potentially fail.


"Generics"/parametric polymorphism is another problem I'm not sure what I want, if even at all (i.e. is there a better metaprogramming solution for the problem than making the language "more complex"). Are semantic type-safe macros a better option which can do more?


I feel pretty happy about my solution to this problem. I took inspiration from Jai and functions can have `comptime` parameters. This means the parameter is known at compile time, and it's a compile error if you pass a non-compile-time-known value to a comptime parameter. Secondly, types must be comptime parameters. And then that's it, you have generics. So for example:

fn max(comptime T: type, a: T, b: T) -> T {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

// call like this: max(f32, 1234, 5678)
Metaprogramming is another problem I need to think through more. How far do I go with it? Compile Time Execution? AST modification? Compiler insertion? External code generation?

As for metaprogramming, check out how printf is done in Zig: https://github.com/zig-lang/zig/blob/master/std/fmt.zig#L23
Not quite a macro, not quite metaprogramming, it's more like partial function evaluation.

I'm not a fan of super crazy metaprogramming like compiler insertion. I think it makes code really hard to understand. It makes you paranoid that something fancy might be going on when you should be able to read straightforward control flow and data structures.

From: gingerBill
I think we fundamentally differ on how we should treat error cases, and this is probably why we are "arguing".

My main questions for you on allocation errors are the following:

Have you ever personally had `malloc` or the like fail?!
Do you check if `malloc`, et al succeeds or fails every time?
How often (if ever) do you use a form of custom allocators?

From: andrewrk
On Apr 27, 2017 4:56 PM, "Ginger Bill" wrote:

I think we fundamentally differ on how we should treat error cases, and this is probably why we are "arguing".

My main questions for you on allocation errors are the following:

Have you ever personally had `malloc` or the like fail?!

Sure. I turned off overcommit and then allocated a big buffer and it returned null.

On Linux, usually overcommit is on, but it's a setting and can be turned off. Windows doesn't do overcommit.

The problem with panicking on out of mem is that some third party process could cause you to crash. Bullshit app A uses all the memory for a split second, and your app goes down.

Do you check if `malloc`, et al succeeds or fails every time?

Yes, every time.

How often (if ever) do you use a form of custom allocators?

Currently in the zig standard library, every function that needs to allocate memory takes an allocator parameter. So every memory allocation in zig uses a custom allocator.