Passerine


Over the past few years or so, an idea for a novel programming language has been simmering in my mind. I've been calling this language Passerine, out of a lack of a better name. In short, Passerine is a small core language that is easily extensible, which employs clever memory-management and execution techniques. Passerine has roots in Scheme and ML-flavored languages — it arose from the slight dissatisfaction yet enormous potential I feel those languages have.

So, how does Passerine work? At its core, Passerine is lambda-calculus with pattern-matching, structural types, fiber-based concurrency, and syntactic extension. At the core of Passerine lies the pattern, which when given an object, will de-structure it into a set of bindings. It's best to just show you an example. Here, we're de-structuring a tuple into a set of bindings:

(x, y) = (1, (2, 3))

In this case, x would be bound to 1, and y would be bound to the tuple (2, 3). The = sign in the above example denotes assignment, mimetic of other programming languages.

To encapsulate computations, functions are used. Functions take n patterns and an expression whose evaluation is delayed. Upon calling a function, the arguments are matched against the parametric patterns, and the expression in then evaluated in a new scope containing the produced bindings. Here, we're defining a function that increments a number:

n -> n + 1

The -> (read lambda) operator constructs a function, with the pattern being on the left, and the expression being on the right. Of course, the above function merely produces a function object. To evaluate a function, it must be called; to call a function, simply append the necessary arguments:

(n -> n + 1) 3

The value of this expression would be 4. Parenthesis are used for grouping; otherwise, the semantics of the above expression might be ambiguous, or straight-up incorrect. Assignment, in combination with lambdas is a powerful thing. To name a function, simply assign it to an identifier. For example, here's a definition of Fibonacci number calculation through recursive means:

fib = n -> if n <= 1 {
    1
} else {
    fib (n - 1) + fib (n - 2)
}

Curly-braces define a block which can be thought of as fancy grouping. Within a block, n expressions can be listed, separated by newlines or semicolons; the block takes on the value of the last expression. Although parenthesis would've sufficed in the above example, it's stylistically more coherent to use parenthesis for grouping, and blocks for regions of computation. So, for example, the block:

{
    x = 1; y = 2
    x + y
}

Would have the value 3. At this point, you may be thinking that the syntax being used is a bit obscure. After all, in how many other languages are all functions anonymous and defined with ->? Rather than obscure, I like to think of this syntax as simple - we've already covered 3/5ths of the core language. So what remains? Just syntactic-extension and pattern-matching.

To extent passerine's syntax, syntactic-extensions are used. Extensions, quite simply, are bits of code that hygienically produce more code when invoked at compile time. Extensions use a simple yet powerful templating language to transform code. Here's a simple example of a swap operator:

syntax a '<-> b {
    tmp = a
    a = b
    b = tmp
}

x = 7
y = 3
x <-> y

Extensions are defined with the syntax keyword, followed by n arguments, followed by the code that the captured arguments will be spliced into. Note that the above code is completely hygienic. the expanded macro looks something like this:

_tmp = x
x = y
x = _tmp

because tmp was not passed as a syntactic parameter, all uses of tmp in the function body are unique unrepresentable variables that do not collide with any other variables currently bound in scope. Essentially:

tmp = 1
x = 2
y = 3
x <-> y

Will not affect the value of tmp; tmp will still be 1. So, what does the syntax look like? We're talking about what goes in between:

syntax ... {}

Each item between syntax and the body is referred to as a syntactic argument. arguments can be:

So, first off, what are textual identifiers? Identifiers are literal names that must be present for the pattern to match. Each syntactic extension is required to have at least one. For example, an extension that matches a for loop:

syntax 'for binding 'in values do { ... }

In this case, 'for and 'in are textual identifiers. For example, this definition could be used as follows:

for a in [1, 2, 3] {
    print a
}

Here, a is the binding, [1, 2, 3] are the values, and { print a } is the do. Operators can be defined this way as well:

syntax sequence 'contains value {
    c = s -> {
        h:t = s
        if h == val {
            True
        } else {
            match t {
                []   -> False
                rest -> c rest
            }
        }
    }

    c sequence
}

This defines a contains operator that could be used as follows:

print {
    if [1, 2, 3] contains 2 {
        "It contains 2"
    } else {
        "It does not contain 2"
    }
}

Evidently, It contains 2 would be printed. What are modifiers? Modifiers are postfix symbols that allow for argument flexibility. The following modifiers exist:

Additionally, parenthesis are used for grouping, and { ... } are used to match expressions within blocks. Here are some syntactic arguments that match an if else statement like this one:

if x == 0 {
    print "Zero"
} else if x % 2 == 0 {
    print "Even"
} else {
    print "Not even or zero"
}

The syntactic arg. must match a beginning if:

syntax 'if { ... }

Then, a condition:

syntax 'if condition { ... }

Then, the first block:

syntax 'if condition then { ... }

Next, we'll need a number of else if <condition> statements:

syntax 'if condition then ('else 'if other do)... { ... }

Followed by a required closing else (Passerine is expression oriented and type-checked, so a closing else ensures that an if expression always returns a value.):

syntax 'if condition then ('else 'if other do)... 'else finally { ... }

Of course, if statements are already baked into the language - let's try something else - a match expression. A match expression takes a value and a number of functions, and tries to apply the value to each function until one successfully matches and runs. A match statement looks as follows:

x = Some (1, 2, 3)

match x {
    Some (m, x, b) -> m * x + b
    None           -> 0
}

If no branches are matched, an error is raised. The syntactic arguments for this extension follows the following layout:

syntax 'match value { arms... } { ... }

Note:

At this point, I realized that full functional anonymity makes non-local returns really hard; Here's what I'm getting at: say we have the following definition:

foo = a -> {
    bar = n -> {
        return n + 1
    }
    bar a
}

Now obviously, we'd expect the return in bar to return from bar, simple enough. Now look at the following example:

-- takes a list and position,
-- returns `Some index` if the value exists,
-- `None` otherwise.
find = list value -> {
    for (index, v) in (enumerate list) {
        if v == value {
            return Some index
        }
    }   

    return None
}

Now this is where it gets tricky. for is a macro that construct an invisible function, so the above really looks more like the following:

find = list value -> {
    _body = (index, v) -> if v == value {
        return Some index
    }
    _loop = _remaining {
        _maybe_item = (head _remaining)
        _remaining  = (tail _remaining)
        if _item is Some _ {
            Some _item = _maybe_item
            body _item
            _loop _remaining
        }
    }
}

The main issue with this is that the return in the generated function _body is ambiguous - does it return from _body, or does it return from find?

One way to get around this is by explicitly naming which function is being returned from; for instance:

find = list value -> {
    for (index, v) in (enumerate list) {
        if v == value {
            find return Some index -- note func return body
        }
    }   

    return None
}

The issue with this is that it's slightly more verbose, and doesn't work for unnamed functions (unless you want to get all super self). Another method is just dropping the imperative style entirely; for instance:

find = list value -> {
    let [head | tail] = list
    if head == value {
        Some 0
    } else {
        match (find tail value) {
            Some position -> Some (1 + position)
            None -> None
        }
    }
}

This, while pretty cool, is limiting. Writing functional code is nice and all, but imperative style code is easier to follow. If you think of a semantical solution that doesn't require any syntactic changes, please reach out :)

main challenge - how to effectively implement returning and yielding?

{
    for arm in arms {
        let result = try (value arm)
        if result is (Ok v) {

        } else if result is (Err (Match _)) {
            ()
        } else {
            raise result
        }
    }
}