Filling out source locations in Idris
The mainline branch of Idris now has support for source location reflection. This means that Idris code can be informed about which line and column of which file it occurs on. In this post, I’ll walk you through how this can be used for implementing programmer tools, such as a facility similar to Haskell’s error function. Then, I’ll briefly discuss how the feature is implemented and the implications for referential transparency and claims of purity.
Note that I’m not claiming any major innovation here – this blog post exists only to explain a feature that I took from scala-virtualized and implemented in Idris.
Terminating with an error
The error function stops execution immediately, printing a user-supplied message. Because it will never return, it should inhabit every type, but it should most assuredly not pass the totality checker.
It is straightforward to implement error using Idris’s unsafe features. The C FFI can be used to call exit, terminating the program, and unsafePerformIO allows us to call arbitrary C functions in a pure context. The built-in primitive believe_me enables us to subvert the type checker. Combining these ingredients yields:
||| Terminate the program after printing a user-specified error message. |
||| |
||| @ message The error to print |
partial |
error : (message : String) -> a |
error message = |
believe_me . unsafePerformIO $ |
do putStrLn message |
exit 1 |
If this function is ever called, it means that something has gone wrong, and we have a bug in our program. However, the error itself may not be particularly helpful: there may be insufficient clues to determine why it occurred. Knowing where it occurred in the source code can make it much easier to debug the problem. Let’s extend error to display its call site along with the message.
Source Locations for Dynamic Errors
Broadly speaking, Idris functions accept two kinds of arguments: explicit arguments, which the user directly provides, and implicit arguments, which the compiler is expected to work out for itself in most situations. An explicit argument can be provided implicitly by writing and underscore in its position, and an implicit argument can be provided explicitly by writing it in curly braces.
Implicit arguments are normally solved by unification. In other words, something else in the program will typically cause there to be precisely one possible value that would still type check, so the compiler is able to fill it out. When defining a function, you can specify an alternative way to fill out the implicit argument, such as by providing a default value or invoking the proof search system. Additionally, the default value need not be literal Idris code: it can also be a tactic script to be executed during elaboration, which is what we call the process of translating the high level Idris code that users write into the highly tedious, fully explicit core language that the compiler then proceeds to optimize and send off to the individual backends. Typical tactics include things like refine, which applies a name to the goal, generating subgoals as necessary for its arguments, and try, which attempts one tactic and falls back to another if the first fails.
data SourceLocation : Type where |
FileLoc : String -> (Int, Int) -> (Int, Int) -> SourceLocation |
The first argument to the constructor is the source file name, and the second and third are the line and column at the beginning and end of the relevant expression. Because not all of the parser has been updated to save source spans, the second and third arguments are often identical for now, but this will eventually be fixed.
To improve the messages provided by error, we need to arrange for the sourceLocation tactic to be used to fill out an implicit source context argument. Then, we can destructure this implicit argument using ordinary pattern matching to recover the filename, line number, and column number to report. The updated error function (which is in the Idris library in the Debug.Error package) is:
||| Terminate the program after printing a user-specified error message. |
||| |
||| @ loc The source location to display for the error |
||| @ message The error to print |
partial |
error : {default tactics {sourceLocation} loc : SourceLocation} -> |
(message : String) -> |
a |
error {loc = FileLoc filename (line, col) _} message = |
believe_me . unsafePerformIO $ |
do let place = filename ++ " line " ++ show line ++ " column " ++ show col |
let message' = place ++ ": " ++ message |
putStrLn message' |
exit 1 |
The same basic technique can also be used to report source locations for errors in deeply embedded langauges that aren’t easily encoded in the type system. Note that higher-level features, such as an assertions system built on top of error, can provide loc explicitly, enabling them to override the displayed source code location with their own. This straightforwardly allows a fine-grained control over the level at which errors should be displayed.
Implementation
Idris’s expression elaborator is a recursive traversal of the expression. Elaboration occurs within a monad that resembles a tactic-based interactive theorem prover, with a present goal and list of assumptions, a stack of remaining goals to be solved, and tactics that manipulate this state. Smaller tatics can be composed to create larger tactics. For details, check out Edwin’s JFP paper from last year.
The high-level Idris AST already contains source code information, which is used to report static error messages such as unification errors. To implement sourceLocation, the already-existing source locations are simply used to generate Idris-level source locations. Because not every term resulting from the parser contains a source location, the closest containing location is passed recursively in the elaborator.
Purity
Does this change mean that Idris is no longer a pure language? After all, the following function will return a different value depending on where it is called:
getLocation : {default tactics {sourceLocation} loc : SourceLocation} -> SourceLocation |
getLocation {loc} = loc |
Even mere changes to whitespace could cause a program that uses getLoc to compute a radically different value!
However, this does not affect the purity of the uderlying core language. From the perspective of the core, when you move a call to getLoc to a new location, the term itself changes, and the constant value of loc is simply changed. While this feature could most assuredly be abused, it is surely no worse than type class resolution or proof search when it comes to purity.
If you find a creative use for the sourceLocation tactic or if you use it in a serious manner, please let me know! I’d love to hear how it works for real-world users.