A Pretty-Printer That Says What it Means

Idris supports semantic highlighting of compiler output. In this post, I’ll sketch what the feature brings users, why it’s interesting, what inspired it, and how it’s implemented in the Idris compiler. Rather than spending a bunch of time describing a series of screenshots, here’s a quick video of it in action, inside of idris-mode for Emacs:

Note that all the interesting features there are implemented in the Idris compiler (specifically, the pretty-printer), and that Emacs does very little work to use these features. They are available for any editor, and partially at the command line REPL, in a manner that can be incrementally implemented to get more and more features.

I showed Simon Peyton Jones a quick demo at the Haskell Implementors’ Workshop in Gothenburg today, and he encouraged me to write it up, thinking that there might be more general interest. So here we go!

Context

The Idris compiler and REPL have two output modes: one intended for direct human interaction in a terminal, and one intended for integration into external tools. When in console mode, Idris emits ANSI codes in its output stream that place semantically relevant font information into the interpreter output. However, we’re limited by what terminals support. When working with more expressive output contexts, we can present much more than just fonts.

Output intended for other programs, when Idris is running in IDE mode, consists of S-expressions that represent the semantics of the output. Sometimes the entire message is machine-readable, but at other times annotated strings are sent. Examples of the first include a notification that a file was successfully type checked and a signal indicating that the user has completed a tactic proof, and examples of the second include error messages and REPL output. In this post, I’ll mostly focus on how annotated strings are generated, as it’s the most fun part. I’m focusing on the underlying ideas here, so I’m glossing over some details, and type signatures may not match the real ones in the Idris code base.

Having colors in the output from the Idris interpreter was originally inspired by a demonstration of Julia at OPLSS 2013, by Leah Hanson. I was really impressed by how the colored output from the REPL was helpful in understanding the results and distinguishing between program output and expression values. I was also feeling envious of Agda’s nice semantic coloring in Emacs buffers.

The Pretty-Printer

All output from the Idris compiler is generated by the Idris pretty-printer. In this pretty-printer, we need to do two things: (1) generate strings from compiler data structures, and (2) maintain the link between regions of these strings and semantically interesting information about them. For example, the compiler needs to know that the “Nat” in “Z : Nat” is a type constructor and that “Z” is a data constructor, and furthermore that both really occur in the namespace “Prelude.Nat”, but are being displayed in a truncated form. That way, the full disambiguated name can be used to do things like looking up documentation.

It is not possible, however, to maintain the link between regions of text and semantic annotations by tracking string positions directly during pretty-printing, because the whole reason we have a pretty-printer is to free us from the burden of performing specific text formatting: the final layout is not known until the document is rendered. The pretty-printing library combinators generate an abstract type Doc, which can later be rendered to specific output contexts, such as terminals of different widths.

The approach taken in Idris to bridge this gap is to replace the type Doc with a type constructor Doc a, where the type a represents the type of the semantic annotations in the document. Then, a single additional combinator annotate :: a -> Doc a -> Doc a can be used to attach elements of the semantic annotation type to the document.
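To make the idea concrete, here is a minimal sketch of an annotated document type. It is a toy model, not the real annotated-wl-pprint: it has no layout at all, and the constructors, the Ann type, and the helpers plain and annotations are names I made up for illustration. Only the signature of annotate matches the one given above.

```haskell
-- Toy model only: the real Doc in annotated-wl-pprint also handles layout
-- (line breaks, nesting, grouping) and hides its constructors.
data Doc a
  = Text String          -- literal text
  | Cat (Doc a) (Doc a)  -- horizontal concatenation
  | Annotate a (Doc a)   -- a sub-document tagged with an annotation

-- Hypothetical annotation type; each tag carries a fully qualified name.
data Ann
  = DataCon String
  | TypeCon String
  deriving (Eq, Show)

text :: String -> Doc a
text = Text

-- Concatenate two documents with a single space between them.
(<+>) :: Doc a -> Doc a -> Doc a
x <+> y = Cat x (Cat (Text " ") y)

-- The one extra combinator: attach an annotation to a sub-document.
annotate :: a -> Doc a -> Doc a
annotate = Annotate

-- Render while discarding the annotations.
plain :: Doc a -> String
plain (Text s)       = s
plain (Cat x y)      = plain x ++ plain y
plain (Annotate _ d) = plain d

-- List the annotations in left-to-right order.
annotations :: Doc a -> [a]
annotations (Text _)       = []
annotations (Cat x y)      = annotations x ++ annotations y
annotations (Annotate a d) = a : annotations d

-- "Z : Nat", displayed with short names but annotated with the full ones.
zHasTypeNat :: Doc Ann
zHasTypeNat =
  annotate (DataCon "Prelude.Nat.Z") (text "Z")
    <+> text ":"
    <+> annotate (TypeCon "Prelude.Nat.Nat") (text "Nat")
```

Here plain zHasTypeNat shows only the truncated names, while annotations zHasTypeNat still recovers the fully qualified ones, which is the link described earlier.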

The Idris pretty-printer is based on wl-pprint, and it’s available as a mostly-API-compatible library on Hackage called annotated-wl-pprint. Please refer to the original library documentation or to Phil Wadler’s paper "A Prettier Printer" for the basics of the approach. Rendering, in wl-pprint, converts a Doc to a SimpleDoc, which is specialized to a particular output width. Then, various functions are available to convert a SimpleDoc to a String and to write it to an output stream in IO. In annotated-wl-pprint, both Doc and SimpleDoc are parameterized over annotations, and additional String-rendering functions are available that can use the semantic information.

The original rendering functions are still available in annotated-wl-pprint: they merely discard the annotations. However, two new rendering functions are available. The first, called displayDecorated, takes a decorator function with type a -> String -> String as an argument, and uses this to rewrite each substring as it is generated. This function is used in Idris to generate console output. The second, called displaySpans, generates a pair consisting of the ordinary string representation as well as a list of text spans and their corresponding annotations. This is used in Idris to generate machine-readable IDE output.
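The two styles of rendering can be sketched on a toy document type. This is only an illustration of the idea: the real functions work on SimpleDoc after layout, whereas the functions below (which I have named decorated and spans to avoid suggesting they are the library's) work directly on a layout-free tree. The Ann type, the color choices, and the (start offset, length, annotation) span format are assumptions made for this example.

```haskell
-- Toy, layout-free document type (an assumption for this sketch).
data Doc a
  = Text String
  | Cat (Doc a) (Doc a)
  | Annotate a (Doc a)

-- Hypothetical annotation type.
data Ann = DataCon | TypeCon
  deriving (Eq, Show)

-- Console-style rendering: the decorator rewrites the text of each
-- annotated region as it is produced.
decorated :: (a -> String -> String) -> Doc a -> String
decorated _ (Text s)       = s
decorated f (Cat x y)      = decorated f x ++ decorated f y
decorated f (Annotate a d) = f a (decorated f d)

-- IDE-style rendering: the plain string plus a list of
-- (start offset, length, annotation) triples.
spans :: Doc a -> (String, [(Int, Int, a)])
spans = go 0
  where
    go _ (Text s) = (s, [])
    go off (Cat x y) =
      let (sx, ax) = go off x
          (sy, ay) = go (off + length sx) y
      in (sx ++ sy, ax ++ ay)
    go off (Annotate a d) =
      let (s, rest) = go off d
      in (s, (off, length s, a) : rest)

-- An example decorator that wraps text in ANSI color codes
-- (the particular colors are an arbitrary choice here).
ansi :: Ann -> String -> String
ansi DataCon s = "\ESC[31m" ++ s ++ "\ESC[0m"  -- red
ansi TypeCon s = "\ESC[34m" ++ s ++ "\ESC[0m"  -- blue

example :: Doc Ann
example =
  Cat (Annotate DataCon (Text "Z"))
      (Cat (Text " : ") (Annotate TypeCon (Text "Nat")))
```

With these definitions, decorated ansi example produces a string with escape codes embedded around "Z" and "Nat", while spans example leaves the string untouched and reports the annotated regions separately, so a client such as an editor can decide how to present them.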

Doc and SimpleDoc are Functors

The constructors of Doc a and SimpleDoc a are not exported, as they must maintain some invariants that the combinator language ensures. However, they have instances of Functor, which means that post-processing and rendering of user representations of the semantic annotations is just an fmap away. This is very useful in Idris, and I would like to thank Philippa Cowderoy for pointing out to me on IRC that pretty-printer documents are in fact functors.

For example, when the pretty-printer encounters a global name in a term, it simply saves its fully-qualified form in an annotation. Later, we can add the additional information (such as documentation summaries and type signatures) that is sent to IDEs. This enables the pretty-printer to be independent of the global definition context in Idris, enabling greater modularity. Additionally, the type system can be used to ensure that various processing steps have been carried out, although this could use some refactoring in Idris at the moment.
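As a rough sketch of this two-stage process, again on a toy document type rather than the real library: the pretty-printer produces a document annotated only with fully qualified names, and a later pass uses fmap to look up extra information. The NameInfo record, the Context association list, and the enrich function are all hypothetical names for illustration and do not correspond to the Idris code base.

```haskell
-- Toy, layout-free document type (an assumption for this sketch).
data Doc a
  = Text String
  | Cat (Doc a) (Doc a)
  | Annotate a (Doc a)

-- Mapping over a document changes the annotations and nothing else.
instance Functor Doc where
  fmap _ (Text s)       = Text s
  fmap f (Cat x y)      = Cat (fmap f x) (fmap f y)
  fmap f (Annotate a d) = Annotate (f a) (fmap f d)

-- Hypothetical richer annotation, as might be sent to an IDE.
data NameInfo = NameInfo
  { fullName   :: String
  , docSummary :: Maybe String
  } deriving (Eq, Show)

-- Hypothetical stand-in for the global definition context:
-- fully qualified name -> documentation summary.
type Context = [(String, String)]

-- Post-processing step: the pretty-printer itself never sees the Context.
enrich :: Context -> Doc String -> Doc NameInfo
enrich ctx = fmap (\n -> NameInfo n (lookup n ctx))

-- List the annotations in left-to-right order.
annotations :: Doc a -> [a]
annotations (Text _)       = []
annotations (Cat x y)      = annotations x ++ annotations y
annotations (Annotate a d) = a : annotations d
```

The change of type from Doc String to Doc NameInfo is what lets the type checker confirm that the enrichment step has actually run before the document is sent on.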

This structure enables a clean separation between the pretty-printing of textual output and the processing of semantic information. The technique is also useful when displaying documentation, where parsed Markdown formatting information is inserted in the strings to be displayed, and for associating source locations in error messages with a declarative specification of the full filename and line and column numbers.

Active Terms in Idris

In the demo that I linked to at the top of this post, you can see editor commands to show the implicit arguments to already-printed terms and to normalize them on demand. This is implemented by annotating subterms in the output record with a serialized representation of their corresponding term in Idris’s core language, as well as some information about the names available in their scope. These terms are then attached as a semantic annotation. When these annotations are rendered for IDE clients, they are base64-encoded.

Idris’s IDE protocol has commands that can manipulate these encoded terms, pretty-printing them with various options, potentially after normalization. Naturally, these commands annotate their output with a serialized representation of the resulting term, allowing them to be used again.

Remaining

The current implementation of annotated-wl-pprint is lacking one important rendering function, which would allow actions in a monad to delimit output strings. This would enable the display of semantic colors on terminals in Windows, which don’t support ANSI color codes. Additionally, I’m quite sure that I can attach more information that would be useful.
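To give a feel for what such a function could look like, here is a speculative sketch on a toy document type. Nothing here exists in annotated-wl-pprint as described above: renderM, its argument order, and printBracketed are all hypothetical. The idea is that actions run before and after each annotated region, so a Windows backend could change the console color through an API call instead of embedding escape codes in the string.

```haskell
-- Toy, layout-free document type (an assumption for this sketch).
data Doc a
  = Text String
  | Cat (Doc a) (Doc a)
  | Annotate a (Doc a)

-- Hypothetical monadic renderer: actions delimit each annotated region.
renderM
  :: Monad m
  => (a -> m ())       -- run before an annotated region
  -> (a -> m ())       -- run after an annotated region
  -> (String -> m ())  -- emit a piece of plain text
  -> Doc a
  -> m ()
renderM start end emit = go
  where
    go (Text s)       = emit s
    go (Cat x y)      = go x >> go y
    go (Annotate a d) = start a >> go d >> end a

-- Stand-in for a console backend: a real one would set and reset the
-- terminal color in the two delimiter actions instead of printing brackets.
printBracketed :: Doc String -> IO ()
printBracketed =
  renderM (\a -> putStr ("[" ++ a ++ " ")) (\_ -> putStr "]") putStr
```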

Do you have any nifty ideas for idris-mode? Are you working on Idris support from another editor? Is there something in the above that should be done a better way? Do you need help using annotated-wl-pprint in your own project? Please don’t hesitate to get in touch!

Update, 26 January, 2015

Trevor Elliott has implemented annotation support for the pretty library, which has a Hughes-PJ-style interface, so if your project uses that API, you can more easily begin to use semantic annotations. See pull request #19 for the gory details.

Comments

Last time I needed coloured output from a Haskell program I defined some simple readable colour codes that were used throughout the code, something like "Text in !1red!7 and !4blue". Then I added a final step that output the text using System.Console.ANSI so it worked on both Windows and Linux without complicating the structure of your code.

This is a very neat idea. I hope that someone implements something like it for GHC.

Simon