Ignorance may be Strength : LSP Navigation and Completion

The final things I want to do before moving on are to navigate to elements and to complete them. This demonstrates that we have at least a basic understanding of the program we are analyzing.

Building up Knowledge

The first thing to do is to build a repository of definitions encountered by the compiler.

In a real compiler, this is obviously the main purpose of parsing, but having ignored that up to now for the purposes of this exercise, I now need to make the SimpleParser build a repository. For simplicity, I am just going to capture the definitions of cards and contracts.

In order to be ready before users open files, we need to parse all the files in the workspace during initialization and store all the definitions in the repository. I want to make a few points here before moving on.

In the real world, this calls for quite a cunning data structure. It needs to be refreshed quite frequently (every time we receive a didChange event) and incrementally (removing all the definitions from the damaged areas) without leaving any dangling links (if we allow references). For the purposes of completion, it also needs to be able to find all the tokens matching some typed text (possibly in some preference order such as usage). For this demo, I am just going to keep it small, simple and brute force.

The IgnorantLanguageServer class now seems to be growing too big and to have too many responsibilities. If this were a blog about clean code, I would have a post or two about how I broke this up and found the ideal pattern that describes it; for now, I am just going to keep putting more miscellaneous material in here as long as it comes under the very general heading of "coordination".

As soon as I started to do this, I realized that the protocol I was using on the server side was different to the client side … of course my dependencies are out of date. I upgraded to version 0.9.0 of the lsp4j library and adjusted to cope with the fallout.

In line with the general air of insouciance which pervades this article - if not indeed this entire blog - I am not going to parse the files at all carefully, but just go after what I think should be there, being just careful enough not to cause any fatal exceptions. The important thing is to capture, for each interesting definition, its name and its location.
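As a rough illustration of "just careful enough", here is a minimal sketch of that kind of scan. It is not the SimpleParser itself: the class name, the Definition record and, in particular, the assumption that a definition line looks like "card Name" or "contract Name" are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SloppyDefinitionScanner {
    /** Hypothetical record of one definition: its name and where it was found. */
    record Definition(String name, String uri, int line, int character) {}

    // ASSUMPTION: a definition is a line of the form "card Name" or "contract Name".
    // The real source syntax may well differ.
    private static final Pattern DEFN =
        Pattern.compile("^\\s*(?:card|contract)\\s+([A-Za-z_][A-Za-z0-9_.]*)");

    /** Scan the text line by line, ignoring anything that does not look like a definition. */
    static List<Definition> scan(String uri, String text) {
        List<Definition> found = new ArrayList<>();
        if (text == null)
            return found; // nothing to parse, but no exception either
        String[] lines = text.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = DEFN.matcher(lines[i]);
            if (m.find()) {
                // line and character are zero-based, the same convention LSP positions use
                found.add(new Definition(m.group(1), uri, i, m.start(1)));
            }
        }
        return found;
    }
}
```

Anything the pattern does not recognize is silently skipped, which is exactly the level of care described above.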

Because this is an incremental, repeated process, we need to clear a file's existing entries out of the repository every time before we parse it. We do this by scanning through the repository looking at each entry in turn. Because we are parsing an entire file at a time, we only need to match the entry's URI, although if we were interested in handling smaller changes we could also look at line numbers (and even possibly character positions) and clear out at that level. In keeping with the brute-force approach noted above, our repository just stores all the items in a TreeSet, so we simply work our way through the entries and remove the ones with a matching URI.
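The clearing step might look something like the sketch below. The class, record and method names are hypothetical, and the ordering of the TreeSet (by name, then URI, then position) is my assumption rather than anything the real repository is known to do.

```java
import java.util.Comparator;
import java.util.TreeSet;

class BruteForceRepository {
    /** Hypothetical entry: a definition's name plus the place it was found. */
    record Definition(String name, String uri, int line, int character) {}

    // ASSUMPTION: entries are ordered by name, then URI, then position.
    private final TreeSet<Definition> entries = new TreeSet<>(
        Comparator.comparing(Definition::name)
            .thenComparing(Definition::uri)
            .thenComparingInt(Definition::line)
            .thenComparingInt(Definition::character));

    /** Brute force: visit every entry and drop the ones that came from this file. */
    void clearFile(String uri) {
        entries.removeIf(d -> d.uri().equals(uri));
    }

    void add(Definition d) {
        entries.add(d);
    }

    int size() {
        return entries.size();
    }
}
```

Calling clearFile before each re-parse means a file's definitions are always replaced as a unit, at the cost of a full scan of the set on every change.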

Go to Definition

In order to handle the "Go to Definition" functionality, we need to inform the client that we are able to do this.

For some reason, this requires two steps. First, we must add the capability in the initialize method of the IgnorantLanguageServer, and then we must implement a new method (declaration) in the TextDocumentService. I have to say I don't like this approach. Maybe it's just because I haven't caught up with default methods in Java interfaces, but it seems to me that a better pattern would be to say in initialize "here is an implementation of a (one function) interface that supports declaration".

Anyway, the intent of the declaration method is to find all the locations where there might be a declaration of a symbol. Not, of course, that you get the symbol per se - you get the location of the cursor when the user asks for the definition. One approach would be to have the parser record all instances of all symbols and then have a location to location table; while I think there is a lot to be said for this (for example, it would make Find Usages easy to implement) it seems to be more work than I want to do right now. So instead, I'm going to go to the relevant file and line and try and identify the token around that location.

This feels like a responsibility for the parser, so I'll put the code there. I don't like allowing abstraction bleed, but I'm going to pass across an LSP Position because I don't at the moment have my own position abstraction as such (I just stored the primitive line and character position in name). If I were going to take this further, I would need to refactor (my real compiler, of course, does have a very clear, and quite rich, position abstraction). Because this method does not come with any file text, I also had to store the text of all files during parsing so that I can read it back now (it would of course be possible to read them from the disk).
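To give a flavour of the string manipulation involved, here is a minimal sketch of finding the token around a position. It takes a plain zero-based line and character rather than an LSP Position, and it assumes that a token is a run of letters, digits and underscores; both the names and that definition of a token are mine, not the parser's.

```java
class TokenFinder {
    /** Return the token touching the given zero-based position, or null if there is none. */
    static String tokenAt(String text, int line, int character) {
        if (text == null || line < 0 || character < 0)
            return null;
        String[] lines = text.split("\\R", -1);
        if (line >= lines.length)
            return null;
        String s = lines[line];
        if (character > s.length())
            return null;
        // walk left from the cursor to the start of the token
        int start = character;
        while (start > 0 && isTokenChar(s.charAt(start - 1)))
            start--;
        // walk right from the cursor to the end of the token
        int end = character;
        while (end < s.length() && isTokenChar(s.charAt(end)))
            end++;
        return start == end ? null : s.substring(start, end);
    }

    // ASSUMPTION: tokens are made of letters, digits and underscores only.
    private static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Note that a cursor sitting immediately after the last character of a token still counts as being "on" that token, which is usually what you want when the user has just finished typing a name.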

Having found the token (an exercise in string manipulation), we then ask the repository to hand back a list of locations where it is defined. While (logically) that might be only one, in a "live" compiler where users can type anything, it is not unreasonable to suppose that there might be multiple definitions; or indeed, that ambiguity might exist as to which of several matching definitions might be intended. The repository scans through all its definitions and sees which match. Again, for convenience, I have let the LSP abstractions bleed through into the repository; obviously the code in the ParsingTextDocumentService should adapt these.
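The lookup itself can be sketched as below. This version deliberately uses its own small Location record instead of the LSP one, to show roughly what the un-bled version might look like; the service layer would then convert each result into an LSP location. All the names here are hypothetical, and a plain list stands in for the repository's real storage.

```java
import java.util.ArrayList;
import java.util.List;

class DefinitionLookup {
    /** Hypothetical, LSP-free description of where a definition lives. */
    record Location(String uri, int line, int character) {}

    record Definition(String name, Location location) {}

    private final List<Definition> definitions = new ArrayList<>();

    void add(String name, String uri, int line, int character) {
        definitions.add(new Definition(name, new Location(uri, line, character)));
    }

    /** Scan every definition and return the locations of all whose name matches the token. */
    List<Location> locationsOf(String token) {
        List<Location> ret = new ArrayList<>();
        if (token == null)
            return ret;
        for (Definition d : definitions) {
            if (d.name().equals(token))
                ret.add(d.location());
        }
        return ret;
    }
}
```

Returning a list rather than a single location is what allows for the duplicate or ambiguous definitions mentioned above.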

And hey presto, we are done. If you click on the uses of HitMe or Multiplier in the contract.st file, and select "Go to Definition" from the popup menu, the appropriate definition in contract.fl will be selected.

Completion

Completion works by you typing some characters and the editor offering you a selection of possible completions. Obviously, this list should be ordered in a way which makes the more likely selections come first and contains possible and partial matches as well as the "obvious" ones. I'm not going to do any of that. I'm just going to offer completions for which what the user has typed is a proper prefix. Yes, it is because I'm lazy - but my justification is that all of that hard work is not relevant to the integration.

As before, the first thing is to announce that we can do this. This again comes in two parts: we need to specify the capability and then we need to implement the completion method. In this case, however, the capability is more than just a boolean value: it wants a boolean saying whether it is a "resolve provider" and a list of "trigger characters". It is not entirely clear what it means to be a "resolve provider", but I think it is reasonable to say "no" for now. Googling around, the idea behind "trigger characters" seems to be that not everything you type in the editor will force it to the expense of a round trip to the server to obtain completions. It seems that, by default, alphanumeric characters will; if your language wants to extend this, it is possible to specify "trigger characters" that will force completion logic (think < in HTML).

The implementation is much the same as before: try and figure out the token and then ask the repository to complete it. Again, I let the LSP abstractions bleed over into the repository code.
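A minimal sketch of prefix-only completion over a sorted set of names follows. The class and method names are hypothetical; matching is case-sensitive, and a name exactly equal to the typed text is also offered, both of which are choices I have made for the example rather than anything dictated by the protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PrefixCompleter {
    private final TreeSet<String> names = new TreeSet<>();

    void add(String name) {
        names.add(name);
    }

    /** Offer every known name that starts with what the user has typed. */
    List<String> complete(String typed) {
        List<String> ret = new ArrayList<>();
        if (typed == null || typed.isEmpty())
            return ret;
        // In sorted order, all names sharing a prefix sit next to each other,
        // starting at the first name that is >= the prefix itself.
        for (String candidate : names.tailSet(typed, true)) {
            if (!candidate.startsWith(typed))
                break; // we have run past the block of matching names
            ret.add(candidate);
        }
        return ret;
    }
}
```

With the names Help, Hit, HitMe and Multiplier stored, typing "Hi" would offer Hit and HitMe, in that order.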

And just like that, it works!

Resolving Completions

It would seem that there is a lot more information that can be communicated about completions than just the text to fill in. Some of this (such as the icon to display) probably needs to be returned at the same time as the original completion item to be useful; but other data (such as how to insert it, etc) is only relevant if you choose to do the insert.

Given that collecting and transmitting this data is probably expensive, I think the logic is that the completion() method returns a list of all the candidates and the resolve() method asks for detailed information about only the selected candidate.
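If that reading is right, the two phases could be sketched like this. Everything here is hypothetical (the Item record, the detail strings, the method names); it is only meant to show the cheap-list-then-expensive-detail shape, not the actual LSP types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TwoPhaseCompletion {
    /** Hypothetical item: the detail stays null until the item is resolved. */
    record Item(String label, String detail) {}

    // Stands in for whatever "expensive" information the server holds per name.
    private final Map<String, String> detailsByLabel = new TreeMap<>();

    void define(String label, String detail) {
        detailsByLabel.put(label, detail);
    }

    /** Phase one: return every candidate, carrying only its label. */
    List<Item> complete(String typed) {
        List<Item> ret = new ArrayList<>();
        for (String label : detailsByLabel.keySet()) {
            if (label.startsWith(typed))
                ret.add(new Item(label, null));
        }
        return ret;
    }

    /** Phase two: fill in the detail for the one candidate the user picked. */
    Item resolve(Item chosen) {
        return new Item(chosen.label(), detailsByLabel.get(chosen.label()));
    }
}
```

The point of the split is that the detail lookup happens once, for the chosen item, rather than once per candidate in the list.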

I did a brief - and not very interesting - foray into this area to finish up this work.

Conclusion

Wiring up the LSP services does not seem that hard. It seems to be mainly a question of implementing the correct parsing and repository operations on the back end and adapting between the expectations of the LSP protocol and how you have your information stored. In a real system, a sophisticated repository is essential to good performance.

Ignorance may be Strength

Thursday, September 24, 2020
