[Phase 2] Add Semantic Tokens for Advanced Highlighting #34

@vogella

Description

Implement semantic tokens in the LSP server to provide advanced syntax highlighting beyond what TextMate grammars offer. This enables context-aware highlighting for attribute references, macros, special AsciiDoc constructs, and more.

Current State

Syntax Highlighting: TextMate grammar handles basic syntax
Missing: Semantic understanding (attribute references, resolved vs. unresolved, etc.)

Background

Semantic tokens provide language-aware highlighting based on semantic analysis, not just regex patterns. Examples:

  • Highlight undefined attribute references differently from defined ones
  • Color broken links differently from valid ones
  • Distinguish macro names from regular text
  • Distinguish block attributes from inline attributes

Required Changes

1. Add Semantic Tokens Capability

File: AsciidocLanguageServer.java

ServerCapabilities capabilities = new ServerCapabilities();

SemanticTokensWithRegistrationOptions semanticTokensOptions = new SemanticTokensWithRegistrationOptions();

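// Define token types (encoded by their index in the first list) and
// token modifiers (encoded as bit flags: the entry at index n sets bit n)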
semanticTokensOptions.setLegend(new SemanticTokensLegend(
    Arrays.asList(
        "namespace",    // Attribute definitions
        "class",        // Headers
        "function",     // Macros (image::, include::, etc.)
        "parameter",    // Attribute references {name}
        "variable",     // Block attributes [source, java]
        "string",       // Quoted strings
        "comment",      // Comment blocks
        "keyword",      // Special keywords
        "operator",     // Delimiters
        "type"          // Types/roles
    ),
    Arrays.asList(
        "declaration",  // Attribute declarations
        "definition",   // Definitions
        "readonly",     // Built-in attributes
        "deprecated",   // Deprecated syntax
        "documentation" // Documentation blocks
    )
));

semanticTokensOptions.setFull(true);
semanticTokensOptions.setRange(false);

capabilities.setSemanticTokensProvider(semanticTokensOptions);
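
The tokenizers further down pass raw integers such as 3 or 4 for token types and modifiers. As a minimal sketch of where those numbers come from, the helper below derives them from the legend order: a type is its list index, and a modifier at index n contributes bit n to a mask. The class name LegendIndex and its methods are hypothetical and not part of LSP4J or the existing server code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: maps legend names to the integers used in the token data. */
final class LegendIndex {
    // Same order as the legend registered in the server capabilities.
    static final List<String> TOKEN_TYPES = Arrays.asList(
        "namespace", "class", "function", "parameter", "variable",
        "string", "comment", "keyword", "operator", "type");

    static final List<String> TOKEN_MODIFIERS = Arrays.asList(
        "declaration", "definition", "readonly", "deprecated", "documentation");

    private LegendIndex() {}

    /** A token type is encoded as its position in the legend. */
    static int typeIndex(String name) {
        int index = TOKEN_TYPES.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown token type: " + name);
        }
        return index;
    }

    /** Modifiers are combined into a bit mask: legend position n sets bit n. */
    static int modifierMask(String... names) {
        int mask = 0;
        for (String name : names) {
            int bit = TOKEN_MODIFIERS.indexOf(name);
            if (bit < 0) {
                throw new IllegalArgumentException("Unknown token modifier: " + name);
            }
            mask |= 1 << bit;
        }
        return mask;
    }
}
```

With such a helper, a call like LegendIndex.modifierMask("readonly") would replace the literal 4, which keeps the tokenizers correct if the legend order ever changes.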

2. Implement Semantic Tokens Provider

File: AsciidocTextDocumentService.java

@Override
public CompletableFuture<SemanticTokens> semanticTokensFull(SemanticTokensParams params) {
    String uri = params.getTextDocument().getUri();
    AsciidocDocumentModel model = documentCache.get(uri);
    
    if (model == null) {
        return CompletableFuture.completedFuture(new SemanticTokens(Collections.emptyList()));
    }
    
    List<Integer> data = new ArrayList<>();
    List<String> lines = model.getLines();
    
    // Extract defined attributes for validation
    Set<String> definedAttributes = extractDefinedAttributes(model);
    
    int prevLine = 0;
    int prevChar = 0;
    
    for (int i = 0; i < lines.size(); i++) {
        String line = lines.get(i);
        
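        // Tokenize line
        List<SemanticToken> tokens = tokenizeLine(line, i, definedAttributes);

        // The relative encoding below needs tokens in ascending start order,
        // but tokenizeLine appends them grouped by kind (needs java.util.Comparator).
        tokens.sort(Comparator.comparingInt(t -> t.startChar));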
        
        // Encode tokens in LSP format
        for (SemanticToken token : tokens) {
            int deltaLine = token.line - prevLine;
            int deltaChar = (deltaLine == 0) ? (token.startChar - prevChar) : token.startChar;
            
            data.add(deltaLine);
            data.add(deltaChar);
            data.add(token.length);
            data.add(token.tokenType);
            data.add(token.tokenModifiers);
            
            prevLine = token.line;
            prevChar = token.startChar;
        }
    }
    
    return CompletableFuture.completedFuture(new SemanticTokens(data));
}
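
The relative encoding is easier to follow on concrete numbers. The sketch below is a standalone version of the encoding loop, assuming a simplified Token class that stands in for the SemanticToken class from step 3; TokenEncoder is a hypothetical name. It sorts the tokens by position first, because a token that starts before its predecessor would produce a negative delta.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical standalone version of the delta encoding used for semantic tokens. */
final class TokenEncoder {
    /** Token with absolute, zero-based position; stands in for SemanticToken. */
    static final class Token {
        final int line;
        final int startChar;
        final int length;
        final int type;
        final int modifiers;

        Token(int line, int startChar, int length, int type, int modifiers) {
            this.line = line;
            this.startChar = startChar;
            this.length = length;
            this.type = type;
            this.modifiers = modifiers;
        }
    }

    private TokenEncoder() {}

    /** Returns five integers per token: deltaLine, deltaChar, length, type, modifiers. */
    static List<Integer> encode(List<Token> tokens) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt((Token t) -> t.line)
                              .thenComparingInt(t -> t.startChar));

        List<Integer> data = new ArrayList<>();
        int prevLine = 0;
        int prevChar = 0;
        for (Token token : sorted) {
            int deltaLine = token.line - prevLine;
            // The column is relative only when the token stays on the same line.
            int deltaChar = (deltaLine == 0) ? token.startChar - prevChar : token.startChar;
            data.add(deltaLine);
            data.add(deltaChar);
            data.add(token.length);
            data.add(token.type);
            data.add(token.modifiers);
            prevLine = token.line;
            prevChar = token.startChar;
        }
        return data;
    }
}
```

For tokens at (line 0, column 1), (line 2, column 4) and (line 2, column 10), the position pairs come out as (0, 1), (2, 4) and (0, 6): the third token shares a line with the second, so only its column offset of 6 is stored.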

3. Tokenize Line

private List<SemanticToken> tokenizeLine(String line, int lineNum, Set<String> definedAttributes) {
    List<SemanticToken> tokens = new ArrayList<>();
    
    // Attribute definitions (:name: value)
    tokens.addAll(tokenizeAttributeDefinitions(line, lineNum));
    
    // Attribute references {name}
    tokens.addAll(tokenizeAttributeReferences(line, lineNum, definedAttributes));
    
    // Headers (=, ==, ===)
    tokens.addAll(tokenizeHeaders(line, lineNum));
    
    // Macros (image::, include::, link:)
    tokens.addAll(tokenizeMacros(line, lineNum));
    
    // Block attributes [source, java]
    tokens.addAll(tokenizeBlockAttributes(line, lineNum));
    
    // Inline formatting (*bold*, _italic_, `mono`)
    tokens.addAll(tokenizeInlineFormatting(line, lineNum));
    
    return tokens;
}

private static class SemanticToken {
    int line;
    int startChar;
    int length;
    int tokenType;
    int tokenModifiers;
    
    SemanticToken(int line, int startChar, int length, int tokenType, int tokenModifiers) {
        this.line = line;
        this.startChar = startChar;
        this.length = length;
        this.tokenType = tokenType;
        this.tokenModifiers = tokenModifiers;
    }
}
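
Because tokenizeLine merges the output of several independent tokenizers, two tokens can cover the same characters, for example an attribute reference inside a header. The LSP specification only allows overlapping tokens when the client announces overlappingTokenSupport, so a filtering pass is probably needed. The sketch below shows one possible policy (the earlier-starting span wins); OverlapFilter and Span are hypothetical names, and a different policy such as "most specific token wins" may suit the server better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: removes overlapping spans on a single line. */
final class OverlapFilter {
    /** A token on one line, reduced to the fields needed for overlap checks. */
    static final class Span {
        final int start;
        final int length;
        final int type;

        Span(int start, int length, int type) {
            this.start = start;
            this.length = length;
            this.type = type;
        }
    }

    private OverlapFilter() {}

    /**
     * Sorts spans by start column, then drops empty spans and any span
     * that begins before the previously kept span has ended.
     */
    static List<Span> withoutOverlaps(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));

        List<Span> kept = new ArrayList<>();
        int lastEnd = 0;
        for (Span span : sorted) {
            if (span.length > 0 && span.start >= lastEnd) {
                kept.add(span);
                lastEnd = span.start + span.length;
            }
        }
        return kept;
    }
}
```

With this policy, a header token covering columns 3 to 17 suppresses an attribute reference that starts at column 10, while a token starting at column 20 is kept.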

4. Tokenize Attribute Definitions

private List<SemanticToken> tokenizeAttributeDefinitions(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    String trimmed = line.trim();
    
    // An attribute entry starts with ':' and has a second ':' that closes the name
    if (trimmed.startsWith(":") && trimmed.indexOf(':', 1) > 0) {
        Pattern pattern = Pattern.compile("^:([^:]+):");
        Matcher matcher = pattern.matcher(trimmed);
        
        if (matcher.find()) {
            int startPos = line.indexOf(':');
            String attrName = matcher.group(1);
            
            // Token for attribute name (namespace + declaration modifier)
            tokens.add(new SemanticToken(
                lineNum,
                startPos + 1,
                attrName.length(),
                0, // namespace
                1  // declaration modifier
            ));
        }
    }
    
    return tokens;
}

5. Tokenize Attribute References

private List<SemanticToken> tokenizeAttributeReferences(String line, int lineNum, 
                                                         Set<String> definedAttributes) {
    List<SemanticToken> tokens = new ArrayList<>();
    Pattern pattern = Pattern.compile("\\{([^}]+)\\}");
    Matcher matcher = pattern.matcher(line);
    
    while (matcher.find()) {
        String attrName = matcher.group(1);
        int startPos = matcher.start() + 1; // Skip opening {
        
        boolean isDefined = definedAttributes.contains(attrName) || isBuiltInAttribute(attrName);
        
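        // Token for attribute reference (parameter). Modifiers are a bit mask:
        // readonly (bit 2, value 4) marks a defined or built-in attribute,
        // 0 leaves an undefined reference unmarked so a theme can style it apart.
        tokens.add(new SemanticToken(
            lineNum,
            startPos,
            attrName.length(),
            3, // parameter
            isDefined ? 4 : 0 // readonly when defined or built-in, none when undefined
        ));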
    }
    
    return tokens;
}
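
The snippets above call extractDefinedAttributes and isBuiltInAttribute without defining them. The following is a minimal sketch of what they could look like, assuming the document is available as a list of lines rather than as an AsciidocDocumentModel. The class name AttributeIndex is hypothetical, and the built-in set is only a small illustrative subset of the attributes Asciidoctor predefines or populates, not a complete list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects attribute names for reference validation. */
final class AttributeIndex {
    // Attribute entry at the start of a line, for example ":version: 1.0" or ":toc:".
    // Unset entries such as ":!name:" are deliberately not matched.
    private static final Pattern ATTRIBUTE_ENTRY =
        Pattern.compile("^:([A-Za-z0-9_][A-Za-z0-9_-]*):(\\s|$)");

    // Illustrative subset only (assumption); a real server needs the full list.
    private static final Set<String> BUILT_INS = new HashSet<>(Arrays.asList(
        "doctitle", "docname", "docdir", "author", "email",
        "revnumber", "revdate", "nbsp", "empty", "sp"));

    private AttributeIndex() {}

    /** Returns the names of all attributes defined by entries in the given lines. */
    static Set<String> extractDefinedAttributes(List<String> lines) {
        Set<String> defined = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher matcher = ATTRIBUTE_ENTRY.matcher(line);
            if (matcher.find()) {
                defined.add(matcher.group(1));
            }
        }
        return defined;
    }

    /** Returns true when the name is in the illustrative built-in set. */
    static boolean isBuiltInAttribute(String name) {
        return BUILT_INS.contains(name);
    }
}
```

This sketch treats an attribute as defined anywhere in the document, regardless of whether the entry appears before or after the reference. Checking definition order would need the line number of each entry as well.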

6. Tokenize Headers

private List<SemanticToken> tokenizeHeaders(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();

    // A section title is one to six '=' followed by whitespace and text.
    // Requiring the text keeps example block delimiters (a bare "====") out
    // while still accepting deeper titles such as "==== Level 3".
    Matcher matcher = Pattern.compile("^\\s*(={1,6})\\s+(\\S.*?)\\s*$").matcher(line);

    if (matcher.matches()) {
        // Token for header text (class), positioned by the regex group
        // instead of a text search that could match an earlier occurrence
        tokens.add(new SemanticToken(
            lineNum,
            matcher.start(2),
            matcher.group(2).length(),
            1, // class
            2  // definition modifier
        ));
    }

    return tokens;
}

7. Tokenize Macros

private List<SemanticToken> tokenizeMacros(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    
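    // Block macros use "::" (image::, include::); inline macros use ":" (link:, kbd:, btn:, menu:).
    // The lookahead skips prose such as "link: " where no target follows the colon.
    Pattern pattern = Pattern.compile("\\b(image|include|link|kbd|btn|menu)::?(?=\\S)");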
    Matcher matcher = pattern.matcher(line);
    
    while (matcher.find()) {
        String macroName = matcher.group(1);
        int startPos = matcher.start();
        
        // Token for macro name (function)
        tokens.add(new SemanticToken(
            lineNum,
            startPos,
            macroName.length(),
            2, // function
            0
        ));
    }
    
    return tokens;
}

8. Tokenize Block Attributes

private List<SemanticToken> tokenizeBlockAttributes(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    String trimmed = line.trim();
    
    if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
        Pattern pattern = Pattern.compile("\\[([^\\]]+)\\]");
        Matcher matcher = pattern.matcher(trimmed);
        
        if (matcher.find()) {
            String attributes = matcher.group(1);
            int startPos = line.indexOf('[') + 1;
            
            // Token for block attributes (variable)
            tokens.add(new SemanticToken(
                lineNum,
                startPos,
                attributes.length(),
                4, // variable
                0
            ));
        }
    }
    
    return tokens;
}
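
A single token over the whole attribute list cannot give the style, the language, and any roles or options different colors. One possible refinement, sketched below under the assumption that entries are separated by plain commas, is to split the list and emit one token per entry. BlockAttributeSplitter and Part are hypothetical names, and quoted values that contain commas are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a block attribute line into positioned entries. */
final class BlockAttributeSplitter {
    /** One trimmed entry together with its start column in the original line. */
    static final class Part {
        final int start;
        final String text;

        Part(int start, String text) {
            this.start = start;
            this.text = text;
        }
    }

    private BlockAttributeSplitter() {}

    /** Splits the content between the first '[' and the last ']' on commas. */
    static List<Part> split(String line) {
        List<Part> parts = new ArrayList<>();
        int open = line.indexOf('[');
        int close = line.lastIndexOf(']');
        if (open < 0 || close <= open) {
            return parts;
        }

        int pos = open + 1;
        while (pos <= close) {
            int comma = line.indexOf(',', pos);
            int end = (comma < 0 || comma > close) ? close : comma;

            // Trim whitespace so the token covers only the entry itself.
            int start = pos;
            int stop = end;
            while (start < stop && Character.isWhitespace(line.charAt(start))) {
                start++;
            }
            while (stop > start && Character.isWhitespace(line.charAt(stop - 1))) {
                stop--;
            }
            if (stop > start) {
                parts.add(new Part(start, line.substring(start, stop)));
            }
            pos = end + 1;
        }
        return parts;
    }
}
```

For the line [source, java] this yields "source" at column 1 and "java" at column 9, which could then be mapped to different token types, for example variable for the style and type for the language.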

Testing Checklist

Attribute Tokens

  • Attribute definitions highlighted differently
  • Defined attribute references colored correctly
  • Undefined attribute references stand out
  • Built-in attributes recognized

Header Tokens

  • Header text highlighted semantically
  • Different header levels distinguishable

Macro Tokens

  • Macro names (image::, include::) highlighted
  • Distinguishable from regular text

Block Attribute Tokens

  • Block attributes [source, java] highlighted
  • Roles and options colored correctly

General

  • Semantic highlighting updates on edit
  • No conflicts with TextMate grammar
  • Colors configured in Eclipse theme
  • Performance acceptable
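
Several of these checks are easier to automate when the encoded data is turned back into absolute positions. The sketch below reverses the relative encoding; TokenDecoder is a hypothetical test helper and not part of LSP4J.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper: converts encoded token data back to absolute positions. */
final class TokenDecoder {
    private TokenDecoder() {}

    /** Returns one array {line, startChar, length, type, modifiers} per token. */
    static List<int[]> decode(List<Integer> data) {
        if (data.size() % 5 != 0) {
            throw new IllegalArgumentException("Token data must hold five integers per token");
        }

        List<int[]> tokens = new ArrayList<>();
        int line = 0;
        int startChar = 0;
        for (int i = 0; i < data.size(); i += 5) {
            int deltaLine = data.get(i);
            int deltaChar = data.get(i + 1);
            line += deltaLine;
            // A line change resets the column; otherwise the delta accumulates.
            startChar = (deltaLine == 0) ? startChar + deltaChar : deltaChar;
            tokens.add(new int[] {
                line, startChar, data.get(i + 2), data.get(i + 3), data.get(i + 4)
            });
        }
        return tokens;
    }
}
```

A unit test could then assert, for instance, that an undefined reference on a given line decodes to token type 3 with modifier mask 0.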

Files to Modify

  • com.vogella.lsp.asciidoc.server/src/.../AsciidocLanguageServer.java
  • com.vogella.lsp.asciidoc.server/src/.../AsciidocTextDocumentService.java

Dependencies

Success Criteria

  1. ✅ Attribute definitions highlighted
  2. ✅ Attribute references colored by status (defined/undefined)
  3. ✅ Headers highlighted semantically
  4. ✅ Macros stand out from text
  5. ✅ Block attributes colored correctly
  6. ✅ Performance acceptable
  7. ✅ Works with Eclipse color themes

Estimated Effort

2-3 days (complex feature)

Priority

Low - Nice enhancement, not critical

Related Issues

Notes

  • Semantic tokens supplement the TextMate grammar; they do not replace it
  • LSP4E semantic token support may have limitations - test thoroughly
  • Token types/modifiers should map to Eclipse theme colors
  • Consider performance with very large documents
  • May need incremental updates (range support) for better performance
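
If range support is added later, the server would enable it with setRange(true) and answer range requests (semanticTokensRange in LSP4J's TextDocumentService) with only the tokens inside the requested lines. A minimal sketch of that filtering step follows, assuming tokens with absolute line numbers; RangeFilter and its Token class are hypothetical names, and the filter works on whole lines rather than exact columns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: restricts tokens to a requested line range. */
final class RangeFilter {
    /** Token with absolute, zero-based position. */
    static final class Token {
        final int line;
        final int startChar;
        final int length;

        Token(int line, int startChar, int length) {
            this.line = line;
            this.startChar = startChar;
            this.length = length;
        }
    }

    private RangeFilter() {}

    /** Keeps tokens whose line lies in [startLine, endLine], both inclusive. */
    static List<Token> inLineRange(List<Token> tokens, int startLine, int endLine) {
        List<Token> result = new ArrayList<>();
        for (Token token : tokens) {
            if (token.line >= startLine && token.line <= endLine) {
                result.add(token);
            }
        }
        return result;
    }
}
```

The real saving comes from tokenizing only the requested lines in the first place. Filtering after a full pass, as in this sketch, mainly reduces the size of the response.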

Metadata

Assignees: none
Labels: enhancement (New feature or request)