Generic pattern skips non-letter characters at the end #3014

Closed
TIMONz1535 opened this issue Dec 28, 2024 · 2 comments

@TIMONz1535
Contributor

TIMONz1535 commented Dec 28, 2024

How are you using the lua-language-server?

Visual Studio Code Extension (sumneko.lua)

Which OS are you using?

Windows

What is the issue affecting?

Annotations

Expected Behaviour

feature #2484

The generic pattern does not recognize non-letter characters after `T`, for example `T`.Base or `T`*

Here is my example:

---@generic T
---@param t MainContainer.`T`
---@return T
local function SubContainerClass(t) end

local mainContainerBase = SubContainerClass('Base')           -- MainContainer.Base
local mainContainerSubExtra = SubContainerClass('SubExtra')   -- MainContainer.SubExtra

---@generic T
---@param t `T`.Base
---@return T
local function ContainerClassBase(t) end

local mainContainerBase = ContainerClassBase('MainContainer') -- BUG: MainContainer instead of MainContainer.Base
local extraContainerBase = ContainerClassBase('ExtraContainer') -- BUG: ExtraContainer instead of ExtraContainer.Base

---@generic T
---@param t `T`*
---@return T
local function GetObjectPointer(t) end

---@generic T
---@param t `T`2
---@return T
local function GetObject2(t) end

local entityPointer = GetObjectPointer('Entity') -- BUG: Entity instead of Entity*
local entity2 = GetObject2('Entity') -- BUG: Entity instead of Entity2

Actual Behaviour


---@generic T
---@param t1 A.`T`Base -- works
---@param t2 A`T`Base -- works
---@param t3 A*`T` -- works
---@param t4 A`T`.Base -- nothing after `T`
---@param t5 `T`.Base -- nothing after `T`
---@param t6 `T`Base -- nothing after `T`
---@param t7 A*`T`* -- nothing after `T`
---@param t8 `T`* -- nothing after `T`
---@param t9 A`T`2 -- seems to be completely broken
---@return T
local function Test(t1,t2,t3,t4,t5,t6,t7,t8,t9) end

Reproduction steps

  1. Create a generic pattern with *, . or a number after `T`
  2. Nothing after `T` is included in the definition

Additional Notes

Your class name can be very weird, but it works.

I just want to use pointer Class*

Log File

No response

@tomlau10
Contributor

This seems to be due to how luadoc is parsed in script/parser/luadoc.lua 🤔

The token types

`T` is a code token, while MainContainer is a name token

  • from the lpeg rules here:
    name = (m.R('az', 'AZ', '09', '\x80\xff') + m.S('_')) * (m.R('az', 'AZ', '__', '09', '\x80\xff') + m.S('_.*-'))^0,
  • this roughly translates to \w[\w.*-]*
    which means a name can only start with \w; it CANNOT start with . / * / -
    However, . / * / - may repeat after the first character, so you can have some weird class names
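
To make the rule easier to experiment with, here is a minimal sketch that approximates the lpeg `name` rule with a plain Lua pattern. The `NAME_PATTERN` constant and the `looksLikeName` helper are hypothetical names introduced for illustration. The sketch only checks the shape of a name; it does not reproduce the full lexer, which also produces other token types.

```lua
-- Approximation of the lpeg `name` rule using a plain Lua pattern (assumption:
-- only the shape of the rule is modeled, not the ordering of lexer rules).
-- First character: letter, digit, underscore or a byte in \x80-\xff.
-- Following characters: the same set plus '.', '*' and '-'.
local NAME_PATTERN = '^[a-zA-Z0-9_\128-\255][a-zA-Z0-9_%.%*%-\128-\255]*$'

-- Hypothetical helper: returns true if `text` has the shape of a name token.
local function looksLikeName(text)
    return text:match(NAME_PATTERN) ~= nil
end

print(looksLikeName('MainContainer.Base')) --> true
print(looksLikeName('Entity*'))            --> true
print(looksLikeName('.Base'))              --> false
print(looksLikeName('*'))                  --> false
```

This shows why `MainContainer.` in front of a code token is a single name token, while `.Base` or `*` after a code token cannot start a new name.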

The parse code logic

When it parses a generic pattern type, there are two code paths:

function parseTypeUnit(parent)
    local result = parseFunction(parent)
                or parseTable(parent)
                or parseTuple(parent)
                or parseString(parent)
                or parseCode(parent)
                or parseInteger(parent)
                or parseBoolean(parent)
                or parseParen(parent)
                or parseCodePattern(parent)

  • parseCode: if the type starts with a code token, only that single code token is consumed; nothing after it is parsed as part of the type
    => this explains why `T`.Base doesn't work

    local function parseCode(parent)
        local tp, content = peekToken()
        if not tp or tp ~= 'code' then
            return nil
        end
        nextToken()
        local code = {
            type = 'doc.type.code',
            start = getStart(),
            finish = getFinish(),
            parent = parent,
            [1] = content,
        }
        return code
    end

  • parseCodePattern: the type must start with a name token, followed by a code token, which may in turn be followed by more name tokens

    local function parseCodePattern(parent)
        local tp, pattern = peekToken()
        if not tp or tp ~= 'name' then
            return nil
        end
        local codeOffset
        local finishOffset
        local content
        for i = 2, 8 do
            local next, nextContent = peekToken(i)
            if not next or TokenFinishs[Ci+i-1] + 1 ~= TokenStarts[Ci+i] then
                if codeOffset then
                    finishOffset = i
                    break
                end
                --- non-contiguous name, invalid
                return nil
            end
            if next == 'code' then
                if codeOffset and content ~= nextContent then
                    -- multiple generics are not supported for now
                    return nil
                end
                codeOffset = i
                pattern = pattern .. "%s"
                content = nextContent
            elseif next ~= 'name' then
                return nil
            else
                pattern = pattern .. nextContent
            end
        end
        local start = getStart()
        for _ = 2, finishOffset do
            nextToken()
        end
        local code = {
            type = 'doc.type.code',
            start = start,
            finish = getFinish(),
            parent = parent,
            pattern = pattern,
            [1] = content,
        }
        return code
    end

    • BUT‼️ since a name can only start with \w, you currently cannot have Base.`T`.Base, yet Base.`T`Base works
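
The behaviour described above can be modeled with a small standalone sketch. It is a simplified, hypothetical stand-in for the original parseCodePattern: tokens are passed in as a pre-built list of { kind, text } pairs, all tokens are assumed to be adjacent, and the 8-token limit is ignored. The `buildPattern` name and the token tables are assumptions made for illustration. The last line also assumes that the pattern is later filled with the string argument in a `string.format`-like way.

```lua
-- Hypothetical, simplified model of the original parseCodePattern.
-- Each token is { kind, text }; all tokens are assumed to be adjacent.
local function buildPattern(tokens)
    local first = tokens[1]
    if not first or first[1] ~= 'name' then
        return nil -- the original requires a leading name token
    end
    local pattern = first[2]
    local content
    for i = 2, #tokens do
        local kind, text = tokens[i][1], tokens[i][2]
        if kind == 'code' then
            if content and content ~= text then
                return nil -- multiple different generics are not supported
            end
            pattern = pattern .. '%s'
            content = text
        elseif kind == 'name' then
            pattern = pattern .. text
        else
            return nil -- any other token kind (e.g. a symbol) aborts
        end
    end
    if not content then
        return nil -- no code token was found
    end
    return pattern, content
end

-- MainContainer.`T` : name "MainContainer." followed by code "T"
local pattern, generic = buildPattern({ { 'name', 'MainContainer.' }, { 'code', 'T' } })
print(pattern, generic)                                           --> MainContainer.%s	T
-- `T`.Base : starts with a code token, so no pattern is produced
print(buildPattern({ { 'code', 'T' }, { 'symbol', '.' }, { 'name', 'Base' } })) --> nil
print(pattern:format('Base'))                                     --> MainContainer.Base
```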

An attempt to fix

I just want to use pointer Class*

I tried to make this work by changing 2 places:

  1. Change the name rule to allow starting with a *
  2. Merge parseCode and parseCodePattern so that a name token after a code token can be parsed
diff --git forkSrcPrefix/script/parser/luadoc.lua forkDstPrefix/script/parser/luadoc.lua
index d108cebc26c64fb8506d525cce8ffcca3e085e1e..f12e26b4bdb485568c7265553e024dc313917029 100644
--- forkSrcPrefix/script/parser/luadoc.lua
+++ forkDstPrefix/script/parser/luadoc.lua
@@ -71,7 +71,7 @@ Symbol              <-  ({} {
     er = '\r',
     et = '\t',
     ev = '\v',
-    name = (m.R('az', 'AZ', '09', '\x80\xff') + m.S('_')) * (m.R('az', 'AZ', '__', '09', '\x80\xff') + m.S('_.*-'))^0,
+    name = (m.R('az', 'AZ', '09', '\x80\xff') + m.S('_*')) * (m.R('az', 'AZ', '__', '09', '\x80\xff') + m.S('_.*-'))^0,
     Char10 = function (char)
         ---@type integer?
         char = tonumber(char)
@@ -738,12 +738,17 @@ end
 
 local function parseCodePattern(parent)
     local tp, pattern = peekToken()
-    if not tp or tp ~= 'name' then
+    if not tp or (tp ~= 'name' and tp ~= 'code') then
         return nil
     end
     local codeOffset
     local finishOffset
     local content
+    if tp == 'code' then
+        codeOffset = 1
+        content = pattern
+        pattern = "%s"
+    end
     for i = 2, 8 do
         local next, nextContent = peekToken(i)
         if not next or TokenFinishs[Ci+i-1] + 1 ~= TokenStarts[Ci+i] then
@@ -834,7 +839,7 @@ function parseTypeUnit(parent)
                 or parseTable(parent)
                 or parseTuple(parent)
                 or parseString(parent)
-                or parseCode(parent)
                 or parseInteger(parent)
                 or parseBoolean(parent)
                 or parseParen(parent)
  • So far this allows the following use case:
---@generic T
---@param a `T`*
---@return T
function ToPtrClass(a) end

local a = ToPtrClass('A') --> a: A*

But it still doesn't solve all the cases that you reported.
And I don't know whether this would cause other side effects, since a name (identifier) generally should not be allowed to start with a *.
Still, maybe you or others would like to pick it up from here 😄, or even better, ask the maintainers for their opinion first
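
For reference, the effect of the two changes on the `T`* case can be modeled with a small standalone sketch. It is a hypothetical simplification of the patched parseCodePattern, not the real parser: tokens are given as { kind, text } pairs, adjacency checks are omitted, and `buildPatternPatched` is a name made up for illustration. It also assumes that the resulting pattern is filled in a `string.format`-like way.

```lua
-- Hypothetical model of the patched parseCodePattern: a leading code token
-- is accepted and seeds the pattern with "%s".
local function buildPatternPatched(tokens)
    local first = tokens[1]
    if not first or (first[1] ~= 'name' and first[1] ~= 'code') then
        return nil
    end
    local pattern, content = first[2], nil
    if first[1] == 'code' then
        content = first[2]
        pattern = '%s'
    end
    for i = 2, #tokens do
        local kind, text = tokens[i][1], tokens[i][2]
        if kind == 'code' then
            if content and content ~= text then
                return nil -- multiple different generics are not supported
            end
            pattern = pattern .. '%s'
            content = text
        elseif kind == 'name' then
            pattern = pattern .. text
        else
            return nil -- symbols and other token kinds are still rejected
        end
    end
    if not content then
        return nil
    end
    return pattern, content
end

-- With the relaxed name rule, `T`* tokenizes as code "T" followed by name "*".
local pattern = buildPatternPatched({ { 'code', 'T' }, { 'name', '*' } })
print(pattern:format('A')) --> A*
```

In this model `T`.Base is still rejected, because the `.` after the code token is a symbol and not the start of a name.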

TIMONz1535 added a commit to TIMONz1535/lua-language-server that referenced this issue Dec 29, 2024
…Pattern to support "`T`.*-", prevent crash with "`T``T`" and when tokens >= 8, fix wrong getStart of result.
TIMONz1535 added a commit to TIMONz1535/lua-language-server that referenced this issue Dec 29, 2024
…-" without name token before code. Added tests.
TIMONz1535 added a commit to TIMONz1535/lua-language-server that referenced this issue Dec 29, 2024
@TIMONz1535
Contributor Author

TIMONz1535 commented Dec 29, 2024

I fixed it without touching the name parsing, but I found a few edge cases:

---@class MehClass-Sub
---@class MehClass..Sub
---@class MehClass...Sub
---@class MehClass--Sub

---@param t `T`-Sub -- ok
---@param t `T`..Sub -- ok
---@param t `T`...Sub -- doesn't work because `...` is an individual symbol and I don't want to support it.
---@param t `T`--Sub -- doesn't work because `--` becomes a comment!

I also found an error in the luadoc parser: the token 2.0 is treated as a name token, but a name should only start with a letter or an underscore.

a.-1 (name)
_0w* (name)
0 (integer)
2.0 (should be `integer symbol integer` but it's `name`)
2_0 (integer name)
.*-2.0 (should be `symbol symbol integer symbol integer` but it's `symbol symbol symbol name`)
.*-2_0 (symbol symbol integer name)

Update: I see that the @version directive relies on this behavior. It reads a single name token. That token could be of some other single-token type, a new number type for example, but not integer symbol integer.

            if tp ~= 'name' then
                pushWarning {
                    type  = 'LUADOC_MISS_VERSION',
                    start  = getStart(),
                    finish = getFinish(),
                }
                break
            end
            version.version = tonumber(text) or text
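
The last line of that excerpt can be tried in isolation. Below is a minimal sketch; `toVersion` is a hypothetical helper name that only mirrors the `tonumber(text) or text` expression, and the input strings are example values I picked.

```lua
-- Hypothetical helper mirroring `version.version = tonumber(text) or text`.
-- A numeric-looking token becomes a number, anything else stays a string.
local function toVersion(text)
    return tonumber(text) or text
end

print(toVersion('5.1')) --> 5.1
print(toVersion('JIT')) --> JIT
```

This only works for something like 5.1 if the whole text arrives as one token, which would explain why the parser keeps such text as a single name token instead of splitting it into integer symbol integer.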

TIMONz1535 added a commit to TIMONz1535/lua-language-server that referenced this issue Jan 6, 2025
…y and comment without a space. Fixed regression.
TIMONz1535 added a commit to TIMONz1535/lua-language-server that referenced this issue Jan 6, 2025
…y and comment without a space. Fixed regression.