feat(docs): asm functions #1061

Open
wants to merge 13 commits into
base: main

novusnota
Member

Rewrote the method ID collisions section to remove all logical jumps and make it much more streamlined :)

Also adjusted the structure a little towards the upcoming PR revamping this page. I'll push the draft of it right after we deal with asm functions here.

P.S.: I actually call argument-to-return position mappings "arrangements" and not "shuffle" as in grammar.ohm, because the latter kinda implies randomness, while those are actually deterministic. Hence, "asm arrangements".

Issue

Also, resolved two teeny tiny issues from tact-docs: virtually 3-5 lines of fixes for each, no need for a separate CHANGELOG entry:

Checklist

  • I have updated CHANGELOG.md
  • I have run the linter, formatter and spellchecker
  • I did not do unrelated and/or undiscussed refactorings

@novusnota novusnota added this to the v1.6.0 milestone Nov 19, 2024
@novusnota novusnota requested a review from a team as a code owner November 19, 2024 02:22
@novusnota novusnota changed the title feat(docs): asm-functions feat(docs): asm functions Nov 19, 2024
@anton-trunov anton-trunov self-assigned this Nov 20, 2024
@jeshecdom
Contributor

I agree that this is confusing, and I think that the cause is that we are mixing two mental models: a stack (low-level) and tuples (high-level).

We should pick only one and stick with it the entire explanation. It seems that describing everything in terms of tuples is more intuitive, but then we should not mention the stack (or mention how tuples are pushed and popped from a stack in a separate subsection, and only in that section mention the stack).

So, for example, I would start the examples saying something about the TVM instructions, something like this:

"
Even though TVM instructions work with a stack, TVM instructions can be seen, intuitively, as maps from tuples to tuples. To see how TVM instructions map tuples to tuples in the TVM stack, see [link: here]. Thinking in terms of tuples makes the explanation of asm functions much clearer, but for those who want to see an explanation using the stack directly, see [link: here].
"

Then, explaining the meaning of a declaration like:

asm(len self -> 1 0) fun testFun(self: Slice, len: Int): Result { TVM_INSTRUCTION }

struct Result {
   res1: Int;
   res2: Bool;
}

amounts to saying simply:

"
testFun passes the argument tuple (len, self) to instruction TVM_INSTRUCTION. Suppose (r0, r1) is the tuple result of TVM_INSTRUCTION; then testFun reorders the result according to the map -> 1 0, i.e., the element at index 1 (r1) now has index 0, and the element at index 0 (r0) now has index 1, producing the tuple (r1, r0). Finally, testFun assigns the tuple (r1, r0) to the Result struct one field at a time, producing Result {res1: r1, res2: r0}.
"

@novusnota
Member Author

@jeshecdom interesting note. The cases with multiple instructions should be covered too. And tuples on TON are denoted with square brackets [], so it's better to use those. Also, it's best for readability to remove parentheses as much as possible, so things like "and the 0-th index element (r0)" will be "and the 0-th index element r0" — no need to add indirection and visual pauses with parens :)

@anton-trunov wdyt about #1061 (comment)?

@anton-trunov
Member

TVM instructions can be seen, intuitively, as maps from tuples to tuples.

this is incorrect in a very specific technical sense: a tuple is a TVM data structure that occupies precisely one TVM stack position but can contain multiple other TVM primitives, including tuples

the term you probably intended to use is tensor

@anton-trunov
Member

testFun passes the argument tuple (len, self)

I find it confusing

@anton-trunov
Member

@novusnota just adapt the corresponding calling convention description from tvm.pdf

@anton-trunov
Member

@novusnota you also need to check how structures that are returned from a function are actually encoded

@anton-trunov
Member

To fully finish this section, #910 needs to be resolved too

@jeshecdom
Contributor

TVM instructions can be seen, intuitively, as maps from tuples to tuples.

this is incorrect in a very specific technical sense: a tuple is a TVM data structure that occupies precisely one TVM stack position but can contain multiple other TVM primitives, including tuples

the term you probably intended to use is tensor

I meant mathematical tuple, but now I see that this would introduce much more confusion because of the technical terms in TVM. So, the explanation should stick with the stack and use the technical terms in TVM.

@jeshecdom
Contributor

@jeshecdom interesting note. The cases with multiple instructions should be covered too. And tuples on TON are denoted with square brackets [], so it's better to use those. Also, it's best for readability to remove parentheses as much as possible, so things like "and the 0-th index element (r0)" will be "and the 0-th index element r0" — no need to add indirection and visual pauses with parens :)

@anton-trunov wdyt about #1061 (comment)?

Yeah, we should stick with the stack explanation and use the technical terms in TVM, because I see everyone is confused now :). By the way, a question:

In a function like this:

asm fun testFun(a: Int, b: Int): Result { 
 INS_1
 INS_2 
 .....
 INS_n   // Let us suppose that after INS_n finishes, 
         // there are 5 results in the stack
}

Does Tact know that after executing those instructions, there will be exactly 5 results in the stack?
What happens if struct Result has more than 5 fields? Will Tact keep popping elements from the stack until it fills the struct, and so pop elements that are not part of the intended result?

@anton-trunov
Member

Does Tact know that after executing those instructions, there will be exactly 5 results in the stack?

not in the current implementation

Will Tact pop more than 5 elements from the stack until it fills the struct

nope (the consequence of the previous answer)

This should be documented, of course, but an even more important question is "are returned structs actually represented as tensors (multiple TVM values)?"

@anton-trunov
Member

anton-trunov commented Nov 21, 2024

and, of course, the symmetrical question for input function parameters (including passing structs)

@anton-trunov
Member

in any case, each such point should be accompanied by a concrete example of an asm-function

@novusnota
Member Author

novusnota commented Nov 21, 2024

just adapt the corresponding calling convention description from tvm.pdf

Sure, the -> 0 1 is much better explained in terms of s0..s255 stack registers.

Does Tact know that after executing those instructions, there will be exactly 5 results in the stack?

Nope, at the moment it's all handled by FunC, which in turn just passes it to Fift, which does all the work. Neither Tact nor FunC checks anything until it's too late and the user hits exit code 5, 7, or whatever else.

Also, there could be more things in the stack, only the topmost 5 are of interest if we know that after all the instructions we need 5 values.
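
A minimal sketch of the situation described above. The `Pair` struct and both function names are hypothetical and exist only for this illustration. Nothing in either declaration tells the compiler how many values the body leaves on the stack, so a mismatch like the second one is not reported at compile time.

```tact
// Hypothetical struct used only for this illustration
struct Pair { first: Int; second: Int }

// The body pushes exactly two values, matching the two fields of Pair
asm fun makePair(): Pair { 1 INT 2 INT }

// The body pushes only one value, yet the declared return type needs two.
// Per the discussion above, neither Tact nor FunC checks this,
// so the problem would surface only at run time, if at all.
asm fun brokenPair(): Pair { 1 INT }
```

Until asm bodies can be typechecked, keeping the number of produced values in sync with the return type is entirely up to the author.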

This should be documented, of course, but an even more important question is "are returned structs actually represented as tensors (multiple TVM values)?"
and, of course, the symmetrical question for input function parameters (including passing structs)
in any case, each such point should be accompanied by a concrete example of an asm-function

👍

@anton-trunov
Member

@jeshecdom when we have the grammar and AST for our embedded assembly language then we will be able to typecheck asm-functions and warn the user when their stack discipline does not make sense

@jeshecdom
Contributor

just adapt the corresponding calling convention description from tvm.pdf

Sure, the -> 0 1 is much better explained in terms of s0..s255 stack registers.

We could do the following with specific examples (before that, we should explain how function arguments are pushed onto the stack):

  • First, explain how the result tensor is popped from the stack. Suppose after popping, you get some tensor (r0, r1, ..., rn).
  • Explain how the notation -> n m l is just a re-arrangement of the result tensor.
  • Explain how the re-arranged tensor is mapped into the result type of the function (this includes explaining how the tensor is mapped into a struct or a primitive type, or whatever).
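
The three steps above could be illustrated with one concrete declaration. This is a sketch under assumptions: `SliceInt`, `asmLoadCoins` and `example` are illustrative names, and the comments describe the intended reading of `-> 1 0` rather than a verified account of what the compiler does internally.

```tact
// Struct that receives the rearranged results (illustrative name)
struct SliceInt { s: Slice; val: Int }

// Step 1: LDVARUINT16 takes a Slice and leaves two values on the stack:
//         the loaded Int and, on top of it, the updated Slice.
// Step 2: the arrangement `-> 1 0` reorders those two values.
// Step 3: the reordered values line up with the field order of SliceInt:
//         `s` first, then `val`.
asm(-> 1 0) extends fun asmLoadCoins(self: Slice): SliceInt { LDVARUINT16 }

// Usage: load a coins value and read it from the returned struct
fun example(): Int {
    let s: Slice = beginCell().storeCoins(42).asSlice();
    let res: SliceInt = s.asmLoadCoins();
    return res.val;
}
```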

@novusnota
Member Author

novusnota commented Nov 21, 2024

Eh, Structs are represented as tensors (...), but that's the doing of Tact+FunC. If we were to target TVM directly, we would've dealt with stack entries ourselves, all without tensors.

So I'm a bit hesitant on explaining things in tensors, and instead let's just properly describe stack and stack registers.

@jeshecdom
Contributor

Eh, Structs are represented as tensors (...), but that's the doing of Tact+FunC. If we were to target TVM directly, we would've dealt with stack entries ourselves, all without tensors.

So I'm a bit hesitant on explaining things in tensors, and instead let's just properly describe stack and stack registers.

Sounds good.

@anton-trunov
Member

So I'm a bit hesitant on explaining things in tensors, and instead let's just properly describe stack and stack registers.

sounds good to me too, since tvm.pdf does not even mention tensors, looks like it's a term coined by the FunC community

@novusnota
Member Author

  1. Added description of the current Tact-flavored assembly from WIP feat: new asm parser #1064
  2. Described everything from the stack point of view, including its "registers"
  3. Refined the overall top-to-bottom reading flow


:::

### Stack calling conventions {#asm-calling}
Contributor

The stack calling conventions section reads really nicely. Just a couple of questions. What happens if one of the asm function arguments is a struct? What about an argument which is a struct with nested structs? And what happens if the return type is a struct with nested structs, like in this declaration:

struct A {
  a1: Int;
  a2: Int;
}

struct B {
  b1: Int;
  b2: A;
}

asm fun test(s: B, ...): B 
{ ....... }

// while `self` will be pushed last and get on top of the stack
asm(c self) extends fun asmStoreDict(self: Builder, c: Cell?): Builder { STDICT }

// Changing the order of return values of LDVARUINT16,
Contributor

It is still not clear what the notation -> 1 0 means regarding what happens to the results of LDVARUINT16 in the stack itself. The explanation states that 1 represents the value of stack register 1, etc. but it does not explain the significance of writing them in the order -> 1 0. Probably what needs to be said is that the notation -> 1 0 describes how the contents of the stack will be rearranged, when reading -> 1 0 left-to-right: the contents of register s1 will be placed at the top of the stack, and the contents of register s0 will be placed second-to-top.

One alternative way of explaining could be in terms of removing from the stack: -> 1 0 means that s1 is removed first, followed by s0. Hence, the function returns the Builder in s0 because it was the stack content removed last.

Contributor

Now I am having second thoughts on using "removing" because it becomes confusing with what happens with the rest of the stack. For example, suppose that after executing some asm function with declaration -> 0 1 2, we have the 5 element stack (top is leftmost):

a b c d e

Then, -> 0 1 2 means "remove s0, then s1, then s2", so that the stack after removing s0 is:

b c d e

But then, s0 contains now b, when previously b was in s1.

So, probably a better word instead of "removing" would be "read from":

-> 1 0 means that s1 is read from the stack first, followed by s0. Hence, the function returns the Builder in s0 because it was the stack content read last.

Member Author

@novusnota novusnota Nov 25, 2024

Thing is, as I've just checked in tests, -> 0 1 2 is not about taking or not taking any results, but merely about positioning items for whatever result type we've specified. Like, if the return type is Int, one can only specify -> 0 and nothing else, even though -> 0 in this case is the same as not writing anything at all. And when Structs, long Structs (more than 15 entries) or even nested Structs are involved, this gets complicated.

Thus, my description of s0 matching 0, s1 matching 1 is actually incorrect and has to be rewritten. And I've got to check the cases with long or nested Structs here as well, same as for the "stack calling conventions" bit.
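
A small sketch of the single-value case mentioned above. The function names are illustrative, and the claim that the two declarations are equivalent is taken from the comment above rather than independently verified.

```tact
// Without an arrangement: the single value left by the body is the result
asm fun answer(): Int { 42 INT }

// With an explicit return arrangement: per the comment above,
// `-> 0` is the only option for a single Int and changes nothing
asm(-> 0) fun answerExplicit(): Int { 42 INT }
```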

Contributor

I see. So, this declaration is incorrect (because it returns only one element):

asm(self len -> 1 0) extends fun asmLoadInt(self: Slice, len: Int): Slice { LDIX }

but this is correct:

asm(self len) extends fun asmLoadInt(self: Slice, len: Int): Int { LDIX }

even though it will discard the Slice result and keep only the Int. Or is this last one also incorrect?

Mmmm.... very confusing indeed. So, when using the notation -> m n p it is not possible to discard values in the result type. I think this is acceptable. It is better to explicitly state all the results than to rely on understanding implicit discards.

Member Author

@novusnota novusnota Nov 25, 2024

First one is incorrect. Second one could've been correct if we had our own backend or if we'd alter FunC generation, but I tested it and it's also incorrect: nothing can be discarded in the result type.

It worked for me in previous tests mainly because FunC doesn't perform any checks, and because all asm function bodies are embedded in Fift code.

I had some DROP instructions very deep later on in other asm functions, which unexpectedly (for me) cleared the stack for this one. And I noticed that a little too late.

In the end, this really proves the point of those cautionary paragraphs at the top of the assembly functions description. This stuff is really messy, intertwined and hard to debug (until our own backend for it, of course). But I'll persevere.
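
For the LDIX declarations discussed in this thread, a version that discards nothing could look like the sketch below. `SliceInt` is an illustrative struct name, and the comments describe the intended stack behavior, not a guarantee checked by the compiler.

```tact
// Struct holding both results of LDIX (illustrative name)
struct SliceInt { s: Slice; val: Int }

// LDIX takes a Slice and a bit length, and leaves two values on the stack:
// the loaded Int and, on top of it, the updated Slice.
// Both values are captured in the return type, so nothing is discarded;
// `-> 1 0` reorders them to match the field order of SliceInt.
asm(self len -> 1 0) extends fun asmLoadInt(self: Slice, len: Int): SliceInt { LDIX }
```

Returning both values through a struct keeps the declaration explicit about everything the instruction produces, which is the direction agreed on in this thread.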

Contributor

I understand and thank you for your effort!

So, let's adapt the explanation so that no discards happen in the result type.

Now, regarding nested structs, structs in arguments and structs with more than 15 fields, if you think that the explanation would become too complex to fit it in the page or that the explanation would become so convoluted because of those exceptional cases, probably it would be better to explain those in a separate page, with a link to that page.
