Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roots asset graph #149

Closed
jescalan opened this issue Jun 6, 2013 · 42 comments
Closed

roots asset graph #149

jescalan opened this issue Jun 6, 2013 · 42 comments
Labels

Comments

@jescalan
Copy link
Owner

jescalan commented Jun 6, 2013

The next big addition to roots will be a custom asset graph -- that is, a way for roots to be aware of what files reference what other files. This will allow for the compilation speed to increase by orders of magnitude, as roots will know only what needs to be compiled at any given time.

This is a big project getting this built correctly and integrated, but it will be happening over the next few months I would assume.

@samccone
Copy link
Collaborator

So i have been giving this tons of thought lately.

I think I want to start out with a CSS asset graph. I think whati will do is take some CSS generate a AST https://github.com/NV/CSSOM

then spit back an array of files that this file includes / uses.

I think we can do this all in chunks... thoughts?

@jescalan
Copy link
Owner Author

So I'm entirely winging it here, but I started messing around with some of this stuff this weekend and came to the conclusion that actually parsing each language to generate an AST wouldn't actually be that helpful. For example, in stylus the biggest pull is the @import statements, in jade I need to get layouts and partials. Almost all of the things I need most are not built directly into the language and are actually features of the preprocessor, which is why asset-graph, although brilliant, is of little use to us.

The way I started doing it was with regexes. I'm fully planning on putting up the code I have early early so I can get you guys' input and help with it, but I barely have anything at all now. It's essentially a structure where each language has a folder, and in it, files. The files are named by the asset they detect, and export a regex source which can be used to scan the source code for that particular asset, and the match will be the asset path.

I know it's quite basic, and it makes me a tad bit nervous not to be running a full parse, but I just don't feel like it's necessary to tokenize the entire language just to pull out a very small number of things we're relying on.

I also definitely want to make it super extensible, so that adding languages is just a matter of creating a new folder with matchers and writing tests.

@notslang
Copy link
Collaborator

Why do we need to find the Stylus's @imports or Jade's partials? Those would get inlined into the CSS/HTML and we wouldn't need to compile them anyway. All we need to do is find out what files the compiled HTML/CSS/JS needs. So this is what we do:

  1. Compile the views that aren't partials (everything in ./views that doesn't start with a _)
  2. Search the HTML that comes out for the URLs of any CSS stylesheets that it relies on.
  3. Use those URLs to find the files that are supposed to compile into said CSS stylesheets.

example:
if we find /css/main.css, we then see if /assets/css/main.css exists
if it does, we copy it over to ./public/
if not search for /css/main.styl, if that exists then we compile it into ./public/
if a .styl file isn't there, we search for .less and so on for everything else that can compile to css
if we just can't find anything, we throw a Can't Find Ur Dang Source exception

...

if we find something like www.mycdn.com/script.js then that's not our problem. But, if we wanna be really fancy, we could download this file to a tmp folder, then rewrite the URL to point to it to speed up downloads in development.

  1. take those compiled CSS style sheets and search through them for anything that the CSS expects to @import.
  2. everything we found in step 4 gets sent to step 3
  3. go back to step 2 and repeat for JS

...what this doesn't work for: dynamically generated requirements ... like if a JS lib adds a style tag after the page load. personally I think that's stupid, but to support it, we could either have people explicitly declare extra files that need to get compiled, or switch to a different way of searching for dependencies...

"like what" you ask? well, what if we connect to the HTTP server and when a request is about to turn into a 404, we take the URL that's requested and check if there's a file we could compile to make it not 404. this has some disadvantages... like:

  • you need to open up your webapp and do everything you need to make all the dependencies get requested... maybe we could automate this with PhantomJS or something, but it sounds like a pain.
  • it would prolly be pretty slow, but maybe we could setup a cache of compiled files
  • it just sounds complicated
  • we can't do cool URL rewriting like i mentioned above

PS: if we do the 1st approach, we could prolly jack some of the functionality from a webbrowser, cause, if you think about it, that's exactly what browsers do when they get a html page (minus the compiling part)

@samccone
Copy link
Collaborator

hmmm however i see what jeff is saying

in a main styl file you could do @import foo

thus main now depends on foo so when foo is updated we now have to update main

and so on and so on

I think AST's are the way to go,

so for stylus ... round 1 i think just knowing about imports would be HUGE
https://github.com/samccone/stylus/blob/master/lib/visitor/evaluator.js#L651

so picture this

connectedNodes = graph(filepath)

would return something like

[{
   "myfile.styl": [
     { "mysubfile.styl": [] }
   ]
}, ...]

that would could then map the relationship from

@notslang
Copy link
Collaborator

Ugh... 🤦 now I remember why we needed that.

Here's a rather random idea: what if we subclassed the instance of fs that the compiler uses, so that we could record all the files that the compiler reads from. Then we take that list, remove everything that isn't in the project folder, and just watch those files... not sure how to do this off the top of my head, but it seems like a pretty simple solution, right?

@samccone
Copy link
Collaborator

ha.. that is one way... except we would have to fork the compiler or duck punch it 🐐

@notslang
Copy link
Collaborator

nah, we wouldn't fork the compiler... that would be horrible if we want to support all the crazy preprocessors people use. we could just make sure that when the compiler does require('fs') that resolves to our version of fs, that has a tiny logging function added into the read functions.... maybe that's what you mean by "duck punch"? ... idk

@samccone
Copy link
Collaborator

yeah that is what i mean :) http://en.wikipedia.org/wiki/Monkey_patch

@notslang
Copy link
Collaborator

ah... btw, i think i found a module that does what i described: https://npmjs.org/package/inj

@notslang
Copy link
Collaborator

oh, and as for the data-structure, we might want something like this to make searching easier:

asset_tree = {
    "./assets/css/_partial.styl": [
        "./assets/css/main.styl",
        "./assets/css/home.styl"
    ],
    "./assets/js/derp.js": [
        "./assets/js/main.coffee"
    ],
    "./assets/js/herp.coffee": [
        "./assets/js/main.coffee"
    ],
}

so that means that when ./assets/js/main.coffee was compiled, it needed to read ./assets/js/herp.coffee & ./assets/js/derp.js. So whenever a file is modified, we just get asset_tree["name of the file we modified"] and iterate over all the files that rely on it, compiling each one.

@L8D
Copy link
Contributor

L8D commented Jun 18, 2013

Though @slang800, I don't think that you'd need to recompile main.coffee or any files really if you're modifying a javascript file. But that is irrelevant your example.

@notslang
Copy link
Collaborator

No, if we are going with CJS then ./assets/js/derp.js being read while compiling main.coffee would have happened because main.coffee did require("derp.js")... and since we'd be concatenating all the deps, we'd need to recompile main.coffee so the updated derp.js is in it.

@jescalan
Copy link
Owner Author

Woah missed a lot here. So I guess my last comment didn't really ring true to anyone, but the following discussion is exactly why I though it would be easiest to build it that way.

I don't want to have to subclass node core modules or fork a compiler and figure out its inner workings every time we need to add support for language to the asset graph. I want a single interface we can use to parse any language, and what that boils down to is regexes (unless we want to build a custom parser for each, which seems like ridiculous overkill).

It's important not to think about solving this problem for one particular language, but rather for any language we might encounter or that might be used with roots.

@jescalan
Copy link
Owner Author

As for the structure of the AST output, I'm not 100% on this, but good to see you guys' suggestions. If I had to take a stab at it, this is what I'd go for:

{
  root: "/Users/jeff/Desktop/whatever",
  files: {
    'views/index.jade': {
      deps: ['views/_partial.jade' ]
    }
  }
}

Pretty similar to what you guys have proposed, but added a couple more properties just to make what's being accessed clear and to make it a little more extensible if we need. Also added a root path and reference everything relative by default, because it would be easy to build the full path if needed this way. Thoughts?

@notslang
Copy link
Collaborator

We should be able to get the root by just reading the cd, so I don't think that needs to be in it. And why does 'views/index.jade' need to be an object? If the only key is deps then it's kinda pointless to add another level to the object.

@jescalan
Copy link
Owner Author

Both of the things you commented on I added for clarity. I think it's a great idea to know, for any given AST which references all its files with relative paths, what root it was generated from. I do understand that in roots' particular situation, you almost always just read process.cwd() and get the same result, but I feel like it's good practice to make sure everything in the data structure is clear.

The reason I had deps in there was about the same. It makes it extensible if we ever do need to add anything else, without breaking everything, and makes both the data structure and the calls to it clear on what you are getting.

@notslang
Copy link
Collaborator

It is good to know the root that it was generated from, but does that really belong in the same object? Or better yet, could we just make all the file paths explicit?

As for making it more extensible - since this is just a mapping between files and dependents, whatever else we might add to it probably belongs in a different object... generally, keeping objects flat is better than making them deeply nested.

...but I'm probably getting too specific on how this should be implemented... I'll try to build this into the transformers branch and see what it looks like.

@jescalan
Copy link
Owner Author

In my mind, including the root once is far better than repeating it for every single file path. Keeping things DRY : )

For the deps piece, I could be convinced otherwise, but I feel like the extra clarity and potential extensibility of using an object there holds merit. I don't have an answer at the moment for what else we might add or I would have said it earlier, but this project is still in very early stages, so I can't say I'd be shocked if we didn't 100% on the requirements.

@samccone
Copy link
Collaborator

alllllright

so a very very simple pass at an import only asset graph for stylus

after messing with a few ways to do it I have decided that monkey patching a function in stylus and ripping what i want out of it is going to be the cleanest way. the other way being to path the FS or have a ton of ugly regexes.

So lets talk about why I think this way is idea;

First I am actually not even compiling the stylus. I am only generating the tree and evaluating it. not compiling. So that means this is really fast. I just think this is going to be the way to go for each type language we are going to support.

thoughts are welcome.

https://github.com/samccone/stylus-graph

@jescalan
Copy link
Owner Author

I'm still on the regex train over here. I'll post my take on it soon once I get a chance to clean up my initial implementation. What you started with there is sweet though for sure

@notslang
Copy link
Collaborator

I think there are 2 components to the asset graph.

  1. parsing the compiled HTML & CSS to find the files that they rely on (this we should use regex or a AST for)
  2. finding what files the files written in higher-level languages rely on... for this one, we cannot just monkey patch every compiler (there are way too many languages for that)... we need some way that works with every compiler (or pretty close to that).

@zspecza
Copy link
Contributor

zspecza commented Jun 21, 2013

From Stylus' package.json, it lists CSSOM as a dependency. It might help for checking the syntax tree with CSSOM for assets without having to parse the compiled CSS directly.

For example, using it on this stylesheet:

@import url('reset.css');

.wrapper p {
  background: url(../img/image.png);
}

@import url('shame.css');

Produces this tree:

{
    parentStyleSheet: null,
    cssRules: [
        {
            parentRule: null,
            parentStyleSheet: "../..",
            href: "reset.css",
            media: {
                length: 0
            },
            styleSheet: {
                parentStyleSheet: "../../..",
                cssRules: []
            }
        },
        {
            parentRule: null,
            parentStyleSheet: "../..",
            selectorText: ".wrapper p",
            style: {
                0: "background",
                length: 1,
                parentRule: "..",
                _importants: {
                    background: ""
                },
                __starts: 38,
                background: "url(../img/image.png)"
            },
            __starts: 27,
            __ends: 78
        },
        {
            parentRule: null,
            parentStyleSheet: "../..",
            href: "shame.css",
            media: {
                length: 0
            },
            styleSheet: {
                parentStyleSheet: "../../..",
                cssRules: []
            }
        }
    ]
}

@jescalan
Copy link
Owner Author

@slang800 agreed with point 2 that you made -- that's exactly why I think regexes are the way to go. Regex works with every compiler - we just parse the file contents. Also, I feel like the detection for jade and stylus will be super easy. @import and \ninclude are pretty simple to pull out with a simple regex.

For point 1, if we are going to parse the compiled html and css, we might be able to use https://github.com/One-com/assetgraph for this...

@samccone
Copy link
Collaborator

hmmmmmmmm maybe maybe maybe I will try both :)

@notslang
Copy link
Collaborator

err... no for point 2 we can't use regexes either because we would need to make a new regex for every single language. Also, parsing a templating language that may contain ANY content with a regex is a horrible idea... imagine if index.jade has a stylus code example in it... it would be far too easy to break. That's why I suggested extending fs.

@samccone
Copy link
Collaborator

@slang800 fs will work for imports only, it will not work for images / fonts / ect ect

@notslang
Copy link
Collaborator

no, that stuff would show up in the compiled CSS/HTML (point 1)

@jescalan
Copy link
Owner Author

For your point @slang800 it would detect the language of the file, and parse it for that language's specific grammar, defined by a regex for each construct. So if you had the text @import in your jade, it would do nothing.

@notslang
Copy link
Collaborator

if @import was in a :stylus block in your jade it would need to detect that.

@jescalan
Copy link
Owner Author

Then we would need a real good regex. That's a super edge case though...

@notslang
Copy link
Collaborator

proof of concept for duck punching the compilers... all of them (in like 50 SLOC, tops 😄):
image

Some of these show incorrect filehelpers for dep... like the line that says: /usrdata/web/roots/test/basic/assets/css/example.styl /usrdata/web/roots/test/basic/views/index.jade. This doesn't mean that index.jade depends on example.style... it's just because "fh" didn't get cleared out after index.jade was finished compiling... after I reorganize the way we keep track of files, this will be fixed.

Also, at the bottom you'll see that main.js is written 3 times... this is a related issue, since I kinda just pulled fh into the outermost scope for the demo.

But it does show that we can use fs to see what files get read by the compilers, without knowing anything about the compilers! ... making it pretty much language agnostic.

@jescalan
Copy link
Owner Author

jescalan commented Oct 1, 2013

Hey @slang800 - I know it's been a while on this, but do you still think we could do through by patching fs? If it was possible, it would be a nice way to do it. Have you worked on this prototype anymore?

@samccone
Copy link
Collaborator

samccone commented Oct 1, 2013

let me say again that this will only give us a graph of the entire project, not a per file graph

@jescalan
Copy link
Owner Author

jescalan commented Oct 1, 2013

Ah you're right. That is not a feasible way to do it then, is it. I really want to return to working on the asset graph, but am in the middle of some large refactors to the core at the moment. But will be more active on this soon!

@samccone
Copy link
Collaborator

samccone commented Oct 1, 2013

https://github.com/samccone/stylus-graph
https://github.com/samccone/jade-graph

I still think this direction is really the best way

@notslang
Copy link
Collaborator

notslang commented Oct 2, 2013

no... this will still show us all the dependents of each file... just like running the file through those language specific parsers would. And no, I haven't worked on it beyond a functional prototype because adding it with the way roots is currently organized would make a mess, so I'm working on restructuring that first.

@samccone
Copy link
Collaborator

samccone commented Oct 2, 2013

indeed but your solution would require a true compile on each file, where mine acts as more of a static analysis tool.

@notslang
Copy link
Collaborator

notslang commented Oct 2, 2013

Yeah... that's the point: it uses compilation which is not only something that we already do (meaning we wouldn't need to run a separate static analysis tool) but it is something that is already written for every language we use (and many we don't, but could)... So we also don't need to write static analysis tools for each and every language we want to support

@jescalan
Copy link
Owner Author

jescalan commented Oct 2, 2013

Hm, if sean's solution does give us dependents per file, it is a valid point that not having to write an additional static analysis tool for every language we support would be a large benefit.

@notslang
Copy link
Collaborator

An update on the progress of this:

I started poking around the AssetGraph codebase again and I pretty much understand it. The way it's organized is really nice, but since it's written in JS the extensive use of classes and extenders really clutters up the code. So, after making some fixes and PRing them to the main project, I started converting the major components in coffeescript so we can use a normal class syntax (@hhsnopek is helping too). Right now, that is over here: https://github.com/slang800/assetgraph/tree/coffee-convert-assets

Once that conversion is finished, I'll start integrating AssetGraph into roots. I think I'll be building this on the current master, since v3 is far from finished and still has a broken history.

@jescalan
Copy link
Owner Author

Woah, this is great. And yeah that sounds perfect - thanks @slang800 for the hard work and excited to see this starting to come together!

@jescalan
Copy link
Owner Author

Closing in favor of #586 for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants