Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat module umd globals #6717

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

MohamedLamineAllal
Copy link
Contributor

@MohamedLamineAllal MohamedLamineAllal commented Dec 26, 2022

Description

Issue: #6697

Implement umd globals mapping features and extra related fixes.

Following as best, the babel plugin @babel/plugin-transform-modules-umd

Most things were implemented alike. And there are some important differences and the motivation is explained as well.

features and changes

Reducing function

Extracted the reducing logic in global_name() method to a function.

And added js extension no inclusion in reduction and handling dot imports.

More details are in the next sections.

// config.rs
fn reduce_src(src: &JsWord) -> JsWord {
    let mut out = src.clone();

    let js_ext_regex = CachedRegex::new(r#"\.([cm]?[jt]s|[jt]sx)$"#).unwrap();

    if out.contains('/') {
        out = out.split('/').last().unwrap().into();
    }

    out = JsWord::from(js_ext_regex.replace(&out, "")); // <----- no js extension inclusion

    out = out.to_camel_case().into();
    if (out).eq("") { // <----- handling dots imports
        out = JsWord::from("_");
    }
    out
}

handling dot imports and reducing to _

"." or ".." or ../.. ... dependencies => global[""] --| expect instead |--> global._ as babel do.

Change I made:

// config.rs
fn reduce_src(src: &JsWord) -> JsWord {
    // ...
    // ...
    out = out.to_camel_case().into();
    if (out).eq("") { // <----- handling dots imports
        out = JsWord::from("_"); //<___/
    }
    out
}

And outputting _ as a member and not a computed member ["_"]

member vs computed_member => babel does set it as a member

Change I made:

// umd.rs
if is_valid_prop_ident(&dep_name) || dep_name.eq("_") {
                           // -------------------/^

I updated one existing test after that:

// crates/swc_ecma_transforms_module/tests/fixture/common/issue-4456/1/output.umd.ts (768dec8)
factory(global.input = {}, global[""],
// ==to===>
factory(global.input = {}, global._

Js extensions no inclusion in reduction

change made:

fn reduce_src(src: &JsWord) -> JsWord {
    let mut out = src.clone();

    let js_ext_regex = CachedRegex::new(r#"\.([cm]?[jt]s|[jt]sx)$"#).unwrap();
    // ...             --------/^
    // ...
    out = JsWord::from(js_ext_regex.replace(&out, "")); // <----- no js extension inclusion
    // ...
}

Before:

factory(global.input = {}, global.rootJs, global.stubFalseJs);
                     //           ___/^          ________/^

after:

factory(global.input = {}, global.root, global.stubFalse);
                    //            __/^         _______/^

I followed how babel does it in no exact mode:

import worldLevel from './world/level/index.js';
import another from './world/level2/index.js';
// ==========>
factory(mod.exports, global.index, global.index);
global.input = mod.exports;

image

Babel does leave it with Js extension in exact mode. But i didn't follow it in that mode. For the reason listed in the motivation.

Motivation:

For mapping the lib output you do the following:

Think of the file name

index.js => global.index = {}
input.js => global.input = {}

When we include imports:

factory(global.input = {}, global.index)
                    //    ---------/^ import

The import is something that is supposed to be added already to global. And compiled by the same process. So the first step of lib module exports go with filename reducing to filename stem. It's without extension. And it should be preserved as well for imports reduction. So that the imports dependencies map to the real added globals values.

Ex:

input.js => global.input = {} // imports from "./lib1.js", from "./lib2.js"
lib1.js => global.lib1 = {}
lib2.js => global.lib2 = {}

if ./lib1.js is reduced to global.lib1Js then it wouldn't work, because lib1Js is never added to global. Only global.lib1 is. (lib1.umd.js is included before input.umd.js). And that's the motivation.

And that is what babel does. And the change reflects that.

Updated tests

For this change I updated two existing tests:

// crates/swc_ecma_transforms_module/tests/fixture/common/issue-962/output.umd.js (768dec8)
factory(global.input = {}, global.depJs);
                                   //^
// =========>
factory(global.input = {}, global.dep);
                                  //^
// crates/swc_ecma_transforms_module/tests/fixture/common/issue-578/2/output.umd.js (768dec8)
factory(global.input = {}, global.rootJs, global.stubFalseJs);
// =========>                       //^                 //^
factory(global.input = {}, global.root, global.stubFalse);
                                   //^               //^

Globals mapping main feature and implementation and choices

First and following the babel plugin there is exact globals mode and no exact globals mode.

In babel, the exact mode is activated by "exactGlobals": true.

I went first with that format. But then I hit a limitation with the exact mode handling choice.

I opted for doing exact matching using regex only. The Babel plugin match using string only.

I see it as inflexible. And require more work when the same imports are imported in different modules and have different relative paths ...

The way babel does it. it's to match exactly against the import declaration expression. Regex is more flexible and allows a better match for the same module without needing to repeat the matching ...

Example:

import another from './world/level2/index.js';
import another from '../world/level2/index.js';

are different

and

{
    globals: {
        './world/level2/index.js': 'Dragon.Bolt',
    },
    exactGlobals: true,
},

will match the first only.

And

{
    globals: {
        'world/level2/index.js': 'Dragon.Bolt',
    },
    exactGlobals: true,
},

will not match any.

A regex like:

'world/level2/index\\.js$' // or
'world/level.*?index\\.js$'

will match both.

Babel exact mode illustration:

image

image

image

Need to be all exact.

I started with globalExacts {bool} and globals {map}. But then to support the fallback resolution. Regex matched in order. The first that match, is used. I went for two variables. globals {HashMap<String, String> type} for no exact mode. And exactGlobals {Vec<(String, Sting)> -[[regexStr, mapStr]]- type} for exact mode.

So if none is provided. The default reduction is used. As before (plus the new mini changes).

If one is provided. It will go with exact or no exact.

No Exact mode

with globals {Option<HashMap<String, String>>} property.

Example

{
  "globals": {
    "@themagician/sparkle": "MatchedTMSparkle",
    "./worldLevel": "MatchedWorldLevel",
    "./storm.js": "MatchedStormJs",
    "./phenomena.levels": "MatchedPhenomenaLevels",
    "./some/index.js": "MatchedIndexJs",
    ".": "Matched_",
    "input": "WorldLib"
  }
}

the resolution happens as follows:

  • at configuration build. The globals are transformed into a reduced format. Where the left side is reduced. example:
`@themagician/sparkle => sparkle`,
`./phenomena.levels => phenomenaLevels`,
`./some/index.js => index`

The same reduce function is used to convert the imports. Babel does it that way.

  • any import or filename stem (for the file and lib export) is reduced first and checked against the hashmap. If it matches. The mapping is used. Otherwise, just return the reduced format.

The differences with babel are:

  • that babel does reduce the map as well. So mapping "@themagician/sparkle": "TM.Sparkle" in babel will map to global.TMSparkle. While in what I did, it would be global.TM.Sparkle. I opted for that because it's simpler. Less processing. And more flexible. People can map to whatever they want. It's there responsibility and again more flexible. Also, it's natural to not use . and whoever does, should know what he is doing.

  • Babel doesn't reduce the matchers. While our implementation does

{
    "globals": {
        "./this/some/lib.js": "Lib",
        "lib.js": "Lib",
        "lib": "Lib",
    }
}

All works the same in our implementation => they are reduced first. So they all end up as lib.

Babel require the users to provide the reduced format directly. Only "lib": "Lib" would work in babel for the example above.

I opted for this choice. To make it more intuitive and simple for users.

Exact mode

config format:

exactGlobals: Vec<(String, String)>

{
  "exactGlobals": [
    ["@themagician/sparkle", "MatchedTMSparkle"],
    ["\\./worldLevel", "MatchedWorldLevel"],
    ["\\./storm\\.js", "MatchedStormJs"],
    ["phenomena\\..*?$", "MatchedPhenomenaLevels"],
    ["\\./some/index\\..*$", "MatchedIndexJs"],
    ["^(\\.*?/*\\.*)*$", "Matched_"],
    ["(^|.*?/)input$", "WorldLib"],
    [".*NOMatch", "NO_MATCH"]
  ]
}

Picked as Array to allow fallback resolution for regex.

Reflected on the precedency test crates/swc_ecma_transforms_module/tests/fixture/common/issue-6697/exact-globals/matching-regex-order-precedency

I opted for regex. For the reasons explained above (babel exact string matching, is not flexible, and repetitive to work with ...).

The mapping from Vec<String, String> to Vec<CachedRegex, String> is done at the BuiltConfig build step.

Info

How to map lib (module export)

every module is exported as global.<filenameStem> = {}. So input.js => global.input = {}.

With no exact mode. just do the following:

{
    "globals": {
        "input": "LibSome"
    }
}

Exact mode

{
    "exactGlobals": [
        ["^input$", "LibSome"]
    ]
}
{
    "exactGlobals": [
        ["input", "LibSome"]
    ]
}

Would still work. And multiple regexes go with first is first to be matched.

In the above: input.js => global.LibSome = {}

Regex

back slash should go in JSON as \\. Ex: "^\\.$"

I would write a note about using regex in the doc.

Tests

All features were tested.

Documentation

I have all my notes. I can update it.

To check and questions

Dropping old globals to HashMap<String, Expr>

globals: self
    .globals
    .into_iter()
    .map(|(k, v)| {
        let parse = |s| {
            let fm = cm
                .new_source_file(FileName::Custom(format!("<umd-config-{}.js>", s)), s);

            parse_file_as_expr(
                &fm,
                Syntax::default(),
                Default::default(),
                None,
                &mut vec![],
            )
            .map_err(|e| {
                if HANDLER.is_set() {
                    HANDLER.with(|h| e.into_diagnostic(h).emit())
                }
            })
            .unwrap()
        };
        (k, parse(v))
    })
    .collect(),

globals wasn't used anywhere. And i didn't need the map above that was about mapping to Expr. So i drop it.

Is it ok? should i comment it out only?

no exact mode matcher reduction

As mentioned in no exact mode section and the part about differences with babel. I implemented the matchers (left side) to be reduced. First before matching. Allowing the users to write lib.js or ./some/lib.js or lib all will be reduced to the same lib. Babel require the reduced format or wouldn't work.

Is it ok for what i did (still compatible with babel, but an extra reduction operation)? Or should i just follow the babel plugin on that.

New to rust

I wonder about what i could have not done well. And what should be improved. Or done better.

BREAKING CHANGE:

Not breaking change.

There is things that were changed and provide a differnet js output then before (dot imports, Js extension no inclusion). But none is a breaking change.

Related issue:
#6697

- Implementing by reducing globals (no exact mode)
- Implementing exact mode by using regex and fallback resolution
- Fixing and adding handling of dot only imports to be `_` rather then empty string "" (a change too was made to have it as member and not computed member )
- fixing or updating imports reducing to include Js extension to not do. Just like babel and because it allow importance and exports to work with the same format. Too necessary.

- more details on the pr -
- needed for crates/swc_ecma_transforms_module/tests/fixture/common/issue-4456/1/output.umd.ts
- there is output.*.js so .ts version make sense
This commit include the changes proted for already existing tests before this PR update. (only 3)

- crates/swc_ecma_transforms_module/tests/fixture/common/issue-4456/1/output.umd.ts
is due to the handling of dot imports to `_` that was added.

- crates/swc_ecma_transforms_module/tests/fixture/common/issue-578/2/output.umd.js
   crates/swc_ecma_transforms_module/tests/fixture/common/issue-962/output.umd.js
   Due to the no Js type extension inclusion in reducing for imports (like babel - check PR for details)
- The readme included in that test explain what that is

- the behavior follow how babel do it.

- See PR for more details (Like what part of the code is responsible for this) -
## testing no exact globals
- testing when globals are in reduced format
- when globals are not in reduced format
- testing when no globals are provided
- testing also with the dot imports _  change and no Js type extension inclusion in reducing change.

## exact mode
- testing regex matching
- testing order precedency and regex resolution fallback
- testing no regex matching

##
in all the output and mapping is tested to work correctly
@kdy1
Copy link
Member

kdy1 commented Dec 26, 2022

cc @magic-akari Can you take a look?

@magic-akari
Copy link
Member

Sorry for the late reply.

It is an oversight which swc behaves differently than babel with respect to file extensions and dot imports.

I tested babel locally and have some doubts about some details

 let js_ext_regex = CachedRegex::new(r#"\.([cm]?[jt]s|[jt]sx)$"#).unwrap();

Do you only deal with some specific suffixes?
The behavior of babel is to erase the last suffix, whatever it is.

Examples:

import "foo.js";
import "foo.xxx";
import "foo.js.js";
import "foo.js.xxxx";

->

factory(global.foo, global.foo, global.fooJs, global.fooJs);

In this PR, you did two things:

  1. Fix existing bugs.
  2. Add exactGlobals support which is more powerful than babel.

For 1. I think it's ok.
For 2. I have no idea, it depends on @kdy1 .

@kdy1
Copy link
Member

kdy1 commented Dec 28, 2022

For 2, we need more discussion before merging a PR.

@kdy1
Copy link
Member

kdy1 commented Dec 28, 2022

@magic-akari Thank you!

@MohamedLamineAllal
Copy link
Contributor Author

MohamedLamineAllal commented Dec 31, 2022

@magic-akari @kdy1 First thank you both. And too sorry for being late. I got disconnected and distracted the past few days due to an extra unexpected work and some health issues. I'm more available now.

Do you only deal with some specific suffixes?

I did only the javascript one.

For the fact of only js, I think it makes sense for UMD.

Otherwise, the thing is that those are imports written in the code source. And we do have two cases. People writing them with the js extension type (.[cm]js, ...) or without it (commonjs, typescript style).

import s from  "some.container.js"
// and
import s from  "some.container"

for me should come as the same. If we follow the babel way as you mentioned. In the first it would be someContainer in the second it would be some. And the real file would be some.container.js. The stem would be some.container => making the lib export to be someContainer. If people write in a place some.container and in another some.container.js. Even though they refer to the same exported file under global.sameContainer. It would lead to not matching in the no-js version if not for the way I implement it. And So the way I implemented it, allows the alignment as explained in the motivation in the description (the export meets the imports (dependency in other files) in the default behavior).

Not as relevant but also some.service and some.container should not both resolve to the same some global for better isolation. But then always more critically, filename => stem => export name is the big deal. If we go as babel plugin did. Things wouldn't be aligned.

The transpilation happens on the fly without awareness of all files. And so I guess there is no way to determine when it's meant to be without extension and when it is.

I don't know, is there a case where the way I did it would cause a problem? A case where expecting the file to not be a js type extension?

And i guess if all make sense and you agree to the elements that are tackled. I guess we can always go with it as it is (js extensions type only). And if a need arises from the community. We can add a hand to provide the extensions matching regex. To support ones beside js. As an option. Personally, I'm not able to see a use case. And I wonder.

@hlerenow

This comment was marked as off-topic.

@kdy1 kdy1 added this to the Planned milestone Nov 13, 2023
@MohamedLamineAllal MohamedLamineAllal requested review from a team as code owners April 29, 2024 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants