Add Custom Rules #70

Open
KibaNoOu opened this issue Sep 20, 2022 · 14 comments

@KibaNoOu

Hi,
Would it be possible to add custom rules?
Sometimes the app doesn't clean residual parameters of the URL, like the ones after a "?".

@svenjacobs
Owner

svenjacobs commented Sep 21, 2022

Hi @KibaNoOu,

do you have an example of a URL which is not properly cleaned, please?

Also, what do you expect from custom rules? Is it okay to just enter exact parameter names to be removed, or do you require something more sophisticated like regular expressions?
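To make the two options concrete, here is a minimal sketch in Python of what "exact parameter names" versus "regular expressions" could mean for a rule. It is only an illustration, not the app's implementation: the function name `remove_params` and its arguments are hypothetical, and the example URL is made up.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_params(url, exact_names=(), name_patterns=()):
    """Drop query parameters whose name is listed exactly or matches a regex.

    Hypothetical helper for illustration only. Note that urlencode()
    re-encodes the kept values, so their escaping may be normalized.
    """
    compiled = [re.compile(p) for p in name_patterns]
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in exact_names
        and not any(p.fullmatch(name) for p in compiled)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = remove_params(
    "https://example.com/item?id=42&utm_source=x&utm_medium=y&ref=abc",
    exact_names={"ref"},          # exact-name rule
    name_patterns=[r"utm_.*"],    # regex rule
)
print(cleaned)  # -> https://example.com/item?id=42
```

Exact names are easy to enter and hard to get wrong; the regex form covers whole families of parameters with one rule, at the cost of being easier to misuse.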

@svenjacobs added the feature (New feature or request) label on Sep 21, 2022
@svenjacobs self-assigned this on Sep 21, 2022

@KibaNoOu
Author

Hi Sven,
Here's an example URL:
https://www.ebay.it/itm/175311733713?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=k-vapwf5rtw&sssrc=2349624&ssuid=n5H50APCTfe&var=&widget_ver=artemis&media=MORE

And it would be great to have both exact parameter names and regular expressions.

Keep up the good work!

[Screenshot attached: Screenshot_20220923-124501]

@svenjacobs
Owner

In this case I could add a specific sanitizer for eBay links if you tell me which parameters can be safely removed.

Regarding custom rules and regular expressions: regular expressions are very powerful. Allowing users to specify regex rules could potentially break the functionality of the application if there is an error in an expression. I need to think about how to deal with this possibility.
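One possible way to contain that risk, sketched here in Python under the assumption that rules are validated when the user saves them: compile each pattern up front and reject the ones that fail, so a broken rule never reaches the cleaning step. The name `compile_user_rules` is hypothetical and not part of the app.

```python
import re


def compile_user_rules(patterns):
    """Compile user-supplied patterns, collecting invalid ones instead of raising.

    Returns (valid, rejected): compiled patterns, and (pattern, error message)
    pairs that a settings screen could show to the user.
    """
    valid, rejected = [], []
    for pattern in patterns:
        try:
            valid.append(re.compile(pattern))
        except re.error as exc:
            # Syntax errors are caught here; the rule is skipped, not applied.
            rejected.append((pattern, str(exc)))
    return valid, rejected


valid, rejected = compile_user_rules([r"utm_.*", r"(unclosed"])
print(len(valid), [pattern for pattern, _ in rejected])
```

This only guards against syntactically invalid expressions. A valid but overly broad pattern (for example `.*`) would still remove parameters the URL needs, so a warning or a preview of the cleaned URL would likely still be useful.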

@KibaNoOu
Author

For the eBay sanitizer, everything after the first ? should be discarded.

So from this:
https://www.ebay.it/itm/175311733713?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=k-vapwf5rtw&sssrc=2349624&ssuid=n5H50APCTfe&var=&widget_ver=artemis&media=MORE

To this:
https://www.ebay.it/itm/175311733713
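The requested behavior can be sketched in a few lines of Python. This is an illustration of the rule described above, not the sanitizer that ships with the app; the function name `sanitize_ebay` and the host check (a host beginning with `www.ebay.` or `ebay.`) are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_ebay(url):
    """Drop everything after the first '?' for eBay hosts; leave other URLs alone.

    The host check is an assumption made for this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not (host.startswith("www.ebay.") or host.startswith("ebay.")):
        return url
    # Clearing query and fragment removes the '?' and all that follows it.
    return urlunsplit(parts._replace(query="", fragment=""))


url = (
    "https://www.ebay.it/itm/175311733713?mkcid=16&mkevt=1"
    "&mkrid=711-127632-2357-0&ssspo=k-vapwf5rtw&sssrc=2349624"
    "&ssuid=n5H50APCTfe&var=&widget_ver=artemis&media=MORE"
)
print(sanitize_ebay(url))  # -> https://www.ebay.it/itm/175311733713
```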

Regarding regex, you could hide it behind an advanced option, so that only those who really want the feature enable it, with all the appropriate warnings of course.

@svenjacobs
Owner

The eBay sanitizer is available in version 1.2.0.

@brsysadmin

I'm interested in a "custom rules" feature too :)

Details:

In this example the URL is clean (www.google.com), but in real life the extracted URL probably needs to be sanitized to be usable :)

@svenjacobs
Owner

@KibaNoOu @brsysadmin I've been thinking about the custom rules feature. Please provide your feedback in the discussion item.

@TPS

TPS commented May 28, 2023

@gpsnomad's suggestion in #162 (reply in thread) might be a good stopgap until this is finalized:

> Any chance you could build an option that just strips all parameters? I.e., parses the link and stops at the first question mark that it finds? That would be good enough for me, rather than a custom sanitizer for each domain.

@KibaNoOu
Author

> @gpsnomad's suggestion in #162 (reply in thread) might be a good stopgap until this is finalized:
>
> Any chance you could build an option that just strips all parameters? I.e., parses the link and stops at the first question mark that it finds? That would be good enough for me, rather than a custom sanitizer for each domain.

Could be a really simple but effective solution!

@svenjacobs
Owner

svenjacobs commented May 30, 2023

@TPS @KibaNoOu The thing is, we don't know for sure which parameters can be removed without breaking a URL. Of course we could remove all query parameters from a URL, but some URLs, like the Amazon product link from a shopping cart, encode some optional parameters in path arguments (see /ref=…). But usually path arguments are required, not optional.

@TPS

TPS commented May 30, 2023

> The thing is, we don't know for sure what parameters could be removed without breaking a URL.

> But usually path arguments are required, not optional.

That's why, as a stopgap, it'd be worth making "strip everything unknown" a function, like Decode URL. (Wait, is this what Extract only URL is supposed to do? I never did figure that one out.)

E.g., for Amazon, it's increasingly evident that almost any of their product URLs can be rewritten into one format keeping just the ASIN parameter. For the non-product Amazon URLs, one can just strip everything from ? and ref onwards, and it's mostly good. It certainly doesn't hurt to try this everywhere.
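As a rough Python sketch of the "strip everything from ? and ref onwards" idea: drop the query string and cut the path at the first segment that starts with `ref=`. The function name `strip_query_and_ref` and the example URL are hypothetical, and whether the result still resolves depends on the site, which is exactly the uncertainty raised earlier in the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_ref(url):
    """Remove the query string and any path segments from 'ref=' onwards."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for index, segment in enumerate(segments):
        if segment.startswith("ref="):
            # Keep only the path segments before the first 'ref=' segment.
            segments = segments[:index]
            break
    path = "/".join(segments) or "/"
    return urlunsplit(parts._replace(path=path, query="", fragment=""))


# Made-up product URL, used only to show the shape of the transformation.
print(strip_query_and_ref("https://www.amazon.com/dp/B000000000/ref=abc_123?th=1&psc=1"))
# -> https://www.amazon.com/dp/B000000000
```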

@NikunjKhangwal

There's this app which allows custom rules; it's open source and also written in Kotlin. Maybe it could be used for reference?

@farOverNinethousand

@NikunjKhangwal Oh wow, that is a nice one too.
Looks like it supports custom rules via JavaScript code, which is not so intuitive, but I guess it makes it quite flexible :)

@NikunjKhangwal

Yeah.
I'm not a developer, so I don't know exactly how these things work, but I shared it so maybe the dev can get some help 😅
