trurl --trim scheme?
#203
-
trurl only outputs valid URLs, unless you use […]. You cannot use a […]. If your goal is actually to print out only the […]
$ cat test
http://a.example.com/test/foo/./bar/..
xyz.example.org
https://b.example.com:20/test?hi#hello
ftp://[email protected]/hey.txt
$ trurl -f - < ./test
http://a.example.com/test/foo
http://xyz.example.org/
https://b.example.com:20/test?hi#hello
ftp://[email protected]/hey.txt
$ trurl -f - -g '{:host}{:path}' < ./test
a.example.com/test/foo
xyz.example.org/
b.example.com:20/test
c.example.org/hey.txt
You may also use […]. Maybe the […]. Anyway, as a workaround for the specific case where you really want to remove the scheme and nothing else from a full URL for some reason, I guess you can use something like this:
$ trurl -f - < ./test | sed -n 's@^[^:]*://@@p'
a.example.com/test/foo
xyz.example.org
b.example.com:20/test?hi#hello
[email protected]/hey.txt
$ # or to only print http/https URLs, without the scheme
$ trurl -f - < ./test | sed -n 's@^https\{0,1\}://@@p'
a.example.com/test/foo
xyz.example.org
b.example.com:20/test?hi#hello
$ # notice that trurl guessed the scheme for xyz.example.org as http://
$ # so it is printed.
This should be fine since trurl only outputs lines that contain one full valid URL and discards invalid URLs in the input. You can therefore assume that the scheme does not contain colons, so removing everything before the first ":" together with the "://" that follows it removes only the scheme.
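For the dedup use case, the sed workaround above can be wrapped in a small shell function and followed by sort -u. This is only a minimal sketch: strip_scheme is a hypothetical helper name, the sample URLs are made up for illustration, and the input is assumed to already be one normalized URL per line (for example, the output of trurl -f -).

```shell
#!/bin/sh
# strip_scheme (hypothetical helper): remove a leading "scheme://" from each
# line on stdin. Lines without "://" are dropped, as with the sed -n ... p
# form used in the workaround.
strip_scheme() {
    sed -n 's@^[^:]*://@@p'
}

# Illustrative input: two URLs that differ only in scheme, plus one other URL.
printf '%s\n' \
    'http://a.example.com/test/foo' \
    'https://a.example.com/test/foo' \
    'https://b.example.com:20/test?hi#hello' |
    strip_scheme | sort -u
# prints:
# a.example.com/test/foo
# b.example.com:20/test?hi#hello
```

With trurl installed, the same function could presumably be used as trurl -f - < ./test | strip_scheme | sort -u.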
-
Oh, duh. Sorry, your example also had URLs that were identical except for the scheme, so I don't know how I missed that. :p
Still, I don't understand why you are trying to only remove the scheme. If the goal is deduplication, you can simply set every URL's scheme to the same value, e.g.
$ trurl -f - -s 'scheme=http' < ./test | sort -u
If you want to do something more complex, like discarding non-http/https URLs and keeping https:// if both http:// and https:// are specified, you can use
$ trurl --json -f - < ./test | jq -r 'group_by(del(.url, .scheme, .raw_port))[] | first(("https", "http") as $s | .[] | select(.scheme == $s).url)'
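If jq is not available, the "prefer https" selection can be approximated with awk on plain URL lines. This is a rough sketch, not an equivalent of the jq filter: prefer_https is a hypothetical helper name, the input is assumed to be normalized URLs one per line, URLs are grouped by everything after "://" rather than by parsed components, and results are printed in first-seen order rather than sorted.

```shell
#!/bin/sh
# prefer_https (hypothetical helper): read URLs on stdin, discard anything
# that is not http/https, and for URLs that differ only in scheme print a
# single URL, choosing https when both schemes are present.
prefer_https() {
    awk '
        /^https?:\/\// {
            rest = $0
            sub(/^https?:\/\//, "", rest)            # key: text after the scheme
            if (!(rest in best)) order[++n] = rest   # remember first-seen order
            if (!(rest in best) || $0 ~ /^https:/) best[rest] = $0
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    '
}

# Illustrative input (made-up URLs).
printf '%s\n' \
    'http://a.example.com/test/foo' \
    'https://a.example.com/test/foo' \
    'ftp://c.example.org/hey.txt' \
    'http://xyz.example.org/' |
    prefer_https
# prints:
# https://a.example.com/test/foo
# http://xyz.example.org/
```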
-
I'm with @emanuele6. You can do this already with a few very simple workarounds: either decide to use […]
-
Thank you @bagder and @emanuele6 for your replies and the various solutions to my simple problem! I thought I might just be using trurl incorrectly. My use case comes from InfoSec: I aim to remove the protocol and duplicates before running a port scan. The point that trurl only outputs valid URLs makes sense and is something I would keep too; my use case is limited and not worth changing trurl for. A simple […]
-
Hello @bagder,
Thank you for trurl! I was wondering whether trurl can remove the scheme from URLs (to dedup them later). I've tried this among other commands:
Expected:
Set with an empty path or a space didn't lead to success.
Is there a way to drop the protocols using trurl?
Cheers,
Peter