-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cssselect can't work on firefox #27
Comments
First, I know this is probably not gonna be helpful, but I’m curious why you want to use cssselect to use XPath in a browser that supports As to the bug itself, I will gladly review and merge a pull request. But to be honest, I’m a bit tired of XPath. There are so many corner cases like this that are just wrong in cssselect that I don’t really want to invest any more time in the "Selectors → XPath" approach. |
Thanks for replying. Our use case is a bit involved. We use selenium for testing, which only supports css2 selectors. It also supports xpath, so our attempt to bolt-on css3 support was to use cssselect, but the above bug prevents us from using this approach. I have a branch which replaces the use of [name() = 'e'] with [self::e], but it causes xpath to explode for tag names such as "h\a0 ref". I personally think this is reasonable and appropriate, but wanted to get your feedback since it seems someone went through some trouble to make sure pathological tags wouldn't crash. The branch in question is my "no_name" branch. All tests pass, but I had to convert some assertions to assertRaises. |
I've been looking at this a bit more. I believe the Right fix to this issue, which doesn't use name(), is case-insensitive, and doesn't allow cssselect to generated invalid xpath expressions is to update this regex to fully conform to the xml Name spec, and throw an ExpressionError when it doesn't match. Please give me your opinions of this idea. |
I don’t see how changing |
The name-test uses name() == 'x' because we can't be sure that self::x will be valid xpath for all x, such as u'h\xa0ref' or u'h]ref'. However, if I improve the safe_name regex to match the standard, we can detect invalid names, and use self::x syntax with confidence. The improved regex would be used to throw a cssselect.xpath.ExpressionError for identifiers such as 'h]ref'. I just want some sign that you'd consider such a change before I do the work. |
Again, this doesn't seem like it would fix the original issue since this not the only cases where |
How about using upper case when |
It would need to vary upper and lower depending on which xpath implementation the expression will be sent to. I'd rather not go down that path if possible. It sounds like you don't hate the idea, you're just not sure it will work. Let me work up a rough branch and show my work. I think all the uses of name() can be refactored, if we have a regex that can reliably determine whether a string can be used in an expression like self::x. |
To clarify: I’m pretty sure it will not work in all cases (especially in very much non-obvious corner cases), and I don’t like the idea as I find XML’s |
I agree that it will be hard to validate xml Qnames in all the edge cases, but I believe I can do it using a similar style to what you've done for the css grammar. I should be able to find a good listing of edge-case Qnames in some open-source project's test suite (libxml2?). |
It's not that it's hard, it's that it doesn't fix your issue. |
Firefox is unlike other xpath implementations in that name() returns an upper-cased string. cssselect's translation of the nth-child selector (for example) uses "name() = 'foo'" which will never possibly match, due to the above oddity. One workaround is to set HTMLTranslator.lower_case_element_names to False, and write selectors like 'LI:nth-child(2)', which will result in a working xpath for firefox, but won't work on any other xpath implementation.
I see two possible solutions:
Demonstration of the problem and solution here: (the contrast between chrome and firefox is stark, ie is like chrome)
http://fiddle.jshell.net/J7VrG/10/show/light/
The text was updated successfully, but these errors were encountered: