-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add position information to taint.config errors #732
Conversation
6763c9b
to
61fc0b0
Compare
Add position information to taint.config validation errors - RuleCodeDuplicate, SourceDuplicate, SinkDuplicate. Add an additional option to the analyze command called --verify-taint-configuration which just verifies the taint configuration and nothing more. What makes this commit long and sturdy is mostly due to YoJSON, the current json file parser used to parse taint.config files, lack support for line numbers and column numbers of matched elements. This leaves us with two options: 1. Use another json parser which has support for getting line numbers 2. Get positioning information via hacky string operations 2. Has performance implications and for 1, the closest parser that does this is Jsonm, which has a complete lack of utility functions that yojson provide. I opted for 1 and then converted the results to conform to the yojson data format, so that parsing is done only once and we get to use yojson's utility functions along with positioning information! Why is this important? For the visual code extension, positioning information is vital. Signed-off-by: Abishek V Ashok <[email protected]>
61fc0b0
to
11b8c00
Compare
NB: The pysa GitHub action/workflow was failing previously before this PR as well :) |
@click.option( | ||
"--verify-taint-configuration", | ||
is_flag=True, | ||
default=False, | ||
help="Verify taint.config files. Skips analysis.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be moved in a different PR? It seems unrelated to adding positions and makes it harder to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I will try splitting :)
let rec to_yojson_data_model = function | ||
| `Null -> `Null | ||
| `Bool b -> `Bool b | ||
| `String s -> `String s | ||
| `Float f -> `Int (int_of_float f) | ||
| `A a -> `List (List.map ~f:to_yojson_data_model a) | ||
| `O o -> | ||
let obj = List.map ~f:(fun (key, value) -> key, to_yojson_data_model value) o in | ||
`Assoc obj | ||
in | ||
let object_with_position_information value decoder = | ||
let (line, column), _ = Jsonm.decoded_range decoder in | ||
`O | ||
[ | ||
InternalJsonKeys.line_number, `String (string_of_int line); | ||
InternalJsonKeys.column_number, `String (string_of_int column); | ||
InternalJsonKeys.value, value; | ||
] | ||
in | ||
let parse_with_jsonm parasable_string = | ||
let decode decoder = | ||
match Jsonm.decode decoder with | ||
| `Lexeme lexeme -> lexeme | ||
| `Error error -> | ||
raise | ||
(JsonmException | ||
{ | ||
message = Format.asprintf "%a" Jsonm.pp_error error; | ||
position = Jsonm.decoded_range decoder; | ||
}) | ||
| `End | ||
| `Await -> | ||
raise | ||
(JsonmException | ||
{ | ||
message = "Encountered end of file or input"; | ||
position = Jsonm.decoded_range decoder; | ||
}) | ||
in | ||
let rec value element key decoder = | ||
match element with | ||
| `Os -> obj [] key decoder | ||
| `As -> arr [] key decoder | ||
| (`Null | `Bool _ | `String _ | `Float _) as data_value -> | ||
key (object_with_position_information data_value decoder) decoder | ||
| _ -> | ||
raise | ||
(JsonmException | ||
{ | ||
message = "Encountered unexpected token or element"; | ||
position = Jsonm.decoded_range decoder; | ||
}) | ||
and arr values key decoder = | ||
match decode decoder with | ||
| `Ae -> key (`A (List.rev values)) decoder | ||
| element -> value element (fun value -> arr (value :: values) key) decoder | ||
and obj ms key decoder = | ||
match decode decoder with | ||
| `Oe -> key (`O (List.rev ms)) decoder | ||
| `Name name -> value (decode decoder) (fun value -> obj ((name, value) :: ms) key) decoder | ||
| _ -> | ||
raise | ||
(JsonmException | ||
{ | ||
message = "Encountered unexpected token or object sequence"; | ||
position = Jsonm.decoded_range decoder; | ||
}) | ||
in | ||
let decoder = Jsonm.decoder (`String parasable_string) in | ||
let jsonm_results = value (decode decoder) (fun value _ -> value) decoder in | ||
(* Convert Jsonm data model to that of YoJSON to make use of the later's utility functions. *) | ||
to_yojson_data_model jsonm_results | ||
in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, we keep using the Yojson
representation but add some positions in Assoc
objects?
I understand the logic of doing it this way since that would lead to the least amount of changes but this is pretty ugly and hard to maintain.
Given that the json representation is not too large, I think I would prefer if we declared our own AST with positions, and parse into it.
That could look like (inspired from pyre's old AST):
module Node = struct
type 'a t = {
location: Location.t; (* something to represent locations *)
value: 'a;
}
}
module Json = struct
type expression =
| Null
| Bool of bool
| String of string
| Float of float
| List of t list
| Assoc of (string * t) list
and type t = expression Node.t
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will experiment with this to see if it would work with YoJSON's utility functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will have to reimplement those, but it shouldn't be too hard. I would basically dump Yojson entirely and implement a proper library with an AST with locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's gonna be fun :)
taint_config_position = ""; | ||
taint_config_path = PyrePath.current_working_directory (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why do we need to store a position into
AnnotationParser.t
? - Could we use a proper type (i.e something like
Location.t
) to represent locations? It's also odd that the position is a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
If we don't store location in Annotation Paraer, we can't validate between sinks in different taint.config files. For example, consider a sink named A defined in two different taint.config files. Currently we parse the taint.config files one by one and then use a seperate validate function to validate things. If we don't store position, I don't know how we could cross compare.. same applies for rules (which currently if duplicate in multiple taint.config files is not registered as an error, this changed), sinks etc.
-
We could define a custom Location type for this :)
Pre-submission checklist
pip install -r requirements-dev.txt && pre-commit install
pre-commit run
Summary
Add position information to taint.config validation errors - RuleCodeDuplicate, SourceDuplicate, SinkDuplicate.
Add an additional option to the analyze command called --verify-taint-configuration which just verifies the taint configuration and nothing more.
What makes this commit long and sturdy is mostly due to YoJSON, the current json file parser used to parse taint.config files, lack support for line numbers and column numbers of matched elements. This leaves us with two options:
Use another json parser which has support for getting line numbers
Get positioning information via hacky string operations
Has performance implications and for 1, the closest parser that does this is Jsonm, which has a complete lack of utility functions that yojson provide.
I opted for 1 and then converted the results to conform to the yojson data format, so that parsing is done only once and we get to use yojson's utility functions along with positioning information!
Why is this important? For our visual code extension, positioning information is vital.
Test Plan
pyre -n analyze --verify-taint-configuration
on thedocumentation/pysa_tutorial/exercise1/
withtaint.config
as follows:pyre -n analyze --verify-taint-configuration
on thedocumentation/pysa_tutorial/exercise1/
withtaint.config
changes reverted.pyre analyze
on thedocumentation/pysa_tutorial/exercise1/
to make sure everything is working as before.Fixes part of MLH-Fellowship#82