Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add position information to taint.config errors #732

Closed

Conversation

abishekvashok
Copy link
Contributor

@abishekvashok abishekvashok commented Apr 21, 2023

Pre-submission checklist

  • I've ran the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

Add position information to taint.config validation errors - RuleCodeDuplicate, SourceDuplicate, SinkDuplicate.

Add an additional option to the analyze command called --verify-taint-configuration which just verifies the taint configuration and nothing more.

What makes this commit long and sturdy is mostly due to YoJSON, the current json file parser used to parse taint.config files, lack support for line numbers and column numbers of matched elements. This leaves us with two options:

  1. Use another json parser which has support for getting line numbers

  2. Get positioning information via hacky string operations

  3. Has performance implications and for 1, the closest parser that does this is Jsonm, which has a complete lack of utility functions that yojson provide.

I opted for 1 and then converted the results to conform to the yojson data format, so that parsing is done only once and we get to use yojson's utility functions along with positioning information!

Why is this important? For our visual code extension, positioning information is vital.

Test Plan

  • Run pyre -n analyze --verify-taint-configuration on the documentation/pysa_tutorial/exercise1/ with taint.config as follows:
{
  "sources": [
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    },
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    }
  ],

  "sinks": [
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    },
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    }

  ],

  "features": [],

  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    },
    {
      "name": "Possible RCEss:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    },
    {
      "name": "Possible RCEss:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}

Screenshot 2023-04-21 at 7 03 03 PM

  • Run pyre -n analyze --verify-taint-configuration on the documentation/pysa_tutorial/exercise1/ with taint.config changes reverted.

Screenshot 2023-04-21 at 7 03 50 PM

  • Run pyre analyze on the documentation/pysa_tutorial/exercise1/ to make sure everything is working as before.

Screenshot 2023-04-21 at 7 08 03 PM

Fixes part of MLH-Fellowship#82

@abishekvashok abishekvashok changed the title WIP: Add position information to taint.config errors Add position information to taint.config errors Apr 24, 2023
Add position information to taint.config validation errors -
RuleCodeDuplicate, SourceDuplicate, SinkDuplicate.

Add an additional option to the analyze command called
--verify-taint-configuration which just verifies the taint configuration
and nothing more.

What makes this commit long and sturdy is mostly due to YoJSON, the
current json file parser used to parse taint.config files, lack support
for line numbers and column numbers of matched elements. This leaves us
with two options:
1. Use another json parser which has support for getting line numbers
2. Get positioning information via hacky string operations

2. Has performance implications and for 1, the closest parser that does
this is Jsonm, which has a complete lack of utility functions that
yojson provide.

I opted for 1 and then converted the results to conform to the yojson
data format, so that parsing is done only once and we get to use
yojson's utility functions along with positioning information!

Why is this important? For the visual code extension, positioning
information is vital.

Signed-off-by: Abishek V Ashok <[email protected]>
@abishekvashok
Copy link
Contributor Author

NB: The pysa GitHub action/workflow was failing previously before this PR as well :)

Comment on lines +411 to +416
@click.option(
"--verify-taint-configuration",
is_flag=True,
default=False,
help="Verify taint.config files. Skips analysis.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be moved in a different PR? It seems unrelated to adding positions and makes it harder to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I will try splitting :)

Comment on lines +1830 to +1902
let rec to_yojson_data_model = function
| `Null -> `Null
| `Bool b -> `Bool b
| `String s -> `String s
| `Float f -> `Int (int_of_float f)
| `A a -> `List (List.map ~f:to_yojson_data_model a)
| `O o ->
let obj = List.map ~f:(fun (key, value) -> key, to_yojson_data_model value) o in
`Assoc obj
in
let object_with_position_information value decoder =
let (line, column), _ = Jsonm.decoded_range decoder in
`O
[
InternalJsonKeys.line_number, `String (string_of_int line);
InternalJsonKeys.column_number, `String (string_of_int column);
InternalJsonKeys.value, value;
]
in
let parse_with_jsonm parasable_string =
let decode decoder =
match Jsonm.decode decoder with
| `Lexeme lexeme -> lexeme
| `Error error ->
raise
(JsonmException
{
message = Format.asprintf "%a" Jsonm.pp_error error;
position = Jsonm.decoded_range decoder;
})
| `End
| `Await ->
raise
(JsonmException
{
message = "Encountered end of file or input";
position = Jsonm.decoded_range decoder;
})
in
let rec value element key decoder =
match element with
| `Os -> obj [] key decoder
| `As -> arr [] key decoder
| (`Null | `Bool _ | `String _ | `Float _) as data_value ->
key (object_with_position_information data_value decoder) decoder
| _ ->
raise
(JsonmException
{
message = "Encountered unexpected token or element";
position = Jsonm.decoded_range decoder;
})
and arr values key decoder =
match decode decoder with
| `Ae -> key (`A (List.rev values)) decoder
| element -> value element (fun value -> arr (value :: values) key) decoder
and obj ms key decoder =
match decode decoder with
| `Oe -> key (`O (List.rev ms)) decoder
| `Name name -> value (decode decoder) (fun value -> obj ((name, value) :: ms) key) decoder
| _ ->
raise
(JsonmException
{
message = "Encountered unexpected token or object sequence";
position = Jsonm.decoded_range decoder;
})
in
let decoder = Jsonm.decoder (`String parasable_string) in
let jsonm_results = value (decode decoder) (fun value _ -> value) decoder in
(* Convert Jsonm data model to that of YoJSON to make use of the later's utility functions. *)
to_yojson_data_model jsonm_results
in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we keep using the Yojson representation but add some positions in Assoc objects?
I understand the logic of doing it this way since that would lead to the least amount of changes but this is pretty ugly and hard to maintain.

Given that the json representation is not too large, I think I would prefer if we declared our own AST with positions, and parse into it.

That could look like (inspired from pyre's old AST):

module Node = struct
  type 'a t = { 
    location: Location.t; (* something to represent locations *)
    value: 'a;
  }
}


module Json = struct
  type expression =
   | Null
   | Bool of bool
   | String of string
   | Float of float
   | List of t list
   | Assoc of (string * t) list

  and type t = expression Node.t
end

Copy link
Contributor Author

@abishekvashok abishekvashok Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will experiment with this to see if it would work with YoJSON's utility functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will have to reimplement those, but it shouldn't be too hard. I would basically dump Yojson entirely and implement a proper library with an AST with locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's gonna be fun :)

Comment on lines +496 to +497
taint_config_position = "";
taint_config_path = PyrePath.current_working_directory ();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why do we need to store a position into AnnotationParser.t?
  2. Could we use a proper type (i.e something like Location.t) to represent locations? It's also odd that the position is a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If we don't store location in Annotation Paraer, we can't validate between sinks in different taint.config files. For example, consider a sink named A defined in two different taint.config files. Currently we parse the taint.config files one by one and then use a seperate validate function to validate things. If we don't store position, I don't know how we could cross compare.. same applies for rules (which currently if duplicate in multiple taint.config files is not registered as an error, this changed), sinks etc.

  2. We could define a custom Location type for this :)

@abishekvashok
Copy link
Contributor Author

@arthaud closing in favour of #734 and others

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants