Datavalidation is a system that scans scouting data as it is collected and flags possible data entry errors, then alerts the scouting admin to them. It is written in Python and runs after the scouting data for each match is collected.
To configure datavalidation you only need to edit a few fields in the `config.yaml` file, which can be found in the `/backend/data_validation` directory.
Example:
year: 2022
event_code: "iri"
`year`
- specifies the year the competition is taking place (e.g. 2023)
`event_code`
- a string, written in all lowercase letters, which corresponds to the given event and can be found on TBA (e.g. 'iri', 'cmptx')
Additionally, a JSON file named `{year}{event_code}_match_data.json` must exist in the `backend/data` directory, where `{year}` and `{event_code}` are replaced by the values specified in the `config.yaml` file. For the example values above, the file would be `2022iri_match_data.json`.
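As a rough illustration of the naming rule above, the expected file path can be derived from the two config values. This is only a sketch: the function name `match_data_path` and its `data_dir` parameter are hypothetical and are not part of datavalidation.

```python
from pathlib import Path


def match_data_path(year: int, event_code: str, data_dir: str = "backend/data") -> Path:
    """Build the expected match data path (hypothetical helper)."""
    # Event codes are written in all lowercase letters, so normalize defensively.
    return Path(data_dir) / f"{year}{event_code.lower()}_match_data.json"


print(match_data_path(2022, "iri").name)  # 2022iri_match_data.json
```

A helper like this could be used to confirm the file exists before the validation run starts.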
`run_with_tba`
- determines whether the match schedule is retrieved from TBA or from a file, and whether checks based on TBA data will run
- default: `true`
A couple of the checks in the datavalidation software rely on having access to an accurate match schedule. These checks are important because they ensure that the data collected corresponds to the correct team number.
By default the software will attempt to retrieve the match schedule using TBA's API. However, at smaller competitions it's possible that TBA will not have a copy of the match schedule ready in time for the competition. In that case the scouting admin can edit a copy of the match schedule by hand.
To do this, the `run_with_tba` configuration field must be set to `false` in the `config.yaml` file. The `../data/match_schedule.json` file, which currently contains the match schedule from `2022iri`, can then be edited.
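The choice between the two schedule sources can be sketched roughly as follows. This is not the datavalidation implementation: `load_match_schedule` is a hypothetical helper, and `fetch_from_tba` stands in for whatever code actually requests the schedule from TBA's API. The structure of the JSON inside `match_schedule.json` is not assumed here; the file is simply parsed and returned.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_match_schedule(
    run_with_tba: bool,
    fetch_from_tba: Callable[[], Any],
    schedule_path: str = "../data/match_schedule.json",
) -> Any:
    """Return the match schedule from TBA or from the hand-edited file.

    Hypothetical helper: fetch_from_tba is a placeholder for the real TBA request.
    """
    if run_with_tba:
        return fetch_from_tba()
    # run_with_tba is false: fall back to the locally edited schedule file.
    with Path(schedule_path).open(encoding="utf-8") as schedule_file:
        return json.load(schedule_file)
```

Passing the TBA request in as a callable keeps the sketch free of network code and makes the fallback easy to try out locally.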
Datavalidation uses inheritance to provide teams with a base level of functionality while allowing them to add their own data checks.
BaseDataValidation Class
- At the heart of the DataValidation software is a class called `BaseDataValidation`. This class contains basic checks which are essential to validating data from any scouting app (e.g. match schedule checks, defense checks).
- It also includes two important abstract methods, `validate_data` and `validate_submission`. These two methods must be implemented in any child class and are where other checks are called from.
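To illustrate what "abstract" means in practice, here is a minimal sketch built on Python's standard `abc` module. It assumes the real base class behaves similarly; whether `BaseDataValidation` actually uses `abc` is not confirmed here, and `IncompleteValidation` is a hypothetical child class invented for the example.

```python
from abc import ABC, abstractmethod


class BaseDataValidation(ABC):
    """Simplified stand-in for the real base class."""

    @abstractmethod
    def validate_data(self, scouting_data) -> None:
        """Run checks that need data from multiple matches."""

    @abstractmethod
    def validate_submission(self, submission) -> None:
        """Run checks on a single scout's submission."""


class IncompleteValidation(BaseDataValidation):
    """Hypothetical child class that forgets validate_submission."""

    def validate_data(self, scouting_data) -> None:
        pass


try:
    IncompleteValidation()
except TypeError as error:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    print(f"Cannot instantiate: {error}")
```

With this pattern a missing method is reported as soon as the child class is instantiated, instead of partway through a validation run.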
DataValidation2022 Child Class
- Inherits `BaseDataValidation`

      class DataValidation2022(BaseDataValidation):
          def __init__(self, path_to_config: str = "config yaml"):
              super().__init__(path_to_config)

- Contains checks on data for a specific year's game
- `validate_data` method
  - takes in all scouting data
  - runs checks which require data from multiple matches (e.g. statistical outliers)
  - runs `validate_submission` on each data submission
- `validate_submission` method
  - takes in a single submission as a parameter; a submission refers to the set of data collected by one scout during one match
  - calls all methods which check data from one submission
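The division of work between the two methods can be sketched as below. This is a simplified stand-in, not the real `DataValidation2022`: it uses plain dictionaries instead of the pandas objects the real code works with, collects errors in a list, and the `check_for_outliers` method with its two-standard-deviation threshold is a hypothetical example of a multi-match check.

```python
from statistics import mean, pstdev


class DataValidationSketch:
    """Hypothetical, simplified stand-in for a year-specific child class."""

    def __init__(self) -> None:
        self.errors: list[str] = []

    def validate_data(self, scouting_data: list[dict]) -> None:
        # Checks that need data from multiple matches run once over everything.
        self.check_for_outliers(scouting_data, field="auto_upper_hub")
        # Per-submission checks run once for each scout's submission.
        for submission in scouting_data:
            self.validate_submission(submission)

    def validate_submission(self, submission: dict) -> None:
        # Single-submission check: a count can never be negative.
        if submission["auto_upper_hub"] < 0:
            self.errors.append(f"In {submission['match_key']}, negative auto_upper_hub")

    def check_for_outliers(self, scouting_data: list[dict], field: str) -> None:
        values = [submission[field] for submission in scouting_data]
        if len(values) < 2:
            return  # not enough data to talk about outliers
        spread = pstdev(values)
        if spread == 0:
            return  # all values identical, nothing can be an outlier
        average = mean(values)
        for submission in scouting_data:
            # Flag values more than two standard deviations from the mean.
            if abs(submission[field] - average) > 2 * spread:
                self.errors.append(f"In {submission['match_key']}, {field} is an outlier")
```

Keeping the multi-match checks in `validate_data` means each of them sees the full data set, while `validate_submission` only ever needs one row at a time.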
- Each check is a method of the `DataValidation2022` class and should take in the data fields it uses as parameters
  - it is recommended to include `match_key` and `team_number` as parameters of any check, since they can be used as identifiers in the error message
- Example data check function signature:

      def check_for_auto_great_than_6(
          self,
          match_key: str,
          team_number: int,
          auto_lower_hub: int,
          auto_upper_hub: int,
          auto_misses: int,
      ) -> None:

- Teams may then implement whatever logic they like in the function body
- To flag an error you must call the `add_error` method, which takes two arguments: the `error_message` and the `error_type`
  - The `error_message` is simply a string which describes the error
  - The `error_type` takes a value from a predefined enum which is used to categorize the error
    - Here are the possible `error_type` values:

          class ErrorType(Enum):
              DEBUG = 0
              INFO = 1
              WARNING = 2
              INCORRECT_DATA = 3
              EXTRA_DATA = 4
              MISSING_DATA = 5
              CRITICAL = 6
              RESCOUT_MATCH = 7

  - Example call to `add_error`:

        if balls_shot_in_auto > 6:
            self.add_error(
                f"In {match_key}, {team_number} UNLIKELY AUTO SHOT COUNT",
                error_type=ErrorType.WARNING,
            )
- Lastly, to run the check, the function must be called in either `validate_submission` or `validate_data` and be passed the required data arguments
  - Example data check function call:

        def validate_submission(self, submission: Series) -> None:
            self.check_for_auto_great_than_6(
                match_key=submission["match_key"],
                team_number=submission["team_number"],
                auto_lower_hub=submission["auto_lower_hub"],
                auto_upper_hub=submission["auto_upper_hub"],
                auto_misses=submission["auto_misses"],
            )
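Putting the pieces above together, a complete check might look roughly like this. It is a self-contained sketch under several assumptions: `CheckSketch` is a hypothetical stand-in for the real class, its `add_error` simply appends to a list (the real method may do more), the submission is a plain dictionary instead of a pandas `Series`, and `balls_shot_in_auto` is assumed to be the sum of the three auto fields.

```python
from enum import Enum


class ErrorType(Enum):
    WARNING = 2  # only the member this example needs


class CheckSketch:
    """Hypothetical stand-in; the real add_error lives in datavalidation."""

    def __init__(self) -> None:
        self.errors: list[tuple[str, ErrorType]] = []

    def add_error(self, error_message: str, error_type: ErrorType) -> None:
        # Assumption: errors are simply collected in a list.
        self.errors.append((error_message, error_type))

    def check_for_auto_great_than_6(
        self,
        match_key: str,
        team_number: int,
        auto_lower_hub: int,
        auto_upper_hub: int,
        auto_misses: int,
    ) -> None:
        # Assumption: every ball shot in auto is either scored or missed.
        balls_shot_in_auto = auto_lower_hub + auto_upper_hub + auto_misses
        if balls_shot_in_auto > 6:
            self.add_error(
                f"In {match_key}, {team_number} UNLIKELY AUTO SHOT COUNT",
                error_type=ErrorType.WARNING,
            )

    def validate_submission(self, submission: dict) -> None:
        self.check_for_auto_great_than_6(
            match_key=submission["match_key"],
            team_number=submission["team_number"],
            auto_lower_hub=submission["auto_lower_hub"],
            auto_upper_hub=submission["auto_upper_hub"],
            auto_misses=submission["auto_misses"],
        )
```

Because the check receives its fields as explicit parameters, it can be exercised directly with hand-written values without building a full submission first.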