The flags chapter (located in the spark-analysis section and in raw-notebooks/flags) currently has only one use case for flags (age differences). I'd like to see some additional use cases. For example, if somebody wanted to analyse whether incidents are more common in summer than in winter, they could create "occurs_in_summer" and "occurs_in_winter" flags.
Or we could look at some of the cost variables, for example a flag for whether an incident costs more than £X.
These are relatively simple flags, but I think they'd cover many of the use cases people would be looking at flags for.
Just to give a realistic example of a use case for flags: back when I was working on the COVID Infection Survey, we were doing a logistic regression to see whether people were more likely to be hospitalised for COVID, respiratory illnesses or cardiovascular illnesses in winter than in summer. So we had to create a bunch of flags such as covid_in_summer_2021, covid_in_summer_2022, covid_in_winter_2021 and covid_in_winter_2022 (repeated for respiratory and cardiovascular). At the time my PySpark knowledge was horrendous, so I did something incredibly inefficient (I think I used a for loop that ran a bunch of groupBys and then joins). That's why I think we could really help people who are in the same situation I was in back then.