Challenge 31 - Flood forecasting: the power of citizen science #9
Comments
Hi, I'm Sagarika, a BTech junior majoring in computer science and engineering. I am skilled in Python, machine learning, and deep learning, and I have worked on image classification problems before. I would love to contribute to this project. |
Hi Sagarika, I will check if I can provide any other information/tool. Best, Marie-Amélie |
Hi Marie-Amélie, |
Hi! Marie-Amélie |
Hi Marie-Amélie, |
Hi! Marie-Amélie |
Hi, |
Hi! The codes and project descriptions from the previous years are all available on GitHub, here: https://github.com/esowc/ Marie-Amélie |
Thank you so much for your assistance! |
Dear mentors @colonesej @QueenMABhydro, I have a few questions.
From these sentences, it looks like we already have a river representation, such as a raster indicating where there is a river and where there is not. Can you share a little more in this regard? |
Hi, It is a tab-separated file (there are commas in the descriptions). Regarding the matching: GloFAS has a relatively coarse representation of rivers. Only large rivers are represented. In contrast, most CrowdWater stations (but not all) are on small streams. So first, some CrowdWater points will not exist in GloFAS (and in EFAS). Also, forecasts (GloFAS or EFAS) are issued at specific points (with lat/lon coordinates), but each represents a whole drainage area. The upstream area associated with each point is in the UpArea.nc file. For CrowdWater data, you will have the lat/lon coordinates of the point, but no upstream area. The CrowdWater lat/lon coordinates might be more or less accurate. It is usually not sufficient to do the matching only by comparing lat/lon points. You also have to make sure the drainage areas are the same. So... it might not be straightforward to do the spatial matching. I hope this helps! Marie-Amélie |
Thanks @QueenMABhydro! This is really helpful. To follow up on this, would it be right to say the idea here is to match the variation of the Coming to the next part. I would like to summarise the problem here. Suppose there are two forecast points for GloFAS (or EFAS), O1 and O2, belonging to two different catchments D1 and D2. We have a measurement at a point P (as shown in the image), such that the distance PO2 (d2) is smaller than the distance PO1 (d1). P might still belong to catchment D1, and hence nearest-point matching would not work. The first part of the challenge would be to find the catchment (drainage area) to which point P belongs. We also have the raster in the UpArea.nc file, so the challenge is reduced to finding a point inside a polygon. This might be computationally expensive (depending on the number of catchments in UpArea.nc), but it can be determined exactly in most cases. The next challenge is to relate the CrowdWater This is my understanding assuming that GloFAS (or EFAS) is a perfect model, which we know is not the case.
Based on the description of this part, we are trying to correct the model prediction. Suppose that the GloFAS (or EFAS) prediction is not accurate (in space); then we correct the drainage area (catchment delineation), find the right location for the GloFAS or EFAS prediction, and solve the two problems mentioned above. Is this the correct description of the challenge? Knowing this would really help me propose better solutions. |
I was just going through the dataset and I realise that the value of the area is given by UpArea.nc and not the spatial information about the catchment. This might make things tricky. Is it so? |
Hi, I will try to address all your questions one by one:
Yes, that is right
Also correct.
Yes, but I think it would probably be difficult to fit a ML model for every single point, mainly because there are not that many measurements for each point. It would probably be good to think about some grouping method, potentially by catchment, yes, but there might also be other possibilities.
They are indeed imperfect, but there are ways to potentially take that into account, at least to some extent, given that the forecasts are ensembles.
Correcting the forecasts (post-processing) would be a step further than what is proposed in the challenge, but it could be included in your proposal if you want.
Well, yes, it is definitely so. One additional piece of information that could be helpful comes from real official gauging stations, because the metadata for those stations usually includes the area of the catchment at the station. Then, if you can match the GloFAS area with the area for a station, this could be a good starting point to obtain more spatial info about the catchment. But of course, if a catchment is ungauged (apart from CrowdWater points), you won't have that information. I hope this helps! Marie-Amélie |
@QueenMABhydro Yes, this is exactly what I needed. I have some interesting ideas for matching the point with the catchment. It will be interesting to find ways in which we can group data points. I need to think about that a little more. I have just started on the proposal and will come back if I have more doubts. |
@QueenMABhydro This is about
This is a hard problem, as we don't have data on what a unit difference means in terms of hydrological variables. Can I have some suggestions on what kind of data we have to resolve this problem? One of the suggested approaches is to train an ML model on water level variation against flow values at gauge stations, but then there is no reason for it to generalise unless we provide the dimensions of the stream. I might be missing something; any input on this would be really helpful. |
Hi! It is indeed a difficult problem. It wouldn't be a challenge if it wasn't challenging ;) We can perhaps provide the dimension of the stream. For instance, it could probably be estimated from Lidar data. Marie-Amélie Boucher |
Challenge 31 - Flood forecasting: the power of citizen science
Goal
Develop a Python package to facilitate the use of crowdsourced hydrological measurements for forecast validation
Mentors and skills
Challenge description
Why do we need a solution
Floods are among the deadliest natural disasters, killing people and destroying property. Forecasting them is important for reducing these impacts. The key to improving these forecasts is observations, in particular new types of observation such as crowdsourced data, which offer significant opportunities.
Recently, exciting initiatives such as CrowdWater have turned information from people into incredibly rich scientific data. In the case of crowdsourcing, people send geo-referenced pictures of streams or rivers along with the corresponding variations of water level. Thousands of observations have been gathered around the world in this way, covering areas where no other observation is available. The challenge is to convert this precious data into something that can be used in flood forecasting models, so that the information is not lost but used to improve the models and help save lives. This project is about solving two key challenges that prevent CrowdWater information from being used in the CEMS GloFAS flood forecasting system:
Data and software
We plan to use CrowdWater virtual stations located on larger rivers and drainage networks from CEMS (EFAS and GloFAS). We plan to use OpenStreetMap to identify rivers and derive metadata. There is also a possibility of using synthetic data (i.e. data designed to replicate what could be obtained by CrowdWater in the future) in addition to the data series that already exist.
What could be the solution
We are looking for a solution that will 1) transform water level variations to a variable that can be used for verification of GloFAS and EFAS forecasts and 2) map CrowdWater virtual stations to GloFAS and EFAS points. This can be achieved through a variety of methods, for instance by mimicking the human mapping procedure, through the use of image analysis and/or pattern recognition techniques to match the real river to the representation of the model and then map the stations to the correct model pixels, also exploiting additional metadata such as the station name or the river name. Another possibility is to compute stations' upstream drainage area by using a digital elevation model (DEM) and geomatics tools in Python. The mapping of each station should ideally include a quality flag showing a confidence level in the mapping result.
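As a rough illustration of the mapping step with a quality flag, here is a minimal sketch in plain Python. It assumes the station's upstream area has already been estimated (for example from a DEM) and that candidate model cells have been extracted from the upstream-area grid into a list of dictionaries. The function names, the 20 km search radius, and the 10% / 30% thresholds are all hypothetical choices made for illustration, not part of GloFAS, EFAS, or CrowdWater.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_station(station, cells, max_dist_km=20.0):
    """Pick the nearby model cell whose upstream area best matches the station.

    station: dict with 'lat', 'lon', 'area_km2' (area > 0, estimated elsewhere)
    cells:   list of dicts with 'lat', 'lon', 'area_km2'
    Returns (cell, flag) where flag is 'high', 'medium', 'low', or 'no_match'.
    The search radius and flag thresholds are illustrative assumptions.
    """
    # Keep only cells within the search radius of the station.
    candidates = [
        c for c in cells
        if haversine_km(station["lat"], station["lon"], c["lat"], c["lon"]) <= max_dist_km
    ]
    if not candidates:
        return None, "no_match"

    def rel_err(cell):
        # Relative difference between model and station upstream areas.
        return abs(cell["area_km2"] - station["area_km2"]) / station["area_km2"]

    best = min(candidates, key=rel_err)
    err = rel_err(best)
    if err <= 0.10:
        flag = "high"
    elif err <= 0.30:
        flag = "medium"
    else:
        flag = "low"
    return best, flag

# Toy example: O2 is closer to the station, but O1 has the matching area.
station = {"lat": 47.0, "lon": 8.0, "area_km2": 950.0}
cells = [
    {"id": "O1", "lat": 47.10, "lon": 8.05, "area_km2": 1000.0},
    {"id": "O2", "lat": 47.01, "lon": 8.01, "area_km2": 5000.0},
    {"id": "far", "lat": 50.0, "lon": 8.0, "area_km2": 950.0},
]
best, flag = match_station(station, cells)
print(best["id"], flag)
```

In this toy case the nearest cell is rejected in favour of the one with a compatible drainage area, which is the situation where matching on coordinates alone would fail. Metadata such as river names could be added as a further check before assigning the flag.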
Ideas for the implementation
We envisage that implementation might include the following steps:
Using a selection of CrowdWater stations for which there also exists an official river gauge, train a machine learning (ML) model to learn the relationship between water level variations, other explanatory variables, and streamflow. Then, use this ML model to translate water level variations into streamflow for all candidate CrowdWater stations (whether or not an official river gauge is also available).
Extract the river map for the area surrounding the station and the available metadata, such as river names from OpenStreetMap or any other open dataset. Another option is to compute the station’s upstream drainage area using a DEM and geomatics tools.
Map the station using coordinates and metadata (like the name of the river or the name of a nearby location).
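The first step above could be prototyped along the following lines. This is only a sketch: it swaps the ML model for an ordinary least-squares line, pools stations into groups (for example by catchment) because individual stations have few measurements, and uses made-up numbers. The record layout, the group labels, and the idea of a standardised streamflow target are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def fit_by_group(records):
    """Fit one line per group, pooling all stations in that group.

    records: iterable of (group, level_class, flow) tuples, where
    level_class is the crowdsourced water level variation and flow is
    the (assumed standardised) streamflow from a co-located official gauge.
    """
    grouped = {}
    for group, level_class, flow in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(level_class)
        ys.append(flow)
    return {group: fit_line(xs, ys) for group, (xs, ys) in grouped.items()}

def predict(models, group, level_class):
    """Translate a water level class into flow using the group's fitted line."""
    a, b = models[group]
    return a + b * level_class

# Made-up training records for two hypothetical groups "A" and "B".
records = [
    ("A", -1, -0.5), ("A", 0, 0.0), ("A", 1, 0.5), ("A", 2, 1.0),
    ("B", 0, 1.0), ("B", 1, 3.0), ("B", 2, 5.0),
]
models = fit_by_group(records)
print(predict(models, "A", 3))
print(predict(models, "B", 3))
```

A real implementation would likely add other explanatory variables (such as stream dimensions) and a more flexible model; the grouping structure would stay the same.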