
Conversation

bscholer
Contributor

Work Done

  • Adds a script, scrape_smartnet.py, that scrapes the SmartNet webpage for all URLs and mountpoints available for the devices and regions they support.
  • Adds the auto-generated SmartNet data to data/World/smartnet.json.
  • Tweaks the black and flake8 configurations so that they no longer contradict each other and block commits.

@bscholer
Contributor Author

@javier-jimenez-shaw-pix4d GGRS87 does not appear to have a Geodetic 3D CRS. The best I could find was EPSG:4121, which is 2D. Any thoughts on this?

@javier-jimenez-shaw-pix4d
Contributor

@bscholer I have to take a deeper look at it, but I am finding strange things.

The webpage https://www.smartnetna.com/resources_configuration.cfm says "SmartNet North America", so I do not see why there are many CRSs from Europe (including, but not only, GGRS87, the old system in Greece). That does not make any sense to me.

For instance "SmartNet - DE.SMARTNETNA.COM - Port 9101" using url http://de.smartnetna.com:9101 , filters by mountpoints "jdmrtk3" and "jdmrtk4", first with CRS "NAD83(2011)" and next with "ETRS89/DREF91/2016", that is the CRS for Germany, Europe. Even if it were correct (that I seriously doubt), the second condition will never we reached.

How can I get to that information on the webpage?

@bscholer
Contributor Author

Regarding "SmartNet North America", it seems like they added global coverage but just haven't updated the logo or URL from that page. I doubt that data would be there if it truly didn't exist though.

Good catch on de.smartnetna.com:9101 though. I feel pretty comfortable with just removing the conflicting NAD83 (2011) streams from these (non-uniform) entries. Either that, or we could use bbox filters to split them out.

I'm not sure where you'd get more info besides the link above. I'm also not sure how to navigate to the page we're scraping; @hernando provided it in a comment on the original PR, so I just ran with it.

@hernando
Contributor

First pass on the scraping script. I have only reviewed the data collection so far. I haven't yet spotted why many entries have stream definitions for both ETRS89 and NAD83 (or others), which doesn't make sense.

# They are not strictly HTTP requests, so we need an alternative. -> https://github.com/pycurl/pycurl
pycurl

# Libraries for scrape_smartnet.py
Contributor

To not slow down CI in the general case, I would put these packages in a separate requirements file.
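
For illustration, the split could look like this (the file name requirements-scraper.txt and the exact package list are my assumptions, not something already in the PR):

# requirements-scraper.txt — only needed to run scrape_smartnet.py, not in CI.
# They are not strictly HTTP requests, so we need an alternative. -> https://github.com/pycurl/pycurl
pycurl
beautifulsoup4  # HTML parsing (BeautifulSoup)
pandas          # DataFrame handling of the scraped records
tqdm            # progress bars for the fetch loops

CI would keep installing only the base requirements.txt, and anyone running the scraper would additionally do pip install -r requirements-scraper.txt.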

"Content-Type": "application/x-www-form-urlencoded",
"Origin": base_url,
"Referer": base_url + "/resources_configuration.cfm",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
Contributor

Should we use the name of the script or something like that instead of impersonating a browser? I would even consider omitting "User-agent" even if it's recommended and the server accepts it.
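
As a sketch, assuming the server accepts a non-browser agent (the string below is made up):

headers = {
    "Accept": "*/*",
    "Content-Type": "application/x-www-form-urlencoded",
    # Identify the scraper honestly instead of impersonating Chrome.
    "User-Agent": "scrape_smartnet.py",
}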

Comment on lines +207 to +214
{
"Accept": "*/*",
"Content-Type": "application/x-www-form-urlencoded",
"Origin": "https://www.smartnetna.com",
"Referer": "https://www.smartnetna.com/resources_configuration.cfm",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
"DNT": "1",
}
Contributor

I would move this dict to a constant declared at the top and re-use it in fetch_manufacturer_devices.
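
Roughly like this (COMMON_HEADERS is just a name I am proposing):

# Shared request headers, declared once at module level and re-used by
# fetch_manufacturer_devices and fetch_connection_info.
COMMON_HEADERS = {
    "Accept": "*/*",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": "https://www.smartnetna.com",
    "Referer": "https://www.smartnetna.com/resources_configuration.cfm",
    "User-Agent": "scrape_smartnet.py",  # see the User-Agent discussion above
    "DNT": "1",
}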

return df


def fetch_connection_info(m_id, r_id, region):
Contributor

The parameter names are too cryptic.

Suggested change
def fetch_connection_info(m_id, r_id, region):
def fetch_connection_info(manufacturer_id, rover_id, region):

if resp is None:
return []

soup = BeautifulSoup(resp.text, "html.parser")
Contributor

Suggested change
soup = BeautifulSoup(resp.text, "html.parser")
html = BeautifulSoup(resp.text, "html.parser")

all_records = []
columns = None

for manufacturer_id in tqdm(range(1, 26), desc="Fetching manufacturer data"):
Contributor

I don't like the idea of having the magic constant 26. What about:

Suggested change
for manufacturer_id in tqdm(range(1, 26), desc="Fetching manufacturer data"):
# Collect the manufacturer ids from the combo box.
html = BeautifulSoup(r.text, "html.parser")
# The first element is skipped because it corresponds
# to the placeholder text "select"
manufacturer_ids = [x['value'] for x in html.find("select", id="ManufacturerID").find_all("option")[1:]]
for manufacturer_id in tqdm(manufacturer_ids, desc="Fetching manufacturer data"):

for _, row in df_devices.iterrows()
for region in range(
1, 120
)  # This is just the min/max of the region IDs; some may be missing, but this script will handle that.
Contributor

I would collect the region IDs using code similar to the suggestion above for the manufacturer IDs.

However, I see that there are region IDs outside the US that the web page doesn't list, but for which the script can still collect data. Can we find the list of valid IDs somewhere? (See the sketch below for the combo-box part.)
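
Something along these lines, assuming the page uses a select element with id="RegionID" (I haven't verified the id):

html = BeautifulSoup(resp.text, "html.parser")
# Skip the first option, which is the placeholder text "select",
# as with the manufacturer combo box.
region_ids = [opt["value"] for opt in html.find("select", id="RegionID").find_all("option")[1:]]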

Comment on lines +313 to +314
executor.submit(fetch_connection_info, m, r, reg): (m, r, reg)
for (m, r, reg) in tasks
Contributor

Suggested change
executor.submit(fetch_connection_info, m, r, reg): (m, r, reg)
for (m, r, reg) in tasks
executor.submit(fetch_connection_info, brand, model, region): (brand, model, region)
for (brand, model, region) in tasks

if recs:
results.extend(recs)
except Exception as e:
m, r, reg = futures[future]
Contributor

Although I can understand the code, I don't think it's very readable to use a dict to recover the parameters that were used to create each future.
Can't you create a list of tuples (brand, model, region), build the list of futures with a list comprehension, and then use a zip iterator in the for loop at line 316? Something like the sketch below.
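
A sketch of that, assuming fetch_connection_info keeps its current signature:

from concurrent.futures import ThreadPoolExecutor

tasks = [...]  # list of (brand, model, region) tuples, built as today
results = []
with ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(fetch_connection_info, brand, model, region)
        for (brand, model, region) in tasks
    ]
    # zip() pairs each future with the parameters it was submitted with,
    # so no lookup dict is needed.
    for (brand, model, region), future in zip(tasks, futures):
        try:
            recs = future.result()
            if recs:
                results.extend(recs)
        except Exception as e:
            print(f"Failed for {brand}/{model}/{region}: {e}")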

@javier-jimenez-shaw-pix4d
Contributor

Hi @bscholer
Sorry for the delay.

I have been looking at the output, not at the script yet, but I see many strange things. I think we should understand them before going forward.

As an intro, I would like to ask the provider (Hexagon). Hopefully they will tell us useful information. Do you know anybody there to contact?

After looking at the JSON, these are some of my findings:

  • There are many URLs that do not work. That is very suspicious.
  • Connections to port 9950 are on epoch 2024.75 for ITRF2024. I would like to know if they are going to update the epoch periodically; that date is "almost yesterday"... Now I see that port 9401 is for epoch 2022.25. Interesting.
  • "SmartNet - ETRS89/DREF91/[email protected]" has a domain for "uk". They seem to be the different states in Germany, but I am not aware of any UK Bundesland.
  • "SmartNet - [email protected]": is the epoch used in Denmark really 1989?
  • "SmartNet - [email protected]": in Spain the IGN uses ETRF2000, so I am surprised that Hexagon uses epoch 2005 there, but it could be.
  • "SmartNet - [email protected]": that is for Greece, but GGRS87 is an "old style" datum. I am very surprised that they use it for NTRIP, especially without any ETRS89 alternative in Greece. (Some countries had these old systems, like DHDN in Germany, but they were removed from the NTRIP services at some point, in 2024 for SAPOS; the user can convert afterwards, of course.)
  • Something similar for Italy.
  • "SmartNet - DE.SMARTNETNA.COM - Port 9101" and other ports: it has NAD83(2011) first, and then ETRS89/DREF91/2016. It will never work correctly.
  • "SmartNet - NL.SMARTNETNA.COM - Port 9101": same as Germany.

... now I see that DE can be Germany or Delaware (USA), and NL can be the Netherlands or Newfoundland (Canada). There is something fishy with those country/state identifiers. It could also be problematic for the German states (Bundesländer), like SL for Saarland (Germany), or Slovenia, or Slovakia, or Sierra Leone.

@bscholer
Contributor Author

bscholer commented Aug 5, 2025

@javier-jimenez-shaw-pix4d all great points, and thank you for the detailed review. As you've pointed out, there are a plethora of issues with the data scraped from this site, and it's anybody's guess whether it's actively maintained.

I think the best approach is to reach out to them and ask, but I don't have a contact there. I'll go ahead and contact their customer support, though.

Not sure about Pix4D, but at DroneDeploy we only see a couple of customers using this network to begin with, and I've just manually added the ones we do see to our fork. Most customers use our Instant RTK via our flight app, which automatically connects to PointOne/Geodnet and handles all this complexity under the hood.

If you agree with this being more trouble than it's worth, feel free to close out this PR. I don't think I'll have the time to dive very deeply into it.
