Add SmartNet data #21

Conversation
@javier-jimenez-shaw-pix4d GGRS87 does not appear to have a Geodetic 3D CRS. The best I could find was …
@bscholer I have to take a deeper look at it, but I am finding strange things. The webpage https://www.smartnetna.com/resources_configuration.cfm says "SmartNet North America", so I do not see why there are many CRSs from Europe (including, but not only, GGRS87, the old system in Greece). That does not make any sense to me. For instance, "SmartNet - DE.SMARTNETNA.COM - Port 9101" uses the url http://de.smartnetna.com:9101 and filters by mountpoints "jdmrtk3" and "jdmrtk4", the first with CRS "NAD83(2011)" and the next with "ETRS89/DREF91/2016", which is the CRS for Germany, Europe. Even if that were correct (which I seriously doubt), the second condition will never be reached. How can I get to that information in the webpage?
Regarding "SmartNet North America", it seems like they added global coverage but just haven't updated the logo or URL from that page. I doubt that data would be there if it truly didn't exist though. Good catch on I'm not sure where you'd get more info, besides the link above. I'm also not sure how to navigate to the page we're scraping, @hernando provided it in a comment on the original PR, so I just ran with it. |
First pass on the scraping script. I have only reviewed the data collection so far. I haven't yet spotted why many entries have stream definitions for both ETRS89 and NAD83 (or others), which doesn't make sense.
```
# They are not strictly HTTP requests, so we need an alternative. -> https://github.com/pycurl/pycurl
pycurl

# Libraries for scrape_smartnet.py
```
To avoid slowing down CI in the general case, I would put these packages in a separate requirements file.
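For illustration, the split might look like this; the file name and the exact package list are assumptions, inferred from the imports visible in this diff:

```
# requirements-scrape.txt (hypothetical file name)
# Dependencies needed only to run scrape_smartnet.py, kept out of the
# main requirements file so regular CI jobs don't install them:
#     pip install -r requirements-scrape.txt
pycurl
beautifulsoup4
tqdm
pandas
```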
"Content-Type": "application/x-www-form-urlencoded", | ||
"Origin": base_url, | ||
"Referer": base_url + "/resources_configuration.cfm", | ||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36", |
Should we use the name of the script, or something like that, instead of impersonating a browser? I would even consider omitting "User-Agent", even if sending one is recommended, as long as the server accepts requests without it.
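A minimal sketch of the first option; the exact string is an assumption, not something from the PR:

```python
# Identify the scraper honestly instead of impersonating a browser.
SCRIPT_USER_AGENT = "scrape_smartnet.py (SmartNet endpoint scraper)"
```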
```python
{
    "Accept": "*/*",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": "https://www.smartnetna.com",
    "Referer": "https://www.smartnetna.com/resources_configuration.cfm",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
    "DNT": "1",
}
```
I would move this dict to a constant declared at the top and re-use it in fetch_manufacturer_devices.
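A sketch of that refactor, assuming the header fields shown above; the function bodies are elided:

```python
BASE_URL = "https://www.smartnetna.com"

# Declared once at module level and shared by every request the script makes.
REQUEST_HEADERS = {
    "Accept": "*/*",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": BASE_URL,
    "Referer": BASE_URL + "/resources_configuration.cfm",
}


def fetch_manufacturer_devices(manufacturer_id):
    ...  # POST with headers=REQUEST_HEADERS


def fetch_connection_info(manufacturer_id, rover_id, region):
    ...  # re-uses the same REQUEST_HEADERS, no duplicated dict
```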
```python
    return df


def fetch_connection_info(m_id, r_id, region):
```
The parameter names are too cryptic.
```diff
- def fetch_connection_info(m_id, r_id, region):
+ def fetch_connection_info(manufacturer_id, rover_id, region):
```
```python
    if resp is None:
        return []

    soup = BeautifulSoup(resp.text, "html.parser")
```
```diff
- soup = BeautifulSoup(resp.text, "html.parser")
+ html = BeautifulSoup(resp.text, "html.parser")
```
```python
all_records = []
columns = None

for manufacturer_id in tqdm(range(1, 26), desc="Fetching manufacturer data"):
```
I don't like the idea of having the magic constant 26. What about:
```diff
- for manufacturer_id in tqdm(range(1, 26), desc="Fetching manufacturer data"):
+ # Collect the manufacturer ids from the combo box.
+ html = BeautifulSoup(r.text, "html.parser")
+ # The first element is skipped because it corresponds
+ # to the placeholder text "select".
+ manufacturer_ids = [x['value'] for x in html.find("select", id="ManufacturerID").find_all("option")[1:]]
+ for manufacturer_id in tqdm(manufacturer_ids, desc="Fetching manufacturer data"):
```
```python
    for _, row in df_devices.iterrows()
    for region in range(
        1, 120
    )  # This is just the min/max of the region IDs; some may be missing, but this script will handle that.
```
I would collect the region IDs using code similar to the suggestion above for the manufacturer IDs.
However, I see that there are region IDs outside the US that the web page doesn't list, but for which the script can collect data. Can we find the list of valid IDs somewhere?
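Following the manufacturer-id suggestion above, a sketch of the same idea for regions; note that the "RegionID" element id is a guess I haven't verified against the page:

```python
from bs4 import BeautifulSoup


def collect_region_ids(page_html):
    html = BeautifulSoup(page_html, "html.parser")
    # "RegionID" is an assumed element id, mirroring "ManufacturerID" above.
    options = html.find("select", id="RegionID").find_all("option")
    # Skip the placeholder "select" entry, as with the manufacturer ids.
    return [option["value"] for option in options[1:]]
```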
```python
    executor.submit(fetch_connection_info, m, r, reg): (m, r, reg)
    for (m, r, reg) in tasks
```
```diff
- executor.submit(fetch_connection_info, m, r, reg): (m, r, reg)
- for (m, r, reg) in tasks
+ executor.submit(fetch_connection_info, brand, model, region): (brand, model, region)
+ for (brand, model, region) in tasks
```
```python
    if recs:
        results.extend(recs)
except Exception as e:
    m, r, reg = futures[future]
```
Although I can understand the code, I think it's not very readable to use a dict to recover the parameters used to create each future. Couldn't you create a list of tuples (brand, model, region), then use a list comprehension to create a list of futures, and use a zip iterator in the for at line 316? A sketch follows below.
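A minimal sketch of that shape, re-using the names from the suggestion above (tasks, results, and fetch_connection_info are assumed to come from the surrounding script). One trade-off: iterating zip(tasks, futures) processes results in submission order, giving up the as-they-finish behavior of as_completed:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(fetch_connection_info, brand, model, region)
        for (brand, model, region) in tasks
    ]
    # zip pairs each future with the parameters that created it,
    # so no lookup dict is needed.
    for (brand, model, region), future in zip(tasks, futures):
        try:
            recs = future.result()
            if recs:
                results.extend(recs)
        except Exception as e:
            print(f"Failed for {brand}/{model}/{region}: {e}")
```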
Hi @bscholer, I have been looking at the output, not the script yet, but I see many strange things. I think we should understand them before going forward. As an intro, I would like to ask the provider (Hexagon); hopefully they will tell us useful information. Do you know anybody there to contact? After looking at the json, these are some of my findings:
... now I see that DE can be Germany or Delaware (USA), and NL can be the Netherlands or Newfoundland (Canada). There is something fishy with those country/state identifiers. It could also be problematic for the German states (Bundesländer), like SL for Saarland (Germany) or Slovenia, or Slovakia, or Sierra Leone.
@javier-jimenez-shaw-pix4d All great points, and thank you for the detailed review. As you've pointed out, there are a plethora of issues with the data scraped from this site, and it's anybody's guess whether it's actively maintained. I think the best approach is likely to reach out to them and ask about this, but I don't have a contact there. I'll go ahead and reach out to their customer support about it, though. Not sure about Pix4D, but at DroneDeploy we only see a couple of customers using this network to begin with, and I've just manually added the ones we do see to our fork. Most customers use our Instant RTK via our flight app, which automatically connects to PointOne/Geodnet and handles all this complexity under the hood. If you agree that this is more trouble than it's worth, feel free to close out this PR; I don't think I'll have the time to dive very deeply into it.
Work Done
- Added `scrape_smartnet.py`, which scrapes the SmartNet webpage for all URLs and mountpoints available for all the different devices and regions that they support.
- Saved the scraped data to `data/World/smartnet.json`.
- Adjusted `black` and `flake8` so that they don't contradict each other and make committing impossible.
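For reference, the usual way to reconcile the two tools looks something like the snippet below; whether the PR used these exact settings is an assumption:

```ini
# .flake8 (or a [flake8] section in setup.cfg)
[flake8]
max-line-length = 88        # match black's default line length
extend-ignore = E203, W503  # the rules that conflict with black's output
```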