[Bug] ScrapeService fails to fetch reports starting with SPECI, TAF AMD, or TAF COR #74

@yjmp14

Description

Hi,

I have noticed an issue where the ScrapeService (specifically in classes like Nam and Olbs) fails to retrieve reports when the source returns a variant header, such as SPECI, TAF AMD, or TAF COR.

The Problem

The current implementation of _extract in several StationScrape subclasses relies on strict string matching based on self.report_type.

For example, when fetching a METAR, the code constructs a search tag using self.report_type.upper() (e.g., searching for >METAR <). However, if the station has issued a SPECI, the HTML response typically contains >SPECI < instead of >METAR <.

Because the code only looks for METAR, the extraction fails, and the service returns an error (or claims the station doesn't exist), even though valid data is present.

The same logic applies to TAFs. If a TAF is amended (TAF AMD) or corrected (TAF COR), the scraper still looks for >TAF <. That tag does not match the longer headers (>TAF AMD <, >TAF COR <) used by some providers (like NorthAviMet), because >TAF < is not a substring of either.

Affected Classes

  • avwx.service.scrape.Nam (NorthAviMet)

  • avwx.service.scrape.Olbs (India)

  • Potentially others relying on _simple_extract with strict headers.

Root Cause Analysis

In avwx/service/scrape.py, lines like this cause the failure:

# In class Nam
starts = [f">{self.report_type.upper()} <", f">{station.upper()}<", "top'>"]

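When self.report_type is "metar", the first marker is always >METAR <, so a page headed >SPECI < never matches. Likewise, "taf" produces >TAF <, which matches neither >TAF AMD < nor >TAF COR <.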
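To make the failure concrete, here is a minimal sketch that rebuilds the report-type marker the same way the Nam snippet above does and tests it against two HTML fragments. The fragments are hypothetical placeholders, not real provider output; only the header text matters, and report_marker is an illustrative helper name, not part of avwx.

```python
def report_marker(report_type: str) -> str:
    # Same expression as the first entry of the starts list in Nam._extract.
    return f">{report_type.upper()} <"


# Hypothetical HTML fragments; real provider markup may differ.
speci_page = "<td>SPECI <b>(report text)</b></td>"
taf_amd_page = "<td>TAF AMD <b>(report text)</b></td>"

# A SPECI page never contains ">METAR <", so the METAR lookup misses it.
print(report_marker("metar") in speci_page)  # False
# ">TAF <" is not a substring of ">TAF AMD <": the character after "TAF " is "A", not "<".
print(report_marker("taf") in taf_amd_page)  # False
# The report is there, just under a different header.
print(">SPECI <" in speci_page, ">TAF AMD <" in taf_amd_page)  # True True
```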

Suggested Fix

The extraction logic should prioritize searching for specific variants before falling back to the generic type.

I recommend adding a property to ScrapeService that lists these variants, most specific first (for example, TAF AMD before TAF):

@property
def search_tags(self) -> list[str]:
    # Ordered most specific first; the plain report type is always the last fallback.
    rt = self.report_type.upper()
    if rt == "TAF":
        return ["TAF AMD", "TAF COR", "TAF"]
    if rt == "METAR":
        return ["SPECI", "METAR"]
    return [rt]
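A self-contained way to check the proposed ordering is sketched below. DemoScrape is a hypothetical stand-in that models only the report_type attribute the property reads; the real ScrapeService carries more state.

```python
class DemoScrape:
    """Hypothetical stand-in for ScrapeService; only report_type is modeled."""

    def __init__(self, report_type: str) -> None:
        self.report_type = report_type

    @property
    def search_tags(self) -> list[str]:
        # Most specific variants first, generic report type last.
        rt = self.report_type.upper()
        if rt == "TAF":
            return ["TAF AMD", "TAF COR", "TAF"]
        if rt == "METAR":
            return ["SPECI", "METAR"]
        return [rt]


print(DemoScrape("metar").search_tags)  # ['SPECI', 'METAR']
print(DemoScrape("taf").search_tags)    # ['TAF AMD', 'TAF COR', 'TAF']
print(DemoScrape("pirep").search_tags)  # ['PIREP']
```

Because report_type is upper-cased first, "metar" and "METAR" yield the same list, and any other report type passes through as a single-element list.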

The _extract methods would then iterate through these tags to find which one is actually present in the raw HTML.

For example, in Nam._extract:

# Detect which tag is actually present in the raw HTML
tag = self.report_type.upper()
for candidate in self.search_tags:
    if f">{candidate} <" in raw:
        tag = candidate
        break

# Use the detected tag for extraction
starts = [f">{tag} <", f">{station.upper()}<", "top'>"]

With this change, SPECI, TAF AMD, and TAF COR reports are identified and parsed correctly, while pages with the plain METAR or TAF header behave as before.
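The detection loop from the Nam._extract example can also be pulled out into a small pure function so it can be unit tested without network access. The sketch below assumes that approach; detect_tag and build_starts are hypothetical helper names, not existing avwx functions, and the HTML fragments and station code are placeholders.

```python
def detect_tag(raw: str, candidates: list[str], default: str) -> str:
    """Return the first candidate whose ">TAG <" header appears in raw, else default."""
    for candidate in candidates:
        if f">{candidate} <" in raw:
            return candidate
    return default


def build_starts(tag: str, station: str) -> list[str]:
    # Mirrors the starts list in Nam._extract, but uses the detected tag.
    return [f">{tag} <", f">{station.upper()}<", "top'>"]


# Hypothetical HTML fragments; real provider markup may differ.
speci_page = "<td>SPECI <b>(report text)</b></td>"
taf_cor_page = "<td>TAF COR <b>(report text)</b></td>"
metar_page = "<td>METAR <b>(report text)</b></td>"

print(detect_tag(speci_page, ["SPECI", "METAR"], "METAR"))  # SPECI
print(detect_tag(taf_cor_page, ["TAF AMD", "TAF COR", "TAF"], "TAF"))  # TAF COR
print(detect_tag(metar_page, ["SPECI", "METAR"], "METAR"))  # METAR
print(build_starts("SPECI", "abcd"))  # ['>SPECI <', '>ABCD<', "top'>"]
```

If none of the candidates are present, the function returns the plain report type, so extraction proceeds exactly as it does today. When a page contains more than one header, the candidate order decides which one wins.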

Thanks for your work on this library!

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests