@fauzora fauzora commented Sep 4, 2025

Problem

When data-exclude-hash="true" is set, hash fragments are removed during navigation but NOT on the initial page load.
This inconsistency can skew analytics data collected in production.
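
The mismatch can be demonstrated in plain JavaScript — a sketch with illustrative URLs, not the tracker's actual code:

```javascript
// Assumes data-exclude-hash="true" is set.

// Initial page load: the raw href is recorded as-is, hash included.
const initialUrl = 'https://example.com/docs#install';
console.log(initialUrl); // https://example.com/docs#install

// SPA navigation (handlePush): the hash is stripped before recording.
const u = new URL('https://example.com/docs#usage');
u.hash = '';
console.log(u.toString()); // https://example.com/docs
```

The same tracked page thus appears under two different URLs depending on how the visitor reached it.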

Root Cause

Looking at the original code, URL normalization was inconsistent: the hash was stripped during navigation (handlePush) but never on the initial URL assignment.

// Current code - missing hash normalization on init
let currentUrl = href;
let currentRef = referrer.startsWith(origin) ? '' : referrer;

// Only handlePush normalizes the URL (simplified illustration)
const handlePush = (_state, _title, url) => {
  const u = new URL(url, location.href);
  if (excludeHash) u.hash = '';
  currentUrl = u.toString();
};

Solution

Apply consistent URL normalization: introduce a single normalize() helper and use it for both handlePush and the initial URL assignment:

const normalize = raw => {
  // Empty/undefined input passes through unchanged
  if (!raw) return raw;
  try {
    const u = new URL(raw, location.href);
    if (excludeSearch) u.search = '';
    if (excludeHash) u.hash = '';
    return u.toString();
  } catch (e) {
    // Unparseable input - fall back to the raw value
    return raw;
  }
};

let currentUrl = normalize(href);
let currentRef = normalize(referrer.startsWith(origin) ? '' : referrer);

This ensures that both initial page loads and navigation events use the same URL normalization logic.
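
A self-contained sketch of the unified behavior (the flag values and base URL here are assumptions standing in for the tracker's configuration and location.href):

```javascript
const excludeSearch = false; // data-exclude-search not set
const excludeHash = true;    // data-exclude-hash="true"
const base = 'https://example.com/'; // stands in for location.href

const normalize = raw => {
  if (!raw) return raw; // falsy input passes through
  try {
    const u = new URL(raw, base);
    if (excludeSearch) u.search = '';
    if (excludeHash) u.hash = '';
    return u.toString();
  } catch (e) {
    return raw; // unparseable input is returned unchanged
  }
};

console.log(normalize('/docs?q=1#install')); // https://example.com/docs?q=1
console.log(normalize(''));                  // '' - empty referrer stays empty
```

Note that the hash is dropped while the query string survives, matching the configuration above; the same result is produced whether the URL comes from the initial load or from a pushState call.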


vercel bot commented Sep 4, 2025

Someone is attempting to deploy a commit to the umami-software Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR addresses a URL normalization inconsistency in the Umami tracker's JavaScript code. The fix resolves a bug where hash fragments were being removed during navigation events but not on initial page load when data-exclude-hash="true" is configured.

The core issue was that the tracker had two different URL handling paths: one for initial page load (which didn't normalize URLs) and one for navigation events via handlePush (which did apply normalization). This resulted in inconsistent analytics data where initial page visits would include hash fragments while subsequent navigation would exclude them.

The solution introduces a centralized normalize() function that encapsulates the URL normalization logic for both excludeSearch and excludeHash configurations. This function safely handles URL parsing with proper error handling, falling back to the original raw URL if parsing fails. The fix updates three key areas:

  1. New normalize function: A reusable utility that creates a URL object, conditionally removes search parameters and hash fragments based on configuration, and returns the normalized string
  2. Updated handlePush: Now uses the centralized normalize function instead of inline normalization
  3. Fixed initial URL assignment: Both currentUrl and currentRef now use the normalize function to ensure consistent behavior from page load
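
For example, point 2 means handlePush can delegate to the shared helper — a minimal sketch, assuming the names from the diff above and an illustrative base URL:

```javascript
const excludeSearch = false;
const excludeHash = true;
const base = 'https://example.com/'; // stands in for location.href

const normalize = raw => {
  if (!raw) return raw;
  try {
    const u = new URL(raw, base);
    if (excludeSearch) u.search = '';
    if (excludeHash) u.hash = '';
    return u.toString();
  } catch (e) {
    return raw;
  }
};

// Both the initial assignment and handlePush now share one code path
let currentUrl = normalize('/start#top');
const handlePush = (_state, _title, url) => {
  currentUrl = normalize(url);
};

handlePush(null, '', '/next?tab=2#section');
console.log(currentUrl); // https://example.com/next?tab=2
```

Routing every URL through one function is what makes the behavior consistent: there is no longer a code path that can record an unnormalized URL.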

This change ensures that URL normalization behavior is consistent across all tracking scenarios, providing more reliable analytics data collection. The implementation maintains backward compatibility while fixing the data inconsistency issue that could affect production analytics.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it fixes a clear data consistency bug without introducing breaking changes
  • Score reflects the straightforward nature of centralizing existing logic and the comprehensive error handling in the normalize function
  • No files require special attention as the changes are well-contained and the logic is defensive with proper fallbacks

1 file reviewed, no comments

