Why are there missing dates in December 2021? #25

@matthewfeickert

This is a bit of a story to answer any questions that might come up in the future.

  • 2021-12-04: The Git scraping workflow gets paused by GitHub due to a lack of activity for reasons unknown (the repo is getting new commits each night).
  • 2021-12-16 08:45 US Central: Awkward Array gets posted to Hacker News and starts getting a lot of traction and lots of GitHub stars.
  • 2021-12-16 12:00 US Central: Matthew notices that the GitHub Action has been paused and restarts the action. There are now 13 days of missing time values in the data set.
  • 2021-12-17 UTC: The GitHub Action restarts and begins to Git scrape again, catching the start of the Awkward star rise in the 13 days of missing data.
  • 2021-12-30 UTC: The Hacker News inspired star activity on Awkward dies off.
  • 2022-03-21: @jpivarski gives a super nice talk on Metrics of computing trends in NHEP at the IRIS-HEP Topical meeting, which includes the following plot on slide 6 (page 15):
    jim_slide

At the talk, @alexander-held notices that the vertical dashed line marking when the Hacker News post happened falls after a large step and a bit of a climb. @matthewfeickert mentions that the jump is there because additional stars were being added during the Git scraping stoppage, but the vertical line still seems misplaced.

This is because the 13 missing days are not plotted at all in matplotlib and so are simply cut out of the horizontal axis. When preparing his plot, @jpivarski very reasonably placed the line by interpolating between the dates on the axis, as he describes:

I drew the vertical line by hand, by doing a linear interpolation between the dates on the horizontal axis. (In Inkscape, I made a box connecting the two tick marks with a tool that snaps to points, calculated the fraction past the first date that I'd need, used a scaling dialog to shrink the box by exactly that percentage, then snapped the box to the first date and used it as a guide to add the dashed line.)

and so the linear interpolation assumes that the axis contains dates that are not actually on it, which shifts the location of the vertical line.
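The size of the shift can be estimated with a rough sketch. It assumes the dates are drawn as evenly spaced positions (one per scraped day, with the missing days taking up no space), and it uses a made-up scraping range and made-up tick dates; only the 2021-12-04 to 2021-12-16 gap and the 2021-12-16 event date come from the story above.

```python
from datetime import date, timedelta


def daily(start, end):
    """Inclusive list of consecutive calendar days."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Hypothetical scrape dates: everything except the gap is an assumption.
scraped = daily(date(2021, 11, 1), date(2021, 12, 3)) + daily(
    date(2021, 12, 17), date(2022, 1, 15)
)


def interpolated_position(target, tick_a, tick_b, dates):
    """Axis position found by interpolating in calendar time between two ticks.

    Positions are list indices, i.e. one evenly spaced slot per scraped day.
    """
    pos_a, pos_b = dates.index(tick_a), dates.index(tick_b)
    fraction = (target - tick_a) / (tick_b - tick_a)
    return pos_a + fraction * (pos_b - pos_a)


# Where a by-hand interpolation between two (assumed) tick dates puts the event
by_hand = interpolated_position(
    date(2021, 12, 16), date(2021, 11, 1), date(2022, 1, 1), scraped
)
# Where the first data point after the gap actually sits on the axis
first_sample_after_gap = scraped.index(date(2021, 12, 17))

print(f"interpolated position: {by_hand:.2f}")
print(f"first sample after the gap: {first_sample_after_gap}")
```

Under these assumptions the interpolated position lands to the right of the first sample after the gap, which is the same direction of shift that was noticed in the talk.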

Drawing the vertical line in matplotlib when plotting the data itself

    _date = "2021-12-17"  # Can't draw on 2021-12-16 as it isn't in the data set
    ax.axvline(x=_date, color="grey", linestyle="dashed", label=_date)
    # Hackily choosing a date to get the text in a reasonable location
    ax.text("2021-11-05", 450, _date, color="grey", size=20)

shows that the plotted data line up correctly once the missing days are taken into account. 👍

time_series_stars_hackernews
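An alternative that might avoid the workaround is to plot real date objects instead of date strings, so that matplotlib keeps the gap on the axis and a line can be drawn on 2021-12-16 directly. This is only a sketch and not what the plot above does: the star counts are made up, and the date range is shortened for illustration.

```python
from datetime import date, timedelta

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def daily(start, end):
    """Inclusive list of consecutive calendar days."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Days on either side of the scraping gap (range shortened for illustration)
dates = daily(date(2021, 12, 1), date(2021, 12, 3)) + daily(
    date(2021, 12, 17), date(2021, 12, 20)
)
# Made-up star counts, one per date, for illustration only
stars = [400, 401, 402, 520, 540, 555, 565]

fig, ax = plt.subplots()
# With date objects the x axis is a true time axis, so the gap keeps its width
ax.plot(dates, stars, marker="o")

# The event date can now be used even though it is not in the data set
event = date(2021, 12, 16)
ax.axvline(x=event, color="grey", linestyle="dashed", label=str(event))
ax.legend()
fig.autofmt_xdate()
```

The trade-off is that the gap shows up as a long straight segment between 2021-12-03 and 2021-12-17, which makes the missing data visible rather than hiding it.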

This issue is just here to document what happened and to avoid future confusion. Also, good eye to @alexander-held for catching this in the talk and for starting a fun little bit of forensic visualization. 🙂
