Why are there missing dates in December 2021? #25

Open
matthewfeickert opened this issue Mar 22, 2022 · 0 comments
matthewfeickert commented Mar 22, 2022

This is a bit of a story to answer any questions that might come up in the future.

  • 2021-12-04: The Git scraping workflow gets paused by GitHub, nominally due to a lack of activity, though the actual reason is unknown (the repo is getting new commits each night).
  • 2021-12-16 08:45 US Central: Awkward Array gets posted to Hacker News and starts getting a lot of traction and lots of GitHub stars.
  • 2021-12-16 12:00 US Central: Matthew notices that the GitHub Action has been paused and restarts the action. There are now 13 days of missing time values in the data set.
  • 2021-12-17 UTC: The GitHub Action restarts and begins to Git scrape again, catching the start of the Awkward star rise in the 13 days of missing data.
  • 2021-12-30 UTC: The Hacker News inspired star activity on Awkward dies off.
  • 2022-03-21: @jpivarski gives a super nice talk on Metrics of computing trends in NHEP at the IRIS-HEP Topical meeting which includes the following plot on slide 6 (page 15)
    [image: jim_slide]

At the talk, @alexander-held notices that the vertical dashed line marking when the Hacker News post happened sits after a large step and a bit of a climb. @matthewfeickert mentions that the jump is there because additional stars were being added during the Git scraping stoppage, but the vertical line still seems misplaced.

This is because the 13 missing days are not being plotted at all in matplotlib and so are simply being cut out. When preparing his plot, @jpivarski very reasonably added the line by hand, as he describes:

I drew the vertical line by hand, by doing a linear interpolation between the dates on the horizontal axis. (In Inkscape, I made a box connecting the two tick marks with a tool that snaps to points, calculated the fraction past the first date that I'd need, used a scaling dialog to shrink the box by exactly that percentage, then snapped the box to the first date and used it as a guide to add the dashed line.)

and so the linear interpolation is assuming there are dates there that don't exist, producing a shift in the location of the vertical line.
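
The size of that shift can be estimated with a bit of arithmetic. The following is only a rough sketch under stated assumptions: the overall date range, the choice of 2021-12-01 and 2022-01-01 as the two tick marks, and the variable names are all hypothetical (the actual ticks used in the slide are not recorded here), and it assumes the plot places each recorded day at an evenly spaced position with the missing days cut out.

```python
from datetime import date, timedelta


def daterange(start, stop):
    """Yield every calendar day from start to stop, inclusive."""
    day = start
    while day <= stop:
        yield day
        day += timedelta(days=1)


# Assumed gap: the 13 days with no scrape (2021-12-04 through 2021-12-16).
gap = set(daterange(date(2021, 12, 4), date(2021, 12, 16)))

# Hypothetical record: one row per night, except during the gap.
recorded = [d for d in daterange(date(2021, 11, 1), date(2022, 1, 31)) if d not in gap]

# With the missing days cut out, a day's x position is just its row index.
position = {d: i for i, d in enumerate(recorded)}

# Hypothetical tick marks used for the hand-drawn interpolation.
tick_lo, tick_hi = date(2021, 12, 1), date(2022, 1, 1)
event = date(2021, 12, 16)  # Hacker News post

# Linear interpolation by calendar days assumes every day has a position.
fraction = (event - tick_lo).days / (tick_hi - tick_lo).days
interpolated = position[tick_lo] + fraction * (position[tick_hi] - position[tick_lo])

# The event day itself was never recorded, so use the first recorded day after it.
actual = position[min(d for d in recorded if d >= event)]

shift = interpolated - actual
print(f"interpolated x = {interpolated:.2f}, actual x = {actual}, shift = {shift:.2f} rows")
```

Under these assumptions the hand-interpolated line lands several recorded days to the right of the first data point after the gap, which is consistent with the line appearing after the step and part of the climb.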

Drawing the vertical line in matplotlib when plotting the data itself

    _date = "2021-12-17"  # Can't draw on 2021-12-16 as it isn't in the data set
    ax.axvline(x=_date, color="grey", linestyle="dashed", label=_date)
    ax.text("2021-11-05", 450, _date, color="grey", size=20)  # hackily choosing a date to get the text in a reasonable location

shows that the line and the plotted data line up correctly once the missing days are taken into account. 👍

[image: time_series_stars_hackernews]

This issue is just here to document what happened to avoid future confusion. Also, good eye to @alexander-held for catching this in the talk and for starting a fun little bit of forensic visualization. 🙂
