Revise pcap parser file selection algorithm to eventually process 100% of the data #1022

mattmathis · 2021-09-24T20:03:59Z

Revise the archive file selection algorithm for the pcap parser to rotate through all of the data in 10% batches.

Consider a hash-based selection:
if (HASH(filename)+epoch) % 10 == 0 { process file }
where epoch is incremented every time the pcap gardener reaches the end of the data.
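
A minimal sketch of the proposed rotation, not the parser's actual code. The names `stable_hash`, `should_process`, `select_batch`, and `NUM_BATCHES` are hypothetical, and SHA-256 is assumed only as a convenient hash that gives the same value on every run. With 10 batches, each file is selected in exactly one out of every 10 consecutive epochs, so 10 passes cover all of the data.

```python
import hashlib

# Hypothetical constant: 10 batches of roughly 10% each, as in the proposal.
NUM_BATCHES = 10


def stable_hash(filename: str) -> int:
    """Return a hash of the filename that is identical across runs.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here instead (an assumption).
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def should_process(filename: str, epoch: int) -> bool:
    """Apply the rule (HASH(filename) + epoch) % 10 == 0."""
    return (stable_hash(filename) + epoch) % NUM_BATCHES == 0


def select_batch(filenames, epoch):
    """Return the files chosen for one pass over the archive."""
    return [name for name in filenames if should_process(name, epoch)]
```

Incrementing `epoch` once per complete pass shifts the selection to the next batch, and after `NUM_BATCHES` passes the selection wraps around to the first batch again. The batches are only approximately 10% each, since their sizes depend on how the hash distributes the filenames.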

mlab-code-reviews · 2021-09-24T20:17:40Z

I don't think there is any particular reason we shouldn't just let this parse all the data. It should only take a few days. Then we should probably shut it off rather than reprocessing it regularly. A more useful bug fix would be to change the processing location, so that we aren't moving data between regions. This is the biggest concern when processing 100% of the pcaps. We could instead consider copying the table from staging.


-- Greg Russell / Measurement-Lab https://memegen.googleplex.com/4558349824688128

mattmathis · 2022-03-23T01:47:02Z

We are now processing 10% of the pcaps every 16 days. Please update to process all current and historical files.
SELECT COUNT(DISTINCT date) AS days, MIN(parser.Time) AS OldestParse FROM `mlab-oti.ndt_raw.pcap`
Yields, when run on 2022-03-22: days = 838, OldestParse = 2022-03-06 02:31:10.345666 UTC
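
The 16-day figure follows from the query result: the oldest parse time is 16 calendar days before the date the query was run. A small sketch of that arithmetic, where the function name `days_since_oldest_parse` is hypothetical and only the two dates come from the comment above:

```python
from datetime import date


def days_since_oldest_parse(oldest_parse: date, query_date: date) -> int:
    """Calendar days between the oldest parse and the day the query ran."""
    return (query_date - oldest_parse).days


# Dates taken from the query result quoted in this comment.
age = days_since_oldest_parse(date(2022, 3, 6), date(2022, 3, 22))
```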

autolabel bot added the review/triage label (Team should review and assign priority) Sep 24, 2021

mattmathis assigned stephen-soltesz and cristinaleonr Mar 23, 2022
