Handle dash for xff, and region id starting the path #712

Merged on Oct 12, 2023 (1 commit)

@kookster kookster commented Oct 12, 2023

fixes #714

@@ -119,6 +120,16 @@ Resources:
});
};

const findIp = (xff, ip) => {
if (xff === '-') {
Member Author

This is the bug that made all the hashed IPs the same. Most of the time the XFF comes through as '-', which is not blank but also not an IP. That dash was being used as the IP instead of the client IP.

Member

Oof - good catch. I think I had a similar check in the counts-lambda, but apparently forgot about it here.
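
For readers following the thread, the dash handling might look roughly like the sketch below. Only the `xff === '-'` check is visible in the diff; the rest of the body (falling back to the client IP, taking the leftmost XFF entry) is an assumption made for illustration, and the sample addresses are made up.

```javascript
// Hypothetical sketch: only the `xff === '-'` check appears in the diff;
// everything else in this body is an assumption for illustration.
const findIp = (xff, ip) => {
  // The log writes '-' for an empty field, so treat it the same as missing
  if (!xff || xff === '-') {
    return ip;
  }
  // Otherwise take the leftmost non-empty entry of the comma-separated list
  const parts = xff.split(',').map(s => s.trim()).filter(s => s && s !== '-');
  return parts[0] || ip;
};

// Made-up sample values
console.log(findIp('-', '203.0.113.7'));                      // 203.0.113.7
console.log(findIp('198.51.100.4, 10.0.0.1', '203.0.113.7')); // 198.51.100.4
console.log(findIp('', '203.0.113.7'));                       // 203.0.113.7
```

With a check like this, a dash no longer wins over the client IP, so distinct clients should hash to distinct values again.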

@@ -146,6 +157,12 @@ Resources:
// podcast id and episode guid (only works for dovetail3-cdn requests)
const datas = mappedRows.filter(data => {
const parts = data['cs-uri-stem'].split('/').filter(s => s);

// if the path starts with a region like usw2, shift that off
if (parts[0] && parts[0].match(/^[a-z][a-z0-9\-]+$/)) {
Member Author

This was the other bug: some requests have an AWS region name prefix at the start of the path, like /usw2/, and these were all getting filtered out.
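
A minimal sketch of the region-prefix handling, using the same pattern as the diff. The `pathParts` helper and the sample paths are hypothetical, and the sketch assumes the first real segment (the podcast id) never matches the pattern, for example because it starts with a digit.

```javascript
// Same pattern as the diff: a lowercase letter followed by one or more
// lowercase letters, digits, or dashes.
const REGION_PREFIX = /^[a-z][a-z0-9\-]+$/;

// Hypothetical helper (not in the PR): split a URI stem into segments and
// drop a leading region-like segment such as 'usw2'.
const pathParts = (uriStem) => {
  const parts = uriStem.split('/').filter(s => s);
  if (parts[0] && REGION_PREFIX.test(parts[0])) {
    parts.shift();
  }
  return parts;
};

// Made-up paths: both should yield the same segments
console.log(pathParts('/usw2/123/some-guid/file.mp3')); // [ '123', 'some-guid', 'file.mp3' ]
console.log(pathParts('/123/some-guid/file.mp3'));      // [ '123', 'some-guid', 'file.mp3' ]
```

Only the first segment is tested, so later segments that happen to match the pattern (like 'some-guid' here) are left alone.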

@@ -163,8 +180,7 @@ Resources:
// calculate listener_ids
datas.forEach(data => {
// use leftmost XFF or IP
const xffParts = (data['x-forwarded-for'] || '').split(',').map(s => s.trim()).filter(s => s);
const leftMostIp = xffParts[0] || data['c-ip'];
Member Author

This comparison was picking the dash (-) value of the XFF over the client IP.
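
A small reproduction of that behavior, assuming the two lines shown in the hunk are the pre-fix version. The `oldLeftMostIp` wrapper and the sample rows are made up for illustration.

```javascript
// The expression from the hunk above, wrapped in a hypothetical function
const oldLeftMostIp = (data) => {
  const xffParts = (data['x-forwarded-for'] || '').split(',').map(s => s.trim()).filter(s => s);
  // '-' is a non-empty string, so it is truthy and wins over c-ip
  return xffParts[0] || data['c-ip'];
};

// Made-up rows: different clients, but the XFF field is logged as '-'
const rows = [
  { 'x-forwarded-for': '-', 'c-ip': '203.0.113.7' },
  { 'x-forwarded-for': '-', 'c-ip': '198.51.100.4' },
];

console.log(rows.map(oldLeftMostIp)); // [ '-', '-' ]
```

Every row resolves to the same string, which is consistent with the identical hashed IPs described earlier in the conversation.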

@cavis cavis left a comment

👍 looks good to me. Onwards to coordinating how to deploy this.

@@ -146,6 +157,12 @@ Resources:
// podcast id and episode guid (only works for dovetail3-cdn requests)
Member

Not related directly to this change but ...

For re-processing purposes, it may be useful to also have this lambda log what S3 input file it's processing, and how many rows it had. Just above this line somewhere:

console.info(`Read ${rows.length} rows from s3://${Bucket}/${Key}`);

@kookster kookster merged commit 1147c48 into main Oct 12, 2023
@kookster kookster deleted the fix/xff_and_region_handling branch October 12, 2023 15:51
Development

Successfully merging this pull request may close these issues.

Triton logs are processing with many fewer entries