-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kafka Hardlimit #102
base: twitter-master
Are you sure you want to change the base?
Kafka Hardlimit #102
Conversation
Upgrading to 0.132-tw-0.16
Upgrade to 0.139-tw-17
…scovery Add a few logs in DiscoveryNodeManager to debug 'No worker nodes' error.
…_queries Add query logging of all presto queries
Upgrade to 0.139-tw-0.18
…sitive_parquet_column_match Look up parquet columns by name case-insensitive
…are_parquet_column_match Handle hive keywords when doing a name-based parquet field lookup
Upgrade to 0.139-tw-0.19
[maven-release-plugin] copy for tag 0.141 # Conflicts: # pom.xml
Upgrade to 0.141-tw-0.20
upgrade Presto to 0.179
…at-off-in-default Set default value for hive statistic feature to false
…function signature
Upgrade to 181
Kafka07 plugin
Fix build and style
…iver-node-version Tableau Presto ODBC Driver fix
@@ -76,7 +76,17 @@ | |||
/** | |||
* Fetch size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate more about what Fetch Size
is
private DataSize fetchSize = new DataSize(10, Unit.MEGABYTE); | ||
|
||
/** | ||
* Default Query Interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a variable. So:
- Don't call it
defaultXXXX
. - Default value should be a
final
value. Here, it is good to have one final Duration value calledDEFAULT_QUERY_RANGE
and a variable calledqueryInterval
. And then assign the default value of the variable with the constant value.
Also, can you elaborate more about what this variable means? I have to read through the logic code using this variable to understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think based on the code style in Presto, it's better just assign the default value here. I agree that the variable name here is a little bit confusing. But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basing on the code logic, i.e. this value can be assigned. It is not a default value. Default value is a constant value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultQueryInterval
is the variable and Duration.valueOf("10m")
is the default
value for this variable which is a constant
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above:
This is a variable. So:
- Don't call it
defaultXXXX
.- Default value should be a
final
value. Here, it is good to have one final Duration value calledDEFAULT_QUERY_RANGE
and a variable calledqueryInterval
. And then assign the default value of the variable with the constant value.
Call it queryInterval
if it is a variable. (Or queryRange
can be more descriptive). Otherwise, it is confusing; people need to go through the code logic to understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above:
But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
private Duration defaultQueryInterval = Duration.valueOf("10m"); | ||
|
||
/** | ||
* Hard limit on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you elaborate more about what this variable means? I have to read through the logic code using this variable to understand it.
/** | ||
* Hard limit on | ||
*/ | ||
private boolean hardLimitOn = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isHardLimitOn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. "isHardLimitOn" should be a method instead of a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the variable name could be hardLimited
{ | ||
return fetchSize; | ||
} | ||
|
||
@Config("kafka.fetch-size") | ||
public KafkaConnectorConfig setFetchSize(int fetchSize) | ||
public KafkaConnectorConfig setFetchSize(String fetchSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling you want to use the Builder pattern to achieve the immutable Config
class. But it seems self-finished; not that great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this class is reading the config files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
@@ -225,6 +227,11 @@ public ColumnMetadata getColumnMetadata(ConnectorSession session, ConnectorTable | |||
} | |||
|
|||
log.info("startTs: %s, endTs: %s", startTs, endTs); | |||
if (config.isHardLimitOn() && startTs == null && endTs == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how the hard_limit is applied. Can you add more comments to make it clear? Especially for the for (Map.Entry<ColumnHandle, Domain> entry : columnHandleDomainMap.entrySet())
loop above.
Also, could we check whether it meets the expectations:
- If users specify a time range exceeding the limitation, an exception should be thrown directly. Otherwise, the behavior is changed silently and unexpectedly.
- If users do not specify the range, we could either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some thoughts about your expectations:
- If the user doesn't know how to specify a range, they will fall into default query interval.
- If the user knows how to specify a range, they must agree with default query interval and must have a specific reason to go beyond the default query interval. It would be costly to serve the users if they actually have this specific request. It will involve code review and service deploy processes which wouldn't the ideal way for serving our customers for urgent needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
If the user doesn't know how to specify a range, Presto should either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users. Silently unexpected behavior is bad.
-
A design should be robust to users' mistakes/errors, rather than to rely on users always providing safe input. Also, this is what limitation means: it should not happen.
If users want to query exceeding the limitation, they need to notify all stakeholders, for instance, messaging team, since it could potentially break the whole kafka service, leading to a SEV0 or SEV1.
@@ -124,7 +124,7 @@ private static String kafkaTopicName(TpchTable<?> table) | |||
public static Session createSession() | |||
{ | |||
return testSessionBuilder() | |||
.setCatalog("kafka") | |||
.setCatalog("kafka07") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good. I like the version tag attached.
private DataSize fetchSize = new DataSize(10, Unit.MEGABYTE); | ||
|
||
/** | ||
* Default Query Interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think based on the code style in Presto, it's better just assign the default value here. I agree that the variable name here is a little bit confusing. But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
{ | ||
return fetchSize; | ||
} | ||
|
||
@Config("kafka.fetch-size") | ||
public KafkaConnectorConfig setFetchSize(int fetchSize) | ||
public KafkaConnectorConfig setFetchSize(String fetchSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this class is reading the config files
/** | ||
* Hard limit on | ||
*/ | ||
private boolean hardLimitOn = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. "isHardLimitOn" should be a method instead of a variable.
@@ -225,6 +227,11 @@ public ColumnMetadata getColumnMetadata(ConnectorSession session, ConnectorTable | |||
} | |||
|
|||
log.info("startTs: %s, endTs: %s", startTs, endTs); | |||
if (config.isHardLimitOn() && startTs == null && endTs == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some thoughts about your expectations:
- If the user doesn't know how to specify a range, they will fall into default query interval.
- If the user knows how to specify a range, they must agree with default query interval and must have a specific reason to go beyond the default query interval. It would be costly to serve the users if they actually have this specific request. It will involve code review and service deploy processes which wouldn't the ideal way for serving our customers for urgent needs.
No description provided.