We're currently facing an issue where one of our collectors is timing out due to the large volume of logs we're handling. We're using IBM Guardium for our client, and our logs are being generated at a rate of over 10,000 per minute.
We're fetching these logs into Elasticsearch using the Logstash http_poller input plugin. However, one of our collectors is returning a request timeout error because of the number of records generated per minute. The API's default request timeout is 180 seconds, which is already quite high. We've tried adjusting the polling frequency and have also decreased the fetch size, but given the sheer volume of data, the requests still time out.
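For reference, here is a minimal sketch of the kind of http_poller configuration we're running. The endpoint URL and the fetchSize parameter are placeholders for our Guardium-specific API; the plugin options themselves (urls, request_timeout, schedule, codec) are standard:

```
input {
  http_poller {
    urls => {
      guardium_logs => {
        method => get
        # Placeholder endpoint and fetch-size parameter; the real values
        # are specific to our Guardium REST API.
        url => "https://guardium.example.com/api/v3/logs?fetchSize=5000"
        headers => { Accept => "application/json" }
      }
    }
    request_timeout => 180                # seconds; raising this only delays the failure
    schedule => { cron => "* * * * *" }   # poll once per minute
    codec => "json"
  }
}
```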
To address this, it would be beneficial if the plugin could capture the last value of a specific column, such as qid or timestamp, from the fetched data and store it as last_value. http_poller could then substitute that last_value into subsequent requests, so each poll asks the API only for records newer than the last one received.
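To make the request concrete, here is a purely hypothetical sketch of what such options might look like. None of these exist in http_poller today; tracking_column, tracking_column_type, last_value_metadata_path, and the :last_value placeholder are invented for illustration, modeled on the jdbc input plugin:

```
input {
  http_poller {
    urls => {
      guardium_logs => {
        method => get
        # Hypothetical: ":last_value" would be replaced with the stored value,
        # analogous to :sql_last_value in the jdbc input.
        url => "https://guardium.example.com/api/v3/logs?qid_gt=:last_value"
      }
    }
    # Hypothetical options (do not exist in http_poller today):
    tracking_column => "qid"
    tracking_column_type => "numeric"
    last_value_metadata_path => "/var/lib/logstash/.http_poller_last_value"
    schedule => { cron => "* * * * *" }
    codec => "json"
  }
}
```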
Similar to how the jdbc input plugin tracks the last value of a column (via tracking_column and :sql_last_value), we could implement a mechanism to track the last value of a column in the data fetched by http_poller. By doing so, we can ensure that we're retrieving only the new logs without missing any, thus resolving the timeout issue.
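For comparison, this is how the existing jdbc input expresses that idea today. The connection string, credentials, paths, and SQL statement below are placeholders; the tracking options (use_column_value, tracking_column, tracking_column_type, last_run_metadata_path) are the plugin's documented API:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:db2://guardium.example.com:50000/GUARDDB"  # placeholder
    jdbc_user => "user"
    jdbc_password => "password"
    # :sql_last_value is replaced on each run with the value stored
    # in last_run_metadata_path
    statement => "SELECT * FROM logs WHERE qid > :sql_last_value ORDER BY qid"
    use_column_value => true
    tracking_column => "qid"
    tracking_column_type => "numeric"
    last_run_metadata_path => "/var/lib/logstash/.jdbc_last_run"
    schedule => "* * * * *"
  }
}
```

Something equivalent in http_poller, with the last value exposed as a substitutable placeholder in the URL or request body, would let us page through the Guardium API incrementally instead of pulling everything in one long request.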
I've searched through the community forums, but it seems everyone is encountering the same problem. Unfortunately, no one has found a solution yet.
ikishorkumar changed the title from "Need Page Size and last value feature" to "Need Page Size, last value feature and tracking column in http_poller" on May 11, 2024