Description
If one wants to fetch all attributes of a certain type, it seems the recommended approach is to use the page and limit parameters and iterate through pages until the returned number of attributes is less than limit.
As an example, I have more than 5 million attributes of type md5 in my instance and want to fetch all of them.
From experiments it seems that, regardless of the value of limit, the search time increases as the page number grows. I measured it as follows:
import time

def fetch_attributes(self):
    response_count = 1
    l = 20000   # page size (limit)
    p = 1       # current page
    sum_attr = 0
    sum_time = 0
    while response_count > 0:
        t0 = time.time()
        attributes = self.client.search(controller="attributes", return_format="json",
                                        type_attribute=["md5"], page=p, limit=l)
        t1 = time.time()
        total = t1 - t0
        response_count = len(attributes["Attribute"])
        p += 1
        sum_attr += response_count
        sum_time += total
        print(f"fetched {response_count} attributes in {total}, sum_attr = {sum_attr}, sum_time = {sum_time}")

Example output (see attachment for full output):
fetched 20000 attributes in 4.9379072189331055, sum_attr = 20000, sum_time = 4.9379072189331055
fetched 20000 attributes in 4.651666879653931, sum_attr = 40000, sum_time = 9.589574098587036
fetched 20000 attributes in 4.8340137004852295, sum_attr = 60000, sum_time = 14.423587799072266
fetched 20000 attributes in 3.9235310554504395, sum_attr = 80000, sum_time = 18.347118854522705
fetched 20000 attributes in 4.641859292984009, sum_attr = 100000, sum_time = 22.988978147506714
...
fetched 20000 attributes in 12.544357299804688, sum_attr = 1380000, sum_time = 558.8374326229095
fetched 20000 attributes in 11.658548831939697, sum_attr = 1400000, sum_time = 570.4959814548492
fetched 20000 attributes in 12.361718893051147, sum_attr = 1420000, sum_time = 582.8577003479004
fetched 20000 attributes in 13.921313285827637, sum_attr = 1440000, sum_time = 596.779013633728
Please do not focus on the absolute values, but rather on the clear trend of queries taking longer and longer as the pages are iterated.
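As a possible workaround I am considering splitting the search into timestamp windows instead of paging ever deeper, so that each query stays small. This is only a rough sketch: it assumes the timestamp filter accepts a [start, end] interval, and the URL, API key, window size and start epoch below are placeholders for my setup.

import time
from pymisp import PyMISP

# Rough sketch of a windowed fetch instead of deep pagination.
# Assumptions: the timestamp search filter accepts a [start, end] interval;
# URL, key, window size and start epoch are placeholders.
client = PyMISP("https://misp.example", "API_KEY", ssl=True)

window = 86400                # one day per query (placeholder)
start = 1577836800            # epoch of the oldest attribute (placeholder)
end = int(time.time())

md5_attributes = []
t = start
while t < end:
    chunk = client.search(controller="attributes", return_format="json",
                          type_attribute=["md5"], timestamp=[t, min(t + window, end)])
    md5_attributes.extend(chunk.get("Attribute", []))
    t += window

Attributes that fall exactly on a window boundary might be returned twice, so de-duplicating by attribute id would probably be needed; ideally, though, plain page/limit pagination would not slow down in the first place.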
Below I have included the configuration parameters I found relevant. From htop it does not look like CPU or memory is exhausted, although I am not an expert at interpreting its output.
php memory_limit is set to 8192 MB.
/etc/my.cnf.d/server.cnf:
...
[mysqld]
datadir=/data/mysql-data
innodb_buffer_pool_size=4G
innodb_io_capacity=1000
innodb_log_file_size=600MB
innodb_read_io_threads=16
...
MISP version 2.4.198
PyMISP version 2.5.1
MariaDB version 11.4.3
Full print output:
output.txt
Please let me know if you need any more information or if this issue belongs to the MISP project instead.