Multiple KeyTuples support in Anna #46

Open
authwork opened this issue May 16, 2020 · 2 comments
authwork commented May 16, 2020

@vsreekanti
Many thanks for your help.
I found that a KeyRequest can carry multiple KeyTuples in one transmission, so I tried to extend the client:

string vget_async(const vector<Key>& keys, int size) {
	// To simplify the process, skip the pending_get_response_map_ check:
	//if (pending_get_response_map_.find(keys[0])
	//			== pending_get_response_map_.end()) {
		KeyRequest request;
		request.set_request_id(get_request_id());
		request.set_response_address(ut_.response_connect_address());
		request.set_type(RequestType::GET);
		for (int i = 0; i < size; i++) {
			KeyTuple* tp = request.add_tuples();
			tp->set_key(keys[i]);
		}
		try_request(request);
		return request.request_id();
	//}
	//return "";
}

When I set size = 1, it works normally; when size is larger than 1, the request_id does not match:

# size = 1
time: 164087
throughput: 6.09433e-06
time: 1348
throughput: 0.00074184
time: 318
throughput: 0.00314465
staleness of one key: 12
10.1.2.1:0_9=?10.1.2.1:0_9
number of keys: 1
10.1.2.1:0_11=?10.1.2.1:0_11


# size = 2
time: 103791
throughput: 1.92695e-05
time: 56
throughput: 0.0357143
time: 310
throughput: 0.00645161
staleness of one key: 0
10.1.2.1:0_11=?10.1.2.1:0_7
number of keys: 1
=?10.1.2.1:0_11
staleness: 15
[libprotobuf FATAL /usr/local/include/google/protobuf/repeated_field.h:1522] CHECK failed: (index) < (current_size_):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_):
Aborted

Update:
I found the cause of this bug. It may not be related to multiple KeyTuples, but to the receive function I use for batched PUTs.
It looks like this:

void receive(KvsClientInterface *client, int number) {
	vector<KeyResponse> responses = client->receive_async();
	while (responses.size() < number) {
		responses = client->receive_async();
		number = number - responses.size();
		if (number == 0)
			break;
	}
}
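
For comparison, here is a minimal sketch of a receive loop that appends every batch to one vector, so responses from earlier polls are kept while waiting for the rest. `FakeResponse`, `FakeClient`, `receive_all` and the `max_polls` limit are hypothetical stand-ins for illustration; only the `receive_async()` name comes from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for KeyResponse; only one illustrative field.
struct FakeResponse {
  std::string response_id;
};

// Hypothetical stand-in for the client: each receive_async() call returns
// the next batch that has arrived, which may be empty.
struct FakeClient {
  std::deque<std::vector<FakeResponse>> batches;

  std::vector<FakeResponse> receive_async() {
    if (batches.empty()) return {};
    std::vector<FakeResponse> next = batches.front();
    batches.pop_front();
    return next;
  }
};

// Poll until `expected` responses are collected or `max_polls` is reached.
// Each batch is appended to `all`, so no earlier response is overwritten.
template <typename Client>
std::vector<FakeResponse> receive_all(Client& client, std::size_t expected,
                                      std::size_t max_polls) {
  std::vector<FakeResponse> all;
  for (std::size_t poll = 0; poll < max_polls && all.size() < expected;
       ++poll) {
    std::vector<FakeResponse> batch = client.receive_async();
    all.insert(all.end(), batch.begin(), batch.end());
  }
  return all;
}
```

The poll limit is only there so the sketch cannot spin forever; a real client would more likely use a timeout.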

Update II:
The Python client says:

# PUT only supports one key operations, we only ever have to look at
# the first KeyTuple returned.

But I see that user_request_handler iterates over all tuples and handles each of them, so why does the comment say PUT only supports one-key operations?

 for (const auto &tuple : request.tuples()) {
    // first check if the thread is responsible for the key
    Key key = tuple.key();
    string payload = tuple.payload();
    ...
    else if (request_type == RequestType::PUT) {
           ...
    }
    ...
}

I have extended the PUT operation in the same way as GET, but only the first key-value pair in the passed key vector is executed:

x = client->vput_async(keys, values, count, LatticeType);
receive(client);
x = client->vget_async(keys, count);

Only the value for keys[0] is returned.

@vsreekanti
Member

The Python client currently supports only a single-key PUT, not for any fundamental reason but because we just haven't implemented putting multiple keys in parallel. Thanks for catching the receive bug. Please make a PR with that change.

With regards to how the sends and receives might work, keep in mind that Anna uses a DHT under the hood. So when you call a PUT with two keys, say k1, and k2, those two keys might be on different machines (i.e., k1 goes to node1 and k2 goes to node2). Those nodes will send separate responses, so you will have to look at not just the request ID but also the key to which each node is responding to make sure you have the correct request/response mapping. That's also why you will only see 1 KeyTuple in the response -- node1 doesn't know you also sent a request for k2 to node2, so it only sends one response. Hope this clears things up!
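
One way to keep that request/response mapping is to track outstanding (request ID, key) pairs on the client and tick them off as each node answers. The following is a minimal sketch under that assumption; the `PendingKeys` class and its methods are hypothetical and not part of the Anna client.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Tracks which (request ID, key) pairs still await a response.
// Hypothetical helper for illustration only.
class PendingKeys {
 public:
  // Record that `key` was sent under `request_id`.
  void expect(const std::string& request_id, const std::string& key) {
    pending_.insert(std::make_pair(request_id, key));
  }

  // Mark one response as received. Returns false if the pair was not
  // pending (unknown request ID, unknown key, or a duplicate response).
  bool resolve(const std::string& request_id, const std::string& key) {
    return pending_.erase(std::make_pair(request_id, key)) == 1;
  }

  // True once no key of `request_id` is still outstanding.
  bool done(const std::string& request_id) const {
    // The empty string sorts before every other key, so this finds the
    // first pending pair whose request ID is >= request_id.
    auto it = pending_.lower_bound(std::make_pair(request_id, std::string()));
    return it == pending_.end() || it->first != request_id;
  }

 private:
  std::set<std::pair<std::string, std::string>> pending_;
};
```

With k1 and k2 sent under one request ID, the request counts as complete only after both nodes have responded, regardless of the order in which the responses arrive.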

@authwork
Author

authwork commented May 19, 2020

Many thanks for your explanation.
I want to confirm my understanding:

  1. Only Anna's Python client supports getting multiple keys in parallel; this is implemented on the client side by sending separate requests to different machines.
  2. If several keys are located on the same machine, the client can use one request with multiple KeyTuples, because user_request_handler handles all KeyTuples in each request. If so, suppose I configure replica.memory=4 and replica.local=1 on 4 machines. Does that mean each machine will necessarily hold one replica of every key?

In my current tests, I found that issuing many requests is costly:

  1. 100 PUT requests of 32 bytes each
  2. 1 PUT request of 3200 bytes

The first approach had 10 times the latency of the second.
I guess this is caused by the number of iterations of the PUT(key)/receive(client) loop.

============================================
An ideal case (sharding + replication) would look like this:

  1. The client divides a set of keys into groups based on their locations.
  2. The client sends one request to each machine, carrying that machine's group of keys.
  3. The client collects the responses and extracts the values.
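
The grouping step in this ideal case can be sketched as follows. This is only an illustration: `group_by_node` and the `locate` callback are hypothetical names, and how the client would actually learn which node owns a key is not modelled here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Group keys by the node that owns them, so that one request per node can
// carry all of that node's keys. `locate` is a hypothetical caller-supplied
// lookup from key to node address.
std::map<std::string, std::vector<std::string>> group_by_node(
    const std::vector<std::string>& keys,
    const std::function<std::string(const std::string&)>& locate) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const std::string& key : keys) {
    // operator[] creates an empty group the first time a node is seen.
    groups[locate(key)].push_back(key);
  }
  return groups;
}
```

Each entry of the returned map would then become one request with one KeyTuple per key in the group.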

I am also looking for a way to batch requests.
