
privacy #2

Open
martindurant opened this issue Sep 6, 2017 · 103 comments

Comments

@martindurant
Collaborator

@bdrosen96 , would any of your work allow RPC "privacy" mode ("hadoop.rpc.protection": "privacy") to be enabled?

@bdrosen96

yes definitely.

@martindurant
Collaborator Author

As you can see from the commit log in this repo, I have already folded in some of your work. Do you know what it would take to get "privacy" to work? Even better, would you be willing to contribute :) ?

@bdrosen96

There are three more new PRs from today:

bdrosen96/libhdfs3#18
bdrosen96/libhdfs3#19
bdrosen96/libhdfs3#20

They really should have just been one PR.

@martindurant
Collaborator Author

What do these ones do, @bdrosen96 ?
Currently #3 is up to just before KMS on your list, but I suspect that those changes already exist in HAWQ so might be redundant here. Is there any way you can test #3? I can confirm that it builds and hdfs3 seems to work with it without changes, but I don't have a secured cluster to test against right now.

@bdrosen96

bdrosen96 commented Sep 13, 2017

This fixes a bug with SASL negotiation when the protection setting is privacy or integrity for kerberos/GSSAPI. The odd thing is that this only seems to reproduce on some clusters, so it was not noticed before. In addition, I also added two new session config variables to allow a user to manually specify RPC protection and data transfer protection, rather than just getting it via SASL, in case that logic has further issues. I will have one more follow-up PR, likely tomorrow (#21), that will fix the new config variables to use the string names used by hadoop instead of the SASL QOP number equivalents.
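(For context, an illustration rather than code from the PR: hadoop's protection names map onto the SASL QOP flags defined in RFC 4752, something like the following sketch.)

#include <stdexcept>
#include <string>

// Illustrative only, not from the actual PR: map hadoop's protection
// names to SASL QOP flags (1 = auth, 2 = auth-int, 4 = auth-conf).
static int protectionToQop(const std::string & name) {
    if (name == "authentication") return 1;  // authentication only
    if (name == "integrity")      return 2;  // adds integrity protection
    if (name == "privacy")        return 4;  // adds full encryption
    throw std::invalid_argument("unknown protection: " + name);
}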

As for testing #3, I can try to do so, assuming I can easily build it using the same process as I do now in our internal tests, but I'm not sure I will definitely be able to get to it tomorrow.

Which changes do you think already exist in HAWQ?

@martindurant
Collaborator Author

Well, I'm glad that at least you have this in production, so that those bugs get picked up for as many configurations as possible.
I would appreciate a description of the config variable changes - or are these things that are picked up by LIBHDFS3_CONF?

I think HAWQ implemented the KMS stuff, or at least there is some code touching that, and I haven't yet looked at your PRs to see whether they amount to the same or not.

Side question: do you know if the SASL implementation can be called to wrap/unwrap general bytes for encrypting yarn RPC streams? I would like to be able to command yarn from python directly... Or, alternatively, there are various SASL implementations in python (or wrappers), but I have no idea whether they implement the more complicated security layers. Any insight would be very useful.

@bdrosen96

The two new config vars are:

hadoop.rpc.protection
dfs.data.transfer.protection

and with the last PR that went in this morning, they mean the same thing as the hadoop XML variables of the same name and can have the same values (privacy, authentication or integrity).
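(A minimal sketch of setting these from client code, assuming the hdfs.h builder API that libhdfs3 shares with libhdfs; the namenode address and header path are placeholders.)

#include "hdfs/hdfs.h"

int main() {
    struct hdfsBuilder * bld = hdfsNewBuilder();
    hdfsBuilderSetNameNode(bld, "hdfs://namenode:8020");
    // force the QOP instead of relying on SASL negotiation alone
    hdfsBuilderConfSetStr(bld, "hadoop.rpc.protection", "privacy");
    hdfsBuilderConfSetStr(bld, "dfs.data.transfer.protection", "privacy");
    hdfsFS fs = hdfsBuilderConnect(bld);
    if (fs) hdfsDisconnect(fs);
    return 0;
}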

I'm not sure how the SASL implementation will work with yarn RPC since I have never tried to do so. I would not be surprised if it worked, assuming YARN uses similar RPC to the name node. I'm not sure that yarn has as many options for configuring RPC, though.

@martindurant
Collaborator Author

Excellent stuff, looks good. Do you intend to work further on this? I wonder, is https://github.com/martindurant/libhdfs3-downstream now a good place for you to work against? It might be convenient for you to be nearer the upstream.

As far as I know, YARN RPC uses exactly the same mechanisms as HDFS, even being defined by the same 'hadoop.rpc.protection' parameter. cerastes attempts to do this, but using a wrapper around cyrus-sasl.

@bdrosen96

I don't have any current plans to make more changes unless/until a new issue comes up.

Once all the needed changes are in the downstream repo and it has been confirmed to work with all the various cases my current repo does, then I could possibly switch.

That probably explains why hadoop.rpc.protection is in core-site.xml and not hdfs-site.xml.

@martindurant
Collaborator Author

@bdrosen96 : as expected, I am having trouble merging more of your code into the current HAWQ-based version, because it implements KMS, but in a different way to your code. For a trivial change, at https://github.com/ContinuumIO/libhdfs3-downstream/blob/master/libhdfs3/src/client/OutputStreamImpl.cpp#L255 it uses fileStatus.getFileEncryption() where you have getEncryptionInfo() - I am assuming these do the same thing, but the interface is different.

Do you have any recommendations for moving forward? Is it possible to pull only the "privacy"-related material out of your code and trust that HAWQ's KMS stuff is right? I care much more about the former than the latter. Any other idea for how to consolidate would be appreciated. Unfortunately, I am no star C++ coder; just applying only your changes post-KMS is proving no simpler, since the base code is now different in places...

@bdrosen96

From what I can see, the version of KMS supported in HAWQ may have some functionality gaps.

1. It does not support kerberos at all, where my version does.

2. It does not support specifying the KMS auth type.

3. It does not support tokens.

4. I don't think it supports the URL-safe Base64 variant. On encrypt:

// map the standard Base64 alphabet to the URL-safe one
std::replace(encoded.begin(), encoded.end(), '+', '-');
std::replace(encoded.begin(), encoded.end(), '/', '_');

and on decrypt:

// restore the '=' padding that the URL-safe form strips...
int rem = data.length() % 4;
if (rem) {
    rem = 4 - rem;
    while (rem != 0) {
        data = data + "=";
        rem -= 1;
    }
}
// ...then map the URL-safe alphabet back to standard Base64
std::replace(data.begin(), data.end(), '-', '+');
std::replace(data.begin(), data.end(), '_', '/');

I have not yet looked at how the KMS itself is integrated into the encrypt/decrypt portions.

@martindurant
Collaborator Author

Would you, then, advocate working from your branch's tip and integrating whatever is needed on the HAWQ end, rather than the other way around as I've been trying?

@bdrosen96

For point 4, it looks like this is actually partially handled, just in CryptoCodec.cpp.

As for encoding, I see that in one place, but I don't yet see decode, and I think there is at least one other place that might need encoding.

I would suggest first integrating the existing PRs (without KMS) plus the new PRs from last week as one unit into the code base. Then start with a new PR to handle the KMS/HAWQ integration. The most likely way to handle this would be to first try to add token and kerberos SPNEGO support to the existing HAWQ KMS as one PR.

@bdrosen96

The interesting thing is that it looks like the HAWQ kms stuff may have copied mine to some extent (or we both copied from the same reference). If you look at HttpClient.cpp you see we have the same macros for CURL.

@bdrosen96

Although they do seem to have lost the CURLOPT_SSL_VERIFYPEER option, which is important to support self-signed certificates, which I think we want to support.

@martindurant
Collaborator Author

I don't think I can merge the non-KMS PRs of yours, that's the problem - or at least, I'm unsure now how much knock-on code change there would be. For instance, the following single diff:

@@ -625,7 +625,8 @@ void PipelineImpl::createBlockOutputStream(const Token & token, int64_t gs, bool
                                           nodes[0].formatAddress(),
                                           config.getEncryptedDatanode(),
                                           config.getSecureDatanode(),
-                                          key, config.getCryptoBufferSize()));
+                                          key, config.getCryptoBufferSize(),
+                                          config.getDataProtection()));

doesn't match the current def of

void PipelineImpl::createBlockOutputStream(const Token & token, int64_t gs, bool recovery)

@bdrosen96

I'm confused. I only see one version of createBlockOutputStream for both versions.

@martindurant
Collaborator Author

Now I'm confused too - let me get back to you on that!

@martindurant
Collaborator Author

OK, so I don't know where that came from, but you still have things like this: https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/compare/ae2e980821066030f29d4d3ee1cafb3eab3fface...Pivotal-Data-Attic:a366a8b#diff-51b855d2105da1e2e82c3f52a31df6cbL247

where the current line has no reference to encryption (but maybe it should have!), but the diff adds protection after the encryption parameters.

@bdrosen96

I did not do a lot of the work for supporting the read short circuit stuff, because I don't use it (mostly due to restrictions) and because it seems to be unclear whether it is HDFS-2246 or HDFS-347 based.

@bdrosen96

I don't think it being missing in read short circuit should block it from merging, so long as it compiles and works at least as well as it did previously. I'm not even sure that local reads support encryption and secure mode and kms anyway.

@martindurant
Collaborator Author

These may be the missing commits: 993a2b6. I will try to build and test, although I only have non-secured HDFS right now.

@martindurant
Collaborator Author

It does compile, and the hdfs3 test suite passes.

@bdrosen96

So we just need:

bdrosen96/libhdfs3#18
bdrosen96/libhdfs3#19
bdrosen96/libhdfs3#20
bdrosen96/libhdfs3#21

and then that leaves only the KMS?

@martindurant
Collaborator Author

I think those are in now, or are you saying I missed something?

@bdrosen96

Ah, looks like I missed seeing those commits added. I do see one thing in SessionConfig.h in getLogSeverity. Is that something you did?

Also, if I get a chance tomorrow I'm going to try to make a private repo with some docker image stuff that can be used to run hadoop in various modes, as that might help with testing.

@martindurant
Collaborator Author

That would be very useful. Here is the image I use with a python dev environment, hdfs and yarn, but no security.
The logSeverity thing was already there in hawq: https://github.com/apache/incubator-hawq/blob/master/depends/libhdfs3/src/common/SessionConfig.h#L209

@bdrosen96

The issue with getLogSeverity is that the PR seems to have changed the ++i to just i, which I don't believe is correct.

@bdrosen96

Just sent you an invite to the repo I set up for this. I spent a couple of hours reworking and simplifying some existing testing framework that I use, so there might be some bugs. The script run_hadoop.sh in that repo should support secure and insecure modes and allow varying the rpc and data settings that are relevant to HDFS3. It will bring up a cluster which uses hostnames inside a private domain that can be accessed from the host machine, but might use slightly non-standard ports (i.e. 9000 vs 8020).

There are also docker images for HA insecure and HA secure, but the script does not currently support them. Adding support would likely not be too hard, but I don't think it is needed for most of the testing cases.

@martindurant
Collaborator Author

Is there any way I can turn on extra debug info to figure out the auth problem, i.e., why it is trying to connect as a proxy user?

@martindurant
Collaborator Author

According to the HDFS namenode logs, when you use the hdfs CLI, you get an auth:KERBEROS followed by a client auth:KERBEROS, both for the principal of the logged-in user. libhdfs3, as it stands, does a successful auth:KERBEROS, followed by an attempted auth:KERBEROS via auth:PROXY, where the user principal is being used in the proxy. This is not normally allowed; only service accounts (like hdfs itself) are allowed to act on behalf of other users.

@bdrosen96

Interesting. That should allow you to test possible fixes for this. We definitely do not want the principal to be in the proxy (not with the @realm). I had tried a simple Scala test case to see what the behavior was for the UGI code, etc., and to try to mimic this in C++, but that made things worse, not better.

@ghost

ghost commented Dec 8, 2018

Hi, I have built this version of libhdfs3 and used it to access hdfs, but when privacy is set on both client and server, I get the error log from the namenode below; it seems that, when doing handshake2, the client still selects the AUTH QOP.
After some debugging, I got some info in SaslClient.cpp; this is the relevant code:

std::string decoded = decode(copied_challenge.c_str(), copied_challenge.length(), true);
int qop = (int)decoded.c_str()[0];

This is the challenge received from the SASL server in handshake1; the first byte should be 4, but it is 96. I am wondering what the problem is. Is there some library I got wrong, or some other problem? Any hint would be very appreciated!

DEBUG org.apache.hadoop.ipc.Server: javax.security.sasl.SaslException: Problem with callback handler [Caused by javax.security.sasl.SaslException: Client selected unsupported protection: 1]
at com.sun.security.sasl.gsskerb.GssKrb5Server.doHandshake2(GssKrb5Server.java:331)
at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:161)
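(An editorial annotation, not from the repo: in SASL/GSSAPI - RFC 4752 - the unwrapped final server challenge is 4 bytes, where byte 0 is a bitmask of the offered security layers (1 = none, 2 = integrity, 4 = confidentiality) and bytes 1-3 are the max buffer size. A first byte of 96, i.e. 0x60, is the leading byte of a DER-framed GSS-API token, which suggests the challenge was inspected before being properly unwrapped/decoded. A hypothetical sanity check, assuming `decoded` holds the unwrapped token:)

#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: verify the server offered the QOP we want.
static void checkServerQop(const std::string & decoded, uint8_t wantedQop) {
    if (decoded.size() < 4)
        throw std::runtime_error("short SASL negotiation token");
    uint8_t offered = static_cast<uint8_t>(decoded[0]);  // security-layer bitmask
    if (!(offered & wantedQop))
        throw std::runtime_error("server does not offer the requested QOP");
}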

@bdrosen96

Did you also use the modified version of libgsasl: https://github.com/bdrosen96/libgsasl ?

@ghost

ghost commented Dec 9, 2018

Thank you, I will try it now. Currently I am using the official gsasl 1.8.0.

@ghost

ghost commented Dec 11, 2018

@bdrosen96 Hi, I have spent a lot of time building gsasl 1.8.1, tried this libhdfs3 again, and still get the same error. The hadoop version is 2.9.2, and hadoop.rpc.protection is set to privacy in core-site.xml and hdfs-client.xml. The first byte of the token received by the sasl server in doHandshake2 is still 1, which represents the auth method and causes the error. Is there anything else I am missing, like some other special library?

@ghost

ghost commented Dec 12, 2018

@bdrosen96 My fault, I hadn't installed gsasl 1.8.1 properly; after correcting that, it works fine. Thank you very much.

@martindurant
Collaborator Author

@Librago , please document exactly what you did so that it can be useful to others.

@ghost

ghost commented Dec 13, 2018

@martindurant Of course. I am testing the Apache HAWQ database, but the libhdfs3 client cannot work with the namenode when hadoop.rpc.protection is set to privacy, so I replaced it with this downstream version. There are a few problems I confronted which may be useful to others.

First, gsasl_step returned GSASL_GSSAPI_INIT_SEC_CONTEXT_ERROR and the namenode log had nothing about it: this was caused by a bad entry for 127.0.0.1 in /etc/hosts, which turned the Kerberos principal name from username@hostname into localhost@hostname, which is not registered in the KDC. After correcting the entry to "127.0.0.1 localhost", it works fine.

Second, the namenode reported "Client selected unsupported protection: 1", which I asked about above. This is because the official gsasl 1.8.0 does not support the privacy QOP, and this repo depends on libgsasl. I then changed the gsasl version as bdrosen96 suggested.

Third, libgsasl 1.8.1 depends on openssl 1.0.2, and my version is openssl 1.1.1, which is used in many other places. The encryption interfaces called by the digest-md5 encode and decode differ between these versions, so I changed those interfaces in libgsasl 1.8.1 and solved it. If you are fine with openssl 1.0.2, it is easy to avoid this.
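(An illustration of the typical change, not the actual libgsasl patch: OpenSSL 1.1.0 made EVP_CIPHER_CTX opaque, so code that stack-allocated contexts under 1.0.2 has to switch to the new/free API.)

#include <openssl/evp.h>
#include <openssl/opensslv.h>

/* Illustrative porting pattern only, not the real libgsasl diff. */
static void cipher_ctx_example(void) {
#if OPENSSL_VERSION_NUMBER < 0x10100000L
    EVP_CIPHER_CTX ctx_storage;                 /* public struct in 1.0.2 */
    EVP_CIPHER_CTX *ctx = &ctx_storage;
    EVP_CIPHER_CTX_init(ctx);
#else
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new(); /* opaque in 1.1.x */
#endif
    /* ... EVP_EncryptInit_ex / EVP_EncryptUpdate / EVP_EncryptFinal_ex ... */
#if OPENSSL_VERSION_NUMBER < 0x10100000L
    EVP_CIPHER_CTX_cleanup(ctx);
#else
    EVP_CIPHER_CTX_free(ctx);
#endif
}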

Last, there are some other things worth mentioning: parameters like dfs.block.access.token.enable, hadoop.rpc.protection and dfs.data.transfer.protection need to be set.
Also, this interface in hdfs.h

hdfsFS hdfsBuilderConnect(struct hdfsBuilder * bld, const char * effective_user=NULL);

uses default-parameter initialization, which is C++ style, and that stopped hawq from compiling it. Since I don't use impersonation, I just removed it.
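(A hedged alternative to deleting the parameter, assuming hdfs.h must stay consumable from plain C: guard the default argument so that only C++ callers see it.)

/* Illustrative sketch only: C++ callers keep the default argument,
   while plain C translation units see a normal prototype. */
#ifdef __cplusplus
hdfsFS hdfsBuilderConnect(struct hdfsBuilder * bld, const char * effective_user = NULL);
#else
hdfsFS hdfsBuilderConnect(struct hdfsBuilder * bld, const char * effective_user);
#endif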

@martindurant
Collaborator Author

So not simple, then... You may want to wrap this into a script somehow and post it.

@ghost

ghost commented Mar 26, 2019

Sorry, I didn't notice your last reply, and I have confronted another problem. When the parameter in hadoop's core-site.xml is set to authentication+privacy, this client cannot work; the call stack and error are below:

HdfsRpcException: RPC channel to "testcirpm33kerberosbasictest3589229-1:9000" got protocol mismatch: RPC channel cannot find pending call: id = -33.
@ Hdfs::Internal::RpcChannelImpl::getPendingCall(int)
@ Hdfs::Internal::RpcChannelImpl::readOneResponse(bool)
@ Hdfs::Internal::RpcChannelImpl::checkOneResponse()
@ Hdfs::Internal::RpcChannelImpl::invokeInternal(std::__1::shared_ptr<Hdfs::Internal::RpcRemoteCall>)
@ Hdfs::Internal::RpcChannelImpl::invoke(Hdfs::Internal::RpcCall const&)
@ Hdfs::Internal::NamenodeImpl::invoke(Hdfs::Internal::RpcCall const&)
@ Hdfs::Internal::NamenodeImpl::getFsStats()
@ Hdfs::Internal::NamenodeProxy::getFsStats()
@ Hdfs::Internal::FileSystemImpl::getFsStats()
@ Hdfs::Internal::FileSystemImpl::connect()
@ Hdfs::FileSystem::connect(char const*, char const*, char const*)
@ Hdfs::FileSystem::connect(char const*)

Do you know the reason? @martindurant @bdrosen96

@martindurant
Collaborator Author

Totally beyond me, sorry.

@bdrosen96

Possible values are authentication, integrity and privacy. authentication means authentication only and no integrity or privacy; integrity implies authentication and integrity are enabled; and privacy implies all of authentication, integrity and privacy are enabled.

So if you previously had privacy working, that should have included everything

@ghost

ghost commented Mar 28, 2019

Yes, thanks, it should work; this parameter should only affect the supported QOP choices offered by the SASL server. There must be some other problem.

@Qsimple

Qsimple commented Mar 27, 2021

Hi @bdrosen96 , I met the same issue as ghost, and found that the libhdfs3 at https://github.com/erikmuttersbach/libhdfs3 doesn't support hadoop.rpc.protection=privacy: it doesn't process the rpc response in the wrapped saslState.

@kgundamaraju

I have integrated Apache HAWQ LibHdfs3 with my application, and during testing I found that the KMS Client Provider in that repository doesn't support Kerberos-based Authentication. If I understand it right, this repo seems to have support for this specific use case. Can someone please confirm that this is indeed the case? Also, is there any plan to merge changes in this repo back into the Apache HAWQ repo?

@bdrosen96

I have not done anything with this stuff in several years, so I would have a hard time debugging things. There was an effort to get this merged into HAWQ back in 2017, but that seems to have stalled before everything got in, and I don't recall the current status or where it stalled out. I think some context may be in the comment history here - I'm not sure if it has links to the PRs that did not make it in.

@martindurant
Collaborator Author

@kgundamaraju : Kerberos authentication was indeed known to work in one of these variants, but I'm no longer sure which one. It's PRIVACY mode that was the tricky thing (see the discussion above) - and since you mention KMS, I assume that's actually what you'll want in the end. Can you use webHDFS perhaps?

It proved too hard to match all the use cases here, when the reference implementation is poorly documented java. Since HDFS is accessible via the Java JNI-based libhdfs, and this was exposed by pyarrow for python, usage of libhdfs3 dropped off, and that's why this repo is dormant. Even fsspec no longer uses hdfs3, which was the very first fsspec-style implementation.

@kgundamaraju

My sincere thanks to @bdrosen96 and @martindurant for your prompt responses. @martindurant , as you have correctly pointed out, the use case that I'd like to support is for my application to communicate with the Hadoop KMS with Kerberos Authentication. This, if I understand it correctly, would require LibHdfs3 to support Kerberos HTTP SPNEGO Authentication, which it currently doesn't, at least not in the Apache HAWQ repository that I downloaded LibHdfs3 from. I also entered a defect in the Apache HAWQ JIRA database (https://issues.apache.org/jira/browse/HAWQ-1791?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17318956#comment-17318956) and I did get confirmation that there is indeed no support for this specific use case in the LibHdfs3 in the Apache HAWQ repo.

The reason why I am trying to use LibHdfs3 is that I found it to be much more performant than the Java JNI-based libHdfs. Before I give up on this effort, could you please comment on whether Kerberos HTTP SPNEGO Authentication was ever added to this downstream repo? If yes, then I would like to spend some time trying to understand how I can merge these changes, as I have already invested a lot of time in integrating LibHdfs3 with my application and testing Kerberos support to communicate with the Hadoop cluster itself. The only use case that is currently not working for me is communication with the Hadoop KMS with Kerberos Authentication.

Thanks in advance.

@martindurant
Collaborator Author

It's been too long, I don't remember for sure. Yes, Kerberos authentication worked, but I suspect it was not HTTP SPNEGO (use that for HTTP, like webHDFS!), but direct connections with kinit, etc. Authentication and secure encrypted communication are not the same thing.

From my vantage, libhdfs3 was only used with python hdfs3, so that was probably not the performance boost you were after anyway.

@bdrosen96

HTTP SPNEGO was used for communication with KMS I believe. I added it to https://github.com/bdrosen96/libhdfs3 in this PR primarily: https://github.com/bdrosen96/libhdfs3/pull/15/files

I do not recall the current status of that code with respect to bringing it into HAWQ, though. Hopefully the info in the linked PR (as well as the other PRs in the linked repo which set the stage for it or which fixed bugs after it) will be enough to help you make sense of what might be required.

@kgundamaraju

Thanks again @martindurant and @bdrosen96 . As @bdrosen96 has stated, the use case I was interested in also required making the HTTP SPNEGO protocol, which is used between the HDFS client and the Hadoop KMS, work as well. @bdrosen96 , I will download this repo and try to figure out what portion of this code has been merged into the Apache HAWQ repo and which portion is missing. Many thanks for pointing me to this repo as well as the PR.

@michael1589

michael1589 commented Apr 20, 2022

@bdrosen96, great appreciation for your work! libhdfs3 in clickhouse encountered the same problem (it doesn't support hadoop.rpc.protection=privacy). Could you kindly contribute your code to https://github.com/ClickHouse/libhdfs3.git, or allow me to merge your code into ClickHouse/libhdfs3? There are several steps according to my understanding: 1. update clickhouse's libgsasl to match yours; 2. cherry-pick your PRs into clickhouse's libhdfs3; 3. modify clickhouse's cmake. Step 2 may be a challenge for me; I'm not sure how many PRs should be merged into clickhouse/libhdfs3. I think those PRs should leave out the KMS-related ones. Am I right?

@siwen-yu

Sorry, I didn't notice your last reply, and I have confronted another problem. When the parameter in hadoop's core-site.xml is set to authentication+privacy, this client cannot work; the call stack and error are below:

HdfsRpcException: RPC channel to "testcirpm33kerberosbasictest3589229-1:9000" got protocol mismatch: RPC channel cannot find pending call: id = -33.
@ Hdfs::Internal::RpcChannelImpl::getPendingCall(int)
@ Hdfs::Internal::RpcChannelImpl::readOneResponse(bool)
@ Hdfs::Internal::RpcChannelImpl::checkOneResponse()
@ Hdfs::Internal::RpcChannelImpl::invokeInternal(std::__1::shared_ptr<Hdfs::Internal::RpcRemoteCall>)
@ Hdfs::Internal::RpcChannelImpl::invoke(Hdfs::Internal::RpcCall const&)
@ Hdfs::Internal::NamenodeImpl::invoke(Hdfs::Internal::RpcCall const&)
@ Hdfs::Internal::NamenodeImpl::getFsStats()
@ Hdfs::Internal::NamenodeProxy::getFsStats()
@ Hdfs::Internal::FileSystemImpl::getFsStats()
@ Hdfs::Internal::FileSystemImpl::connect()
@ Hdfs::FileSystem::connect(char const*, char const*, char const*)
@ Hdfs::FileSystem::connect(char const*)

Do you know the reason? @martindurant @bdrosen96

Hi ghost! I've also encountered the same error in clickhouse/libhdfs3. Just as you described, I have also set 'hadoop.rpc.protection' in the 'core-site.xml' configuration file of my Hadoop environment to 'authentication+privacy,' but I don't actually know what it means. So, could you please tell me how I can quickly resolve this issue?

@martindurant
Collaborator Author

So, could you please tell me how I can quickly resolve this issue?

Unfortunately no, as you can see from the long thread. The short answer may be: use pyarrow, which is now the default hdfs backend within fsspec.

@siwen-yu

So, could you please tell me how I can quickly resolve this issue?

Unfortunately no, as you can see from the long thread. The short answer may be: use pyarrow, which is now the default hdfs backend within fsspec.

Are you suggesting that I should give up using ClickHouse to access HDFS?

@martindurant
Collaborator Author

I can't speak for Clickhouse, but this repo no longer aspires to allow "privacy" mode and is no longer being developed.

@siwen-yu

I can't speak for Clickhouse, but this repo no longer aspires to allow "privacy" mode and is no longer being developed.

Thank you very much for your patient response. I believe I should focus more on the issue of connecting ClickHouse with HDFS rather than the issue with this repository.
