-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for copy local from file #128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting PR. My questions are more to help my understanding of the code. I don't see any problems and will go ahead and approve.
b35ef8c
to
d1ff86c
Compare
fileContents.push(this.reader.int64LE()) | ||
} | ||
} else { | ||
fileContents = this.reader.string(fileLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you read the whole file content from the wire into memory, and then write them into disk?
There is a risk of out-of-memory. python client does this in a loop: read a chunk of content from the wire and write it into the disk, then read the next chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will modify to read file contents in chunks for safety
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like the server already sends Writefile messages containing rejects/exceptions in chunks of 8Kb. If it's larger than 8Kb there are multiple Writefile messages sent in a row. That seems small enough to me, only 8Kb needs to be in memory at a time before writing to disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the python client reading chunks smaller than 8kb? Is that necessary?
We might want to update the protocol doc with that information, as it was a surprise to me too. I only discovered it by using the dc_client_server_messages table and seeing a string of WriteFile messages from the backend in a row. Each 8Kb in size except for the last which was just contained the remaining data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed an update that appends instead of writes over rejection/exception files to handle when there are multiple writeFile messages. I added a test as well to show that this is working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good observation, I never know about this 8kb setting. I'll check with python client and update the protocol doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great. I did find this line in the guide:
"The backend should then send zero or more WriteFile messages for rejected row descriptions and an EndOfBatchResponse message for acknowledgement of data loading from this file."
So there is a small hint that there could be multiple messages but we probably could elaborate that, and possibly the diagram as well if you wanted to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is originally indicating if your set both rejection and exception files, there are 2 WriteFile messages sent for each file.
I verified that rejects/exceptions are sent in chunks of 8Kb. I also found that if the command uses the RETURNREJECTED parameters, there is no 8kb limitation.
My question is for extend_copy_reject_info=True
(https://integrators.vertica.com/data-protocols-formats/frontend-backend/#writefile-o), is there a content size limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We modify contents but not the mechanism for message transfer to the frontend when extend_copy_reject_info=True, so I expect there is an 8kb limit. I would operate under that assumption, and I will test and verify this for the protocol doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we have not discussed this elsewhere ( I can't remember anymore ) I have verified that the extend_copy_reject_info property has no impact on the size of the writefile message, it is still a max of 8Kb
*Added backend message parsers and frontend message serializers for all copy messages
*Added getRejectedRows() returning a private variable that is otherwise hidden from normal logging of a result object
*Added 64bit little-endian parser required for copy protocol messages
*Added handlers for all backend messages received, not all have actions associated with them for copy from local file
*Removed postgres replication protocol message (should have been removed awhile ago, but isn't hurting anything)
*Fixed ND test failure in query-tests.js
*Added test cases for copy from local file
*Other general refactoring