Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for copy local from file #128

Merged
merged 19 commits into from
Jan 19, 2024

Conversation

DMickens
Copy link
Collaborator

@DMickens DMickens commented Jan 9, 2024

*Added backend message parsers and frontend message serializers for all copy messages
*Added getRejectedRows() returning a private variable that is otherwise hidden from normal logging of a result object
*Added 64bit little-endian parser required for copy protocol messages
*Added handlers for all backend messages received, not all have actions associated with them for copy from local file
*Removed postgres replication protocol message (should have been removed awhile ago, but isn't hurting anything)
*Fixed ND test failure in query-tests.js
*Added test cases for copy from local file
*Other general refactoring

@CLAassistant
Copy link

CLAassistant commented Jan 9, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@williammatherjones williammatherjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting PR. My questions are more to help my understanding of the code. I don't see any problems and will go ahead and approve.

@DMickens DMickens force-pushed the support-for-copy-local branch from b35ef8c to d1ff86c Compare January 11, 2024 23:22
fileContents.push(this.reader.int64LE())
}
} else {
fileContents = this.reader.string(fileLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you read the whole file content from the wire into memory, and then write them into disk?
There is a risk of out-of-memory. python client does this in a loop: read a chunk of content from the wire and write it into the disk, then read the next chunk.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will modify to read file contents in chunks for safety

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like the server already sends Writefile messages containing rejects/exceptions in chunks of 8Kb. If it's larger than 8Kb there are multiple Writefile messages sent in a row. That seems small enough to me, only 8Kb needs to be in memory at a time before writing to disk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the python client reading chunks smaller than 8kb? Is that necessary?

We might want to update the protocol doc with that information, as it was a surprise to me too. I only discovered it by using the dc_client_server_messages table and seeing a string of WriteFile messages from the backend in a row. Each 8Kb in size except for the last which was just contained the remaining data

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed an update that appends instead of writes over rejection/exception files to handle when there are multiple writeFile messages. I added a test as well to show that this is working

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation, I never know about this 8kb setting. I'll check with python client and update the protocol doc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great. I did find this line in the guide:

"The backend should then send zero or more WriteFile messages for rejected row descriptions and an EndOfBatchResponse message for acknowledgement of data loading from this file."

So there is a small hint that there could be multiple messages but we probably could elaborate that, and possibly the diagram as well if you wanted to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is originally indicating if your set both rejection and exception files, there are 2 WriteFile messages sent for each file.

I verified that rejects/exceptions are sent in chunks of 8Kb. I also found that if the command uses the RETURNREJECTED parameters, there is no 8kb limitation.

My question is for extend_copy_reject_info=True (https://integrators.vertica.com/data-protocols-formats/frontend-backend/#writefile-o), is there a content size limitation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We modify contents but not the mechanism for message transfer to the frontend when extend_copy_reject_info=True, so I expect there is an 8kb limit. I would operate under that assumption, and I will test and verify this for the protocol doc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we have not discussed this elsewhere ( I can't remember anymore ) I have verified that the extend_copy_reject_info property has no impact on the size of the writefile message, it is still a max of 8Kb

@DMickens DMickens merged commit 12438af into vertica:master Jan 19, 2024
7 checks passed
@DMickens DMickens linked an issue Jan 30, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is copy local supported?
4 participants