-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zenfs gc sekhar feature #4
base: master
Are you sure you want to change the base?
Conversation
ext_to_zone_map
UpdateMetadataAfterMerge() function
friend class ZenFSGCWorker;
Indentation
There was a typo
Fixed Misplaced unlock
No need for ext_to_zone_map, removing it
Changed return type from void to IOStatus
Added MoveValidDataToNewDestZone() function
…rn type and removed ext_to_zone_map related code. Modified UpdateMetadataAfterMerge() function to support IOStatus return type and removed ext_to_zone_map related code.
Changed void from IOStatus
Missing variable IOStatus
comment
Added ReadExtent() function declaration
Added definition of ReadExtent()
Modified MoveValidDataToNewDestZone()
Type cast to char*
TODO comment
Modified MoveValidDataToNewDestZone() and added helper ReadExtent()
fixed Slice param
@soumendus I have updated your code and formatted it. Please run |
int dont_read = 0; | ||
|
||
// Sort the Extent list in decreasing order. | ||
std::sort(extent_list.begin(), extent_list.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is okay to change the extent order of a file? File content is store sequentially as its order of extent. Say that we have three extents, 1: 1KB, 2: 1KB, 3: 2KB. When we read from the 2048 byte, we should begin at extent 3. After we sort the extents, the file content is changed. I have mistaken this extent list for file extent list. So I have another question: How to construct the contents of ZenFSGCWorker
? Do we have any plan on this?
|
||
// Store the new starting position for the extent | ||
// which will be later made persistent. | ||
new_start = zone_dst->wp_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there could be multiple threads appending data to a zone, how could we ensure that new_start
is really at zone_dst->wp_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there could be multiple threads appending data to a zone, how could we ensure that
new_start
is really atzone_dst->wp_
?
Yes that's a valid concern. We need to figure out a way to serialize that. I did not find a way to lock the zone and operate on it or maybe there is a way. Perhaps we need a bit more thought on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. As far as I can see, in ZenFS, there is only one writable file in one zone. AllocateZone
will mark a zone as being written
after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone in dst_zone_list
could solve this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. As far as I can see, in ZenFS, there is only one writable file in one zone.
AllocateZone
will mark a zone asbeing written
after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone indst_zone_list
could solve this issue.
Hmm. OK, we have to add that logic in the function GetDestZoneToMoveValidData() and the function should push zones to the dst_zone_list member variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. As far as I can see, in ZenFS, there is only one writable file in one zone.
AllocateZone
will mark a zone asbeing written
after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone indst_zone_list
could solve this issue.Hmm. OK, we have to add that logic in the function GetDestZoneToMoveValidData() and the function should push zones to the dst_zone_list member variable.
- For getting the list of destination zones, in the function GetDestZoneToMoveValidData(), we may need to call AllocateZone() with open_for_write_ set to false or write another version of AllocateZone() ex. AllocateZoneForGC() which sets that flag to false. Then in the function MoveValidDataToNewDestinationZone(), after we fetch the current parameters of the destination zones like wp_ (write pointer), we can set the flag open_for_write_ back to true and then start appending to that zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in https://github.com/bzbd/zenfs/blob/master/fs/zbd_zenfs.cc#L518, allocate a zone without setting open_for_write
to true may lead to some issues. The zone might be seen in "open but not allocated" state, and will be allocated to other files. Therefore, I don't think this would work.
ZoneFile* file_moved; | ||
file_moved = *zone_file_it; | ||
|
||
// What if the file is deleted before coming here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should lock the metadata and scan if there is any new records before updating metadata. This may require more careful design work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOStatus s;
is never assigned or modified throughout this function...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should lock the metadata and scan if there is any new records before updating metadata. This may require more careful design work.
SyncFileMetadata() function calls the PersistRecord() function which holds lock before updating the metadata so I did not use any lock. Maybe you are right. I was thinking maybe we can remove the UpdateMetadataAfterMerge() completely and somehow figure out to update the metadata changes in the MoveValidDataToNewDestZone() function itself. It would have been less code and easier because why to have one more function. But to do that we need to get the reference to the files whose extents have been moved. From the files, its easier to get the reference to the extents but not sure how to get the reference of the files from the extents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. If possible, please update the design doc for new information.
std::vector<ZoneFile*> files_moved_to_dst_zone; | ||
|
||
std::atomic<uint64_t> | ||
total_residue_; // Is atomic necessary since only one thread at one time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that this variable is not used in this patch?
// will have a new starting position. No need to | ||
// change the length of the extent as it will be the | ||
// same. | ||
ext->start_ = new_start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same lock issue here. When we are changing extent information, it is possible that some other operating is in the process of reading where the extent is. We need to apply some kind of lock throughout the GC process.
|
||
// The current zone cannot fit this extent because of lack | ||
// of space, so get the next zone from the dst_zone_list. | ||
zone_it++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider the case that there is not enough destination zone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider the case that there is not enough destination zone?
Yes, that possibility could arise. The function GetDestZoneToMoveValidData() should do that math and figure out how many zones are needed to fit all the extents(cold valid data) and called AllocateZone() for moving the valid cold data. If the desired amount of zones to fit all the cold valid data( or extents ) is not possible to allocate then it will send IOStatus error that "Not enough zones". In that case, the GC cannot progress. Maybe a slight design change can be done that we try to move only that much valid data, for which we have space(zones). Since this is a periodic background reclaim, in the next cycles, the remaining zones could be reclaimed. Just a thought, but this could make the implementation intricate.
ZoneFile* file_moved; | ||
file_moved = *zone_file_it; | ||
|
||
// What if the file is deleted before coming here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. If possible, please update the design doc for new information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good.
I've seen an issue in RocksDB facebook/rocksdb#8504 about |
I saw, 's == IOStatus::NoSpace()' has been used at several other places in the ZenFS. So it might be OK. For instance the following code in the file fs_zenfs.cc, 's == IOStatus::NoSpace()' has been used. |
No description provided.