Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap: add support for devdax and block devices #189

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

uarif1
Copy link

@uarif1 uarif1 commented Feb 14, 2022

metadata.len() returns 0 when trying to mmap files such as block devices
and devdax (character devices) that are not regular files, hence returning
MappingPastEof even if the mapping would fit at the provided file_offset.

This patch adds support for checking the size of devdax and block devices,
and returns a new error, InvalidFileType, if the mmap being created is not
for a regular file, block or devdax device, or if the size the devices
couldn't be found in sysfs.

Usecase:
Devdax and block devices can be used in cloud-hypervisor as memory-zone.
MmapRegion::build from vm-memory is called while creating a GuestRegionMmap
for the VM memory-zone.

Reviewed-by: Fam Zheng [email protected]
Signed-off-by: Usama Arif [email protected]

@uarif1 uarif1 force-pushed the mmap_devdax_block branch 3 times, most recently from ee29eed to b32e080 Compare February 14, 2022 14:57
metadata.len() returns 0 when trying to mmap files such as block
devices and devdax (character devices) that are not regular files,
hence returning MappingPastEof even if the mapping would fit at the
provided file_offset.

This patch adds support for checking the size of devdax and block
devices, and returns a new error, InvalidFileType, if the mmap being
created is not for a regular file, block or devdax device, or if the
size the devices couldn't be found in sysfs.

Usecase:
Devdax and block devices can be used in cloud-hypervisor as
memory-zone. MmapRegion::build from vm-memory is called while
creating a GuestRegionMmap for the VM memory-zone.

Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
@jiangliu
Copy link
Member

The implementation is linux specific and not generic enough.
For devdax device, it return 0 for metadata(). How about seek()? Suppose file.seek(SeekFrom::Start(mmap_size)) should work as expect and is generic enough:)

@uarif1
Copy link
Author

uarif1 commented Feb 15, 2022

Hi,

Thanks for the reply!

check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue?

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work.

Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek
"A seek beyond the end of a stream is allowed, but behavior is defined by the implementation."
so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size?

Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek.

I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better?

Thanks,
Usama

@jiangliu
Copy link
Member

Hi,

Thanks for the reply!

check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue?

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work.

Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek "A seek beyond the end of a stream is allowed, but behavior is defined by the implementation." so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size?

Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek.

I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better?

Thanks, Usama

I means the code between line 128-line 160 are linux/block specific. Actually we have encountered similar case, but the device is a special memory device instead of block/dax device. So I hope we could handle them in a generic way.
Something like seek() and read() or pread() should work.

@uarif1
Copy link
Author

uarif1 commented Feb 15, 2022

Hi,
Thanks for the reply!
check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue?
For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work.
Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek "A seek beyond the end of a stream is allowed, but behavior is defined by the implementation." so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size?
Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek.
I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better?
Thanks, Usama

I means the code between line 128-line 160 are linux/block specific.

Yes but this should not be an issue as this code is part of calculate_file_size which is just called by check_file_offset which is itself linux specific. check_file_offset is only called in the mmap build in mmap_unix so it should be ok being linux specific?

Actually we have encountered similar case, but the device is a special memory device instead of block/dax device. So I hope >we could handle them in a generic way. Something like seek() and read() or pread() should work.

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so this implementation would not work. Trying seek will only give a nop and that would cause the seek to always pass, no matter the size of the devdax.

Also even if seek is supported in other devices, there could be other issues as mentioned above as seek beyond the end of a stream is allowed, but behavior is defined by the implementation. Seek would probably also be slower than just reading from sysfs?

@famz
Copy link

famz commented Feb 15, 2022

Since the later mmap() will return an error anyway if we get a wrong size, maybe we can just do a best effort check here for regular files or maybe block devices only (with BLKGETSIZE64 ioctl), and ignore char devices and defer it to the actual mmap()? IIUC there isn't much user experience difference whether or not we do this size pre-check, is there?

@uarif1
Copy link
Author

uarif1 commented Feb 15, 2022

If the BLKGETSIZE64 ioctl is linux specific as well and wont work in other unix OS, maybe we just do below and let mmap return an error later if the devdax/block device size is not correct as suggested by Fam?

`
pub fn check_file_offset(
file_offset: &FileOffset,
size: usize,
) -> result::Result<(), MmapRegionError> {
let file = file_offset.file();
let start = file_offset.start();

if let Some(end) = start.checked_add(size as u64) {
    if let Ok(metadata) = file.metadata() {
        if metadata.is_file() {
            if metadata.len() < end {
                return Err(MmapRegionError::MappingPastEof);
            }
        }
    }
} else {
    return Err(MmapRegionError::InvalidOffsetLength);
}

Ok(())

}

`

@uarif1
Copy link
Author

uarif1 commented Feb 15, 2022

Or we could also do something like below as well, the file.seek(SeekFrom::Start((size as u64))) will always pass for devdax as it doesnt support seek, but maybe for other devices it is better. I guess this is what you meant @jiangliu ?

diff --git a/src/mmap.rs b/src/mmap.rs
index 3f83516..c342534 100644
--- a/src/mmap.rs
+++ b/src/mmap.rs
@@ -15,7 +15,9 @@
 use std::borrow::Borrow;
 use std::error;
 use std::fmt;
-use std::io::{Read, Write};
+use std::io::{Read, Seek, SeekFrom, Write};
+#[cfg(unix)]
+use std::os::unix::fs::FileTypeExt;
 use std::ops::Deref;
 use std::result;
 use std::sync::atomic::Ordering;
@@ -116,13 +118,28 @@ pub fn check_file_offset(
     file_offset: &FileOffset,
     size: usize,
 ) -> result::Result<(), MmapRegionError> {
-    let file = file_offset.file();
+    let mut file = file_offset.file();
     let start = file_offset.start();
 
     if let Some(end) = start.checked_add(size as u64) {
         if let Ok(metadata) = file.metadata() {
-            if metadata.len() < end {
-                return Err(MmapRegionError::MappingPastEof);
+            let file_type = metadata.file_type();
+
+            if file_type.is_file() {
+                if metadata.len() < end {
+                    return Err(MmapRegionError::MappingPastEof);
+                }
+            } else if file_type.is_block_device() || file_type.is_char_device() {
+                // if seek is not supported by the device, this will always return Ok
+                if let Err(_) = file.seek(SeekFrom::Start((size as u64))) {
+                    return Err(MmapRegionError::MappingPastEof);
+                }
+                // Rewind file to the start after seek.
+                if let Err(_) = file.rewind() {
+                    return Err(MmapRegionError::InvalidFileType);
+                }
+            } else {
+                return Err(MmapRegionError::InvalidFileType);
             }
         }
     } else {
diff --git a/src/mmap_unix.rs b/src/mmap_unix.rs
index 2f1a744..6779aaf 100644
--- a/src/mmap_unix.rs
+++ b/src/mmap_unix.rs
@@ -29,6 +29,8 @@ pub enum Error {
     InvalidOffsetLength,
     /// The specified pointer to the mapping is not page-aligned.
     InvalidPointer,
+    /// The specified file type is invalid.
+    InvalidFileType,
     /// The specified size for the region is not a multiple of the page size.
     InvalidSize,
     /// The forbidden `MAP_FIXED` flag was specified.
@@ -56,6 +58,7 @@ impl fmt::Display for Error {
                 f,
                 "The specified size for the region is not a multiple of the page size",
             ),
+            Error::InvalidFileType => write!(f, "The specified file type is invalid"),
             Error::MapFixed => write!(f, "The forbidden `MAP_FIXED` flag was specified"),
             Error::MappingOverlap => write!(
                 f,

@famz
Copy link

famz commented Feb 18, 2022

@jiangliu any suggestions? Thanks.

@uarif1
Copy link
Author

uarif1 commented Feb 22, 2022

Hi,

To summarize i believe there are 2 options available:

  • Wrap the current metadata.len() test in "if file_type.is_file()" and do no check for block and devdax devices. mmap will fail later for these devices if they are not of the right size so it will still be ok.
  • Wrap the current metadata.len() test in "if file_type.is_file()" while for block and character devices, seek and check if the mmap size is valid as suggested by @jiangliu , this might pass for devices that implement seek file operation, but thats still ok as mmap will fail if the device size size not correct. This is in the diff in mmap: add support for devdax and block devices #189 (comment)

I am happy with either of these approaches and can change the commit to it. Which one do you prefer @jiangliu ?
Happy to try other ideas as well.

@uarif1
Copy link
Author

uarif1 commented Mar 14, 2022

Hi, I just wanted to check if there were any opinions on the above 2 options or any other ideas? Thanks! @jiangliu @bonzini @alexandruag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants