Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved detection of files and directories in GoogleStorageMixin #129

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

jesteria
Copy link

@jesteria jesteria commented May 7, 2019

filebrowser_safe performed oddly against Google Cloud Storage,
inappropriately considering files to be directories and vice-versa.

the changeset handles edge cases in isfile() and correctly iterates
the file+dir listing returned by listdir().

RandomJo and others added 8 commits January 30, 2019 15:58
The listdir function returns two values, one for directory names and one
for file names. These are lists of strings and thus do not have a delete
method. Instead of trying to delete each array, we loop through them to
build keys that we can delete.
…or_gcs

Fixes issues with old methods in GoogleStorageMixin
Fix rmtree method of S3BotoStorageMixin
…ication

Fix cs specification that is breaking compilemessages command
Fix issues with old methods in GoogleStorageMixin
filebrowser_safe performed oddly against Google Cloud Storage,
inappropriately considering files to be directories and vice-versa.

the changeset handles edge cases in ``isfile()`` and correctly iterates
the file+dir listing returned by ``listdir()``.
@jesteria
Copy link
Author

jesteria commented May 7, 2019

(Build errors appear unrelated.)

Copy link
Author

@jesteria jesteria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned this up a bit and left explanatory comments.

@@ -125,7 +126,7 @@ def rmtree(self, name):
class GoogleStorageMixin(StorageMixin):

def isfile(self, name):
return self.exists(name)
return bool(name) and self.exists(name)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by relying only on exists(), the method isfile() inappropriately considered the bucket root to be a file.

filebrowser_safe/storage.py Outdated Show resolved Hide resolved

# Check whether the iterator is empty
# Check for contents
for item in dirlist:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dirlist returned by listdir() is a tuple of (dirs, files), so the iteration check was previously not testing anything about the listing -- (it would always return True).

Copy link
Author

@jesteria jesteria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…And on a second pass-through, I rediscovered issues with file/dir detection with GCS, and worked to ameliorate the huge performance hit that the filebrowser takes with (this) cloud storage.

This discussion reveals some of the issue with detecting directories on GCS. The last couple comments in particular refer to the same issue I ran into – that even extant blobs might in fact just be fakes inserted by the web console, to look like "folders."

And the performance tweaks brought down browser directory listing times significantly – for example, one directory that took about 5 seconds for browse to return now takes 2.3 seconds. (This is still much slower than before. And another directory still takes 7 seconds, which is probably going to be a problem. But, it's better than it was.)

@@ -47,7 +47,7 @@ def __len__(self):

@cached_property
def filetype(self):
if self.is_folder:
if not default_storage.type_checks_slow and self.is_folder:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the browse view performs certain checks – which with GCS are now network calls – over and over, for each object in the directory listing. this one in particular seemed unnecessary, (especially when GCS's concept of a "folder" is so uncertain as it is).

@@ -98,7 +98,9 @@ def browse(request):
raise ImproperlyConfigured(_("Error finding Upload-Folder. Maybe it does not exist?"))
redirect_url = reverse("fb_browse") + query_helper(query, "", "dir")
return HttpResponseRedirect(redirect_url)
abs_path = os.path.join(get_directory(), path)

base_dir = get_directory()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_directory() also requires a network call, and simply holding onto its result, here, made a significant difference.

@jerivas jerivas force-pushed the master branch 2 times, most recently from e04a9c1 to c7eceb0 Compare August 12, 2020 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants