Speed up syncing to S3 #34

Open
gtaylor opened this issue Jul 8, 2011 · 8 comments

Comments

@gtaylor
Contributor

gtaylor commented Jul 8, 2011

Perhaps investigate using multiprocessing to speed up syncing to S3? For projects with very large media sets to sync, this would be great.

@jcarbaugh
Contributor

This is a great idea! My first attempt resulted in a deluge of pickling errors so I'll have to play around with the best way to accomplish this. Might result in a rewrite of the S3 client.
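
For anyone hitting the same wall: multiprocessing pickles whatever it hands to a worker, so an object holding a live connection usually cannot cross the process boundary. A minimal sketch of that failure mode, assuming a hypothetical `Client` class in which a lock stands in for an open connection (this is not mediasync's actual client):

```python
import pickle
import threading


class Client:
    """Hypothetical stand-in for an S3 client; not mediasync's real class."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._conn = None

    def open(self):
        # A lock stands in for the live connection here: like a socket,
        # it cannot be pickled.
        self._conn = threading.Lock()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def worker_settings(client):
    """Extract only the plain, picklable settings a worker would need."""
    return {"bucket": client.bucket}
```

Passing plain settings such as `worker_settings(client)` to the workers, and letting each worker open its own connection, is one way to sidestep the error.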

@robhudson
Contributor

+1 :)

@jcarbaugh
Contributor

Okay! I just committed a new s3async backend on the 3.0dev branch. I'd appreciate it if you could try it out and see how it works. It defaults to 4 workers, but you can change it by setting AWS_ASYNC_WORKERS in mediasync settings. Sync time for 262 files went from 35.648 seconds on the s3 backend to 7.550 seconds on the s3async backend with 8 workers. And just for fun, 16 workers synced in 4.770 seconds.

This is my first time using multiprocessing so please let me know if it's doing something weird that would slow it down.
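
A settings fragment for trying this out might look roughly like the following. Only the `AWS_ASYNC_WORKERS` name comes from the comment above; the `MEDIASYNC` dict layout and the backend module path are assumptions and should be checked against the 3.0dev branch:

```python
# Django settings sketch. The backend path below is an assumption,
# not confirmed by this thread.
MEDIASYNC = {
    'BACKEND': 'mediasync.backends.s3async',  # assumed module path
    'AWS_ASYNC_WORKERS': 8,  # the thread says the default is 4
}
```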

@gtaylor
Contributor Author

gtaylor commented Jul 11, 2011

Awesome, we'll get a few of our larger projects trying this tomorrow and will let you know how it goes.

@robhudson
Contributor

In my testing I'm not seeing a lot of speedup... 4 workers was 67s and I tried 8, 32, then 50 and never got below 54s. Most of what's there is .js files so I'm wondering if the minifiers aren't part of the pool and everything stalls while minifying? At least that's what the output looks like -- it blazes past the jpgs and pngs and slows down on the css and js files.

@jcarbaugh
Contributor

Well that's disappointing. Are you using the YUI compressor? It seems as if the way the processor is implemented now, the JVM is going to have to be restarted for each JavaScript file. Could that be adding the overhead that would slow down the overall syncing process?

There are very few JS and CSS files in the project that I tested with so the speedup was fairly significant.

The implementation right now is a bit hacky. Once a backend client has been opened, it can no longer be pickled (issue with the boto S3 connection object) so a new client has to be created in each worker process. This should have minimal overhead, but each worker creates a new S3 client to sync just one file. I'd like to find a way around this.

@robhudson
Contributor

Yes, using YUI Compressor. I think I may be the odd case (having more JS files than images)? I think I was expecting to see multiple JS files being crunched at the same time but it seems to just do one at a time? When I have time I may dig in a little more and try to help.

@gtaylor
Contributor Author

gtaylor commented Jul 14, 2011

We just tried the branch out and our gunicorn+eventlet project hung up on any page requests. Not sure what's going on, will have to see what else changed when I have some time.
