Scipy lambda layer for 3.9 and 3.10 #360
If you're asking about the official AWS layer, I don't really know. We can try to add SciPy for 3.10 here, but we may run into the size limit (in MB), which is a hard limit that can't be worked around. |
SciPy wheels are roughly 30-40 MB in size lately: https://github.com/scipy/scipy/releases/tag/v1.11.4 . Does that seem like too much? I would like to see if I can help out with this issue. As a regular SciPy contributor, I am familiar with the scipy tooling, and I use Lambda at my day job, but I am pretty new to Lambda layer creation. Do you have old scripts for SciPy still lying around? |
If someone would make a pull request to add these packages, then I'll merge them and automatically build :) |
I tried building SciPy for Lambda, but currently its size exceeds what Lambda accepts. Lambda has a limit of 50 MB, and SciPy's size is above that (~57 MB). Note this is the result of a pip install scipy ... which includes not just SciPy but numpy as well. I will see if we can remove the cache files to reduce the size, but at the moment this is the size :( |
I will experiment with removing the pycache directories, versus keeping them, to see what happens. |
IIRC the size limit is 250 MB unzipped, rather than 50 MB on upload. You can significantly cut down the size by deleting all the tests directories. It used to be possible to get numpy/scipy/pandas in a single layer. I'd be curious what the status is now. |
Thanks, I'll check and see if it's possible. But there's a lot of bespoke effort involved that may be unsustainable. The Lambda limit is 50 MB zipped, and currently the total zipped size is bigger than that :(. |
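Both limits are easy to check locally before attempting an upload; a small sketch, assuming the packages were installed into a python/ directory as in the build script later in this thread:
# size of the unpacked layer contents (compare against the 250 MB unzipped limit)
du -sh python
# size of the zipped artifact (compare against the ~50 MB direct-upload limit)
zip -r9 -q layer.zip python
du -h layer.zip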
I am also interested in a scipy layer for 3.10+, and can't find a workaround for the size limit. I am not sure if you already do this, but running something like … |
Friendly ping: was there any progress here? For the custom removal of code, is it possible to automatically inject such package-specific code into the whole Terraform build script? |
If someone could modify the build function, that'd be much appreciated :). I think for now we can remove all pycache files to save space; that may help. |
Not sure if this would help at all, but this saved a lot of space when building the layer: … |
Tested and it works. I also added … This is some of the most hilarious black magic I've ever seen. |
Wow. I need to find some way to automate this. What does --no-compile do? |
@keithrozario we have just implemented proper support for this in NumPy, via "install tags" in the build system. Here is how to use it: numpy/numpy#26289 (comment). I'm planning to do the same for SciPy. It would come down to adding …
That together should make all this a one-liner. It should work for NumPy now. |
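For context, a rough sketch of what that one-liner could look like for a from-source build with meson-python; the config-settings syntax (install-args) and the tag names below are assumptions based on the linked NumPy discussion, not verified here:
# build numpy from source and skip test files via Meson install tags
# (assumed flags; needs pip >= 23.1 for -C/--config-settings and a working compiler toolchain)
pip install numpy --no-binary numpy --no-compile \
    -Cinstall-args="--tags=runtime,python-runtime" \
    -t python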
It will not precompile Python code into bytecode during the install process. But the test suites are what consume most of the megabytes; the bytecode takes just a few megabytes at most. My approach is summed up in this script:
# install CPython (cp) wheels only, without bytecode
pip install numpy pandas scipy --no-compile --implementation cp -t python
# remove all dist-info (pip installs it under python/)
rm -r python/*.dist-info
# delete all tests directories
find . | grep -E "/tests$" | xargs rm -rf
# clean up python byte code if any
find . | grep -E "(/__pycache__$|\.pyc$|\.pyo$)" | xargs rm -rf
# also delete pyproject.toml files, since they are not needed at runtime
find . | grep -E "pyproject\.toml$" | xargs rm -rf
# delete unused .dat files, which are deprecated since scipy 1.10
find . | grep -E "scipy/misc/.*\.dat$" | xargs rm -rf
Btw, I think modifying the bundled source code is not a good practice though. |
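To go from the cleaned-up python/ directory to an actual layer, the remaining steps are just zipping and publishing; a minimal sketch using the AWS CLI (the layer name, runtime, and region below are placeholders, not the ones Klayers uses):
# zip the python/ directory; Lambda expects this top-level folder name for Python layers
zip -r9 -q scipy-layer.zip python
# publish a new layer version from the archive (adjust name/runtime/region to your account)
aws lambda publish-layer-version \
    --layer-name my-scipy-layer \
    --zip-file fileb://scipy-layer.zip \
    --compatible-runtimes python3.10 \
    --region ap-southeast-1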
I don't get why numpy and scipy have their test suites in the wheel, when they don't contribute anything at runtime. I thought it was the sanity check in every … |
@aperture147 that's historical. Once upon a time, many more users built from source, and back then it was critical to be able to run tests with … For new projects started today, the test suite usually goes outside of the importable tree. Moving the test suite out of the importable tree in numpy now, though, would be very disruptive, as it would (among other things) make all open PRs unmerge-able. |
Thank you so much, all. I'll look into this in the next week or so, and hopefully we can get a scipy layer out!!! I'm not sure how much of this is generic (can be applied to all packages) and how much is specific to scipy, though. Will have to think a bit more. |
AFAIK, SciPy and NumPy are safe to have … |
Test layer is here: arn:aws:lambda:ap-southeast-1:367660174341:layer:Klayers-p312-scipy:1 We pass the --no-compile flag to avoid the .pyc and pycache files, and also delete all directories marked 'tests', as recommended by experts on this thread :) Feel free to run some tests on the layer. If all goes well, I'll push this into production before the end of this week, and we'll have 'optimized' builds going forward. |
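For anyone who wants to try it, one way to smoke-test the layer without writing any infrastructure is to attach it to an existing throwaway function and invoke it; FUNCTION_NAME below is a placeholder, and the function body only needs to import scipy and return scipy.__version__:
# attach the test layer to an existing Python 3.12 function
# (note: --layers replaces the function's current layer list)
aws lambda update-function-configuration \
    --function-name FUNCTION_NAME \
    --layers arn:aws:lambda:ap-southeast-1:367660174341:layer:Klayers-p312-scipy:1 \
    --region ap-southeast-1
# invoke it and check that the import succeeded
aws lambda invoke --function-name FUNCTION_NAME --region ap-southeast-1 out.json
cat out.json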
I forgot to remove .dat and dist-info as well. That's up next. |
I'll need to write docs for it, but this command will already remove test data as well as some large-ish …
It's available in SciPy's …
You probably want to keep the dist-info, though. |
Thanks. Unfortunately, I do not build the package from source; I merely pip install. I'll take your comment on keeping the dist-info, but I'll see if I can identify any _test_xxx.so files to be removed as well. |
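A quick way to see what would be affected before deleting anything; the _test*.so pattern is just the naming guess from the comment above, so review the output rather than piping it straight into rm:
# list candidate test-only extension modules inside the installed packages
find python -name "_test*.so" -print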
I think this is the full list: … |
Thanks -- the challenge for Klayers, at least, is that we need to make the script generic. I'm very hesitant to include package-specific build steps for something like scipy, because maintaining that going forward would be difficult. Although it sounds OK, deleting every file that matches _test*.so might cause issues with other packages, but I would say the probability that someone has a runtime-required .so file that begins with _test is very low. Still pondering. Wonder what others are thinking. |
Yep, this would be a nightmare to maintain in the long run. I would be interested to test it out on a fork of this repo though without making a PR to your main repo. Any chance we can make that work? |
You could try adding a specific script for a specific library, like adding a file called … I noticed that numpy and scipy each seem to bundle their own copy of OpenBLAS; could one of them be removed or shared? |
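A sketch of how such a per-package hook could be wired into a generic build script; the hooks/ directory and scipy.sh file names are hypothetical, purely to illustrate keeping package-specific cleanup separate from the generic steps:
# in the generic build script, after installing PACKAGE into python/
PACKAGE="scipy"
if [ -f "hooks/${PACKAGE}.sh" ]; then
    # optional package-specific cleanup lives in its own file, e.g. hooks/scipy.sh
    bash "hooks/${PACKAGE}.sh" python
fi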
I've noticed that stripping the Python bytecode increases cold start time. Should we keep the .pyc files to reduce cold start time, or is it just me fiddling too much with the layer? |
Not really when building the layer from wheels published to PyPI. NumPy uses 64-bit (ILP64) OpenBLAS, while SciPy uses 32-bit (LP64). We have a long-term plan to unify these two builds, but PyPI/wheels make this very complex. I would not recommend doing manual surgery here. |
Yes. Do you know how much slower the cold start is? Python will need to compile the .py files into bytecode, and that will incur some latency. For big packages this might be a lot, but I'm not sure. |
Normally it only takes about 500 ms to 1 s to warm up the Lambda, but now it takes 2 s+ (sometimes up to 5 s+ if I import all of numpy, scipy, and pandas) to spin it up (tested on a 1024 MiB RAM Python 3.10 Lambda function). Is it a bytecode-compilation problem, or is it just me doing too much surgery on the layer? |
No, it's probably bytecode compilation. Let me think about this a bit more. Bytecode is specific to the Python version, so it can be shared across functions on the same runtime, but it won't carry over if the runtime is upgraded. Bytecode also takes space, so we have to trade off space considerations against speed considerations. Nothing will work for everyone -- so my thoughts are to remove bytecode only if the package is large. |
I love this conversation. I did a test today using just … The findings: with …, which suggests a ~50 ms time penalty for compiling from .py into .pyc. I think unless the package is huge (numpy is quite big already) you won't see any discernible performance gain, and if you tweak the Lambda settings, like memory size, that difference would shrink even further. Given this, if you're importing something like boto3 or requests, the difference is so small nobody will notice whether the cache is included or not. For the larger packages like numpy and scipy, most (not all) users will want to optimize for space, so that their own code or additional layers can be larger. Defaulting to removing pycache seems like a logical decision. So right now, we will remove .pyc files from all layers moving forward. Again, this won't meet 100% of everyone's requirements, but it will meet the majority of users' needs the majority of the time. Let me know your thoughts below. Does that mean I can remove the need for separate packages for different versions of Python??? Interesting....!! |
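For anyone who wants to reproduce the compile-cost side of this locally rather than via Lambda's reported init duration, CPython's -X importtime flag gives a per-import breakdown; a rough sketch, assuming the layer contents sit in a local python/ directory (local numbers won't match Lambda cold starts exactly):
# start from a tree without bytecode, as the optimized layer would ship
find python -type d -name "__pycache__" -prune -exec rm -rf {} +
export PYTHONPATH=python
# first import has to compile .py files to bytecode on the fly
python -X importtime -c "import numpy" 2> no_cache.log
# second import reuses the freshly written __pycache__, so the delta approximates compile cost
python -X importtime -c "import numpy" 2> cached.log
# the last line of each log shows the cumulative time for the top-level import
tail -n 1 no_cache.log cached.log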
Since the current AWS Lambda layers don't support scipy on 3.9 and above, it would be great if we could create an ARN for scipy as well. Does anyone know when there will be an AWS layer for scipy for Python 3.9 and 3.10?
I have tried creating a custom layer for scipy that supports 3.9 or 3.10; however, it always gives a C-extension error or says that the scipy module is broken when I try to create it from the Cloud9 IDE without numpy and then upload it back to Lambda. Moreover, it is not possible to add scipy from Cloud9 either, because it is above the MB limit that Lambda can handle (the only way is to delete the numpy directories, after which scipy can be successfully installed to Lambda without any errors).
I would really appreciate it if anyone knows when AWS will provide a layer, just like it did for 3.7 and 3.8.