Lambda kept breaking down and became unreliable the more it ran with latest image. #246

Saurabh-Mudgal · 2024-05-14T09:27:15Z

The image works fine initially, but the more I run it (i.e. call the Lambda), the more often I get this error back:

disconnected: Unable to receive message from renderer\n  (failed to check if window was closed: disconnected: not connected to DevTools)\n  (Session info: chrome=124.0.6367.207)\nStacktrace:

Dockerfile

FROM umihico/aws-lambda-selenium-python:latest

COPY main.py ./
COPY status.py ./
RUN pip install requests
CMD [ "main.lambda_handler" ]

status.py is just wrappers for returning appropriate status codes. My only dependencies are selenium and requests.

main.py

Here is the relevant code calling selenium:

class HtmlToPdf:
    @staticmethod
    def _to_html(shipping_label: str) -> str:
        try:
            shipping_bytes = base64.b64decode(shipping_label)
            # Compare bytes directly; str() of a bytes object depends on its repr quoting
            if shipping_bytes.startswith(b'%PDF'):
                raise AlreadyPDFException
            return shipping_bytes.decode()
        except (binascii.Error, ValueError, UnicodeDecodeError):
            return shipping_label

    @staticmethod
    def _html_to_uri(html_string: str):
        return "data:text/html;charset=utf-8," + quote(html_string)

    @staticmethod
    def _get_driver():
        user_data_dir = mkdtemp()
        data_path = mkdtemp()
        disk_cache_dir = mkdtemp()
        selenium_dir = "/tmp/selenium"
        if not os.path.exists(selenium_dir):
            os.mkdir(selenium_dir)

        options = webdriver.ChromeOptions()
        service = webdriver.ChromeService("/opt/chromedriver")

        options.binary_location = '/opt/chrome/chrome'
        options.add_argument("--headless=new")
        options.add_argument('--no-sandbox')
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1280x1696")
        options.add_argument("--single-process")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-dev-tools")
        options.add_argument("--no-zygote")
        options.add_argument(f"--user-data-dir={user_data_dir}")
        options.add_argument(f"--data-path={data_path}")
        options.add_argument(f"--disk-cache-dir={disk_cache_dir}")
        options.add_argument("--remote-debugging-port=9222")
        options.add_argument(f"--homedir={selenium_dir}")



        chrome = webdriver.Chrome(options=options, service=service)

        return chrome

    @classmethod
    def selenium_converter(cls, b64_html: str) -> str:
        """
        Converts a base64 encoded HTML string into a base64 encoded PDF string
        :param b64_html: Base64 encoded HTML string
        :return: Base64 encoded PDF string
        """
        try:
            html = cls._to_html(b64_html)
        except AlreadyPDFException:
            return b64_html

        html_uri = cls._html_to_uri(html)
        driver = cls._get_driver()

        # Navigate to the HTML page
        driver.get(html_uri)

        # Wait for the page to fully load (adjust the timeout as needed)
        driver.implicitly_wait(3)

        # Save the page as PDF
        pdf_bytes = driver.execute_cdp_cmd("Page.printToPDF", {"landscape": False})

        # Close the WebDriver
        driver.quit()

        return pdf_bytes['data']
    
def lambda_handler(event, context):
    try:
        body = event.get('body')
        if body is not None:
            try:
                body = json.loads(body, use_decimal=True)
                bs64_encoded_html = body['html']
            except ValueError:
                raise status.Base400Exception('Invalid json received.')
            except KeyError:
                raise status.Base400Exception('No HTML was provided')
        else:
            raise status.Base400Exception('No body provided.')

        b64_pdf = HtmlToPdf().selenium_converter(bs64_encoded_html)

        if not b64_pdf:
            raise status.Base500Exception('PDF could not be converted')

        raise status.Success(
            {
                "message": "PDF conversion was successful",
                "data": {
                    "pdf": b64_pdf,
                    "type": "base64_encoded",
                }
            }
        )
    except status.Success as response:
        return response.json()
    except status.Base500Exception as response:
        return response.json()
    except status.Base400Exception as response:
        return response.json()
    except Exception as e:
        return status.Base500Exception(f"Something unexpected happened: {e}").json()

Lambda Configs

Architecture: x86_64
Memory: 1024MB
Ephemeral Storage: 512MB

For context, I call the Lambda 5000 times in a loop. The first run returns the aforementioned error 4 times. When I repeat the run without changing the image, it returns the error 8-9 times, then 20+ times, and so on.

This goes away and resets when I deploy the (same) image again.

What I have tried so far

Deleting the folders created by mkdtemp()
Deleting the /tmp/ folder as cleanup at end of function execution

I appreciate any help and advice. Thank you!
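One detail in the code above: driver.quit() is only reached when driver.get() and Page.printToPDF both succeed, so a failed invocation can leave a Chrome process and its temp directories behind in a warm container. Whether that is the root cause here is not established in this thread. A minimal sketch of guaranteed cleanup, assuming a hypothetical build_driver callable that wraps the option setup from _get_driver():

```python
import shutil
from contextlib import contextmanager
from tempfile import mkdtemp


@contextmanager
def managed_driver(build_driver):
    """Yield a driver, then always quit it and delete its temp dirs.

    build_driver is a hypothetical callable taking
    (user_data_dir, data_path, disk_cache_dir) and returning an object
    with a quit() method, e.g. a selenium webdriver.Chrome instance.
    """
    temp_dirs = [mkdtemp() for _ in range(3)]
    driver = None
    try:
        driver = build_driver(*temp_dirs)
        yield driver
    finally:
        if driver is not None:
            try:
                driver.quit()
            except Exception:
                # The browser may already be gone (e.g. after a disconnect).
                pass
        for path in temp_dirs:
            shutil.rmtree(path, ignore_errors=True)
```

With this, selenium_converter could run driver.get() and execute_cdp_cmd() inside a `with managed_driver(...) as driver:` block, so the cleanup also runs when either call raises.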

alfasareekkan · 2024-05-21T17:41:29Z

Hi @Saurabh-Mudgal, were you able to resolve the problem? I have the same issue.

Saurabh-Mudgal · 2024-05-22T09:49:14Z

> Hi @Saurabh-Mudgal, were you able to resolve the problem? I have the same issue.

I have not identified the root cause yet. However, increasing the ephemeral storage to 1024MB works as a band-aid fix for me.

Any other information or solutions are much appreciated.
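Since raising the ephemeral storage helps in this case, one way to check whether /tmp really fills up across warm invocations is to log its usage at the start and end of the handler. A minimal sketch using only the standard library; the helper name tmp_usage_mb is hypothetical, and it assumes /tmp is the mount backed by the Lambda ephemeral storage:

```python
import shutil


def tmp_usage_mb(path="/tmp"):
    """Return total/used/free space of the filesystem holding `path`, in MB."""
    usage = shutil.disk_usage(path)
    mb = 1024 * 1024
    return {
        "total_mb": usage.total // mb,
        "used_mb": usage.used // mb,
        "free_mb": usage.free // mb,
    }
```

Printing tmp_usage_mb() before creating the driver and after driver.quit() would show in the logs whether used_mb keeps growing from one invocation to the next.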

chrismaille · 2024-10-04T14:23:36Z

Same issue here. It seems to be related to the --single-process option in the Chrome options.

umihico · 2024-10-06T14:27:42Z

How about setting ephemeralStorageSize in serverless.yml, for example 1024? (The default is 512MB)

My Playwright (not Selenium) script sometimes becomes unstable, and this fixed the issue for it.

chrismaille · 2024-10-07T19:03:03Z

I'm using ephemeralStorageSize=2048 on my lambda - changing these settings, unfortunately, doesn't have any effect.

I solved the problem by reverting the image to umihico/aws-lambda-selenium-python:3.12.1, which is the last one I tested without the error.

To give more context: the error happened to me when I clicked a link to a PDF. Instead of starting the download, the browser opened a second window to render the PDF in its viewer, using an <embed> element. Any Selenium command that tries to access the second window gives the disconnection error.

Image 3.12.1 is the last one I tested in which Chrome still downloads the PDF directly, so with it I no longer get the disconnection errors.

Hope this helps.
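If the disconnect is triggered by Chrome opening PDFs in its built-in viewer, an alternative to downgrading that may be worth trying is to ask Chrome to download PDFs instead. The sketch below only builds the preferences dict. It assumes the Chrome build in the image honors the plugins.always_open_pdf_externally and download.* preferences in headless mode, which nobody in this thread has verified; the helper name and the /tmp/downloads path are hypothetical.

```python
def pdf_download_prefs(download_dir="/tmp/downloads"):
    """Chrome preferences that request PDFs be downloaded, not rendered."""
    return {
        # Skip the built-in PDF viewer and hand the file to the downloader.
        "plugins.always_open_pdf_externally": True,
        # Assumed writable location; /tmp is the writable path on Lambda.
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    }


# Usage with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", pdf_download_prefs())
```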

JoLBree · 2024-10-10T02:34:35Z

I'm also using ephemeralStorageSize=2048 and had this issue as well. I worked around it by configuring the Lambda to retry a couple of times. I haven't tried a downgrade, but it's good to know that might help.
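The retry above is configured on the Lambda side. An in-handler variant of the same workaround could look like the sketch below; call_with_retries is a hypothetical helper, and in real use retry_on would probably be narrowed to selenium's WebDriverException rather than every Exception.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=1.0, retry_on=(Exception,)):
    """Call func() up to `attempts` times, re-raising the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as error:
            last_error = error
            # Sleep only between attempts, not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

In the handler from the first post this might wrap the conversion call, e.g. `call_with_retries(lambda: HtmlToPdf.selenium_converter(bs64_encoded_html))`. It hides the symptom rather than fixing the cause, so each retry should start a fresh driver.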

jo-room pushed a commit to jo-room/job-scrape that referenced this issue on Nov 1, 2024: "Downgrade docker image version" (95d475e), to try to avoid umihico/docker-selenium-lambda#246
