scrap_to_json() returns error #1

Open
dinonovak opened this issue Jun 14, 2024 · 3 comments

@dinonovak

Hi,
unfortunately I am getting the following error:

[WDM] - Driver [/Users/dino/.wdm/drivers/geckodriver/macos/v0.34.0/geckodriver] found in cache
new layout loaded
2024-06-14 15:55:50,841 - facebook_page_scraper.driver_utilities - ERROR - Error at close_modern_layout_signup_modal: Message: Element is not clickable at point (892,121) because another element obscures it
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5
ElementClickInterceptedError@chrome://remote/content/shared/webdriver/Errors.jsm:291:5
webdriverClickElement@chrome://remote/content/marionette/interaction.js:166:11
interaction.clickElement@chrome://remote/content/marionette/interaction.js:125:11
clickElement@chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:204:29
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:92:31
Traceback (most recent call last):
  File "/Users/dino/Codings/python/FacebookRSSInformer/.venv/lib/python3.11/site-packages/facebook_page_scraper/driver_utilities.py", line 74, in __close_modern_layout_signup_modal
    close_button.click()
  File "/Users/dino/Codings/python/FacebookRSSInformer/.venv/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py", line 81, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/Users/dino/Codings/python/FacebookRSSInformer/.venv/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py", line 710, in _execute
    return self._parent.execute(command, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dino/Codings/python/FacebookRSSInformer/.venv/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/Users/dino/Codings/python/FacebookRSSInformer/.venv/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementClickInterceptedException: Message: Element is not clickable at point (892,121) because another element obscures it
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5
ElementClickInterceptedError@chrome://remote/content/shared/webdriver/Errors.jsm:291:5
webdriverClickElement@chrome://remote/content/marionette/interaction.js:166:11
interaction.clickElement@chrome://remote/content/marionette/interaction.js:125:11
clickElement@chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:204:29
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:92:31

all_posts length: 3
no post_url, skipping
no post_url, skipping
no post_url, skipping
all_posts length: 3
all_posts length: 7
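
For readers following the traceback above: the failure is `close_button.click()` raising `ElementClickInterceptedException` because another element covers the button. One common workaround, which is not necessarily what the library does or should do, is to fall back to a JavaScript click when the native click is intercepted. A minimal sketch, assuming a Selenium-style `driver` and `element`; `click_with_js_fallback` is a hypothetical helper name, and the `ImportError` stand-in exists only so the snippet runs without Selenium installed:

```python
try:
    from selenium.common.exceptions import ElementClickInterceptedException
except ImportError:
    # Stand-in so the sketch can be run without Selenium installed.
    class ElementClickInterceptedException(Exception):
        pass


def click_with_js_fallback(driver, element):
    """Click element; if an overlay intercepts the click, click it via JavaScript.

    Returns "native" or "javascript" to show which path was taken.
    """
    try:
        element.click()
        return "native"
    except ElementClickInterceptedException:
        # execute_script hands the element to the page as arguments[0]
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

A JavaScript click bypasses the overlay check, so it can hide a real layout problem; dismissing the obscuring element first (for example a cookie banner) is usually the cleaner fix.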

@alexgower

Same here

@lullu57

lullu57 commented Jun 25, 2024

I think Facebook has changed its layout. I have managed to fix this error, but I am running into many more. Below are the fixes that I have applied:

Accept cookies before trying to log in:

def scrap_to_json(self, minimum_timestamp = None):
        # call the __start_driver and override class member __driver to webdriver's instance
        self.__start_driver()
        starting_time = time.time()
        # navigate to URL
        self.__driver.get(self.URL)
        # accept the cookie banner before attempting to log in
        Finder._Finder__accept_cookies(self.__driver)
        # only login if username is provided
        self.username is not None and Finder._Finder__login(self.__driver, self.username, self.password)
        
        self.__layout = Finder._Finder__detect_ui(self.__driver)
        # sometimes we get popup that says "your request couldn't be processed", however
        # posts are loading in background if popup is closed, so call this method in case if it pops up.
        Utilities._Utilities__close_error_popup(self.__driver)
        # wait for post to load
        elements_have_loaded = Utilities._Utilities__wait_for_element_to_appear(
            self.__driver, self.__layout, self.timeout)
        # scroll down to bottom most
        Utilities._Utilities__scroll_down(self.__driver, self.__layout)
        self.__handle_popup(self.__layout)
        # timestamp limitation for scraping posts
        timestamp_edge_hit = False
        while (not timestamp_edge_hit) and (len(self.__data_dict) < self.posts_count) and elements_have_loaded:
            self.__handle_popup(self.__layout)
            # self.__find_elements(name)
            timestamp_edge_hit = self.__find_elements(minimum_timestamp)
            current_time = time.time()
            if self.__check_timeout(starting_time, current_time) is True:
                logger.setLevel(logging.INFO)
                logger.info('Timeout...')
                break
            Utilities._Utilities__scroll_down(
                self.__driver, self.__layout)  # scroll down
        # close the browser window after job is done.
        Utilities._Utilities__close_driver(self.__driver)
        # dict trimming, might happen that we find more posts than it was asked, so just trim it
        self.__data_dict = dict(list(self.__data_dict.items())[
                                0:int(self.posts_count)])

        return json.dumps(self.__data_dict, ensure_ascii=False)
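
A note on the `Finder._Finder__accept_cookies(...)` spelling used above: Python rewrites any `__name` written inside a class body to `_ClassName__name` (name mangling), so a double-underscore method of one class has to be called by its mangled name from another class. The sketch below uses hypothetical stand-in classes (`Finder` here is not the library's real class, and `Scraper` is an invented name) and assumes the method is a static method; only the naming rule is the point:

```python
class Finder:
    @staticmethod
    def __accept_cookies(driver):
        # Stored on the class as _Finder__accept_cookies because of name mangling
        return f"cookies accepted on {driver}"


class Scraper:
    def run(self, driver):
        # Writing Finder.__accept_cookies here would be mangled to
        # Finder._Scraper__accept_cookies and raise AttributeError,
        # so the mangled name is spelled out explicitly.
        return Finder._Finder__accept_cookies(driver)


result = Scraper().run("fake-driver")
print(result)
```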

Change cookie selector:

def __accept_cookies(driver):
        try:
            # Use JavaScript to find the button containing the text "Allow all cookies"
            buttons = driver.execute_script("""
                return Array.from(document.querySelectorAll('div[role="none"] span'))
                            .filter(span => span.textContent.includes('Allow all cookies'));
            """)
            
            # Check if any elements were found
            if buttons:
                ActionChains(driver).move_to_element(buttons[-1]).click().perform()  # Click the last one if multiple are found
            else:
                logger.info("No 'Allow all cookies' button found.")
        except NoSuchElementException:
            logger.info("No such element exception occurred.")
        except IndexError:
            logger.info("Index error occurred.")
        except Exception as ex:
            logger.exception("Error at accept_cookies: {}".format(ex))
            sys.exit(1)

Change Login selector:

def __login(driver, username, password):
        try:

            wait = WebDriverWait(driver, 4)  # considering that the elements might load a bit slow

            # NOTE this closes the login modal pop-up if you choose to not login above
            try:
                element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[aria-label="Close"]')))
                element.click()  # Click the element
            except Exception:
                logger.debug("no pop-up")

            time.sleep(1)
            #target username
            username_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='email']")))
            password_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='pass']")))

            #enter username and password
            username_element.clear()
            username_element.send_keys(str(username))
            password_element.clear()
            password_element.send_keys(str(password))

            #target the login button and click it
            try:
                # Try the div-based login button identified by its aria-label first
                WebDriverWait(driver, 2).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[role='button'][aria-label='Accessible login button']"))).click()
            except TimeoutException:
                # If that button is not clickable within 2 seconds, click the first 'button' element found
                WebDriverWait(driver, 2).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button"))).click()
        except (NoSuchElementException, IndexError):
            pass
        except Exception as ex:
            logger.exception("Error at login: {}".format(ex))
            # sys.exit(1)

@moda20 if the issue is reproducible on your side, let me know and I will create a PR

@lullu57

lullu57 commented Jul 9, 2024

I have made it work by providing a URL, and have fixed some other fields such as name and image. It also no longer waits for the timeout if there are no posts (a change made for my own needs). Feel free to have a look here and see what can be implemented in the original:

https://github.com/lullu57/facebook_page_scraper

@moda20 @shaikhsajid1111

Edit: my version more or less requires the URL and can maintain persistence between sessions (for my needs), but a lot of selectors and functionality have been improved. It is not a direct one-to-one replacement, though.
