Linkedin-sildeshare-scraper

A web crawler that downloads files from LinkedIn SlideShare.

Installation

Python 2.7.*
Beautiful Soup 4

$ pip install bs4

Selenium Webdriver

$ pip install selenium

Replace or update chromedriver with the latest version that matches your OS. Download

Usage

Open sharesilde_crawler.py in a text editor.
Set the following parameters:

output_path: the directory where files will be saved. Use an ABSOLUTE PATH!
start_point: the page where scraping starts. One is already set, but you can change it.
username: Your LinkedIn account. Consider registering a separate account for testing in case LinkedIn blocks your original one. :) (I used Selenium, so this seems unlikely, but just in case.)
password: Your LinkedIn password.
search_depth: How deep the search should go. The crawler uses depth-first search (DFS) and stops at the given depth. You can also stop the program manually.
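
To illustrate how start_point and search_depth interact, here is a minimal sketch of a depth-limited DFS. It is not the code in sharesilde_crawler.py: the function name crawl_depth_first, the get_links callable, and the toy link graph are all assumptions made for this example, and a real crawler would fetch links from live pages instead.

```python
def crawl_depth_first(start_point, get_links, search_depth):
    """Visit pages depth-first, never expanding pages at search_depth.

    get_links is a hypothetical callable: page URL -> list of linked URLs.
    Returns the pages in the order they were visited.
    """
    visited = []
    seen = set([start_point])
    # Each stack entry is (url, depth); the start page has depth 0.
    stack = [(start_point, 0)]
    while stack:
        url, depth = stack.pop()
        visited.append(url)
        if depth >= search_depth:
            continue  # depth limit reached: do not follow this page's links
        # Reverse so the first link on the page is explored first.
        for link in reversed(get_links(url)):
            if link not in seen:
                seen.add(link)
                stack.append((link, depth + 1))
    return visited


# Toy link graph standing in for real SlideShare pages (illustrative only).
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
order = crawl_depth_first("a", lambda url: toy_graph.get(url, []), 2)
print(order)  # ['a', 'b', 'd', 'c']
```

With search_depth set to 2, page "e" is never reached because it sits three links away from the start page.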

Run the program

$ python sharesilde_crawler.py

LinkedIn limits the number of downloads per account in any 24-hour period, so you may want to use additional test accounts.

Results

The scraper automatically downloads files (resumes) into your output directory.
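
Since downloads land in output_path and that path has to be absolute, a small check before starting a run can catch a misconfigured value early. The sketch below is not part of the scraper itself; the helper name prepare_output_path and the temporary demo directory are assumptions made for this example.

```python
import os
import tempfile


def prepare_output_path(output_path):
    """Return output_path if it is absolute, creating the directory if needed."""
    if not os.path.isabs(output_path):
        raise ValueError("output_path must be an absolute path: %r" % output_path)
    if not os.path.isdir(output_path):
        os.makedirs(output_path)
    return output_path


# Demo with a throwaway directory; tempfile.mkdtemp() returns an absolute path.
base = tempfile.mkdtemp()
target = prepare_output_path(os.path.join(base, "resumes"))
print(os.path.isdir(target))  # True
```

Passing a relative path such as "resumes" raises ValueError instead of silently saving files relative to the current working directory.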

ToDoList

Headless mode doesn't work yet
Maybe add multiprocessing
Detect duplicate downloaded files
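
One possible approach to the duplicate-detection item is to compare files by content hash. This is only a sketch of the idea, not something the scraper does today; the helper names file_digest and find_duplicates and the demo files are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Map each content digest to the file names sharing it (groups of 2+)."""
    by_digest = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_digest.setdefault(file_digest(path), []).append(name)
    return dict((d, names) for d, names in by_digest.items() if len(names) > 1)


# Demo: two files with identical bytes and one that differs.
demo_dir = tempfile.mkdtemp()
for name, data in [("a.pdf", b"same"), ("b.pdf", b"same"), ("c.pdf", b"other")]:
    with open(os.path.join(demo_dir, name), "wb") as handle:
        handle.write(data)
duplicates = find_duplicates(demo_dir)
print(list(duplicates.values()))  # [['a.pdf', 'b.pdf']]
```

Hashing compares file contents, so it finds the same document saved under different names but not two different exports of the same slides.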
