Merge branch 'main' into add-action
pacoxu authored Nov 27, 2024
2 parents 330ab17 + 73590f8 commit 5670aba
Showing 2 changed files with 24 additions and 4 deletions.
20 changes: 20 additions & 0 deletions README.md
@@ -1,7 +1,27 @@
# slides-crawl

Download all slides from a sched.com event (KubeCon, IstioCon).

1. KubeCon China 2023: https://kccncosschn2023.sched.com/list/descriptions/
2. IstioCon China 2023: https://istioconchina2023.sched.com/list/descriptions/

- `/list/descriptions/`: this page includes all slide links directly.
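Because every slide link sits on one `/list/descriptions/` page, a single pass over the page's anchors can collect them. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so it has no dependencies. It assumes slide attachments can be recognized by file extension alone (`SLIDE_EXTENSIONS` is a hypothetical list); the real script's selectors may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: slides are identified purely by their file extension.
SLIDE_EXTENSIONS = (".pdf", ".pptx", ".ppt", ".key")


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag on the page, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_slide_links(html: str, page_url: str) -> list[str]:
    """Return absolute, de-duplicated slide URLs found in `html`."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    links: list[str] = []
    seen: set[str] = set()
    for href in parser.hrefs:
        # Resolve relative links against the descriptions page URL.
        absolute = urljoin(page_url, href)
        path = urlparse(absolute).path.lower()
        if path.endswith(SLIDE_EXTENSIONS) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Matching on the lowercased path keeps `.PPTX` and `.pdf` links alike, while the returned URL preserves the original case so the download still resolves.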

### Run with python3

```
python3 -m pip install requests BeautifulSoup4
python3 download_slides.py
```
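The hunk header in `download_slides.py` shows the script defines `download_file(url, file_path, timeout=30)`, but its body is not part of this commit. A rough, stdlib-only sketch of what such a helper could look like follows; creating parent folders and writing through a `.part` temporary file are assumptions for illustration, not the script's confirmed behavior.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, file_path: str, timeout: float = 30) -> Path:
    """Stream `url` to `file_path`, creating parent folders as needed.

    Illustrative sketch only; not the actual implementation in the repo.
    """
    target = Path(file_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download does not
    # leave a truncated file that looks complete.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    # Atomically move the finished file into place.
    partial.replace(target)
    return target
```

Streaming with `shutil.copyfileobj` avoids holding large slide decks fully in memory.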

### Quick Run in Docker

```
docker run ghcr.io/pacoxu/slides-crawl:latest
```

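To crawl a specific event, pass its `/list/descriptions/` URL in the `SCHED_LINK` environment variable. For example, for KubeCon NA 2024: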
```
docker run -e SCHED_LINK=https://kccncna2024.sched.com/list/descriptions/ ghcr.io/pacoxu/slides-crawl:latest
```
Downloaded files are saved inside the container; use a volume mount or `docker cp` to copy them to your machine.
8 changes: 4 additions & 4 deletions download_slides.py
@@ -21,10 +21,10 @@ def download_file(url, file_path, timeout=30):


# Step 1: Find all topic links on the main page
-# topics_url = "http://localhost:8000/view-source_https___pytorch2023.sched.com_list_descriptions_.html"
-# topics_url = "https://colocatedeventsna2024.sched.com/list/descriptions/"
-topics_url = "https://kccncna2024.sched.com/list/descriptions/?iframe=no"
-# topics_url = "https://kcsna2024.sched.com/list/descriptions?iframe=no"
+# default_topics_url = "https://colocatedeventsna2024.sched.com/list/descriptions/"
+# default_topics_url = "https://kcsna2024.sched.com/list/descriptions?iframe=no"
+default_topics_url = "https://kccncna2024.sched.com/list/descriptions/?iframe=no"
+topics_url = os.getenv('SCHED_LINK', default_topics_url)
response = requests.get(topics_url)
topic_soup = BeautifulSoup(response.text, "html.parser")
print(response.text)
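This change makes the event configurable: `SCHED_LINK` overrides the KubeCon NA 2024 default. The sketch below restates that lookup as a function that takes the environment as a parameter, so it can be exercised without touching the real process environment. `resolve_topics_url` is a hypothetical helper, not part of the script.

```python
import os
from typing import Mapping, Optional

# Same default as download_slides.py after this change.
DEFAULT_TOPICS_URL = "https://kccncna2024.sched.com/list/descriptions/?iframe=no"


def resolve_topics_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the sched.com descriptions URL to crawl.

    Mirrors os.getenv('SCHED_LINK', default): the variable wins whenever it
    is set, even to an empty string.
    """
    if env is None:
        env = os.environ
    return env.get("SCHED_LINK", DEFAULT_TOPICS_URL)
```

With the Docker image, `docker run -e SCHED_LINK=...` sets this variable inside the container, so the same image can crawl any sched.com event without a rebuild.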