A small sitemap generator.
- The sitemap excludes and does not follow any external links.
- The sitemap excludes files with the following extensions:
- .png
- Is the sitemap directed or undirected?
- How are the internal links represented?
- We need to show the static assets each URL depends on
- Show the links between pages
- Internal links can start with a /, for example
/archive
- Internal links can start with the site name, for example
http://mysite.com/rss
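The link rules above could be sketched roughly as follows. This is only an illustration: the function name `normalize_internal_link` and the constants `SITE_ROOT` and `EXCLUDED_EXTENSIONS` are assumptions, not part of the generator itself, and `.png` is the only extension the notes list.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumed values for illustration; only .png is listed in the notes.
SITE_ROOT = "http://mysite.com"
EXCLUDED_EXTENSIONS = {".png"}


def normalize_internal_link(href, site_root=SITE_ROOT):
    """Return an absolute URL for an internal link, or None if it should be skipped."""
    root = urlparse(site_root)
    # Resolves "/archive" against the site root; absolute URLs pass through unchanged.
    absolute = urljoin(site_root + "/", href)
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https") or parsed.netloc != root.netloc:
        return None  # external link: excluded and never followed
    if PurePosixPath(parsed.path).suffix.lower() in EXCLUDED_EXTENSIONS:
        return None  # excluded file type
    return absolute


print(normalize_internal_link("/archive"))
print(normalize_internal_link("http://mysite.com/rss"))
print(normalize_internal_link("http://example.org/page"))
```

Resolving every link to an absolute URL first means both internal forms ("/archive" and "http://mysite.com/rss") end up being compared in the same way.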
What are the advantages of traversing the site using either a BFS or a DFS?
- BFS: Good for finding the shortest path - not something we really need here
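- DFS: follows each branch to its end before backtracking, exhausting all possibilities - this sounds more appropriate, since we need to visit every page rather than find a shortest path.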
I decided to use the deque type because its append/pop/popleft operations are O(1), an advantage over a regular list. However, the URL I'm testing with has a lot of duplicate links. In one test, DFS took around 30 seconds, whereas BFS took 250 seconds. This is because many of this site's blog pages link to the same URLs. With BFS, these links are all added to the queue before they are marked as visited, so the queue fills up with duplicates. It terminates eventually, but takes a long time!
- regular list: append and pop at the end are O(1), but pop(0) from the front is O(n)
- deque: thread-safe, memory-efficient appends and pops from either end, in approximately O(1)
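The slowdown described above comes from the same URL being queued many times. One possible way around it, sketched here and not necessarily how the generator does it, is to mark a URL as seen when it is enqueued rather than when it is visited. The `crawl` function and the in-memory `SITE` dict (standing in for real page fetches) are hypothetical.

```python
from collections import deque


def crawl(start, get_links, breadth_first=True):
    """Visit every page reachable from start exactly once; return the visit order."""
    seen = {start}  # every URL that has ever been queued
    pending = deque([start])
    order = []
    while pending:
        # popleft() gives FIFO order (BFS); pop() gives LIFO order (DFS).
        url = pending.popleft() if breadth_first else pending.pop()
        order.append(url)
        for link in get_links(url):
            if link not in seen:  # mark at enqueue time so duplicates never enter the queue
                seen.add(link)
                pending.append(link)
    return order


# Toy site with duplicate links (hypothetical data).
SITE = {
    "/": ["/archive", "/rss", "/archive"],
    "/archive": ["/", "/post-1", "/rss"],
    "/rss": [],
    "/post-1": ["/archive"],
}

print(crawl("/", SITE.__getitem__))
print(crawl("/", SITE.__getitem__, breadth_first=False))
```

With this change the queue holds each URL at most once, so BFS and DFS do the same amount of work and differ only in the order pages are visited.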
Some of the options:
- edge lists
- adjacency matrix
- adjacency list
- instances of a Node class linked to children Nodes
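As an illustration of one of these options, an adjacency list could look something like the sketch below. It assumes the graph is directed, and the names `links`, `assets` and `add_link`, as well as the example paths, are made up for this example.

```python
from collections import defaultdict

# Adjacency list: each page URL maps to the set of internal pages it links to.
links = defaultdict(set)
# A parallel mapping records the static assets each page depends on.
assets = defaultdict(set)


def add_link(source, target):
    """Record a directed edge from source to target."""
    links[source].add(target)
    links.setdefault(target, set())  # register pages that have no outgoing links


add_link("/", "/archive")
add_link("/", "/rss")
add_link("/archive", "/")
assets["/"].update({"/css/site.css", "/js/app.js"})

# Edges are directed: "/" links to "/rss", but "/rss" does not link back.
print(sorted(links["/"]))
print(sorted(links["/rss"]))
```

Compared with an adjacency matrix, this only stores the links that exist, which tends to suit sites where each page links to a small fraction of the others.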
- Print out some ASCII form of a sitemap
- Generate a PDF with nodes (circles) containing the URL as text, and directed lines linking to other nodes
- Display adjacency matrix
- Create an XML document
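Two of these output options, the ASCII form and the XML document, could be sketched as below. The `SITEMAP` dict is hypothetical data, and the XML element names (`sitemap`, `page`, `link`) are made up for this example; they do not follow any standard sitemap schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical adjacency list: page URL -> pages it links to.
SITEMAP = {
    "/": ["/archive", "/rss"],
    "/archive": ["/", "/post-1"],
    "/rss": [],
    "/post-1": [],
}


def ascii_sitemap(graph):
    """Render one line per page, followed by an indented line per outgoing link."""
    lines = []
    for page in sorted(graph):
        lines.append(page)
        lines.extend(f"  -> {target}" for target in sorted(graph[page]))
    return "\n".join(lines)


def xml_sitemap(graph):
    """Render the same graph as an XML string."""
    root = ET.Element("sitemap")
    for page in sorted(graph):
        node = ET.SubElement(root, "page", url=page)
        for target in sorted(graph[page]):
            ET.SubElement(node, "link", url=target)
    return ET.tostring(root, encoding="unicode")


print(ascii_sitemap(SITEMAP))
print(xml_sitemap(SITEMAP))
```

Both functions take the same adjacency-list input, so further output formats could be added without touching the crawler.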
Install pyenv if you don't have it already, to manage the different Python runtimes you might need. If you're on macOS, use Homebrew with the following command; otherwise, refer to the pyenv GitHub page.
brew install pyenv
List the Python versions that pyenv can install (they are installed under your home directory).
pyenv install -l
Install the version you want, for example 3.7.0.
pyenv install 3.7.0
Create the virtual environment, for example:
~/.pyenv/versions/3.7.0/bin/python -m venv venv
Next, we should activate our new virtual environment.
source ./venv/bin/activate
We should then generate our requirements.txt file for others to use later.
pip freeze > requirements.txt