This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

INSTRUCTIONS.md

File metadata and controls

31 lines (23 loc) · 2.94 KB

Some things to know before using the program!


When you open the program, you will find an entry field labeled URL. This is where you enter the URL of the website you want to crawl. Be careful: the URL must be valid, otherwise you will get an error! It must also include the scheme (http:// or https://) at the start, before the www. For example: https://infallible-varahamihira-e94f86.netlify.app/. You can try this website yourself; it is a test site that was used to test the program.
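As a rough illustration of the rule above, the check below accepts a URL only when it has an http or https scheme and a host name. This is a minimal sketch using the standard library; the function name `looks_like_valid_url` is made up for this example, and the program itself may validate URLs differently.

```python
from urllib.parse import urlparse


def looks_like_valid_url(url: str) -> bool:
    """Return True if the URL has an http/https scheme and a host name.

    Hypothetical helper for illustration, not the program's own check.
    """
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_valid_url("https://infallible-varahamihira-e94f86.netlify.app/"))
print(looks_like_valid_url("www.example.com"))  # no scheme, so it is rejected
```

A check like this only looks at the shape of the URL; it does not tell you whether the website actually responds.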

Below the entry field is the Settings section. As the name suggests, this is where you can change the following options:

  • number of pages: The number of pages you want to crawl. For example, with 1 the program only shows the links found on the first page. With 5, the program extracts the links from the first page and then continues the extraction on the links it has already found. Be aware that the more pages you crawl, the longer the program takes to finish;
  • node/edge color: If enabled, the program colors the nodes, the edges, or both. The coloring is purely aesthetic;
  • catch broken-link: Whether the program should catch broken links (links that essentially don't exist). If enabled, the program does not add broken links to the graph, but you can view the broken links it caught while running. Suggestion: if you don't mind having some broken links in the graph, I strongly advise disabling this option, as the program then runs considerably faster.
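To make the "number of pages" and "catch broken-link" settings more concrete, here is a minimal sketch of a page-limited, breadth-first crawl. It is not the program's actual code. The names `crawl`, `LinkParser`, `fetch`, `max_pages` and `catch_broken` are assumptions made for this example, and it uses the standard library's `html.parser` in place of BeautifulSoup, plus an in-memory fake site in place of real requests, so that it runs without any installs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages, catch_broken=True):
    """Breadth-first crawl of at most max_pages pages.

    fetch(url) is a hypothetical callable that returns the page HTML,
    or None when the link is broken. Returns (edges, broken_links).
    """
    edges, broken = [], []
    queue, seen, crawled = deque([start_url]), {start_url}, 0
    while queue and crawled < max_pages:
        page = queue.popleft()
        html = fetch(page)
        if html is None:
            continue  # nothing to extract from a page that does not load
        crawled += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(page, href)  # resolve relative links
            # Checking every link costs one extra fetch per link.
            if catch_broken and fetch(target) is None:
                broken.append(target)  # listed separately, kept out of the graph
                continue
            edges.append((page, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return edges, broken


# A tiny fake website standing in for real HTTP requests.
SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/missing">gone</a>',
    "https://example.test/a": '<a href="/">home</a>',
}

edges, broken = crawl("https://example.test/", SITE.get, max_pages=1)
print(edges)
print(broken)
```

In this sketch the broken-link check fetches every link once more, which is one plausible reason for the option being slower; the real program may check links in another way.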

Now on to the buttons. The buttons View the graph, View the nodes data, View the edges data, and View the broken-links data are disabled at first; to enable them, click the Input button. The program then crawls the website and extracts the data. When it finishes running, the buttons mentioned above are enabled. !Note that the View the broken-links data button is only enabled if the option to catch broken links is turned on. The buttons do the following:

  • Input: Runs the program and extracts the data from the website;
  • View the graph: Shows the generated graph;
  • View the nodes/edges/broken-links data: Lets you view the data extracted by the program in the form of a .txt file.
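For a feel of what the .txt views might contain, the sketch below writes a list of nodes and a list of edges to plain text files, one entry per line. The file names `nodes.txt` and `edges.txt`, the helper `write_lines` and the `->` separator are all assumptions for this example; the program's real output format may differ.

```python
from pathlib import Path
import tempfile


def write_lines(path, rows):
    """Write one row per line; tuples are joined with ' -> '."""
    lines = [" -> ".join(row) if isinstance(row, tuple) else str(row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Example data: one edge between two pages of a made-up site.
edges = [("https://example.test/", "https://example.test/a")]
nodes = sorted({url for edge in edges for url in edge})

out_dir = Path(tempfile.mkdtemp())  # temporary folder for the demo files
write_lines(out_dir / "nodes.txt", nodes)
write_lines(out_dir / "edges.txt", edges)
print((out_dir / "edges.txt").read_text(encoding="utf-8"))
```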

REQUIREMENTS To Run the Program

To run this program you need Python 3 installed on your machine, along with the following libraries:

 - requests (in the terminal: pip install requests)
    
 - BeautifulSoup4 (in the terminal: pip install beautifulsoup4)
    
 - NetworkX (in the terminal: pip install networkx)
    
 - matplotlib (in the terminal: pip install matplotlib)
    
 - scipy (in the terminal: pip install scipy)
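Once everything is installed, a quick way to confirm it is to ask Python whether each library can be found. This is a small sketch, not part of the program. The import names in `REQUIRED` are assumptions: the beautifulsoup4 package is imported as `bs4`, and the last entry of the list above is taken to be SciPy (imported as `scipy`).

```python
from importlib.util import find_spec

# Import names (assumed): beautifulsoup4 is imported as "bs4",
# and the last library in the list is taken to be SciPy.
REQUIRED = ["requests", "bs4", "networkx", "matplotlib", "scipy"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

If a name is printed as missing, run the matching pip install command from the list and try again.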

Congratulations! You are now ready to use the program. Hope you like it! 😀