phpGlluchCourseTalk is a collection of scripts that obtain the metadata of a group of CourseTalk courses.
Tested in January 2017.
Mirror the whole CourseTalk site, for example with HTTrack.
In the mirrored copy, go to coursetalk.com/providers. For each provider, go to PROVIDERNAME/courses. The directories found there are the courses from that provider. List them and append the list to courses_links0.txt in your home directory:
ls -d "$PWD"/*/ >> ~/courses_links0.txt
The above command has to be run once for every provider, from inside that provider's courses directory.
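The per-provider step above can be automated with a loop. The following is only a sketch: the function name collect_course_links and its two arguments are hypothetical, and it assumes the mirrored providers directory has the PROVIDERNAME/courses layout described above. It writes absolute paths with a trailing slash, the same form that the ls command produces.

```shell
#!/bin/sh
# Sketch: append every course directory of every provider to one file.
# ASSUMPTION: $1 is the mirrored coursetalk.com/providers directory and
# $2 is the output file (e.g. ~/courses_links0.txt). Both names are
# illustrative, not part of the original scripts.
collect_course_links() {
    providers_dir="$1"
    out_file="$2"
    for courses_dir in "$providers_dir"/*/courses/; do
        # Skip the literal pattern when no provider matches.
        [ -d "$courses_dir" ] || continue
        for course in "$courses_dir"*/; do
            # Skip providers whose courses directory is empty.
            [ -d "$course" ] || continue
            # Print the absolute path with a trailing slash.
            printf '%s\n' "$(cd "$course" && pwd)/"
        done
    done >> "$out_file"
}
```

It would be called once, for example as collect_course_links /path/to/coursetalk.com/providers ~/courses_links0.txt, where the first path depends on where the mirror was saved.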
Move the file courses_links0.txt from your home directory to the directory of phpGlluchCourseTalk.
You need an API key for detectlanguage.php. Write it in lang_sample.php, then rename the file to lang.php.
These files have to be executed with the PHP CLI in this order:
- php getInfos.php. Gets the basic information.
- php lang.php. Detects the language.
- Check the files in the maybe directories and in others and, if necessary, move the courses to the correct directory.
- php preclean_en.php. Deletes the keywords from English courses.
- php preclean_es.php. Deletes the keywords from Spanish courses.
The keywords are the same for all courses and don't add any useful information.
The course information will be in the directory courses2/en for English MOOCs and in courses2/es for Spanish ones.
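The execution order above can be wrapped in a small helper so that a failing step stops the run. This is a sketch, not part of the original scripts: the function run_steps and the PHP_BIN variable are hypothetical names, and the manual check between lang.php and the preclean scripts is deliberately left outside the helper.

```shell
#!/bin/sh
# Sketch: run PHP scripts in the given order and stop at the first failure.
# ASSUMPTION: PHP_BIN is an illustrative override for the php executable.
PHP_BIN="${PHP_BIN:-php}"

run_steps() {
    for script in "$@"; do
        echo "Running $script" >&2
        "$PHP_BIN" "$script" || {
            echo "Step failed: $script" >&2
            return 1
        }
    done
}

# Intended use, from the directory that holds the scripts:
#   run_steps getInfos.php lang.php
#   (check the maybe and others directories by hand)
#   run_steps preclean_en.php preclean_es.php
```

Splitting the run in two calls keeps the manual review step where the list places it, between language detection and keyword removal.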
phpGlluchCoursera
phpGlluchEdX
phpGlluchMiriadaX