Scrape the "all rooms" page #6
Check it out: https://github.com/alistaire47/room_utils/blob/master/rooms.R

I tried to get the roxygen comments roughly right, but please double-check them before we integrate it; I'm still pretty new to package development. Also, I realized that despite prefixing all the non-base functions with …
Of course you can. You need to specify exports in the roxygen part of your functions.
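For reference, an export is declared with an `@export` tag in the function's roxygen block; roxygen2 then writes the matching `export()` directive into `NAMESPACE` when you document the package. A minimal sketch, using the `rooms()` function from this thread (the description and return value are just illustrative):

```r
#' List all Stack Overflow chat rooms
#'
#' @return A character vector of room URLs.
#' @export
rooms <- function() {
  # ... scraping code ...
}
```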
@romunov Oh perfect, thanks! I updated the script linked above. Also, here's a little add-on function, which is useful but slow because it scrapes everything every time. (Maybe …)

```r
# Find a room by (partial) name among all scraped rooms.
# Assumes rooms() returns a character vector of room URLs.
find_room <- function(room_name, exact = FALSE) {
  # With exact = TRUE, only match URLs ending in "/<room_name>"
  pattern <- if (exact) paste0('/', room_name, '$') else room_name
  grep(pattern, rooms(), value = TRUE, ignore.case = TRUE)
}
```

Documented: https://github.com/alistaire47/room_utils/blob/master/find_room.R
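A quick usage sketch (the room names here are made up):

```r
find_room("r-lang")                  # any room URL containing "r-lang"
find_room("r-public", exact = TRUE)  # only URLs ending in "/r-public"
```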
If you feel this overhead cost is too much, consider exporting the data to an external file and checking for its existence (and time stamp) before scraping all the rooms again. I would also suggest you write the code to the R package folder in a different branch. Once everyone likes the functionality (and it compiles OK), that branch can be merged seamlessly into the main branch.
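A minimal sketch of that caching idea, assuming a hypothetical `scrape_rooms()` that performs the full scrape and an RDS file as the cache:

```r
# Hypothetical caching wrapper: re-scrape only when the cached copy
# is missing or older than max_age seconds.
rooms_cached <- function(cache_file = "rooms_cache.rds",
                         max_age = 60 * 60) {
  if (file.exists(cache_file) &&
      difftime(Sys.time(), file.mtime(cache_file), units = "secs") < max_age) {
    return(readRDS(cache_file))
  }
  result <- scrape_rooms()  # hypothetical full "all rooms" scrape
  saveRDS(result, cache_file)
  result
}
```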
There should be a function to scrape the "all rooms" page (with the relevant options of at least "active" and "people": http://chat.stackoverflow.com/?tab=all&sort=active and http://chat.stackoverflow.com/?tab=all&sort=people) and return a `data.frame` of the relevant URLs. This would make the package more generally relevant.

@alistaire47, you seem to know what's up when it comes to scraping ;-)
I'm guessing it's something along the lines of starting with:
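(The code block from the original comment didn't survive the page scrape. A plausible starting point, assuming the `rvest` package and guessing at the CSS selector for room links, might be one possible body for the `scrape_rooms()` placeholder used above:)

```r
library(rvest)

# Sketch: scrape the "all rooms" page, sorted by activity or by people,
# and return a data.frame of room names and URLs.
# NOTE: the ".room-name a" selector is an assumption, not verified
# against the live page.
scrape_rooms <- function(sort = c("active", "people")) {
  sort <- match.arg(sort)
  url <- paste0("http://chat.stackoverflow.com/?tab=all&sort=", sort)
  page <- read_html(url)
  links <- html_nodes(page, ".room-name a")
  data.frame(
    name = html_text(links),
    url  = paste0("http://chat.stackoverflow.com", html_attr(links, "href")),
    stringsAsFactors = FALSE
  )
}
```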