There are multiple ways of selecting elements: XPath, CSS selectors, and regular expressions.
To make some elements easier to reach, I've written a function that is used like the dplyr helper functions. It combines the features of three of them: starts_with(), contains() and ends_with().
Previously I didn't know how to use regular expressions in web scraping and had no idea about selectors. I've more or less learned them now, and I can reach the elements without the function I wrote. However, beginners like me first have to research and learn how to reach the elements.
I'd like to hear your opinions: would it make sense to add a function like this to the rvest package as a new feature, to make elements easier to reach?
# Packages
library(rvest)
library(dplyr)
# Function
html_nodes_regex <- function(html, node_name, attr, regex_type = c("equal", "startswith", "contains", "endswith")) {
  # https://developer.mozilla.org/en-US/docs/Web/CSS/Pseudo-classes
  # https://medium.com/yonder-techblog/css-regex-attribute-selectors-98075b7f4726

  # Checks
  if (missing(node_name)) { stop("`node_name` cannot be missing!") }
  if (missing(attr)) { stop("`attr` cannot be missing!") }
  if (missing(regex_type)) { stop("`regex_type` cannot be missing!") }
  if (!is.character(node_name)) { stop("The class of `node_name` has to be character!") }
  if (!is.character(attr)) { stop("The class of `attr` has to be character!") }
  if (!is.character(regex_type)) { stop("The class of `regex_type` has to be character!") }
  # `regex_type` must be a single value taken from the allowed set
  if (length(regex_type) != 1 || !regex_type %in% c("equal", "startswith", "contains", "endswith")) {
    stop("`regex_type` has to be one of them: `equal`, `startswith`, `contains` or `endswith`!")
  }

  # Regex Type: map the type to the operator prefix of a CSS attribute selector
  regex_type_check <- switch(regex_type,
    equal = "",
    startswith = "^",
    contains = "*",
    endswith = "$",
    stop("Unknown `regex_type`! Type must be `equal`, `startswith`, `contains` or `endswith`", call. = FALSE)
  )

  # Selector Query, e.g. [id^="stats"]; the value is quoted so that values
  # containing spaces or starting with a digit still form a valid selector
  query <- paste0("[", attr, regex_type_check, "=\"", node_name, "\"]")

  # Selecting Elements
  html %>% rvest::html_nodes(query)
}

# Reading the HTML page of the Premier League
url <- "https://fbref.com/en/comps/9/Premier-League-Stats"
page <- rvest::read_html(url)
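For comparison, here is a minimal sketch of how the same four kinds of match look with plain CSS attribute selectors, without the helper function. The inline HTML and its `id` values are made up for illustration (they are not taken from the real page), so the example runs without network access.

```r
library(rvest)

# Hypothetical inline HTML; the ids are invented for this example
demo_page <- read_html('
<html><body>
  <table id="stats_squads_standard_for"></table>
  <table id="stats_squads_keeper_for"></table>
  <table id="results_overall"></table>
  <div id="all_stats_squads_shooting"></div>
</body></html>')

# equal: [attr="value"] matches the whole attribute value
equal_ids <- demo_page %>% html_nodes('[id="results_overall"]') %>% html_attr("id")

# startswith: [attr^="value"] matches the beginning of the value
starts_ids <- demo_page %>% html_nodes('[id^="stats_squads"]') %>% html_attr("id")

# contains: [attr*="value"] matches anywhere in the value
contains_ids <- demo_page %>% html_nodes('[id*="squads"]') %>% html_attr("id")

# endswith: [attr$="value"] matches the end of the value
ends_ids <- demo_page %>% html_nodes('[id$="_for"]') %>% html_attr("id")

print(list(equal = equal_ids, startswith = starts_ids,
           contains = contains_ids, endswith = ends_ids))
```

Each selector here is the same string that the function above builds from `attr`, `regex_type` and `node_name`, so the helper is mainly a convenience for people who don't know the `^=`, `*=` and `$=` operators yet.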
Best regards,
Ekrem.