HTML => XML + CSS with XLinq 🤘

This project uses SponsorLink and may issue IDE-only warnings if no active sponsorship is detected.

Read HTML as XML and query it with CSS over XLinq (or HtmlAgilityPack killer 😉). Provides the HtmlDocument.Load method and the CssSelectElement(s) extension methods for XDocument/XElement.

No need to learn an entirely new object model for a page 🤘. This makes it the most productive and lean library for web scraping using the latest and greatest that .NET can offer.

Usage

using System.Collections.Generic;
using System.Xml.Linq;
using Devlooped.Web;

XDocument page = HtmlDocument.Load("page.html");
IEnumerable<XElement> elements = page.CssSelectElements("div.menuitem");

XElement title = page.CssSelectElement("html head meta[name=title]");

By default, HtmlDocument.Load skips the non-content elements script and style, converts all element names to lower case, and ignores all XML namespaces (useful when loading XHTML, for example) for easier querying. These options, as well as granular whitespace handling, can be configured using the overloads that receive an HtmlReaderSettings.

The underlying parsing is performed by the amazing SgmlReader library by Microsoft's Chris Lovett.

In addition, the following extension methods make it easier to work with XML documents where you want to query with CSS or XPath without having to deal with XML namespaces:

using System.Xml;
using System.Xml.Linq;
using Devlooped.Web;

var doc = XDocument.Load("doc.xml");
// Will remove all xmlns declarations, and allow querying elements 
// as if none had namespaces, returns the root element
XElement nons = doc.RemoveNamespaces();

// Alternatively, you can also ignore at the XmlReader level
using var reader = XmlReader.Create("doc.xml").IgnoreNamespaces();
doc = XDocument.Load(reader);

// Finally, you can also skip elements at the reader level
// (a distinct variable name avoids redeclaring "reader" in the same scope)
using var skipReader = XmlReader.Create("doc.xml").SkipElements("foo", "bar");
doc = XDocument.Load(skipReader);

CSS

At the moment, supports the following CSS selector features:

And all combinators

Non-CSS features:

text() pseudo-attribute selector: selects the text content of a node, as specified by the XPath text() location path. It can be used in place of an attribute name in a selector, such as div[text()=foo]. All attribute value selectors are also supported:
- [text()=val]: Represents an element whose text content is exactly "val".
- [text()~=val]: Represents an element whose text content is a whitespace-separated list of words, one of which is exactly "val". If "val" contains whitespace, it will never represent anything (since the words are separated by spaces). Also, if "val" is the empty string, it will never represent anything.
- [text()|=val]: Represents an element whose text content is either exactly "val" or begins with "val" immediately followed by "-" (U+002D).
- [text()^=val]: Represents an element whose text content begins with the prefix "val". If "val" is the empty string, the selector does not represent anything.
- [text()$=val]: Represents an element whose text content ends with the suffix "val". If "val" is the empty string, the selector does not represent anything.
- [text()*=val]: Represents an element whose text content contains at least one instance of the substring "val". If "val" is the empty string, the selector does not represent anything.
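
The matching rules listed above can be sketched with plain string operations. This is only an illustration of the semantics as described in this readme, not the library's implementation: TextMatches is a hypothetical helper, and the sketch assumes ordinal (case-sensitive) comparison with no trimming of the text, which the library may handle differently.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Devlooped.Web): evaluates one
// [text()<op>val] test against an element's text content.
// Assumption: ordinal, case-sensitive comparison without trimming.
static bool TextMatches(string text, string op, string val) => op switch
{
    // Exact match.
    "=" => text == val,
    // One word of a whitespace-separated list; never matches when
    // val is empty or itself contains whitespace.
    "~=" => val.Length > 0
            && !val.Any(char.IsWhiteSpace)
            && text.Split(new[] { ' ', '\t', '\r', '\n', '\f' },
                          StringSplitOptions.RemoveEmptyEntries).Contains(val),
    // Exact match, or val followed immediately by "-" (U+002D).
    "|=" => text == val || text.StartsWith(val + "-", StringComparison.Ordinal),
    // Prefix, suffix and substring; an empty val never matches.
    "^=" => val.Length > 0 && text.StartsWith(val, StringComparison.Ordinal),
    "$=" => val.Length > 0 && text.EndsWith(val, StringComparison.Ordinal),
    "*=" => val.Length > 0 && text.Contains(val, StringComparison.Ordinal),
    _ => throw new ArgumentException($"Unknown operator '{op}'", nameof(op)),
};

Console.WriteLine(TextMatches("red green blue", "~=", "green")); // True
Console.WriteLine(TextMatches("en-US", "|=", "en"));             // True
Console.WriteLine(TextMatches("hello", "^=", ""));               // False
```

With the library itself, the same tests are written directly in the selector string passed to CssSelectElements, for example div[text()^=foo].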

Dogfooding

We also produce CI packages from branches and pull requests so you can dogfood builds as quickly as they are produced.

The CI feed is https://pkg.kzu.io/index.json.

The versioning scheme for packages is:

PR builds: 42.42.42-pr[NUMBER]
Branch builds: 42.42.42-[BRANCH].[COMMITS]
