A Go implementation of Glypto - a CLI tool for scraping metadata from websites using a provider-based architecture. Extract Open Graph tags, Twitter Cards, standard meta tags, and RSS/Atom feeds from web pages.
Glypto Go extracts comprehensive metadata from websites including:
- Page metadata: Titles, descriptions, images, favicons
- Social media: Open Graph tags, Twitter Cards
- Feed discovery: RSS/Atom feeds with automatic detection
- Site information: Site names, canonical URLs
The tool features a modular provider system with priority-based resolution, making it easy to extend and customize metadata extraction.
- Comprehensive Metadata Scraping: Open Graph, Twitter Cards, standard meta tags, and RSS/Atom feeds
- Extensible Provider System: Plug-and-play architecture for adding new metadata sources
- Priority-Based Resolution: Intelligent fallback system for metadata values (OpenGraph → Twitter → Standard → Other)
- Fast HTML Parsing: Built on golang.org/x/net/html for efficient parsing
- Multiple Usage Patterns: CLI tool and programmatic Go API
- Type-Safe: Full Go type safety with interfaces and structured data
- Colorized Output: Beautiful CLI output with color-coded results
- Feed Discovery: Automatic detection and parsing of RSS/Atom feeds
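To make the feed discovery feature more concrete, here is a minimal sketch of how a `<link>` element might be classified as a feed. It is not taken from the Glypto source: `isFeedLink` and `feedTypes` are hypothetical names, and the rule shown (a `rel` containing `alternate` plus an RSS or Atom media type) is only the common autodiscovery convention, which the tool may or may not follow exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// feedTypes lists the media types conventionally used to advertise feeds.
// (Hypothetical name, chosen for this sketch.)
var feedTypes = map[string]bool{
	"application/rss+xml":  true,
	"application/atom+xml": true,
}

// isFeedLink reports whether a <link> element with the given rel and type
// attribute values advertises an RSS or Atom feed.
func isFeedLink(rel, typ string) bool {
	hasAlternate := false
	// rel is a space-separated token list, compared case-insensitively.
	for _, token := range strings.Fields(strings.ToLower(rel)) {
		if token == "alternate" {
			hasAlternate = true
			break
		}
	}
	// Drop parameters such as "; charset=utf-8" before comparing.
	mediaType, _, _ := strings.Cut(strings.ToLower(typ), ";")
	return hasAlternate && feedTypes[strings.TrimSpace(mediaType)]
}

func main() {
	fmt.Println(isFeedLink("alternate", "application/rss+xml")) // feed
	fmt.Println(isFeedLink("alternate", ""))                    // no type: not a feed
	fmt.Println(isFeedLink("stylesheet", "text/css"))           // not a feed
}
```

Checking the `type` attribute as well as `rel` matters because `rel="alternate"` is also used for other alternates, such as language variants of the same page.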
# Clone the repository
git clone https://github.com/alvincrespo/glypto-go.git
cd glypto-go
# Build the project
go build -o bin/glypto ./cmd/glypto
# Run the CLI
./bin/glypto --help
# Or install directly with go install
go install github.com/alvincrespo/glypto-go/cmd/glypto@latest
# Scrape metadata from a URL
./bin/glypto scrape https://example.com
# Interactive mode (will prompt for URL)
./bin/glypto scrape
# Get help
./bin/glypto --help
./bin/glypto scrape --help
$ ./bin/glypto scrape https://github.com
Metadata scraped successfully:
Title: GitHub · Build and ship software on a single, collaborative platform
Description: Join the world's most widely adopted, AI-powered developer platform...
Image: https://github.githubassets.com/assets/home24-5939032587c9.jpg
URL: https://github.com/
Site Name: GitHub
Favicon: https://github.githubassets.com/favicons/favicon.svg
Feeds:
1. Untitled () - https://github.com/?locale=ja
2. Untitled () - https://github.com/?locale=ko
Open Graph Tags:
site_name: GitHub
type: object
title: GitHub · Build and ship software on a single, collaborative platform
url: https://github.com/
image: https://github.githubassets.com/assets/home24-5939032587c9.jpg
Twitter Card Tags:
card: summary_large_image
site: @github
title: GitHub · Build and ship software on a single, collaborative platform
package main
import (
"fmt"
"log"
"net/http"
"github.com/alvincrespo/glypto-go/pkg/scraper"
"golang.org/x/net/html"
)
func main() {
// Fetch webpage
resp, err := http.Get("https://example.com")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
// Parse HTML
doc, err := html.Parse(resp.Body)
if err != nil {
log.Fatal(err)
}
// Scrape metadata
metadata, err := scraper.ScrapeMetadata(doc)
if err != nil {
log.Fatal(err)
}
if title := metadata.Title(); title != nil {
fmt.Printf("Title: %s\n", *title)
}
if description := metadata.Description(); description != nil {
fmt.Printf("Description: %s\n", *description)
}
if image := metadata.Image(); image != nil {
fmt.Printf("Image: %s\n", *image)
}
// Access provider-specific data
ogData := metadata.OpenGraph()
twitterData := metadata.TwitterCard()
fmt.Printf("Found %d Open Graph tags\n", len(ogData))
fmt.Printf("Found %d Twitter Card tags\n", len(twitterData))
}
package main
import (
"log"
"net/http"
"github.com/alvincrespo/glypto-go/pkg/metadata"
"github.com/alvincrespo/glypto-go/pkg/providers"
"github.com/alvincrespo/glypto-go/pkg/scraper"
"golang.org/x/net/html"
)
func main() {
// Create custom provider list (only OpenGraph and Twitter)
providerList := []metadata.MetadataProvider{
providers.NewOpenGraphProvider(),
providers.NewTwitterProvider(),
}
// Create scraper with custom providers
scraperInstance := scraper.CreateScraperWithProviders(providerList)
// Or use provider names for convenience
scraperByNames, err := scraper.CreateScraperWithProviderNames([]string{
"opengraph", "twitter", "standardmeta",
})
if err != nil {
log.Fatal(err)
}
_ = scraperByNames // alternative to scraperInstance; use whichever fits
// Fetch and parse HTML
resp, err := http.Get("https://example.com")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
doc, err := html.Parse(resp.Body)
if err != nil {
log.Fatal(err)
}
// Scrape with custom configuration
// (named result so it does not shadow the imported metadata package)
result, err := scraperInstance.Scrape(doc)
if err != nil {
log.Fatal(err)
}
// Process results...
_ = result
}
Glypto Go uses a modular provider architecture with clear separation of concerns:
- Scraper: Main scraping engine with fluent method chaining
- ProviderRegistry: Manages and prioritizes metadata providers
- Metadata: Result object with intelligent value resolution
- MetadataProvider: Interface for implementing custom providers
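The relationship between providers and the registry can be illustrated with a small standalone sketch. Everything in it is hypothetical: the `Provider` interface, `prefixProvider`, and `Registry` are stand-ins written for this example, and the real `MetadataProvider` and `ProviderRegistry` types in `pkg/metadata` and `pkg/providers` may have different methods and signatures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Provider is a hypothetical stand-in for a metadata provider interface.
type Provider interface {
	Name() string
	Priority() int // assumption: a lower number is consulted first
	Extract(tags map[string]string) map[string]string
}

// prefixProvider keeps only tags that start with a prefix such as "og:".
type prefixProvider struct {
	name     string
	priority int
	prefix   string
}

func (p prefixProvider) Name() string  { return p.name }
func (p prefixProvider) Priority() int { return p.priority }

// Extract returns the matching tags with the prefix stripped from each key.
func (p prefixProvider) Extract(tags map[string]string) map[string]string {
	out := make(map[string]string)
	for key, value := range tags {
		if strings.HasPrefix(key, p.prefix) {
			out[strings.TrimPrefix(key, p.prefix)] = value
		}
	}
	return out
}

// Registry collects providers and hands them back in priority order.
type Registry struct{ providers []Provider }

// Register adds a provider and returns the registry to allow chaining.
func (r *Registry) Register(p Provider) *Registry {
	r.providers = append(r.providers, p)
	return r
}

// Ordered returns a copy of the providers sorted by ascending priority.
func (r *Registry) Ordered() []Provider {
	ordered := append([]Provider(nil), r.providers...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority() < ordered[j].Priority()
	})
	return ordered
}

func main() {
	reg := &Registry{}
	reg.Register(prefixProvider{"twitter", 2, "twitter:"}).
		Register(prefixProvider{"opengraph", 1, "og:"})

	tags := map[string]string{"og:title": "Example", "twitter:card": "summary"}
	for _, p := range reg.Ordered() {
		fmt.Println(p.Name(), p.Extract(tags))
	}
}
```

Registration order does not matter in this sketch, because the registry sorts by the priority each provider reports. That is one plausible way to keep new providers plug-and-play.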
glypto-go/
├── .github/                # GitHub Actions workflows and configuration
│   ├── workflows/          # CI/CD pipelines
│   ├── dependabot.yml      # Dependency management
│   └── labeler.yml         # PR auto-labeling
├── cmd/glypto/             # CLI entry point
│   └── main.go             # Application main function
├── pkg/
│   ├── cli/                # Cobra CLI commands and logic
│   ├── metadata/           # Core metadata types and interfaces
│   ├── providers/          # Provider implementations and registry
│   └── scraper/            # Scraping engine and factory functions
├── bin/                    # Compiled binaries (created on build)
├── CLAUDE.md               # AI coding assistant instructions
├── go.mod                  # Go module definition
└── go.sum                  # Go module checksums
The following providers are included by default, listed by priority:
- OpenGraph Provider (Priority 1): Extracts og:* properties
- Twitter Provider (Priority 2): Extracts twitter:* properties
- Standard Meta Provider (Priority 3): Extracts standard meta tags
- Other Elements Provider (Priority 4): Extracts from <title>, <h1>, <link> tags
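The priority order above implies a fallback rule: a value is taken from the highest-priority provider that has one. The sketch below shows one way such resolution could work. The `source` type and `resolve` function are hypothetical and not part of the Glypto API, and the provider name `other` is an assumption made for this example.

```go
package main

import "fmt"

// source pairs a provider name with the values it extracted.
type source struct {
	name   string
	values map[string]string
}

// resolve returns the first non-empty value for key, scanning sources in
// the order given, together with the name of the source that supplied it.
func resolve(key string, sources []source) (value, from string, ok bool) {
	for _, s := range sources {
		if v := s.values[key]; v != "" {
			return v, s.name, true
		}
	}
	return "", "", false
}

func main() {
	// Sources are listed from highest to lowest priority.
	sources := []source{
		{"opengraph", map[string]string{"title": "OG title"}},
		{"twitter", map[string]string{"title": "Twitter title", "description": "From Twitter"}},
		{"standardmeta", map[string]string{"description": "From meta"}},
		{"other", map[string]string{"title": "From <title>"}},
	}

	for _, key := range []string{"title", "description", "image"} {
		if v, from, ok := resolve(key, sources); ok {
			fmt.Printf("%s: %q (from %s)\n", key, v, from)
		} else {
			fmt.Printf("%s: not found\n", key)
		}
	}
}
```

Here the title comes from Open Graph, the description falls through to Twitter because Open Graph has none, and the image is reported as missing because no source provides it.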
- Go 1.24 or higher
- Git (for cloning the repository)
# Build the CLI
go build -o bin/glypto ./cmd/glypto
# Build and run
go run ./cmd/glypto scrape https://example.com
# Install dependencies
go mod tidy
# Run tests
go test ./...
# Run tests with coverage
go test -cover ./...
# Run tests verbosely
go test -v ./...
# Format code
go fmt ./...
# Run linter (if golangci-lint is installed)
golangci-lint run
The project includes comprehensive tests using Go's built-in testing framework:
# Run all tests
go test ./...
# Run tests with coverage
go test -cover ./...
# Run specific package tests
go test ./pkg/metadata -v
# Run tests with race detection
go test -race ./...
The project includes comprehensive test coverage with:
- Unit tests for all packages (*_test.go files)
- Table-driven tests for comprehensive coverage
- Interface-based testing for provider system
- Integration tests for CLI commands
- Mock implementations for testing provider behavior
Test Coverage by Package:
- pkg/cli/ - CLI command functionality and HTTP handling
- pkg/metadata/ - Metadata structure and value resolution
- pkg/providers/ - All provider implementations and registry
- pkg/scraper/ - Scraping engine and factory functions
The project includes GitHub Actions workflows for:
- Continuous Integration: Automated testing, linting, and building on every push/PR
- Security Scanning: Vulnerability checking with govulncheck
- Code Quality: golangci-lint for comprehensive code analysis
- Dependency Management: Dependabot for automated dependency updates
- Releases: Automated multi-platform binary builds on version tags
- Auto-labeling: Automatic PR labeling based on changed files
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run tests and ensure they pass
- Submit a pull request
This project is a Go translation of the original Glypto TypeScript project.
MIT License - see LICENSE file for details.