
Glypto Go

A Go implementation of Glypto, a CLI tool for scraping metadata from websites using a provider-based architecture. It extracts Open Graph tags, Twitter Cards, standard meta tags, and RSS/Atom feeds from web pages.

Overview

Glypto Go extracts comprehensive metadata from websites, including:

  • Page metadata: Titles, descriptions, images, favicons
  • Social media: Open Graph tags, Twitter Cards
  • Feed discovery: RSS/Atom feeds with automatic detection
  • Site information: Site names, canonical URLs

The tool features a modular provider system with priority-based resolution, making it easy to extend and customize metadata extraction.

Features

  • 🔍 Comprehensive Metadata Scraping: Open Graph, Twitter Cards, standard meta tags, and RSS/Atom feeds
  • 🧩 Extensible Provider System: Plug-and-play architecture for adding new metadata sources
  • 🚀 Priority-Based Resolution: Intelligent fallback system for metadata values (OpenGraph → Twitter → Standard → Other)
  • ⚡ Fast HTML Parsing: Built on golang.org/x/net/html for efficient parsing
  • 📦 Multiple Usage Patterns: CLI tool and programmatic Go API
  • 🎯 Type-Safe: Full Go type safety with interfaces and structured data
  • 🎨 Colorized Output: Beautiful CLI output with color-coded results
  • 📝 Feed Discovery: Automatic detection and parsing of RSS/Atom feeds

Installation

From Source

# Clone the repository
git clone https://github.com/alvincrespo/glypto-go.git
cd glypto-go

# Build the project
go build -o bin/glypto ./cmd/glypto

# Run the CLI
./bin/glypto --help

Using Go Install

go install github.com/alvincrespo/glypto-go/cmd/glypto@latest

Usage

CLI Usage

# Scrape metadata from a URL
./bin/glypto scrape https://example.com

# Interactive mode (will prompt for URL)
./bin/glypto scrape

# Get help
./bin/glypto --help
./bin/glypto scrape --help

Example Output

$ ./bin/glypto scrape https://github.com

βœ“ Metadata scraped successfully:
Title: GitHub · Build and ship software on a single, collaborative platform
Description: Join the world's most widely adopted, AI-powered developer platform...
Image: https://github.githubassets.com/assets/home24-5939032587c9.jpg
URL: https://github.com/
Site Name: GitHub
Favicon: https://github.githubassets.com/favicons/favicon.svg

Feeds:
  1. Untitled () - https://github.com/?locale=ja
  2. Untitled () - https://github.com/?locale=ko

Open Graph Tags:
  site_name: GitHub
  type: object
  title: GitHub · Build and ship software on a single, collaborative platform
  url: https://github.com/
  image: https://github.githubassets.com/assets/home24-5939032587c9.jpg

Twitter Card Tags:
  card: summary_large_image
  site: @github
  title: GitHub · Build and ship software on a single, collaborative platform

Programmatic Usage

Simple Usage with Factory

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/alvincrespo/glypto-go/pkg/scraper"
    "golang.org/x/net/html"
)

func main() {
    // Fetch webpage
    resp, err := http.Get("https://example.com")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Parse HTML
    doc, err := html.Parse(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    // Scrape metadata
    metadata, err := scraper.ScrapeMetadata(doc)
    if err != nil {
        log.Fatal(err)
    }

    if title := metadata.Title(); title != nil {
        fmt.Printf("Title: %s\n", *title)
    }
    if description := metadata.Description(); description != nil {
        fmt.Printf("Description: %s\n", *description)
    }
    if image := metadata.Image(); image != nil {
        fmt.Printf("Image: %s\n", *image)
    }

    // Access provider-specific data
    ogData := metadata.OpenGraph()
    twitterData := metadata.TwitterCard()
    fmt.Printf("Found %d Open Graph tags\n", len(ogData))
    fmt.Printf("Found %d Twitter Card tags\n", len(twitterData))
}

Custom Providers

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/alvincrespo/glypto-go/pkg/metadata"
    "github.com/alvincrespo/glypto-go/pkg/providers"
    "github.com/alvincrespo/glypto-go/pkg/scraper"
    "golang.org/x/net/html"
)

func main() {
    // Create a custom provider list (only OpenGraph and Twitter)
    providerList := []metadata.MetadataProvider{
        providers.NewOpenGraphProvider(),
        providers.NewTwitterProvider(),
    }

    // Create a scraper with the custom providers
    scraperInstance := scraper.CreateScraperWithProviders(providerList)

    // Or build one from provider names for convenience
    scraperByNames, err := scraper.CreateScraperWithProviderNames([]string{
        "opengraph", "twitter", "standardmeta",
    })
    if err != nil {
        log.Fatal(err)
    }
    _ = scraperByNames // either scraper instance can be used below

    // Fetch and parse HTML
    resp, err := http.Get("https://example.com")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    doc, err := html.Parse(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    // Scrape with the custom configuration
    md, err := scraperInstance.Scrape(doc)
    if err != nil {
        log.Fatal(err)
    }

    // Process results, e.g. the resolved title
    if title := md.Title(); title != nil {
        fmt.Printf("Title: %s\n", *title)
    }
}
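
To add a brand-new metadata source, implement the metadata.MetadataProvider interface and include it in the provider list. The interface itself isn't reproduced in this README, so the method set and return type below (Name, Priority, and Extract returning a map[string]string) are a hypothetical sketch chosen to mirror the built-in providers; check pkg/metadata for the actual definition before implementing one.

package customprovider

import (
    "golang.org/x/net/html"
)

// JSONLDProvider is a hypothetical custom provider that collects raw
// JSON-LD payloads. Its method set is an assumed shape for illustration,
// not the actual metadata.MetadataProvider interface from pkg/metadata.
type JSONLDProvider struct{}

// Name identifies the provider, e.g. for name-based scraper construction.
func (p *JSONLDProvider) Name() string { return "jsonld" }

// Priority places the provider in the fallback order (lower wins first).
func (p *JSONLDProvider) Priority() int { return 5 }

// Extract walks the parsed document and collects the text content of
// <script type="application/ld+json"> elements.
func (p *JSONLDProvider) Extract(doc *html.Node) map[string]string {
    values := map[string]string{}
    var walk func(n *html.Node)
    walk = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "script" {
            for _, attr := range n.Attr {
                if attr.Key == "type" && attr.Val == "application/ld+json" && n.FirstChild != nil {
                    values["jsonld"] = n.FirstChild.Data
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            walk(c)
        }
    }
    walk(doc)
    return values
}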

Architecture

Glypto Go uses a modular provider architecture with clear separation of concerns:

Core Components

  • Scraper: Main scraping engine with fluent method chaining
  • ProviderRegistry: Manages and prioritizes metadata providers
  • Metadata: Result object with intelligent value resolution
  • MetadataProvider: Interface for implementing custom providers

Project Structure

glypto-go/
├── .github/             # GitHub Actions workflows and configuration
│   ├── workflows/       # CI/CD pipelines
│   ├── dependabot.yml   # Dependency management
│   └── labeler.yml      # PR auto-labeling
├── cmd/glypto/          # CLI entry point
│   └── main.go          # Application main function
├── pkg/
│   ├── cli/             # Cobra CLI commands and logic
│   ├── metadata/        # Core metadata types and interfaces
│   ├── providers/       # Provider implementations and registry
│   └── scraper/         # Scraping engine and factory functions
├── bin/                 # Compiled binaries (created on build)
├── CLAUDE.md            # AI coding assistant instructions
├── go.mod               # Go module definition
└── go.sum               # Go module checksums

Built-in Providers

The following providers are included by default, listed by priority (the sketch after this list demonstrates the fallback order):

  1. OpenGraph Provider (Priority 1): Extracts og:* properties
  2. Twitter Provider (Priority 2): Extracts twitter:* properties
  3. Standard Meta Provider (Priority 3): Extracts standard meta tags
  4. Other Elements Provider (Priority 4): Extracts from <title>, <h1>, <link> tags
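
To make the fallback order concrete, here is a minimal sketch using the ScrapeMetadata API from the Usage section. The document below carries no Open Graph or Twitter tags, so the title should resolve through the lower-priority Other Elements provider's <title> extraction (an assumption based on the resolution rules described above, not output copied from the tool):

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/alvincrespo/glypto-go/pkg/scraper"
    "golang.org/x/net/html"
)

func main() {
    // No og:title or twitter:title here, so resolution should fall
    // through to the Other Elements provider (priority 4).
    src := `<html><head><title>Fallback Title</title></head><body></body></html>`

    doc, err := html.Parse(strings.NewReader(src))
    if err != nil {
        log.Fatal(err)
    }

    md, err := scraper.ScrapeMetadata(doc)
    if err != nil {
        log.Fatal(err)
    }

    if title := md.Title(); title != nil {
        fmt.Println("Resolved title:", *title) // expected: Fallback Title
    }
}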

Development

Prerequisites

  • Go 1.24 or higher
  • Git (for cloning the repository)

Building

# Build the CLI
go build -o bin/glypto ./cmd/glypto

# Build and run
go run ./cmd/glypto scrape https://example.com

# Install dependencies
go mod tidy

Project Commands

# Run tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run tests verbosely
go test -v ./...

# Format code
go fmt ./...

# Run linter (if golangci-lint is installed)
golangci-lint run

Testing

The project includes comprehensive tests using Go's built-in testing framework:

# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run specific package tests
go test ./pkg/metadata -v

# Run tests with race detection
go test -race ./...

Test Structure

Test coverage spans:

  • Unit tests for all packages (*_test.go files)
  • Table-driven tests for broad input coverage (see the sketch after this list)
  • Interface-based testing for the provider system
  • Integration tests for CLI commands
  • Mock implementations for testing provider behavior
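
As a sketch of the table-driven style, here is what such a test might look like against the public ScrapeMetadata API from the Usage section. The expected values reflect the priority rules described above and are illustrative assumptions, not cases copied from the repository's test files:

package scraper_test

import (
    "strings"
    "testing"

    "github.com/alvincrespo/glypto-go/pkg/scraper"
    "golang.org/x/net/html"
)

func TestScrapeMetadataTitle(t *testing.T) {
    cases := []struct {
        name string
        head string
        want string
    }{
        // og:title comes from the highest-priority provider.
        {"open graph title", `<meta property="og:title" content="OG Title">`, "OG Title"},
        // With no meta tags, <title> should win via a lower-priority provider.
        {"plain title", `<title>Plain Title</title>`, "Plain Title"},
    }

    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            doc, err := html.Parse(strings.NewReader("<html><head>" + tc.head + "</head><body></body></html>"))
            if err != nil {
                t.Fatalf("parse: %v", err)
            }

            md, err := scraper.ScrapeMetadata(doc)
            if err != nil {
                t.Fatalf("scrape: %v", err)
            }

            if got := md.Title(); got == nil || *got != tc.want {
                t.Errorf("Title() = %v, want %q", got, tc.want)
            }
        })
    }
}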

Test Coverage by Package:

  • pkg/cli/ - CLI command functionality and HTTP handling
  • pkg/metadata/ - Metadata structure and value resolution
  • pkg/providers/ - All provider implementations and registry
  • pkg/scraper/ - Scraping engine and factory functions

CI/CD

The project includes GitHub Actions workflows for:

  • Continuous Integration: Automated testing, linting, and building on every push/PR
  • Security Scanning: Vulnerability checking with govulncheck
  • Code Quality: golangci-lint for comprehensive code analysis
  • Dependency Management: Dependabot for automated dependency updates
  • Releases: Automated multi-platform binary builds on version tags
  • Auto-labeling: Automatic PR labeling based on changed files

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run tests and ensure they pass
  6. Submit a pull request

Acknowledgments

This project is a Go translation of the original Glypto TypeScript project.

License

MIT License - see LICENSE file for details.
