Support connecting to BrightData Scraping Browsers #1092

Open
apuigsech opened this issue Jul 17, 2024 · 4 comments
Labels
discuss (Need review and discussion), enhance (New feature or request)

Comments

@apuigsech

I'd like to use scalable browser infrastructure services, such as the BrightData Scraping Browser, which integrate well with Puppeteer. However, I have run into some issues when using these services with go-rod, and I would like to request the following enhancements to improve compatibility:

1. WebSocket Authentication:

The connection to these services’ WebSocket endpoints requires authentication (e.g., wss://user:pass@host:9222). However, go-rod does not currently send the necessary authentication headers, which, as far as I can tell, are not defined in any WebSocket standard.

Through my research, I discovered that authentication is performed using Basic tokens. I have implemented a working solution to inject the Authorization header. However, I am unsure if this is the optimal place to inject it. If this solution aligns with the project's direction, I am willing to submit a PR with my implementation.
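For illustration, here is a minimal sketch of how the Basic token could be derived from the credentials embedded in the URL. It uses only the Go standard library; the helper name `splitBasicAuth` is hypothetical and is not part of go-rod, and where the resulting header should be injected depends on the dialer in use (that part is left open here, as in the description above).

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"net/url"
)

// splitBasicAuth is a hypothetical helper (not a go-rod API). It takes a
// WebSocket URL that may embed credentials, e.g. wss://user:pass@host:9222,
// and returns the URL without the credentials plus an http.Header that
// carries them as a Basic Authorization token.
func splitBasicAuth(rawURL string) (string, http.Header, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", nil, err
	}

	header := http.Header{}
	if u.User != nil {
		pass, _ := u.User.Password()
		creds := u.User.Username() + ":" + pass
		token := base64.StdEncoding.EncodeToString([]byte(creds))
		header.Set("Authorization", "Basic "+token)
		u.User = nil // do not leak the credentials in the dialed URL
	}
	return u.String(), header, nil
}

func main() {
	cleanURL, header, err := splitBasicAuth("wss://user:pass@host:9222")
	if err != nil {
		panic(err)
	}
	fmt.Println(cleanURL)
	fmt.Println(header.Get("Authorization"))
}
```

The returned header would then be passed to whichever WebSocket dialer performs the handshake.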

2. Less Restrictive WebSocket Response Handling:

The services sometimes send responses that deviate from the expected go-rod Response structure, causing panics due to unmarshalling failures. Specifically, the Error struct expects an integer Code, but some responses include a string (e.g., "navigate_limit").

The Response structure is defined this way:

type Response struct {
  ID     int             `json:"id"`
  Result json.RawMessage `json:"result,omitempty"`
  Error  *Error          `json:"error,omitempty"`
}

type Error struct {
  Code    int    `json:"code"`
  Message string `json:"message"`
  Data    string `json:"data"`
}

BrightData, however, is sending me responses that cause a panic, such as this one:

{
  "id": 27,
  "sessionId": "BRD_461626884EEF95862B6188C2DBB766D1",
  "error": {
    "message": "Page.navigate limit reached",
    "code": "navigate_limit" // This is expected as an Int.
  },
  "duration": 1.2261550000112038
}

To make go-rod more compatible, it may be necessary to relax how strictly the Error struct follows the standard. I would appreciate guidance on the best approach to achieve this flexibility. If you agree with this, I am happy to work on the implementation with a bit of guidance.
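One possible shape for such a relaxation, sketched with the standard library only: a code type with a custom `UnmarshalJSON` that accepts either an integer or a string. All type names below (`FlexibleCode`, `FlexibleError`, `FlexibleResponse`) and the `parseResponse` helper are hypothetical and do not exist in go-rod; this is only meant to show that the decoding itself is straightforward.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlexibleCode is a hypothetical type that tolerates both a numeric code
// (as CDP defines it) and a string code (as seen in the response above).
type FlexibleCode struct {
	Int int    // set when the code is a JSON number
	Str string // set when the code is a JSON string
}

// UnmarshalJSON tries an integer first and falls back to a string.
func (c *FlexibleCode) UnmarshalJSON(b []byte) error {
	var n int
	if err := json.Unmarshal(b, &n); err == nil {
		c.Int = n
		return nil
	}
	return json.Unmarshal(b, &c.Str)
}

// FlexibleError mirrors the Error struct quoted above, with a relaxed Code.
type FlexibleError struct {
	Code    FlexibleCode `json:"code"`
	Message string       `json:"message"`
	Data    string       `json:"data"`
}

// FlexibleResponse mirrors the Response struct quoted above.
type FlexibleResponse struct {
	ID     int             `json:"id"`
	Result json.RawMessage `json:"result,omitempty"`
	Error  *FlexibleError  `json:"error,omitempty"`
}

// parseResponse decodes one message; unknown fields such as sessionId and
// duration are ignored by encoding/json by default.
func parseResponse(msg []byte) (*FlexibleResponse, error) {
	var res FlexibleResponse
	if err := json.Unmarshal(msg, &res); err != nil {
		return nil, err
	}
	return &res, nil
}

func main() {
	msg := []byte(`{"id":27,"error":{"message":"Page.navigate limit reached","code":"navigate_limit"}}`)
	res, err := parseResponse(msg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("id=%d code=%q message=%q\n", res.ID, res.Error.Code.Str, res.Error.Message)
}
```

The trade-off is that callers comparing against an integer code would have to check which of the two fields is populated.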

@apuigsech apuigsech added the enhance New feature or request label Jul 17, 2024

Please add a valid Rod Version: v0.0.0 to your issue. Current version is v0.116.2

generated by check-issue

@apuigsech
Author

I am using the latest version of go-rod (v0.116.2).

@ysmood
Member

ysmood commented Jul 18, 2024

Have you checked this example file? You can use another WebSocket library to do any kind of auth you like:

// WebSocket is a custom websocket that uses gobwas/ws as the transport layer.
type WebSocket struct {
	conn net.Conn
}

// NewWebSocket ...
func NewWebSocket(u string) *WebSocket {
	conn, _, _, err := ws.Dial(context.Background(), u)
	if err != nil {
		log.Fatal(err)
	}
	return &WebSocket{conn}
}

// Send ...
func (w *WebSocket) Send(b []byte) error {
	return wsutil.WriteClientText(w.conn, b)
}

// Read ...
func (w *WebSocket) Read() ([]byte, error) {
	return wsutil.ReadServerText(w.conn)
}

@ysmood
Member

ysmood commented Jul 18, 2024

As for the error string, you can also use your custom WebSocket to convert the error code to a number:

// Read ...
func (w *WebSocket) Read() ([]byte, error) {
	b, err := wsutil.ReadServerText(w.conn)
	// parse b, replace the string code with an int, then re-encode it as JSON bytes
	...
	return normalized, err
}
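As a rough sketch of what the elided normalization step might look like, the function below rewrites a string `error.code` into an integer and keeps the original string in `error.data` when that field is absent. It uses only the standard library. The names `normalizeErrorCode`, `fallbackCode`, `cdpResponse`, `cdpError`, and `decode` are hypothetical, and the choice of -32000 as the replacement value is an arbitrary placeholder, not something BrightData or the CDP definition prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// fallbackCode is a hypothetical placeholder used when the server sends a
// string where an integer code is expected.
const fallbackCode = -32000

// normalizeErrorCode returns msg unchanged unless it is a JSON object whose
// error.code is a string. In that case the code is replaced by fallbackCode
// and the original string is preserved in error.data (if data is not set).
func normalizeErrorCode(msg []byte) []byte {
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal(msg, &envelope); err != nil {
		return msg
	}
	var errObj map[string]json.RawMessage
	if err := json.Unmarshal(envelope["error"], &errObj); err != nil || errObj == nil {
		return msg // no error object present
	}
	var code any
	if err := json.Unmarshal(errObj["code"], &code); err != nil {
		return msg
	}
	name, isString := code.(string)
	if !isString {
		return msg // already numeric (or null): nothing to fix
	}

	errObj["code"] = json.RawMessage(strconv.Itoa(fallbackCode))
	if _, hasData := errObj["data"]; !hasData {
		quoted, _ := json.Marshal(name)
		errObj["data"] = quoted
	}
	fixedErr, err := json.Marshal(errObj)
	if err != nil {
		return msg
	}
	envelope["error"] = fixedErr
	out, err := json.Marshal(envelope)
	if err != nil {
		return msg
	}
	return out
}

// cdpError and cdpResponse mirror the strict structs quoted in the issue.
type cdpError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
	Data    string `json:"data"`
}

type cdpResponse struct {
	ID    int       `json:"id"`
	Error *cdpError `json:"error,omitempty"`
}

// decode parses a message with the strict structs above.
func decode(msg []byte) (*cdpResponse, error) {
	var res cdpResponse
	err := json.Unmarshal(msg, &res)
	return &res, err
}

func main() {
	raw := []byte(`{"id":27,"error":{"message":"Page.navigate limit reached","code":"navigate_limit"}}`)
	res, err := decode(normalizeErrorCode(raw))
	fmt.Println(res.Error.Code, res.Error.Data, err)
}
```

Inside the custom `Read`, the bytes returned by the transport would be passed through `normalizeErrorCode` before being returned.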

I think the string error code is a bug on BrightData's side, and we should raise an issue with them about it. It should follow the CDP protocol definition.

@ysmood ysmood added the discuss Need review and discussion label Jul 18, 2024