Lots More Bots #106
base: master
My hesitance about merging this is twofold.
Pull Request Overview
This PR enhances bot detection in the user agent parser by adding support for numerous web crawlers and bots. The main changes implement a new pattern-based bot detection mechanism that can identify bots with URL references in their user agent strings, while maintaining backward compatibility for existing bot detection.
- Added a new regex-based bot detection system that identifies bots by their characteristic `(name/version; +http://...)` pattern
- Removed hardcoded bot names from the main browser regex and moved them to the new bot detection logic
- Added constants for 11 new, commonly used bot/crawler types
- Added 116 new test cases for various bot user agents
- Added browser constant exclusions for lesser-known bots to keep the API surface manageable
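The overview describes detecting bots by a `(name/version; +http://...)` token in the user agent. The following is a minimal PHP sketch of that idea, not the PR's actual code: the function name `detect_bot_by_url_token` and the exact regex are hypothetical and are not taken from `src/UserAgentParser.php`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): returns
 * ['name' => ..., 'version' => ...] when the user agent contains a token like
 * "(compatible; Name/1.0; +http://example.com/bot)", otherwise null.
 */
function detect_bot_by_url_token(string $userAgent): ?array
{
    // A "(" or ";" separator, then Name/Version, an optional ";",
    // and finally a "+http://" or "+https://" reference URL.
    $pattern = '~[(;]\s*(?P<name>[A-Za-z][\w.\-]*)/(?P<version>[\w.]+);?\s*\+https?://~';

    if (preg_match($pattern, $userAgent, $m) !== 1) {
        return null;
    }

    return ['name' => $m['name'], 'version' => $m['version']];
}

// Usage: prints the detected bot name and version.
$bot = detect_bot_by_url_token('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
echo $bot['name'], ' ', $bot['version'], PHP_EOL;
```

Ordinary browser user agents carry no `+http` reference, so they fall through to `null` and can be handled by the regular browser regex.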
Reviewed Changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/UserAgentParser.php | Refactored bot detection logic with a new regex pattern and added a `prev` capture group to the main browser regex |
| src/UserAgent/Browsers.php | Added constants for 11 new, commonly used bot types (archive.org_bot, Discordbot, Google-Read-Aloud, Google-Safety, Pinterestbot, WellKnownBot, wpbot, YandexImages, YandexMobileBot, YandexRCA, YandexUserproxy) |
| bin/constant_generator.php | Added an exclusion list for 30 bot types to avoid cluttering the `Browsers` interface |
| bin/user_agent_sorter.php | Added error handling for JSON decode failures |
| tests/user_agents.dist.json | Added 116 new bot user agent test cases covering various crawlers and bots |
| README.md | Updated documentation with new browser constants |
```php
if(
	(
		empty($result[BROWSER][0])
		|| ($result['prev'][0] !== '')
```
Copilot (AI), Oct 31, 2025
This condition will raise an "Undefined index" notice if `$result['prev'][0]` doesn't exist. The regex on line 99 uses `(?P<prev>.)?` with a `?` quantifier, so the capture group may not be set. The condition should check for existence with `isset()` or `!empty()` before accessing the array element. Suggested fix: `|| (isset($result['prev'][0]) && $result['prev'][0] !== '')`
```diff
-		|| ($result['prev'][0] !== '')
+		|| (isset($result['prev'][0]) && $result['prev'][0] !== '')
```
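If `$result['prev']` can be missing, PHP's null coalescing operator (`??`, which uses `isset()` semantics) gives an equivalent, shorter guard. The sketch below assumes `$result` is the parser's match array; both helper names are hypothetical and only compare the suggested `isset()` form with a `??` form.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true only when the optional `prev` group captured a character.
// `??` falls back to '' without a notice when $result['prev'] or
// $result['prev'][0] is missing or null.
function has_prev_char(array $result): bool
{
    return ($result['prev'][0] ?? '') !== '';
}

// The same check written the way the suggested fix does, for comparison.
function has_prev_char_isset(array $result): bool
{
    return isset($result['prev'][0]) && $result['prev'][0] !== '';
}

// Usage: both forms agree whether the key is missing, empty, or set.
var_dump(has_prev_char([]), has_prev_char_isset(['prev' => ['x']]));
```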
No description provided.