sanitiseTableName unicode/emoji escaping closes #4801#4845
sanitiseTableName unicode/emoji escaping closes #4801#4845
Conversation
PostgreSQL requires identifier escaping for unicode and emoji characters, but sanitiseTableName() only checks for spaces, hyphens, and uppercase. This test demonstrates the bug by showing that table names with: - Chinese characters (用户) - Emoji (😀) - French accented characters (données) - Cyrillic characters (таблица) - Arabic characters (جدول) are not being escaped as they should be. The test fails on all unicode/emoji cases, proving the bug exists.
8a89595 to
0b0a131
Compare
PostgreSQL requires escaping for non-ASCII characters in identifiers. The sanitiseTableName() function previously only checked for spaces, hyphens, and uppercase letters, but missed unicode and emoji characters. Added containsNonASCII() helper function to detect any characters with rune values > 127 (non-ASCII). Now properly escapes table names with: - Unicode characters (Chinese, Cyrillic, Arabic, etc.) - Emoji - Accented characters This prevents SQL errors and potential injection issues when using international table names in the interactive client.
|
This PR correctly fixes the unicode/emoji escaping bug, but it raises an interesting UX question worth discussing: Current Behavior: Text ≠ OutputLooking at // Qualified tables
{Text: qualifiedTableName, Output: qualifiedTableName} // Line 88
// Unqualified tables
{Text: tableName, Output: sanitiseTableName(tableName)} // Line 103The inconsistency:
This violates WYSIWYG principles and could be surprising to users. The QuestionShould we show the escaped form in autocomplete to match what actually gets inserted? // Option 1 (current): Clean display, escaped output
{Text: "MyTable", Output: "\"MyTable\""}
↑ user sees ↑ gets inserted
// Option 2 (WYSIWYG): Show escaped form
{Text: "\"MyTable\"", Output: "\"MyTable\""}
↑ user sees ↑ gets inserted (same!)Trade-offsCurrent approach (clean display):
WYSIWYG approach (show escaped):
My TakeFor Steampipe (a developer tool), transparency > visual cleanliness. Developers expect to see the actual SQL that will be used. The "ugliness" of quotes is actually useful information. Worth discussing whether to show the escaped form in the Thoughts? |
|
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
Summary
Fixes a bug where
sanitiseTableName()in the interactive client doesn't escape unicode or emoji characters in table names, causing SQL errors and potential security issues.PostgreSQL requires identifier escaping for non-ASCII characters, but the function only checked for spaces, hyphens, and uppercase letters.
Changes
containsNonASCII()helper to detect and escape non-ASCII charactersTest Results
Before fix (Commit 1):
After fix (Commit 2):
Impact
Severity
MEDIUM - SQL errors / potential security issue
Files Modified
pkg/interactive/interactive_helpers_test.go(new file - test)pkg/interactive/interactive_client_autocomplete.go(production fix)