Skip to content

Enhance embed URL handling and validation system#4

Open
hussam789 wants to merge 1 commit intoembed-url-handling-prefrom
embed-url-handling-post
Open

Enhance embed URL handling and validation system#4
hussam789 wants to merge 1 commit intoembed-url-handling-prefrom
embed-url-handling-post

Conversation

@hussam789
Copy link

@hussam789 hussam789 commented Oct 30, 2025

User description

PR #4


PR Type

Enhancement


Description

  • Implement embeddable Discourse comments system with iframe support

  • Add RSS/ATOM feed polling for automatic topic creation and updates

  • Support raw HTML rendering for imported content with URL absolutization

  • Add embed controller with host validation and rate limiting


Diagram Walkthrough

flowchart LR
  A["External Site"] -->|embed_url| B["EmbedController"]
  B -->|validate host| C["TopicRetriever"]
  C -->|check cache| D["TopicEmbed"]
  C -->|poll feed| E["PollFeed Job"]
  E -->|parse RSS| F["SimpleRSS"]
  F -->|import content| G["TopicEmbed.import"]
  G -->|create post| H["PostCreator"]
  H -->|raw_html cook| I["Post"]
  I -->|render| J["Embed View"]
  J -->|iframe| A
Loading

File Walkthrough

Relevant files
Enhancement
14 files
embed_controller.rb
New embed controller for comment display                                 
+34/-0   
retrieve_topic.rb
Async job to retrieve embedded topics                                       
+24/-0   
poll_feed.rb
Scheduled job for RSS/ATOM feed polling                                   
+41/-0   
post.rb
Add raw HTML cook method support                                                 
+9/-0     
topic_embed.rb
New model for managing embedded topics                                     
+82/-0   
topic_retriever.rb
Topic retrieval with validation and throttling                     
+55/-0   
post_creator.rb
Support cook_method parameter in post creation                     
+1/-0     
post_revisor.rb
Add skip_validations option to post revision                         
+2/-2     
best.html.erb
View template for embedded comments display                           
+30/-0   
loading.html.erb
Loading state view for embedded comments                                 
+12/-0   
embed.html.erb
Embed layout with postMessage height resizing                       
+20/-0   
embed.js
Client-side embed script with iframe integration                 
+27/-0   
embed.css.scss
Styling for embedded comments display                                       
+69/-0   
disqus.thor
Refactor Disqus import to use TopicEmbed                                 
+1/-8     
Configuration changes
5 files
routes.rb
Add embed best comments route                                                       
+2/-0     
20131217174004_create_topic_embeds.rb
Create topic_embeds table with indexes                                     
+13/-0   
20131219203905_add_cook_method_to_posts.rb
Add cook_method column to posts table                                       
+5/-0     
20131223171005_create_top_topics.rb
Add force true to top_topics table creation                           
+1/-1     
site_settings.yml
Add embedding configuration settings                                         
+6/-0     
Formatting
1 files
20131210181901_migrate_word_counts.rb
Fix trailing whitespace in migration file                               
+2/-2     
Documentation
2 files
client.en.yml
Add embedding category to admin settings                                 
+1/-0     
server.en.yml
Add embed-related i18n strings and settings                           
+13/-0   
Dependencies
1 files
Gemfile
Add ruby-readability and simple-rss gems                                 
+4/-0     
Tests
4 files
embed_controller_spec.rb
Tests for embed controller functionality                                 
+58/-0   
poll_feed_spec.rb
Tests for feed polling job execution                                         
+40/-0   
topic_embed_spec.rb
Tests for topic embed model and import                                     
+48/-0   
topic_retriever_spec.rb
Tests for topic retriever validation logic                             
+46/-0   

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Clickjacking exposure

Description: X-Frame-Options is set to "ALLOWALL", which disables clickjacking protection and allows
the page to be framed by any site, enabling potential UI redress attacks.
embed_controller.rb [28-28]

Referred Code
  response.headers['X-Frame-Options'] = "ALLOWALL"
rescue URI::InvalidURIError
Insecure postMessage target

Description: The postMessage target origin is set to the unvalidated request referer, allowing a
malicious referer to receive sensitive data via window.postMessage.
embed.html.erb [7-15]

Referred Code
  (function() {
    window.onload = function() {
      if (parent) {
        // Send a post message with our loaded height
        parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>');
      }
    }
  })();
</script>
Unvalidated external fetch

Description: External feed content and links are fetched and used without URL allowlisting or timeouts,
and HTML is unescaped and imported, potentially enabling SSRF or importing malicious
content.
poll_feed.rb [28-37]

Referred Code
require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

rss.items.each do |i|
  url = i.link
  url = i.id if url.blank? || url !~ /^https?\:\/\//

  content = CGI.unescapeHTML(i.content.scrub)
  TopicEmbed.import(user, url, i.title, content)
end
Raw HTML rendering

Description: Posts are created with cook_method raw_html and skip_validations true, allowing raw HTML
from external sources to be stored and rendered, increasing XSS risk if sanitization is
bypassed.
topic_embed.rb [22-37]

Referred Code
    creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html])
    post = creator.create
    if post.present?
      TopicEmbed.create!(topic_id: post.topic_id,
                         embed_url: url,
                         content_sha1: content_sha1,
                         post_id: post.id)
    end
  end
else
  post = embed.post
  # Update the topic if it changed
  if content_sha1 != embed.content_sha1
    revisor = PostRevisor.new(post)
    revisor.revise!(user, absolutize_urls(url, contents), skip_validations: true, bypass_rate_limiter: true)
    embed.update_column(:content_sha1, content_sha1)
Weak origin check

Description: Origin check uses indexOf substring match instead of strict equality, which can be
bypassed by attacker-controlled domains containing the trusted domain as a substring.
embed.js [16-21]

Referred Code
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }

if (e.data) {
  if (e.data.type === 'discourse-resize' && e.data.height) {
    iframe.height = e.data.height + "px";
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Raw HTML Rendering: Posts imported from external sources are rendered as raw HTML without explicit
sanitization, creating potential XSS via untrusted content.

Referred Code
# If there is no embed, create a topic, post and the embed.
if embed.blank?
  Topic.transaction do
    creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html])
    post = creator.create
    if post.present?
      TopicEmbed.create!(topic_id: post.topic_id,
                         embed_url: url,
                         content_sha1: content_sha1,
                         post_id: post.id)
    end
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing Auditing: Critical actions like enqueueing topic retrieval and access validations do not emit audit
logs capturing user, timestamp, action, and outcome.

Referred Code
def best
  embed_url = params.require(:embed_url)
  topic_id = TopicEmbed.topic_id_for_embed(embed_url)

  if topic_id
    @topic_view = TopicView.new(topic_id, current_user, {best: 5})
  else
    Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url)
    render 'loading'
  end

  discourse_expires_in 1.minute
end
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Input Validation: External feed content (URLs and HTML) is parsed and imported without explicit validation
or error handling around network failures or malformed content beyond basic presence
checks.

Referred Code
def poll_feed
  user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
  return if user.blank?

  require 'simple-rss'
  rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

  rss.items.each do |i|
    url = i.link
    url = i.id if url.blank? || url !~ /^https?\:\/\//

    content = CGI.unescapeHTML(i.content.scrub)
    TopicEmbed.import(user, url, i.title, content)
  end
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error Detail: Raised errors include specific reasons like 'invalid referer host' that may
reveal validation logic to end users if surfaced directly.

Referred Code
def ensure_embeddable
  raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
  raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

  response.headers['X-Frame-Options'] = "ALLOWALL"
rescue URI::InvalidURIError
  raise Discourse::InvalidAccess.new('invalid referer host')
end
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Address critical stored XSS vulnerability

A new raw_html post type bypasses sanitization, creating a stored XSS
vulnerability from imported RSS/ATOM feed content. To fix this, all imported
HTML must be strictly sanitized before being stored.

Examples:

app/models/post.rb [133]
    return raw if cook_method == Post.cook_methods[:raw_html]
app/models/topic_embed.rb [22-23]
        creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html])
        post = creator.create

Solution Walkthrough:

Before:

# app/models/post.rb
def cook(*args)
  # Bypasses sanitization for raw_html posts
  return raw if cook_method == Post.cook_methods[:raw_html]
  
  # Default cooking/sanitization
  Plugin::Filter.apply(:after_post_cook, self, post_analyzer.cook(*args))
end

# app/models/topic_embed.rb
def self.import(user, url, title, contents)
  # ...
  # Creates a post with raw HTML from a feed, without sanitization
  creator = PostCreator.new(user, 
                            raw: absolutize_urls(url, contents), 
                            cook_method: Post.cook_methods[:raw_html])
  # ...
end

After:

# app/models/topic_embed.rb
def self.import(user, url, title, contents)
  # ...
  # Sanitize the HTML content before storing it
  sanitized_contents = sanitize_html(contents)
  
  # Creates a post with the sanitized HTML
  creator = PostCreator.new(user, 
                            raw: absolutize_urls(url, sanitized_contents), 
                            cook_method: Post.cook_methods[:raw_html])
  # ...
end

# A new helper method for sanitization
def self.sanitize_html(html)
  # Use a library like Loofah or Nokogiri::HTML::Sanitizer
  # with a strict allow-list of tags and attributes.
  # Example:
  Loofah.fragment(html).scrub!(:prune).to_html
end
Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies a critical stored XSS vulnerability introduced by bypassing the sanitization pipeline for raw_html posts, which is a major security flaw.

High
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments