RFC 81: StreamField-based rich text #81

123 changes: 123 additions & 0 deletions text/081-streamfield-based-rich-text.md
# RFC 81: StreamField-based rich text

* RFC: 81
* Author: Matthew Westcott
* Created: 2022-11-24
* Last Modified: 2022-11-24

## Abstract

This RFC proposes a new implementation of rich text that leverages the StreamField data model for managing content as a sequence of blocks, while preserving the familiar Word-like user interface as closely as possible.

## Specification

### Rationale

Content editors often have an aversion to any editing interface that doesn't have the look and feel of Microsoft Word. No matter how much better StreamField is than rich text on technical grounds as a basis for managing content - as long as rich text feels like a Word document and StreamField doesn't, editors will demand rich text, and developers will often have limited ability to push back. Consequently the site ends up failing to benefit from the feature set of StreamField (more diverse content types, better control of front-end rendering, data exports in a structured format, and so on), making Wagtail look less capable than it really is.

### Proposal

Reimplement rich text with a Word-like user interface, but with StreamField as the underlying data model.

### The theory

Rich text, as implemented by Draftail / draft.js, has a two-level data model. At the top level, it is a sequence of block-level elements - headings, paragraphs, list items, block quotes, images, embeds - of which some have text-based content, and some do not. This sequence is a flat list, with no concept of nesting elements (hierarchical lists are implemented by giving each item a 'depth' attribute instead).

Each text-based block then consists of a plain text string, along with a list of styles (bold, italic, underline, strikethrough, subscript, superscript) and inline entities (primarily links, but could be anything that attaches arbitrary properties to a span of text, such as footnotes, stock symbols, usernames, or custom emoji) to apply to specified character ranges within that string. For example, a paragraph might be represented as:

    {
        block_type: "paragraph",
        text: "A wagtail is a bird.",
        styles: [
            {type: "bold", offset: 2, length: 7},
        ],
        entities: [
            {
                type: "external-link", offset: 15, length: 4,
                attributes: {url: "https://en.wikipedia.org/wiki/Bird"}
            },
        ]
    }

(This example is inspired by the contentState format used by draft.js, but does not follow it rigorously.)
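
To make the offset/length model concrete, here is a minimal Python sketch of how such a structure could be turned into HTML. It is illustrative only: the `STYLE_TAGS` mapping and the `render_paragraph` function are hypothetical names, and the choice of output tags is an assumption rather than part of this proposal.

```python
from html import escape

# Hypothetical mapping from style names to HTML tags (an assumption).
STYLE_TAGS = {
    "bold": "b", "italic": "i", "underline": "u",
    "strikethrough": "s", "subscript": "sub", "superscript": "sup",
}


def render_paragraph(block):
    """Render a paragraph dict (text + styles + entities) to an HTML fragment."""
    text = block["text"]
    ranges = []  # (start, end, opening markup, closing markup)
    for style in block.get("styles", []):
        tag = STYLE_TAGS[style["type"]]
        end = style["offset"] + style["length"]
        ranges.append((style["offset"], end, f"<{tag}>", f"</{tag}>"))
    for entity in block.get("entities", []):
        if entity["type"] == "external-link":
            url = escape(entity["attributes"]["url"], quote=True)
            end = entity["offset"] + entity["length"]
            ranges.append((entity["offset"], end, f'<a href="{url}">', "</a>"))

    # Cut the text at every range boundary, then wrap each segment in the
    # markup of every range that fully covers it.
    cuts = sorted({0, len(text)} | {p for s, e, _o, _c in ranges for p in (s, e)})
    parts = []
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        piece = escape(text[seg_start:seg_end])
        for start, end, open_tag, close_tag in ranges:
            if start <= seg_start and seg_end <= end:
                piece = f"{open_tag}{piece}{close_tag}"
        parts.append(piece)
    return "".join(parts)


example = {
    "block_type": "paragraph",
    "text": "A wagtail is a bird.",
    "styles": [{"type": "bold", "offset": 2, "length": 7}],
    "entities": [{
        "type": "external-link", "offset": 15, "length": 4,
        "attributes": {"url": "https://en.wikipedia.org/wiki/Bird"},
    }],
}
print(render_paragraph(example))
```

Because segments are wrapped individually, overlapping ranges produce repeated but well-formed tags, which keeps the sketch short at the cost of slightly verbose output.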

The "sequence of block-level elements" aspect maps well to StreamField; the inner "styled text" representation does not. As such, we will introduce a new block type, named ParagraphBlock, to handle formatting within a single paragraph element (or other block-level element, such as a heading). This behaves similarly to an existing RichTextBlock, but only allows inserting inline styles and entities, not new block-level elements. In its native form (as seen when it is a child of a StructBlock, for example), pressing enter (to insert a new paragraph) has no effect - however, the 'capabilities' mechanism (as currently used for splitting blocks) will introduce this ability.

The single-paragraph rich text editor will also be available as a standalone form widget for use in non-StreamField content that requires a single paragraph of rich text, such as article intro copy. When outputting this value on a template, the outer `<p>` element will not be included as standard, allowing the template author to specify their own markup such as `<p class="introduction">{{ page.introduction }}</p>`.

It is yet to be determined whether this rich text widget will be implemented with Draftail, some other editor component, or an entirely custom implementation based on the browser's contentEditable support. Given the need to integrate tightly with StreamField logic in areas such as keyboard control, and the reduced scope (only having a single block-level element to manage), there may be limited value in using an off-the-shelf component.

ParagraphBlock will accept a `features` keyword argument to define the set of elements allowed, but only features corresponding to inline styles and entities will be meaningful.

### Multi-paragraph editing

A regular multi-paragraph rich text field will be implemented as a StreamField where a ParagraphBlock is one of the available blocks, along with other block types (such as image or video embed) as defined by the field's `features` argument. Other text-based block-level elements (such as headings and blockquotes) will be defined as additional distinctly-named instances of ParagraphBlock, with appropriate styling. For example, a RichTextField defined as:

    body = RichTextField(features=['bold', 'italic', 'image', 'link', 'h2', 'h3'])

would be functionally equivalent to:

    body = StreamField([
        ('paragraph', ParagraphBlock(features=['bold', 'italic', 'link'])),
        ('image', ImageBlock()),
        ('h2', ParagraphBlock(features=['bold', 'italic', 'link'])),
        ('h3', ParagraphBlock(features=['bold', 'italic', 'link'])),
    ])

(where ImageBlock is a StructBlock consisting of an image chooser, alt text field and alignment selector)

Similarly, a RichTextBlock inside a StreamField can be translated to a StreamBlock definition.
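
The translation from a `features` list to a set of block definitions could look roughly like the following sketch. The three lookup tables and the `expand_features` function are hypothetical, the classification of features into inline and block-level is an assumption, and block classes are represented by name only so that the example runs without Wagtail installed.

```python
# Hypothetical classification of feature names (an assumption, not a spec).
INLINE_FEATURES = {"bold", "italic", "link"}
TEXT_BLOCK_FEATURES = {"h2", "h3", "h4", "blockquote"}
NON_TEXT_BLOCK_FEATURES = {"image": "ImageBlock"}


def expand_features(features):
    """Return (name, block class name, inline features) for each block type."""
    inline = [f for f in features if f in INLINE_FEATURES]
    # A plain paragraph is always available.
    blocks = [("paragraph", "ParagraphBlock", inline)]
    for feature in features:
        if feature in TEXT_BLOCK_FEATURES:
            # Headings etc. are distinctly-named ParagraphBlock instances.
            blocks.append((feature, "ParagraphBlock", inline))
        elif feature in NON_TEXT_BLOCK_FEATURES:
            blocks.append((feature, NON_TEXT_BLOCK_FEATURES[feature], []))
    return blocks


for row in expand_features(["bold", "italic", "image", "link", "h2", "h3"]):
    print(row)
```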

It is crucial that editing actions spanning multiple paragraph blocks can be performed without visibly leaving the context of an existing ParagraphBlock widget - for example, pressing enter should create a new ParagraphBlock, without the need to explicitly insert one from a menu. To do this, ParagraphBlock will make use of the StreamField 'capabilities' mechanism to identify that it is contained within a parent block that manages a sequence of children and exposes various API methods for splitting and inserting blocks. When these API methods are available, ParagraphBlock will configure itself with additional keyboard controls and menu items to take advantage of them.
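
As a rough model of this self-configuration, the sketch below binds the enter key only when the parent offers a splitting capability. The real widget would live in client-side code; `ParagraphWidget`, the `"split"` capability key and the `press` method are all hypothetical names used for illustration.

```python
class ParagraphWidget:
    """Hypothetical model of a ParagraphBlock widget; not Wagtail's real API."""

    def __init__(self, parent_capabilities=None):
        # A standalone widget, or a StructBlock child, receives no capabilities.
        capabilities = parent_capabilities or {}
        self.key_bindings = {}
        split = capabilities.get("split")
        if split is not None:
            # Only bind enter when the parent can actually split blocks.
            self.key_bindings["enter"] = split

    def press(self, key, caret):
        """Dispatch a key press; unbound keys have no effect and return None."""
        handler = self.key_bindings.get(key)
        return handler(caret) if handler else None


standalone = ParagraphWidget()
in_stream = ParagraphWidget({"split": lambda caret: f"split at {caret}"})
print(standalone.press("enter", 3))
print(in_stream.press("enter", 3))
```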

### Keyboard interactions - navigation

ParagraphBlock will expose API methods that allow the block to be focused and the caret placed at the start or end of the text - a capability known as "end-focusable". ListBlock, StreamBlock and StructBlock can also easily implement this capability, if their child blocks are end-focusable themselves - by making the corresponding API call to the first or last of their children as appropriate.

ListBlock, StreamBlock and StructBlock will provide capabilities that allow child blocks to check the capabilities of the previous and next blocks in the sequence.

If the up or left cursor key is pressed while the caret is at the start of a ParagraphBlock, and the previous block in the sequence is end-focusable, then the previous block will be given focus with the caret placed at the end.

If the down or right cursor key is pressed while the caret is at the end of a ParagraphBlock, and the next block in the sequence is end-focusable, then the next block will be given focus with the caret placed at the start.
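
The two navigation rules can be modelled as a small decision function. This is a sketch under assumptions: `Block` and `move_focus` are hypothetical names, and the down/right rule is taken to apply when the caret is at the end of the current block.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    end_focusable: bool  # can take focus with the caret at its start or end


def move_focus(blocks, index, key, caret_at_start, caret_at_end):
    """Return (new index, caret position), or None if focus stays where it is."""
    if key in ("up", "left") and caret_at_start and index > 0:
        if blocks[index - 1].end_focusable:
            return index - 1, "end"
    if key in ("down", "right") and caret_at_end and index + 1 < len(blocks):
        if blocks[index + 1].end_focusable:
            return index + 1, "start"
    return None


sequence = [Block("paragraph", True), Block("image", False), Block("paragraph", True)]
print(move_focus(sequence, 2, "up", True, False))  # previous block is an image
```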

### Keyboard interactions - block insertion and deletion

If the enter key is pressed while a ParagraphBlock is focused, and the parent block allows insertion of new blocks (as StreamBlock and ListBlock do, but not StructBlock), the ParagraphBlock will be split at the caret position into two ParagraphBlocks of the same type, and the second one will be given focus with the caret placed at the start.

If the backspace key is pressed while the caret is at the start of a ParagraphBlock, and the parent block allows deletion of blocks, and the previous block in the sequence is also a text-based block, then the content of the current block will be appended to the previous block, the current block will be deleted, and the previous block will be given the focus at the start of the newly-moved text.

(These rules are not exhaustive - others may be added, such as the ability to delete an embedded image/video by backspacing from the paragraph after it.)
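
Splitting and merging both need to carry style and entity ranges across correctly. The sketch below shows one way to do this on plain dicts shaped like the paragraph example earlier in this RFC; `clip_ranges`, `split_paragraph` and `merge_paragraphs` are hypothetical helpers, and the decision to duplicate a range that straddles the caret into both halves is an assumption.

```python
RANGE_KEYS = ("styles", "entities")


def clip_ranges(ranges, lo, hi):
    """Keep the parts of each range inside [lo, hi), re-based so lo becomes 0."""
    clipped = []
    for item in ranges:
        start = max(item["offset"], lo)
        end = min(item["offset"] + item["length"], hi)
        if start < end:
            clipped.append({**item, "offset": start - lo, "length": end - start})
    return clipped


def split_paragraph(block, caret):
    """Enter key: split one block into two blocks of the same type."""
    halves = []
    for lo, hi in ((0, caret), (caret, len(block["text"]))):
        half = {"block_type": block["block_type"], "text": block["text"][lo:hi]}
        for key in RANGE_KEYS:
            half[key] = clip_ranges(block.get(key, []), lo, hi)
        halves.append(half)
    return halves[0], halves[1]


def merge_paragraphs(previous, current):
    """Backspace at start: append `current` to `previous`.

    Returns the merged block and the caret position, which is the start of
    the newly-moved text.
    """
    shift = len(previous["text"])
    merged = {
        "block_type": previous["block_type"],
        "text": previous["text"] + current["text"],
    }
    for key in RANGE_KEYS:
        moved = [{**item, "offset": item["offset"] + shift}
                 for item in current.get(key, [])]
        merged[key] = list(previous.get(key, [])) + moved
    return merged, shift
```

Note that merging the two halves of a split does not coalesce adjacent ranges of the same style; a real implementation would probably want to normalise these.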

### Toolbars and changing block type

ParagraphBlock will provide a toolbar for inserting inline styles, inline entities and blocks. The design for this is yet to be determined. However, it is proposed that it should be permanently visible while the block is focused, positioned at the top of the block - or if the ParagraphBlock is one of a sequence (within a StreamBlock or ListBlock), at the top of the first block in the sequence. If this would result in the toolbar being off-screen, it will be anchored to the top of the screen instead.

A StreamBlock will allow a block within it to read the list of available block types; ParagraphBlock will use this to populate the toolbar with buttons for inserting those block types.

On clicking one of those toolbar buttons - or some other equivalent action such as entering the '/' or '#' shortcuts - if the chosen block type is text-based (e.g. a heading or blockquote), the active ParagraphBlock will be replaced with a block of that type, populated with the previous ParagraphBlock's content.

If the chosen block type is not text-based (e.g. an image), the active ParagraphBlock will be split into two at the caret position (deleting any selected text), and a new instance of the chosen block will be inserted between them and focused.
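
A simplified model of these two cases follows, treating the stream as a list of `(block_type, value)` pairs with plain-text values. `choose_block_type` is a hypothetical name, and handling of selected text is left out for brevity.

```python
def choose_block_type(stream, index, caret, chosen_type, text_based_types):
    """Return (new stream, index of the block to focus) after a toolbar choice."""
    block_type, text = stream[index]
    if chosen_type in text_based_types:
        # Text-based: swap the block type in place, keeping the content.
        replacement = [(chosen_type, text)]
        focus = index
    else:
        # Not text-based: split at the caret and insert the new block between.
        replacement = [
            (block_type, text[:caret]),
            (chosen_type, None),
            (block_type, text[caret:]),
        ]
        focus = index + 1
    return stream[:index] + replacement + stream[index + 1:], focus


stream = [("paragraph", "Hello world")]
print(choose_block_type(stream, 0, 5, "h2", {"paragraph", "h2"}))
print(choose_block_type(stream, 0, 5, "image", {"paragraph", "h2"}))
```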

### Undo/redo

The StreamField as a whole (or potentially the whole edit form) will maintain an undo / redo buffer so that block deletions and insertions can be undone, rather than just edits within a single paragraph block.
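
One simple way to get whole-stream undo is to snapshot the full state on every change. The sketch below does exactly that; `UndoBuffer` is a hypothetical name, and snapshotting (rather than recording individual operations) is an assumption chosen for brevity, not a design decision of this RFC.

```python
import copy


class UndoBuffer:
    """Snapshot-based undo/redo over the whole stream state."""

    def __init__(self, initial_state):
        self._undo = []
        self._redo = []
        self.state = copy.deepcopy(initial_state)

    def apply(self, new_state):
        """Record a change; any pending redo history is discarded."""
        self._undo.append(self.state)
        self._redo.clear()
        self.state = copy.deepcopy(new_state)

    def undo(self):
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()
        return self.state

    def redo(self):
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()
        return self.state
```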

### Copy and paste

Content pasted into a ParagraphBlock will be intercepted, where browser capabilities allow, and split into paragraphs. If the parent block allows block insertion, as many new blocks will be inserted as necessary to fit the content. Where possible, the markup / style of each pasted paragraph will be matched to the most suitable block type out of all the ParagraphBlock types available on the container - headings, blockquote and so on. (If the parent block does not allow block insertion, only a single paragraph will be pasted - either cutting off after the first paragraph, or concatenating everything into one paragraph.)
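
The matching step might be sketched as follows, assuming the pasted content has already been parsed into `(tag, text)` pairs. `TAG_TO_BLOCK_TYPE` and `blocks_from_paste` are hypothetical, and the single-block case shows only the concatenation variant of the two options mentioned above.

```python
# Hypothetical mapping from pasted HTML tag names to block type names.
TAG_TO_BLOCK_TYPE = {
    "p": "paragraph", "h2": "h2", "h3": "h3", "blockquote": "blockquote",
}


def blocks_from_paste(pasted, available_types, can_insert, fallback="paragraph"):
    """Convert pasted (tag, text) pairs into (block_type, text) pairs."""
    if not pasted:
        return []
    if not can_insert:
        # Parent cannot insert blocks: concatenate everything into one paragraph.
        return [(fallback, " ".join(text for _tag, text in pasted))]
    result = []
    for tag, text in pasted:
        block_type = TAG_TO_BLOCK_TYPE.get(tag, fallback)
        if block_type not in available_types:
            # No matching block type on this container: use a plain paragraph.
            block_type = fallback
        result.append((block_type, text))
    return result


pasted = [("h2", "Title"), ("p", "Body"), ("h4", "Sub")]
print(blocks_from_paste(pasted, {"paragraph", "h2"}, can_insert=True))
print(blocks_from_paste(pasted, {"paragraph", "h2"}, can_insert=False))
```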

### Data representation

This change will mean that the in-database representation of a RichTextField or RichTextBlock will change from the current HTML-like string to the StreamField JSON format.

The data format for an individual ParagraphBlock is also up for consideration: while it could feasibly adopt the existing HTML-like string format, there is probably a strong case for taking this opportunity to switch to a JSON format like contentState. Originally the HTML-like format was chosen to minimise the processing required to transform it into real front-end HTML, but over time additions such as commenting and `data-block-key` attributes have indicated a need to capture information within rich text that isn't reflected in the front-end rendering, and simple regexp replacement increasingly feels like too blunt an instrument for this. It's also arguably a good thing for developers to move away from the mental model of rich text as a "flavour" of HTML - so that features with only a loose relation to HTML (e.g. footnotes) do not have to be approached from the angle of HTML, and common questions of the "how do I enable `<button>` as an allowed element in rich text" variety are asked at the correct level of abstraction.

Retrieving the value of a RichTextField or RichTextBlock in user code will now yield a StreamValue. For simple template rendering cases, this should work with no code changes - a tag such as `{{ page.body }}` will return the StreamValue's `__str__` representation, which will be the HTML rendering as desired. The `|richtext` filter will become redundant, and we would encourage authors of new code to use `{% include_block page.body %}` instead. The native value type of ParagraphBlock will also be a custom Python type with a `__str__` method returning an HTML rendering.

Member

Suggested change
Retrieving the value of a RichTextField or RichTextBlock in user code will now yield a StreamValue. For simple template rendering cases, this should work with no code changes - a tag such as `{{ page.body }}` will return the StreamValue's `__str__` representation, which will be the HTML rendering as desired. The `|richtext` filter will become redundant, and we would encourage authors of new code to use `{% include_block page.body %}` instead. Template rendering of rich text using `{% include_block %}` will pass context variables from the calling template. The native value type of ParagraphBlock will also be a custom Python type with a `__str__` method returning an HTML rendering.

Supporting {% include_block %} for rich text opens up the promising possibility of passing template context to rich text rendering, which is not possible with the current | richtext filter. This would be very useful to, for example:

It sounds like this proposal would work similarly to how StreamField block rendering does: writing {{ my_block }} does not pass context variables, but {% include_block my_block %} does. Do you think it'd be worth explicitly calling that out here?


It's very likely that there will be some incompatibilities with legacy code that expects a string value (e.g. code that performs string replacement or slicing on it, or queries the field with an `__icontains` filter), and so this will most likely entail a major version bump of Wagtail.
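
The kind of breakage to expect can be illustrated with a stand-in class. `RenderedValue` below is hypothetical and much simpler than a real StreamValue; it only demonstrates that an object with an HTML-returning `__str__` renders fine but no longer offers string methods directly.

```python
class RenderedValue:
    """Hypothetical stand-in for a value whose __str__ returns rendered HTML."""

    def __init__(self, html):
        self._html = html

    def __str__(self):
        return self._html


body = RenderedValue("<p>A wagtail is a bird.</p>")

# Rendering via str() still works as before.
print(str(body))

# String methods are not available on the value itself...
print(hasattr(body, "replace"))

# ...so legacy code would need an explicit conversion first.
legacy = str(body).replace("bird", "small bird")
print(legacy)
```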

To accommodate existing API clients that consume the legacy HTML-like string format, we will provide a DRF serialiser to translate the StreamField JSON format back to the HTML-like format - and, if possible, configure our API endpoints to use this by default for `/api/v2/` endpoints. It may be worthwhile to introduce a v3 API at this point that serves rich text fields in the new JSON format.

## Open Questions

...

Member

A few additional questions, I'll try to do a full review later.

  • Confirming - is this RFC's goal to replace Draftail, or to provide a second rich text interface?
  • How will this impact or work with RFC 46: Single paragraph rich text fields/blocks (#46) and RFC 60: Draftail Usage for General Text Entry (#60)?
  • Does the Wagtail core team / Torchbox really want to take on the responsibility of a full JS text editor build? It could be quite a complex endeavour - even Facebook (Meta) recently gave up on Draft.js and started from scratch with Lexical.
  • Should we consider building on a library like ProseMirror or even TipTap.js, which provide the building blocks for a JS rich text editor without the UI baggage? https://tiptap.dev/ (their data model is quite sound and has support for block level / inline level concepts and plugins).
  • Will this RFC aim to resolve many of the outstanding bugs with Draftail?
  • What about projects that don't want to buy in / go all in too much with StreamField?

Contributor Author

  • Confirming - is this RFC's goal to replace Draftail, or to provide a second rich text interface?
  • What about projects that don't want to buy in / go all in too much with StreamField?

(These two are best tackled together I think...)

I definitely want to position this as the single path forward for rich text, rather than an "opt-in" alternative to Draftail. The goals of this RFC are twofold:

  • to preserve the MS-Word-like UX of rich text closely enough that editors do not feel a need to stick with Draftail out of familiarity;
  • to make the programming model transparent enough that developers can use it without having to explicitly understand or buy in to StreamField as a concept.

To expand on the latter point: RichTextField (or some direct replacement for it) will still exist, and happens to be implemented as a StreamField with a certain pre-baked configuration, but that's just an implementation detail. The editor will have a look and feel that's as close as possible to "classic" rich text, and writing {{ page.body }} on a template will render the expected HTML markup. If they then want to go beyond the stock rich text functionality - such as inserting CTA snippets into the flow of text, say - then that's the point where they'll learn about StreamField blocks and swap out the RichTextField for a formal StreamField definition.

I'm hopeful that we can systematically address any remaining technical and UX hurdles that would give any reason to stick with Draftail. Inevitably there'll still be a certain amount of organisational inertia, and so I think it's important to the success of the exercise that we pitch it as "this is how rich text will work from now on" rather than "here's an alternative to rich text that you might want to try".

I don't have a firm idea of how the transition will work, and so there could still be some period of Draftail existing alongside the new implementation - perhaps something like the use_json_field flag where there's a certain window of time to perform the migration but developers have to do it sooner or later.

(I'm also not totally ruling out the possibility that Draftail could still be the basis of the new implementation, if it proves to be capable of the kind of custom integrations we need.)

That's a good question! Clearly the ideas in RFC 46 are a key building block for this bit of development, but I don't know whether that's an argument in favour of building a Draftail-based implementation of single-paragraph rich text as a stepping stone towards this, or for starting right away with a new rich text implementation. There's probably a certain amount of overlap with RFC 60 too, although the new data model and details of interactions between adjacent blocks don't apply there.

  • Does the Wagtail core team / Torchbox really want to take on the responsibility of a full JS text editor build? It could be quite a complex endeavour - even Facebook (Meta) recently gave up on Draft.js and started from scratch with Lexical.
  • Should we consider building on a library like ProseMirror or even TipTap.js, which provide the building blocks for a JS rich text editor without the UI baggage? https://tiptap.dev/ (their data model is quite sound and has support for block level / inline level concepts and plugins).

We should absolutely consider those, yes. Very keen to get your input on this - I haven't done any proper evaluation of the existing rich text solutions that are out there, and I realise there's a danger of falling victim to Not Invented Here syndrome and taking a "how hard can it be, really" stance that then comes back to bite us :-) Right now my inclination towards writing our own editor is based on these thoughts:

  • Given that we're tightly scoping the rich text widget itself to just dealing with the inline entities within a block-level element, and all of the 'fancier' behaviours like headings and toolbars are handled outside of that, in StreamField territory - I feel that there's probably not a huge amount of added value that off-the-shelf libraries can provide beyond what browsers give us as standard through the contentEditable API.
  • Whichever route we take, the hardest parts are probably going to be UI behaviour at paragraph boundaries, undo/redo, and handling pasted content - all of which demand the deepest integration with the StreamField mechanism and the most extensive custom code. It's arguably going to be easier to approach that from a clean slate, rather than grapple with someone else's API with baked-in assumptions about how those should be handled.

Member

Thanks for the epic and thoughtful feedback here - nothing too much to add beyond us trying to leverage some of the building blocks in the JS land for text editing. However, I agree that the underlying library here can be an implementation detail.

I also recommend you look at https://grapesjs.com/ - not to use it but to get some inspiration from another approach to the 'block' building concept and how the data can be stored/worked with. Note: This very much heads into the WYSIWYG direction which I think should be avoided, but it's a good approach for its use case nonetheless.