Research and improve editor.getData() efficiency #5812
There are some performance issues already reported in integration repositories that would benefit from this improvement:
I've been thinking about the data pipeline's performance on various occasions recently. Even if we greatly improved the performance of loading or retrieving data to/from the editor, there will always be content long enough to become a problem. I know that some of our customers ask about documents 1000 pages long. Of course, we can go from e.g. 5s to 2.5s (as we did recently), but that will still be too long.

What I have often recommended is loading just one chapter of such documents at once. This is probably the safest solution long-term. But there's also a second, complementary option – don't use the data pipeline at all. Use the editor model directly. I'd guess that the 2.5s would become less than 100ms at that point. The risk, though, is being tied to our data model's format. Also, breaking changes in the model structure would force data migration (there would be ways to automate this – e.g. use the data pipeline to migrate content, with the HTML format as the stable link between incompatible editor versions).

@fredck, I know that you worked on data snapshot performance for GitHub Writer. Could you share some insights here? Did you go with saving the data model's structure directly (JSON/XML) or something else?
Yes, I've spent some good time on this issue and came up with a complete solution here: The code should be self-explanatory. I save a snapshot of the model in an XML format for every single change made to the model. The time to retrieve the updated data is under 1ms. You can observe the changes being saved (and the data format) in the "Application" tab of the dev tools, under "Session Storage", while using GitHub Writer.
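For readers skimming this thread, a minimal sketch of that general idea follows. It is not the actual GitHub Writer code (which used XML at the time); it serializes to JSON instead and assumes a hypothetical `editor-snapshot` storage key:

```js
import Plugin from '@ckeditor/ckeditor5-core/src/plugin';

// Persist a model snapshot on every document change, so "getting the data"
// later is a storage read instead of a full data-pipeline conversion.
class SessionSnapshot extends Plugin {
	init() {
		const editor = this.editor;

		editor.model.document.on( 'change:data', () => {
			const root = editor.model.document.getRoot();

			// Model nodes implement toJSON(), so stringifying them is cheap
			// compared to running the downcast conversion.
			const snapshot = JSON.stringify( Array.from( root.getChildren() ) );

			sessionStorage.setItem( 'editor-snapshot', snapshot );
		} );
	}
}
```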
Additionally, I've tried to implement a "Live View Data" feature. I've even prototyped it, and it could work well. Then I figured out that it is impossible to have because of the way features are implemented. The problem is that features don't care about "attribute" and "remove" events in the data downcasting, and therefore the "live view" would not catch such changes. To make this feature a reality, all features would have to be reviewed and their data downcast made as complete as the editing downcast, handling all kinds of changes. I can put the prototype somewhere for you guys to check, if you want. Just let me know.
I guess that this is exactly what @scofalik proposed to do, and exactly the issue that he predicted. Unfortunately, it seems like a lot of work to review the existing features in this context. Also, unless we start using this for real, the effort will go nowhere, as with time features will stop supporting it. As for the live model data, I'm curious why you chose XML over JSON. JSON seems to be the standard for people today. Was readability/compactness a reason?
To leave it documented here, this is the prototype code I came up with for introducing it. This worked very well while loading and inserting data in the editor (because features implement data downcasting for that). It stopped working when changing attributes and removing elements (because features don't implement data downcasting for those). I implemented this as plugins over the current editor code. I imagine that bringing it into the engine directly could make the above prototype even simpler and clearer.
Yes, definitely. But I believe it's doable, especially taking into consideration that the majority of the features share the same downcasting strategy for the editing and data pipelines. Which leads me to comment on the following:
Whenever possible (when the editing and data strategies are the same), we could figure out a way to change both pipelines at the same time, instead of having them handled separately. Ofc, we should start with the hard features first (tables, lists), to see if it would ever be possible.
Well, I assume this would ideally replace `getData()` altogether, so we would stick with it forever. Summing up, this means a lot of work to be done. But I have a feeling that the outcome could be gorgeous and the benefits surreal ;)
JSON definitely sounds more modern, especially in the web/JS world. But XML is also quite universal, although that's not the reason for the choice. I had a few goals when designing this feature:
I started with JSON in mind, but it ultimately failed for points 1 and 2. For point 3, both JSON and XML would do.

Let me try to explain why XML won over JSON for 1 and 2. The in-memory version of the data is an array tree (arrays inside arrays). In this way we can easily map the position of model nodes to the same position in the array tree. For example:
As we can see, the positions are a 1:1 match. Then, we want to have an "almost string" representation of the data. With XML, this job is easy:
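A plausible reconstruction of the shape being described, assuming one string per tag and one nested array per child:

```js
// Every element is an array: opening tag, one nested array per child, closing tag,
// so the child at model offset N sits at a predictable array position.
const empty = [ '<xyz>', '</xyz>' ];
const oneChild = [ '<xyz>', [ '<abc>', '</abc>' ], '</xyz>' ];
const twoChildren = [ '<xyz>', [ '<abc>', '</abc>' ], [ '<def>', '</def>' ], '</xyz>' ];

// Turning the "almost string" into the final string is a flatten + join:
console.log( twoChildren.flat( Infinity ).join( '' ) );
// -> '<xyz><abc></abc><def></def></xyz>'
```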
Every node is represented by an array. The above allows for simply flattening the array and joining it to get the final string. Now, trying to do the above with JSON is much more complicated:

```js
// An empty element:
{ "type": "element", "name": "xyz" }
// or even this, if we want to help JSON:
{ "type": "element", "name": "xyz", "children": [] }
// Same element, 1 child:
{ "type": "element", "name": "xyz", "children": [ { "type": ... } ] }
// Same element, 2 children:
{ "type": "element", "name": "xyz", "children": [ { "type": ... }, { "type": ... } ] }
```

The last case is the problematic one... do you see that comma between the children nodes? That messes things up. Let me try to put the above in an "almost string" array, similar to the XML one:
```js
// An empty element:
[ '{ "type":"element", "name":"xyz", "children":[', '] }' ]
// Same element, 1 child:
[ '{ "type":"element", "name":"xyz", "children":[', [ '{ "type":... }' ], '] }' ]
// Same element, 2 children:
[ '{ "type":"element", "name":"xyz", "children":[', [ '{ "type":... }', ',', '{ "type":... }' ], '] }' ]
```

Here goes the problem... the last option doesn't follow the pattern anymore: the extra ',' entries between siblings break the 1:1 mapping between model positions and array positions. So, instead of trying to go with a format that showed some complications, I've decided to go with the one that makes our life easier and our code faster :)
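A minimal sketch of that difference (illustrative names, not the actual Live Model Data code):

```js
// XML array: inserting a child is a plain splice; serialization is flat + join.
const xml = [ '<p>', [ '<b>', '</b>' ], '</p>' ];

xml.splice( 2, 0, [ '<i>', '</i>' ] ); // New child goes straight to its slot.
console.log( xml.flat( Infinity ).join( '' ) ); // '<p><b></b><i></i></p>'

// JSON string parts: a second child also needs a ',' entry between siblings,
// so every insertion must check whether a separator is required, and the
// array positions drift away from the model offsets.
const json = [ '{"name":"p","children":[', [ '{"name":"b"}' ], ']}' ];

json.splice( 2, 0, ',', [ '{"name":"i"}' ] );
console.log( json.flat( Infinity ).join( '' ) );
// '{"name":"p","children":[{"name":"b"},{"name":"i"}]}'
```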
I wanted to share performance results for a comparison I've just made of how fast it is to produce a string out of an in-memory representation of the model data. I've tested in-memory formats for the following outputs:
I've created the following code. It's enough to paste it in the console and run it. It uses the page DOM as the "model", just for comparison purposes. I've run it on a monster page (gazeta.pl) and these are the results. Firefox:
Chrome:
Running it multiple times in Firefox gave more stable results. In Chrome, they changed a lot: sometimes outerHTML was super slow, other times XML Array was the fastest... and sometimes super slow as well.

Conclusion

Anyway, the above could tell us that all strategies are good and should perform well in the CKEditor case. The "XML Array" is, surprisingly, the slowest by a small bit. This means that going with JSON is a better option, as the code is less obscure, the output size is smaller and it performs similarly or better. My implementation of the Live Model Data could easily be modified to output JSON. I'll probably do so.
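The benchmark snippet itself was not preserved in this copy of the thread; a rough re-creation of the described approach, runnable in any page's console, might look like this:

```js
// Build the in-memory structures first, then time only the stringification.
function domToJson( node ) {
	if ( node.nodeType === Node.TEXT_NODE ) {
		return { data: node.data };
	}

	return {
		name: node.nodeName.toLowerCase(),
		children: Array.from( node.childNodes ).map( domToJson )
	};
}

function domToXmlArray( node ) {
	if ( node.nodeType === Node.TEXT_NODE ) {
		return [ node.data ];
	}

	const name = node.nodeName.toLowerCase();

	return [ `<${ name }>`, ...Array.from( node.childNodes ).map( domToXmlArray ), `</${ name }>` ];
}

const jsonTree = domToJson( document.body );
const xmlArray = domToXmlArray( document.body );

console.time( 'outerHTML' );
const htmlOut = document.body.outerHTML;
console.timeEnd( 'outerHTML' );

console.time( 'JSON' );
const jsonOut = JSON.stringify( jsonTree );
console.timeEnd( 'JSON' );

console.time( 'XML Array' );
const xmlOut = xmlArray.flat( Infinity ).join( '' );
console.timeEnd( 'XML Array' );
```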
Changing topic, I've got some other ideas about possible future enhancements to data in CKEditor:
This idea is inspired by how smartly GitHub uses the web platform for its application. Indeed, if the features are there, why have a (poorer) custom implementation? Ofc, maybe today it's easier to imagine this working than it was when we designed CKEditor 5 a few years ago. Maybe?
Wouldn't it make it practically impossible to use the model on Node? (This would be my first concern about it.)
@jodator There's JSDOM. That should work fine on Node, assuming that JSDOM supports all the features that we'd use. I know, for instance, that there was no mutation observer for a long time.

@fredck That's an interesting idea and it'd be interesting to research it one day. There'd be challenges, but there are some candies out there, for sure. BTW, the DOM could also be used for the view.

Thanks for sharing your thoughts on the performance aspect too. I'm sure we'll have to get back to that at some point. We found a 2x speed-up so far and I guess there's like a 5x possible if we go deeper into some algorithms. However, for large documents we simply need a different approach – an additional pipeline. It could either work around the view completely, or use a live view structure, like the editing view.
FYI: the implementation of the Live Model Data has been changed to JSON:
Hi -- apologies for bumping this up, but is there any work being done to try to improve `getData()` performance?
@urbanspr1nter At this moment I can only say that we are changing our opinion on storing the model as the document data source and we are more optimistic towards this idea. It would solve some of our problems, including efficiency problems when loading and getting data from the editor. Most probably this is the direction that we will take. If or when it happens is too difficult to say, as we have to take a lot into consideration.

In the last three years we have seen that the model "schema" is much more stable than we anticipated and we hardly change anything there. However, there's always a risk that if you save the model, it can become outdated. For example, recently we changed the name of one of the elements.

However, these two problems do not exist if you store and pass the model only for the duration of an editing session (it is not stored permanently, or HTML is stored as well at some point and is used when the editor version or config changes). If your use case can implement safe model data usage, then I'd recommend giving it a try. Storing and loading the model is much faster than the whole conversion process. Especially storing, which is fast and easy:

```js
JSON.stringify( Array.from( editor.model.document.getRoot().getChildren() ) );
```

Voila! For loading, you will need to recreate the model from the JSON string and insert it into the root:

```js
import Plugin from '@ckeditor/ckeditor5-core/src/plugin';

class ModelInit extends Plugin {
	init() {
		this.editor.data.on( 'init', evt => {
			// Prevent the default data initialization (going through the data pipeline).
			evt.stop();

			// `modelString` is the stringified model, e.g. loaded from your storage.
			const modelJson = JSON.parse( modelString );
			const modelStruct = makeModel( modelJson );

			this.editor.model.enqueueChange( { isUndoable: false }, writer => {
				const modelRoot = this.editor.model.document.getRoot();

				// `makeModel()` returns an array of nodes, so append them one by one.
				for ( const node of modelStruct ) {
					writer.append( node, modelRoot );
				}
			} );

			this.editor.data.fire( 'ready' );
		}, { priority: 'highest' } );
	}
}
```
```js
import Element from '@ckeditor/ckeditor5-engine/src/model/element';
import Text from '@ckeditor/ckeditor5-engine/src/model/text';

// Recreates an array of model nodes from the parsed JSON structure.
function makeModel( childrenJson ) {
	const result = [];

	for ( const childJson of childrenJson ) {
		if ( childJson.data ) {
			// A text node: `data` holds the text, `attributes` the formatting.
			const text = new Text( childJson.data, childJson.attributes || null );

			result.push( text );
		} else {
			// An element: recursively recreate its children first.
			const children = childJson.children && childJson.children.length > 0 ? makeModel( childJson.children ) : [];
			const element = new Element( childJson.name, childJson.attributes || null, children );

			result.push( element );
		}
	}

	return result;
}
```

The above shows how you could initialize the editor with a stringified model. If you just want to set the model data after receiving it from an external source, use this modified version:

```js
const modelJson = JSON.parse( modelString );
const modelStruct = makeModel( modelJson );

this.editor.model.enqueueChange( { isUndoable: false }, writer => {
	const modelRoot = this.editor.model.document.getRoot();

	// Clear the current content first, then append the recreated nodes.
	writer.remove( this.editor.model.createRangeIn( modelRoot ) );

	for ( const node of modelStruct ) {
		writer.append( node, modelRoot );
	}
} );
```

Finally, consider these two notes:
You can also take a look at this solution: #620 (comment). However, it was proposed 3 years ago and I am not sure if it is still up-to-date. Also, AFAICS, you will not be able to get HTML from the editor this way (as
@scofalik - Very interesting! Thanks for the insight. This is an interesting option... but I'm also wondering if the view data from the editing view could be used directly.
So, you basically suggest taking the HTML straight from the editing view (the editable element's content).
However, I do not think this is a safe solution. Of course, it depends on what features you use. If you only use styling features, headings and lists, then maybe that would work. Widgets (images, tables) will add UI elements that may be incorrectly converted. We also put some things (like hidden spaces) into the content to fight browser quirks, and then you would have them in your data. I would not recommend that. We may put really "weird" stuff into the editable to provide UI or accessibility, or to fight browser quirks. Also, remember that
@scofalik - That's a good point. I did not consider the injected content. We actually keep things a bit simple and try to keep our editing view and data view more or less the same. We don't make use of UI elements; however, we do use the table and image plugins to a certain degree. We also inject hidden spaces (ZWS) ourselves to manage browser input behaviors. However, when we finally need to get the data, we generally do a global replace to remove them anyway. I assume that in doing so we are probably already removing whatever the library is injecting as well.

That is very interesting though - I had never considered leveraging the model as a way to serialize/deserialize the editor data. This would normally work for our use case, but we also fan out editor data to subscribers within our app so that they can make use of the data too.

Also! Thank you for being so quick to respond, and to such an old thread at that. It's really appreciated and I learned a lot just from reading your response. 😁 I'm going to revisit how our implementation serializes the editor data, and perhaps I can come up with something creative there. I totally understand that
I think some way to expose the HTML data in real time is a crucial feature. And since the issues with not having this only show up with large documents (guess what developers are unlikely to use / test with when integrating CKEditor 5? ;)), they tend to show up later, possibly even only in production. Basically at a time when finding a workaround is a mess (and waiting for upstream to improve things won't do any good in the short term). Is there any reason why the editor doesn't have an option to always keep an HTML version internally? Since it probably already tracks all changes to its model in order to update the editable view, it sounds pretty straightforward - for someone familiar with the CKEditor internals - to also keep an up-to-date HTML version that's in sync with the model and could be used to populate e.g. a hidden field for more classic forms, or the state of a React form library (e.g. react-final-form).
@scofalik your solution solved my issue. But the cursor always ends up at the start of the CKEditor field. Do you have any idea how to keep the cursor at the current position? I tried:
```js
editor.model.change( writer => {
	writer.setSelection( writer.createPositionAt( editor.model.document.getRoot(), 'end' ) );
} );
```

This should work correctly. I even checked it in a test sample. It correctly moves the selection. So, it must be something around this code. But this is off-topic for this issue.
It is a better solution to replace
I would be cautious and advise against using
In recent releases, we worked on performance improvements in the engine, mostly from the point of view of editor init time. More on that in the recent changelog entry. In summary -- we lowered editor init time by around 65-80%, depending on the data sample we had. The good news is that these changes also have a positive impact on `getData()`.

Worth mentioning -- these changes are not breaking and do not change the editor architecture or how the core engine parts work. More precisely, the data is still evaluated "from scratch" when `getData()` is called.

We still have one big improvement ahead -- scheduled for the February release. As we measure on our test samples, together with the improvement mentioned above, we see 70-90% lower `getData()` times. We hope that these changes scheduled for February will solve many of the problems that you might have experienced so far.
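If you want to check the impact on your own documents, a trivial way to measure is a sketch like this (nothing CKEditor-specific beyond `editor.getData()` itself):

```js
// Time how long the full data pipeline takes for the current document.
console.time( 'getData' );
const html = editor.getData();
console.timeEnd( 'getData' );

console.log( `Output length: ${ html.length } characters` );
```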
At the moment, whenever `getData()` is called, the whole data pipeline is used to build the result from scratch. This is very inefficient in the case of big documents. Generating the output data takes a visible amount of time (>0.5s) starting from documents of 15+ A4 pages. It means there's annoying lagging.

This causes problems for:

- the `Autosave` plugin (see the sketch below),
- the `Watchdog` feature.

For really big documents it may also mean a UX loss simply when clicking the "Save" button in a form. Consider the browser stopping being responsive, or needing an additional 5 seconds before sending the HTTP request. And I am using a relatively fast MacBook, so in the case of a typical user it could be even worse.
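To illustrate why autosaving is affected: the `save()` callback is where integrators typically call `getData()`, so every autosave cycle pays the full pipeline cost. A minimal setup (with `sendToServer()` as a hypothetical application function):

```js
import ClassicEditor from '@ckeditor/ckeditor5-editor-classic/src/classiceditor';
import Autosave from '@ckeditor/ckeditor5-autosave/src/autosave';

ClassicEditor
	.create( document.querySelector( '#editor' ), {
		plugins: [ Autosave /* , ...other plugins */ ],
		autosave: {
			waitingTime: 5000, // Batches changes, but each save() still runs the whole data pipeline.
			save( editor ) {
				return sendToServer( editor.getData() );
			}
		}
	} );
```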
As I pointed out, the biggest reason for that inefficiency is building the data view from scratch on `.getData()`. If it were kept up to date in memory, just like the editing view, it wouldn't take that much time to perform `getData()`. So, it would basically work like the editing pipeline but with different converters. Maybe there's even a way to quickly prototype it just by using the `EditingController` in place of the `DataController` 🤔?

Keep in mind that this would be quite a breaking change. I think this is a huge issue that will demand a lot of testing.
Other than that, AFAIR, we might think about improving performance in the view downcast writer, the downcast dispatcher and possibly other parts of the data pipeline. However, there's a ceiling for how much can be achieved this way. So, unfortunately, making the data pipeline "live" is something we will have to do one day.
There's one thing to note, though. When we introduce the "live" data pipeline, every action done in the editor will take twice as much time. As we add features, callbacks, post-fixers, etc. to the editor, and the number of things in the event loop rises, we may hit an efficiency issue from the other side (for example, typing becoming less responsive).
There's a possibility that we may end up with two data pipelines: "live" and "on-demand" (the one we have now). We could either give the possibility to explicitly pick one of them through configuration or through preparing an appropriate editor build. But we could also try switching between them as the document size grows.
If we hit performance issues with the "live" data view, we could also maybe think about batching changes and updating the data view less frequently than the editing view.