Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some Viz Calculations Being Dropped #11882

Open
jdunkerley opened this issue Dec 16, 2024 · 10 comments
Open

Some Viz Calculations Being Dropped #11882

jdunkerley opened this issue Dec 16, 2024 · 10 comments
Assignees

Comments

@jdunkerley
Copy link
Member

jdunkerley commented Dec 16, 2024

image

Reading a 1mm row Excel file, the visualization is not always rendered.
Interrupted exception also shows up (but thats another issue).

from Standard.Base import all
from Standard.Table import all
import Standard.Visualization

main =
    node1 = Data.read 'C:\\Users\\jdunk\\Downloads\\1m.xlsx' ..Sheet
    integer1 = node1.row_count
    table1 = node1.column_info

https://docs.google.com/spreadsheets/d/1hGx9W7qeIUvBvJMlR-a2fy6k_TtRFh_f/edit?usp=sharing&ouid=105802918667196204721&rtpof=true&sd=true

Logs:
enso-project-manager-2024-12-16.0.zip

@hubertp
Copy link
Collaborator

hubertp commented Dec 18, 2024

I can't seem to reproduce that problem. I would need to see it live, I'm afraid. Also, the exchanges between GUI and LS would be much appreciated.

@hubertp
Copy link
Collaborator

hubertp commented Dec 18, 2024

I think I got it now and also got a bit more logs.
I believe what happens is:

  1. Computation gets interrupted and visualizations get the missing method on exception, as reported in InterruptException/ClosedByInterruptException are still showing up in long running computations #11896
  2. That value becomes a bit sticky and somehow expression doesn't get updated
  3. Eventually, visualization will get calculated (not really dropped despite the title of this ticket) but it may take a reaaaaaly long time until that happens due to numerous interrupts

@enso-bot
Copy link

enso-bot bot commented Dec 19, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-12-17):

Progress: Investigating vis issues, can't seem to reproduce the main problem but created #11896 for a related issue. Continued doing small tweaks to native image - needs testing with electron. It should be finished by 2024-12-19.

Next Day: Next day I will be working on the #11896 task. Continue investigating the issue

@hubertp
Copy link
Collaborator

hubertp commented Dec 20, 2024

Just to prove my point, I can see execution of visualizations being done once

[org.apache.poi.openxml4j.opc.OPCPackage] The close() method is intended to SAVE a package. This package is open in READ ONLY mode, use the revert() method instead!

is being reported in the logs.

Note that:
a) in IDE it seems that table visualization of the read Excel file is already being shown for a while so it seems as if the computation hung
b) nothing is being reported in the logs (apart from usual ping/pong) for over 2 minutes and suddenly boom, The close() method.... method appears and execution of the visualization is finished.

@hubertp
Copy link
Collaborator

hubertp commented Dec 23, 2024

Another important note:
IDE issues a number of Standard.Visualization.Preprocessor.default_preprocessor requests on startup. While of only limited use, it provides a glimpse at the data. In the above example the default preprocessor will show first 10 results. Currently it seems that data is simply ignored by IDE and a user stares at a blank visualization until full results arrive (2 minutes later?). Similarly row_count - the results are available early on but they are not being displayed until the second visualization comes: Standard.Visualization.Table.Visualization.prepare_visualization.

@enso-bot
Copy link

enso-bot bot commented Dec 23, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-12-20):

Progress: Back to investigating slow visualizations after fixing #11896. Definitely seeing odd behavior when reading large files. Trying to deploy self-hosted runners for Bazel integration but blocked by permissions problem. It should be finished by 2024-12-23.

Next Day: Next day I will be working on the #11882 task. Continue investigating the issue.

@hubertp
Copy link
Collaborator

hubertp commented Dec 27, 2024

There isn't much that backend can do about UI not using the partial visualization results (will raise as a separate topic to GUI team).
But I noticed one important thing: in the presence of interruptions it may well be that we perform the same execution of expressions more than once, despite the presence of caches. This is due to the logic that runs cache invalidation when upserting the visualization. So when running long computations, we artificially may run them longer than necessary which makes things much worse.

The scenario is as follows:

  1. A long running execution is triggered and interrupted, meaning that the Data.read expression is not cached
  2. Re-trigger of that job is started and is well pass runtime hooks part
  3. Visualization upsert is triggered, and it sees that the expression is not cached so it triggers cache invalidation
  4. Execute job finishes, and caches the result of the long running expression
  5. Further execute job is triggered, this time for the visualization. It runs runtime hooks that clean the cache.
  6. We have to start Data.read again..

@enso-bot
Copy link

enso-bot bot commented Dec 30, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-12-27):

Progress: Cache invalidation done when upserting visualizations has a negative effect on long-running computations. There is a race between caching the initial result and immediately reseting it due to a schedule visualization job. That may significantly delay startup. There doesn't appear to be an easy way to disable it without loosing existing functionality. Need to weight on pros and cons. It should be finished by 2024-12-27.

Next Day: Next day I will be working on the #11882 task. Continue investigating the issue.

@enso-bot
Copy link

enso-bot bot commented Dec 30, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-12-23):

Progress: Continued investigating odd visualization problems. Surprisingly backend does send partial results to GUI with default_processor but the results aren't being shown in IDE. Additionally it looks like some computations are being done unnecessairly when interrupt is triggered. Needs more investigation after a break. It should be finished by 2024-12-27.

Next Day: Next day I will be working on the #11882 task. Continue investigating the issue.

@enso-bot
Copy link

enso-bot bot commented Jan 1, 2025

Hubert Plociniczak reports a new STANDUP for the provided date (2024-12-30):

Progress: Prepared a PR that disables cache invalidation when upserting visualization, at the cost of not being able to attach one to subexpression. This will need to be revisited with observables approach. Back to native image generation for Language Server. Got weird classpath failures when including database jars that need investigating. It should be finished by 2024-12-30.

Next Day: Next day I will be working on the #11882 task. Continue on native image, address PR review.

mergify bot pushed a commit that referenced this issue Jan 2, 2025
The possibility of attaching visualizations to subexpressions meant that we (currently) have to invalidate caches for their parent expression every time a request comes. That was an acceptable cost when an expression is relatively fast to compute but unacceptable when dealing with slow computations that would have to be repeated.

Since currently attaching visualizations is not used at all and we can rely on caching RHS and self, it is _safe_ to disable it. An observable pattern is better suited for visualizations and would mitigate this problem by design, something that we planned for a while in #10525

Should help with long running visualizations in #11882. Partial visualization results should be addressed on GUI side.

# Important Notes
For the example in #11882 we would at least re-read the large file at least twice, which adds 40-60seconds to the overall startup.

Exchanges before the change:
![Screenshot from 2024-12-27 16-52-46](https://github.com/user-attachments/assets/63e7a6db-73f8-48dd-a24a-a44e197e4ee6)

Responses after the change:
![Screenshot from 2024-12-30 12-18-28](https://github.com/user-attachments/assets/08020b1c-58f0-4c0f-b06d-1d904373b946)

Results in the same (final) data:
![Screenshot from 2024-12-30 12-24-02](https://github.com/user-attachments/assets/280f6ef5-6691-4744-b67a-2a0898f55b8b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: New
Development

No branches or pull requests

2 participants