replaced repeated progress() calculation calls with a variable #4256
base: 0_15
progress() is called in setPixelColor(), calculating the transition progress for each pixel. Replaced that call with an inline function to get the new segment variable. The progress is updated in service() when handleTransition() is called. The new variable is in a spot where padding is added, so this should not use more RAM. Result: over 10% increase in FPS on 16x16 matrix
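The change described above can be sketched roughly as follows. This is a simplified illustration, not the actual WLED code: `SegmentSketch`, its fields, `startTransition()` and `usageExample()` are hypothetical names, and the real segment class carries far more state. The point is only that the division happens once per frame, while the per-pixel path reads a cached value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a segment; not the actual WLED class.
struct SegmentSketch {
  uint32_t transitionStart = 0;          // ms timestamp at which the transition began
  uint16_t transitionDuration = 0;       // ms; 0 means no transition is running
  uint16_t transitionProgress = 0xFFFF;  // cached progress, 0 - 0xFFFF

  void startTransition(uint32_t now, uint16_t duration) {
    transitionStart = now;
    transitionDuration = duration;
    transitionProgress = duration ? 0 : 0xFFFF;
  }

  // Intended to run once per frame: performs the division a single time.
  void updateTransitionProgress(uint32_t now) {
    transitionProgress = 0xFFFF;  // default: transition finished or not running
    if (transitionDuration == 0) return;
    uint32_t elapsed = now - transitionStart;
    if (elapsed < transitionDuration) {
      // elapsed < 65535 here, so elapsed * 0xFFFF still fits in 32 bits
      transitionProgress =
          static_cast<uint16_t>((elapsed * 0xFFFFu) / transitionDuration);
    }
  }

  // Intended to run per pixel: only returns the cached value.
  inline uint16_t progress() const { return transitionProgress; }
};

// Usage: halfway through a 1000 ms transition the cached value is about half of 0xFFFF.
inline void usageExample() {
  SegmentSketch seg;
  seg.startTransition(1000, 1000);
  seg.updateTransitionProgress(1500);
  assert(seg.progress() == 32767);
}
```

With this shape, a 16x16 matrix reads the cached value 256 times per frame instead of recomputing it 256 times.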
wled00/FX_fcn.cpp
@@ -317,12 +317,12 @@ void Segment::stopTransition() {
 }

 // transition progression between 0-65535
-uint16_t IRAM_ATTR Segment::progress() const {
+void IRAM_ATTR Segment::updateTransitionProgress() {
I think you could use `IRAM_ATTR_YN` here - it means that esp32 puts the function into IRAM, while 8266 doesn't. We'll save some IRAM space especially on the "_compat" builds.
As the function is only called once per frame in WS2812FX::service() - via seg.handleTransition() - it might even be better to remove the IRAM_ATTR as this call is not performance critical any more.
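A minimal sketch of how such a conditional attribute can work, assuming `IRAM_ATTR_YN` is switched by the `WLED_SAVE_IRAM` define mentioned later in this thread. The exact definition in the project is not shown here, so the macro shape below is an assumption, and `scaleChannel()`, `frameProgress()` and `usageExample()` are hypothetical helpers used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Host-side fallback so this sketch compiles off-target; on ESP toolchains
// IRAM_ATTR is supplied by the SDK headers.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

// Assumed shape of the switch (not copied from the project sources).
#ifdef WLED_SAVE_IRAM
#define IRAM_ATTR_YN            // build opts out: function stays in flash
#else
#define IRAM_ATTR_YN IRAM_ATTR  // default: function is placed in IRAM
#endif

// Hypothetical per-pixel helper: called often, so a candidate for IRAM.
uint8_t IRAM_ATTR_YN scaleChannel(uint8_t value, uint8_t scale) {
  return static_cast<uint8_t>((value * scale) >> 8);
}

// Hypothetical once-per-frame helper: left without any attribute.
uint16_t frameProgress(uint32_t elapsed, uint32_t duration) {
  if (duration == 0 || elapsed >= duration) return 0xFFFF;
  return static_cast<uint16_t>((static_cast<uint64_t>(elapsed) * 0xFFFFu) / duration);
}

// Usage: both helpers behave the same regardless of where they are placed.
inline void usageExample() {
  assert(scaleChannel(255, 128) == 127);
  assert(frameProgress(500, 1000) == 32767);
}
```

The attribute only changes where the function is stored, not what it computes, which is why it can be dropped from code that runs once per frame without any functional change.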
I thought about removing the attribute but left it as is since I have no way to check the difference. I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.
I think removing it here is safe since, as you say, this is only called once per frame.
esp8266 will thank you ;-)
Maybe `WLED_SAVE_IRAM` should also be defined on ESP32 C3: it is not as performant as the ESP32 anyway, and putting stuff in IRAM uses a lot more flash for some reason. If I enable `WLED_SAVE_IRAM` on the C3, that saves 1.6k of flash. On the ESP32 it only saves 68 bytes of flash.
Any suggestions for performance tests that would show if this is a valid option?
Any function marked with IRAM_ATTR will always be kept in fast SRAM and will never be fetched from flash. The basic idea is for IRAM_ATTR to be used in ISRs or in functions that may access (write to) flash directly.
The benefit of using it elsewhere is faster access to the function, as calls to it never go through the flash cache hit/miss logic.
Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial. I say this from experience with over 50 installed ESP8266s with various options and usermods installed.
> Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial.
It might be beneficial; however, we were talking about the new `progress()`, which is now only called a few hundred times per second (max). The function is not time critical any more with this PR, so why use IRAM_ATTR for it?
(side-topic)
> I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.
This aligns with my own experiments on -S3 and esp32 with 80 MHz flash: no noticeable performance impact, but IRAM_ATTR sometimes increases program size. This can be explained by the compiler being unable to inline such a function, even when inlining would benefit program size.
It may also depend a lot on the CPU caches. In fact, a function that's called really often has a good chance of already being cached by the CPU. Also, a board with fast flash (qio 80 MHz) is about 4x faster at reading flash than one with slow flash (dout 40 MHz).
Many cheap 8266 boards still have 40 MHz dout flash, plus smaller caches, so it makes sense that IRAM_ATTR still has some benefit on boards with slow flash.
I have no doubt that any ESP32 performs adequately without IRAM_ATTR.
However, ESP8266 is another matter: while it may be old and lacking, it is still used by many users (including me) who keep attaching plenty of peripherals to it while running WLED. Hence I strongly urge keeping IRAM_ATTR in as many places as possible.
Can you test whether this PR with the latest commit has any impact on ESP8266, i.e. with `IRAM_ATTR` removed from `updateTransitionProgress()`?
The other question was whether to add `WLED_SAVE_IRAM` to C3 builds, as it would save flash but may have a negative performance impact in certain situations / setups. If there is a way to test that on the C3 I could check. After all, the C3 is more of an upgraded ESP8266 compared to other ESP32 variants.
`updateTransitionProgress()` is called only once per frame, no need to put it in RAM.
@@ -363,6 +363,7 @@ typedef struct Segment {
 };
 uint8_t startY; // start Y coodrinate 2D (top); there should be no more than 255 rows
 uint8_t stopY; // stop Y coordinate 2D (bottom); there should be no more than 255 rows
+uint16_t transitionprogress; // current transition progress 0 - 0xFFFF
This could be a `static` variable as it will be modified in each `handleTransition()` call at the start of `service()` loop.
What is the current behaviour if, for example, the palette of one segment is changed with a 10s transition, and after 5s the palette of a second segment is changed? Will that stop the transition of the first segment? In other words, are transition start/stop per segment or global?
The transition time is bound to `strip` ATM, but each segment may start its transition at its own point in time. So each segment should transition on its own, independent of the others (that was my goal when implementing the current transitions, compared to the previous, limited transitions).
> This could be a `static` variable as it will be modified in each `handleTransition()` call at the start of `service()` loop.
Rather than make it static, I'd say put it into `strip`.
Static member attributes of a class are always a good source of confusion when trying to read code written by someone else, because a 'static' member is technically not even part of the object you work with. It survives `delete segment`, and changing it in one object instance also changes the value in all other instances. Copying one segment to another will not create a copy of static member attributes; there will still be only one value.
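The sharing behaviour described above can be seen in a small sketch. `SegStatic`, `SegMember` and `staticDemo()` are hypothetical names chosen for illustration and are not taken from the project.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical segment with a static progress value: one copy for the whole class.
struct SegStatic {
  static uint16_t progress;
  uint8_t id = 0;
};
uint16_t SegStatic::progress = 0;

// Hypothetical segment with a per-instance progress value.
struct SegMember {
  uint16_t progress = 0;
  uint8_t id = 0;
};

inline void staticDemo() {
  SegStatic a, b;
  a.progress = 100;           // writes the single shared value
  assert(b.progress == 100);  // visible through every other instance

  SegMember c, d;
  c.progress = 100;           // writes only c's own copy
  assert(d.progress == 0);    // d is unaffected

  SegMember e = c;            // copying duplicates the instance member
  e.progress = 200;
  assert(c.progress == 100);
}
```

With a static member, two segments that start their transitions at different times could not hold different progress values, which is the concern raised for per-segment transitions.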
> Rather than make it static, I'd say put it into `strip`.
That would be a worse choice IMO. It can be considered in the same way as `_vLength`, etc. in the speed improvements branch by @DedeHai. It is a pre-calculated value used as a speedup.
If you think this will cause confusion, keep it as an instance member rather than a `strip` member.
But that is just my opinion, no need to take it into account.
If individual segment transition is to be kept/extended in the future, this should not be static but per segment. Also there is no RAM impact currently as there are 3 bytes of padding added in that space anyway.
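The padding argument can be illustrated with a reduced layout. The structs below are hypothetical and much smaller than the real segment struct, and the result assumes a typical ABI where `uint16_t` is 2-byte aligned and `uint32_t` is 4-byte aligned; layouts are implementation-defined, so this is a sketch rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout before the change: a 1-byte field followed by a 4-byte field.
struct BeforeSketch {
  uint8_t  stopY;    // 1 byte at offset 0
                     // 3 bytes of padding on typical ABIs
  uint32_t options;  // must start on a 4-byte boundary
};

// Hypothetical layout after the change: the new field lands in the former padding.
struct AfterSketch {
  uint8_t  stopY;
  uint16_t transitionprogress;  // offset 2 on typical ABIs
  uint32_t options;
};

// True when adding the field did not grow the struct.
inline bool sizeUnchanged() {
  return sizeof(BeforeSketch) == sizeof(AfterSketch);
}

inline void usageExample() {
  assert(sizeUnchanged());
}
```

On a target, comparing `sizeof` of the real struct before and after the change is the direct way to confirm that no RAM is added.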
progress() is called in `setPixelColor()`, re-calculating the transition progress for each pixel. Replaced that call with an inline function to get the new segment variable. The progress is updated in service() when `handleTransition()` is called. The new variable is in a spot where padding is added, so this should not use more RAM. Result: over 10% increase in FPS on 16x16 matrix during transitions