Parameters and Return Values
What is there to say about parameters and return values? Quite a bit as it turns out!
As with everything else, it helps to understand how things work "under the hood" in order to motivate the recommendations.
One important "under the hood" nugget is that the x86_64 architecture has 16 general purpose registers that can each hold a 64-bit scalar. These registers are used for local variables, intermediate results, and for passing parameters to functions and grabbing returned results. The x86_64 ABI (application binary interface) used on Linux and macOS specifies that the first six integer or pointer parameters are passed via six specific registers (doesn't matter what they are, just that there are six of them) and the rest are passed via the stack (which is a part of memory). Floating-point parameters have their own set of registers and the Windows ABI has fewer parameter registers, but the principle is the same. The same ABI specifies that a small result (such as a single 64-bit scalar) can be returned via a register but that large results must be returned via the stack. Registers are faster than memory--passing a parameter via the stack costs an additional instruction in both the caller and the callee, and these additional instructions are a STORE and a LOAD, which are somewhat more expensive than instructions like ADD--and so the goal is to maximize the use of registers and minimize the use of the stack in the function call process.
Another "under the hood" nugget is that passing parameters by pointer/reference interferes with compiler optimizations in the caller because the caller cannot keep a variable that is passed by pointer/reference in a register across a call. There is no such thing as a pointer/reference to a register and so the caller has to place the variable in memory before the call--this may not cost anything if the variable is already in memory (e.g., on the heap as part of state) but it may if it is just a simple local variable. In addition, the callee could have changed the value of the variable and so it has to be re-LOAD-ed into a register after the call. The upshot here is that parameters should be passed by value as much as possible.
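A minimal sketch of the difference, using two hypothetical functions (squareByValue and squareInPlace are illustrative names, not EnergyPlus functions). In the first version the caller's variable can stay in a register across the call. In the second, the caller has to give the variable a memory address and re-read it afterwards, unless the compiler happens to inline the call.

```cpp
// Hypothetical illustration; neither function exists in EnergyPlus.

// Input passed by value, result returned by value: both can travel in registers.
double squareByValue(double x)
{
    return x * x;
}

// Input/output passed by reference: the caller must keep the variable in memory
// so that its address can be passed, and must re-read it after the call.
void squareInPlace(double &x)
{
    x = x * x;
}
```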
Let's get to the meat and potatoes.
There are three basic ways to pass parameters: by value, by reference, and by constant reference. Here are the rules for them.
- Scalars that are strictly inputs to a function should be passed by value.
- Non-scalars (i.e., objects) that are strictly inputs to the function should be passed by constant reference (i.e., const &). The two exceptions to this are std::string_view and gsl::span which have two members each and are themselves essentially references.
- Scalars that are outputs of a function should be returned via the return value as opposed to reference (i.e., &) parameters.
- Scalars that are outputs of a function but cannot be returned via the return value (e.g., because another scalar is already being returned) and non-scalar outputs should be passed as reference (i.e., &) parameters.
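The four rules can be sketched with a handful of signatures. Everything here (the Schedule type and the function names) is hypothetical and chosen only to show one rule per function. gsl::span is left out so that the sketch depends only on the standard library.

```cpp
#include <string_view>
#include <vector>

// Hypothetical type, for illustration only.
struct Schedule {
    std::vector<double> hourlyValues;
};

// Rule 1: scalar input, passed by value.
// Rule 3: scalar output, returned via the return value.
double toKelvin(double tempC)
{
    return tempC + 273.15;
}

// Rule 2: non-scalar input, passed by constant reference.
double sumSchedule(Schedule const &sched)
{
    double total = 0.0;
    for (double v : sched.hourlyValues) total += v;
    return total;
}

// Exception to rule 2: std::string_view is itself a lightweight reference,
// so it is passed by value.
bool isBlank(std::string_view name)
{
    return name.find_first_not_of(' ') == std::string_view::npos;
}

// Rule 4: the first scalar output is the return value (the minimum); the second
// (the maximum) goes through a reference parameter.
// Assumes sched.hourlyValues is not empty.
double minAndMax(Schedule const &sched, double &maxOut)
{
    double minVal = sched.hourlyValues.front();
    maxOut = minVal;
    for (double v : sched.hourlyValues) {
        if (v < minVal) minVal = v;
        if (v > maxOut) maxOut = v;
    }
    return minVal;
}
```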
Are we done? Mostly, but there are some nuances.
What if you have more than six scalar (i.e., register eligible) parameters? If a number of them are input parameters, are you better off collecting some of them into an object and passing that object by constant reference, or passing the parameters individually by value, understanding that some of them will have to be passed via the stack? I have not measured this, but my sense is that you are better off collecting some of them into an object--if it makes logical sense to do so--and passing that object by reference. Compilers have no latitude to optimize the calling convention or ABI; it is what it is. They have more freedom with, and are getting better at, optimizing local object storage. This would be easier if EnergyPlus classes had better internal organization and made heavier use of "sub-objects" as opposed to being flat lists of fields. A canonical example of this is the PlantLocation struct. This is a logical group of variables, but too often they are represented as individual variables and passed to functions individually rather than by constant reference (or by reference if they need to be written). Representing these as structures more pervasively would facilitate passing them as a structure, reducing function call cost. Here is an example from EnergyPlus:
void
SetComponentFlowRate(class EnergyPlusData &state,
Real64 &CompFlow, // [kg/s]
int const InletNode, // component's inlet node index in node structure
int const OutletNode, // component's outlet node index in node structure
int const LoopNum, // plant loop index for PlantLoop structure
int const LoopSideNum, // Loop side index for PlantLoop structure
int const BranchNum, // branch index for PlantLoop
int const CompNum) // component index for PlantLoop
This function has eight parameters, which means that the last two, BranchNum and CompNum, have to be pushed to the stack by the caller and that SetComponentFlowRate has to pop them off. Now, the quartet LoopNum, LoopSideNum, BranchNum, and CompNum appear together so often that they are already collected in a struct.
struct PlantLocation {
int LoopNum;
int LoopSideNum;
int BranchNum;
int CompNum;
};
We could then reimplement the function this way.
void
SetComponentFlowRate(class EnergyPlusData &state,
Real64 &CompFlow, // [kg/s]
int const InletNode, // component's inlet node index in node structure
int const OutletNode, // component's outlet node index in node structure
struct PlantLocation const &ploc) // plant location (loop, loop side, branch, and component indices)
But wait a minute, you say. Now SetComponentFlowRate has to load all four members from PlantLocation, instead of the two it loaded previously off the stack. That is true. But in the previous example, the caller had to load four elements into registers whereas in this one it has to load only one (the address of the PlantLocation struct), and loading an address into a register usually does not require loading anything from memory; it usually requires adding a constant offset to an address in another register, and a register-to-register ADD is cheaper than a memory-to-register LOAD. If you count caller and callee together--and after all you have to, because you need both for a function call--then the example with the PlantLocation struct wins, four LOADs plus one ADD to six LOADs.
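To make the comparison concrete, here is a compilable sketch of the two calling styles side by side. It is not the EnergyPlus implementation: EnergyPlusData is reduced to an empty stand-in, Real64 is assumed to be an alias for double, and locationKeyFlat and locationKeyStruct are hypothetical functions whose bodies only combine the four indices so that both versions can be checked against each other.

```cpp
// Stand-ins so that the sketch compiles on its own (assumptions, not EnergyPlus code).
struct EnergyPlusData {};
using Real64 = double;

struct PlantLocation {
    int LoopNum;
    int LoopSideNum;
    int BranchNum;
    int CompNum;
};

// Eight parameters: under the six-register rule, BranchNum and CompNum
// travel via the stack.
int locationKeyFlat(EnergyPlusData &, Real64 &, int /*InletNode*/, int /*OutletNode*/,
                    int LoopNum, int LoopSideNum, int BranchNum, int CompNum)
{
    return ((LoopNum * 10 + LoopSideNum) * 100 + BranchNum) * 100 + CompNum;
}

// Five parameters: everything fits in registers, and the callee reads the four
// indices at constant offsets from the single address it was given.
int locationKeyStruct(EnergyPlusData &, Real64 &, int /*InletNode*/, int /*OutletNode*/,
                      PlantLocation const &ploc)
{
    return ((ploc.LoopNum * 10 + ploc.LoopSideNum) * 100 + ploc.BranchNum) * 100 + ploc.CompNum;
}
```

At the call site the second form is also shorter: a caller that already holds a PlantLocation passes it as one argument instead of unpacking it into four.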
It's even more of a no-brainer to collect a bunch of output parameters into a struct and pass it by reference. If you have multiple output parameters, you are already committed to returning them via memory, and so you may as well combine them into an object. On top of this, you only need to explicitly pass the address of the object rather than the address of every field in the object. The compiler knows the relative positions of fields in the object, and in LOAD and STORE instructions constant offset calculations are essentially "free", i.e., it is no more expensive to STORE a value at a constant offset to an address than it is to STORE it at that address directly. Collecting related fields into objects for input parameter purposes is usually helpful when it brings the number of parameters down to six or fewer. Collecting related fields into objects for output parameter purposes is always helpful.
void
ScanPlantLoopsForObject(class EnergyPlusData &state,
std::string_view CompName,
int const CompType,
int &LoopNum,
int &LoopSideNum,
int &BranchNum,
int &CompNum,
bool &errFlag)
This function actually has a couple of optional arguments that we are ignoring for now. In this implementation, the caller has to execute four ADDs to place the addresses of LoopNum, LoopSideNum, BranchNum, and CompNum into registers. It then has to store the addresses of CompNum and errFlag on the stack because those are parameters seven and eight and there are only six parameter registers. The callee, of course, has to LOAD the addresses of CompNum and errFlag from the stack.
In the implementation below, the caller has to execute only one ADD to place the address of ploc into a register, and the stack is not used at all since the number of parameters is five (which is less than six). In both cases, ScanPlantLoopsForObject has to execute four STOREs.
void
ScanPlantLoopsForObject(class EnergyPlusData &state,
std::string_view CompName,
int const CompType,
struct PlantLocation &ploc,
bool &errFlag)
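A runnable sketch of the struct-output style follows. It is not the real ScanPlantLoopsForObject: the ComponentRecord type and the flat vector it searches are hypothetical stand-ins for the plant topology, and the state and CompType parameters are dropped to keep the example short.

```cpp
#include <string>
#include <string_view>
#include <vector>

struct PlantLocation {
    int LoopNum;
    int LoopSideNum;
    int BranchNum;
    int CompNum;
};

// Hypothetical stand-in for the plant topology that the real function searches.
struct ComponentRecord {
    std::string name;
    PlantLocation loc;
};

// One output object instead of four separate int & outputs. The callee still
// writes all four fields, but the caller passes a single address.
void scanForComponent(std::vector<ComponentRecord> const &components,
                      std::string_view CompName,
                      PlantLocation &ploc,
                      bool &errFlag)
{
    for (ComponentRecord const &rec : components) {
        if (std::string_view(rec.name) == CompName) {
            ploc = rec.loc;
            errFlag = false;
            return;
        }
    }
    errFlag = true; // not found; ploc is left unchanged
}
```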
If you need to return multiple values, which one should you return as the return value? Should you use std::pair or std::tuple to return multiple values? This is a good discussion for a future EnergyPlus Technicalities meeting.
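For reference while that discussion is pending, this is roughly what the std::pair option looks like. The function is hypothetical and the sketch does not argue for or against the approach.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: returns {index, found} packed into a std::pair.
std::pair<int, bool> findFirstNegative(std::vector<double> const &values)
{
    for (int i = 0; i < static_cast<int>(values.size()); ++i) {
        if (values[i] < 0.0) return {i, true};
    }
    return {-1, false}; // no negative value present
}
```

With C++17 structured bindings, a caller can unpack the result as auto [index, found] = findFirstNegative(values);.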
Here's a religious argument for you, the pointer vs. reference argument. The C language had only pointers; references are a C++ construct. And they are probably the worst C++ construct at that. What is the difference between a reference and a pointer, you ask? Nothing except for syntactic sugar and the fact that references technically cannot be null. Other than that, all a reference does is obscure the fact that something is actually a pointer. If it were up to me--and it may be--I would say that arguments should be passed by pointer rather than by reference. This would require a & in front of the argument at the call-site, making it obvious that the argument is being passed by pointer rather than by value. And it would require the use of * or -> inside the function, again making it clear that we are dealing with a parameter that was passed by address rather than by value.
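The call-site difference is easiest to see side by side. Both clamp functions below are hypothetical; they do the same thing and differ only in how the modified argument is passed.

```cpp
// Hypothetical functions, for illustration only.

// By reference: the body reads like ordinary value code.
void clampByReference(double &flow, double maxFlow)
{
    if (flow > maxFlow) flow = maxFlow;
}

// By pointer: the * makes it visible that the caller's variable is being written.
void clampByPointer(double *flow, double maxFlow)
{
    if (*flow > maxFlow) *flow = maxFlow;
}

// Shows how the two call sites read; returns the sum of the two clamped values.
double callSiteDemo()
{
    double a = 5.0;
    double b = 5.0;
    clampByReference(a, 2.0); // nothing here says that a may be modified
    clampByPointer(&b, 3.0);  // the & signals that b is passed by address
    return a + b;
}
```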
Although I am against references in general, I am particularly against this specific use of them. I am fine with local reference variables to shorten what would otherwise be long names; in fact, we don't do this enough in EnergyPlus.
ZoneData &zone = state.dataHeatBal->Zone(ZoneNum);
I am also somewhat fine with constant reference function parameters, because there is not a big logical difference between a constant reference parameter and a value parameter, but non-const reference parameters are evil in my opinion.
EnergyPlus makes reasonably heavy use of the ObjexxFCL::Optional template and its variants. The C++ standard library also has a std::optional template that is lighter-weight than ObjexxFCL::Optional and essentially combines a value with a present/not-present bool, much as a std::pair would. Are these things useful?
In my opinion, they are not. The C++ language already has a mechanism for optional parameters and that is default values.
int
aFunction(int requiredArgument,
int optionalArgument = -1); // if an argument is not supplied a default value of -1 will be passed instead
If an optional argument is not provided, the function typically adopts some default value. It is both faster and cleaner (in my opinion) to use parameters with appropriate default values than to use any one of several optional containers like std::optional or ObjexxFCL::Optional.
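A small sketch of the two styles, using hypothetical functions. Both return the same results; the std::optional version has to carry and test a present/not-present flag, whereas the default-value version simply receives the default.

```cpp
#include <optional>

// Default-value style: if factor is not supplied, 1 is passed instead.
int scaledWithDefault(int value, int factor = 1)
{
    return value * factor;
}

// std::optional style: the callee has to check whether a factor was supplied
// and fall back to 1 itself.
int scaledWithOptional(int value, std::optional<int> factor = std::nullopt)
{
    return value * factor.value_or(1);
}
```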
Avoiding the use of optional containers is another reason to use pointer rather than reference parameters. A pointer argument can be given the default value of nullptr, whereas there is no such thing as a nullref. Ironically, the same mechanism is used to implement ObjexxFCL::Optional. The container stores a pointer to the optional parameter and the present() function checks if that pointer is nullptr. Remember, references are just pointers in syntactic disguise and ObjexxFCL undisguises them for this purpose.
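As a sketch of an optional output parameter done with a plain pointer, consider the hypothetical function below. It is not taken from EnergyPlus or ObjexxFCL.

```cpp
// Hypothetical function: the quotient is the return value and the remainder is
// an optional output. Callers that do not want it simply omit the argument.
int divide(int numerator, int denominator, int *remainder = nullptr)
{
    if (remainder != nullptr) *remainder = numerator % denominator;
    return numerator / denominator;
}
```

A caller who wants the remainder writes divide(7, 2, &r), and the & again marks r as an output at the call site.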