OneDiff is an acceleration library for diffusion models that delivers faster inference with minimal code changes. The name stands for "one line of code to accelerate diffusion models". It achieves this through features like PyTorch module compilation and optimized GPU kernels.
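In practice that "one line" wraps an existing Hugging Face Diffusers pipeline. A minimal sketch using onediffx's compile_pipe (the same helper the benchmarking script below uses; the model and prompt here are just examples):

import torch
from diffusers import StableDiffusionXLPipeline
from onediffx import compile_pipe

# Load a stock Diffusers pipeline.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16
).to("cuda")

# The "one line": compile the pipeline's modules with OneDiff.
pipe = compile_pipe(pipe)

image = pipe("a cat sitting on a human lap", num_inference_steps=10).images[0]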
Does OneDiff improve the inference speed of our T2I and I2I pipelines while still maintaining high output quality?
To evaluate the performance impact of OneDiff optimization, I ran the provided benchmarking script multiple times for each model, with the number of inference steps fixed at 10 across all runs. I used the Nexfort compiler for all OneDiff-enabled runs, as specified in the script. For each configuration I measured and recorded the inference time with and without OneDiff optimization, keeping all other parameters (resolution, scheduler, and so on) identical to ensure fair comparisons. This gives a structured assessment of the performance difference between standard model execution and the OneDiff-optimized version.
python3 -m pip install -U torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 torchao==0.1
python3 -m pip install -U nexfort
pip install onediffx
To run with OneDiff, pass --compiler nexfort; to run without it, pass --compiler none.
python3 ./text_to_image.py --scheduler none --steps 10 --height 1024 --width 1024 --compiler none --compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "dynamic": true}' --output-image ./test.png
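To benchmark the same settings with OneDiff enabled, only the compiler flag changes (the output filename here is arbitrary):
python3 ./text_to_image.py --scheduler none --steps 10 --height 1024 --width 1024 --compiler nexfort --compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "dynamic": true}' --output-image ./test_onediff.png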
Code Details
print("Nexfort backend is now active...")
if args.quantize:
if args.quantize_config is not None:
quantize_config = json.loads(args.quantize_config)
else:
quantize_config = '{"quant_type": "fp8_e4m3_e4m3_dynamic"}'
if args.quant_submodules_config_path:
# download: https://huggingface.co/siliconflow/PixArt-alpha-onediff-nexfort-fp8/blob/main/fp8_e4m3.json
pipe = quantize_pipe(
pipe,
quant_submodules_config_path=args.quant_submodules_config_path,
ignores=[],
**quantize_config,
)
else:
pipe = quantize_pipe(pipe, ignores=[], **quantize_config)
if args.compiler_config is not None:
# config with dict
options = json.loads(args.compiler_config)
else:
# config with string
options = '{"mode": "max-optimize:max-autotune:freezing", "memory_format": "channels_last"}'
pipe = compile_pipe(
pipe, backend="nexfort", options=options, fuse_qkv_projections=True
) print("Nexfort backend is now active...")
MODEL = "SG161222/RealVisXL_V4.0"
VARIANT = None
CUSTOM_PIPELINE = None
SCHEDULER = "EulerAncestralDiscreteScheduler"
LORA = None
CONTROLNET = None
STEPS = 30
PROMPT = "best quality, realistic, unreal engine, 4K,a cat sitting on human lap"
NEGATIVE_PROMPT = ""
SEED = 333
WARMUPS = 1
BATCH = 1
HEIGHT = None
WIDTH = None
INPUT_IMAGE = None
CONTROL_IMAGE = None
OUTPUT_IMAGE = None
EXTRA_CALL_KWARGS = None
CACHE_INTERVAL = 3
CACHE_LAYER_ID = 0
CACHE_BLOCK_ID = 0
COMPILER = "nexfort"
COMPILER_CONFIG = None
QUANTIZE_CONFIG = None
● This code block defines default values for the various parameters used in the text-to-image generation process. These values serve as fallbacks when they are not specified on the command line.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default=MODEL)
    parser.add_argument("--variant", type=str, default=VARIANT)
    parser.add_argument("--custom-pipeline", type=str, default=CUSTOM_PIPELINE)
    parser.add_argument("--scheduler", type=str, default=SCHEDULER)
    # ... other argument definitions
    return parser.parse_args()

args = parse_args()
● This code block defines a function parse_args to handle command-line arguments using argparse. It defines various arguments such as --model, --variant, --custom-pipeline, --scheduler, etc., each with a default value taken from the globally defined variables above. This allows users to customize the text-to-image generation process from the command line. The line args = parse_args() calls the function and stores the parsed arguments in the args variable for later use.
def load_pipe(
    pipeline_cls,
    model_name,
    variant=None,
    dtype=torch.float16,
    device="cuda",
    custom_pipeline=None,
    scheduler=None,
    lora=None,
    controlnet=None,
):
    # ... function implementation ...
● This code defines a function load_pipe that is responsible for loading and configuring the text-to-image generation pipeline.
● It takes several arguments including the pipeline class (pipeline_cls), model name (model_name), variant, data type (dtype), device, and optional components like a custom pipeline, scheduler, LoRA (Low-Rank Adaptation), and ControlNet.
● The function handles loading the pre-trained model, potentially applying quantization, setting up the scheduler, loading LoRA weights, and moving the pipeline to the specified device.
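The implementation is elided above. Based on the description, a simplified sketch of what load_pipe plausibly does is shown below; this is an illustrative assumption, not the script's exact code, though the diffusers calls used are standard:

import diffusers
import torch

def load_pipe(
    pipeline_cls,
    model_name,
    variant=None,
    dtype=torch.float16,
    device="cuda",
    custom_pipeline=None,
    scheduler=None,
    lora=None,
    controlnet=None,
):
    extra_kwargs = {}
    if custom_pipeline is not None:
        extra_kwargs["custom_pipeline"] = custom_pipeline
    if variant is not None:
        extra_kwargs["variant"] = variant
    if controlnet is not None:
        # Load a ControlNet model and hand it to the pipeline.
        extra_kwargs["controlnet"] = diffusers.ControlNetModel.from_pretrained(
            controlnet, torch_dtype=dtype
        )
    pipe = pipeline_cls.from_pretrained(model_name, torch_dtype=dtype, **extra_kwargs)
    if scheduler is not None and scheduler != "none":
        # Look the scheduler class up by name, e.g. "EulerAncestralDiscreteScheduler".
        scheduler_cls = getattr(diffusers, scheduler)
        pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    if lora is not None:
        # Merge LoRA weights into the base model.
        pipe.load_lora_weights(lora)
        pipe.fuse_lora()
    pipe.to(device)
    return pipe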
def calculate_inference_time_and_throughput(height, width, n_steps, model):
    start_time = time.time()
    model(prompt=args.prompt, height=height, width=width, num_inference_steps=n_steps)
    end_time = time.time()
    inference_time = end_time - start_time
    # pixels_processed = height * width * n_steps
    # throughput = pixels_processed / inference_time
    throughput = n_steps / inference_time
    return inference_time, throughput
● This code defines a function calculate_inference_time_and_throughput to measure the performance of the text-to-image generation process. It takes the image height, width, number of inference steps, and the model as input. The function records the start and end time of the generation process to calculate the inference time. Throughput is then calculated as the number of steps per second.
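For example, a single measurement at 1024 x 1024 with 10 steps might look like this (pipe being the pipeline returned by load_pipe):

t, tp = calculate_inference_time_and_throughput(1024, 1024, 10, pipe)
print(f"time: {t:.3f}s, throughput: {tp:.2f} steps/s")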
def get_kwarg_inputs():
    kwarg_inputs = dict(
        prompt=args.prompt,
        negative_prompt=args.negative_prompt,
        height=height,
        width=width,
        # ... other keyword arguments ...
    )
    # ... additional argument handling ...
    return kwarg_inputs
● This code defines a function get_kwarg_inputs to collect and organize keyword arguments that will be passed to the text-to-image generation pipeline. It gathers arguments such as prompt, negative_prompt, height, width, and others, which control the generation process. The function handles optional arguments like the input image, control image, deep caching options, and additional arguments from the extra_call_kwargs variable.
The IterationProfiler class and related functions measure performance:
class IterationProfiler:
    def __init__(self):
        self.begin = None
        self.end = None
        self.num_iterations = 0
    # ... (methods for profiling)
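The profiling methods are elided here. The later note that iteration timing relies on CUDA events suggests a sketch along these lines (an illustrative assumption, not the exact implementation):

import torch

class IterationProfiler:
    def __init__(self):
        self.begin = None
        self.end = None
        self.num_iterations = 0

    def get_iter_per_sec(self):
        if self.begin is None or self.end is None:
            return None
        self.end.synchronize()
        elapsed_s = self.begin.elapsed_time(self.end) / 1000.0  # elapsed_time is in ms
        return self.num_iterations / elapsed_s

    def callback_on_step_end(self, pipe, i, t, callback_kwargs={}):
        # Record a CUDA event at the end of every denoising step.
        event = torch.cuda.Event(enable_timing=True)
        event.record()
        if self.begin is None:
            self.begin = event
        else:
            self.end = event
            self.num_iterations += 1
        return callback_kwargs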
pipe = load_pipe(
    pipeline_cls,
    args.model,
    variant=args.variant,
    custom_pipeline=args.custom_pipeline,
    scheduler=args.scheduler,
    lora=args.lora,
    controlnet=args.controlnet,
)
This loads the specified diffusion model pipeline with various customization options like variant, custom pipeline, scheduler, LoRA, and ControlNet.
height = args.height or core_net.config.sample_size * pipe.vae_scale_factor
width = args.width or core_net.config.sample_size * pipe.vae_scale_factor
Sets the output image dimensions, either from user arguments or from the model's default configuration (for example, an SDXL-class UNet with sample_size 128 and a VAE scale factor of 8 yields 1024 x 1024).
if args.compiler == "none":
pass
elif args.compiler == "oneflow":
pipe = compile_pipe(pipe)
elif args.compiler == "nexfort":
# ... (nexfort compilation logic)
elif args.compiler in ("compile", "compile-max-autotune"):
# ... (torch.compile logic)
Applies various compiler optimizations to the pipeline based on the specified compiler option.
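The elided torch.compile branch presumably compiles the heaviest submodules with PyTorch's built-in compiler, roughly like this (a sketch; the exact submodules compiled may differ):

# inside: elif args.compiler in ("compile", "compile-max-autotune"):
mode = "max-autotune" if args.compiler == "compile-max-autotune" else None
pipe.unet = torch.compile(pipe.unet, mode=mode)
if hasattr(pipe, "vae") and pipe.vae is not None:
    pipe.vae.decode = torch.compile(pipe.vae.decode, mode=mode)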
if args.input_image is None:
    input_image = None
else:
    input_image = load_image(args.input_image)
    input_image = input_image.resize((width, height), Image.LANCZOS)
Loads and resizes an input image if specified (for image-to-image tasks).
if args.control_image is None:
    if args.controlnet is None:
        control_image = None
    else:
        # ... (create a default control image)
else:
    control_image = load_image(args.control_image)
    control_image = control_image.resize((width, height), Image.LANCZOS)
Prepares a control image for ControlNet, either loading a specified image or creating a default one.
if args.warmups > 0:
    # ... (perform warm-up runs)
Executes warm-up runs to trigger compilation and initial optimizations. The reported warm-up time is essentially how long it takes to bring the model to its fully compiled, ready-to-run state.
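The elided warm-up block is presumably just a timed loop over full pipeline calls, roughly:

if args.warmups > 0:
    begin = time.time()
    for _ in range(args.warmups):
        pipe(**get_kwarg_inputs())  # the first call triggers compilation/autotuning
    end = time.time()
    print(f"Warmup time: {end - begin:.3f}s")  # matches the "Warmup time" lines in the results below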
kwarg_inputs = get_kwarg_inputs()
iter_profiler = IterationProfiler()
# ... (set up profiling callback)
begin = time.time()
output_images = pipe(**kwarg_inputs).images
end = time.time()
Performs the main image generation inference, with profiling.
print(f"Inference time: {end - begin:.3f}s")
iter_per_sec = iter_profiler.get_iter_per_sec()
if iter_per_sec is not None:
print(f"Iterations per second: {iter_per_sec:.3f}")
# ... (memory usage reporting)
Reports various performance metrics like inference time, iterations per second, and memory usage.
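The memory report is likely based on PyTorch's allocator statistics; a sketch of the elided part:

peak_mem = torch.cuda.max_memory_allocated()
print(f"Max used CUDA memory : {peak_mem / 1024**3:.3f}GiB")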
if args.output_image is not None:
    output_images[0].save(args.output_image)
Saves the generated image if an output path is specified.
if args.run_multiple_resolutions:
    # ... (run inference at multiple resolutions)
Tests the model's performance across various image resolutions. I don't recommend running this option: I made a number of changes during testing, and this code path is now only loosely wired up, so it may be broken.
if args.throughput:
    steps_range = range(1, 100, 1)
    data, coefficients = generate_data_and_fit_model(pipe, steps_range)
    plot_data_and_model(data, coefficients)
If requested, this performs a detailed throughput analysis across different numbers of inference steps and plots the results. This is the main block where the measurements come together, and reading it is the quickest way to understand most of the features used in the code.
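generate_data_and_fit_model and plot_data_and_model are not shown above. A plausible sketch, assuming a fixed 1024 x 1024 resolution for the sweep and a linear fit of inference time against step count with numpy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt

def generate_data_and_fit_model(model, steps_range):
    height, width = 1024, 1024  # fixed resolution for the sweep (assumption)
    data = {"steps": [], "inference_time": []}
    for n_steps in steps_range:
        t, _ = calculate_inference_time_and_throughput(height, width, n_steps, model)
        data["steps"].append(n_steps)
        data["inference_time"].append(t)
    # Fit inference_time ~ a * steps + b (a: per-step cost, b: fixed overhead).
    coefficients = np.polyfit(data["steps"], data["inference_time"], 1)
    return data, coefficients

def plot_data_and_model(data, coefficients):
    plt.scatter(data["steps"], data["inference_time"], label="measured")
    plt.plot(data["steps"], np.polyval(coefficients, data["steps"]), label="linear fit")
    plt.xlabel("inference steps")
    plt.ylabel("inference time (s)")
    plt.legend()
    plt.savefig("throughput.png")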
if __name__ == "__main__":
main()
â—ŹThis is a common Python idiom. It ensures that the main() function is called only when the script is run directly, not when it's imported as a module. This documentation provides a breakdown of the code snippets, explaining their purpose and how they fit into the larger text-to-image generation process.
python3 testi2i.py --input-image ./RealVisXL_withoutonediff_1024.png --height 1024 --width 1024 --compiler none --output-image ./i2i_1024__timebrooks_withoutonediff.png --prompt "turn it into a painting painted by paintbrush"
Code Details
print("Nexfort backend is now active...")
if args.quantize:
if args.quantize_config is not None:
quantize_config = json.loads(args.quantize_config)
else:
quantize_config = '{"quant_type": "fp8_e4m3_e4m3_dynamic"}'
if args.quant_submodules_config_path:
# download: https://huggingface.co/siliconflow/PixArt-alpha-onediff-nexfort-fp8/blob/main/fp8_e4m3.json
pipe = quantize_pipe(
pipe,
quant_submodules_config_path=args.quant_submodules_config_path,
ignores=[],
**quantize_config,
)
else:
pipe = quantize_pipe(pipe, ignores=[], **quantize_config)
if args.compiler_config is not None:
# config with dict
options = json.loads(args.compiler_config)
else:
# config with string
options = '{"mode": "max-optimize:max-autotune:freezing", "memory_format": "channels_last"}'
pipe = compile_pipe(
pipe, backend="nexfort", options=options, fuse_qkv_projections=True
) print("Nexfort backend is now active...")
MODEL = "timbrooks/instruct-pix2pix"
VARIANT = None
CUSTOM_PIPELINE = None
SCHEDULER = "EulerAncestralDiscreteScheduler"
LORA = None
CONTROLNET = None
STEPS = 30
PROMPT = "make "
NEGATIVE_PROMPT = ""
SEED = 333
WARMUPS = 1
BATCH = 1
HEIGHT = 512
WIDTH = 512
INPUT_IMAGE = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg" # Set a default input image path
CONTROL_IMAGE = None
OUTPUT_IMAGE = None
EXTRA_CALL_KWARGS = None
CACHE_INTERVAL = 3
CACHE_LAYER_ID = 0
CACHE_BLOCK_ID = 0
COMPILER = "nexfort"
COMPILER_CONFIG = None
QUANTIZE_CONFIG = None
● This code block defines default values for various parameters used in the image-to-image generation process. These values serve as fallbacks if not specified by the user.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default=MODEL)
    parser.add_argument("--variant", type=str, default=VARIANT)
    parser.add_argument("--custom-pipeline", type=str, default=CUSTOM_PIPELINE)
    parser.add_argument("--scheduler", type=str, default=SCHEDULER)
    # ... other argument definitions
    return parser.parse_args()

args = parse_args()
● This code defines the parse_args function to handle command-line arguments using argparse. It allows users to customize various aspects of the generation process. The parsed arguments are stored in the args variable for later use throughout the script.
def load_pipe(
    pipeline_cls,
    model_name,
    variant=None,
    dtype=torch.float16,
    device="cuda",
    custom_pipeline=None,
    scheduler=None,
    lora=None,
    controlnet=None,
):
    # ... function implementation ...
● This function, load_pipe, is responsible for loading and configuring the generation pipeline. It handles various components like custom pipelines, schedulers, LoRA, and ControlNet. The function also manages model loading, potential quantization, and device placement.
def calculate_inference_time_and_throughput(height, width, n_steps, model):
    start_time = time.time()
    model(prompt=args.prompt, height=height, width=width, num_inference_steps=n_steps)
    end_time = time.time()
    inference_time = end_time - start_time
    throughput = n_steps / inference_time
    return inference_time, throughput
● This function measures the performance of the generation process. It calculates both the inference time and throughput (steps per second) for a single run of the model.
def get_kwarg_inputs():
    kwarg_inputs = dict(
        prompt=args.prompt,
        negative_prompt=args.negative_prompt,
        height=height,
        width=width,
        # ... other keyword arguments ...
    )
    # ... additional argument handling ...
    return kwarg_inputs
● The get_kwarg_inputs function prepares a dictionary of keyword arguments for the pipeline. It includes various generation parameters and handles optional arguments like input images and deep caching options.
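For this image-to-image script, the important extra handled here is the source image itself (plus any extra kwargs). A hedged sketch of that elided part; passing extra_call_kwargs as a JSON string is my assumption:

# Inside get_kwarg_inputs(), after the common arguments (illustrative sketch):
if input_image is not None:
    # InstructPix2Pix and other img2img pipelines consume the source image here.
    kwarg_inputs["image"] = input_image
if args.extra_call_kwargs is not None:
    # Extra pipeline kwargs supplied as a JSON string on the command line (assumption).
    kwarg_inputs.update(json.loads(args.extra_call_kwargs))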
class IterationProfiler:
    def __init__(self):
        self.begin = None
        self.end = None
        self.num_iterations = 0
    # ... (methods for profiling)
● The IterationProfiler class is used for detailed performance profiling of the generation process. It tracks the timing of individual iterations using CUDA events.
pipe = load_pipe(
    pipeline_cls,
    args.model,
    variant=args.variant,
    custom_pipeline=args.custom_pipeline,
    scheduler=args.scheduler,
    lora=args.lora,
    controlnet=args.controlnet,
)
● This code loads the specified diffusion model pipeline with various customization options.
height = args.height or core_net.config.sample_size * pipe.vae_scale_factor
width = args.width or core_net.config.sample_size * pipe.vae_scale_factor
● Sets the output image dimensions based on user arguments or model defaults.
if args.compiler == "none":
pass
elif args.compiler == "oneflow":
pipe = compile_pipe(pipe)
elif args.compiler == "nexfort":
# ... (nexfort compilation logic)
elif args.compiler in ("compile", "compile-max-autotune"):
# ... (torch.compile logic)
â—Ź Applies compiler optimizations to the pipeline based on the specified compiler option.
if args.input_image is None:
    input_image = None
else:
    input_image = load_image(args.input_image)
    input_image = input_image.resize((width, height), Image.LANCZOS)
● Loads and resizes an input image if specified for image-to-image tasks.
if args.control_image is None:
    if args.controlnet is None:
        control_image = None
    else:
        # ... (create a default control image)
else:
    control_image = load_image(args.control_image)
    control_image = control_image.resize((width, height), Image.LANCZOS)
● Prepares a control image for ControlNet, either loading a specified image or creating a default one.
if args.warmups > 0:
    # ... (perform warm-up runs)
● Executes warm-up runs to trigger compilation and initial optimizations.
kwarg_inputs = get_kwarg_inputs()
iter_profiler = IterationProfiler()
# ... (set up profiling callback)
begin = time.time()
output_images = pipe(**kwarg_inputs).images
end = time.time()
● Performs the main image generation inference with profiling.
Performance Reporting:
pythonCopyprint(f"Inference time: {end - begin:.3f}s")
iter_per_sec = iter_profiler.get_iter_per_sec()
if iter_per_sec is not None:
print(f"Iterations per second: {iter_per_sec:.3f}")
# ... (memory usage reporting)
â—Ź Reports various performance metrics including inference time and iterations per second.
if args.output_image is not None:
    output_images[0].save(args.output_image)
● Saves the generated image if an output path is specified.
if args.run_multiple_resolutions:
    # ... (run inference at multiple resolutions)
● Tests the model's performance across various image resolutions.
if args.throughput:
    steps_range = range(1, 100, 1)
    data, coefficients = generate_data_and_fit_model(pipe, steps_range)
    plot_data_and_model(data, coefficients)
● Performs a detailed throughput analysis across different numbers of inference steps and plots the results.
if __name__ == "__main__":
main()
â—Ź Ensures that the main() function is called only when the script is run directly, not when it's imported as a module.
Compiler Oneflow
Text-to-Image
Warmup time: 68.700s
=======================================
=======================================
Inference time: 0.874s
Iterations per second: 16.183
Max used CUDA memory : 13.244GiB
=======================================
Warmup time: 2.439s
=======================================
=======================================
Inference time: 1.521s
Iterations per second: 8.326
Max used CUDA memory : 10.465GiB
=======================================
Warmup time: 67.218s
=======================================
=======================================
Inference time: 0.332s
Iterations per second: 44.523
Max used CUDA memory : 10.031GiB
=======================================
Warmup time: 1.755s
=======================================
=======================================
Inference time: 0.874s
Iterations per second: 13.381
Max used CUDA memory : 7.661GiB
=======================================
Warmup time: 71.707s
=======================================
=======================================
Inference time: 0.863s
Iterations per second: 16.257
Max used CUDA memory : 13.248GiB
=======================================
Warmup time: 2.405s
=======================================
=======================================
Inference time: 1.536s
Iterations per second: 8.325
Max used CUDA memory : 10.470GiB
=======================================
Warmup time: 67.914s
=======================================
=======================================
Inference time: 0.337s
Iterations per second: 42.992
Max used CUDA memory : 10.085GiB
=======================================
Warmup time: 1.817s
=======================================
=======================================
Inference time: 0.890s
Iterations per second: 13.250
Max used CUDA memory : 7.656GiB
=======================================
Image-to-Image
For the 1024 x 1024 runs I used a 1024-pixel image generated by the RealVisXL_V4.0 model, and likewise a 512-pixel image for the 512 runs. Prompt: "turn her into a cyborg".
Warmup time: 70.647s
=======================================
=======================================
Inference time: 0.871s
Iterations per second: 16.199
Max used CUDA memory : 13.302GiB
=======================================
Warmup time: 2.313s
=======================================
=======================================
Inference time: 1.522s
Iterations per second: 8.290
Max used CUDA memory : 10.471GiB
=======================================
Warmup time: 72.229s
=======================================
=======================================
Inference time: 0.325s
Iterations per second: 47.863
Max used CUDA memory : 10.031GiB
=======================================
Warmup time: 1.784s
=======================================
=======================================
Inference time: 0.898s
Iterations per second: 12.942
Max used CUDA memory : 7.661GiB
=======================================
Warmup time: 45.665s
=======================================
=======================================
Inference time: 0.888s
Iterations per second: 13.108
Max used CUDA memory : 13.079GiB
=======================================
Warmup time: 2.343s
=======================================
=======================================
Inference time: 1.723s
Iterations per second: 7.033
Max used CUDA memory : 4.400GiB
=======================================
Warmup time: 38.675s
=======================================
=======================================
Inference time: 0.187s
Iterations per second: 69.570
Max used CUDA memory : 4.636GiB
=======================================
Warmup time: 1.231s
=======================================
=======================================
Inference time: 0.397s
Iterations per second: 31.571
Max used CUDA memory : 2.613GiB
=======================================
Compiler Nexfort
Text-to-Image
Warmup time: 924.378s
=======================================
=======================================
Inference time: 0.979s
Iterations per second: 13.871
Max used CUDA memory : 11.464GiB
=======================================
Warmup time: 2.391s
=======================================
=======================================
Inference time: 1.515s
Iterations per second: 8.331
Max used CUDA memory : 10.471GiB
=======================================
Warmup time: 890.209s
=======================================
=======================================
Inference time: 0.704s
Iterations per second: 17.770
Max used CUDA memory : 8.956GiB
=======================================
Warmup time: 1.696s
=======================================
=======================================
Inference time: 0.889s
Iterations per second: 13.081
Max used CUDA memory : 7.657GiB
=======================================
Warmup time: 813.568s
=======================================
=======================================
Inference time: 0.976s
Iterations per second: 13.891
Max used CUDA memory : 11.465GiB
======================================
Warmup time: 3.034s
=======================================
=======================================
Inference time: 1.518s
Iterations per second: 8.333
Max used CUDA memory : 10.473GiB
=======================================
Warmup time: 802.404s
=======================================
=======================================
Inference time: 0.697s
Iterations per second: 17.522
Max used CUDA memory : 8.956GiB
=======================================
Warmup time: 1.577s
=======================================
=======================================
Inference time: 0.868s
Iterations per second: 13.497
Max used CUDA memory : 7.657GiB
=======================================
Image-to-Image
For image-to-image, the input images are the ones generated by the models above at the corresponding sizes.
1024 x 1024, prompt "turn it into a painting", input: the 1024 x 1024 image generated without OneDiff by RealVisXL_V4.0
Warmup time: 414.009s
=======================================
=======================================
Inference time: 2.558s
Iterations per second: 12.674
Max used CUDA memory : 3.643GiB
=======================================
Warmup time: 5.245s
=======================================
=======================================
Inference time: 4.569s
Iterations per second: 7.035
Max used CUDA memory : 4.400GiB
=======================================
512 x 512, prompt "turn it into a painting", input: the 512 x 512 image generated without OneDiff by RealVisXL_V4.0
Warmup time: 470.570s
=======================================
=======================================
Inference time: 0.790s
Iterations per second: 42.596
Max used CUDA memory : 2.693GiB
=======================================
Warmup time: 1.883s
=======================================
=======================================
Inference time: 1.045s
Iterations per second: 31.124
Max used CUDA memory : 2.625GiB
1024 x 1024, prompt "make it into a cyborg", input: the 1024 x 1024 image generated without OneDiff by RealVisXL_V4.0
Warmup time: 218.133s
=======================================
=======================================
Inference time: 1.079s
Iterations per second: 15.162
Max used CUDA memory : 11.489GiB
=================================
Warmup time: 2.355s
=======================================
=======================================
Inference time: 1.585s
Iterations per second: 8.327
Max used CUDA memory : 10.474GiB
=======================================
512 x 512, prompt "turn it into a painting", input: the 512 x 512 image generated without OneDiff by RealVisXL_V4.0
Warmup time: 553.390s
=======================================
=======================================
Inference time: 0.542s
Iterations per second: 25.336
Max used CUDA memory : 8.977GiB
=======================================
Warmup time: 1.654s
=======================================
=======================================
Inference time: 0.831s
Iterations per second: 13.513
Max used CUDA memory : 7.655GiB
=======================================
From these results it is evident that OneDiff consistently reduced inference time, i.e. increased inference speed. While this report does not quantitatively analyze the impact of OneDiff on image quality, sample images generated with and without OneDiff are available for each model and resolution, and a subjective visual comparison suggests that image quality remains largely unaffected by OneDiff optimization.
Working with OneDiff in the current scenario has proven challenging due to several factors. The lack of comprehensive documentation has made it difficult to navigate and implement the library effectively. Testing capabilities are limited, as OneDiff doesn't currently support Kaggle P100 and T4 GPUs, which restricts the environments in which it can be evaluated. Furthermore, the library ecosystem surrounding OneDiff appears to be fragile, with interdependencies causing conflicts. Installing one component often leads to unexpected issues with others, creating a cascade of compatibility problems. This instability in the development environment has made it cumbersome to set up and maintain a reliable testing framework for OneDiff.