9 changes: 4 additions & 5 deletions .github/workflows/ci_energyml_utils_pull_request.yml
Original file line number Diff line number Diff line change
@@ -3,7 +3,6 @@
## SPDX-License-Identifier: Apache-2.0
##
---

name: Publish (pypiTest)

defaults:
@@ -15,13 +14,14 @@ on:
branches:
- main
pull_request:
release:
types: [published]

jobs:
build:
name: Build distribution
runs-on: ubuntu-latest
steps:

- name: Checkout code
uses: actions/checkout@v4
with:
@@ -30,7 +30,7 @@ jobs:
- name: Install poetry
uses: ./.github/actions/prepare-poetry
with:
-          python-version: '3.10'
+          python-version: "3.10"

- name: Build
run: |
@@ -58,7 +58,6 @@ jobs:
needs: [build]
runs-on: ubuntu-latest
steps:

# Retrieve the code and GIT history so that poetry-dynamic-versioning knows which version to upload
- name: Checkout code
uses: actions/checkout@v4
@@ -74,7 +73,7 @@ jobs:
- name: Install poetry
uses: ./.github/actions/prepare-poetry
with:
-          python-version: '3.10'
+          python-version: "3.10"

- name: Upload to PyPI TEST
run: |
2 changes: 1 addition & 1 deletion energyml-utils/.flake8
@@ -1,6 +1,6 @@
[flake8]
# Ignore specific error codes (comma-separated list)
-ignore = E501, E722 #, W503, F403
+ignore = E501, E722, W503, F403, E203, E202

# Max line length (default is 79, can be changed)
max-line-length = 120
3 changes: 2 additions & 1 deletion energyml-utils/.gitignore
@@ -57,4 +57,5 @@ manip*


# WIP
-src/energyml/utils/wip*
+src/energyml/utils/wip*
+scripts
14 changes: 14 additions & 0 deletions energyml-utils/.pre-commit-config.yaml
@@ -0,0 +1,14 @@
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
153 changes: 149 additions & 4 deletions energyml-utils/README.md
@@ -76,6 +76,144 @@ energyml-prodml2-2 = "^1.12.0"
- The "EnergymlWorkspace" class abstracts access to numerical data such as "ExternalArrays". It can therefore be extended to interact with ETP "GetDataArray" requests, for example.
- ETP URI support: the "Uri" class parses and writes ETP URIs.

## EPC Stream Reader

The **EpcStreamReader** provides memory-efficient handling of large EPC files through lazy loading and smart caching. Unlike the standard `Epc` class, which loads all objects into memory, the stream reader loads objects on demand, which suits very large EPC files containing thousands of objects.

### Key Features

- **Lazy Loading**: Objects are loaded only when accessed, reducing memory footprint
- **Smart Caching**: LRU (Least Recently Used) cache with configurable size
- **Automatic EPC Version Detection**: Supports both CLASSIC and EXPANDED EPC formats
- **Add/Remove/Update Operations**: Full CRUD operations with automatic file structure maintenance
- **Context Management**: Automatic resource cleanup with `with` statements
- **Memory Monitoring**: Track cache efficiency and memory usage statistics
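
The lazy-loading and LRU ideas listed above can be illustrated with a small standalone sketch. This is not the `EpcStreamReader` implementation: `LazyZipCache` and its attributes are hypothetical names, and the only assumption is that an EPC file is a zip archive whose members can be read one at a time.

```python
import io
import zipfile
from collections import OrderedDict


class LazyZipCache:
    """Hypothetical helper: reads zip members on demand, keeps recent ones in memory."""

    def __init__(self, zip_source, cache_size=2):
        self._zip = zipfile.ZipFile(zip_source)
        self._cache = OrderedDict()  # member name -> raw bytes
        self._cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self._cache:
            self.hits += 1
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        self.misses += 1
        data = self._zip.read(name)  # the member is only read at this point
        self._cache[name] = data
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return data

    def close(self):
        self._zip.close()


# Build a small in-memory archive to exercise the cache.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in ("a.xml", "b.xml", "c.xml"):
        archive.writestr(member, f"<obj name='{member}'/>")

cache = LazyZipCache(buffer, cache_size=2)
cache.get("a.xml")  # miss: read from the archive
cache.get("b.xml")  # miss
cache.get("a.xml")  # hit: served from memory
cache.get("c.xml")  # miss: cache is full, so b.xml is evicted
cached_names = list(cache._cache)
cache.close()
```

A hit rate like the one reported by the reader's statistics could be derived from such counters as `hits / (hits + misses)`, although the library may compute it differently.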

### Basic Usage

```python
from typing import Any, List

from energyml.utils.epc_stream import EpcStreamReader

# Open EPC file with context manager (recommended)
with EpcStreamReader('large_file.epc', cache_size=50) as reader:
    # Report the object count without loading the objects
    print(f"Total objects: {reader.stats.total_objects}")

    # Get object by identifier
    obj: Any = reader.get_object_by_identifier("uuid.version")

    # Get objects by type
    features: List[Any] = reader.get_objects_by_type("BoundaryFeature")

    # Get all objects with same UUID
    versions: List[Any] = reader.get_object_by_uuid("12345678-1234-1234-1234-123456789abc")
```

### Adding Objects

```python
from energyml.utils.epc_stream import EpcStreamReader
from energyml.utils.constants import gen_uuid
import energyml.resqml.v2_2.resqmlv2 as resqml
import energyml.eml.v2_3.commonv2 as eml

# Create a new EnergyML object
boundary_feature = resqml.BoundaryFeature()
boundary_feature.uuid = gen_uuid()
boundary_feature.citation = eml.Citation(title="My Feature")

with EpcStreamReader('my_file.epc') as reader:
    # Add object - path is automatically generated based on EPC version
    identifier = reader.add_object(boundary_feature)
    print(f"Added object with identifier: {identifier}")

    # Or specify custom path (optional)
    identifier = reader.add_object(boundary_feature, "custom/path/MyFeature.xml")
```

### Removing Objects

```python
from energyml.utils.epc_stream import EpcStreamReader

with EpcStreamReader('my_file.epc') as reader:
    # Remove specific version by full identifier
    success = reader.remove_object("uuid.version")

    # Remove ALL versions by UUID only
    success = reader.remove_object("12345678-1234-1234-1234-123456789abc")

    if success:
        print("Object(s) removed successfully")
```
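
The two calls above differ only in whether the identifier carries a version suffix. Here is a minimal sketch of that distinction, assuming identifiers take the form `uuid.version` as in the placeholders above. `split_identifier` is a hypothetical helper written for illustration; it is not part of `energyml.utils`, and the library may parse identifiers differently.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID at the start of the identifier.
UUID_PATTERN = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def split_identifier(identifier):
    """Return (uuid, version); version is None when only a UUID is given."""
    match = UUID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"No UUID found at the start of {identifier!r}")
    uuid = match.group(0)
    rest = identifier[len(uuid):]
    if rest == "":
        return uuid, None  # UUID only: would target every version
    if not rest.startswith(".") or len(rest) == 1:
        raise ValueError(f"Malformed version suffix in {identifier!r}")
    return uuid, rest[1:]  # the version itself may contain dots


uuid_only = split_identifier("12345678-1234-1234-1234-123456789abc")
with_version = split_identifier("12345678-1234-1234-1234-123456789abc.1.0")
```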

### Updating Objects

```python
...
from energyml.utils.introspection import set_attribute_from_path

with EpcStreamReader('my_file.epc') as reader:
    # Get existing object
    obj = reader.get_object_by_identifier("uuid.version")

    # Modify the object
    set_attribute_from_path(obj, "citation.title", "Updated Title")

    # Update in EPC file
    new_identifier = reader.update_object(obj)
    print(f"Updated object: {new_identifier}")
```

### Performance Monitoring

```python
with EpcStreamReader('large_file.epc', cache_size=100) as reader:
    # Access some objects...
    for i in range(10):
        obj = reader.get_object_by_identifier(f"uuid-{i}.1")

    # Check performance statistics
    print(f"Cache hit rate: {reader.stats.cache_hit_rate:.1f}%")
    print(f"Memory efficiency: {reader.stats.memory_efficiency:.1f}%")
    print(f"Objects in cache: {reader.stats.loaded_objects}/{reader.stats.total_objects}")
```

### EPC Version Support

The EpcStreamReader automatically detects and handles both EPC packaging formats:

- **CLASSIC Format**: Flat file structure (e.g., `obj_BoundaryFeature_{uuid}.xml`)
- **EXPANDED Format**: Namespace structure (e.g., `namespace_resqml201/version_{id}/obj_BoundaryFeature_{uuid}.xml` or `namespace_resqml201/obj_BoundaryFeature_{uuid}.xml`)

```python
with EpcStreamReader('my_file.epc') as reader:
    print(f"Detected EPC version: {reader.export_version}")
    # Objects added will use the same format as the existing EPC file
```
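
To make the difference between the two layouts concrete, the following sketch classifies a member path using only the naming patterns quoted above. `guess_layout` is a hypothetical function; how `EpcStreamReader` actually detects the format is not shown here and may rely on other information in the package.

```python
from pathlib import PurePosixPath


def guess_layout(member_path):
    """Classify a path as 'EXPANDED' (namespace folders) or 'CLASSIC' (flat)."""
    parts = PurePosixPath(member_path).parts
    # EXPANDED paths start with a "namespace_..." folder; CLASSIC paths are flat.
    if len(parts) > 1 and parts[0].startswith("namespace_"):
        return "EXPANDED"
    return "CLASSIC"


flat = guess_layout("obj_BoundaryFeature_1234.xml")
nested = guess_layout("namespace_resqml201/obj_BoundaryFeature_1234.xml")
versioned = guess_layout("namespace_resqml201/version_1/obj_BoundaryFeature_1234.xml")
```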

### Advanced Usage

```python
# Initialize without preloading metadata for faster startup
reader = EpcStreamReader('huge_file.epc', preload_metadata=False, cache_size=200)

try:
    # Manual metadata loading when needed
    reader._load_metadata()

    # Get object dependencies
    deps = reader.get_object_dependencies("uuid.version")

    # Batch processing with memory monitoring
    for obj_type in ["BoundaryFeature", "PropertyKind"]:
        objects = reader.get_objects_by_type(obj_type)
        print(f"Processing {len(objects)} {obj_type} objects")

finally:
    reader.close()  # Manual cleanup if not using context manager
```

The EpcStreamReader suits applications that must process large EPC files with limited memory, such as data processing pipelines, web applications, or analysis tools.


# Poetry scripts :

@@ -95,25 +233,32 @@ energyml-prodml2-2 = "^1.12.0"
poetry install
```

If a script fails to run, you may need to add "src" to your PYTHONPATH environment variable. For example, in PowerShell:

```powershell
$env:PYTHONPATH="src"
```


## Validation examples :

An epc file:
```bash
-poetry run validate --input "path/to/your/energyml/object.epc" *> output_logs.json
+poetry run validate --file "path/to/your/energyml/object.epc" *> output_logs.json
```

An xml file:
```bash
-poetry run validate --input "path/to/your/energyml/object.xml" *> output_logs.json
+poetry run validate --file "path/to/your/energyml/object.xml" *> output_logs.json
```

A json file:
```bash
-poetry run validate --input "path/to/your/energyml/object.json" *> output_logs.json
+poetry run validate --file "path/to/your/energyml/object.json" *> output_logs.json
```

A folder containing Epc/xml/json files:
```bash
-poetry run validate --input "path/to/your/folder" *> output_logs.json
+poetry run validate --file "path/to/your/folder" *> output_logs.json
```
