-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update DataEcon to v0.3 #46
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #46 +/- ##
==========================================
+ Coverage 92.06% 92.24% +0.17%
==========================================
Files 20 20
Lines 2785 2901 +116
==========================================
+ Hits 2564 2676 +112
- Misses 221 225 +4
|
YPFrequency not supported yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @bbejanov . DataEcon has really expanded in its functionality!
I only have a few comments and some questions stemming mostly from my own curiosity.
const DE_VER_REVISION = 1 | ||
|
||
const DE_VER_SUBREVISION = 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are all hard-coded, would it be better if they were derived or do we need some check to ensure consistency here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This C.jl file is generated using Clang.jl package from the daec.h file in DataEcon. The idea is that the .h file has these constants hard-coded, and there is also a function in the library that returns the same version number (that's de_version()
). At run time the user should compare the hardcoded constants in the .h file they are using to the value returned by the library function to make sure they match. For this Julia connector, this check is done in DataEcon.jl in __init__()
version = VersionNumber(unsafe_string(C.de_version())) | ||
if version != VersionNumber(C.DE_VERSION) | ||
throw(ErrorException("Library version $(version) does not match expected version $(C.DE_VERSION).")) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to specify in the error message that it is the DE library version causing the error here (as opposed to any number of other libraries the user may be loading).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea - I'll update the error message.
src/dataecon/DataEcon.jl
Outdated
fname = string(fname) | ||
I._check(C.de_open(fname, handle)) | ||
handle = _do_open(readonly ? C.de_open_readonly : C.de_open, fname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would set readonly to true by default; I think most open operations will be to get data and load it into memory; we wouldn't want to lock the file unnecessarily. Maybe change the option to write=false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/dataecon/DataEcon.jl
Outdated
# import ..LittleDict | ||
|
||
""" | ||
get_all_attributes(de, obejct_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Retrieve the names and value of all attributes of the object with the given id. | ||
They are returned in a dictionary. | ||
""" | ||
function get_all_attributes(de::DEFile, id::C.obj_id_t; delim="‖") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the SQLITE call is just getting the columns and concatenating them with this delimiter, and then we on our side are then splitting the resulting string by this delimiter? Is this the best way to do this? I'm guessing there won't be many attributes, so it can't be that expensive, it just seems a little roundabout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the query that actually runs in de_get_all_attributes()
:
So, yes, it concatenates all names in one string with the given delimiter and all the values in the same order as the names and with the same delimiter in another string, and it also returns the number of attributes as a separate value. The main point here is that this query returns at most one row. It may be less efficient than simply SELECT `name`, `value` FROM `attributes` WHERE `id` = ?
, but:
- the simple query requires a more complex logic to be coded by the caller -- they'd have to write a loop and additional logic for all corner cases. Compare with
I._list_catalog()
to see what the user would have to do to iterate the rows of a table returned by SQL. - the simple query doesn't distinguish between an object that doesn't exist and an object that exists, but doesn't have any attributes - both cases return a table with 0 rows. The other query returns 0 rows if object doesn't exist and 1 row (with count=0, names="", values="") if object exists but doesn't have any attributes.
Btw, the case of 0-rows is handled by the library - de_get_all_attributes
returns an error code (DE_OBJ_DNE
), which here in the Julia connector is handled by I._check()
to throw an exception. So the end user doesn't even have to check for that.
all_names = String[split(unsafe_string(names[]), delim);] | ||
all_values = String[split(unsafe_string(values[]), delim);] | ||
if length(all_names) != length(all_values) | ||
error("number of names and values don't match. Try a different delimiter.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought was that this error message doesn't seem that useful since the user is unlikely to call get_all_attributes
themselves, but it seems that this is not the case; i.e. get_all_attributes
is not called anywhere and is currently there for the sake of completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct: get_all_attributes
is not used in the connector. It is made available for the end user to call, if they need to.
The idea is that, as an end user, if you want the value of a specific attribute, you call get_attribute
. If you want to explore what attributes a given object has, you call get_all_attributes
. Setting attributes can only be done one at a time with set_attribute
.
src/dataecon/I.jl
Outdated
@@ -56,16 +56,41 @@ _check(rc::Cint) = rc == 0 || throw(DEError()) | |||
|
|||
const StrOrSym = Union{Symbol,AbstractString} | |||
|
|||
_to_de_scalar_val(value) = throw(ArgumentError("Unable to write value of type $(typeof(value)).")) | |||
_to_de_scalar_val(value) = throw(ArgumentError("Unable to write scalar value of type $(typeof(value)).")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the pattern of some of your other code, do we want an @nospecialize
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/dataecon/I.jl
Outdated
############################################################################# | ||
# write tseries and mvtseries | ||
|
||
|
||
_de_array_data(; eltype, elfreq=C.freq_none, obj_type, nbytes, val) = (; eltype, elfreq, obj_type, nbytes, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? It takes a number of keyword arguments (supplied in any order) and returns an array of their values in a specific order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) this is not my best moment :-)
The logic here is that all methods of _to_de_array
would return a sturcture with all relevant fields, which are then used in _do_store_array
. But instead of defining an actual struct
, I am using a NamedTuple
(sort of like an anonymous struct, if you will). This particular function, _de_array_data
, acts as "constructor" - so you call it with keyword arguments and it constructs the NamedTuple
for you.
This was easier to work with while prototyping, because Revise.jl doesn't handle changes in a struct - you have to restart the repl.
_to_de_array(::Val{true}, value::AbstractMatrix{ET}) where {ET} = (; eltype=_to_de_scalar_type(ET), type=C.type_matrix, nbytes=0, val=C_NULL) | ||
_to_de_array(::Val{true}, value::AbstractVecOrMat{ET}) where {ET} = _de_array_data(; | ||
_eltypefreq(ET)..., | ||
obj_type=ndims(value) == 1 ? C.type_vector : C.type_matrix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: a matrix in Julia is a strictly two-dimensional object.
Thanks @jasonjensen for your review! |
New version of DataEcon introduces functions to pack and unpack dates of different frequencies given year-period or year-month-day.