Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DataEcon to v0.3 #46

Merged
merged 14 commits into from
Oct 30, 2023
Merged

Update DataEcon to v0.3 #46

merged 14 commits into from
Oct 30, 2023

Conversation

bbejanov
Copy link
Member

@bbejanov bbejanov commented Sep 7, 2023

New version of DataEcon introduces functions to pack and unpack dates of different frequencies given year-period or year-month-day.

@bbejanov bbejanov changed the base branch from master to dev September 7, 2023 13:29
@bbejanov bbejanov marked this pull request as draft September 7, 2023 13:31
@bbejanov bbejanov self-assigned this Sep 7, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2023

Codecov Report

Merging #46 (f452ae4) into dev (b6bdcbb) will increase coverage by 0.17%.
The diff coverage is 93.15%.

@@            Coverage Diff             @@
##              dev      #46      +/-   ##
==========================================
+ Coverage   92.06%   92.24%   +0.17%     
==========================================
  Files          20       20              
  Lines        2785     2901     +116     
==========================================
+ Hits         2564     2676     +112     
- Misses        221      225       +4     
Files Coverage Δ
src/dataecon/C.jl 93.58% <95.00%> (+3.26%) ⬆️
src/momentintime.jl 91.04% <90.90%> (-0.18%) ⬇️
src/dataecon/DataEcon.jl 96.27% <95.45%> (-3.73%) ⬇️
src/dataecon/I.jl 93.65% <92.36%> (+4.76%) ⬆️

... and 3 files with indirect coverage changes

@bbejanov bbejanov changed the title Update DataEcon to v0.2.3 Update DataEcon to v0.2.4 Sep 8, 2023
@bbejanov bbejanov changed the title Update DataEcon to v0.2.4 Update DataEcon to v0.3 Oct 3, 2023
@bbejanov bbejanov marked this pull request as ready for review October 12, 2023 04:50
@bbejanov bbejanov requested a review from jasonjensen October 27, 2023 14:56
Copy link
Contributor

@jasonjensen jasonjensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @bbejanov . DataEcon has really expanded in its functionality!

I only have a few comments and some questions stemming mostly from my own curiosity.

const DE_VER_REVISION = 1

const DE_VER_SUBREVISION = 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are all hard-coded, would it be better if they were derived or do we need some check to ensure consistency here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This C.jl file is generated using Clang.jl package from the daec.h file in DataEcon. The idea is that the .h file has these constants hard-coded, and there is also a function in the library that returns the same version number (that's de_version()). At run time the user should compare the hardcoded constants in the .h file they are using to the value returned by the library function to make sure they match. For this Julia connector, this check is done in DataEcon.jl in __init__()

version = VersionNumber(unsafe_string(C.de_version()))
if version != VersionNumber(C.DE_VERSION)
throw(ErrorException("Library version $(version) does not match expected version $(C.DE_VERSION)."))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to specify in the error message that it is the DE library version causing the error here (as opposed to any number of other libraries the user may be loading).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - I'll update the error message.

fname = string(fname)
I._check(C.de_open(fname, handle))
handle = _do_open(readonly ? C.de_open_readonly : C.de_open, fname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would set readonly to true by default; I think most open operations will be to get data and load it into memory; we wouldn't want to lock the file unnecessarily. Maybe change the option to write=false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# import ..LittleDict

"""
get_all_attributes(de, obejct_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Retrieve the names and value of all attributes of the object with the given id.
They are returned in a dictionary.
"""
function get_all_attributes(de::DEFile, id::C.obj_id_t; delim="‖")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the SQLITE call is just getting the columns and concatenating them with this delimiter, and then we on our side are then splitting the resulting string by this delimiter? Is this the best way to do this? I'm guessing there won't be many attributes, so it can't be that expensive, it just seems a little roundabout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the query that actually runs in de_get_all_attributes():
image

So, yes, it concatenates all names in one string with the given delimiter and all the values in the same order as the names and with the same delimiter in another string, and it also returns the number of attributes as a separate value. The main point here is that this query returns at most one row. It may be less efficient than simply SELECT `name`, `value` FROM `attributes` WHERE `id` = ?, but:

  • the simple query requires a more complex logic to be coded by the caller -- they'd have to write a loop and additional logic for all corner cases. Compare with I._list_catalog() to see what the user would have to do to iterate the rows of a table returned by SQL.
  • the simple query doesn't distinguish between an object that doesn't exist and an object that exists, but doesn't have any attributes - both cases return a table with 0 rows. The other query returns 0 rows if object doesn't exist and 1 row (with count=0, names="", values="") if object exists but doesn't have any attributes.

Btw, the case of 0-rows is handled by the library - de_get_all_attributes returns an error code (DE_OBJ_DNE), which here in the Julia connector is handled by I._check() to throw an exception. So the end user doesn't even have to check for that.

all_names = String[split(unsafe_string(names[]), delim);]
all_values = String[split(unsafe_string(values[]), delim);]
if length(all_names) != length(all_values)
error("number of names and values don't match. Try a different delimiter.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was that this error message doesn't seem that useful since the user is unlikely to call get_all_attributes themselves, but it seems that this is not the case; i.e. get_all_attributes is not called anywhere and is currently there for the sake of completeness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct: get_all_attributes is not used in the connector. It is made available for the end user to call, if they need to.

The idea is that, as an end user, if you want the value of a specific attribute, you call get_attribute. If you want to explore what attributes a given object has, you call get_all_attributes. Setting attributes can only be done one at a time with set_attribute.

@@ -56,16 +56,41 @@ _check(rc::Cint) = rc == 0 || throw(DEError())

const StrOrSym = Union{Symbol,AbstractString}

_to_de_scalar_val(value) = throw(ArgumentError("Unable to write value of type $(typeof(value))."))
_to_de_scalar_val(value) = throw(ArgumentError("Unable to write scalar value of type $(typeof(value))."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the pattern of some of your other code, do we want an @nospecialize here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#############################################################################
# write tseries and mvtseries


_de_array_data(; eltype, elfreq=C.freq_none, obj_type, nbytes, val) = (; eltype, elfreq, obj_type, nbytes, val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? It takes a number of keyword arguments (supplied in any order) and returns an array of their values in a specific order?

Copy link
Member Author

@bbejanov bbejanov Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) this is not my best moment :-)

The logic here is that all methods of _to_de_array would return a sturcture with all relevant fields, which are then used in _do_store_array. But instead of defining an actual struct, I am using a NamedTuple (sort of like an anonymous struct, if you will). This particular function, _de_array_data, acts as "constructor" - so you call it with keyword arguments and it constructs the NamedTuple for you.

This was easier to work with while prototyping, because Revise.jl doesn't handle changes in a struct - you have to restart the repl.

_to_de_array(::Val{true}, value::AbstractMatrix{ET}) where {ET} = (; eltype=_to_de_scalar_type(ET), type=C.type_matrix, nbytes=0, val=C_NULL)
_to_de_array(::Val{true}, value::AbstractVecOrMat{ET}) where {ET} = _de_array_data(;
_eltypefreq(ET)...,
obj_type=ndims(value) == 1 ? C.type_vector : C.type_matrix,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: a matrix in Julia is a strictly two-dimensional object.

@bbejanov
Copy link
Member Author

Thanks @jasonjensen for your review!

@bbejanov bbejanov merged commit cc2268f into dev Oct 30, 2023
6 checks passed
@bbejanov bbejanov deleted the dataecon branch October 30, 2023 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants