Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New behavior of igraph::disjoint_union #271

Open
bockthom opened this issue Oct 21, 2024 · 10 comments
Open

New behavior of igraph::disjoint_union #271

bockthom opened this issue Oct 21, 2024 · 10 comments

Comments

@bockthom
Copy link
Collaborator

The following comment has been posted by @maxloeffler in #260:


Unfortunately, somewhat about their new version breaks in our usage of igraph::disjoint_union (which worked prior to updating igraph). I researched a bit and found that the breaking change was introduced 4 months ago in this commit. I have not yet figured out a way to fix that issue.

Concrete error description

Error Message:

── Error (test-networks.R:404:5): Extraction of sub-networks ───────────────────
<vctrs_error_ptype2/vctrs_error_incompatible_type/vctrs_error_incompatible/vctrs_error/rlang_error/error/condition>
Error in `vctrs::vec_c(attr[[exattr[a]]], ea[[exattr[a]]])`: Can't combine `..1` <list> and `..2` <character>.
Backtrace:

  1. ├─global get.sample.network() at test-networks.R:404:5
  2. │ └─net.builder$get.multi.network() at util-networks.R:2179:5
  3. │   └─igraph::disjoint_union(authors.net, artifacts.net) at util-networks.R:1280:13
  4. │     └─vctrs::vec_c(attr[[exattr[a]]], ea[[exattr[a]]])
  5. └─vctrs (local) `<fn>`()
  6.   └─vctrs::vec_default_ptype2(...)
  7.     ├─base::withRestarts(...)
  8.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
  9.     │   └─base (local) doWithOneRestart(return(expr), restart)
 10.     └─vctrs::stop_incompatible_type(...)
 11.       └─vctrs:::stop_incompatible(...)
 12.         └─vctrs:::stop_vctrs(...)
 13.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

When breaking the variables are instanciated as follows:

Browse[1]> attr
$date
$date[[1]]
[1] "2004-10-09 18:38:13 UTC" "2005-02-09 18:49:49 UTC"

$date[[2]]
[1] "2010-07-12 10:05:36 UTC" "2010-07-12 11:05:35 UTC"

$date[[3]]
[1] "2010-07-12 12:05:34 UTC" "2010-07-12 12:05:40 UTC"

$date[[4]]
[1] "2010-07-12 12:05:41 UTC" "2010-07-12 12:05:42 UTC"


$artifact.type
$artifact.type[[1]]
[1] "Mail" "Mail"

$artifact.type[[2]]
[1] "Mail" "Mail"

$artifact.type[[3]]
[1] "Mail" "Mail"

$artifact.type[[4]]
[1] "Mail" "Mail"


$message.id
$message.id[[1]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[2]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[3]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[4]]
[1] "<[email protected]>" "<[email protected]>"


$thread
$thread[[1]]
[1] "<thread-1>" "<thread-1>"

$thread[[2]]
[1] "<thread-2>" "<thread-2>"

$thread[[3]]
[1] "<thread-3>" "<thread-3>"

$thread[[4]]
[1] "<thread-4>" "<thread-4>"


$weight
[1] 2 2 2 2 1 1 1 1 1

$type
[1] "Unipartite" "Unipartite" "Unipartite" "Unipartite"

$relation
[1] "mail" "mail" "mail" "mail"

Browse[1]> ea
$weight
[1] 1 1 1 1 1

$artifact.type
[1] "Feature" "Feature" "Feature" "Feature" "Feature"

$type
[1] "Unipartite" "Unipartite" "Unipartite" "Unipartite" "Unipartite"

$relation
[1] "callgraph" "callgraph" "callgraph" "callgraph" "callgraph"

Browse[1]> a
[1] 2

Originally posted by @maxloeffler in #260 (comment)

@bockthom
Copy link
Collaborator Author

bockthom commented Oct 21, 2024

I've moved this to a new issue. I am actually not sure whether this is a problem on our side or on the side of igraph.

One and a half years ago, I spotted an issue in how igraph::disjoint_union handles types. Therefore, I implemented a workaround in coronet and reported the bug to igraph and requested a fix from them. With yesterday's release, the igraph people have released a fix for the bug report that I have opened one and a half years ago. For more information about my workaround, see #236 and this commit in coronet. In this commit, I also have added a link to the bug report that I had submitted to igraph.

What we have to do on our side now: Check whether we need to adjust our code to be compatible with igraph's fix - or whether there is a problem in igraph's fix that we need to report to igraph again.

@bockthom
Copy link
Collaborator Author

bockthom commented Oct 23, 2024

I looked a little bit deeper into this issue:

exattr <- intersect(names(ea), names(attr)) means that exattr is c("weight", "artifact.type", "type", "relation"). Consequently, exattr[a] is "artifact.type". When we look at attr and ea, we can see that the artifact.type in the first one is a list, while in the second one, it is a vector:

$artifact.type
$artifact.type[[1]]
[1] "Mail" "Mail"

$artifact.type[[2]]
[1] "Mail" "Mail"

$artifact.type[[3]]
[1] "Mail" "Mail"

$artifact.type[[4]]
[1] "Mail" "Mail"

vs.

$artifact.type
[1] "Feature" "Feature" "Feature" "Feature" "Feature"

The question now is: Why do we have different data structures for the artifact type for the different networks? The first one (i.e., the list) originates from the author network, while the second one (i.e., the character vector) originates from the artifact network.

@maxloeffler Could you please investigate why we have different data structures for artifact.type in author networks and artifact networks? Thanks! It could also be that this is independent of the network type, but just related to how the edges and their attributes are constructed.

@bockthom bockthom added this to the v4.5 milestone Oct 23, 2024
@bockthom
Copy link
Collaborator Author

bockthom commented Oct 23, 2024

@maxloeffler I forgot about mentioning one additional check in our meeting:

Could you please also check what happens in our test suite when you comment out the changes of this commit: a953555 With the new igraph version 2.1.1, the workaround introduced in that commit should not be necessary any more (at least, I'd hope so). However, even if it is not necessary any more, let's keep the workaround for a while not to abruptly cease coronet support of previous igraph versions.

@maxloeffler
Copy link
Contributor

Reverting the mentioned commit leads to an additional 159 tests failing, so I assume it is still necessary. Also, I realized that in order to patch around in igraphs code of disjoint_union I would need to recompile igraph from source, I'll postpone this check therefore and first evaluate the edge attributes in coronet as discussed.

@bockthom
Copy link
Collaborator Author

Reverting the mentioned commit leads to an additional 159 tests failing

Damn. Did you run it with the edge-attribute-handling fix that solves the list vs. character vector problem? If so, that many tests failing is in stark contrast to my expectations, because the fix of igraph should make the mentioned commit needless.

Could you please dive deeper into what's the problem in the 159 failing tests? It would be good to know whether there are similar list vs. vector situations that we did not come across yet, or whether something else is completely broken. Thanks!

@bockthom
Copy link
Collaborator Author

Also, I realized that in order to patch around in igraphs code of disjoint_union I would need to recompile igraph from source

It depends. If it is just R code in a function that is directly executed by us, you could try to just redefine the entire function and call your function instead of the one from the package. If it is nested within some other functions or interacts with some non-public functions, than you might be right that recompiling could be necessary. But lets first deal with the edge attributes before moving on to new problems...

@maxloeffler
Copy link
Contributor

Okay, seems like the 159 was fake news. I must have had some other change effect the outcome without my knowledge. Reverting the patch of a953555 does only raise one additional error.

@maxloeffler
Copy link
Contributor

Regarding the edge attributes sometimes being of list type and other times being of character type, I found that calling get.multi.network (where igraph::disjoint_union is being called in) may break due to every edge attribute that is not explicitly handled in EDGE.ATTR.COMB. The default behavior for edge attributes when simplifying is concatenating all values of edges that are being merged into a list. However, if there is just one entry the entry will remain of type character. The only solution I see for this problem is to precautiously convert every edge attribute to list by default.

@bockthom
Copy link
Collaborator Author

I'm afraid you are right. This would involve numerous tests to be changed and checks to be adjusted. But there is one part I am not sure about:

The default behavior for edge attributes when simplifying is concatenating all values of edges that are being merged into a list.

Really? This only would happen if the attributes are vectors of multiple length already beforehand. If there is only one value, then it should be combined into a vector instead of a list.

So, in general, we should check for each edge attribute (there is a limited number of them) whether this problem can occur at all. And if so, it might be necessary to wrap exactly these attributes into lists.

@maxloeffler
Copy link
Contributor

  1. Regarding the concatenating list/vector thing: I definitely do not have the extent of R knowledge that you have, but isn't that exactly what happened in the error I reported earlier? In total, eight edges, in pairs of two edges, each with artifact.type being Mail (singular), have been simplified by the concat edge.comb.attr, and the result was this list:
$artifact.type
$artifact.type[[1]]
[1] "Mail" "Mail"

$artifact.type[[2]]
[1] "Mail" "Mail"

$artifact.type[[3]]
[1] "Mail" "Mail"

$artifact.type[[4]]
[1] "Mail" "Mail"
  1. I tried to overwrite the igraph::disjoint_union function to test out what breaks when we apply the same fix they applied for the edge attributes also to the vertex attributes, but I was not able as the function makes use of internal igraph functions that are not exposed post-compile. Though compiling igraph manually should not be too much of a hassle, I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants