-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinguish directedness of author networks and edge-construction algorithm #137
Conversation
…orithm From now on, the 'author.directed' parameter in the network configuration only affects the directedness of an author network, and the newly introduced parameter 'respect.temporal.order' is responsible for choosing the edge-construction algorithm. Now, it is possible to build author networks which are directed but don't respect the temporal order. That is, for two subsequent events, there will be two edges between the corresponding nodes, one in each direction. In addition, it is also possible to build author networks which are undirected but respect the temporal order. That is, only one edge (annotated with the temporally later date) will be present between the corresponding nodes of two subsequent events. So, let there be an event of author A to common artifact at time 1 and an event of B to the same artifact at time 2. Then, we are able to construct the following networks (only the list of edges is presented here): author.directed=TRUE, respect.temporal.order=TRUE: A<--2--B author.directed=TRUE, respect.temporal.order=FALSE: A--1-->B, A<--2--B author.directed=FALSE, respect.temporal.order=TRUE: A--2-- B author.directed=FALSE, respect.temporal.order=FALSE: A--1--B, A--2--B For more information, please see #6. To keep the former behavior, the default value of the newly introduced network configuration parameter 'respect.temporal.order' is NA. If the default value this parameter is used, the edge-construction function uses the 'directed' parameter for 'respect.temporal.order'. This fixes #6. Signed-off-by: Thomas Bock <[email protected]>
Add a test for constructing undirected networks which respect the temporal order and add another test for constructing directed networks without respecting the temporal order. The tests test the functionality introduced in commit 4fc59a0. Signed-off-by: Thomas Bock <[email protected]>
When working on the functions 'construct.edge.list.from.key.value.list' and 'construct.network.from.edge.list', I recognized some inconsistent uses of '#' in comments. Comments which begin at the beginning of a line should start with '##'. This commit fixes that in the above mentioned functions. Signed-off-by: Thomas Bock <[email protected]>
Update README.md accordingly after introducing the new network configuration parameter 'respect.temporal.order'. This updates the README.md regarding the changes in commit 4fc59a0. Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your effort, @bockthom! This is a very good first shot, unfortunately, I identified an error in the logic (but prepared a patch). Please read the (one important) comment carefully.
There are some other comments that need some work, but nothing too major.
By the way, very nice solution with the default for the new NetworkConf
attribute! I like this. 😀
README.md
Outdated
* [`TRUE`, *`FALSE`*] | ||
- `respect.temporal.order` | ||
* Denotes whether the temporal order of activities shall be respected when constructing author networks. When respecting the temporal order, there will be only one edge for the later of two corresponding events between two authors. Otherwise, there will be mutual edges. (For instance, when constructing author networks with relation mail, there will be only an edge for an answer in a mail thread to the autors of every previous e-mail in this mail thread if the temporal order is respected. If the temporal order is not respected, there is an edge for every answer to all the authors of the previous e-mails in the mail thread, but also an edge for every previously sent e-mail from its author to the author of each subsequent mail in this mail thread.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this description, but it seems a bit long for this list. What do you think about moving most of this description to the subsection Relations
in the Section Network construction
(or a new subsection)? There, we would have enough space to also insert the table from issue #6 (or an equivalent illustration).
Alternatively, we could introduce an **Example**
item as for skip.threshold
. 😉 Although, I think that a new section as suggested above would be more appropriate as the matter is quite important.
Additionally:
-
... relation `mail`.
-
... previous emails in this mail thread ...
-
If the temporal order is not respected, there is not only an edge ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about moving most of this description to the subsection
Relations
in the SectionNetwork construction
(or a new subsection)? There, we would have enough space to also insert the table from issue #6 (or an equivalent illustration).
Adding all the information regarding network construction would overload the Relations
section. Therefore, I would add a new section after the Relations
section, named Different edge-construction algorithms for author networks
. There we can add also the table from issue #6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like the heading for the new subsection, although, I would remove the "Different".
README.md
Outdated
* [`TRUE`, *`FALSE`*] | ||
- `respect.temporal.order` | ||
* Denotes whether the temporal order of activities shall be respected when constructing author networks. When respecting the temporal order, there will be only one edge for the later of two corresponding events between two authors. Otherwise, there will be mutual edges. (For instance, when constructing author networks with relation mail, there will be only an edge for an answer in a mail thread to the autors of every previous e-mail in this mail thread if the temporal order is respected. If the temporal order is not respected, there is an edge for every answer to all the authors of the previous e-mails in the mail thread, but also an edge for every previously sent e-mail from its author to the author of each subsequent mail in this mail thread.) | ||
* **Note**: If no value is speficied (i.e., `NA` is used), the value of `author.directed` is used for determining whether to respect the temporal order during edge construction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no value is specified explicitly by the user ...
speficied
is by far the most undiscoverable typo I have had for some time. 😀
README.md
Outdated
* The (time-based) directedness of edges in an artifact network | ||
* **Note**: This parameter does not take effect for now, as the co-change relation is always undirected, while the call-graph relation is always directed. | ||
* The directedness of edges in an artifact network | ||
* **Note**: This parameter does not take effect for now, as the `cochange` relation is always undirected, while the `callgraph` relation is always directed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and we do not have data available for the other relations (
issue
) to exhibit edge information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just neglect that (at least for now), as there will be edges, at least, for the issue
relation, soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to mention all relations, especially, since there is the full list of relations listed right above (under artifact.relation
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ok, I just added a sentence for the other relations there.
tests/test-networks-author.R
Outdated
proj.conf$update.value("artifact.filter.base", FALSE) | ||
net.conf = NetworkConf$new() | ||
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE, | ||
respect.temporal.order = FALSE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is wrong here.
tests/test-networks-author.R
Outdated
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) | ||
proj.conf$update.value("artifact.filter.base", FALSE) | ||
net.conf = NetworkConf$new() | ||
net.conf$update.values(updated.values = list(author.relation = "cochange", respect.temporal.order = TRUE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please additionally set author.directed = FALSE
here? As you explicitly test a certain combination of attribute values, we should explicitly set those values.
tests/test-networks-author.R
Outdated
|
||
## edge attributes | ||
data = data.frame(comb.1. = c("Björn", "Björn", "Olaf", "Olaf", "Olaf", "Olaf", "Karl", "Karl"), | ||
comb.2. = c("Olaf", "Olaf", "Karl", "Karl", "Thomas", "Thomas", "Thomas", "Thomas"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the easy fix for issue #6 will not be as easy as expected... 😞 The network constructed here is – sadly – wrong!
The idea was that, in a network ignoring the temporal order of events (i.e., respect.temporal.order = FALSE
), we connect all developers touching an artifact, but – and that is the important part – direct them from the actor to the other authors. Following this logic, there are wrong edges here since the first row (Line 346) denotes the actor and the second one (Line 347) denotes the recipient of the edge:
-
When we look at the data we find two authors working on the feature
A
, Björn and Olaf. (analogously for the other artifacts!)
https://github.com/se-passau/codeface-extraction-r/blob/9d7bc6ac0ef4b4c4ab9d9209c92304e8741a239b/tests/codeface-data/results/testing/test_feature/feature/commits.list#L1-L4 -
Both authors work once on the artifact
A
. As a consequence, we expect two edges in the network:- Björn→Olaf and
- Olaf→Björn.
-
In the expected network of this test, we have two edges: Björn→Olaf and Björn→Olaf – which is obviously wrong.
Okay, how do we solve this? In the last hours, I debugged this and prepared a patch in the end. Unfortunately, this temporarily breaks some network tests with issue data because the order of edges is messed up.
Please add the patch to your branch and fix the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I know what I did forget to do. Before opening this PR, I wanted to check something but did not remember any more what I had wanted to check. When I have read your comment, I remembered again, that I wanted to check for the edge directions again.
I am sorry for that. Moreover, it was not intended that it is your job to spend hours on debugging that. I would just have been fine with a review comment like "the edge directions are wrong here, please debug that".
I will have a look at your patch tomorrow and also will fix the other tests. That's the least I can do.
Thanks for your patch and also thanks for spending so much time on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I know what I did forget to do. Before opening this PR, I wanted to check something but did not remember any more what I had wanted to check. When I have read your comment, I remembered again, that I wanted to check for the edge directions again.
I guess, everybody knows that feeling. 😁
I am sorry for that. Moreover, it was not intended that it is your job to spend hours on debugging that. I would just have been fine with a review comment like "the edge directions are wrong here, please debug that".
Sorry for the misleading formulation. For the patch, I needed only some short time, I just spent some further time on debugging what was happening inside igraph
– to find a way to easily fix the tests again. Essentially, I have done some other things in the last hours, too. 😉
I will have a look at your patch tomorrow and also will fix the other tests. That's the least I can do.
Thanks for your patch and also thanks for spending so much time on that.
You are welcome. Keep me posted.
In the course of PR #137, it occurred that the temporal order of edges was broken in directed networks. The cause of this error is the technical implementation to obtain all combinations of items in a key-value set: The call 'combn(nodes, 2)' in the function 'construct.edge.list.from.key.value.list' does not care for the order of the items, although we need to care. To fix this problem, we need to extract the edge list for each item in an item combination separately and use the second one as the receiver of the edge. Additionally, the function 'construct.edge.list.from.key.value.list' is adapted to match the coding conventions: - access the vertex column via its name 'data.vertices', - use 'seq_len' instead of the ':' operator, and - add curly braces for a single-line if-statement. Note: This temporarily breaks some tests due to igraph's normalization on undirected edge lists which reorders some edges (apparently) randomly. Signed-off-by: Claus Hunsen <[email protected]>
Also extend documentation in the source code for the edge-construction algorithm for better comprehensibility. Signed-off-by: Thomas Bock <[email protected]>
After a change of the edge-construction algorithm in 70b3c82, for some author networks in the test suite, the internal representation of the corresponding igraph object has changed: The igraph object now stores the edges in another order (but the network itself has not changed). This commit fixes those tests. Signed-off-by: Thomas Bock <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, @bockthom. There just some minor things to do before we can merge.
README.md
Outdated
@@ -224,6 +225,41 @@ Relations determine which information is used to construct edges among the verti | |||
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts are connected when they reference each other (i.e., one artifact calls a function contained in the other artifact). | |||
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits (same as for the relation `cochange`). | |||
|
|||
#### Edge-construction algorithms for author networks | |||
|
|||
When constructing author networks, we have four different edge-construction possibilities, based on two different configuration parameters in the [`NetworkConf`](#networkconf)). On the one hand, networks can either be directed or undirected (configured via `author.directed` in the [`NetworkConf`](#networkconf)). On the other hand, we can construct edges based on the temporal order of events or just construct edges neglecting the temporal order of events (configured via `respect.temporal.order` in the [`NetworkConf`](#networkconf)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a superfluous closing parenthesis after the first sentence.
README.md
Outdated
|
||
When constructing author networks, we have four different edge-construction possibilities, based on two different configuration parameters in the [`NetworkConf`](#networkconf)). On the one hand, networks can either be directed or undirected (configured via `author.directed` in the [`NetworkConf`](#networkconf)). On the other hand, we can construct edges based on the temporal order of events or just construct edges neglecting the temporal order of events (configured via `respect.temporal.order` in the [`NetworkConf`](#networkconf)). | ||
|
||
When respecting the temporal order, there will be only one edge for the later of two corresponding events between two authors. Otherwise, there will be mutual edges. For instance, when constructing author networks with relation mail, there will be only an edge for an answer in a `mail` thread to the autors of every previous e-mail in this mail thread if the temporal order is respected. This also includes loop edges (i.e., edges from one vertex to itself). If the temporal order is not respected, there is an edge for every answer to all the authors of the previous e-mails in the mail thread, but also an edge for every previously sent e-mail from its author to the author of each subsequent mail in this mail thread. In this case, no loop edges are contained in the network. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
... one edge for the later event of two related events ...
- relation mail → relation `mail`
mail
thread → mail thread- autors → authors
-
Potentially, this also includes loop edges ...
-
... temporal order is not respected, there is not only an edge for every
README.md
Outdated
|
||
When respecting the temporal order, there will be only one edge for the later of two corresponding events between two authors. Otherwise, there will be mutual edges. For instance, when constructing author networks with relation mail, there will be only an edge for an answer in a `mail` thread to the autors of every previous e-mail in this mail thread if the temporal order is respected. This also includes loop edges (i.e., edges from one vertex to itself). If the temporal order is not respected, there is an edge for every answer to all the authors of the previous e-mails in the mail thread, but also an edge for every previously sent e-mail from its author to the author of each subsequent mail in this mail thread. In this case, no loop edges are contained in the network. | ||
|
||
In the following, we show the an example defining the edge construction for all combinations of temporally (un-)ordered respect of data and (un-)directed networks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ...
thean example ... - "respect of data" sounds weird. I guess, this is a mistake here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was your sentence, I just copied the last part of this sentence from issue #6 without changing it. Will just remove the respect
here, then it should sound not that weird any more.
README.md
Outdated
<tr> | ||
<td><strong>network directed</strong></td> | ||
<td><em>A ←(1999)– A<br>A ←(2000)– B</em></td> | ||
<td>A –(1999)→ B<br>A ←(2000)– B</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the edge A –(1998)→ B
is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Missed that. It's time that the weather changes, can't really concentrate when it is so hot, which leads to those not satisfactory results... Will be fixed immediately.
README.md
Outdated
<tr> | ||
<td><strong>network undirected</strong></td> | ||
<td>A –(1999)– A<br>A –(2000)– B</td> | ||
<td><em>A –(1999)– B<br>A –(2000)– B</em></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the edge A –(1998)– B
is missing.
-Move the detailed information on how the parameter works to an individual section. -Add an example (taken from #6) and extend it by adding loop edges. -Fix minor issues on README from review in PR #137. Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
Improve the section regarding edge construction for author networks in the README.md file. Signed-off-by: Thomas Bock <[email protected]> Signed-off-by: Barbara Eckl <[email protected]> Signed-off-by: Claus Hunsen <[email protected]>
To clarify, that the network configuration parameter 'respect.temporal.order' only applies for the construction of author networks, the parameter is renamed to 'author.respect.temporal.order'. Signed-off-by: Thomas Bock <[email protected]>
Escape arrow brackets. Signed-off-by: Thomas Bock <[email protected]>
Also thank you, @ecklbarb, for helping us with formulating the corresponding section in the README. As you have not directly been involved in the implementation itself, you have another perspective on the new enhancements. As a consequence, your feedback and help was very valuable for improving the README! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README.md
Outdated
</tr> | ||
<tr> | ||
<td><strong>network directed</strong></td> | ||
<td><em>A ←(2)– A<br>A ←(3)– B</em></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I just looked up a thing that came to my mind this morning and I found this:
Consider running the following code representing the use case for our example:
source("util-init.R")
x = list("<thread-1>" = data.frame(data.vertices = c("A", "A", "B"), date = c(1998:2000), artifact = "<thread-1>"))
construct.edge.list.from.key.value.list(x, NetworkConf$new(), directed = FALSE, respect.temporal.order = TRUE)
The result is currently the following:
$vertices
name
X1 A
X2 A
X3 B
$edges
Var1 Var2 date artifact
1 A A 1999 X
2 B A 2000 X
3 B A 2000 X
Note that author B gets connected to "both authors A" who sent an e-mail to this mail thread earlier (Var1
is from
and Var2
is to
). While this is as reasonable as the version presented in the README right now, this is a mismatch between documentation and implementation.
We have two options here:
-
Update the documentation in the README file.
-
Update the implementation to use
unique(nodes.processed.set)
here:
https://github.com/se-passau/codeface-extraction-r/blob/cccd82c22cb19ad4ea417b61f9aa05412ecfc464/util-networks.R#L912or
unique(c(nodes.processed.set, item.node))
here:
https://github.com/se-passau/codeface-extraction-r/blob/cccd82c22cb19ad4ea417b61f9aa05412ecfc464/util-networks.R#L919
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Update the documentation in the README file.
We decided to update the README file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we also may introduce an additional simplification function to solve that somewhen in the future. See #138 for more details.
After getting feedback from @SCPhantom, improve the README section on edge-construction algorithms for author networks again. Signed-off-by: Thomas Bock <[email protected]> Signed-off-by: Barbara Eckl <[email protected]> Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
In the course of PR se-sic#137, it occurred that the temporal order of edges was broken in directed networks. The cause of this error is the technical implementation to obtain all combinations of items in a key-value set: The call 'combn(nodes, 2)' in the function 'construct.edge.list.from.key.value.list' does not care for the order of the items, although we need to care. To fix this problem, we need to extract the edge list for each item in an item combination separately and use the second one as the receiver of the edge. Additionally, the function 'construct.edge.list.from.key.value.list' is adapted to match the coding conventions: - access the vertex column via its name 'data.vertices', - use 'seq_len' instead of the ':' operator, and - add curly braces for a single-line if-statement. Note: This temporarily breaks some tests due to igraph's normalization on undirected edge lists which reorders some edges (apparently) randomly. Signed-off-by: Claus Hunsen <[email protected]>
Also extend documentation in the source code for the edge-construction algorithm for better comprehensibility. Signed-off-by: Thomas Bock <[email protected]>
-Move the detailed information on how the parameter works to an individual section. -Add an example (taken from se-sic#6) and extend it by adding loop edges. -Fix minor issues on README from review in PR se-sic#137. Signed-off-by: Thomas Bock <[email protected]>
Prerequisites
dev
.Description
From now on, the 'author.directed' parameter in the network
configuration only affects the directedness of an author network, and
the newly introduced parameter 'respect.temporal.order' is responsible
for choosing the edge-construction algorithm.
Now, it is possible to build author networks which are directed but
don't respect the temporal order. That is, for two subsequent events,
there will be two edges between the corresponding nodes, one in each
direction.
In addition, it is also possible to build author networks which are
undirected but respect the temporal order. That is, only one edge
(annotated with the temporally later date) will be present between the
corresponding nodes of two subsequent events.
So, let there be an event of author A to common artifact at time 1 and
an event of B to the same artifact at time 2. Then, we are able to
construct the following networks (only the list of edges is presented
here):
To keep the former behavior, the default value of the newly introduced
network configuration parameter
respect.temporal.order
isNA
. If thedefault value this parameter is used, the edge-construction function
uses the
directed
parameter forrespect.temporal.order
.Changelog
Added
respect.temporal.order
for determining the edge-construction algorithm (Distinguish directedness of networks and edge-construction algorithm #6, 4fc59a0)Changed/Improved