Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish directedness of author networks and edge-construction algorithm #137

Merged
merged 15 commits into from
Aug 12, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
- Add method `metrics.is.scale.free()` to decide whether a network is scale-free or not (80f47512ce7634c81f3708865eb1697b0151f549, 97161b1228a157cbe747c2e98b88f27f572d695e)
- Tests for comparing networks that are created differently (66d37ceb8227ba120e4e06fb7b8334a73b19c076, 4a9d6b9a543a18c2e2f38aa7e05592f84039a745, a37c27731bbd2e96bb0b3730a8e6b429616248f0)
- Method `clear.edge.attributes` to clear the edge attributes list of the network configuration (15f7587ed6590637991c2f811a26f9e860229288)
- Add network configuration parameter `respect.temporal.order` for determining the edge-construction algorithm (#6, 4fc59a0ff68c8600b574e868458e5d53dbdc405b)

### Changed/Improved
- Add committer information to the commit list in the test data
Expand All @@ -24,7 +25,8 @@
- Add implementation of Codeface to compute the scale-free attributes for small networks (80f47512ce7634c81f3708865eb1697b0151f549)
- Remove data inconsistencies when re-setting the commit, mail, and PaStA data (569552687e70ec7a67bed7da2c77cf56f5434dc6)
- Switch the order of the `type` and `kind` attributes of vertices in bipartite networks for testing reasons (351311a30f545d31e43e0065a6147185edec647b)
- Update README file (8380dc62f53591e53762a3692e44973535b5dcea, f590453aadde2dc9224d9f418f4a92c59cd49795, 792cb9558e04868bcad230dfdc50a5ace6d63d35, 8c2a8255966cc6c38ea16e64a28b135ef8456e58)
- Update README file (8380dc62f53591e53762a3692e44973535b5dcea, f590453aadde2dc9224d9f418f4a92c59cd49795, 792cb9558e04868bcad230dfdc50a5ace6d63d35, 8c2aI8255966cc6c38ea16e64a28b135ef8456e58, 5cfc5aec95eb3be5afcc549671477dcaf73d21b6)
- Distinguish directedness of author networks and edge-construction algorithm (#6, 4fc59a0ff68c8600b574e868458e5d53dbdc405b)

### Fixed
- Change the type of all commit count default values to Integer (62c033948d3449ed3bb64ec044036bcda56afdae)
Expand Down
10 changes: 7 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -470,8 +470,12 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
* **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`!
* possible values: [*`"mail"`*, `"cochange"`, `"issue"`]
- `author.directed`
* The (time-based) directedness of edges in an author network
* The directedness of edges in an author network
* [`TRUE`, *`FALSE`*]
- `respect.temporal.order`
* Denotes whether the temporal order of activities shall be respected when constructing author networks. When respecting the temporal order, there will be only one edge for the later of two corresponding events between two authors. Otherwise, there will be mutual edges. (For instance, when constructing author networks with relation mail, there will be only an edge for an answer in a mail thread to the autors of every previous e-mail in this mail thread if the temporal order is respected. If the temporal order is not respected, there is an edge for every answer to all the authors of the previous e-mails in the mail thread, but also an edge for every previously sent e-mail from its author to the author of each subsequent mail in this mail thread.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this description, but it seems a bit long for this list. What do you think about moving most of this description to the subsection Relations in the Section Network construction (or a new subsection)? There, we would have enough space to also insert the table from issue #6 (or an equivalent illustration).

Alternatively, we could introduce an **Example** item as for skip.threshold. 😉 Although, I think that a new section as suggested above would be more appropriate as the matter is quite important.

Additionally:

  1. ... relation `mail`.

  2. ... previous emails in this mail thread ...

  3. If the temporal order is not respected, there is not only an edge ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving most of this description to the subsection Relations in the Section Network construction (or a new subsection)? There, we would have enough space to also insert the table from issue #6 (or an equivalent illustration).

Adding all the information regarding network construction would overload the Relations section. Therefore, I would add a new section after the Relations section, named Different edge-construction algorithms for author networks. There we can add also the table from issue #6.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like the heading for the new subsection, although, I would remove the "Different".

* **Note**: If no value is speficied (i.e., `NA` is used), the value of `author.directed` is used for determining whether to respect the temporal order during edge construction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no value is specified explicitly by the user ...

speficied is by far the most undiscoverable typo I have had for some time. 😀

* [`TRUE`, `FALSE`, *`NA`*]
- `author.all.authors`
* Denotes whether all available authors (from all analyses and data sources) shall be added to the network as a basis
* **Note**: Depending on the chosen author relation, there may be isolates then
Expand All @@ -484,8 +488,8 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
* **Note**: Additionally, this relation configures also the author--artifact relation in bipartite and multi networks!
* possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`]
- `artifact.directed`
* The (time-based) directedness of edges in an artifact network
* **Note**: This parameter does not take effect for now, as the co-change relation is always undirected, while the call-graph relation is always directed.
* The directedness of edges in an artifact network
* **Note**: This parameter does not take effect for now, as the `cochange` relation is always undirected, while the `callgraph` relation is always directed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and we do not have data available for the other relations (mail and issue) to exhibit edge information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just neglect that (at least for now), as there will be edges, at least, for the issue relation, soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to mention all relations, especially, since there is the full list of relations listed right above (under artifact.relation).

Copy link
Collaborator Author

@bockthom bockthom Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ok, I just added a sentence for the other relations there.

* [`TRUE`, *`FALSE`*]
- `edge.attributes`
* The list of edge-attribute names and information
Expand Down
88 changes: 88 additions & 0 deletions tests/test-networks-author.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
## Copyright 2017 by Christian Hechtl <[email protected]>
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
## All Rights Reserved.


Expand Down Expand Up @@ -238,6 +239,47 @@ test_that("Network construction of the undirected author-cochange network", {
expect_true(igraph::identical_graphs(network.built, network.expected))
})

test_that("Network construction of the undirected but temorally ordered author-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", respect.temporal.order = TRUE))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please additionally set author.directed = FALSE here? As you explicitly test a certain combination of attribute values, we should explicitly set those values.


## construct objects
proj.data = ProjectData$new(project.conf = proj.conf)
network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)

## build network
network.built = network.builder$get.author.network()

## vertex attributes
authors = data.frame(name = c("Björn", "Olaf", "Karl", "Thomas"),
kind = TYPE.AUTHOR,
type = TYPE.AUTHOR)

## edge attributes
data = data.frame(comb.1. = c("Olaf", "Karl", "Thomas", "Thomas"),
comb.2. = c("Björn", "Olaf", "Olaf", "Karl"),
date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:06:10",
"2016-07-12 16:06:32", "2016-07-12 16:06:32")),
artifact.type = "Feature",
hash = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "1143db502761379c2bfcecc2007fc34282e7ee61",
"0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"),
file = c("test.c", "test3.c", "test2.c", "test2.c"),
artifact = c("A", "Base_Feature", "Base_Feature", "Base_Feature"),
weight = 1,
type = TYPE.EDGES.INTRA,
relation = "cochange"
)

## build expected network
network.expected = igraph::graph.data.frame(data, directed = FALSE, vertices = authors)

expect_true(igraph::identical_graphs(network.built, network.expected))
})

test_that("Network construction of the directed author-cochange network", {

## configurations
Expand Down Expand Up @@ -279,6 +321,52 @@ test_that("Network construction of the directed author-cochange network", {
expect_true(igraph::identical_graphs(network.built, network.expected))
})

test_that("Network construction of the directed author-cochange network without respecting temporal order", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE,
respect.temporal.order = FALSE))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation is wrong here.


## construct objects
proj.data = ProjectData$new(project.conf = proj.conf)
network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)

## build network
network.built = network.builder$get.author.network()

## vertex attributes
authors = data.frame(name = c("Björn", "Olaf", "Karl", "Thomas"),
kind = TYPE.AUTHOR,
type = TYPE.AUTHOR)

## edge attributes
data = data.frame(comb.1. = c("Björn", "Björn", "Olaf", "Olaf", "Olaf", "Olaf", "Karl", "Karl"),
comb.2. = c("Olaf", "Olaf", "Karl", "Karl", "Thomas", "Thomas", "Thomas", "Thomas"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the easy fix for issue #6 will not be as easy as expected... 😞 The network constructed here is – sadly – wrong!

The idea was that, in a network ignoring the temporal order of events (i.e., respect.temporal.order = FALSE), we connect all developers touching an artifact, but – and that is the important part – direct them from the actor to the other authors. Following this logic, there are wrong edges here since the first row (Line 346) denotes the actor and the second one (Line 347) denotes the recipient of the edge:

  1. When we look at the data we find two authors working on the feature A, Björn and Olaf. (analogously for the other artifacts!)
    https://github.com/se-passau/codeface-extraction-r/blob/9d7bc6ac0ef4b4c4ab9d9209c92304e8741a239b/tests/codeface-data/results/testing/test_feature/feature/commits.list#L1-L4

  2. Both authors work once on the artifact A. As a consequence, we expect two edges in the network:

    • Björn→Olaf and
    • Olaf→Björn.
  3. In the expected network of this test, we have two edges: Björn→Olaf and Björn→Olaf – which is obviously wrong.


Okay, how do we solve this? In the last hours, I debugged this and prepared a patch in the end. Unfortunately, this temporarily breaks some network tests with issue data because the order of edges is messed up.

Please add the patch to your branch and fix the tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I know what I did forget to do. Before opening this PR, I wanted to check something but did not remember any more what I had wanted to check. When I have read your comment, I remembered again, that I wanted to check for the edge directions again.

I am sorry for that. Moreover, it was not intended that it is your job to spend hours on debugging that. I would just have been fine with a review comment like "the edge directions are wrong here, please debug that".

I will have a look at your patch tomorrow and also will fix the other tests. That's the least I can do.
Thanks for your patch and also thanks for spending so much time on that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I know what I did forget to do. Before opening this PR, I wanted to check something but did not remember any more what I had wanted to check. When I have read your comment, I remembered again, that I wanted to check for the edge directions again.

I guess, everybody knows that feeling. 😁

I am sorry for that. Moreover, it was not intended that it is your job to spend hours on debugging that. I would just have been fine with a review comment like "the edge directions are wrong here, please debug that".

Sorry for the misleading formulation. For the patch, I needed only some short time, I just spent some further time on debugging what was happening inside igraph – to find a way to easily fix the tests again. Essentially, I have done some other things in the last hours, too. 😉

I will have a look at your patch tomorrow and also will fix the other tests. That's the least I can do.
Thanks for your patch and also thanks for spending so much time on that.

You are welcome. Keep me posted.

date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45", "2016-07-12 16:05:41",
"2016-07-12 16:06:10", "2016-07-12 16:05:41", "2016-07-12 16:06:32",
"2016-07-12 16:06:10", "2016-07-12 16:06:32")),
artifact.type = "Feature",
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338",
"3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61",
"3a0ed78458b3976243db6829f63eba3eead26774", "0a1a5c523d835459c42f33e863623138555e2526",
"1143db502761379c2bfcecc2007fc34282e7ee61", "0a1a5c523d835459c42f33e863623138555e2526"),
file = c("test.c", "test.c", "test2.c", "test3.c", "test2.c", "test2.c", "test3.c", "test2.c"),
artifact = c("A", "A", "Base_Feature", "Base_Feature", "Base_Feature", "Base_Feature",
"Base_Feature", "Base_Feature"),
weight = 1,
type = TYPE.EDGES.INTRA,
relation = "cochange"
)

## build expected network
network.expected = igraph::graph.data.frame(data, directed = TRUE, vertices = authors)

expect_true(igraph::identical_graphs(network.built, network.expected))
})

test_that("Network construction of the undirected simplified author-cochange network", {

## configurations
Expand Down
6 changes: 6 additions & 0 deletions util-conf.R
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,12 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf,
allowed = c(TRUE, FALSE),
allowed.number = 1
),
respect.temporal.order = list(
default = NA, # default value will be determined from the 'author.directed' parameter
type = "logical",
allowed = c(TRUE, FALSE, NA),
allowed.number = 1
),
author.all.authors = list(
default = FALSE,
type = "logical",
Expand Down
43 changes: 29 additions & 14 deletions util-networks.R
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,8 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
author.net.data = construct.edge.list.from.key.value.list(
private$proj.data$get.artifact2author(),
network.conf = private$network.conf,
directed = private$network.conf$get.value("author.directed")
directed = private$network.conf$get.value("author.directed"),
respect.temporal.order = private$network.conf$get.value("respect.temporal.order")
)

## construct network from obtained data
Expand Down Expand Up @@ -182,7 +183,8 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
author.net.data = construct.edge.list.from.key.value.list(
private$proj.data$get.thread2author(),
network.conf = private$network.conf,
directed = private$network.conf$get.value("author.directed")
directed = private$network.conf$get.value("author.directed"),
respect.temporal.order = private$network.conf$get.value("respect.temporal.order")
)

## construct network from obtained data
Expand Down Expand Up @@ -213,7 +215,8 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
author.net.data = construct.edge.list.from.key.value.list(
private$proj.data$get.issue2author(),
network.conf = private$network.conf,
directed = private$network.conf$get.value("author.directed")
directed = private$network.conf$get.value("author.directed"),
respect.temporal.order = private$network.conf$get.value("respect.temporal.order")
)

## construct network from obtained data
Expand Down Expand Up @@ -842,27 +845,39 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
#'
#' Important: The input needs to be compatible with the function \code{get.key.to.value.from.df}.
#'
#' If directed the order of things in the sublist is respected and the 'edge.attr's hold the
#' If \code{directed}, as default, the order of things in the sublist is respected (if
#' \code{respect.temporal.order} is not set manually) and the 'edge.attr's hold the
#' vector of possible edge attributes in the given list.
#'
#' @param list the list of lists with data
#' @param network.conf the network configuration
#' @param directed whether or not the network should be directed [default: FALSE]
#' @param respect.temporal.order whether to respect the temporal order when constructing edges,
#' i.e., whether to only add edges from the later event to the previous one.
#' If \code{NA} is passed, the default value is taken.
#' [default: directed]
#'
#' @return a list of two data.frames named 'vertices' and 'edges' (compatible with return value
#' of \code{igraph::as.data.frame})
construct.edge.list.from.key.value.list = function(list, network.conf, directed = FALSE) {
construct.edge.list.from.key.value.list = function(list, network.conf, directed = FALSE,
respect.temporal.order = directed) {
logging::loginfo("Create edges.")
logging::logdebug("construct.edge.list.from.key.value.list: starting.")

# initialize an edge list to fill and the set of nodes
## if no value for respect.temporal.order is specified (indicated by an NA value
## coming from the network configuration), use the value of directed parameter instead
if (is.na(respect.temporal.order)) {
respect.temporal.order = directed
}

## initialize an edge list to fill and the set of nodes
nodes.processed = c()
edge.list = data.frame()

keys = names(list)
keys.number = length(list)

if (directed) {
if (respect.temporal.order) {

## for all subsets (sets), connect all items in there with the previous ones
edge.list.data = parallel::mclapply(list, function(set) {
Expand All @@ -878,11 +893,11 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed
return(NULL)
}

# queue of already processed artifacts
## queue of already processed artifacts
edge.list.set = data.frame()
nodes.processed.set = c()

# connect the current item to all previous ones
## connect the current item to all previous ones
for (item.no in 1:nrow(set)) {
item = set[item.no, ]

Expand All @@ -899,7 +914,7 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed
combinations = cbind(combinations, item.edge.attrs, row.names = NULL) # add edge attributes
edge.list.set = rbind(edge.list.set, combinations) # add to edge list

# mark current item as processed
## mark current item as processed
nodes.processed.set = c(nodes.processed.set, item.node)
}

Expand Down Expand Up @@ -958,7 +973,7 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed
cols.which = network.conf$get.value("edge.attributes") %in% colnames(edge.attrs)
edge.attrs = edge.attrs[, network.conf$get.value("edge.attributes")[cols.which], drop = FALSE]

# add edge attributes to edge list
## add edge attributes to edge list
edgelist = cbind(edge, edge.attrs)

return(edgelist)
Expand Down Expand Up @@ -990,7 +1005,7 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed
#' For example for a list of authors per thread, where all authors are connected if they are
#' in the same thread (sublist).
#'
#' If directed the order of things in the sublist is respected and the 'edge.attr's hold the
#' If \code{directed}, the direction of edges in the edge list is respected and the 'edge.attr's hold the
#' vector of possible edge attributes in the given list.
#'
#' @param vertices data frame with vertex data
Expand All @@ -1006,7 +1021,7 @@ construct.network.from.edge.list = function(vertices, edge.list, network.conf, d
## get unique list of vertices to produce
nodes.processed = unique(vertices)

# if we do not have nodes AND the edge.list is empty, return rightaway
## if we do not have nodes AND the edge.list is empty, return rightaway
if (length(nodes.processed) == 0) {
return(create.empty.network(directed = directed))
}
Expand All @@ -1024,7 +1039,7 @@ construct.network.from.edge.list = function(vertices, edge.list, network.conf, d

net = igraph::set.edge.attribute(net, "weight", value = 1)

# transform multiple edges to edge weights
## transform multiple edges to edge weights
if (network.conf$get.value("simplify")) {
net = simplify.network(net)
}
Expand Down