-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vertex attributes (updated) #93
Conversation
Add methods required to add vertex attributes to the graph. Add method to calculate commit count as vertex attribute Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
- Rename aggregation to aggregation.level and all to project - Rename add.vertex.attributes.* methods to add.vertex.attribute.* - Rename compute.vertex.attribute.with.n2r to add.vertex.attribute - Rename compute.vertex.attribute to split.and.add.vertex.attribute - Move networks.to.ranges to util-split and rename it to split.data.by.networks - Remove explicit return statement where possible - Add missing documentation Signed-off-by: Felix Prasse <[email protected]>
…nd issues Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Add test for add.vertex.attribute Add test for split.and.add.vertex.attribute Add test for add.vertex.attribute.commit.count Add test for add.vertex.attribute.author.email Add test for add.vertex.attribute.artifact.count Add test for add.vertex.attribute.first.activity Signed-off-by: Felix Prasse <[email protected]>
The tests were referencing to a wrong project data variable Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Add test for add.vertex.attribute.author.role.simple Add test for add.vertex.attribute.active.ranges Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Add artifact attributes Add commit count based on committer Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
In this patch, we mainly streamline the tests on network attributes, by applying proper line breaks, introducing global constants, and fixing the accidental introduction of factors for the developer roles. This tackles some comments in PR #67. Signed-off-by: Claus Hunsen <[email protected]>
With respect to PR #67, logging statements, line breaks, and capitalization in sqldf-statements are updated. Signed-off-by: Claus Hunsen <[email protected]>
With respect to PR #67, indentation, line breaks, coding flaws, and anything else is tackled. Adds a FIXME item regarding the first activity of an author, which will be tackled in an upcoming commit. Signed-off-by: Claus Hunsen <[email protected]>
The section dividers always end at column 77. ;) Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
To get the very first and the very last activity across all data sources, i.e., the earliest/latest timestamp of all data sources, for a ProjectData object, we add a parameter 'outermost' to 'ProjectData$get.data.timestamps' that makes this possible. Props to @bockthom for his support. Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Thomas Bock <[email protected]>
Formerly, we provided three different aggregation levels in the function 'split.data.by.networks': namely, "range", "cumulative", and "project". We now extend and adapt this to the levels "range", "cumulative", "all.ranges", "project.cumulative", "project.all.ranges", and "complete". The various aggregation levels work as follows: - range: The project data will be split exactly to the time ranges specified by the networks' names. - cumulative: The project data will be split exactly to the time ranges specified by the networks' names, but in a cumulative manner. - all.ranges"}: The project data will be split exactly to the time range specified by the start of the first network and end of the last network. All data instances will contain the same data. - project.cumulative: The same splitting as for \code{"cumulative"}, but all data will start at the beginning of the project data and *not* at the beginning of the first network. - project.all.ranges: The same splitting as for \code{"all.ranges"}, but all data will start at the beginning of the project data and *not* at the beginning of the first network. All data instances will contain the same data. - complete: The same splitting as for \code{"all.ranges"}, but all data will start at the beginning of the project data and end at the end of the project data. All data instances will contain the same data. In this patch, we properly extend all functions and tests. Props to @bockthom for his support in parts of this patch. Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Thomas Bock <[email protected]>
The function 'add.vertex.attribute.author.role' now adds author- classification attributes based on classification results. The function 'add.vertex.attribute.author.role.function' is now the one using a classification function as a parameter. For this to work, we add the range identifier as a parameter to the anonymous calculation functions for adding the vertex attributes. Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Thomas Bock <[email protected]>
For better modularity, the function 'get.commit.data' is moved to the data module, right where it obviously belongs. Signed-off-by: Claus Hunsen <[email protected]>
This code originates from @flx5's PR on vertex attributes (#67). Signed-off-by: Claus Hunsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After finally reviewing this pull request, let me conclude:
This pull request looks pretty good, but there is still room for improvement. Basically, there are two different types of issues:
- Wrong or insufficient documentation (needs to be fixed, some of my comments may be too meticulous -- @clhunsen it is up to you whether you address all of them or just the important ones)
- General consistency problems regarding copyright headers (discussion needed, see individual comments)
Please don't be angry about me for spotting all the small stuff...
util-data.R
Outdated
#' @param split A list of numerics, indicating numbers of weeks into which the selected data | ||
#' is to be split | ||
#' | ||
#' @return A data.frame indicating the slected \code{columns}, split into the given numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in "slected"
tests/test-split.R
Outdated
## [email protected] | ||
## (c) Felix Prasse, 2017 | ||
## [email protected] | ||
## (c) Thomas Bock, 2018 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why you have added my name in all of the copyright headers in those files. Maybe it is due to the Signed-Off-Tags. Currently, in the released version, my name is not part of any copyright header (except for the network-metrics script). If you would like to make this consistent, we would have to search for all Signed-Off-Tags containing my name and adapt the copyright headers of the corresponding files accordingly...
Long story short, I don't like inconsistencies. Either we remove my name from all copyright headers in this pull request or I will search for all Signed-Off-Tags containing my name to make all copyright headers consistent (not any more in this pull request, but possibly in one of the next ones...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, this affects also commit 4810ac5, in which the function get.commit.data
is moved from util-core-peripheral.R
to util-data.R
: The function get.commit.data
originally was implemented by Ferdinand. Ferdinand is part of the copyright header of util-core-peripheral.R
(which still contains functions implemented by him), but Ferdinand is not part of the copyright header of util-data.R
. As the moved function was implemented by him, he should also get part of the copyright header there...
My suggestion: Remove commit 536965e from this pull request (just in order to merge consistent things and not block other subsequent pull requests from being merged) and postpone updating the copyright headers to a later pull request. Then we should carefully fix all the copyright headers... I guess, we should discuss this personally...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the commit for now. We need to discuss this especially with respect to the addition of a license in #90.
tests/test-split.R
Outdated
net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) | ||
|
||
## retrieve project data and network builder | ||
project.data = ProjectData$new(proj.conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment seems not to be up to date as the network builder is not retrieved here...
util-networks-covariates.R
Outdated
#' @param aggregation.level Determines the data to use for the attribute calculation. | ||
#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, | ||
#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and | ||
#' \code{"complete"}. See a\code{split.data.by.networks} for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in "a\code{split.data.by.networks}".
Also affects all the functions below...
util-networks-covariates.R
Outdated
return(nets.with.attr) | ||
} | ||
|
||
#' Add author role attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe start a new subsection here for a better a structure of the file?
## * Role --------------------------------------------------------------
util-networks-covariates.R
Outdated
#' | ||
#' @param list.of.networks The network list | ||
#' @param project.data The project data | ||
#' @param name The attribute name to add [default: "commit.count"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong default value here. It should be commit.count.author.not.committer
.
util-networks-covariates.R
Outdated
#' | ||
#' @param list.of.networks The network list | ||
#' @param project.data The project data | ||
#' @param name The attribute name to add [default: "commit.count"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wrong default value here. It should be commit.count.committer.not.author
.
util-networks-covariates.R
Outdated
#' @param name.column The name of the author or committer column | ||
#' | ||
#' @return A list of networks with the added attribute | ||
add.vertex.attribute.commit.count = function(list.of.networks, project.data, name = "commit.count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a short comment here regarding the name of the helper function: Currently, a non-expert user of the network library may use this function as the default function for commit count attributes, as the name of this functions sounds reasonable, instead of add.vertex.attribute.commit.count.author
as auto-completion may also suggest add.vertex.attribute.commit.count
in the first place. Are there any suggestions for a better name of the helper function to not be confused with the actual function to use?
(I would like to have this function as a private function if we would have dealt with classes here... - just to make my intention clear.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just renamed it to add.vertex.attribute.commit.count.helper
. This should be enough for now.
Would we have a proper R package, this would be easy to hide. 😉
util-networks-covariates.R
Outdated
#' @param default.value The default value to add if a vertex has no matching value [default: 0] | ||
#' | ||
#' @return A list of networks with the added attribute | ||
add.vertex.attribute.commit.count.committer = function(list.of.networks, project.data, name = "commit.count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about changing the default name here (and also in the documentation) to commit.count.committer
?
In the end, only the usual commit count -- which is the author commit count -- should have the default name commit.count
-- or would you also prefer to rename that to commit.count.author
then for consistency reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talk mainly about authors for now, we should withhold from renaming until we tackle #84.
return(nets.with.attr) | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reviewing this pull request, I hit on the (crazy) idea to perform author-role classification not based on author networks or author commit count, but on committer networks or committer commit count.
I know, it's getting freaky now, but I will add this idea to issues #84 or #92 soon to just not forget the idea.
This patch introduces only some minor fixes and inprovements to the documentation for the changes in PR #93. The most important change is the renaming of the function 'add.vertex.attribute.commit.count' to 'add.vertex.attribute.commit.count.helper', mainly for unambiguity and consistency reasons. Props to @bockthom for being that precise. Signed-off-by: Claus Hunsen <[email protected]>
This is the updated version of PR #67, bringing some important improvements (diff:
https://github.com/flx5/codeface-extraction-r/compare/vertex-attributes...clhunsen:vertex-attributes). Basically, this tackles #34.
Thanks to @flx5 for this high effort! 😃
The open issues related to vertex attributes, all originating from the discussion in PR #67, are collected in issue #92.