Improving variable to aesthetic mapping (input asked) #406

Closed
mtennekes opened this issue Apr 5, 2020 · 21 comments
@mtennekes
Member

mtennekes commented Apr 5, 2020

tmap 3.0 will be released in a few days. For this version, I want to improve the variable mapping, so any feedback or tips are welcome.

There is a need for two features:

1. Integer variables

Treat a numeric variable as integer. This is needed because currently the legend labels will be 0 to 10, 10 to 20, 20 to 30, where the presumed intervals are [0, 10), [10, 20) and [20, 30] (so open on the right-hand side, except for the last). When the variable is an integer, the legend labels should be 0 to 9, 10 to 19, 20 to 29 (or 30).

I'm thinking about style = "integer" or an additional argument as.integer. The latter probably makes more sense since many break styles (current options are c("cat", "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust", "bclust", "fisher", "jenks", "log10_pretty")) should handle integers slightly differently. For instance, "log10_pretty" will return 0 to 1, 1 to 10, 10 to 100 when the variable is continuous and should return 0, 1 to 9, 10 to 99 when it is an integer.
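To make the intended labels concrete, here is a minimal base R sketch. The helper name int_labels is hypothetical and not part of tmap; it assumes the breaks are whole numbers and that only the last class is closed on the right, as described above.

```r
# Hypothetical helper (not a tmap function): build legend labels for an
# integer variable from a vector of whole-number breaks.
int_labels <- function(breaks) {
  n <- length(breaks) - 1L
  lower <- breaks[seq_len(n)]
  # Every class except the last is right-open, so its largest integer
  # is the next break minus one.
  upper <- breaks[-1L] - 1
  upper[n] <- breaks[n + 1L]
  # A class that contains a single integer is labelled by that integer.
  ifelse(lower == upper, as.character(lower), paste(lower, "to", upper))
}

int_labels(c(0, 10, 20, 30))
int_labels(c(0, 1, 10, 100))
```

In this sketch the last class keeps its closed upper bound (30 or 100); whether it should instead read 29 or 99 is exactly the open choice mentioned above.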

What do you think? If we go for the second option, what would be a good name for the argument? as.integer, as.continuous, as.discrete, ....?

Next question: should tmap set the default value of this argument to continuous, or should the default value be determined by whether all variable values are integers?
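For the second option, one possible check is sketched below in base R. The function name all_whole_numbers is hypothetical; it only illustrates how a data-driven default could be derived and is not how tmap decides.

```r
# Hypothetical helper: TRUE when every non-missing value is a whole number,
# regardless of whether the storage type is integer or double.
all_whole_numbers <- function(x) {
  is.numeric(x) && all(x == round(x), na.rm = TRUE)
}

all_whole_numbers(c(0, 10, 20))   # doubles that are whole numbers
all_whole_numbers(c(0.5, 10))     # contains a fractional value
all_whole_numbers(c(1L, NA, 3L))  # missing values are ignored
```

A check on values rather than on class(x) would also catch counts that happen to be stored as doubles, at the cost of occasionally misreading rounded continuous data as counts.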

(see also https://github.com/mtennekes/tmap/issues/258 and https://github.com/mtennekes/tmap/issues/399)

2. Specific value to color mapping

Sometimes all a user (including myself) wants is to map specific data values to specific colors.
How should this be done? Keep in mind that it should work for integer and categorical data.

For categorical data, we could let the user assign a named color vector to the argument palette, where the names correspond to the levels.

How do we do this for numeric data? A color table? If so, it makes sense to add the labels in this color table as well, rather than via the labels argument. Any ideas?
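As a rough illustration of the color table idea, the base R sketch below keeps value, color and label together in one data frame and looks colors up with match(). The object and column names are made up for this example and do not correspond to any tmap argument.

```r
# Hypothetical color table: one row per data value, with its color and label.
color_table <- data.frame(
  value = c(1, 2, 3),
  color = c("blue", "red", "green"),
  label = c("Low", "Medium", "High"),
  stringsAsFactors = FALSE
)

x <- c(3, 1, 2, 2, NA)

# Look up the row of the table for each data value.
idx <- match(x, color_table$value)
cols <- color_table$color[idx]

# Values without an entry (including NA) get a fallback color.
cols[is.na(cols)] <- "grey"
cols
```

Keeping the labels in the same table means the legend text and the colors cannot get out of sync, which is the motivation given above for not passing labels separately.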

(see also r-spatial/mapview#208)

@Nowosad @Robinlovelace @sjewo @jannes-m @tim-salabim @edzer @rsbivand @mcSamuelDataSci @zross

@mtennekes mtennekes self-assigned this Apr 5, 2020
@tim-salabim

Hi @mtennekes,
I have been struggling with this same issue recently as well. For mapview, I think I have it under acceptable control now. Acceptable meaning that in the scope of mapview I don't care too much about whether the legend maps [0, 1) or [0, 1]. Currently, and mostly for convenience, mapview treats all integer values as numeric and all character values as factors.

@rsbivand

rsbivand commented Apr 5, 2020

Tangentially, there is infrastructure in classInt to handle interval closure (intervalClosure=). On occasion, I've found that it helps to run classInt twice, first with dataPrecision=NULL, the default, then with style="fixed" and non-default dataPrecision=, or just to use dataPrecision=. tmap::tm_fill() has the equivalent interval.closure= argument, but I don't see dataPrecision=.
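A minimal sketch of that two-pass approach, assuming the classInt package is installed. The sample data, the number of classes and the choice of dataPrecision = 0 are arbitrary values picked for illustration.

```r
library(classInt)

set.seed(1)
x <- sample(0:30, 100, replace = TRUE)

# First pass: default dataPrecision = NULL, only to obtain the breaks.
ci1 <- classIntervals(x, n = 3, style = "pretty")

# Second pass: reuse those breaks as fixed breaks and set the interval
# closure and the data precision explicitly.
ci2 <- classIntervals(x, n = length(ci1$brks) - 1, style = "fixed",
                      fixedBreaks = ci1$brks,
                      intervalClosure = "left", dataPrecision = 0)

# Printing shows the class labels and counts.
print(ci2)
```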

In addition, @dieghernan has contributed a new style: "headtails" with a vignette. I'm looking to submit to CRAN soon, to make this available.

@mtennekes
Member Author

Thanks @tim-salabim and @rsbivand.

Currently, tmap also treats integers as numeric and character as factors, but since there were a few use cases in which the data values are clearly integers, it would be good to adjust the breaks (or at least the labels) accordingly.

The interval closure is not my main concern. It is under control: the argument legend.format contains a parameter called digits, which is similar to dataPrecision in classInt. It probably would have been easier for me to use dataPrecision in the implementation. Looking forward to testing this new style headtails in tmap.

@sjewo
Collaborator

sjewo commented Apr 5, 2020

Hi @mtennekes,
those are nice improvements for tmap!

For my use cases the new legend labels for integers are really helpful. I would prefer an additional option "as.integer" with a default value determined by the class of the variable (integer or numeric).

I think a named color vector would be fine for factors and numeric (or integer) variables as well. A unified approach to define a palette would be more user friendly, but I don't know if this would be too complicated for floating point numbers.

@edzer
Contributor

edzer commented Apr 5, 2020

Hi @mtennekes about the integer legend: 10 years ago I would have thought "great!", now I think it is over-engineering. Does ggplot2 have this feature?

For the color ramps: stars now adopts a vector of colors mapping one-to-one with an integer variable, starting at 1 (like levels of a factor); r-spatial/stars#128

@mtennekes
Member Author

Color assignment is working now. Also the colors from stars are used (I check whether there are duplicated levels and if so, apply droplevels).

library(tmap)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 2.4.2, PROJ 5.2.0

data(World)

# palette of named colors for a character/factor variable
tm_shape(World) + tm_polygons("income_grp", 
    palette = c("2. High income: nonOECD" = "red",
        "3. Upper middle income" = "green", 
        "4. Lower middle income" = "pink", 
        "1. High income: OECD" = "blue",
        "5. Low income" = "purple"))

# palette of named colors for a numeric variable
World$income_grp_int <- as.integer(World$income_grp)
tm_shape(World) + tm_polygons("income_grp_int", style = "cat", 
    palette = c("2" = "red", 
        "3" = "green", 
        "4" = "pink", 
        "1" = "blue",
        "5" = "purple"))

    
# use the colors of a stars object
#getwd()
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img", 
    RAT = "Land Cover Class", proxy = TRUE)
# downloaded from https://s3-us-west-2.amazonaws.com/mrlc/PR_landcover_wimperv_10-28-08_se5.zip

qtm(r) + tm_legend(outside = TRUE)


@Nowosad
Member

Nowosad commented Apr 6, 2020

@mtennekes, thank you for opening this discussion.

1. Integer variables

I think it would be a nice addition to tmap, but it is not crucial.
It depends on the effort you would make to add this feature.
An as.integer argument sounds fine.

2. Specific value to color mapping

This is, in my opinion, a way more interesting and important feature.
I already started this discussion at https://github.com/mtennekes/tmap/issues/276 and at https://github.com/mtennekes/tmap/issues/388.

It would be also great to make it possible to extend the color mapping to external symbologies (see https://github.com/mtennekes/tmap/issues/65 and r-spatial/discuss#36).

Update:
The above examples look great!
I have some questions about the last example: does it drop empty levels by default? Is it possible to not drop them? How can someone edit the legend there (one category does not have a name)?

@mtennekes
Member Author

Good point @Nowosad !

Hmm, why isn't there an argument to specify whether unused levels are dropped (@mtennekes?)

That specific file is crappy: I think it doesn't contain unused levels, but duplicated levels. Also the black-colored category has level "". It is not easy to change the legend afterwards. Much easier is to replace all the "" values with NA, and set colorNA = "black".
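The data preparation step suggested here can be sketched in base R as follows. The class names are invented for illustration; in practice the values would come from the raster's attribute table, and the resulting NA cells would then be drawn with colorNA = "black".

```r
# Made-up land cover classes; one entry has an empty name.
cls <- c("forest", "", "water", "forest", "")

# Replace the empty class name with NA before building the factor,
# so it is treated as missing rather than as an unnamed category.
cls[cls == ""] <- NA
f <- factor(cls)

levels(f)
sum(is.na(f))
```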

@Nowosad
Member

Nowosad commented Apr 6, 2020

You can find some examples with unused levels at r-spatial/stars#245 (comment).

@edzer
Contributor

edzer commented Apr 6, 2020

droplevels drops unused factor levels. I wouldn't do that automatically: if you plot time series of factor maps, at some times certain levels may not be present but you'd still want them in the legend.
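A small base R illustration of the difference, using invented land cover levels: the level "urban" is defined but not observed at this time step.

```r
# Factor with three defined levels, of which only two occur in the data.
f <- factor(c("forest", "water", "forest"),
            levels = c("forest", "urban", "water"))

# Without dropping: "urban" stays available for the legend, with count 0.
table(f)

# With droplevels(): the unused level "urban" disappears.
table(droplevels(f))
```

If each time step were passed through droplevels() separately, the legends of a map series could end up with different categories, which is the concern raised above.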

@Nowosad
Member

Nowosad commented Apr 6, 2020

I agree @edzer, but I think there should be an argument in tmap invoking droplevels. It could be FALSE by default.

@mtennekes
Member Author

Exactly what I'm working on: an argument drop.levels which is by default FALSE.

And I'll add an argument as.integer which formats the labels as integers (so 0 to 9, 10 to 19 etc). For now, I'll only do this for style = "pretty" and "log10_pretty", which should be sufficient.

Thanks for your input!

@zross

zross commented Apr 6, 2020

This is totally great! I provided a bit of code for reference

  1. I'm going to disagree with Edzer about the over-engineering. I actually think the legend-integer issue is very important. As it stands, the tmap legend for integers literally does not make sense, since you can't tell whether a given integer on the margins falls into one category or another. Really important -- and I like your solution.

  2. I don't have an opinion on the 2nd issue beyond what has already been supplied.

library(sf)
library(tmap)
library(dplyr)

counties <- read_sf("https://cdn.jsdelivr.net/npm/us-atlas@3/counties-10m.json") %>% 
  filter(stringr::str_sub(id,1,2) == "36")

n <- nrow(counties)
set.seed(100)
counties <- counties %>% 
  mutate(
    vals_int = sample(1:10, n, replace = TRUE),
    vals_cont = rnorm(n)
  )

tm_shape(counties) + 
  tm_polygons("vals_int", style = "pretty")

tm_shape(counties) + 
  tm_polygons("vals_cont")


@mtennekes
Member Author

That's a very nice example @zross. It illustrates another problem:

pretty(runif(100, min = 0, max = 10))
#> [1]  0  2  4  6  8 10
pretty(1L:10L)
#> [1]  0  2  4  6  8 10

When I opened this issue, I thought that changing the labels at the righthand-side of the intervals would be enough (e.g. from 0-10, 10-20 to 0-9, 10-19, etc). However, in this case it would make more sense to have 1-2, 3-4, 5-6, 7-8, 9-10 (given n=5). So pretty is not very useful here.
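One way to get such classes is to divide the observed integer range into classes of equal integer width instead of relying on pretty(). The base R sketch below does this; int_classes is a hypothetical helper, not what tmap implements, and it assumes x holds whole numbers.

```r
# Hypothetical helper: split the integer range of x into at most n classes
# of equal integer width and return "lower-upper" labels.
int_classes <- function(x, n) {
  rng <- range(x, na.rm = TRUE)
  # Number of distinct integers in the range, divided over n classes.
  width <- ceiling((rng[2] - rng[1] + 1) / n)
  lower <- seq(rng[1], by = width, length.out = n)
  upper <- pmin(lower + width - 1, rng[2])
  # Drop classes that would start beyond the maximum.
  keep <- lower <= rng[2]
  paste(lower[keep], upper[keep], sep = "-")
}

int_classes(1:10, 5)
```

Unlike pretty(), this starts at the observed minimum (1 rather than 0), so every class covers the same number of possible integer values.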

Any ideas how to tackle this problem? @rsbivand does classInt offer a method for this?

@rsbivand

rsbivand commented Apr 6, 2020

No, pretty() expects that x= is a continuous variable. classIntervals(x, n=5, style="pretty", intervalClosure="right") gives the classes, but not the break labels.
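For comparison, a base R analogue of right-closed classes on the pretty() breaks can be built with cut(). This is not classInt itself, only a sketch of how the ten integers fall into classes once the closure is fixed; the labels are still interval notation rather than integer-style labels.

```r
x <- 1:10
brks <- pretty(x)

# Right-closed intervals; include.lowest = TRUE also closes the first
# interval on the left so that the minimum break is not lost.
cls <- cut(x, breaks = brks, right = TRUE, include.lowest = TRUE)

levels(cls)
table(cls)
```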

@mtennekes
Member Author

data(World)

# as.count is TRUE for integers if style = pretty, fixed, or log10_pretty

# N (natural numbers, with 0)
World$x <- sample(0:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")

# N+ (natural numbers, positive)
World$x <- sample(1:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")

# Z (integers)
World$x <- sample(-10:10, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")
#> Variable(s) "x" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

# show as continuous (old way)
World$x <- sample(1:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x", as.count = FALSE)

# style: fixed
tm_shape(World) + tm_polygons("x", breaks = c(1, 5, 10, 20))

# scientific notation (decided to use the set notation)
tm_shape(World) + tm_polygons("x", breaks = c(0, 1, 3, 5, 10, 20), 
   legend.format = list(scientific = TRUE))

# style: log10_pretty (continuous)
tm_shape(World) + tm_polygons("pop_est", style = "log10_pretty")

# style: log10_pretty (count)
tm_shape(World) + tm_polygons("pop_est", as.count = TRUE, style = "log10_pretty")

Created on 2020-04-07 by the reprex package (v0.3.0.9001)

@mcSamuelDataSci

mcSamuelDataSci commented Apr 7, 2020 via email


@rsbivand

rsbivand commented Apr 8, 2020

Re: https://github.com/mtennekes/tmap/issues/406#issuecomment-609428252 classInt 0.4-3 with headtails style on CRAN.

@mtennekes
Member Author

Re: #406 (comment) classInt 0.4-3 with headtails style on CRAN.

... and already supported by tmap

data(World)
tm_shape(World) + tm_symbols(col = "pop_est_dens",
    style = "headtails", style.args = list(thr = 1))

@mtennekes
Member Author

tmap 3.0 on its way to CRAN
