Potential optimization issues with BatchMapping #869

bsara · 2024-01-04T03:23:14Z

BatchMapping injects a set or list of the type which owns the field being resolved. This presents a problem: if a set of objects is provided and there isn't an appropriate equals override in the type, then you could get several of essentially the same object though with different memory references, which would cause the need to either find the truly unique values that you care about or would result in significantly more data being used in a query that would potentially take place.

It's been my experience that one should stick to more "scalar" types for keys like strings and numbers. An example would be getting all of the children of a parent by parent IDs rather than by the parents themselves.

…try alongside BiFunction registration methods. resolves spring-projects#870 & spring-projects#869

rstoyanchev · 2024-01-11T12:40:37Z

It is up to the parent type @SchemaMapping method to ensure the list of objects is without duplicates, or to return objects should have equals and hashcode. That said if your goal is to use the parent keys as input, it is possible as long as you specify the parent schema type since we can no longer derive that information from the method signature.

For example replace this:

@Controller
public class BatchListController {

	@QueryMapping
	public Collection<Course> courses() {
		// ...
	}

	@BatchMapping
	public List<Person> students(List<Course> courses) {
		// ...
	}
}

with this:

@Controller
public class BatchListController {

	@QueryMapping
	public Collection<Long> courses() {
		// return course id's
	}

	@BatchMapping(typeName = "Course")
	public List<Person> students(List<Long> courseIds) {
		// ...
	}
}

bsara · 2024-01-15T05:12:08Z

It is up to the parent type @schemamapping method to ensure the list of objects is without duplicates

While that may be true of the implementation provided, that is not necessarily how the data loader library works for data loaders with mapped batch loaders. That is part of the point of using a mapped batch loader, no? It makes it so that I don't have to worry about whether I have duplicates in the provided keys, because it is a Set.

With the course id example, does it automatically pass the ids of a type as the parameters of a batch mapping method if you have a signature and annotation like you have shown? I guess that I'm struggling to see how it knows what to pass as the argument unless it's just looking for an "id" field or getter.

Also, if the courses field is a list of IDs, then how would it resolve the entire type when selected? Does it resolve the entire type and not just the ids?

Another issue with how things are implemented is that this approach also seems to encourage inefficient use of data/batch loaders. Here is one example:

Let's say that I have two different types A and B. While both types aren't related to each other, they do share a similar field: createdBy. This field is of type User. If a query is submitted where both types are selected at the same nested level in the query and the createdBy field is selected for both types, multiple queries would end up being executed to fetch the User types instead of one. This is because two different batch/data loaders would be created, one for type A and one for type B, even though they are fetching user data with a user id (assuming but A and B types use a user id to fetch the value of createdBy). The data loader library is meant to be used such that a single data loader is used to fetch this data when resolving these two different yet similar fields. On top of this, it also reduces the amount of potential cache hits in a data loader during a single query resolution since multiple loaders would be used instead of a single loader.

Here is an example of the query described above:

# "createdBy" field of "a", "b", and "c" fields
# and the "modifiedBy" field of "c"
# should resolve using a single fetch via a
# single data loader, but they won't, they will
# resolve using 4 separate data loaders.

query {
  a { # type is A
    createdBy { # will result in separate fetch
      ...user
    }
  }
  b { # type is B
    createdBy { # will result in separate fetch
      ...user
    }
  }
  c { # type is C
    createdBy { # will result in separate fetch
      ...user
    }
    modifiedBy { # will result in separate fetch
      ...user
    }
  }
}

fragment user on User {
  id
  name
}

rstoyanchev · 2024-06-27T15:48:36Z

@BatchMapping is an (optional) shortcut so instead of writing this:

@SchemaMapping
public Object students(Course course, DataLoader<Course, List<Person>> dataLoader) {
    return dataLoader.load(course);
}

registry.forTypePair(Course.class, Person.class).registerBatchLoader((courses, env) -> {
    // ...
});

you can write:

@BatchMapping
public List<List<Person>> students(List<Course> courses) {
    // ...
}

The actual @SchemaMapping method used looks like this. It is written in a generic way, accessing the source/parent and using it as the key.

Support for using parent id's could look like this:

@BatchMapping
public List<List<Person>> students(List<Long> courseIds) {
    // ...
}

First, we can't work out the field coordinates from this method signature, so it would have to be specified in the type attribute of the annotation as a String (not strongly typed). Second, we wouldn't know how to access the id of the source/parent.

In short, I understand the intent, but not what you are proposing, how it should look, and how it could be done.

As a shortcut, @BatchMapping is convenient, but not using it isn't that much more effort. For full control, you can always use the long form:

@SchemaMapping
public Object students(Course course, DataLoader<Course, List<Person>> dataLoader) {
    return dataLoader.load(course.getId());
}

registry.forTypePair(Long.class, Person.class).registerBatchLoader((courseIds, env) -> {
    // ...
});

spring-projects-issues · 2024-07-04T15:55:43Z

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

bsara · 2024-07-08T21:54:37Z

what you are proposing, how it should look, and how it could be done

I guess that I'm not necessarily proposing anything specific at this point in time. I'm more just trying to report what I feel is an issue with how this API works.

It's a concern to me that the library docs suggest this as a viable option given the inefficiencies in data/batch loading that it presents. It also appears that there aren't alternatives either. Your suggestion to use @SchemaMapping still doesn't solve the problem either because my data loader signature will now be so generic, I could possibly get the wrong one. I guess what I'm saying is that this approach in general seems to me to break down entirely when it comes to creating and using data/batch loaders in an appropriate and efficient way because of the reliance on the "in" and "out" types of the batch loader (as specified in forTypePair.

I'm very willing to admit that I'm way off on this though, so please correct me if I'm wrong. I'd love to understand better if I'm not seeing this correctly. It's been several months since I opened this ticket too, so I'm kind of struggling to refresh my entire memory on things.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 4, 2024

bsara added a commit to bsara/spring-graphql that referenced this issue Jan 5, 2024

added ability to provide raw batch loader classes to BatchLoaderRegis…

c8eb4ff

…try alongside BiFunction registration methods. resolves spring-projects#870 & spring-projects#869

rstoyanchev added the status: waiting-for-feedback We need additional information before we can continue label Jan 11, 2024

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 15, 2024

This comment has been minimized.

Sign in to view

rstoyanchev added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jun 27, 2024

spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jul 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential optimization issues with BatchMapping #869

Potential optimization issues with BatchMapping #869

bsara commented Jan 4, 2024

rstoyanchev commented Jan 11, 2024 •

edited

Loading

bsara commented Jan 15, 2024 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

rstoyanchev commented Jun 27, 2024 •

edited

Loading

spring-projects-issues commented Jul 4, 2024

bsara commented Jul 8, 2024

Potential optimization issues with BatchMapping #869

Potential optimization issues with BatchMapping #869

Comments

bsara commented Jan 4, 2024

rstoyanchev commented Jan 11, 2024 • edited Loading

bsara commented Jan 15, 2024 • edited Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

rstoyanchev commented Jun 27, 2024 • edited Loading

spring-projects-issues commented Jul 4, 2024

bsara commented Jul 8, 2024

rstoyanchev commented Jan 11, 2024 •

edited

Loading

bsara commented Jan 15, 2024 •

edited

Loading

rstoyanchev commented Jun 27, 2024 •

edited

Loading