Skip to content

Upgrade Guide to 4.x version 2

Kyle Szklenski edited this page Jun 5, 2024 · 6 revisions

New Firestore Implementation

I wanted to take a moment to write up a guide to fully replacing the previous Firestore implementation and to provide a little history on why I felt it had to be done. First, the history! But for those who just want to dive in, here's the upgrade guide!

The History

When I was building out the GodotFirebase video series conceptually, and the game which I'd like to show for said series, I ran into a lot of annoying issues. They weren't "bugs" in the strictest sense, but they were mental errors due to how the Firestore library was shaped. It was just massively over-complicated, and the API, despite being implemented by some incredibly smart engineers, never got to the point where I had intended it to be, which is: dirt simple, but allows complex code.

The problem, as I saw it, was that the Firestore implementation started off with the idea of a FirestoreTask, and became decidedly more complicated; in an effort to be able to handle all cases, the FirestoreTask class was given a huge, huge number of responsibilities even though it was really making or handling exactly one request. That sounds funny so let me back up a little bit.

FirestoreTask's therapist says it has a problem

The FirestoreTask was originally created to handle the details of making requests to Firestore and signaling when they were complete. It was, perhaps a bit oddly, designed in such a way that it was extremely generic, but had signals representing any possible specific thing that Firestore could do. For instance, it had signals for add_document, get_document, update_document, and so on. When I went to implement the commit API (a separate API from the REST API, believe it or not), it was able to handle the task just fine simply by giving it a different URL, but problematically it required adding the signal and all the various bits of match/case statements to handle what happened when the type was commit_document. This led to a whole cascade of changes throughout the Firestore implementation, including changes for the FirestoreCollection, FirestoreDocument, and FirestoreTask classes, as well as others. This was unacceptable to me. A simple change like implementing a new API that is fundamentally the same, just a different URL, as all other requests should not require so many different classes to change, especially not when those changes are in exactly the same fashion: add a bunch of ifs or match case statements, massage the data however it needs to be massaged for the appropriate signal, then fire that new signal and have it propagate throughout the entire codebase.

This was unacceptable to me. A single class should not be handling all those cases separately like that. It's sensible to have it handle making the request and errors, but couldn't it just have a "finished" signal and not (for instance) an add_finished, get_finished, update_finished, and so on signal? What's the difference?

Well part of the difference is the fact that certain API endpoints might not return uniform data. As an example, the delete function should actively not return a FirestoreDocument (or something), because why on earth would deleting the document return said document? It wouldn't make sense. Handling all these return types means that using polymorphism wouldn't work well if I'm true to covariance in my design (even if, at the time of original creation, Godot itself was incapable of that), which was my first instinct: have one base Task type, then inherit it multiple times for each type of request (add, get, update, commit, delete, etc.), and just have the base task handle all the specifics that were shared; you know, common object-oriented programming techniques! But what FirestoreTask lacked in flexibility, it had in completeness of implementation, and so it already had a data property that represented the results after making the request and it had no specific type on it. The FirestoreTask originally handled it by setting that data and then emitting said data within the specific signal and clearing it out of the task, but it occurred to me that signals in Godot still aren't really statically typed, so why even bother with the specific signal? We can still return whatever data exists after the request is completed through a single signal. Or, for that matter, through the data property!

Why so much propagation!?

This led to me fully deleting all signals except task_finished and error from the FirestoreTask as a first step. I wanted the shape of the API to be drastically simplified, and this seemed like a pretty good step - if you as a user no longer had to care what the specific signal was that you needed to await or connect to based on which Firestore function you called, maybe I could make it so your code could work a little more generically and cleanly. But then I realized that both the error and task_finished signals are the same conceptually as well, so I deleted the error signal. That seems really drastic, but let's just think about it for a second: if you get an error and can't complete the request, the task is definitely finished, it's just that the data is changing to something else (an error!). Conversely, if there's an error property on the Task and it has any values in it and the data from the Task is itself null, you would know that there was an error. This is very similar to the idea of having the data property at all in the first place, just with a slightly more specific interface to ensure no specific interpretation of the data has to be done in order to know that an error has occurred. And that felt nicer than just mashing everything together into the data property.

This was a great step and I'll be honest, I considered stopping it there in order to expedite the process of the refactor. However, it was still a pretty badly-breaking change, so I said to myself that if I'm in for a penny, I'm in for a pound, and why is it anyway that users even need to see this FirestoreTask implementation at all? After all, it's only really used internally to track the current progress of a given request - why should anyone not implementing the library have to care about this very obvious implementation detail? Why is it that, when they make requests to Firestore, they can't just get back the expected value for whatever their request was?

FirestoreDocument Supremacy

This led me to my next big change. That change involved switching over to completely make FirestoreTask an internal class, and just have FirestoreDocument be returned from the various calls in FirestoreCollection where appropriate, and then appropriate data returned from anywhere that doesn't really have the ability to return a document. I liked this idea conceptually but had a sort of issue with it once again: errors. If you call to a function and are awaiting it and an error occurs, you want to know about it - you want to get a specific signal (or some kind of notification) and be able to handle it. So the error signal concept came back to haunt me! Not really. Actually, I just put an error signal that passes an entire dictionary of data back to the user on the Firestore class itself, which now properly emits whenever any task in the entire Firestore implementation fails. But this does lead to a slightly annoying issue, which is that requests will actually show all their errors in the console output when they happen. I deemed that not only acceptable, but actually useful, because previously there were times where it would "hide" errors behind one of the various and sundry error signals, and there were a ton of those signals that had to get propagated from the Task all the way up the tree. I hated that fact, so was very happy to have removed it from everywhere except Firestore, and now it functions way, way more cleanly. Yes, you will have to do some interpretation of the results because the error value given back to you is just sort of a generic dictionary, but once you've hooked up to it and printed it off and seen what it gives you in the console, you should not have any trouble figuring out exactly what you need to do in order to build proper messaging to your users and handling the error appropriately. Nicely (imo), this whole change led to being able to also get rid of all the signals off of FirestoreCollection, so no more weird propagation throughout the entire tree: what you see is what you get!

Fixing FirestoreDocument, now that it's used everywhere

Now that I had the main API cleaned up so, so dramatically, it was time to move on to littler things. With the dawn of the use of FirestoreDocument in most cases - from function call parameters to return types - FirestoreDocument itself needed some love. The reason for that was pretty obvious to me, but maybe not to anyone else: when updating a Document, it wasn't clear at all if you needed to be updating the FirestoreDocument.document property (which was a dictionary), or the FirestoreDocument.doc_fields property (also a dictionary!?), or somehow both of them. You shouldn't have to worry about dealing with both of them, but having tried updating one and then the other in the Test Harness, I found that neither of them really worked the way I was expecting. This turned out to be due to a very subtle bug that one of my users found where if you completely remove a field from your document, Firestore was never notified that it was removed and hence kept it in the document for all subsequent requests. This was due to the fact that Firestore deals with field masks that are designed to indicate what fields have changed in order to more efficiently update documents, and if you removed a field from either FirestoreDocument.document or FirestoreDocument.doc_fields for that matter entirely, it would not "see" the field and so not include it in the mask, and Firebase's functionality is such that if it's not in the mask, it hasn't changed. To fix that, I needed to find a way to maintain the logic for adding that field to the mask even if a field was deleted, but then remove the field from the Document before sending the request, so that it isn't just (for example) updated to become a null value (let's say my first attempt was just having the value be null and seeing what that did; it of course still did not delete the field because why would it?!). This required a fix in the update function of the FirestoreCollection class, although probably I could have kept it in the Document, where it just searches over all fields in a document before the call to update and after the mask is built, and looks for fields which have been deleted. It then removes them from the request and proceeds to serialize it and send it off, thus ensuring that fields that you've deleted have actually remained deleted from the Document. Nice.

But what about both document and doc_fields - which do you even have to update in order to have your document be valid? Those would have remained an ugly mess if I left it at the above. So what I've done is I've introduced functions on FirestoreDocument to handle adding/changing fields on documents, and removing fields from them. This will require a little bit more work if you're intending to do a lot of FirestoreDocument manipulation, but actually it's not significantly more than what had to be done before, given how confusing it was when manipulating the document at all. And now you don't have to manage a separate doc_fields property that was just confusing before, because it was, again, completely an internal implementation detail! As such, I decided to remove the doc_fields property altogether, and have document just manage the data and keeping it as the type that Firebase needs to see so that serializing and deserializing it is incredibly simple. This is ultimately why I introduced the individual functions, and more changes are on the way!

FieldTransforms

While I was at it beefing up the FirestoreDocument class, in order to implement server timestamps, I needed the commit API, as mentioned way up above. This particular Firebase endpoint works slightly differently from the normal REST API endpoints: it accepts json, but is not technically REST-based. But because I had no way to manage the server-side transforms (server timestamp is a FieldTransform) for a document, I had to add them, and decided to make them separate from the rest of the document itself, so that they could be serialized separately, or saved until you actually want to call to do a commit and get the server timestamp (or any of the other FieldTransform types). Thus, I added the add_field_transform, remove_field_transform, and clear_field_transforms functions in order to handle those types of server-side manipulations to documents.

It's the Final Countdown

One final set of changes I wanted to make was to allow for retrieving cached documents from the Collection. Previously, we had a caching mechanism that wrote out documents to the hard-drive so that if you were offline and the user signed out but then came back online and signed in upon the game's next startup, it would update the document appropriately based on what changes were made when the user was offline previously. However, due to the nature of change tracking and the very ugly document vs doc_fields issue demonstrated above, there was a massive, massive issue with offline support, so I decided instead to turn both Documents and Collections into actual Nodes, rather than RefCounted types, and rather than write them out to the hard-drive when you're offline, I just keep the changes in memory in the scene tree. This allows us to actually cache the documents in-memory so that, say, if you go from one scene to another with get_tree().change_scene_to(), when you need to get access to the same document, you don't have to store it in your own singleton in order to get it without making a second full request: you can just set the second parameter's value in FirestoreCollection.get_doc() to true and if it's in the scene tree, it'll return it to you as-is. And if you've never requested that document before but still pass true? It'll still make a full request to get it, so there's very little harm if you're not intending to make server-side changes frequently to a single document to just always passing true. I have it defaulting to false because that was the default behavior before, but just keep this in mind as it's really useful.

I think that's all I'll write for now with the history, as I'd like to get to the guide itself. Funnily enough, the guide itself is...dirt simple. I think you'll all appreciate how much I've simplified the API.

Upgrade Guide

Previous to the big update, FirestoreTask was returned from more or less every function call on FirestoreCollection, which largely handled most of the requests to Firestore except for ones which involved querying over the entire set of FirestoreCollections. But it was also returned from queries over FirestoreCollections, i.e. from functions such as Firebase.Firestore.list(), or Firebase.Firestore.query(). Since I have removed the FirestoreTask from public consumption, you no longer have to await a specific signal on a FirestoreTask at all. Instead, the API now looks like this:

var collection = Firebase.Firestore.collection("my_collection") # Get the appropriate collection where the document is
var document = await collection.get_doc("my_document_name")

Before, the API looked something like this:

... # Same collection as above
var task = collection.get_doc("my_document_name")
var result = await task.get_document # Or await task.task_finished and use task.data to get the resulting document, which I consider to be even uglier than this

And that's pretty much all there is to it: just await the call to the collection, and expect the result to not be a task, but instead the type you would normally expect a function call to return! Here are some other conversions:

Update

... # Same collection as above
var updated_document = await collection.update(document) # Same document as above!

Commit

# This completely new API is used in the following way:
... # Same collection as above, same document as well
document.add_field_transform(ServerTimestampTransform.new(document.doc_name, true, "server_timestamp_field_name")) # True here specifies that the field must already exist on the document in order to get the value; it's possible to pass false, but I'm not sure what that does honestly, haven't tested it. server_timestamp_field_name is just the name of wherever you want to store the server timestamp, and while you might think that because this is being added to the document that the transform would have access to the document and so could get the doc_name itself, I didn't want to bidirectionally couple them at all

var changes = await collection.commit(document) # Returns a dictionary of changes based on whatever transforms were made to the document; note this also clears out all the transforms from the document so that you don't have to worry about them accidentally applying a second time if you need to re-commit at a later time, but that also means you have to re-add any that you _do_ want to apply a second time for any reason (such as the Increment or Decrement transforms).

print(changes.added) # Prints a list of objects with the added fields, in this format: { "key": value, "new": new_value }
print(changes.removed) # Prints a list of the fields which were removed from the document, in this format: { "key": value }
print(changes.updated) # Prints a list of objects with the updated values, in this format: { "key": value, "old": previous_value, "new": new_value }
print(changes.is_listener) # Only true when the values in the document are updated from a call to on_snapshot!

Add Document

# Basically the same as before
... # Same collection as previously
var document = await collection.add("my_document", {"data": some_values, "some_other_data": some_other_values, ...}) # Second parameter is just a Json dictionary with the values lain out exactly as you would want them to be, no need to call to convert them to Firestore types or anything, it's all automatic

Changing a Document

var document = await collection.get_doc("my_document", true) # Pull from cache if it exists there

# Update a field value, or add it if it doesn't exist
var new_values = { "my_new_values" : ["value1", "value2", "value3"] } # This is just an example, update it however you want
document.add_or_update_field("some_other_data", new_values) # Changes the entire value of field "some_other_data" to be equal to the object new_values

# Remove a field from the document altogether
document.remove_field("data")

# Get a value from the document
document.get_value("some_other_data") # Will return the new_values object here

# Get all keys from the document, mostly internal but can be used however you want
var keys = document.keys()

Deleting a Document

... # Same collection and document as above
var did_delete = await collection.delete(document)
print(did_delete) # Boolean value saying whether or not the document was successfully deleted

Using on_snapshot to get live document updates (sort of)

Note: If you don't call this function at all, there's no real chance of getting charged for its use, unless you've implemented a polling update yourself. I do not recommend doing that anymore, but instead if you need to get updated on values somewhat regularly and with little to no confusion about when or why, use the below. Incidentally, if you hook up to the changed signal on your document by hand, it will similarly be called if you also called to on_snapshot, but it'll also be called for normal updates that you yourself make to the document - is_listener will be false for those updates, rather than if the update came from the server, but you'll get all updates either way.

# BEWARE! This function can be expensive. I limit you to no shorter than updating every 2 minutes, or in other words you cannot update more frequently than once every 2 minutes. This sounds extreme, but having done some calculations for updating every 2 seconds, I found users of this plugin would pay somewhere on the order of $40K per hour of use per user. I was not willing to let that happen by accident just because Godot does not support HTTP2+ style requests, but having gotten requests for auto-updating times innumerable, I decided to finally implement polling with a *really* high timeout in order to prevent huge, unexpected costs, or if you're on the free tier, severe restrictions on response times.

# Same document as above
var listener = document.on_snapshot(func(changes): print(changes), 60 * 5) # This means 5 minutes between poll times, which will take precedence over the default 2 minutes; anything lower than 2 minutes becomes 2 minutes, though, for wallet safety's sake, as I mentioned.

# Get updates for a while

listener.stop() # Call this in order to stop listening for changes and stop potentially being charged for it

That's mostly all there is to this rework, honestly. Adding transforms and stuff is really nice, but the brunt of it is in removal of FirestoreTask for public consumption, and updating the FirestoreCollection node to consistently return the types that you'd normally expect, as well as the Firestore type to do so as well. For Firebase.Firestore.list() and Firebase.Firestore.query() calls, they just now return arrays with the expected types in them, whether that be documents (from query), or collections (from list), or some reasonable Json representation of them. It's really quite nice.