Mobile-first: Building Google Photos
Ten years ago, I was lucky enough to be the tech lead and manager of the team that built Google Photos for Android and delivered a seamless mobile-first experience.
Mobile-first was a term used at Google at the time to rally the company to build great, local-feeling user experiences for mobile devices. These user experience goals are quite similar to the current local-first movement in web applications. This is tremendously exciting to me because we’re getting closer to the dream of highly capable web apps. Even when I was building some of the earliest mobile apps in 2008, I would tell people I was looking forward to the day we could write UIs in web tech again.
Recently, Sujay described a map of sync that lays out the decision space for various applications. Here, I’ll walk through the tradeoffs we made when building Google Photos.
One of our core principles was that Google Photos should feel like a snappy native local app. Users shouldn’t see loading spinners on every interaction or be stopped from doing what they want just because the network disappeared. We wanted the experience to be fast, local, and seamless wherever possible.
Merging two different products
First, a bit of background. Google Photos began as two distinct products:
- Android Gallery App for viewing photos stored on your phone
- Google+ Photos, which offered cloud backup with automatic enhancement and easy sharing
We had to deliver both feature sets. The app had to work as a highly performant local app for reviewing photos you just took with your phone’s camera and provide an infinitely scalable syncing cloud gallery.
Going back even further, Google+ Photos was built on top of the older Picasa Web infrastructure. This meant that we couldn’t change some core backend data model decisions within our launch timeline, making it especially hard to generalize offline modifications.
We had to do all this in a matter of months, using as many of the pieces from the existing apps as possible. Competition was heating up in the space.
The Core Challenge: Creating a Unified Experience
Our main challenge was to create a seamless experience while staying true to many of the original products’ goals. This meant that there could be no delays in accessing local content. The app could afford only the absolute minimum amount of work before showing you the photo you just took on your phone. There was no way we could let a network request block this core on-device experience.
At the same time, we wanted to deliver a seamless experience where you see your cloud and local photos as a single grid of photos. Photos would automatically back up in the background to provide you peace of mind. The grid also had to show your entire photo library, even if it had over a million photos, without any loading spinners. We wanted a magical experience where you could jump to any moment in your life’s history and relive those memories.
Yours truly telling an audience that the spinner is “Your Enemy” in 2015.
The Technical Solution
The unified grid experience had three pillars: backup, cloud metadata down-sync, and local metadata sync.
Backup
The backup service ran in the background. Its job was clear: listen for system notifications that new photos had been taken and try to upload as many of them as possible.
This is conceptually simple, but there were a few important constraints:
- The system gave us limited time to stay awake and upload photos to ensure good battery life. We had to upload as many photos as possible as fast as possible, and we eventually built a custom parallel upload infrastructure.
- We also had to avoid extraneous uploads. The client shouldn’t re-upload everything if you deleted and reinstalled the app. We built a protocol that compared file hashes to detect duplicates and data corruption in transit (a rough sketch of this check follows the list).
- Backups had to be network-efficient. By default, backup was only done when connected to Wi-Fi, and devices could switch connectivity arbitrarily as users moved around in the physical world. This meant supporting partial uploads if connectivity was cut off or if the device switched from Wi-Fi to mobile networks.
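To make the dedup idea concrete, here is a minimal Kotlin sketch of a hash-check-before-upload step. The `PhotosBackupApi` interface and its methods are assumptions for illustration, not the actual Photos backup API.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical backend interface; these names are illustrative, not the real API.
interface PhotosBackupApi {
    suspend fun hasBytes(hash: String): Boolean
    suspend fun uploadResumable(file: File, expectedHash: String)
}

// Stream the file through a digest so we never hold the whole photo in memory.
fun contentHash(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read == -1) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Skip the upload if the server already has these bytes (e.g. after a reinstall);
// otherwise start or resume an upload the server can verify against the hash.
suspend fun backUpIfNeeded(file: File, api: PhotosBackupApi) {
    val hash = contentHash(file)
    if (api.hasBytes(hash)) return
    api.uploadResumable(file, expectedHash = hash)
}
```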
Cloud metadata down-sync
This was the fundamental protocol the iOS, Android, and Backend teams worked on to ensure each device had the user’s entire cloud library available. The client provided an identifier that told the server what version of the photo library it had. The server then returned a paginated result of all the photos that were updated in the latest version of the library.
We made a critical simplifying assumption. Since the largest libraries we were aware of measured in the hundreds of thousands of photos, we believed that storing the metadata for all the photos on the device in SQLite was reasonable. The rough math is as follows: each photo’s metadata was less than 1KB, and 1KB times a million photos is roughly 1GB of metadata. Folks who took that many photos tended to be power users with the best phones and large amounts of on-disk storage, so carving out 1GB for metadata seemed reasonable. This decision allowed us to sidestep the complexity of partial sync.
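The overall shape of the down-sync loop looked roughly like the following sketch. All of the type and method names here are made up for illustration; the real protocol handled retries, partial failures, and more.

```kotlin
// Illustrative types for the delta down-sync; none of these names are the real protocol.
data class PhotoMetadata(val serverId: String, val fileHash: String, val takenMs: Long)
data class SyncPage(
    val updatedPhotos: List<PhotoMetadata>,
    val nextPageToken: String?,      // null when this delta has no more pages
    val newLibraryVersion: String    // what the client should remember once fully applied
)

interface CloudLibraryApi {
    suspend fun fetchChanges(sinceVersion: String?, pageToken: String?): SyncPage
}

interface LocalLibraryDb {
    fun libraryVersion(): String?                    // null on first sync
    fun upsertPhotos(photos: List<PhotoMetadata>)    // write into the SQLite metadata table
    fun setLibraryVersion(version: String)
}

// Pull every page of changes since the version we last applied, then advance the version.
suspend fun downSync(api: CloudLibraryApi, db: LocalLibraryDb) {
    val since = db.libraryVersion()
    var pageToken: String? = null
    var newVersion: String? = null
    do {
        val page = api.fetchChanges(since, pageToken)
        db.upsertPhotos(page.updatedPhotos)
        newVersion = page.newLibraryVersion
        pageToken = page.nextPageToken
    } while (pageToken != null)
    newVersion?.let { db.setLibraryVersion(it) }   // only advance after the whole delta is stored
}
```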
Some advantages of this type of sync mechanism:
- Consistency: We always had a consistent view of the whole library. Building any feature out of band from the sync mechanism would have led to an inconsistent view of the library that could break the magical, seamless experience.
- Convenience: We didn’t have to create new API endpoints as new features were added. Everything just came down in sync.
Local metadata sync
To create a seamless experience where local and cloud photos appear as one grid, we synced the metadata of photos on the device into a similar structure as the cloud metadata in the same SQLite database.
We then created a new synthetic table that was the union of the two libraries. The union key was the file hash of the original file.
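Conceptually, the unified grid was driven by something like the following query. This is a hedged sketch with invented table and column names, not the actual schema.

```kotlin
// A rough sketch of the kind of unified view described above. The table and
// column names are invented for illustration; the real schema was different.
const val CREATE_UNIFIED_MEDIA_VIEW = """
    CREATE VIEW IF NOT EXISTS unified_media AS
    -- Every cloud photo, joined with its local copy when the hashes match.
    SELECT
        c.file_hash                        AS file_hash,
        c.server_id                        AS server_id,
        l.local_uri                        AS local_uri,
        COALESCE(l.taken_ms, c.taken_ms)   AS taken_ms
    FROM cloud_media c
    LEFT JOIN local_media l ON l.file_hash = c.file_hash
    UNION ALL
    -- Plus local-only photos that haven't been backed up yet.
    SELECT l.file_hash, NULL, l.local_uri, l.taken_ms
    FROM local_media l
    WHERE NOT EXISTS (SELECT 1 FROM cloud_media c WHERE c.file_hash = l.file_hash)
"""
```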
Like almost all of the processes described in this post, there were many subtle things to get right. Our team would discover bugs in the underlying OS APIs as implemented by different manufacturers. Computing a file’s hash could be too expensive to do before showing the file in the app. We had to make the app extensible for hardware manufacturers building new types of photos and videos, such as VR photos, time lapses, and live photos.
Putting it all in the UI
Even though we now had all the metadata in a single SQLite table, we couldn’t just load it all into memory to drive the UI. We solved this by paging metadata from the database as needed while scrolling. When you scrolled to a page, we’d load the metadata of all the photos in that page into memory, and when you scrolled past it, we’d drop it. However, we kept secondary metadata in memory to enable features like the date scrubber, allowing users to jump to any point in their photo history.
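In pseudo-Kotlin, the paging behavior looked roughly like the sketch below. The names and page size are illustrative, not the real implementation.

```kotlin
// An illustrative sketch of windowed metadata loading; names are assumptions.
data class GridPhoto(val fileHash: String, val takenMs: Long)

interface PhotoPageSource {
    // Backed by a query like: SELECT ... FROM unified_media ORDER BY taken_ms DESC LIMIT ? OFFSET ?
    fun loadPage(offset: Int, limit: Int): List<GridPhoto>
}

class GridPager(private val source: PhotoPageSource, private val pageSize: Int = 300) {
    private val pagesInMemory = mutableMapOf<Int, List<GridPhoto>>()

    // Called as a page of the grid scrolls on screen (or close to it).
    fun onPageNeeded(pageIndex: Int): List<GridPhoto> =
        pagesInMemory.getOrPut(pageIndex) {
            source.loadPage(offset = pageIndex * pageSize, limit = pageSize)
        }

    // Called once the user has scrolled well past a page; its metadata is dropped.
    fun onPageEvicted(pageIndex: Int) {
        pagesInMemory.remove(pageIndex)
    }
}
```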
There are many more details here: loading only partial photo metadata for the main grid, date headers, driving zoomed-out library views, caching and loading photo bitmaps into the grid, seamlessly opening the full photo viewer, and so on. But that all gets well past the sync conversation. I am proud of the team’s work in making this experience as fast and local as possible given the heterogeneous quality of Android devices.
Handling modifications
We have yet to discuss deletions, edits, and other library modifications. After all, your photo library is not immutable.
Deletions
Deletions are the most critical modifications one can make to a photo library. If you delete a photo, it should disappear from your library as soon as possible.
A specific order of operations was needed to provide a fast experience (a rough sketch follows the list):
- The deletion was immediately added to an offline modification queue.
- The UI data model was updated to reflect the deletion.
- When the network was available, which could be immediate, the offline queue was processed.
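Here is a minimal sketch of that delete flow. The queue, UI model, and server interface names are invented for illustration, not the actual Photos code.

```kotlin
// A minimal sketch of the delete flow above; all names are assumptions.
data class PendingDelete(val fileHash: String)

interface PhotosServer { suspend fun moveToTrash(fileHash: String) }

class DeleteHandler(
    private val offlineQueue: MutableList<PendingDelete>,  // persisted so it survives restarts
    private val visiblePhotos: MutableSet<String>,         // hashes currently shown in the grid
    private val server: PhotosServer
) {
    fun deletePhoto(fileHash: String) {
        offlineQueue.add(PendingDelete(fileHash))  // 1. record the intent first
        visiblePhotos.remove(fileHash)             // 2. update the UI immediately, no spinner
    }

    // 3. drain the queue whenever the network becomes available
    suspend fun flush() {
        while (offlineQueue.isNotEmpty()) {
            server.moveToTrash(offlineQueue.first().fileHash)
            offlineQueue.removeAt(0)
        }
    }
}
```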
On the server, the deleted photo was immediately put in the “trash bin.” No further modifications were allowed except restoring it from the trash, and trashed photos eventually timed out and were permanently deleted.
We also built a parallel trash mechanism for local on-device photos, even ones that hadn’t been backed up, to preserve the “unified library experience.”
If a delete was made on another device, we would sync down a tombstone so the device would know to remove the photo from the local data store. If that file was also local to the device, we used the local trash mechanism described above.
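Applying a tombstone on the device looked roughly like this sketch, with an assumed interface standing in for the real local data store.

```kotlin
// Hedged sketch of applying a down-synced tombstone; the interface is illustrative.
interface UnifiedLibraryDb {
    fun removeCloudRow(fileHash: String)
    fun hasLocalCopy(fileHash: String): Boolean
    fun moveLocalCopyToTrash(fileHash: String)
}

fun applyTombstone(fileHash: String, db: UnifiedLibraryDb) {
    db.removeCloudRow(fileHash)               // the photo is gone from the cloud library
    if (db.hasLocalCopy(fileHash)) {
        db.moveLocalCopyToTrash(fileHash)     // keep the unified grid consistent with the delete
    }
}
```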
Photo Edits
Photo edits could be saved in two ways:
- Save as Copy: This created a new image on disk or in the cloud, which the existing backup and sync mechanisms handled.
- In-place Edits: A list of modifications that could be replayed on the original image.
For in-place edits, we maintained an “edit list” - a sequence of modifications that could be deterministically reapplied to the image. These edits were non-destructive and could be undone at any time by the editor. This seems easy to manage; you add the edit list to the offline modification queue and send it to the server.
But we did have to think through the multi-device experience. Consider this example: A user makes an edit on their tablet, and the tablet immediately goes offline and into a drawer. The user then makes a different edit to the same photo on another device that is online. Months later, when they pull out the original tablet and it comes back online - what happens?
To solve this problem, each photo had an “edit version.” In this scenario, both devices start with version 0 for the photo, and the edit on the first device doesn’t sync to the server. The user then edits the photo again on a second device that’s online. The second device sends its edit and the version number of the photo it has, 0, to the server. The server accepts the edit and bumps the photo’s version to 1. The first device with the old edit now comes back online and sends its edit and version number, 0, to the server. Since the server is already ahead of this version, it rejects the edit. Then, via the normal sync process, the first device’s copy is updated to the cloud version of the photo.
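The server-side check amounts to a compare-and-set on the edit version. Here is a sketch with invented types and names, not the actual backend code.

```kotlin
// Illustrative server-side version check; types and names are assumptions.
data class EditRequest(val photoId: String, val baseVersion: Int, val editList: List<String>)

class EditStore {
    private val versions = mutableMapOf<String, Int>()          // photoId -> current edit version
    private val editLists = mutableMapOf<String, List<String>>()

    // Accept the edit only if the client was editing the version the server has now.
    fun applyEdit(request: EditRequest): Boolean {
        val current = versions.getOrDefault(request.photoId, 0)
        if (request.baseVersion != current) return false        // stale edit from an offline device
        editLists[request.photoId] = request.editList
        versions[request.photoId] = current + 1                 // bump the edit version
        return true
    }
}
```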
Usually, if you’re editing the same photo in place again, it’s because you think the edit on the other device didn’t go through. So your most recent edit in real life is the most important one. If the other device comes back online later, its edit should be rejected because it’s operating on an older interpretation of the image.
We made this product decision based on the likelihood of this scenario. If I could go back in time, I would avoid losing the edits on the old device and create a duplicate image, just in case.
Albums
In addition to editing individual media, users could also organize their photos/videos into albums. The album data was largely a backend construct, having grown out of the original Picasa Web app. One key detail was that every piece of media in a user’s library had two identifiers: the hash of the bytes mentioned above, and a server-generated ID used to identify a photo in an album.
The backend completely controlled the server IDs—clients couldn’t generate them. To allow offline album creation, the client would create a local ID that would stand in for a future server ID until the backup was complete. Later, when the client was back online and the backup had a chance to complete, the client would receive the server IDs and reconcile them with the albums created while offline. Finally, the client would request the backend to create the album with the received server IDs.
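A sketch of that reconciliation step is below. All of the types, interfaces, and the idea of keying the ID map by file hash are assumptions for illustration.

```kotlin
// A sketch of offline album creation with placeholder IDs; all names are illustrative.
data class PendingAlbum(val localAlbumId: String, val title: String, val memberHashes: List<String>)

interface ServerIdMap { fun serverIdForHash(fileHash: String): String? }  // filled in as backups complete
interface AlbumApi { suspend fun createAlbum(title: String, mediaIds: List<String>): String }

// Once every member photo has been backed up and its server ID has synced down,
// swap the placeholders for real IDs and ask the backend to create the album.
suspend fun tryCreateAlbum(pending: PendingAlbum, ids: ServerIdMap, api: AlbumApi): Boolean {
    val serverIds = pending.memberHashes.map { hash ->
        ids.serverIdForHash(hash) ?: return false   // not fully backed up yet; try again later
    }
    api.createAlbum(pending.title, serverIds)
    return true
}
```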
Photo metadata edits
Photo metadata edits like favorites and photo descriptions were handled via the offline modification queue. The process was as follows:
- Enqueue the modification
- Update the local UI state
- When online (which could be immediate), send the modification request to the server.
This was a relatively simple last-write-wins system. For these use cases, we were willing to live with it. For example, a favorite is just a flag that’s either true or false. So, if two devices set it to true, it still ends up being true. For the most part, nothing really breaks. And it’s pretty easy to redo if it gets lost.
For descriptions, ideally, we would have used something like the “edit version” described above to ensure a really old description didn’t overwrite a newer one, but editing a description was a very rare operation.
Non-core experiences
Photos is a large, sprawling product, even more so today than when we first built it. Experiences outside the main photo grid, like AI-based search and sharing, were built to work primarily when online. We did some aggressive pre-caching or used synced data when available to minimize the amount of data needed from the network.
Conclusion
The core of the Google Photos sync system worked really well for backing up and viewing a unified photo library. Doing some back-of-the-envelope math to determine how much metadata we could sync led to the critical design decision of syncing the whole library to the device, allowing us to sidestep the thorny problem of partial metadata sync.
However, we took on a lot of the complexity of a bidirectional sync system in the application layer. Given the tight timelines and older immovable backend data models we were working with, we special-cased the most critical modifications: deletes, photo edits, and album creation. Some of the other modifications took a backseat, but these tended to be rare operations.
While this system had its share of edge cases and complexity, the core architecture was surprisingly straightforward. Many of these decisions are common patterns when building offline-enabled apps. The focus on a local-first experience with seamless cloud integration created the foundation for what Google Photos has become today—a powerful, responsive photo management solution used by billions.
I want to give a special thanks to Christian Wyglendowski for reviewing this post. Christian was on the Android team with me when we were rushing to build the original app. This was 10 years ago, and both our memories are a little fuzzy. Hopefully, we got the important details right.
Convex is the sync platform with everything you need to build your full-stack project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.