Hello Data Commons community!
We’re excited to share the latest updates for users building their own Data Commons.
First, we have updated the documentation for building your own Data Commons to reflect all of the changes below. Setup is now much simpler: it takes under 20 minutes and can be done on your own computer with Docker.
Data management container
Many partners have asked for the ability to load large amounts of data into their own Data Commons and to integrate data loading into their pipelines.
To support this, we have introduced a new Docker container for all data management tasks. The container can run locally or in the cloud. In the cloud, it runs as a standard Google Cloud Run job, which means you can invoke it programmatically, let it run longer for large data loads, and access its logs for debugging.
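Because the job runs on Cloud Run, a pipeline can trigger a data load through the Cloud Run Admin API. Below is a minimal, hypothetical sketch in Python. It assumes the `google-cloud-run` client library is installed, Application Default Credentials are configured, and a Cloud Run job has already been created from the data management container. The project, region and job names are placeholders, not values from this announcement.

```python
# Sketch only. Assumes: `pip install google-cloud-run`, Application Default
# Credentials, and an existing Cloud Run job built from the data management
# container. Project, region and job names are hypothetical placeholders.


def job_resource_name(project: str, region: str, job: str) -> str:
    """Build the fully qualified Cloud Run job name the Admin API expects."""
    return f"projects/{project}/locations/{region}/jobs/{job}"


def run_data_load(project: str, region: str, job: str, timeout_s: float = 3600.0):
    """Start one execution of the job and block until it finishes.

    Large data loads may need a longer timeout than the default used here.
    """
    # Imported lazily so job_resource_name() works without the client library.
    from google.cloud import run_v2

    client = run_v2.JobsClient()
    operation = client.run_job(name=job_resource_name(project, region, job))
    # result() waits on the long-running operation and returns the Execution.
    return operation.result(timeout=timeout_s)
```

For example, `run_data_load("my-project", "us-central1", "dc-data-load")` would start one execution of a job named `dc-data-load` and wait for it to complete; all three names are hypothetical. Blocking keeps a pipeline step simple, but for very long loads you may prefer to return immediately and follow the job's logs instead.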
We’ll continue adding data management features to this container, such as creating versioned snapshots, purging old snapshots and validating data. Watch this space for future updates.
Web admin deprecation
With the introduction of the data management container, we’ll be deprecating the web admin page that was previously used for data management.
Smaller, faster services container
We’ve significantly reduced the image size and build time of the Data Commons services container. The image is now half its previous size, and build times have been cut by two-thirds.
The new images are available in the gcr.io/datcom-ci/datacommons-services repository. The prior repository at gcr.io/datcom-ci/datacommons-website-compose will soon be deprecated.
From a build, deployment and development perspective, the only change is the repository location; everything else (environment variables, customizations, etc.) stays the same.
Also note that the new services container does not include the web admin page; the data management container is the only way to load data.
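Since only the repository location changes, migrating usually comes down to updating image references in your build and deployment configuration. Here is a minimal sketch in Python, assuming your configuration is plain text that names the old repository in full; the `stable` tag in the example is only illustrative. It keeps whatever tag or digest follows the repository name.

```python
import re

# Repository names as given in this announcement.
OLD_REPO = "gcr.io/datcom-ci/datacommons-website-compose"
NEW_REPO = "gcr.io/datcom-ci/datacommons-services"

# Match the old repository only when it is not followed by more name characters,
# so a trailing tag (":...") or digest ("@...") is preserved unchanged.
_OLD_REPO_PATTERN = re.compile(re.escape(OLD_REPO) + r"(?![\w./-])")


def migrate_image_refs(text: str) -> str:
    """Point every reference to the deprecated repository at the new one."""
    return _OLD_REPO_PATTERN.sub(NEW_REPO, text)


if __name__ == "__main__":
    # Hypothetical configuration line; "stable" is an example tag only.
    before = "image: gcr.io/datcom-ci/datacommons-website-compose:stable"
    print(migrate_image_refs(before))
```

Anchoring the match on the full repository name, rather than a looser substring, avoids rewriting unrelated images whose names merely start with the same prefix.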
Updated documentation
As noted earlier, the new Data Commons documentation covers all of the updates above and more. Check it out, and join datacommons-community to share feedback, report issues and stay up to date!