My internship: implementing Vertex AI Search into the Data Commons stack

Google’s internships are an important part of our culture of building for everyone. An internship is designed to be more than just a summer job; it’s an opportunity to tackle real-world challenges and make an impact. Interns work alongside full-time Googlers, contributing to the helpful products and services that people use every day. While they gain hands-on experience and grow their skills under the guidance of dedicated mentors, we benefit from their curiosity and new approaches to problem-solving. To learn more, visit our Google Careers site, set up job alerts, and apply when applications open in the fall.

Q. Meet our intern Alyssa Guo. Tell us about yourself!

I’m currently studying Computer Science at the University of Waterloo, in Canada. I had the amazing opportunity to intern on the Data Commons team this summer. I’m excited to share some details about my internship experience, and the new feature I had the chance to build out. It’s been an amazing summer filled with many exciting learning opportunities. 

Q. Share highlights of your internship project. What opportunity were you solving for?

The search tool for the Data Commons statistical variable explorer, which helps analysts navigate over 250,000 variables, provided a suboptimal experience. The core issue was that simple queries returned either no results or results that were too broad to be usable.

For example, a search for the word “population” would yield over 1,000 similar variables, such as “Population Aged 15 Years or Older” and “Population Aged 16 Years or Older,” making it difficult for users to find what they needed. This issue stemmed from the tool’s original design, which was built using a simple trie for word matching.
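To make the limitation concrete, here is a minimal sketch of trie-based word matching. It is not the original Data Commons implementation; the class names and the three sample variable names are illustrative assumptions. It shows how a common word matches everything that contains it, while a single typo matches nothing.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next character -> TrieNode
        self.variables = set()  # variable names with a word passing through this node


class WordTrie:
    """Hypothetical word-prefix index over variable names."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, variable_name):
        # Index every word of the name, character by character.
        for word in variable_name.lower().split():
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
                node.variables.add(variable_name)

    def search(self, query):
        # A name matches only if every query word is a prefix of one of its words.
        result = None
        for word in query.lower().split():
            node = self.root
            for ch in word:
                node = node.children.get(ch)
                if node is None:
                    return set()  # one unmatched character means no results at all
            result = node.variables if result is None else result & node.variables
        return set(result) if result else set()


trie = WordTrie()
for name in [
    "Population Aged 15 Years or Older",
    "Population Aged 16 Years or Older",
    "Total Population",
]:
    trie.add(name)

broad = trie.search("population")   # matches every indexed name
typo = trie.search("popluation")    # misspelling: matches nothing
```

With only three names indexed, "population" already returns all of them, and there is no ranking to separate near-duplicates. Scaled to hundreds of thousands of variables, the same behavior produces the overly broad or empty result sets described above.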

Over the course of my internship, I worked to solve this by incorporating Vertex AI Search Applications, an AI-powered product from Google Cloud, into the platform’s Natural Language (NL) search stack.

Figure 1. Search results on the statistical variable explorer tool for “total population” before integrating with Vertex AI.

Q. What is the solution you helped implement? 

This is where Vertex AI came into play. I loaded our statistical variable metadata (which includes a Data Commons ID, name, the measured property, statistic type, population type and constraint properties) into a Vertex AI search application, and augmented the existing metadata with alternative sentences generated using Gemini. The resulting search tool showed a 56% improvement in search results, along with typo correction and semantic search. The integration is now live for all users.
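As a rough illustration of the metadata described above, the sketch below assembles one statistical variable's fields, plus alternative sentences, into a single JSON record. The function name, the field names, and the sample sentences are assumptions made for this example (the sentences are hand-written stand-ins for Gemini output); the actual schema used in the integration may differ.

```python
import json


def build_search_document(dcid, name, measured_property, stat_type,
                          population_type, constraints, alternative_sentences):
    """Flatten one variable's metadata into a JSON-serializable record.

    Field names here are hypothetical, chosen to mirror the properties
    listed in the post.
    """
    return {
        "id": dcid,
        "name": name,
        "measuredProperty": measured_property,
        "statType": stat_type,
        "populationType": population_type,
        "constraintProperties": constraints,
        # Extra phrasings give the search engine more text to match against.
        "alternativeSentences": list(alternative_sentences),
    }


doc = build_search_document(
    dcid="Count_Person",
    name="Total Population",
    measured_property="count",
    stat_type="measuredValue",
    population_type="Person",
    constraints={},
    alternative_sentences=[
        "How many people live in a place",
        "Number of residents",
    ],
)

# One JSON object per line is a common format for bulk-loading records.
line = json.dumps(doc, sort_keys=True)
```

Storing the alternative sentences alongside the original fields means a query phrased differently from the variable's formal name can still match the record.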

Figure 2. Improved search results on the statistical variable explorer tool for “total population” after integrating with Vertex AI. Searching now features typo correction too!

On top of the exciting improvements we’ve been able to make to the statistical variable explorer, Vertex AI also provides many opportunities for improvement in our NL search stack. For instance, early-stage evaluations show it performing on par with our in-house NL-powered vector search. This is the search tool that powers the global Data Commons search bar on our landing page. It is currently built with a combination of third-party models and in-house algorithms, owned and regularly maintained by the Data Commons NL team. The future possibilities of integrating Vertex AI further into Data Commons are promising!

Q. What are some of the key learnings you are taking away from this internship, whether project or experience-related? 

This internship provided valuable hands-on experience with Vertex AI search applications, which will be a significant asset to my resume. I was also encouraged to explore AI tools like Gemini CLI, gaining a practical understanding of how they integrate into professional workflows. Above all, the most meaningful aspect of my experience was the opportunity to contribute to an open-source, social-good platform such as Data Commons. Having a direct impact on its functionality was incredibly rewarding, knowing that it will, in turn, help analysts, activists, and policymakers benefit even more from publicly available data.
