Bay Area Residents’ Private COVID Emails Secretly Harvested for AI Training

Thousands of private emails sent by Bay Area residents to their local officials during the COVID-19 pandemic have been swept up in a large-scale data collection effort for AI training. Mountain View-based company GovernmentGPT filed 90 California Public Records Act requests to obtain emails sent to mayors, council members, and city clerks between 2020 and 2023.

The requests targeted cities including San Jose, Santa Clara, Mountain View, and Milpitas. They varied in detail from city to city, with Milpitas receiving a 10-page document outlining the specific data sought. All requests focused exclusively on the COVID-19 pandemic period.

GovernmentGPT plans to use these emails to train an artificial intelligence tool called CivicVoice, which is designed to summarize residents’ opinions and public comments. The company says it wants to enhance public participation and help identify the top issues residents faced during the pandemic.

CEO Raj Abhyanker describes the service as a way to make public engagement easier through AI. His company specializes in “civic AI” applications that aim to advance data-driven governance.

Privacy advocates worry that many residents don’t know their emails to public officials could be used for AI training. While these emails are legally accessible through public records laws, people don’t typically expect their communications to be used this way.

When citizens write to officials, they don’t imagine becoming unwitting contributors to corporate AI datasets.

The requested emails contain discussions about local concerns, pandemic responses, and public health measures. They include personal experiences, complaints, and feedback about city governance during COVID-19. The company didn’t request internal city documents, only communications from residents.

Current privacy laws don’t clearly address the use of public records for AI development. The collection effort has sparked debate about civil liberties and the ethical boundaries of repurposing public communications for commercial AI applications. Ethics expert Brian Green has raised concerns that residents are being exploited when their data is used for purposes they never intended.

The collection represents a significant trove of real-time, first-hand accounts of resident sentiment during a major public health crisis, now being repurposed for proprietary AI development.