Collect, Preserve, Democratize: A Q&A With Democracy’s Library

Every year, the United States federal government spends billions of dollars producing research, reports, statistics, and data on almost every topic imaginable. This information is critical for government agencies, policymakers at all levels, public and private organizations, journalists, and citizens. Access to historical government records is necessary to create a more efficient, effective, and accountable system where all stakeholders work together to address current needs with an eye toward the future.

Unfortunately, many highly valuable assets are not readily available and accessible to those who need them the most. Paywalls and disjointed, non-standardized information keep data siloed and difficult to decipher and use.

FFDW supports programs that lower barriers to entry and access to critical knowledge.

Recently, FFDW spoke with Jamie Joyce, US Project Lead at Democracy's Library, an organization striving to collect and preserve our democracy's data in one place, providing a central repository of verifiable information for all stakeholders. Read on for more information about Democracy's Library and its quest to empower the future of government.

When did Democracy's Library first realize, "Hey, we have a real problem here!"?

Our first work with FFDW and the Filecoin ecosystem focused on a landscape analysis of government documents. We used the Internet Archive's Wayback Machine to deeply analyze the conditions surrounding important datasets and websites, starting with the federal data catalog. This catalog has over 200,000 datasets from many federal agencies, with important statistics on topics like immigration, student loan debt, key economic indicators, and many others. Federal law mandates that the catalog be maintained and updated in real time.

We took a stratified sample of 1,000 datasets from the federal data catalog and ran into a significant issue: 25% of the links returned a 404 error, meaning the data is inaccessible online. The benefit of using the Internet Archive and the Wayback Machine is that backups of these pages usually exist, captured before the link went dead. However, dead links still make it difficult to maintain up-to-date statistics for policy purposes. We realized that there is both a need and an opportunity for the Internet Archive to serve as the infrastructure of access to federal government documents by creating these backups. Everyone wins when we have multiple robust copies of data. We also realized the critical role of the Filecoin network and its distributed protocol in supporting a centralized, queryable repository of information.
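For readers who want to try a similar check on a catalog of their own, here is a minimal sketch of the idea in Python. It is not the code Democracy's Library used: the record fields ("agency", "url"), the function names, and the demo data are all hypothetical, and stratifying by agency with proportional allocation is an assumption. The status lookup is passed in as a function so the sketch runs without network access; a real audit would replace it with an actual HTTP request.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, n, seed=0):
    """Draw roughly n records, giving each stratum a share proportional to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    total = len(records)
    sample = []
    for group in strata.values():
        # Take at least one record per stratum so small agencies are not dropped.
        k = max(1, round(n * len(group) / total))
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample


def dead_link_rate(sample, get_status):
    """Return the fraction of sampled records whose URL yields HTTP 404."""
    if not sample:
        return 0.0
    dead = sum(1 for rec in sample if get_status(rec["url"]) == 404)
    return dead / len(sample)


# Hypothetical demo catalog: two made-up agencies of different sizes.
demo_catalog = (
    [{"agency": "agency-a", "url": f"https://example.gov/a/{i}"} for i in range(80)]
    + [{"agency": "agency-b", "url": f"https://example.gov/b/{i}"} for i in range(20)]
)


def fake_status(url):
    """Stand-in for a real HTTP check: pretend every fourth dataset is gone."""
    return 404 if int(url.rsplit("/", 1)[1]) % 4 == 0 else 200


demo_sample = stratified_sample(demo_catalog, key="agency", n=10)
print(f"sampled {len(demo_sample)} records, "
      f"dead-link rate {dead_link_rate(demo_sample, fake_status):.0%}")
```

Sampling within each stratum keeps large publishers from crowding smaller ones out of the estimate, which is the usual reason to prefer a stratified sample over a simple random one.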

If the landscape analysis was Phase One, what comes next?

Another part of our landscape analysis was figuring out what data exists in the ecosystem of government agencies at the federal, state, and municipal levels. There are estimated to be anywhere from 630 to 800 federal agencies and 100 to 250 state agencies with websites and archives, on top of the numerous municipal government repositories. No internal archive exists to keep all this information in one place and keep everyone apprised of the various places to find information. We realized that government-related websites stretch beyond .gov and .mil, opening numerous other search avenues. Though civic tech stakeholders maintain GitHub repositories of government websites outside .gov and .mil, the amount of information to sift through is immense.

We need to be able to scope out information related to any topic based on an individual's or organization's goals. Take something specific, like nuclear energy policy. There are federal agencies that have relevant data and archives to capture. If the state has a nuclear power plant, there will be different state agencies with important information and municipalities with even more information. In the recent debate in California over the state's last remaining nuclear power plant, we saw websites explicitly designed for the public utilities to interface with citizens in a town hall format. While that data isn't necessarily governmental or quasi-governmental, it's still highly relevant to the overall debate and policymaking process and needs to be captured and archived.

What is the impact if this information isn't available? Why is this so important?

Depending on the information, there is a cultural, historical, and anthropological need to access data to see where we came from and how we arrived at the current moment, to inform which way we will go in the future. When it comes to policy and law, however, we build on things over time. The US government creates and stores backups of information on policy and regulations in libraries around the country for information resilience. We need private, public, and governmental coordination to ensure that resilience is substantial and has a better chance of standing the test of time and resisting various potential catastrophic events. Without that information, we cannot look back on 250 years of building laws and policy precedents that adapt to the country's needs.

It's also crucial to have a record of lessons learned from past decisions. If we go back to the nuclear energy policy example, what if we lost the information on specific power plants underperforming because of continuously malfunctioning machine components or lackadaisical security procedures? Without those archives, we have no record of where shortfalls must be addressed. Conversely, we have no record of power plants with impeccable performance that could serve as a teaching example to improve plant safety and performance. If we lose our historical memory, it inhibits our ability to move forward strategically: 1) we can't continue to build on precedent, and 2) we won't have a record of things we've learned.

Where do we go from here?

This project is not just about creating backups for their own sake. There is another angle to consider as well. We've spent significant time interacting with stakeholders in civic tech, government tech, watchdog organizations, and more to discuss the importance of having everything in an easily accessible, centralized, and queryable location. Going back to my nuclear energy example one more time, if I want to examine the environmental impact of nuclear power plants, I can't find that information in one place. There might be 15 or more agencies and organizations with bits and pieces of the information I need, which presents two problems to me as a stakeholder: 1) I might not know all of the different organizations, and there may not be one index that lists them all alongside their jurisdictions, and 2) even if I find all the archives I need, I don't necessarily have the time to sift through a bunch of disconnected sources.

So, it's critical to have backups of this information and present it in a fashion that enables knowledge transfer and better collective societal management. We also need to consider how new technologies interact with the past. How can we leverage Web3 and decentralization to get non-digitized content online? When you consider Web3 and the new knowledge management infrastructure that the Filecoin ecosystem presents, there is an unprecedented opportunity for linked data. When the Mueller Report on Russian election interference went live online, it was a PDF with virtually no live links. How can such a massively important document with so much information not have links? It's not just about access and new tools and services; when we digitize that data and make it computable, we can radically transform the structure of knowledge. We are no longer bound to the four corners of the page, and it's time to consider how to break data free from its constraints.

–

At FFDW, our priority is preserving humanity's most important information. We are proud to work with organizations like the Internet Archive and their Democracy's Library program, dedicated to protecting critical information and making it accessible to anyone who wants to use it. The mission is only getting started, and we are in it for the long haul. To learn more about Democracy's Library and its quest to make government records, research, and data freely and permanently accessible, explore the Democracy's Library collections on the Internet Archive, featuring more than 700 collections and half a million individual documents from over 50 government organizations.