Modernizing Voter Roll List Management

Mar 22

We often hear election officials lament, and quite rightly, that there is more that they could do to increase voter confidence and do their jobs better, but they don’t have the funding. That’s not surprising to hear, because it’s no secret that U.S. elections are far more complex than elections anywhere else, and less well funded than in other countries with simpler elections.

More recently, one of the biggest challenges faced by U.S. election officials (EOs) is a substantial erosion of public confidence in the verifiability and accuracy of elections. That’s also not surprising, given that EOs currently rely on technology that is not transparent, is aging out, cannot leverage new innovations without starting over, and has serious cybersecurity issues as well.

Nowhere is that more true than in the accuracy of voter lists. Given recent news about states abandoning ERIC without an alternative in sight (as I will explain below), this has suddenly appeared as a serious issue—serious as in a hand-grenade rolling on the floor missing its pin.

The technology for voter databases is opaque, voter databases have been attacked by foreign adversaries, and the technology in popular use today has inherent vulnerabilities. It’s often pointed out that even if the voting process works perfectly to everyone’s satisfaction, you can still have a lousy election if it looks to observers that the voter lists were significantly wrong.

There is something to be concerned about for anyone on the political spectrum, whether it’s the possibility that some voters who should have been able to vote were blocked from doing so, or the possibility that people who shouldn’t be able to cast a ballot were enabled to vote.

Probably ground zero for both of those concerns—and the catalyst for widespread misunderstanding of voter records practices—is the task of voter “list management.”

That’s where I recently noted a complaint from one state election official who seemed caught between two opposites.

On the one hand, he appreciated the help provided by the ERIC system in matching his state’s voter lists to a variety of external data sources; yet at the same time he was no longer confident in the process of sharing data with ERIC, and was stopping its use.
On the other hand, he wanted to do similar list matching within his own organization, but was blocked there as well. Some of the data is expensive, such as citizen death records from the Social Security Administration, or change of address records from the U.S. Postal Service, and more so from private data brokers. (Let’s not even start down the path of why the U.S. Federal Government charges each state for data the states need to run elections.)

If you’d like to better understand ERIC, look here for a good explanation, but to finish this point, here’s what the election official told the Washington Post:

“I can buy the [change-of-address and Social Security records]. But what I can’t do is spend millions of dollars to create matching software to match the information in those files to my voter rolls.”

It’s actually a bit worse regarding the custom software cost issue: in addition to the external data sources, state election officials must also match their voter lists to intrastate sources, such as their own department of motor vehicle records (another source for change of address) and their department of corrections (to suspend the ability to vote for those serving felony jail time).

That election official is both right and wrong in his complaint.

He’s right in pointing out the typically high cost of developing custom software for his state (or any state) to match his state’s voter lists to all the other people-lists.
He’s right under the assumption that such development would be from scratch, and would follow the typical state’s technology bidding, procurement, acquisition, and contracting processes, which (though providing other important benefits of transparency and competitive bidding) pretty much guarantee projects are (too) large and (overly) expensive.

But he’s potentially wrong if there is a scenario where that assumption is false.

Before describing that scenario, let’s quickly address something that some readers might be wondering.

How hard is it really to do a list match?
Why should it be so expensive?
One might be imagining a spreadsheet with rows of people and columns of names and addresses and other data, and a similar 2nd spreadsheet with rows of people and columns of names and addresses and some other data different from the 1st spreadsheet.
How hard is it to merge them and make a list of people who appear in both lists, but not identically?
For example, suppose same name and birthday, but different address—how hard can that be?

The answer is: not very difficult.
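To make the spreadsheet picture concrete, here is a minimal sketch in Python of the simple case described above: same name and birthday, different address. The field names (name, dob, address), the sample records, and the normalization rule are all assumptions made for illustration; no real state or agency format is implied.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())

def moved_candidates(voters, other):
    """Pair records with the same name and birth date but a different address."""
    # Index the second list by (name, dob) so each voter needs one lookup.
    index = defaultdict(list)
    for rec in other:
        index[(normalize(rec["name"]), rec["dob"])].append(rec)

    pairs = []
    for voter in voters:
        key = (normalize(voter["name"]), voter["dob"])
        for rec in index.get(key, []):
            if normalize(rec["address"]) != normalize(voter["address"]):
                pairs.append((voter, rec))
    return pairs

# Made-up sample rows standing in for the two spreadsheets.
voters = [
    {"name": "Pat Example", "dob": "1990-12-10", "address": "12 Oak St"},
    {"name": "Sam Sample", "dob": "1985-06-23", "address": "4 Elm Ave"},
]
dmv = [
    {"name": "pat  example", "dob": "1990-12-10", "address": "99 Pine Rd"},
    {"name": "Sam Sample", "dob": "1985-06-23", "address": "4 elm ave"},
]

candidates = moved_candidates(voters, dmv)
for voter, rec in candidates:
    print(voter["name"], "may have moved to", rec["address"])
```

Only the first voter is flagged here, because the second voter’s address differs only in capitalization. A pair flagged this way is a candidate for review, not proof that the two records are the same person.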

In fact, in 2005, before ERIC (which started in 2012), a couple dozen states began a bi-annual data sharing event (an interstate voter records exchange) in which one state, Kansas, collected voter records data from the other states and did a pair-wise comparison of each state’s records to every other state’s records, providing back data on matches. What each state did with that match data was up to them. On the tech side, the matching process used software developed by a couple of people that ran on an ordinary PC workstation of early 21st century vintage, mostly using typical off-the-shelf software from Microsoft. The collaboration and comparison software became known as Interstate Crosscheck.
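The pair-wise comparison idea can be sketched in a few lines of Python. To be clear, this is not the Crosscheck software, and the fields used as the match key (first name, last name, date of birth) are an assumption chosen for illustration; the state codes and records are made up.

```python
from itertools import combinations

def match_key(rec):
    """Assumed match key: case-insensitive last name, first name, and birth date."""
    return (rec["last"].lower(), rec["first"].lower(), rec["dob"])

def crosscheck(rolls):
    """Compare every state's records to every other state's records.

    rolls maps a state code to a list of records; the result maps each
    (state_a, state_b) pair to the sorted list of keys found in both.
    """
    keyed = {state: {match_key(r) for r in recs} for state, recs in rolls.items()}
    results = {}
    for a, b in combinations(sorted(keyed), 2):
        results[(a, b)] = sorted(keyed[a] & keyed[b])
    return results

# Made-up records for three participating states.
rolls = {
    "KS": [
        {"first": "Pat", "last": "Doe", "dob": "1970-01-01"},
        {"first": "Sam", "last": "Roe", "dob": "1982-05-17"},
    ],
    "MO": [{"first": "PAT", "last": "doe", "dob": "1970-01-01"}],
    "NE": [{"first": "Lee", "last": "Poe", "dob": "1991-09-09"}],
}

overlaps = crosscheck(rolls)
for pair, keys in overlaps.items():
    print(pair, keys)
```

With n participating states there are n(n-1)/2 pairs to compare, so the work grows quickly with participation but remains well within reach of ordinary hardware.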

It wasn’t perfect for several reasons, but mostly it was limited to comparing voter records to voter records, rather than comparing voter records to a variety of other kinds of data that isn’t shaped exactly the same. For a solution that meets all of a state’s list matching needs, there are some parts that are not straightforward. One example is defining specific rules that combine potentially state-specific regulations on what constitutes a full or partial match with some context for the nature of the data being matched against. It’s not overly difficult, but it does mean that there is no “one-size-fits-all” voter list matching system that works for every state and every external data source “out-of-the-box.” Therein lies the assumption that the solution has to be custom software for each state. Maybe so. Maybe not so.

A Public-Technology Approach

Now, consider an alternative scenario, where there is public non-proprietary (i.e., open source) software for voter list-matching.

The software exists, so it does not have to be developed from scratch on behalf of a requesting jurisdiction.
The software is public technology, so it is not owned by a commercial entity, thus an adopting state, county or other jurisdiction has no licensing costs to acquire and use the software—although it will have costs for operating and using it, and could have costs for local adaptation and integration (as we say, it is free as in “free puppy” not “free food”).
For any given state, county or other adopting jurisdiction, the software likely won’t work exactly to the jurisdiction’s requirements initially upon download, but it can be “parameterized” to meet the requirements: that is, not making changes to the software itself, but changing the parameters that inform the software of the rules for, say, a particular kind of partial match. Other kinds of adaptation may also be necessary, such as ingesting state-specific data formats, or modifying the kinds of reports and output data to fulfill a state-specific requirement.
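As a rough illustration of what “parameterized” could mean in practice, the Python sketch below keeps the matching code fixed and puts the jurisdiction’s rules in a small table. The rule labels, the field names (including ssn4 for the last four digits of a Social Security number), and the rules themselves are hypothetical; they do not describe any state’s actual regulations or any existing product’s configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchRule:
    label: str          # what to call a match that satisfies this rule
    must_agree: tuple   # fields that must be present and equal in both records

# Hypothetical parameters for one jurisdiction; another could ship a different table.
STATE_RULES = [
    MatchRule("full", ("last", "first", "dob", "ssn4")),
    MatchRule("partial", ("last", "dob", "ssn4")),
    MatchRule("partial", ("last", "first", "dob")),
]

def agree(a, b, field):
    """True only when both records have the field and the values match."""
    x, y = a.get(field), b.get(field)
    return bool(x) and bool(y) and str(x).strip().lower() == str(y).strip().lower()

def classify(a, b, rules):
    """Return the label of the first rule satisfied, or "none"."""
    for rule in rules:
        if all(agree(a, b, f) for f in rule.must_agree):
            return rule.label
    return "none"

voter = {"first": "Pat", "last": "Doe", "dob": "1970-01-01", "ssn4": "1234"}
renamed = {"first": "Patricia", "last": "Doe", "dob": "1970-01-01", "ssn4": "1234"}
no_ssn = {"first": "Pat", "last": "Doe", "dob": "1970-01-01"}
stranger = {"first": "Lee", "last": "Poe", "dob": "1991-09-09", "ssn4": "9999"}

for other in (voter, renamed, no_ssn, stranger):
    print(classify(voter, other, STATE_RULES))
```

Changing what counts as a full or partial match then means editing the rule table, not the classify function, which is the kind of local adaptation that does not require rewriting the software.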

Yet, a state’s election operations team may not have the capability or desire to make these adaptations in-house, or to operate the software in-house. And most states rely on an IT services provider of some kind to support the operation of voter records management systems. How does that factor into this alternative story? Well, pretty straightforward: that’s where the state’s IT services procurement comes into play, to retain those services, but without the pricey up-front custom software development costs, and without paying a license fee for the right to use the software, much less the ability to adapt it.

How real is this scenario?

It turns out, at the OSET Institute’s TrustTheVote® Project, we’ve been working on voter registration technology for well over a decade, and that includes voter list-matching functionality, involving serious innovation (such as fuzzy-logic or “approximate string matching,” logistic regression and probabilistic matching, and more). We even went so far as to work on a standalone solution, which could be adopted alongside a state’s existing voter records management system, rather than requiring extensive modifications. Part of that was work we participated in, and even led portions of, for the development of Federal standards addressing common data formats for voter records. [Ed. Note: see 4th author listed 🤓]
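For readers unfamiliar with those terms, here is a small Python sketch of how approximate string matching and a logistic score can be combined. It is an illustration only, not our implementation: the similarity measure comes from Python’s standard difflib module, and the weights and bias are hypothetical numbers picked by hand, where a real system would fit them to labeled data.

```python
import math
from difflib import SequenceMatcher

def similarity(a, b):
    """Approximate string similarity between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Hypothetical hand-picked weights; a real system would fit these to labeled pairs.
WEIGHTS = {"name": 6.0, "dob": 4.0, "address": 2.0}
BIAS = -8.0

def match_probability(a, b):
    """Logistic score: squash a weighted sum of field evidence into a 0..1 value."""
    score = BIAS
    score += WEIGHTS["name"] * similarity(a["name"], b["name"])
    score += WEIGHTS["dob"] * (1.0 if a["dob"] == b["dob"] else 0.0)
    score += WEIGHTS["address"] * similarity(a["address"], b["address"])
    return 1.0 / (1.0 + math.exp(-score))

# Made-up records: a likely typo variant and an unrelated person.
voter = {"name": "Jonathan Smith", "dob": "1980-02-03", "address": "10 Main St"}
typo = {"name": "Jonathon Smith", "dob": "1980-02-03", "address": "10 Main Street"}
other = {"name": "Maria Garcia", "dob": "1975-11-30", "address": "55 Lake Dr"}

p_typo = match_probability(voter, typo)
p_other = match_probability(voter, other)
print(round(p_typo, 3), round(p_other, 3))
```

The point of a score rather than a yes/no answer is that a jurisdiction can set its own thresholds: high scores go to routine processing, middling scores go to a person for review, and low scores are ignored.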

In prior years, there was not an urgent demand for such advanced technology. Many states were satisfied with in-house methods of comparing their state’s voter records with other in-state voter records, with occasional forays into using external data. Some relied on the collaborative approach I mentioned earlier called Interstate Crosscheck. Later, “ERIC,” an alternative system (and, from a technical standpoint, better software code-wise), offered a central clearinghouse approach to regular matching with a valuable set of external data. Crosscheck became less popular and suffered software aging, eventually being shut down in 2019 for security purposes. Now, remarkably and more recently, many states are souring on ERIC, for reasons you can read about elsewhere.

This is potentially a significant problem: for states that have (for whatever reason, I won’t go down that rabbit-hole) cancelled their ERIC subscription, what do they do now? Seriously: the Crosscheck service was shut down years ago, and now some states have abandoned ERIC. At last check (someone update us here if we’ve missed it) there is no alternative to Crosscheck. So, how do they intend to ensure the quality and integrity of their voter rolls? You may be concerned about registrants being errantly dropped, or about people remaining on the rolls who should be removed, or worse, about hard-to-detect tampering from an external malfeasant source. Whichever it is, without any data matching service to provide the necessary hygiene, you are exposed to whatever may happen. This seems like an avoidable reversion after years of improving election security.

Well, all may not be lost. Today, the value of highly flexible list matching has been demonstrated. And based on the lack of any system to perform this level of hygiene, some states need a way to acquire it on a “do it yourself” basis—without actually having to do it themselves (or spend millions to do so). Moreover, the politics (which we steadfastly avoid) have evolved as well. Today, much larger numbers of people are far more concerned about:

inaccuracy of voter lists,
insider malicious activity to alter them in an unauthorized manner, and/or
external cyber-attacks including ransomware,

…all fraught with possibilities for voter fraud and for disenfranchisement.

Given this changed landscape, it is clear there is far more impetus for better voter list management practices that:

cast a broader net of information,
are more flexible to avoid false positives,
are tailored for humans-in-the-loop, and
are able to support the kind of transparency that only a few states provide,

…all in order to be transparent about changes in voter lists.

These are all the requirements that we started with early on (circa 2012, about the same time the ERIC project was in gestation), and it seems that the world is finally catching up with our objectives of a decade ago. So, it’s time to dust off our prior work on public technology for voter list matching: another of our many software R&D foundry projects, this one we call PairWise™.

PairWise is a fine complement to Vanadium, another system we’ve built and are preparing for a pilot: a security prophylactic for voter rolls built in collaboration with AWS. Our Verity matching engine is designed to take the very best of preceding technologies, bring them up-to-date with modern algorithms, data models, data structures, and performance engineering, and make the technology available for states (or ERIC) to update and innovate this important data hygiene step. It will take some more funding to finish it, but a minuscule fraction of the millions of dollars software houses would charge to scratch-build something under contract to a single state. (If interested in our technology, get in touch to seriously geek-out.)

If you are, or know of, an election official threatening to turn to substance-abuse over the lack of a clear path to higher integrity and more reliable voter-roll verification and list matching, we want to hear from you. Our open-source PairWise technology may be just the controlled substance (er, “solution”) necessary. And the price of the technology itself is extremely attractive, as in public technology.

open-source, voter rolls, election security

John Sebes

Co-Founder and Chief Technology Officer
