November 5, 2019


Senior Research Scholar John Battelle (right) was joined by researchers for a presentation on data mapping.
Oil is no longer the number one commodity—data is.


Should we be concerned, then, about today’s data oligarchs—Facebook, Amazon, Google and Apple—amassing insurmountable power? Should we apply antitrust laws that, in earlier ages, led to the breakup of monopolies like Standard Oil? In his new research project, Mapping Data Flows, serial entrepreneur and SIPA senior research scholar John Battelle considers what the future looks like for these giants and just how little control users have over their data.

Battelle shared the work of a team of Columbia researchers at an event cosponsored by SIPA's Tech and Policy Initiative, Entrepreneurship and Public Policy Initiative, TMaC and Management specializations, along with the University’s Brown Institute for Media Innovation. The researchers (Zoe Martin MPA ’20, Natasha Bhuta MPA-DP ’20, and 2019 Journalism alumni Matthew Albasi and Veronica Penney) developed an interactive visualization that helps users understand how large technology companies collect, use, and share user information across the internet.

The central question points to a habit most of us share: Does anyone really read privacy policies, or do we just scroll down the page and blindly click “accept” to move on?

“In a given year, the typical American, should they actually care to read and understand the privacy policy they mostly skip, would spend 76 days of that year reading,” Battelle said, adding that such documents can be confusing and are often too complex for the average user to understand.

“They are incredibly vague,” Battelle said. “In fact, when Natasha [Bhuta] ran some of these terms of services through some advanced processing software for terms that use the conditional tense, some of them actually hit 60 percent of words that could be interpreted conditionally.”

The project aims to raise awareness and prompt a conversation about public policy options for the future.

Asked Battelle: “Could we create a way of understanding a governance structure that wasn’t just tens of thousands of words of text?”

Martin’s case study, Say No Evil, but Keep Your Options Open, contrasted Apple’s terms with the company’s messaging on protecting user data.

Apple “may collect, use, transfer, and disclose nonpersonal data for any purpose,” Martin said. “That’s basically everything you do on or with your phone, just with your name and phone number cut out. They can basically use it for whatever they want.

“They can use personal data to create, develop, operate, and deliver and improve their products, services, content, and advertising, which is also pretty much everything that they can do with your data,” Martin added.

Penney used the data visualization tool to show what effect maximizing your Facebook privacy settings has. To the crowd’s amusement, there was no noticeable difference.

“They are general protections around [personally identifiable information] and Facebook says they won't share personally identifiable information with any outside company for any reason, but it still leaves a lot of data up for grabs,” Penney said.

Your personal news is now an announcement on Facebook to companies, whether you post it or not.

“It could tell a company like Target whether or not you are pregnant,” she continued. “That non-personal identifiable information, at least with Facebook, you can’t control it. It can get through no matter what. They pretty much know everything about you.”

Despite this grim reality, Battelle charged SIPA students with finding the policy solution that balances regulation and simultaneously empowers users. He even offered a hint.

“An elegant one-line piece of regulation with about 5,000 supporting pages but the one line is, ‘machine readable data portability,’ that's it,” he said.

Nestled in the demonstration of the website was the irony of the day. Juan Francisco Saldarriaga, of the Brown Institute for Media Innovation, walked the room through the data visualization tool and acknowledged that the project relies on Google Analytics to collect data on its site users.

“We are helping Google gather data about you,” he said. “We use it so we can understand. We promise not to do anything nefarious with that.

“We can’t speak on behalf of Google with your data. There is a little disclaimer there.”

— Daniel E. White MPA ’20

Mapping Data: How the Largest Tech Firms Use Your Data