Big Data holds great promise for helping companies mine the vast quantities of data they collect for new insights into risk, fraud, and especially the purchasing habits of customers. But along with that promise come the challenges of using customer data in a responsible way.

“Analytics holds dramatic potential to fuel innovation and economic growth. To realize that potential it will be necessary to address the data protection and privacy issues that the use of analytics raises,” says Paula Bruening, vice president of global public policy for the Centre for Information Policy Leadership.

For example, companies now have the power to assemble various pieces of information into a more complete profile of an individual. On their own, those bits of data may not seem very sensitive. But when powerful analytics tools can combine them to discern something more delicate, such as a medical condition or a credit history, new guidelines are needed to address the resulting privacy and data security risks.
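
To make the aggregation risk concrete, the minimal Python sketch below shows a simple "linking attack." It assumes two hypothetical datasets: a marketing list that pairs names with ZIP code, birth date, and sex, and a clinical extract that drops names but keeps those same three fields. The record types, field names, and sample rows are invented for illustration; the point is only that joining on shared attributes can tie an apparently anonymous diagnosis back to a named person.

```python
from dataclasses import dataclass


# Hypothetical record types: neither dataset pairs a name with a diagnosis on its own.
@dataclass(frozen=True)
class MarketingRecord:
    name: str
    zip_code: str
    birth_date: str
    sex: str


@dataclass(frozen=True)
class ClinicalRecord:
    zip_code: str
    birth_date: str
    sex: str
    diagnosis: str


def link_records(marketing, clinical):
    """Join the two datasets on shared quasi-identifiers (ZIP code, birth date, sex)."""
    index = {}
    for rec in clinical:
        key = (rec.zip_code, rec.birth_date, rec.sex)
        index.setdefault(key, []).append(rec.diagnosis)
    linked = {}
    for person in marketing:
        key = (person.zip_code, person.birth_date, person.sex)
        matches = index.get(key, [])
        # Exactly one match means the "anonymous" clinical row is now tied to a name.
        if len(matches) == 1:
            linked[person.name] = matches[0]
    return linked


marketing = [
    MarketingRecord("Alice Example", "02139", "1965-07-31", "F"),
    MarketingRecord("Bob Example", "94105", "1980-01-15", "M"),
]
clinical = [
    ClinicalRecord("02139", "1965-07-31", "F", "diabetes"),
    ClinicalRecord("10001", "1975-03-02", "M", "asthma"),
]
print(link_records(marketing, clinical))  # {'Alice Example': 'diabetes'}
```

Neither dataset reveals a named person's diagnosis by itself; the sensitive fact emerges only from the join.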

To this end, the Centre, founded in 2001 by law firm Hunton & Williams, announced in September a new pan-industry initiative to address the risks raised by analytics in the age of Big Data by developing voluntary guidelines for the responsible use of analytics by companies.

The initiative, led by Bruening, attempts to address the question: “How do we use information practices in a way that really reflects the world of Big Data, but at the same time instilling discipline in the way we use analytics?”

Concern about privacy and the adoption of fair information practices are not new. “We've relied on them for a long time, and they've served us quite well in the world of data protection,” says Bruening. “But the nature of Big Data and the way analytics processes work can really challenge the way we apply fair information practices.”

One unique challenge, for example, is the issue of consent. “There is so much data being not just collected, but also provided through social networks and derived through analytics, that data is really ubiquitous,” says Bruening.

The result is that providing comprehensive notice and obtaining consent in every instance becomes extremely difficult. “So what does that mean when you're using Big Data for analytics?” says Bruening. “If consent is so difficult, how do you make sure that you're still protecting the individuals, even when you're using the information pertaining to them for analytics?”

Identifying the Problems

The Cloud Security Alliance (CSA) Big Data Working Group has identified 10 new technical and organizational security and privacy challenges that Big Data poses, which it detailed in a paper issued last month. Similar to the Centre's initiative, the Big Data Working Group, made up of 30 member companies, also seeks to develop uniform best practices and guidelines on Big Data security and privacy.

The concerns identified in the paper fall into four main categories:

Technical challenges. How do you ensure that the applications that work on Big Data are themselves secure and cannot be hacked? How can you protect the Big Data itself, for example by applying encryption? How can you ensure that the Big Data platform you use (the cloud, for example) is secure?

Legal and privacy challenges. How can we provide end users the tools to be masters of their own data, enabling them to decide who has access to what data and for what purpose? How can we ensure and manage data governance and data stewardship of Big Data? How do we monitor who has access to which Big Data, when, and for what purpose?
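
The last of these questions, tracking who accessed which data, when, and for what purpose, can be illustrated with a small purpose-based access check. The Python sketch below is hypothetical: `ConsentRegistry`, its methods, and the purpose labels are invented names, and a production system would likely keep the audit trail in durable, tamper-evident storage rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegistry:
    """Hypothetical registry of the purposes each data subject has agreed to."""
    allowed: dict = field(default_factory=dict)    # subject_id -> set of permitted purposes
    audit_log: list = field(default_factory=list)  # append-only record of every request

    def grant(self, subject_id: str, purpose: str) -> None:
        self.allowed.setdefault(subject_id, set()).add(purpose)

    def revoke(self, subject_id: str, purpose: str) -> None:
        self.allowed.get(subject_id, set()).discard(purpose)

    def request_access(self, requester: str, subject_id: str, purpose: str) -> bool:
        """Allow access only for a consented purpose, and log who asked, when, and why."""
        granted = purpose in self.allowed.get(subject_id, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "requester": requester,
            "subject": subject_id,
            "purpose": purpose,
            "granted": granted,
        })
        return granted


registry = ConsentRegistry()
registry.grant("customer-42", "fraud-detection")
print(registry.request_access("risk-team", "customer-42", "fraud-detection"))  # True
print(registry.request_access("marketing", "customer-42", "ad-targeting"))     # False
```

Logging denied requests as well as granted ones is what would let an organization later answer who tried to use which data, and why.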

Data analytics challenges. “The challenge is whether we can do a better job at analyzing [security] events and come up with better information, which, hopefully, leads to better insights, value and decision making,” says Wilco Van Ginkel, senior security strategist at Verizon, one of the Working Group's member companies.

Consider real-time security and compliance monitoring as an example. Companies in highly regulated industries, such as financial services and healthcare, that maintain sensitive personally identifiable information cannot afford to wait three months to understand the data and make decisions based on it. Detecting the retrieval of sensitive information in real time, whether intentional or not, allows companies to repair the damage promptly and stop future misuse.
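
For illustration, a real-time monitor of this kind can be reduced to a sliding-window check over a stream of access events. The Python sketch below assumes a hypothetical `AccessEvent` record that already flags whether a retrieved record is sensitive, and an arbitrary policy of alerting when one user retrieves more than three sensitive records within 60 seconds; real thresholds and event sources would depend on the organization.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    timestamp: float  # seconds since epoch
    user: str
    record_id: str
    sensitive: bool   # hypothetical flag, e.g. the record contains personal data


class SensitiveAccessMonitor:
    """Flags users who retrieve too many sensitive records within a sliding time window."""

    def __init__(self, window_seconds: float = 60.0, max_sensitive: int = 3):
        self.window_seconds = window_seconds
        self.max_sensitive = max_sensitive
        self._recent = defaultdict(deque)  # user -> timestamps of sensitive retrievals

    def observe(self, event: AccessEvent) -> bool:
        """Process one event as it arrives; return True if it should raise an alert."""
        if not event.sensitive:
            return False
        window = self._recent[event.user]
        window.append(event.timestamp)
        # Drop retrievals that have fallen out of the sliding window.
        while window and event.timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_sensitive


monitor = SensitiveAccessMonitor(window_seconds=60.0, max_sensitive=3)
events = [AccessEvent(t, "analyst1", f"rec{t}", True) for t in (0, 10, 20, 30)]
alerts = [monitor.observe(e) for e in events]
print(alerts)  # [False, False, False, True]
```

Because each event is handled as it arrives and only recent timestamps are kept per user, the check can run continuously rather than waiting for a periodic batch review.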

Ethical challenges. Just because Big Data is available for analytics doesn't mean every analysis should be done. “The silver lining between what we can, or should do, is becoming more eminent and important than ever, simply because gathering and processing of data is so easy nowadays,” says Van Ginkel.

Another ethical challenge is building a common definition around privacy. Some might argue that the definitions around privacy are too stringent to allow any form of analytics. “We have to have a balance between privacy and utility,” says Arnab Roy, research staff member of Fujitsu Laboratories of America, which co-launched the working group with CSA.

“The definitions have to be fine-tuned so they're not so restrictive as to disallow any kind of meaningful analytics,” Roy adds. “At the same time they should have a strong form of privacy preservation in the sense that the owners of the data have reasonable guarantee of anonymity.”
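
One widely studied way to give data owners a measurable guarantee of anonymity is k-anonymity: every combination of identifying attributes in a released dataset must be shared by at least k records. The Python sketch below is not a definition the Working Group has endorsed; it simply illustrates the privacy/utility balance Roy describes. The sample records are hypothetical, and the generalization rule (a three-digit ZIP prefix and a ten-year age band) is an arbitrary choice.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: keep a 3-digit ZIP prefix and a 10-year age band."""
    zip_prefix = record["zip"][:3] + "**"
    decade = (record["age"] // 10) * 10
    return (zip_prefix, f"{decade}-{decade + 9}")


def raw_key(record):
    """Use the quasi-identifiers exactly as collected, with no generalization."""
    return (record["zip"], record["age"])


def is_k_anonymous(records, k, key=generalize):
    """True if every combination of quasi-identifiers is shared by at least k records."""
    counts = Counter(key(r) for r in records)
    return all(c >= k for c in counts.values())


patients = [
    {"zip": "02139", "age": 34, "diagnosis": "flu"},
    {"zip": "02141", "age": 37, "diagnosis": "asthma"},
    {"zip": "02142", "age": 31, "diagnosis": "diabetes"},
    {"zip": "94105", "age": 52, "diagnosis": "flu"},
    {"zip": "94107", "age": 58, "diagnosis": "migraine"},
]
print(is_k_anonymous(patients, k=2, key=raw_key))  # False: every raw row is unique
print(is_k_anonymous(patients, k=2))               # True after generalization
```

Coarsening ZIP codes and ages makes the data safer to release but less precise to analyze. k-anonymity also has known limits (if everyone in a group shares the same diagnosis, that diagnosis is still revealed), which is one reason stronger definitions such as differential privacy are also studied.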

Formulating Solutions

Having identified at least some of the challenges posed by Big Data, companies are now working toward solutions. At the top of the list is assigning proper ownership of the data, along with the responsibilities that come with it.

A common mistake companies make is that “they don't involve the right data and business owners to define the business case,” says Van Ginkel. Merely implementing Big Data technology doesn't cut it.

DATA SECURITY CHALLENGES

Below are the top ten Big Data-specific security and privacy challenges identified by the Cloud Security Alliance.

1. Secure computations in distributed programming frameworks
2. Security best practices for non-relational data stores
3. Secure data storage and transaction logs
4. End-point input validation/filtering
5. Real-time security/compliance monitoring
6. Scalable and composable privacy-preserving data mining and analytics
7. Cryptographically enforced access control and secure communication
8. Granular access control
9. Granular audits
10. Data provenance

Source: Cloud Security Alliance.

“You need business owners, and especially domain experts, to think about the potential value out of all that data,” Van Ginkel adds. “That being said, it is difficult to define a business case if you really don't know what big data means to you.”

The issue of accountability has become “a very important piece of the conversation about data protection, particularly in the last year,” says Bruening.

One way companies can overcome these security and privacy challenges is to establish “process, with an emphasis on technology strategy,” says the chief privacy officer of a large software company, who asked not to be named. In particular, companies need standards for how data is aggregated and anonymized. “Big Data guidance starts at chief architect level and cascades throughout the technology leadership,” she says.
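
Aggregation standards commonly include a minimum cell size, so that published counts never describe groups small enough to single out individuals. The Python sketch below shows one illustrative form of that idea, small-cell suppression; the function name, the `region` field, and the default threshold of 5 are assumptions for illustration, not a standard the chief privacy officer described.

```python
from collections import Counter


def aggregate_with_suppression(rows, group_by, min_cell_size=5):
    """Count rows per group, withholding (None) any count below min_cell_size."""
    counts = Counter(row[group_by] for row in rows)
    # Groups that are too small could identify individuals, so their counts are suppressed.
    return {group: (n if n >= min_cell_size else None) for group, n in counts.items()}


# Hypothetical purchase records: six from one region, two from another.
purchases = [{"region": "north"}] * 6 + [{"region": "south"}] * 2
print(aggregate_with_suppression(purchases, "region"))  # {'north': 6, 'south': None}
```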

For now, companies will continue to develop solutions. The Big Data Working Group, for example, plans to issue future reports that focus on actual standards for Big Data security and privacy, and to establish test beds that help strengthen Big Data cloud platforms.

The Centre, similarly, continues to make progress of its own. “We are going to continue our conversation into 2013,” says Bruening. “We're working on a report that raises these issues, talks about possible frameworks to help us move forward, and figure out what kind of processes we need to have to develop some kind of useful guidance for companies.”

Because Big Data is still relatively new, Van Ginkel advises companies to start small. “Get a better understanding of the Big Data technologies, play with a sample set of data, and take it from there.”