Is it possible to connect a breach to specific harm?

The fallout from the Equifax case has already begun. Those of you who may be affected by this breach should read this article to guard against some of the scams that are already being thrown at data subjects:

Beware – the Equifax Scams Are Coming

For those not affected, the article is worth reading to get a small sample of the variety of attacks that follow leaked data.

In other news, we find that a lawsuit against the U.S. government over a leak of some 22 million employees’ records has been dismissed (link) on the grounds that the aggrieved data subjects (here represented by their labor unions) have not proven that they were harmed by the breach.

In an ordinary case of injury or damage, proof of cause and effect is a standard requirement. But how could the plaintiffs possibly prove that the damaging effects were caused by this particular breach? After all, the government could point out that a given harm might equally have been caused by the Equifax breach, or by some other. In other words, the traditional rules of evidence and inference are of little use in the age of internet crime.

Stolen data carries no trace of its source. When you are attacked, you have no idea where the attacker obtained his data; even the attacker himself may not know the seller’s identity. The data in question might even have been legally obtained; many companies (such as Equifax) are in the business of collecting and selling data.

Unlike money, data does not need to be laundered; it’s already untraceable. Unlike money, it can be copied and distributed without limit. Exposing one’s personal data on the internet is like exposing one’s body to radiation: the effects are cumulative, and for life.

At the moment, the technical, legal, political, and other capabilities needed to stop this trend, or even to slow it appreciably, are not in place. Our technology is running ahead of our ability to deal with its consequences.



Equifax breach, part 2: is this our future?

The Equifax data breach continues to reverberate in the media, raising various issues that pertain to data security and privacy.

Asymmetry between errors and consequences

These issues present an asymmetry between the degree of negligence on the part of the data owner and the extent of the damage, which risk theorist Nassim Taleb refers to as ‘convexity’ (link). In options theory, a convex payoff means that, in exchange for a small, defined loss, you obtain the possibility of an unbounded gain. The usual example is an exchange-traded option to buy or sell a financial asset. (Many people also think of lotteries in this context, but the analogy doesn’t apply, since lotteries are artificial, arranged as games of chance with known probabilities and a maximum payout.) Continue reading “Equifax breach, part 2: is this our future?”
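As a rough illustration of the convex payoff described above, here is a minimal Python sketch of the net result of buying a call option. The function name, the strike, and the premium are hypothetical values chosen for the example, and fees and discounting are ignored.

```python
def long_call_payoff(spot_at_expiry, strike, premium):
    """Net payoff at expiry of buying one call option.

    The buyer pays `premium` up front and gains only if the asset
    ends above `strike`. Simplified: no fees, no discounting.
    """
    return max(spot_at_expiry - strike, 0.0) - premium


# Hypothetical numbers, for illustration only.
STRIKE = 100.0
PREMIUM = 5.0

for spot in (50.0, 100.0, 150.0, 300.0):
    print(spot, long_call_payoff(spot, STRIKE, PREMIUM))
```

However far the price falls, the loss never exceeds the premium; as the price rises, the gain keeps growing with it. That bounded downside paired with an open-ended upside is what makes the payoff convex.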

U.S. credit-rating agency suffers mega-breach

One of the largest credit-rating bureaus in the United States suffered a data breach in May 2017 (link). This breach, not discovered until the following July, was made public only in September. According to Equifax, the exposed data includes names, birth dates, credit-card numbers, and Social-Security numbers (an important ID number for U.S. citizens and residents), among other things (link).

About half of the adults in the U.S. were exposed (link), along with some 44 million UK consumers (link). Although most of the victims are U.S. residents who will not be protected by the GDPR, there are probably a large number of affected persons across the EU who are U.S. or U.K. citizens, and to whom GDPR protections will apply as of May 2018. For someone trying to gauge the impact of the GDPR on data controllers and processors (as I assume you are if you are reading this), the Equifax case poses a number of questions. Continue reading “U.S. credit-rating agency suffers mega-breach”

Considering an end-to-end GDPR solution: Oracle

In this post I will discuss an Oracle presentation on how its product line provides the technical means for GDPR compliance (link). Although I am impressed with the presentation, my main reason for introducing it here is to be able to refer to it when discussing different stages of GDPR measures. Simply reading through this document will give you an idea of the range of technical measures necessary for good compliance at the database level.

In particular, I like the 3-page Appendix at the end of the document, which lists various GDPR articles and the Oracle feature that helps you to comply with each one. If you’re considering a packaged solution, your vendor should be able to present a similar mapping of GDPR requirements to product features. Continue reading “Considering an end-to-end GDPR solution: Oracle”

What parts of the GDPR are most relevant to you?

One way to become familiar with the legislation is to read it from beginning to end. Consisting of 88 pages of PDF and over 55,000 words, the GDPR is not a fast read. Nor, having read it, are you likely to hold it in your head. What if you could skip to the parts that are most interesting for you? This post suggests a simple approach to doing just that. Continue reading “What parts of the GDPR are most relevant to you?”

Whither Agile in the age of GDPR?

“In theory there’s no difference between theory and practice; in practice, there is” – Yogi Berra

Agile and its manifesto – In theory

If Berra’s maxim applies to programming, it presents a problem for many modern development operations. In my experience, most companies label their development method as ‘Agile’. Adherence to this method usually involves short daily meetings and some software to make sure that small, discrete tasks and problems are tracked. Here is the Agile Manifesto (link):

Continue reading “Whither Agile in the age of GDPR?”

Sensitive data combinations

This post is my first attempt to tackle the thorny issue of data which is not core personally-identifiable information (PII) but which, in some combinations, is enough to identify an individual. For the purpose of this discussion, I’ll call this type of data combination-PII (or combo-PII), and such a combination in a specific search a ‘profile’.

Combo-PII is reference data that describes living persons

This type of data is usually called ‘reference’ data by database specialists. This is the background data that structures our picture of a person using categories, such as the city and country we live in, our age range, consumer choices (e.g., electricity provider), and similar data. Each of these values, taken by itself, is not enough to identify a person. Many such values taken together can, in some cases, either identify the data subject with certainty, or narrow the number of possibilities enough for the subject to be guessed, or be combined with other data to produce a match. Continue reading “Sensitive data combinations”
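To make the narrowing effect concrete, here is a minimal Python sketch that counts how many records share each profile as more reference fields are combined. The records, field names, and helper functions are all hypothetical and invented for illustration; they are not taken from any real dataset or tool.

```python
from collections import Counter

# Hypothetical reference data: no single field is core PII on its own.
records = [
    {"city": "Northtown", "age_range": "30-39", "provider": "Provider A"},
    {"city": "Northtown", "age_range": "30-39", "provider": "Provider B"},
    {"city": "Northtown", "age_range": "40-49", "provider": "Provider A"},
    {"city": "Northtown", "age_range": "40-49", "provider": "Provider B"},
    {"city": "Southport", "age_range": "30-39", "provider": "Provider A"},
    {"city": "Southport", "age_range": "30-39", "provider": "Provider B"},
    {"city": "Southport", "age_range": "40-49", "provider": "Provider A"},
    {"city": "Southport", "age_range": "40-49", "provider": "Provider B"},
]


def profile_sizes(rows, fields):
    """Count how many rows share each combination of values for `fields`."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def smallest_group(rows, fields):
    """Size of the smallest profile; 1 means a profile singles out one record."""
    return min(profile_sizes(rows, fields).values())


# Each added field shrinks the smallest group of matching records.
for fields in (["city"], ["city", "age_range"], ["city", "age_range", "provider"]):
    print(fields, smallest_group(records, fields))
```

In this toy data, city alone leaves every person hidden among four records, city plus age range leaves two, and all three fields together single out one record each. A check of this kind could be one way to flag which field combinations deserve the same care as core PII.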

Unrestricted email plus full PII access: recipe for trouble

Storage giant Seagate suffered exposure of the withholding-tax records of some 12,000 employees following a phishing attack.


At the time, Seagate noted that there was no evidence that the information had been misused, a position also known as the absence-of-evidence defense.

Fast-forward a year or so, and the evidence has appeared.


As is usual in these cases, we have a combination of failures, such as: Continue reading “Unrestricted email plus full PII access: recipe for trouble”