In this post I will discuss an Oracle presentation on how its product line provides the technical means for GDPR compliance (link). Although I am impressed with the presentation, my main reason for introducing it here is to be able to refer to it when discussing different stages of GDPR measures. Simply reading through this document will give you an idea of the range of technical measures necessary for good compliance at the database level.
In particular, I like the 3-page Appendix at the end of the document, which lists various GDPR articles and the Oracle feature that helps you to comply with each one. If you’re considering a packaged solution, your vendor should be able to present a similar mapping of GDPR requirements to product features.
Oracle’s is probably not the only end-to-end (E2E) offering. In your case, the one to look at first is the one that requires the least disruption to your environment. Whether your IT shop revolves around SAP, Microsoft, or another product stack, the solution offered by your primary vendor is likely to be the most convenient one.
I cover the Oracle offering here because I have many years of experience with Oracle’s RDBMS server product and have always found it to be ahead of the pack; each new version brings such major enhancements that the mind boggles. I have no experience with many of the capabilities listed in the presentation, but I am confident that Oracle’s system works as advertised.
What follows is a discussion of several features of Oracle’s product stack (hereafter referred to simply as ‘Oracle’). Whatever solution you’re looking at, I would look for a breakdown of what the product covers and how it relates to privacy protection. (Numbers in parentheses refer to the page number in the presentation.)
There is a discussion (10) of the threat posed by “malicious intruders”, pointing out that third-party developers and cloud systems pose risks as well (which Oracle helps to mitigate, naturally). In the discussion of monitoring (17), we see “malicious users” mentioned as well.
This is very good, but as we have seen in previous posts, malicious employees (link) or simply thoughtless managers (link) have accounted for significant personal-data exposures. Oracle may or may not have been able to prevent or mitigate those breaches, say, by limiting the number of records that can be extracted at once (the BUPA case) or by anonymizing the data (the Sweden case).
Finally, monitoring is necessary, but catching the culprit after the data has been exposed is cold comfort. As far as actual data protection is concerned, after-the-fact detection should be seen as a mitigation strategy: catching the culprit eliminates the source of the attack, while a post-mortem analysis closes the channel that made the leak possible.
Either way, you should have loss-prevention measures on your evaluation checklist for prospective compliance products; if the chosen product doesn’t provide enough, you’ll need to add the necessary customization effort to your privacy impact assessment.
Next we are given a presentation (11) of Oracle’s automated vulnerability testing, which is explicitly aimed at GDPR’s required privacy-impact assessment. Automated testing includes:
Finding sensitive data – The paper mentions data items like credit-card numbers and national identifiers. I would want to know how well it identifies data that follows no particular pattern, such as personal names, street names, or sensitive reference data, such as designations for policeman, convicted criminal, or disease sufferer, which may render an individual identifiable.
Finding sensitive data that is not in English – Whatever language(s) form the basis for your personal data, you should make sure that the feature works for your data. If it doesn’t, you may be able to enhance the recognition feature by providing custom vocabularies of things like common names (useful not only for personal names but also for streets, at least in Belgium), common nouns (often used as street names), city names, and so forth.
Assessing privilege levels – Checks for more-than-necessary privileges, unused privileges, and so forth. This could be useful in spotting accounts of departed users that were never closed, or multiple users sharing an account.
Assessing database configurations – Detects unused accounts, default passwords, profiles, and many other common errors.
Assessing database security profile – Provides canned reports supporting security and privacy assessments, which Oracle points out can aid in creating the impact assessment and in reporting to the data protection authority.
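The presentation doesn’t say how the discovery module works internally, so the following is only a minimal Python sketch of the two approaches distinguished above, not Oracle’s implementation: pattern matching (a regular expression for candidate credit-card numbers, filtered by the Luhn checksum to cut false positives) and matching against a custom vocabulary for data that follows no pattern. The function names and the sample vocabulary are hypothetical.

```python
import re

# Candidate credit-card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Pattern-based discovery: regex match first, then Luhn check to drop false positives."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def find_vocabulary_terms(text: str, vocabulary: set[str]) -> set[str]:
    """Vocabulary-based discovery for patternless data such as surnames or street names."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if w in vocabulary}
```

For example, `find_vocabulary_terms("Customer Jan Peeters lives in Kerkstraat", {"peeters", "kerkstraat"})` flags both the surname and the street. Vocabulary hits will include false positives (a surname that is also a common noun), so they would still need human review.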
The queries run by the automated-discovery module could be written, run, and debugged by internal IT; in fact, there are probably some database products for which you will have no alternative. But having them already written means more than convenience and time-savings now; you also have the capability for the future, as the suite will be updated to work with new versions of Oracle.
It is worth noting that this kind of analysis, made necessary by the GDPR, is fairly new. In the past, analysis was oriented more toward data models, metadata, integrity constraints, and other technical features of the database. The actual content was only examined when data-quality problems were at issue.
But now, with GDPR, the ordinary database enters into the realm of semantic analysis; is this piece of data personal? Is it personal by itself (e.g., a person’s name), or only in combination (link) with other data (e.g., a person’s IP address)? This kind of analysis is part of how big-data-style analytics decide whether a movie review is positive or negative, or what an article is about.
In other words, GDPR forces us to raise our game.
(12-16) Here we see a long list of measures that help to prevent attacks: encryption (at-rest and in-transit), on-the-fly redaction/anonymization, identity verification, label security, virtual private database (access to rows is transparently based on the value of a sentinel column), and restrictions on privileged users, such as administrators.
I introduced the concept of in-flight masking in an earlier post (link), but only as a minimal proof of concept, to show that static data classification is not enough. Implementing my approach in production would be a major programming challenge, with much trial-and-error learning along the way.
I won’t go into detail here as I have not used any of these features myself, but as a former database administrator I am impressed that Oracle has a solution for the problem of administrative access. This is not an easy thing to do with an in-house or custom solution.
Oracle also boasts a ‘database firewall’. Although I have not yet seen how it works, its name implies that, even should an intruder manage to obtain operating-system access, he or she still faces access barriers erected by the database server itself.
Oracle’s database auditing feature, which has a long history of reliability, can provide you with a complete audit trail in case you need to reconstruct a chain of events or transactions from the past. This can be useful in case you need to trace:
the transactions affecting a data subject (e.g., to show how and when the subject’s data entered your system)
the transactions used to purge a data subject’s information (i.e., to prove that you did in fact perform the requested action)
the sequence of events surrounding an attempted or successful breach (e.g., in order to perform a post-mortem)
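To make the first two uses concrete, here is a minimal Python sketch of querying an audit trail by data subject. It assumes, hypothetically, that each audit record has already been tagged with the ID of the data subject it affects; a raw database audit trail records statements and rows, so that mapping is work you would have to do yourself. The record layout and function names are illustrative, not Oracle’s audit schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRecord:
    timestamp: datetime
    user: str
    action: str      # e.g. "INSERT", "UPDATE", "DELETE"
    table: str
    subject_id: str  # hypothetical: the data subject the affected row belongs to


def subject_history(trail: list[AuditRecord], subject_id: str) -> list[AuditRecord]:
    """Every audited action affecting one data subject, oldest first."""
    return sorted((r for r in trail if r.subject_id == subject_id), key=lambda r: r.timestamp)


def purge_is_documented(trail: list[AuditRecord], subject_id: str) -> bool:
    """True if, in every table where the subject ever appeared, the last action was a DELETE."""
    last_action = {}
    for record in subject_history(trail, subject_id):
        last_action[record.table] = record.action
    return bool(last_action) and all(a == "DELETE" for a in last_action.values())
```

`subject_history` answers “how and when did this person’s data enter our system?”, while `purge_is_documented` gives a first, rough answer to “can we prove we deleted it?”.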
Oracle’s Audit Vault and Database Firewall products can watch the activity flow on multiple databases and generate alerts (or even block the statements) when they detect unauthorized or against-policy SQL statements. These products also support Oracle’s MySQL plus other RDBMS servers, including those from IBM, Sybase, and Microsoft.
My guess is that most successful attacks are preceded by a number of unsuccessful attempts. Monitoring at Oracle’s level can alert you as soon as the first attempt is made. This protection, valuable in itself, also helps you to demonstrate your commitment to GDPR compliance in case of an audit.
Finally, I should add that Oracle provides a dashboard-type application (Enterprise Manager) to enable administrators to manage all of these capabilities from a single desktop.
If the unthinkable occurs and someone obtains your database’s files, Oracle has the ability to encrypt them in a transparent way, meaning that the consuming applications need not have their code altered to handle decryption (or even be aware of the encryption). As Oracle states:
TDE [transparent data encryption] encrypts data automatically when written to storage including backups, data dumps exports, and logs. Encrypted data is correspondingly decrypted automatically when read from storage. This automatic encryption-decryption capability at the database layer makes the solution transparent to database applications. Access controls that are enforced at the database and application layers remain in effect. SQL queries are never altered, and hence no application code or configuration changes are required. Oracle Database comes pre-installed with TDE and can be enabled easily.
The securing of database files (e.g., backups, exports (dumps), audit files, and recovery logs) is a serious challenge, one that I have yet to see addressed anywhere. Although these files cannot be queried, an expert is potentially able to extract any data that resides in the database.
With Oracle’s GDPR solution you have an end-to-end encryption solution managed by a single vault for encryption keys. This is not an easy thing to get right using an in-house solution, or one that is not integrated into your database product.
Encryption has tangible benefits. If there is a breach and only encrypted data is exposed, you may not be required to make a breach notification under GDPR Article 34.3 (a), which states:
3. The communication to the data subject referred to in paragraph 1 shall not be required if any of the following conditions are met:
(a) the controller has implemented appropriate technical and organisational protection measures, and those measures were applied to the personal data affected by the personal data breach, in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption [emphasis added]
Other useful Oracle features
I have a few personal favorite features of the Oracle RDBMS which can be useful for GDPR compliance. All of these are, as far as I know, included in the standard license for the database server.
Powerful stored procedures
In Oracle a stored procedure can provide a virtual ‘parametrized’ view. You could, for example, have a view of a customer’s transactions which would require a customer id and deliver the records for that customer only. Suppose you follow best practice and grant applications no direct access to your database tables, requiring them to use views instead. Suppose further that an intruder attempts to execute a multi-record query against your view, say, by providing a list of customer numbers. Your stored procedure (which appears to the application to be the view) can raise an alert and capture state information that can help to identify the circumstances around the attempt.
This strategy lets you avoid exposing the ability to query a large set of customers or transactions, and it also gives you a place to perform any custom pre-processing you may need.
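In Oracle this guard would be written in PL/SQL; the Python sketch below, with SQLite from the standard library standing in for the database, shows only the logic. The `transactions` table, the `customer_transactions` function, and the `SuspiciousQuery` exception are hypothetical. The shape is what matters: accept exactly one customer ID, run a parameterized query, and otherwise log an alert with the call stack and refuse.

```python
import logging
import sqlite3
import traceback

log = logging.getLogger("privacy.alerts")  # hypothetical alert channel


class SuspiciousQuery(Exception):
    """Raised when a caller asks for more than one customer's records."""


def customer_transactions(conn: sqlite3.Connection, customer_id: int) -> list[tuple]:
    """Parametrized 'view': return the transactions of exactly one customer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(customer_id, bool) or not isinstance(customer_id, int):
        # Capture state (the offending argument and the call stack) for investigation.
        log.warning(
            "Rejected request for %r\n%s",
            customer_id,
            "".join(traceback.format_stack()),
        )
        raise SuspiciousQuery(f"expected a single customer id, got {customer_id!r}")
    cursor = conn.execute(
        "SELECT id, amount FROM transactions WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return cursor.fetchall()
```

Calling `customer_transactions(conn, 42)` returns that customer’s rows, while passing a list such as `[42, 7]` is logged and rejected before any query runs.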
Oracle provides the ability to trace changes to table data in the same way that programmers can trace the evolution of code using version-control products such as Subversion, ClearCase, and Git. While perhaps not feasible for every table in your database, it is definitely usable for your data-inventory schema (link) and will allow you to trace all changes and to view the data as it was at any point in the past.
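The idea behind such version tracing can be illustrated with a minimal, hypothetical Python structure that keeps every version of every row and answers “as of” questions. This is not how Oracle implements the feature; it only shows what the capability gives you for a data-inventory table. It assumes that changes for each key arrive in chronological order.

```python
import bisect
from datetime import datetime


class VersionedTable:
    """Keeps every version of every row so that any key can be viewed as of a past time."""

    def __init__(self):
        # key -> parallel lists of change timestamps and row values (None marks a deletion)
        self._times = {}
        self._values = {}

    def put(self, key, value, at: datetime):
        """Record a new version; assumes changes for a key arrive in time order."""
        self._times.setdefault(key, []).append(at)
        self._values.setdefault(key, []).append(value)

    def delete(self, key, at: datetime):
        self.put(key, None, at)

    def as_of(self, key, at: datetime):
        """Return the row as it was at time `at`, or None if absent or deleted then."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, at)  # number of changes made at or before `at`
        return self._values[key][i - 1] if i else None

    def history(self, key):
        """All (timestamp, value) versions of a key, oldest first."""
        return list(zip(self._times.get(key, []), self._values.get(key, [])))
```

Deletions are stored as versions too, so the history also shows when a subject’s entry disappeared, which is exactly what you want to be able to show after a purge request.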
If your environment is like the ones I usually see, uncounted files (mostly PDF and MS Office) lie around in shared folders, SharePoint, and similar repositories, where their contents are difficult to search. Oracle Text (formerly called ConText) enables you to search such material in a very flexible way (comparable to, say, LEXIS); you can do things like fuzzy, stemmed, proximity, and regular-expression searches.
Let’s face it: your GDPR compliance effort will involve a massive document shuffle. You will not only need to document every important thing you do (from impact assessment to requirements to modeling, test plans, contingency plans, checklists, and so forth), you will also need to be able to find the relevant documents quickly and with maximum search flexibility. Oracle Text provides this in a high-performance package.
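Fuzzy search is what lets a misspelled query still find the right document. Oracle Text has its own engine for this; the following is only a toy Python illustration using the standard library’s `difflib`, with a hypothetical inverted index over a couple of made-up documents, to show the idea.

```python
import difflib
import re


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lower-cased word -> names of the documents containing it."""
    index: dict[str, set[str]] = {}
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def fuzzy_search(index: dict[str, set[str]], term: str, cutoff: float = 0.8) -> set[str]:
    """Documents containing a word similar to `term`, so minor typos still match."""
    matches = difflib.get_close_matches(term.lower(), index.keys(), n=5, cutoff=cutoff)
    found: set[str] = set()
    for word in matches:
        found |= index[word]
    return found
```

With a document containing “impact assessment”, a query for the misspelled `asessment` still finds it. Raising `cutoff` makes matching stricter; lowering it catches more typos at the price of more noise.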
Publish & Subscribe
Suppose that the application that manages your customers receives an update to a customer’s personal information. Your data inventory needs to be notified of the change and to apply it (after all, when that customer comes and asks for a purge or an export, how will you know where all of her data is without a central repository?).
Oracle has its own publish/subscribe mechanism, which used to support only Oracle databases but now supports competing DBMS products as well. You may choose another means of doing publish/subscribe, but Oracle’s is there if you need it.
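The flow described above can be sketched as a minimal in-process broker in Python. This is not Oracle’s mechanism or API; the topic name, the message fields (`subject_id`, `system`, `fields`), and the `DataInventory` class are hypothetical. The customer application publishes a change, and the data inventory, as a subscriber, records which system holds which of the subject’s fields.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Minimal in-process publish/subscribe broker (illustration only, not durable)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message synchronously to every handler registered for the topic.
        for handler in self._subscribers[topic]:
            handler(message)


class DataInventory:
    """Central record of which systems hold which data subject's personal data."""

    def __init__(self):
        self.locations = defaultdict(set)  # subject_id -> {(system, field), ...}

    def on_personal_data_changed(self, msg: dict) -> None:
        for field in msg["fields"]:
            self.locations[msg["subject_id"]].add((msg["system"], field))

    def where_is(self, subject_id: str) -> list[tuple[str, str]]:
        """Answer the purge/export question: where does this subject's data live?"""
        return sorted(self.locations.get(subject_id, set()))
```

Usage: subscribe `inventory.on_personal_data_changed` to a topic such as `"personal-data-changed"`, and have each application publish to it. A production setup would need a durable queue, so that a change published while the inventory is down is not lost.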
What will it cost?
This is the first (and possibly last) question management will pose about the Oracle stack. I have no idea, but I imagine that the price tag is high and could induce sticker shock. But before ruling out an end-to-end (E2E) solution, whether Oracle’s or someone else’s, I suggest you consider that with an E2E solution you have:
not only the components you need, but also the assurance that they will work together. I have seen many projects falter due to problems stitching tools from multiple vendors together
continuous patches and updates from the vendor, which not only reduce costs and delays but also go a long way toward convincing the data-protection authority that you have made every effort at compliance and security
a line of support, should you need it, along with consulting services
But beware of trying to do it yourself, either with custom development or by cobbling freestanding packages together. Wherever I go I see examples where application A has a periodic requirement to send bulk data to application B. A writes a plain-text file, which is later read and processed by B. Anyone else who has access to the operating system can also read that file, which is clearly an unjustifiable risk if personal data is involved.
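If such a file exchange can’t be avoided, the file should at least not be readable by everyone on the machine. Here is a minimal Python sketch, assuming a POSIX system (on Windows the mode bits are largely ignored): create the file with owner-only permissions and refuse to reuse an existing file. The function name is hypothetical. This narrows access without protecting the content itself; anyone running as the owning account, or as root, can still read it, so encrypting the file or using a secured channel is the stronger option.

```python
import os


def write_private_file(path: str, data: str) -> None:
    """Create `path` readable and writable by its owner only (POSIX mode 0o600).

    O_EXCL makes the call fail if the file already exists, so we never
    inherit the looser permissions of a stale file left by an earlier run.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(data)
```

Application B then reads and deletes the file as soon as it has processed it; if B runs under a different account than A, the two accounts need a shared group and mode `0o640` instead, which widens access accordingly.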
What do implementation delays, re-work, bug fixing, and so forth cost your business? You can’t put a money value on that, not even after the fact. Indeed, you could spend a lot and have nothing to show for it; the project might be so late or unsatisfactory that it is cancelled.
Chances are that, if you knew the true cost, doing it yourself would turn out to be not only much slower but also more expensive. I don’t know about you, but I am more afraid of unknown costs, with an unknown limit, than I am of a high cost attached to a high likelihood of success.