Facebook’s DeepText: Understanding the legality of AI

Is Facebook violating your privacy by using DeepText to read through your Facebook posts and updates? Definitely. Can they legally do it? Probably – because users have consented to it. Should you be worried? Probably not; DeepText is neither truly intelligent nor human. (You should, however, be worried about advertisers and how Facebook will promote them.)


Facebook has always been at the forefront when it comes to complaints regarding breach of privacy. Just a few weeks back, a lawsuit was certified as a class action in the US over the collection of URLs from private messages sent between Facebook users.[1] The allegations are that these URLs are collected so that Facebook can generate suggestions for other users, and so that third parties can provide targeted suggestions to the users adding those URLs. Hot on the heels of this class action certification comes the news that Facebook is rolling out its AI – creatively called DeepText, since it deeply analyzes text with ‘near-human accuracy’.[2]

This post aims to take a closer look at what DeepText is actually doing, and to frame it according to privacy laws in the EU. The reason for this is that EU data privacy law is far more paranoid than that of most other countries or regions; most legal objections will probably arise (and be held valid) here. We will look at whether existing laws can accommodate the use to which Facebook is putting its so-called Artificial Intelligence, or whether contradictions may arise.

The DeepText AI

Intelligent – Really?

The first thing to get out of the way is that DeepText is not, in any form, actually intelligent in the way humans unconsciously think of intelligence. AI, as the term is used today, does not mean something that is aware of its own existence[3] (the old adage, “I think, therefore I am”). It now generally refers to techniques that allow a computer (or in this case, massive servers and supercomputers) to find correlations in data on its own (“machine learning”).[4] To make it clear, current iterations of AI are not self-aware and will not try to take over the world – we’ll leave that up to future versions of AI. Adhering to non-technical lingo, I will be using the terms ‘machine learning’ and ‘AI’ interchangeably – but keep in mind that these terms are not technically interchangeable.

Data Analysis

So what does DeepText, the dumb machine that we need not be afraid of, actually do? Facebook makes it pretty clear in a couple of blog posts: here[5] and here[6]. The technology is based on a paper written by people from Facebook AI Research.[7]

The paper itself is pretty technical, but a general readthrough sheds light on their methods – which, essentially, boil down to the creation of a basic data set that would allow a computer to figure out patterns in data already sorted by humans. In their paper, they used a thesaurus, among other things, to build up their base data, and then ran their code on massive data sets such as Amazon reviews.
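To make the thesaurus idea a little more concrete, here is a minimal sketch of how synonyms can be used to turn one human-sorted sentence into several. This is only an illustration under my own assumptions, not the paper's code: the tiny thesaurus, the function name `augment` and the example sentence are all hypothetical.

```python
import random

# Hypothetical toy thesaurus; a real one would contain many thousands of words.
TOY_THESAURUS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "phone": ["handset"],
}


def augment(sentence, thesaurus, rng):
    """Return a copy of `sentence` with every known word swapped for a synonym."""
    words = sentence.split()
    swapped = [rng.choice(thesaurus[w]) if w in thesaurus else w for w in words]
    return " ".join(swapped)


rng = random.Random(0)  # fixed seed so repeated runs give the same variant
original = "good phone bad battery"
variant = augment(original, TOY_THESAURUS, rng)
```

The variant keeps the human-assigned label of the original sentence, so the data set grows without anyone having to sort new examples by hand.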

Training an AI

The vocabulary around machine learning is quite confusing for the layman; quite apart from the term ‘artificial intelligence’, experts and researchers refer to their work as ‘training’ or ‘teaching’ the computer. This makes it very easy to imagine the computer as human, when it is anything but. These terms actually refer to running machine learning ‘algorithms’ over examples that have already been sorted by humans, and then checking the results against ‘test sets’[8] – small, human-sorted databases held back for this purpose. This allows researchers to check whether the correlations being made by the computer are actually correct; for example, whether a computer is accurately identifying a kitten in a picture or not. Regular check-ups in this way allow researchers to ‘test’ or ‘train’ their machine and make course corrections.
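The ‘train, then check against a test set’ loop described above can be sketched in a few lines. This is a deliberately crude word-counting classifier of my own invention, nothing like the neural networks Facebook uses; the example sentences, labels and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical examples already sorted by humans: (text, label).
TRAINING_SET = [
    ("love this great phone", "positive"),
    ("great battery love it", "positive"),
    ("awful screen hate it", "negative"),
    ("hate this awful phone", "negative"),
]
# Held-back examples, also human-sorted, used only for checking.
TEST_SET = [
    ("love the great screen", "positive"),
    ("awful battery hate it", "negative"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def predict(counts, text):
    """Pick the label whose training texts used these words most often."""
    words = text.split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


def accuracy(counts, examples):
    """Fraction of examples where the prediction matches the human label."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAINING_SET)
test_accuracy = accuracy(model, TEST_SET)
```

If `test_accuracy` comes out low, the researcher adjusts the method or the training data and tries again; that is all the ‘course correction’ amounts to.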

Once this is done, the AI is let loose to find correlations among massive databases. This is truly where machine learning shines; and shines in a way that humans could never have expected.[9]

Data Collection

Facebook states that it will use information generated by users and stored in its data centers to ‘better understand people’s interests’, jointly understand pictures and text, and, through ‘neural networks’, figure out contextual dependencies between the words used by Facebook users. Knowing Facebook’s general inclination to use any and all data available,[10] I believe it is quite likely that the same will hold for DeepText – for the purposes of this post, I will assume that everything posted on Facebook will be used as a database that allows the AI to make correlations (since this is the worst case scenario from a privacy perspective).

“Near Human Accuracy”

Again, to clarify – the phrase ‘near-human’ is so vague that it is essentially a marketing gimmick here. It is ‘near human’ in the sense that, compared to a regular computer, it comes closer to the way a human analyzes and makes correlations between data sets. This is a truly incredible achievement, but one need not believe that the AI has the capacity of an actual, intelligent human being in its ability to analyze data. It is simply far more capable at this kind of analysis than the PC/Mac/mobile phone that you, dear reader, are using to browse this website.

The Legal Perspective in the EU

Let us now focus on applying the general principles of data privacy in the European Union to “AI”.

As always, the following ingredients are involved when dealing with European privacy law per the General Data Protection Regulation (GDPR):[11]

  1. Lawfulness, Fairness and Transparency:

The lawful processing of personal data requires that there be adequate grounds for processing such data, which are provided under Article 6(1) of the GDPR. Two processing grounds stand out, in my opinion, that could allow Facebook to use DeepText:

  • Article 6(1)(a) covers cases where “the data subject has given consent to the processing of his or her personal data for one or more specific purposes”. It would be pretty easy for Facebook to update its terms and conditions to ensure that users are providing their consent for their data to be analyzed by DeepText. Indeed, Facebook’s “Data Policy” plainly states: “We conduct surveys and research, test features in development, and analyze the information we have to evaluate and improve products and services, develop new products or features, and conduct audits and troubleshooting activities.” Facebook’s Data Policy, essentially, already gives it the right to use DeepText.
  • Article 6(1)(f) allows processing of personal data for the purposes of legitimate interests pursued by the controller (that is, Facebook). This particular ground is a subject of much debate. As discussed in a previous post on the Google Spain case, whether this ground applies depends heavily on the facts of each case.

  2. Purpose Limitation:

This principle is covered under Article 5(1)(b) of the GDPR, and states that personal data should be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes. There is even an exception for further processing for scientific or statistical purposes – it could well be argued that machine learning research falls squarely within this ambit.

  3. Data Minimisation:[12]

Information must be ‘adequate, relevant and limited’ to what is necessary in relation to the purposes for which data is processed. Now, with regard to machine learning, one would need to define the minimum requirements for a database, or even a test set; after all, the point of machine learning is to use as much data as possible to come up with significant and interesting correlations. I will be very interested to see how courts determine this one.

  4. Security, Integrity and Confidentiality:[13] Data must be processed in a manner that ensures protection against unauthorized use. I will not delve into this further because the concern is not unique to AI. Facebook needs to follow this principle anyway.
  5. Accountability:[14] This one was not present in the old Data Protection Directive; essentially, this places responsibility on Facebook to be able to demonstrate that it complies with all of the above.

Altogether, it seems Facebook has quite a few legal grounds on which to use DeepText in providing its services. This does not, of course, mean that this ability will not be heavily contested, which brings us to the next point.

Possible Legal Objections to Facebook’s use of DeepText:

EU data privacy laws are quite nuanced. In the case of machine learning, it is quite possible that Article 22 of the GDPR (Automated individual decision-making, including profiling) would apply. This Article gives ‘data subjects’ the right “not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.” This right does not apply where there is explicit consent; however, simply agreeing to Facebook’s Data Policy may not be considered sufficient consent in this case.

Further, Article 21 of the GDPR gives data subjects the right to object in cases where the processing ground is ‘legitimate interest’ (discussed above), or is used for direct marketing purposes (which DeepText is definitely being used for). This could be used as grounds by an EU resident to object to the use of DeepText.


This post only begins to delve into the legal issues involved in AI, or machine learning. Facebook is only one among many users of this cutting-edge tech. Google, for example, has been quietly using machine learning for a long while and has even termed itself “AI-first”.[15] Machine learning, by its very nature, feeds on scale – the more data and the more time it has, the better its correlations will be. It is very difficult to predict where this will take us. Indeed, it has been speculated that programmers will soon have to turn into computer psychologists of a sort, since the mathematics involved in machine learning correlations is so difficult that few will be able to master it.[16]

At any rate, I personally wait with bated breath to see where this technology takes us, and hope that Elon Musk is wrong.[17]


[1] The Verge, Lawsuit claims Facebook illegally scanned private messages, http://www.theverge.com/2016/5/19/11712804/facebook-private-message-scanning-privacy-lawsuit; last accessed: 7/6/2016

[2] Facebook, Introducing DeepText: Facebook’s text understanding engine, https://code.facebook.com/posts/181565595577955; last accessed: 7/6/2016

[3] Promoted by Douglas Hofstadter in his excellent book, Gödel, Escher, Bach. Read a short review here: http://www.techinsider.io/godel-escher-bach-hofstadter-artificial-intelligence-2015-10; I would strongly recommend reading this book if you are interested in AI.

[4] Wikipedia, Machine Learning, https://en.wikipedia.org/wiki/Machine_learning; last accessed: 7/6/2016; actually pretty technical; I’d recommend looking into its citations.

[5] Facebook, Introducing FBLearner Flow: Facebook’s AI backbone, https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/; last accessed: 7/6/2016

[6] See endnote 2

[7] Xiang Zhang, Yann LeCun, Text Understanding from Scratch, Cornell University Library; available at: https://arxiv.org/abs/1502.01710; last accessed: 7/6/2016

[8] Wikipedia, Test set, https://en.wikipedia.org/wiki/Test_set; last accessed: 7/6/2016

[9] 10 Surprising Machine Learning Applications, http://www.lauradhamilton.com/10-surprising-machine-learning-applications; last accessed: 7/6/2016

[10] BBC, What is Facebook doing with my data?, http://www.bbc.com/news/magazine-34776191; last accessed: 7/6/2016

[11] Article 5, General Data Protection Regulation (GDPR)

[12] Article 5(1)(c) of the GDPR

[13] Article 5(1)(f) of the GDPR

[14] Article 5(2) of the GDPR

[15] Siliconbeat, Google CEO talks AI-first world — where will the hardware go?, http://www.siliconbeat.com/2016/04/29/google-ceo-talks-ai-first-world-will-hardware-go/; last accessed: 7/6/2016

[16] Wired, The End of Code, http://www.wired.com/2016/05/the-end-of-code/; last accessed: 7/6/2016

[17] The Verge, Elon Musk says artificial intelligence is ‘potentially more dangerous than nukes’, http://www.theverge.com/2014/8/3/5965099/elon-musk-compares-artificial-intelligence-to-nukes; last accessed: 7/6/2016
