Timesdelhi.com

December 12, 2018
Category archive

Aleksandr Kogan

Yet another massive Facebook fail: Quiz app leaked data on ~120M users for years

Facebook knows the historical app audit it’s conducting in the wake of the Cambridge Analytica data misuse scandal is going to result in a tsunami of skeletons tumbling out of its closet.

It’s already suspended around 200 apps as a result of the audit — which remains ongoing, with no formal timeline announced for when the process (and any associated investigations that flow from it) will be concluded.

CEO Mark Zuckerberg announced the audit on March 21, writing then that the company would “investigate all apps that had access to large amounts of information before we changed our platform to dramatically reduce data access in 2014, and we will conduct a full audit of any app with suspicious activity”.

But you do have to question how much the audit exercise is, first and foremost, intended to function as PR damage limitation for Facebook’s brand — given the company’s relaxed response to a data abuse report concerning a quiz app with ~120M monthly users, which it received right in the midst of the Cambridge Analytica scandal.

Because despite Facebook being alerted about the risk posed by the leaky quiz apps in late April — via its own data abuse bug bounty program — they were still live on its platform a month later.

It took about a further month for the vulnerability to be fixed.

And, sure, Facebook was certainly busy over that period. Busy dealing with a major privacy scandal.

Perhaps the company was putting rather more effort into pumping out a steady stream of crisis PR — including taking out full page newspaper adverts (where it wrote that: “we have a responsibility to protect your information. If we can’t, we don’t deserve it”) — vs actually ‘locking down the platform’, per its repeat claims, even though the company’s long and rich privacy-hostile history suggests otherwise.

Let’s also not forget that, in early April, Facebook quietly confessed to a major security flaw of its own — when it admitted that an account search and recovery feature had been abused by “malicious actors” who, over what must have been a period of several years, had been able to surreptitiously collect personal data on a majority of Facebook’s ~2BN users — and use that intel for whatever they fancied.

So Facebook users already have plenty of reasons to doubt the company’s claims to be able to “protect your information”. But this latest data fail facepalm suggests it’s hardly scrambling to make amends for its own stinkingly bad legacy either.

Change will require regulation. And in Europe that has arrived, in the form of the GDPR.

Although it remains to be seen whether Facebook will face any data breach complaints in this specific instance, i.e. for not disclosing to affected users that their information was at risk of being exposed by the leaky quiz apps.

The regulation came into force on May 25 — and the javascript vulnerability was not fixed until June. So there may be grounds for concerned consumers to complain.

Which Facebook data abuse victim am I?

Writing in a Medium post, the security researcher who filed the report — self-styled “hacker” Inti De Ceukelaire — explains he went hunting for data abusers on Facebook’s platform after the company announced a data abuse bounty on April 10, as the company scrambled to present a responsible face to the world following revelations that a quiz app running on its platform had surreptitiously harvested millions of users’ data — data that had been passed to a controversial UK firm which intended to use it to target political ads at US voters.

De Ceukelaire says he began his search by noting down what third party apps his Facebook friends were using — finding quizzes were one of the most popular apps. Plus he already knew quizzes had a reputation for being data-suckers in a distracting wrapper. So he took his first ever Facebook quiz, from a brand called NameTests.com, and quickly realized the company was exposing Facebook users’ data to “any third-party that requested it”.

The issue was that NameTests was displaying the quiz taker’s personal data (such as full name, location, age, birthday) in a javascript file, thereby potentially exposing the identity and other data of logged-in Facebook users to any external website they happened to visit.

He also found it was providing an access token that allowed it to grant even more expansive data access permissions to third party websites — such as to users’ Facebook posts, photos and friends.
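
To make the mechanism concrete, here is a minimal Python sketch of the kind of leak described above. It is a toy model under stated assumptions, not NameTests’ actual code: the session store, the cookie value, the function names and the `quiz.example` address are all hypothetical. It models two things: a server that inlines the logged-in user’s profile into a JavaScript response for whoever presents the session cookie, and an embedding page that can read whatever that script defines. The second server function shows one possible mitigation (refusing to inline data when the request comes from another site), which is an illustrative assumption and not a description of the fix NameTests shipped.

```python
import json

# Hypothetical session store: cookie value -> profile data held by the quiz site.
SESSIONS = {"cookie-abc": {"name": "Jane Example", "birthday": "1990-01-01"}}

PREFIX = "var quizUser = "
SUFFIX = ";"


def serve_user_script(cookie):
    """Toy model of a leaky endpoint: returns JavaScript source with the
    profile of whoever owns the session cookie inlined as a variable."""
    profile = SESSIONS.get(cookie, {})
    return PREFIX + json.dumps(profile) + SUFFIX


def serve_user_script_checked(cookie, requesting_site,
                              own_site="https://quiz.example"):
    """Assumed mitigation, for illustration only: inline personal data only
    when the request comes from the quiz site's own pages."""
    if requesting_site != own_site:
        return PREFIX + "{}" + SUFFIX
    return serve_user_script(cookie)


def embedding_page_reads(script_source):
    """Toy model of a page that includes the script: the script runs in the
    including page's context, so the page can read the variable it defines."""
    return json.loads(script_source[len(PREFIX):-len(SUFFIX)])


# A visitor who is still logged in to the quiz site lands on an unrelated page
# that includes the script; the session cookie travels with the request.
leaked = embedding_page_reads(serve_user_script("cookie-abc"))
blocked = embedding_page_reads(
    serve_user_script_checked("cookie-abc", "https://other.example"))
```

In this simplified model, `leaked` holds the visitor’s profile while `blocked` is empty. Real browsers and servers involve more moving parts (headers can be missing or stripped, for instance), so a production fix would more plausibly avoid putting personal data in script responses at all.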

It’s not clear exactly why, but it presumably relates to the quiz app company’s own ad targeting activities. (Its privacy policy states: “We work together with various technological partners who, for example, display advertisements on the basis of user data. We make sure that the user’s data is pseudonymised (e.g. no clear data such as names or e-mail addresses) and that users have simple rights of revocation at their disposal. We also conclude special data protection agreements with our partners, in which they commit themselves to the protection of user data.” That sounds great until you realize its javascript was just leaking people’s personally identified data… [facepalm])

“Depending on what quizzes you took, the javascript could leak your facebook ID, first name, last name, language, gender, date of birth, profile picture, cover photo, currency, devices you use, when your information was last updated, your posts and statuses, your photos and your friends,” writes De Ceukelaire.

He reckons people’s data had been being publicly exposed since at least the end of 2016.

On Facebook, NameTests describes its purpose thusly: “Our goal is simple: To make people smile!” — adding that its quizzes are intended as a bit of “fun”.

It doesn’t shout so loudly that the ‘price’ for taking one of its quizzes, say to find out what Disney princess you ‘are’, or what you could look like as an oil painting, is not only that it will suck out masses of your personal data (and potentially your friends’ data) from Facebook’s platform for its own ad targeting purposes but was also, until recently, that your and other people’s information could have been exposed to goodness knows who, for goodness knows what nefarious purposes… 

The Facebook-Cambridge Analytica data misuse scandal has underlined that ostensibly frivolous social data can end up being repurposed for all sorts of manipulative and power-grabbing purposes. (And not only can end up, but that quizzes are deliberately built to be data-harvesting tools… So think of that the next time you get a ‘take this quiz’ notification asking ‘what is in your fact file?’ or ‘what has your date of birth imprinted on you’? And hope ads is all you’re being targeted for… )

De Ceukelaire found that NameTests would still reveal Facebook users’ identity even after its app was deleted.

“In order to prevent this from happening, the user would have had to manually delete the cookies on their device, since NameTests.com does not offer a log out functionality,” he writes.

“I would imagine you wouldn’t want any website to know who you are, let alone steal your information or photos. Abusing this flaw, advertisers could have targeted (political) ads based on your Facebook posts and friends. More explicit websites could have abused this flaw to blackmail their visitors, threatening to leak your sneaky search history to your friends,” he adds, fleshing out the risks for affected Facebook users.

As well as alerting Facebook to the vulnerability, De Ceukelaire says he contacted NameTests — and they claimed to have found no evidence of abuse by a third party. They also said they would make changes to fix the issue.

We’ve reached out to NameTests’ parent company — a German firm called Social Sweethearts — for comment. Its website touts a “data-driven approach” — and claims its portfolio of products achieve “a global organic reach of several billion page views per month”.

After De Ceukelaire reported the problem to Facebook, he says he received an initial response from the company on April 30 saying they were looking into it. Then, hearing nothing for some weeks, he sent a follow up email, on May 14, asking whether they had contacted the app developers.

A week later Facebook replied saying it could take three to six months to investigate the issue (i.e. the same timeframe mentioned in their initial automated reply), adding they would keep him in the loop.

Yet at that time, which was a month after his original report, the leaky NameTests quizzes were still up and running, meaning Facebook users’ data was still being exposed and at risk. And Facebook knew about the risk.

The next development came on June 25, when De Ceukelaire says he noticed NameTests had changed the way they process data to close down the access they had been exposing to third parties.

Two days later Facebook also confirmed the flaw in writing, admitting: “[T]his could have allowed an attacker to determine the details of a logged-in user to Facebook’s platform.”

It also told him it had confirmed with NameTests the issue had been fixed. And its apps continue to be available on Facebook’s platform — suggesting Facebook did not find the kind of suspicious activity that has led it to suspend other third party apps. (At least, assuming it conducted an investigation.)

Facebook paid out a doubled $4,000 bounty ($8,000 in total) to a charity, under the terms of its data abuse bug bounty program and per De Ceukelaire’s request.

We asked it what took it so long to respond to the data abuse report, especially given the issue was so topical when De Ceukelaire filed the report. But Facebook declined to answer specific questions.

Instead it sent us the following statement, attributed to Ime Archibong, its VP of product partnerships:

A researcher brought the issue with the nametests.com website to our attention through our Data Abuse Bounty Program that we launched in April to encourage reports involving Facebook data. We worked with nametests.com to resolve the vulnerability on their website, which was completed in June.

Facebook also claims it received De Ceukelaire’s report on April 27, rather than April 22, as he recounts it. Though it’s possible the former date is when Facebook’s own staff retrieved the report from its systems. 

Beyond displaying a disturbingly relaxed attitude to other people’s privacy — which risks getting Facebook into regulatory trouble, given GDPR’s strict requirements around breach disclosure, for example — the other core issue of concern here is the company’s apparent failure to enforce its own developer policy. 

The underlying issue is whether or not Facebook performs any checks on apps running on its platform. It’s no good having T&Cs if you don’t have any active processes to enforce your T&Cs. Rules without enforcement aren’t worth the paper they’re written on.

Historical evidence suggests Facebook did not actively enforce its developer T&Cs — even if it’s now “locking down the platform”, as it claims, as a result of so many privacy scandals. 

The quiz app developer at the center of the Cambridge Analytica scandal, Aleksandr Kogan, who harvested and sold/passed Facebook user data to third parties, has accused Facebook of essentially not having a policy. He contends it is therefore Facebook that is responsible for the massive data abuses that have played out on its platform, only a portion of which have so far come to light.

Fresh examples such as NameTests’ leaky quiz apps merely bolster the case Kogan made for Facebook being the guilty party where data misuse is concerned. After all, if you built some stables without any doors at all would you really blame your horses for bolting?

News Source = techcrunch.com

UK report urges action to combat AI bias

The need for diverse development teams and truly representational data-sets to avoid biases being baked into AI algorithms is one of the core recommendations in a lengthy Lords committee report looking into the economic, ethical and social implications of artificial intelligence, and published today by the upper House of the UK parliament.

“The main ways to address these kinds of biases are to ensure that developers are drawn from diverse gender, ethnic and socio-economic backgrounds, and are aware of, and adhere to, ethical codes of conduct,” the committee writes, chiming with plenty of extant commentary around algorithmic accountability.

“It is essential that ethics take centre stage in AI’s development and use,” adds committee chairman, Lord Clement-Jones, in a statement. “The UK has a unique opportunity to shape AI positively for the public’s benefit and to lead the international community in AI’s ethical development, rather than passively accept its consequences.”

The report also calls for the government to take urgent steps to help foster “the creation of authoritative tools and systems for auditing and testing training datasets to ensure they are representative of diverse populations, and to ensure that when used to train AI systems they are unlikely to lead to prejudicial decisions” — recommending a publicly funded challenge to incentivize the development of technologies that can audit and interrogate AIs.

“The Centre for Data Ethics and Innovation, in consultation with the Alan Turing Institute, the Institute of Electrical and Electronics Engineers, the British Standards Institute and other expert bodies, should produce guidance on the requirement for AI systems to be intelligible,” the committee adds. “The AI development sector should seek to adopt such guidance and to agree upon standards relevant to the sectors within which they work, under the auspices of the AI Council” — the latter being a proposed industry body it wants established to help ensure “transparency in AI”.

The committee is also recommending a cross-sector AI Code to try to steer developments in a positive, societally beneficial direction — though not for this to be codified in law (the suggestion is it could “provide the basis for statutory regulation, if and when this is determined to be necessary”).

Among the five principles they’re suggesting as a starting point for the voluntary code are that AI should be developed for “the common good and benefit of humanity”, and that it should operate on “principles of intelligibility and fairness”.

Though, elsewhere in the report, the committee points out it can be a challenge for humans to understand decisions made by some AI technologies — going on to suggest it may be necessary to refrain from using certain AI techniques for certain types of use-cases, at least until algorithmic accountability can be guaranteed.

“We believe it is not acceptable to deploy any artificial intelligence system which could have a substantial impact on an individual’s life, unless it can generate a full and satisfactory explanation for the decisions it will take,” it writes in a section discussing ‘intelligible AI’. “In cases such as deep neural networks, where it is not yet possible to generate thorough explanations for the decisions that are made, this may mean delaying their deployment for particular uses until alternative solutions are found.”

A third principle the committee says it would like to see included in the proposed voluntary code is: “AI should not be used to diminish the data rights or privacy of individuals, families or communities”.

Though this is a curiously narrow definition — why not push for AI not to diminish rights, period?

“It’s almost as if ‘follow the law’ is too hard to say,” observes Sam Smith, a coordinator at patient data privacy advocacy group, medConfidential, discussing the report.

“Unlike other AI ‘ethics’ standards which seek to create something so weak no one opposes it, the existing standards and conventions of the rule of law are well known and well understood, and provide real and meaningful scrutiny of decisions, assuming an entity believes in the rule of law,” he adds.

Looking at the tech industry as a whole, it’s certainly hard to conclude that self-defined ‘ethics’ appear to offer much of a meaningful check on commercial players’ data processing and AI activities.

Topical case in point: Facebook has continued to claim there was nothing improper about the fact millions of people’s information was shared with professor Aleksandr Kogan. People “knowingly provided their information” is the company’s defensive claim.

Yet the vast majority of people whose personal data was harvested from Facebook by Kogan clearly had no idea what was possible under its platform terms — which, until 2015, allowed one user to ‘consent’ to the sharing of all their Facebook friends. (Hence ~270,000 downloaders of Kogan’s app being able to pass data on up to 87M Facebook users.)

So Facebook’s self-defined ‘ethical code’ has been shown to be worthless, aligning completely with its commercial imperatives rather than supporting users to protect their privacy. (Just as its T&Cs are intended to cover its own “rear end”, rather than clearly inform people about their rights, as one US congressman memorably put it last week.)

“A week after Facebook were criticized by the US Congress, the only reference to the Rule of Law in this report is about exempting companies from liability for breaking it,” Smith adds in a MedConfidential response statement to the Lords report. “Public bodies are required to follow the rule of law, and any tools sold to them must meet those legal obligations. This standard for the public sector will drive the creation of tools which can be reused by all.”

 

Health data “should not be shared lightly”

The committee, which took evidence from Google-owned DeepMind as one of a multitude of expert witnesses during more than half a year’s worth of enquiry, touches critically on the AI company’s existing partnerships with UK National Health Service Trusts.

The first of which, dating from 2015 — and involving the sharing of ~1.6 million patients’ medical records with the Google-owned company — ran into trouble with the UK’s data protection regulator. The UK’s information commissioner concluded last summer that the Royal Free NHS Trust’s agreement with DeepMind had not complied with UK data protection law.

Patients’ medical records were used by DeepMind to develop a clinical task management app wrapped around an existing NHS algorithm for detecting a condition known as acute kidney injury. The app, called Streams, has been rolled out for use in the Royal Free’s hospitals — complete with PR fanfare. But it’s still not clear what legal basis exists to share patients’ data.

“Maintaining public trust over the safe and secure use of their data is paramount to the successful widespread deployment of AI and there is no better exemplar of this than personal health data,” the committee warns. “There must be no repeat of the controversy which arose between the Royal Free London NHS Foundation Trust and DeepMind. If there is, the benefits of deploying AI in the NHS will not be adopted or its benefits realised, and innovation could be stifled.”

The report also criticizes the “current piecemeal” approach being taken by NHS Trusts to sharing data with AI developers — saying this risks “the inadvertent under-appreciation of the data” and “NHS Trusts exposing themselves to inadequate data sharing arrangements”.

“The data held by the NHS could be considered a unique source of value for the nation. It should not be shared lightly, but when it is, it should be done in a manner which allows for that value to be recouped,” the committee writes.

A similar point — about not allowing a huge store of potential value which is contained within publicly-funded NHS datasets to be cheaply asset-stripped by external forces — was made by Oxford University’s Sir John Bell in a UK government-commissioned industrial strategy review of the life sciences sector last summer.

Despite similar concerns, the committee also calls for a framework for sharing NHS data to be published by the end of the year, and is pushing for NHS Trusts to digitize their current practices and records, with a target deadline of 2022, in “consistent formats” so that people’s medical records can be made more accessible to AI developers.

But worryingly, given the general thrust towards making sensitive health data more accessible to third parties, the committee does not seem to have a very fine-grained grasp of data protection in a health context — where, for example, datasets can be extremely difficult to render truly anonymous given the level of detail typically involved.

Although they are at least calling for the relevant data protection and patient data bodies to be involved in provisioning the framework for sharing NHS data, alongside Trusts that have already worked with DeepMind (and in one case received an ICO wrist-slap).

They write:

We recommend that a framework for the sharing of NHS data should be prepared and published by the end of 2018 by NHS England (specifically NHS Digital) and the National Data Guardian for Health and Care. This should be prepared with the support of the ICO [information commissioner’s office] and the clinicians and NHS Trusts which already have experience of such arrangements (such as the Royal Free London and Moorfields Eye Hospital NHS Foundation Trusts), as well as the Caldicott Guardians [the NHS’ patient data advocates]. This framework should set out clearly the considerations needed when sharing patient data in an appropriately anonymised form, the precautions needed when doing so, and an awareness of the value of that data and how it is used. It must also take account of the need to ensure SME access to NHS data, and ensure that patients are made aware of the use of their data and given the option to opt out.

As the Facebook-Cambridge Analytica scandal has clearly illustrated, opt-outs alone cannot safeguard people’s data or their legal rights — which is why incoming EU data protection rules (GDPR) beef up consent requirements to require a clear affirmative. (And it goes without saying that opt-outs are especially concerning in a medical context where the data involved is so sensitive — yet, at least in the case of a DeepMind partnership with Taunton and Somerset NHS Trust, patients do not even appear to have been given the ability to say no to their data being processed.)

Opt-outs (i.e. rather than opt-in systems) for data-sharing and self-defined/voluntary codes of ‘ethics’ demonstrably do very little to protect people’s legal rights where digital data is concerned, even if it’s true, for example, that Facebook holds itself in check vs what it could theoretically do with data, as company execs have suggested (one wonders what kind of stuff they’re voluntarily refraining from, given what they have been caught trying to manipulate).

The wider risk of relying on consumer savvy to regulate commercial data sharing is that an educated, technologically aware few might be able to lock down — or reduce — access to their information; but the mainstream majority will have no clue they need to or even how it’s possible. And data protection for a select elite doesn’t sound very equitable.

Meanwhile, at least where this committee’s attitude to AI is concerned, developers and commercial entities are being treated with favorable encouragement — via the notion of a voluntary (and really pretty basic) code of AI ethics — rather than being robustly reminded they need to follow the law.

Given the scope and scale of current AI-fueled scandals, that risks the committee looking naive.

Though the government has made AI a strategic priority, and policies to foster and accelerate data-sharing to drive tech developments are a key part of its digital and industrial strategies. So the report needs to be read within that wider context.

The committee does add its voice to questions about whether/how legal liability will mesh with automated decision making — writing that “clarity is required” on whether “new mechanisms for legal liability and redress” are needed or not.

“We recommend that the Law Commission consider the adequacy of existing legislation to address the legal liability issues of AI and, where appropriate, recommend to Government appropriate remedies to ensure that the law is clear in this area,” it says on this. “At the very least, this work should establish clear principles for accountability and intelligibility. This work should be completed as soon as possible.”

But this isn’t exactly cutting edge commentary. Last month the government announced a three-year regulatory review focused on self-driving cars and the law, for instance. And the liability point is already generally well-aired — and in the autonomous cars case, at least, now having its tires extensively kicked in the UK.

What’s less specifically discussed in government circles is how AIs are demonstrably piling pressure on existing laws. And what, if anything, should be done to address those kinds of AI-fueled breaking points. (Exceptions: Terrorist content spreading via online platforms has been decried for some years, with government ministers more than happy to make platforms and technologies their scapegoat and even toughen laws; more recently hate speech on online platforms has also become a major political target for governments in Europe.)

The committee briefly touches on some of these societal pressure points in a section on AI’s impact on “social and political cohesion”, noting concerns raised to it about issues such as filter bubbles and the risk of AIs being used to manipulate elections. “[T]here is a rapidly growing need for public understanding of, and engagement with, AI to develop alongside the technology itself. The manipulation of data in particular will be a key area for public understanding and discussion in the coming months and years,” it writes here. 

However it has little in the way of gunpowder — merely recommending that research is commissioned into “the possible impact of AI on conventional and social media outlets”, and to investigate “measures which might counteract the use of AI to mislead or distort public opinion as a matter of urgency”.

Elsewhere in the report, it also raises an interesting concern about data monopolies, noting that investments by “large overseas technology companies in the UK economy” are “increasing consolidation of power and influence by a select few”, which it argues risks damaging the UK’s home-grown AI start-up sector.

But again there’s not much of substance in its response. The committee doesn’t seem to have formed its own ideas on how or even whether the government needs to address data concentrating power in the hands of big tech, beyond calling for “strong” competition frameworks. This lack of conviction is attributed to hearing mixed messages on the topic from its witnesses. (Though it may well also be related to the economic portion of the enquiry’s focus.)

“The monopolisation of data demonstrates the need for strong ethical, data protection and competition frameworks in the UK, and for continued vigilance from the regulators,” it concludes. “We urge the Government, and the Competition and Markets Authority, to review proactively the use and potential monopolisation of data by the big technology companies operating in the UK.”

The report also raises concerns about access to funding for UK AI startups to ensure they can continue scaling domestic businesses — recommending that a chunk of the £2.5BN investment fund at the British Business Bank, which the government announced in the Autumn Budget 2017, is “reserved as an AI growth fund for SMEs with a substantive AI component, and be specifically targeted at enabling such companies to scale up”.

No one who supports the startup cause would argue with trying to make more money available. But if data access has been sealed up by tech giants all the scale up funding in the world won’t help domestic AI startups break through that algorithmic ceiling.

Also touched on: The looming impact of Brexit, with the committee calling on the government to “commit to underwriting, and where necessary replacing, funding for European research and innovation programmes, after we have left the European Union”. Which boils down to another whistle in a now very long score of calls for replacement funding after the UK leaves the EU.

Funding for regulators is another concern, with a warning that the ICO must be “adequately and sustainably resourced” — as a result of the additional burden the committee expects AI to put on existing regulators.

This issue is also on the radar of the UK’s digital minister, Matt Hancock, who has said he’s considering what additional resources the ICO might need, such as the power to compel testimony from individuals. (Though the information commissioner herself has previously raised concerns that the minister and his data protection bill risk undermining her authority.) For now it remains to be seen how well armed the agency will be to meet the myriad challenges generated and scaled by AI’s data processors.

“Blanket AI-specific regulation, at this stage, would be inappropriate,” the report adds. “We believe that existing sector-specific regulators are best placed to consider the impact on their sectors of any subsequent regulation which may be needed. We welcome that the Data Protection Bill and GDPR appear to address many of the concerns of our witnesses regarding the handling of personal data, which is key to the development of AI. The Government Office for AI, with the Centre for Data Ethics and Innovation, needs to identify the gaps, if any, where existing regulation may not be adequate. The Government Office for AI must also ensure that the existing regulators’ expertise is utilised in informing any potential regulation that may be required in the future.”

The committee’s last two starter principles for their voluntary AI code serve to underline how generously low the ethical bar is really being set here — boiling down to: AI shouldn’t be allowed to kill off free schools for our kids, nor be allowed to kill us — which may itself be another consequence of humans not always being able to clearly determine how AI does what it does or exactly what it might be doing to us.

News Source = techcrunch.com
