In the current tech era, tech giants and other companies have come under scrutiny over how they collect and use user data. Even search giant Google regularly tells us that its top priority is users' privacy and security. But it seems this is more talk than practice.
A new research paper by Douglas Leith, a computer science professor at Trinity College Dublin, reveals that Google's Messages and Dialer/Phone apps have been collecting user data without notice or user consent. Given the recent changes to EU privacy laws, this could be a violation of the GDPR.
In his paper, titled "What Data Do The Google Dialer and Messages Apps on Android Send to Google?", Leith claims that Google's Messages and Dialer apps have been sending data to the company's servers without explicit user consent. More specifically, the apps collect information about user communications, including a SHA256 hash of each message's text along with its timestamp, phone numbers, incoming and outgoing call logs, and call times and durations. All of this data is sent to Google's servers through the Google Play Services Clearcut logger service and the Firebase Analytics service.
"The data sent by Google Messages includes a hash of the message text, allowing linking of sender and receiver in a message exchange," the paper says. "The data sent by Google Dialer includes the call time and duration."
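Why does a hash still allow linking? SHA-256 is deterministic: the sender's phone and the receiver's phone hash the same text to the same digest, so whoever holds both devices' logs can pair them up without ever seeing the message itself. The following minimal Python sketch illustrates this. The record layout, device names, and the assumption that the hash is computed over the raw UTF-8 text with no salt are hypothetical and are not taken from Google's apps.

```python
import hashlib


def message_digest(text: str) -> str:
    """SHA-256 hex digest of a message body (assumption: UTF-8 text, no salt)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical telemetry records, as each device might report them.
sender_log = [
    {"device": "phone-A", "event": "sent", "hash": message_digest("See you at 7?")},
    {"device": "phone-A", "event": "sent", "hash": message_digest("Running late")},
]
receiver_log = [
    {"device": "phone-B", "event": "received", "hash": message_digest("See you at 7?")},
    {"device": "phone-C", "event": "received", "hash": message_digest("Lunch tomorrow?")},
]


def link_conversations(sent, received):
    """Pair sent and received records whose message hashes are identical."""
    receivers_by_hash = {}
    for rec in received:
        receivers_by_hash.setdefault(rec["hash"], []).append(rec["device"])
    links = []
    for rec in sent:
        for receiver in receivers_by_hash.get(rec["hash"], []):
            links.append((rec["device"], receiver))
    return links


print(link_conversations(sender_log, receiver_log))  # [('phone-A', 'phone-B')]
```

No message text is compared anywhere in the sketch; matching digests alone are enough to reveal that phone-A messaged phone-B.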
The timing and duration of other user interactions with these apps have also been transmitted to Google. And Google offers no way to opt out of this data collection.
Google has become increasingly strict with Android app developers about user privacy, setting firm rules on how developers request user permissions. Yet Google itself did not follow these rules for its own apps.
The research paper highlights that neither app (Google Messages nor the Phone app) has a privacy policy explaining what data is collected from users, something Google itself requires of third-party apps on the Play Store.
Google Play Services does inform users that some data is collected for security and fraud prevention, but there is no explanation of why message and call data specifically are collected. And if users download the data associated with their Google account via Google Takeout, none of this information appears in the export.
Google Messages is installed on over a billion Android devices worldwide, and Google Dialer (also known as Phone by Google) comes preloaded on phones from many manufacturers, such as Huawei, Samsung, and Xiaomi. The pre-installed versions of both apps, the paper observes, lack app-specific privacy policies explaining what data is collected.
Suggestions to Google and Its Fixes
Leith disclosed his findings to Google last November and said he has had several conversations with Google's engineering director for Google Messages about the suggested changes. He recommended a total of nine changes, six of which have already been implemented:
- The specific data collected by the Dialer and Messages apps, and the specific purposes for which it is collected, should be clearly stated in the app privacy policies.
- Data on user interactions with an app, e.g., app screens viewed, buttons/links clicked, and actions such as sending/receiving/viewing messages and phone calls, is different in kind from app telemetry such as battery usage, memory usage, and slow UI operation. Users should be able to opt out of the collection of their interaction data.
- User interaction data collected by Google should be made available to users on Google’s https://takeout.google.com/ portal (where other data associated with a user’s Google account can already be downloaded).
- When collecting app telemetry such as battery usage, memory usage, etc., the data should only be tagged with short-lived session identifiers, not long-lived persistent device/user identifiers such as the Android ID.
- When collecting data, only coarse timestamps should be used, e.g., rounded to the nearest hour. The current approach of using timestamps with millisecond accuracy risks being too revealing. Better still, histogram data should be used rather than timestamped event data; e.g., a histogram of the network connection time when initiating a phone call seems sufficient to detect network issues.
- Halt the collection of the sender phone number via the CARRIER_SERVICES log source when a message is received, and halt collection of the SIM ICCID by Google Messages when a SIM is inserted. Halt collection of a hash of sent/received message text.
- The current spam detection/protection service transmits incoming phone numbers to Google servers. This should be replaced by a more privacy-preserving approach, e.g., one similar to that used by Google’s Safe Browsing antiphishing service, which only uploads partial hashes to Google servers.
- A user’s choice to opt-out of “Usage and diagnostics” data collection should be fully respected, i.e., result in a halt to all collection of app usage and telemetry data.
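Two of these recommendations are concrete enough to sketch. The following Python illustrates, under assumed parameters, (1) rounding millisecond timestamps to the nearest hour and summarizing call-setup times as a histogram instead of timestamped events, and (2) a spam check modeled loosely on the Safe Browsing idea, in which the device keeps a local list of short hash prefixes and uploads only a prefix, never the phone number, when a local prefix matches. The 4-byte prefix length, the 100 ms bucket width, and all class and function names are hypothetical choices for illustration; this is not how Google's Phone app or Safe Browsing is actually implemented.

```python
import hashlib
from collections import Counter

HOUR_MS = 3_600_000
PREFIX_BYTES = 4  # assumption: 4-byte hash prefixes, chosen for illustration


def round_to_hour(ts_ms: int) -> int:
    """Round a Unix timestamp in milliseconds to the nearest hour (still in ms)."""
    return ((ts_ms + HOUR_MS // 2) // HOUR_MS) * HOUR_MS


def setup_time_histogram(durations_ms, bucket_ms=100):
    """Bucket call-setup times (ms) into fixed-width bins instead of logging each call."""
    return Counter((d // bucket_ms) * bucket_ms for d in durations_ms)


def number_hash(number: str) -> bytes:
    """Full SHA-256 digest of a phone number string."""
    return hashlib.sha256(number.encode("utf-8")).digest()


class SpamServer:
    """Hypothetical server that knows the full hashes of reported spam numbers."""

    def __init__(self, spam_numbers):
        self._full = {number_hash(n) for n in spam_numbers}

    def prefix_list(self):
        """Short prefixes that the client downloads and stores locally."""
        return {h[:PREFIX_BYTES] for h in self._full}

    def full_hashes_for_prefix(self, prefix: bytes):
        """Return every full hash that starts with the given prefix."""
        return {h for h in self._full if h.startswith(prefix)}


class SpamClient:
    """Hypothetical on-device checker: only a hash prefix ever leaves the phone."""

    def __init__(self, server: SpamServer):
        self.server = server
        self.local_prefixes = server.prefix_list()
        self.uploaded = []  # record of everything sent to the server

    def is_spam(self, number: str) -> bool:
        full = number_hash(number)
        prefix = full[:PREFIX_BYTES]
        if prefix not in self.local_prefixes:
            return False  # no local match, so no network request at all
        self.uploaded.append(prefix)  # upload the prefix, never the number
        # The final full-hash comparison happens on the device.
        return full in self.server.full_hashes_for_prefix(prefix)


print(round_to_hour(5_399_999))  # 3600000
print(setup_time_histogram([120, 180, 250, 990]))
client = SpamClient(SpamServer(["+15550100"]))
print(client.is_spam("+15550100"), client.is_spam("+15550199"))  # True False
```

The prefix length acts as the privacy knob: a shorter prefix is shared by more phone numbers, so the server learns less from each query, at the cost of more frequent full-hash lookups.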
Google has provided an explanation of some of its data collection practices:
- The message hash is collected for detecting message sequencing bugs.
- Phone numbers are collected to improve the regex pattern matching used to automatically recognize one-time passwords (OTPs) sent over RCS. Google Messages automatically recognizes incoming OTP codes so that users don't have to enter them manually. This can be a frequent point of failure, and the phone number data is used to improve recognition by providing ground truth based on known OTP sender numbers.
- The ICCID data is used to support Google Fi.
- Firebase Analytics logging of events (not including phone numbers) is used to measure the effectiveness of app download promotions (for Messages and Dialer specifically), that is, to measure not only whether the app was downloaded but also whether it was used once downloaded.
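Google's explanation of the phone number collection is easier to follow with a concrete picture of regex-based OTP recognition. The following Python sketch uses a hypothetical pattern (not Google's) that looks for a 4 to 8 digit code after a keyword such as "code" or "OTP", and then treats messages from a hypothetical set of known OTP sender numbers as ground truth to measure how often the pattern succeeds. The second sample message, where the code appears before the keyword, shows the kind of failure that such ground-truth data can expose.

```python
import re

# Hypothetical pattern (not Google's): a 4-8 digit code that follows a keyword
# such as "code" or "OTP" within 20 non-digit characters.
OTP_PATTERN = re.compile(
    r"(?:passcode|code|otp|verification)\D{0,20}?(\d{4,8})\b",
    re.IGNORECASE,
)

# Hypothetical sender numbers known to send OTPs; used as ground-truth labels.
KNOWN_OTP_SENDERS = {"+15550111", "+15550112"}


def extract_otp(text: str):
    """Return the first OTP-like code found by the pattern, or None."""
    match = OTP_PATTERN.search(text)
    return match.group(1) if match else None


def recall_on_known_senders(messages):
    """Share of messages from known OTP senders in which the pattern finds a code."""
    labeled = [text for sender, text in messages if sender in KNOWN_OTP_SENDERS]
    if not labeled:
        return 0.0
    hits = sum(1 for text in labeled if extract_otp(text) is not None)
    return hits / len(labeled)


sample = [
    ("+15550111", "Your verification code is 482913."),
    ("+15550112", "Use 739201 as your login code"),  # code before keyword: missed
    ("+15550123", "Meet me at 1530 near gate 4"),    # not a known OTP sender
]
print(extract_otp(sample[0][1]))        # 482913
print(recall_on_known_senders(sample))  # 0.5
```

Because the sender number, not the message wording, supplies the label, messages the pattern misses still count against it, which is what makes the recall figure meaningful.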