The Truth About Privacy in COVID Tracing Solutions
TL;DR
- The two most promising solutions meet most of the requirements for COVID tracing apps and mitigate the majority of security risks.
- The decentralized approach is the only correct one, but there are still open issues, mostly because the Authorities are treated as a trusted entity.
- The authors of COVID tracing applications should also discuss non-privacy threats such as inciting panic or denial of service.
- There is a need for a solution that treats the Authorities (e.g. governments) as a potential threat to users’ privacy and is not vulnerable to attacks resulting in panic. It is achievable.
Do we really need a COVID tracing app?
The SARS-CoV-2 virus led to lockdowns in many countries, with a significant economic and social impact. Governments have now begun “defrosting” their economies and “freeing” citizens from their homes. However, the virus is still out there.
That was the motivation behind COVID tracing solutions (or virus tracing solutions in general). Their main goal is to quickly notify people at risk who have been in close proximity to an infected person. These people are then informed about the next steps. An additional goal indicated by scientists is to enable epidemiologists to analyze the spread of COVID-19 (or any other future disease).
The aim of this article is to shed some light on the privacy issues of COVID tracing solutions from the perspective of a typical user (who shares their data). I will focus on the existing solutions recognized as a good direction and propose additional privacy enhancements.
If you are looking for a simple answer to whether it is possible to build a tracing solution that respects users’ privacy, the quick answer is “Yes”. Stay with me and I will explain how.
What requirements should a successful contact tracing app meet?
The aforementioned goals are achievable only if certain requirements are met. The most important requirement is to keep the COVID tracing solution as transparent and privacy-preserving as possible. Without this, people will not trust the solution, it will not be widely adopted, and it will become practically useless.
Note: Countries that force people to use such solutions under threat of punishment are out of scope.
More specifically we can enumerate the following requirements:
Requirement | Description | Fulfilled with |
Privacy | The application must not allow users’ locations to be tracked, nor make their infection status public. Additional privacy requirements have also been proposed, such as preventing the discovery of places with many infected people (without visiting those places). | |
Confidentiality | No one can access the sensitive user information that the system must process (e.g. data encrypted with cryptographic keys or processed by partially trusted parties). | Proximity-based systems. |
Completeness | The solution must collect as little information as possible, but enough to ensure that information about the contacts of infected patients reaches every user who was in contact with them. | Anonymously broadcast identifiers. |
Soundness | The solution must not generate false positives and must reflect the real situation. | Proximity-based systems and Health Authority confirmation. |
Integrity | Malicious users or institutions cannot modify or forge messages or falsify results. Messages must be authenticated; otherwise a malicious actor could spoof positively diagnosed users on a large scale and cause panic. | Identifiers generated from a secret value, and important messages (e.g. a positive diagnosis) signed by the Health Authority. |
Scalability | The solution must be operable by millions of users, and adoption should be as easy as possible. It must not require users to obtain (or buy) additional devices. | Proximity-based systems that use Bluetooth communication between smartphones. |
Interoperability | The solution must be operable worldwide (across borders and authorities). | The idea of creating an international standard. |
To be or not to be decentralized?
The systems that generally fulfil all of these requirements and are widely accepted are proximity-based systems. There are two approaches: centralized and decentralized.
Centralized
In these systems (e.g. BlueTrace), the identifiers are generated by the government or another trusted institution and assigned to users by the central server. The location history of infected users is reconstructed on the server, and the other users who were within range of the infected person are notified. If you see the word “decentralized” in their description, it only means that the identifiers are broadcast and logged in a decentralized way (as in all proximity-based systems); all the correlation is still done on the central server.
Pros:
- Easier to implement and maintain.
- Not affected by a number of threats coming from users (e.g. generating false alarms).
Cons:
- An attractive target because of the huge amount of sensitive data stored centrally. A successful attack leaks all of it at once.
- Assumes that the central server operator (most probably the government) is trusted and will not misuse the collected data. In my opinion this is an optimistic assumption, as people tend not to trust their governments.
Decentralized
Decentralized systems (e.g. DP-3T and the Privacy-Preserving Contact Tracing solution by Google & Apple) keep as much data as possible offline on users’ devices. Instead of using a central server, they transmit as much information as possible (only the required information, of course) directly from device to device.
In essence, these systems assume that the Authorities are not fully trusted, in order to increase people’s trust. When the Authorities (the health authority or the government) are considered one of the threat actors, a centralized solution cannot fulfil the privacy requirement.
Pros:
- Easier to preserve privacy and build people’s trust.
- The sensitive data is stored on users’ devices and is not prone to large leaks.
Cons:
- Requires additional (e.g. cryptographic) protocols to meet the security requirements.
- Affected by more threats coming from users.
We can say that the decentralized approach is the correct one, but this does not solve the problem. The decentralized approach creates many challenges, which have been taken on by the Google & Apple team and the DP-3T authors. When you dig into the design and implementation of the proposed solutions, you notice that the devil really is in the details.
Let’s first identify the potential privacy threats. Later we will focus on the particular solutions, including the Google & Apple solution and DP-3T.
What are the threats to COVID tracing solutions?
Based on the key assets (location history and user diagnosis status) and the security requirements, we can enumerate the threats:
- Leak of user’s location history
- Leak of user’s diagnosis status
- Overall panic (e.g. caused by massive broadcast of positively diagnosed identifiers)
By enumerating the threat actors, we can estimate the risk. The risk is the level of real impact on system operations and key assets. It depends on three factors: the potential impact of the threat, the likelihood of it occurring, and how easy it is to carry out.
Threat (caused by) | Users | Authorities | Mobile Vendors |
Leak of location history | High | Medium | Not applicable |
Leak of diagnosis status | High | Not applicable | Low |
Panic | High | Not applicable | Low |
Risks coming from Users
(e.g. hacktivists or people interested in information about a particular user)
I have rated all risks coming from users as high because the group of users is huge (millions or even billions worldwide), and there is a significant probability that some users will be highly motivated to track other users and learn their diagnosis.
Additionally, hacktivist groups could be willing to cause panic. Recent history shows that events such as a pandemic are very sensitive and people get scared easily.
Risks coming from Authorities
(including health authorities and governments which are treated as a trusted entity in most of the approaches.)
The Authorities might be interested in the location of particular citizens (e.g. for political purposes), and this is information that the Authorities cannot easily obtain without cooperating with big corporations.
It depends on the country, but generally people care about their privacy and do not fully trust their governments. Numerous events have shown that people value their privacy and are willing to fight for it when governments try to restrict it (even though they “sell” it to big companies).
I have not assigned any risk generated by the Authorities to the last two threats because:
- The Authorities know the user’s diagnosis status and that is a part of the protocol flow (the Authorities are partially trusted) so this scenario is out of the scope.
- The Authorities, such as governments, are able to falsify information in the media and fake official data. That would be easier to do and harder for citizens to verify.
Risks coming from Mobile Vendors
(that include device vendors and mobile operating systems vendors.)
Diagnosis status is valuable information for mobile vendors, but I would say they have a bigger interest in the correct operation of the system. I also do not see why mobile vendors would want to cause panic by broadcasting false positive diagnosis identifiers.
Mobile vendors already have the technical means to cause panic or to learn a user’s diagnosis status, e.g. by reading the contents of the user’s mail or spoofing it.
I have not assigned the risk related to the location history leak because Mobile Vendors (especially operating system vendors) already have access to the places visited by users and in fact, the users have access to this list as well. You can check where your mobile has been over time.
How do decentralized solutions achieve privacy?
As mentioned earlier I am going to focus on the decentralized approaches only, including:
- Privacy-Preserving Contact Tracing by Google & Apple,
- Decentralized Privacy-Preserving Proximity Tracing (DP-3T).
In this section I am going to address the previously enumerated risks. In the next section I will address the issues that are still open. If you are interested in how these solutions achieve the following mitigations, check out my previous article “The Most Promising COVID Tracing Solutions”.
Risk mitigations in Privacy-Preserving Contact Tracing by Google & Apple
Google and Apple have started working together on a solution that is interoperable across the two most popular mobile operating systems (Android and iOS). In my opinion this is a good decision, because such cooperation directly fulfils some of the requirements (i.e. interoperability and scalability).
The solution by Google and Apple mitigates the risks related to the users’ location and diagnosis status leaks by the following:
- The applications published by the Health Authorities must follow special security rules defined by Google and Apple (e.g. they cannot request the GPS permission). This protects from tracing by Authorities.
- Authorities’ applications cannot include static or predictable information that could be used for tracking users, because the temporary key schedule is fixed and defined by operating system components. This protects from tracing by Authorities.
- A temporary key is required to correlate between a user’s consecutive identifiers. This reduces the risk of privacy loss from broadcasting the identifiers. This protects from tracing by Users.
- When uploading a set of temporary keys to the Diagnosis Server each key allows enumeration of identifiers from one day only. This protects from long-term tracing by Users and Authorities.
- If the user stays healthy, their temporary keys never leave the device. This protects from tracing by Users and Authorities.
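The following sketch shows how one daily temporary key expands into that day’s rotating identifiers. It follows my reading of the Google & Apple Exposure Notification cryptography specification (v1.2). That design has three parts: a random 16-byte key, HKDF with the label “EN-RPIK”, and one identifier per 10-minute interval (144 per day). I made one deliberate substitution, so treat the output as illustrative only. The specification encrypts the padded interval number with AES-128, while this sketch uses truncated HMAC-SHA256 so that it runs on Python’s standard library. The function names are my own.

```python
import hashlib
import hmac
import os
import struct

INTERVAL_SECONDS = 600   # one interval = 10 minutes
INTERVALS_PER_DAY = 144  # 144 * 10 min = 24 h, i.e. one temporary key per day


def hkdf_sha256(key: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def interval_number(unix_time: int) -> int:
    """Number of the 10-minute interval containing unix_time."""
    return unix_time // INTERVAL_SECONDS


def new_temporary_key() -> bytes:
    """A fresh random 16-byte temporary key, generated on the device once per day."""
    return os.urandom(16)


def rolling_identifier(tek: bytes, interval: int) -> bytes:
    """Identifier broadcast during one interval, derived from the day's key."""
    rpik = hkdf_sha256(tek, b"", b"EN-RPIK", 16)
    # 16-byte block: "EN-RPI" || six zero bytes || interval as uint32 little-endian
    padded = b"EN-RPI" + b"\x00" * 6 + struct.pack("<I", interval)
    # ASSUMPTION: stand-in for AES-128(rpik, padded) used by the real specification
    return hmac.new(rpik, padded, hashlib.sha256).digest()[:16]


def identifiers_for_day(tek: bytes, first_interval: int) -> list[bytes]:
    """All identifiers a single key can produce: exactly one day's worth."""
    return [rolling_identifier(tek, first_interval + i) for i in range(INTERVALS_PER_DAY)]
```

The point of the design shows in `identifiers_for_day`. Whoever holds one temporary key can enumerate exactly one day of identifiers and nothing more. This is why uploading keys after a diagnosis exposes only a bounded window, while identifiers from different days stay unlinkable without their keys.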
Mitigation of risks related to panic:
- The solution is protected from the attack where a malicious user re-broadcasts identifiers marked as infected. This is mitigated by timestamping identifiers received from other users and timestamping the uploaded temporary keys. When a potential victim receives the re-broadcast identifier, she notices that the identifier’s timestamp is later than the temporary key’s timestamp. The identifier is therefore a replay and is invalid. This protects from mass false alarms and panic.
- I could not find any information about how the Health Authority verifies the diagnosis of the user associated with the identifiers.
- Additionally, I could not find how users verify whether the received identifiers are valid. If they do not, identifier flooding by malicious users could drastically increase the amount of collected data. This is discussed in the next section (see Open privacy issues and solutions).
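The timestamp check in the first bullet above can be sketched as follows. The receiving phone stores each identifier together with the interval in which it heard it. When a diagnosis key is published, the phone re-derives that key’s identifiers. It counts a match only if the identifier was heard close to the interval in which it was meant to be broadcast, so a replayed identifier heard days later fails the check. Two parts of this sketch are assumptions. `derive_identifier` is a hypothetical stand-in (HMAC-SHA256) for the real derivation, and the 2-hour tolerance is an assumed allowance for clock skew.

```python
import hashlib
import hmac
from dataclasses import dataclass

INTERVALS_PER_KEY = 144  # one temporary key covers a day of 10-minute intervals
TOLERANCE = 12           # ASSUMPTION: accept +/- 12 intervals (2 hours) of clock skew


def derive_identifier(key: bytes, interval: int) -> bytes:
    """Hypothetical stand-in for the real per-interval identifier derivation."""
    return hmac.new(key, interval.to_bytes(4, "little"), hashlib.sha256).digest()[:16]


@dataclass(frozen=True)
class Sighting:
    identifier: bytes
    received_interval: int  # timestamp recorded by the receiving phone


@dataclass(frozen=True)
class DiagnosisKey:
    key: bytes
    start_interval: int  # first interval in which the key was in use


def is_genuine_exposure(s: Sighting, dk: DiagnosisKey) -> bool:
    """True only if the sighting matches the key AND was heard when it was broadcast."""
    for j in range(dk.start_interval, dk.start_interval + INTERVALS_PER_KEY):
        if hmac.compare_digest(derive_identifier(dk.key, j), s.identifier):
            # A replayed identifier is heard long after interval j and is rejected here.
            return abs(s.received_interval - j) <= TOLERANCE
    return False
```

The phone never needs to trust the timestamp supplied by the broadcaster. It relies only on its own clock and on the interval implied by the published key.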
Risk mitigations in DP-3T
Decentralized Privacy-Preserving Proximity Tracing is a proposal by a broad group of scientists from different fields. The authors stated that the system proposed by Google and Apple is very similar to their early proposal, named “Low-cost decentralized proximity tracing”.
Later they add: “But, we also strongly believe that Apple and Google should adopt our subsequent enhancements, detailed in later versions of our white paper, which increase user privacy. We also strongly encourage both companies to allow an external audit of their code to ensure its functionality corresponds to its specification.”
The following list of risk mitigations introduced by DP-3T contains additional mitigations compared to the Google & Apple solution.
The risks related to the users’ location and diagnosis status leaks are mitigated by the following:
- The identifiers are short-lived (1 minute). This makes it more difficult to track a person by eavesdropping on the identifiers. This protects from tracing by Users.
- Diagnosed users select a subset of seeds (and thus the identifiers) to be sent to the server. They can redact the identifiers that reveal the particular location. This protects from tracing by Authorities and Users.
- The secret keys (the first version) and seeds (the second version) are known only to the user’s device and the Authorities if and only if the user is infected. This protects from tracing by Users.
- The authors propose additional protection of the identifiers against eavesdropping by spreading them with secret sharing. The idea is to divide each identifier into many parts and broadcast them sequentially. To recover the identifier, the contacted user must receive a given number of parts. Because of the higher packet loss at a distance, this makes it harder to recover identifiers by eavesdropping from afar (with a strong antenna). This protects from tracing by Users.
- The use of trusted execution environments (TEE) is proposed (in the first version) to mitigate the risk of identifying infected patients. Indeed, an adversarial at-risk user can learn which infected individuals they have been in contact with. The idea of the mitigation is to decrypt the identifiers and calculate the risk inside the TEE. This protects from diagnosis status leaks.
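The secret-sharing idea from the list above can be illustrated with Shamir’s k-out-of-n scheme. Any k shares reconstruct the identifier, while fewer reveal nothing about it. This is a minimal sketch, and several details are my assumptions rather than DP-3T’s: whether DP-3T uses Shamir specifically, the choice of field, the share format, and the parameters in the usage note.

```python
import secrets

# 2**521 - 1 is a Mersenne prime, so arithmetic modulo it forms a field
# comfortably larger than any 16-byte identifier.
PRIME = 2**521 - 1


def split(identifier: bytes, n: int, k: int) -> list[tuple[int, int]]:
    """Split an identifier into n shares, any k of which recover it."""
    secret = int.from_bytes(identifier, "big")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # evaluate the polynomial with Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares: list[tuple[int, int]], length: int = 16) -> bytes:
    """Lagrange interpolation at x = 0; needs at least k distinct shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret.to_bytes(length, "big")
```

For example, with `split(identifier, n=5, k=3)` a nearby phone that hears any three of the five broadcast parts reconstructs the identifier. A distant eavesdropper who, because of packet loss, catches only two parts learns nothing about it. `recover` assumes it is given at least k distinct shares.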
Mitigation of risks related to panic:
- As with the Google & Apple solution, DP-3T is protected from the attack where a malicious user re-broadcasts identifiers marked as infected. Both the broadcast identifiers and the filtered identifiers sent by the Health Authority are timestamped. When a potential victim receives the re-broadcast identifier, she notices that the identifier’s timestamp is later than the timestamp of the uploaded key. The identifier is therefore a replay and is invalid. This protects from mass false alarms and panic.
- The authors point out the risk of fake contact events (a victim is told that they have been in contact with an infected patient when they have not). The first version proposes no mitigation. In the second version the identifiers are bound to an epoch (timestamped), so the adversary cannot broadcast identifiers from a past epoch: the victim would easily detect that they are no longer valid.
- A malicious user cannot impersonate the identifiers of others and later claim their ownership. To do so he would have to know the identifier’s secret key or the seed which are later sent to the Health Authority (after positive diagnosis). This protects from generating false positive alerts which could lead to panic.
- Same as with the Google & Apple solution I could not find any information about how the diagnosis of the user related to the identifiers is verified by the Health Authority.
- Additionally, I could not find how users verify whether the received identifiers are valid. If they do not, identifier flooding by malicious users could drastically increase the amount of collected data. This is discussed in the next section (see Open privacy issues and solutions).
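The epoch binding and the ownership argument in the list above can be sketched as follows. The sketch is based on my reading of DP-3T’s second (unlinkable) design. Each epoch has a random seed, and the broadcast identifier is the leftmost 128 bits of its hash. After a diagnosis, the backend publishes hashes of (identifier, epoch) pairs. A phone checks the identifiers it heard against that set, using the epoch in which it heard them. Three details are my assumptions: a plain Python set stands in for the Cuckoo filter the white paper describes, the epoch encoding is my own, and so is the 32-byte seed length.

```python
import hashlib
import os


def ephid_from_seed(seed: bytes) -> bytes:
    """Broadcast identifier: leftmost 128 bits of SHA-256(seed)."""
    return hashlib.sha256(seed).digest()[:16]


def tag(ephid: bytes, epoch: int) -> bytes:
    """Binds an identifier to the epoch in which it is valid (encoding assumed)."""
    return hashlib.sha256(ephid + epoch.to_bytes(4, "big")).digest()


def publish(uploaded: list[tuple[int, bytes]]) -> set[bytes]:
    """Backend: turn uploaded (epoch, seed) pairs into a set of tags.
    A plain set stands in for the Cuckoo filter used in the white paper."""
    return {tag(ephid_from_seed(seed), epoch) for epoch, seed in uploaded}


def exposed(observations: list[tuple[int, bytes]], published: set[bytes]) -> bool:
    """Phone: observations are (epoch heard, identifier heard) pairs."""
    return any(tag(ephid, epoch) in published for epoch, ephid in observations)
```

Two properties follow from the hashing. First, an identifier replayed in a later epoch produces a different tag and never matches. Second, a malicious user who overheard someone else’s identifier cannot claim it, because uploading the identifier instead of its seed yields a different published tag.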
Open privacy issues and solutions
The authors of both solutions have addressed many potential threats. However, there are still unresolved issues related to privacy and security. Some of them are mentioned in the documentation, but no mitigation is proposed.
In this section I enumerate these issues and propose some additional mitigations.
Disclosure of the relationship between user and his identifiers (Tracing by Authorities)
A potentially dangerous step in contact tracing solutions is verifying the infected user and uploading their identifiers (or secret keys, seeds, etc.) to the Health Authority. The Google & Apple solution does not focus on this step, while DP-3T proposes the use of authorization tokens. After positively verifying the user’s diagnosis, the Health Authority generates a token that is required to upload the identifiers.
However, both solutions assume that the Health Authority (or backend server) is trusted. This assumption conflicts with the threats identified earlier. Indeed, the Health Authority can reveal the relationship between the verified user and their identifiers. Even if the identifiers are uploaded through an independent channel, the user’s IP address is revealed. This allows users’ past movements to be traced from their identifiers, provided the Health Authority (usually part of the government) has also covered the monitored area with Bluetooth receivers.
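The following toy model shows why a single-use authorization token on its own does not prevent this linkage. The Health Authority issues the token to a known patient. When the token is redeemed, the same server sees both the token and the uploaded keys. Nothing in the flow stops an honest-but-curious server from recording that pair. The class and method names are hypothetical and do not describe DP-3T’s actual backend.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class HealthAuthority:
    """Hypothetical backend that both issues tokens and accepts uploads."""
    issued: dict = field(default_factory=dict)   # token -> patient id
    uploads: list = field(default_factory=list)  # what a curious server could retain

    def issue_token(self, patient_id: str) -> str:
        """Called after a verified positive diagnosis of a known patient."""
        token = secrets.token_urlsafe(16)
        self.issued[token] = patient_id
        return token

    def accept_upload(self, token: str, keys: list[bytes]) -> bool:
        """Tokens are single-use; an unknown or spent token is rejected."""
        patient_id = self.issued.pop(token, None)
        if patient_id is None:
            return False
        # The server now holds (patient, keys): exactly the linkage described above.
        self.uploads.append((patient_id, keys))
        return True
```

Breaking this link requires that the token presented at upload time cannot be matched to the one issued to the patient. This is the property the mitigations below aim for.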
To mitigate this risk, the following can be introduced:
- Fully anonymous identifiers that cannot be linked to the user at any step of the system protocol (even when submitting identifiers to the Health Authority after a positive diagnosis).
- A secure channel for uploading the identifiers without revealing the IP address. There are three possibilities:
- An upload through the Tor network,
- A decentralized upload of the identifiers via other users’ devices,
- An upload to a public decentralized storage.
Inciting panic by submitting identifiers falsely marked as infected
This potential threat appears at the same step as the previous one – verification of the infected user and uploading their identifiers (or secret keys, seeds, etc.) to the Health Authority.
It is important for the Health Authority to make sure that the uploaded data belongs to the infected patient. Otherwise, a malicious infected user could collect the secret keys or seeds of other malicious users all over the country and submit them. This would immediately trigger false alarms in the places those other malicious users visited.
To mitigate this risk, the following can be introduced:
- The Health Authority must be able to confirm that the submitted keys, seeds or other form of identifiers belong to the infected person without revealing the relationship between the identifiers and the patient.
However, the proposed mitigation does not protect against the scenario where malicious users exchange their identifiers and secret keys (seeds, etc.) beforehand and broadcast all of them. Such an attack would have the same consequences: one diagnosed patient would trigger alarms in many places he has not visited. The problem is that there is no simple mitigation for it.
Denial of service by identifier flooding
Neither the Google & Apple solution nor DP-3T addresses the validity of the broadcast identifiers. A malicious user could easily broadcast random Bluetooth messages and significantly increase the amount of data collected and stored.
Even though it is quite hard to perform such an attack at scale, the risk is easy to mitigate by introducing identifiers anonymously signed by the Health Authority. After receiving a signed identifier, the recipient can easily verify the signature and reject invalid ones.
The proposed solution reduces storage size at the cost of increased CPU consumption, so it is a trade-off between the two resources.
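Chaum’s RSA blind signature is one well-known way to get identifiers signed by the Health Authority without the Authority seeing them. The user blinds the identifier, the Authority signs the blinded value, and the user unblinds the result. Anyone can then verify the signature with the public key, but the Authority cannot link it back to the signing request. I chose this scheme purely for illustration; it is not necessarily the construction I will present later. The sketch is also textbook RSA: unpadded, with an illustrative 1024-bit key and my own function names. It is NOT safe for production use.

```python
import hashlib
import math
import secrets


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(bits: int) -> int:
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


def keygen(bits: int = 1024):
    """Health Authority key pair: public (n, e) and private exponent d."""
    e = 65537
    while True:
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            return (p * q, e), pow(e, -1, phi)


def h(msg: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n


def blind(msg: bytes, pub):
    """User: hide the identifier behind a random factor r before sending it."""
    n, e = pub
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            return (h(msg, n) * pow(r, e, n)) % n, r


def sign_blinded(blinded: int, pub, d: int) -> int:
    """Health Authority: signs without ever seeing the identifier."""
    return pow(blinded, d, pub[0])


def unblind(blind_sig: int, r: int, pub) -> int:
    """User: strip r to obtain an ordinary signature on the identifier."""
    return (blind_sig * pow(r, -1, pub[0])) % pub[0]


def verify(msg: bytes, sig: int, pub) -> bool:
    """Any recipient: one modular exponentiation with the public exponent."""
    n, e = pub
    return pow(sig, e, n) == h(msg, n)
```

Receiver-side verification is a single `pow` with e = 65537. This is the extra CPU cost traded for rejecting forged identifiers. A real deployment would use a standardized, padded blind-signature scheme rather than this textbook version.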
Conclusions
It is no secret that building a COVID tracing application that preserves users’ privacy is challenging. There are solutions that go in the right direction. However, there are still open issues that need to be addressed.
I am working on a solution, based on timestamped blind signatures, that will allow users to keep their privacy from the Authorities. In the next article I will present my vision.
Head of Blockchain Security