Blurring as a Service
1. The organization
The organisation considered in this mini data-ethical consultation is the municipality of Amsterdam. In 2020, Amsterdam and Helsinki became the first cities to launch open AI registers, which provide an overview of the algorithms being developed and deployed for city services. The initiative was launched in response to the increasing use of algorithms in public services, to allow citizens to understand and participate in the debate about AI developments. The 2020 version of the register included information about three algorithms (Team AI Regulation, 2021); at the time of writing, the Amsterdam algorithm register lists 42 reported algorithm projects. One notable project in the register is titled ‘Blurring as a Service (BaaS)’: an anonymization algorithm that blurs people and license plates in panoramic images of the city of Amsterdam.
The blurring project was launched in response to the already ongoing collection and analysis of panoramic images of the entire city, which the Mobile Mapping team has taken every year since 2016.
2. The AI Technologies Employed
First, Mobile Mapping is a technique in which a vehicle equipped with special scanners creates 2D/3D panoramic images of its surroundings. This scanning technique makes it possible to determine the exact position and dimensions of each object in the 3D photo (Mobile Mapping, 2021).
The panoramic images allow municipal employees to inspect public space from their workplace for various purposes, such as “accessibility with special vehicles or the inspection of roads” (translated from Dutch). Recognizing personal data is not necessary to fulfil these tasks, and it was therefore decided to anonymize the images by means of the blurring algorithm.
The Blurring as a Service (BaaS) algorithm consists of a machine learning model that detects people and license plates in images and subsequently blurs them.
The model used for detection is YOLOv5, a convolutional neural network commonly used for object detection. It can be adapted to recognize new categories in images based on a user’s specific needs. For BaaS, YOLOv5 was trained to take panoramic images as input and predict the locations of people and license plates, indicated by bounding boxes. Next, the areas within the bounding boxes are blurred.
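To make the pipeline concrete, the sketch below shows what such a detection step could look like when a YOLOv5 model is loaded through PyTorch Hub. The checkpoint name baas_weights.pt, the image file name and the class labels are illustrative assumptions, not the municipality’s actual configuration.

```python
# Illustrative detection step, assuming a YOLOv5 checkpoint fine-tuned on two
# classes (person, license plate). "baas_weights.pt" is a hypothetical file name.
import torch

# YOLOv5 is loaded from the public ultralytics/yolov5 hub repository;
# "custom" tells it to use locally fine-tuned weights.
model = torch.hub.load("ultralytics/yolov5", "custom", path="baas_weights.pt")

# Run inference on one panoramic image (file name is made up).
results = model("panorama_0001.jpg")

# results.xyxy[0] holds one row per detection:
# [x_min, y_min, x_max, y_max, confidence, class_id]
for x_min, y_min, x_max, y_max, confidence, class_id in results.xyxy[0].tolist():
    label = model.names[int(class_id)]  # e.g. "person" or "license_plate"
    print(f"{label}: ({x_min:.0f}, {y_min:.0f}, {x_max:.0f}, {y_max:.0f}), "
          f"confidence {confidence:.2f}")
```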
Training data. Panoramic images from prior years (10,000 raw images) were used as training data for the development of the algorithm. The training images were manually annotated: annotators drew boxes to indicate the presence of people and license plates. The training dataset is stored in the Azure cloud environment of the municipality of Amsterdam and is retained for as long as the algorithm is in use, for potential further improvements. The raw panoramic images are stored in an encrypted environment, and only civil servants who need the images, such as the developers of the algorithm, can access them.
Testing data. A portion of the data (1,000 images) is kept aside to test algorithm performance.
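For context, YOLOv5-style annotations are usually stored as one plain-text label file per image, with one normalised bounding box per line. The sketch below illustrates that format and a simple held-out split; the file names are invented, and whether the 1,000 test images are drawn from the 10,000 annotated images is an assumption.

```python
# Illustrative YOLO-format label file for one panoramic image (values made up).
# Each line reads "<class_id> <x_center> <y_center> <width> <height>",
# with coordinates normalised to [0, 1]:
#
#   panorama_0001.txt
#   0 0.512 0.634 0.021 0.087    (class 0: person)
#   1 0.803 0.701 0.015 0.012    (class 1: license plate)
#
# A simple held-out split mirroring the counts described above.
import random

image_ids = [f"panorama_{i:05d}" for i in range(10_000)]  # hypothetical file stems
random.seed(42)                                           # reproducible shuffle
random.shuffle(image_ids)

test_ids = image_ids[:1_000]   # kept aside to measure performance
train_ids = image_ids[1_000:]  # used to fine-tune the detector
```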
Model performance. The detection algorithm has an accuracy of roughly 95% for people and 97% for license plates that are close to the camera. Note that the algorithm is tuned to anonymize a little too much rather than too little, which means that a tree or a scooter may end up blurred as well. Finally, the website states that people who are not recognized are usually not identifiable by humans either, for instance because they are partially hidden behind a tree; ideally, these people would also be anonymized, but this is not yet possible.
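Tuning a detector to blur a little too much rather than too little is commonly done by lowering its confidence threshold, trading extra false positives (blurred trees or scooters) for fewer missed people. The threshold value below is purely an assumption; the register does not publish the one used by BaaS.

```python
# Illustrative: bias the detector toward over-anonymization by lowering the
# confidence threshold (YOLOv5's default is 0.25; 0.10 here is an assumption).
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="baas_weights.pt")
model.conf = 0.10  # keep low-confidence detections, which means more (over-)blurring

results = model("panorama_0001.jpg")
# Marginal detections now survive, so a distant person is less likely to be
# missed, at the cost of the occasional blurred tree or scooter.
```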
Model oversight. Multiple processes were set up to prevent model errors:
- When processing a batch of images, a sample is taken and checked manually to verify that the algorithm does what is expected (a minimal sampling sketch follows this list).
- A feedback process has been set up so that errors can be corrected. In addition, these errors can also be used to improve the algorithm.
- An annual evaluation takes place to determine whether the algorithm needs to be improved.
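The first of these checks could be as simple as drawing a random sample from each processed batch and queueing it for a human reviewer, along the lines of the sketch below; the batch size and the 5% sample rate are assumptions made for illustration.

```python
# Illustrative spot-check: queue a random sample of each processed batch for
# manual review. The 5% rate is an assumption; the register does not state it.
import random

def sample_for_review(processed_images, sample_rate=0.05, seed=None):
    """Pick a random subset of a processed batch for manual checking."""
    rng = random.Random(seed)
    k = max(1, int(len(processed_images) * sample_rate))
    return rng.sample(processed_images, k)

batch = [f"blurred_panorama_{i:05d}.jpg" for i in range(2_000)]  # hypothetical batch
to_review = sample_for_review(batch, seed=7)
print(f"{len(to_review)} images queued for manual review")
```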
Finally, the areas inside the bounding boxes are blurred. The blurring is done by applying a filter to the pixels within each box to reduce detail (Computer Vision Team Amsterdam, n.d.).
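A minimal sketch of this last step, assuming OpenCV and a Gaussian blur as the detail-reducing filter (the project does not necessarily use exactly this filter or kernel size):

```python
# Illustrative blurring step: apply a strong Gaussian blur inside each detected
# bounding box. The filter type and kernel size are assumptions.
import cv2

def blur_regions(image, boxes, kernel=(51, 51)):
    """Blur each (x_min, y_min, x_max, y_max) region of the image in place."""
    for x_min, y_min, x_max, y_max in boxes:
        roi = image[y_min:y_max, x_min:x_max]  # pixels inside the box
        image[y_min:y_max, x_min:x_max] = cv2.GaussianBlur(roi, kernel, 0)
    return image

image = cv2.imread("panorama_0001.jpg")
anonymized = blur_regions(image, [(1200, 640, 1260, 820)])  # one made-up box
cv2.imwrite("panorama_0001_blurred.jpg", anonymized)
```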
3. Ethical concerns
At first glance, the blurring algorithm seems to be a largely positive development, enabling privacy protection that was previously lacking. Municipal employees will now work with images containing non-identifiable rather than identifiable people and license plates. However, ethical concerns remain regarding BaaS and the general context in which it is used.
Privacy
First, the question arises whether privacy is effectively protected by Blurring as a Service (BaaS).
On the algorithm register website, the personal data at stake are listed as follows:
- Face and posture of people in public space.
- Face and posture of people who are in a residential object or office, or who are behind windows.
- License Plates.
Looking at the example picture on the algorithm register website, one can see that while people close to the camera are blurred, one may still identify characteristics such as skin colour, age, profession, clothing, religion and more. Moreover, looking through the digital map of Amsterdam based on the panoramic images, one can find many examples of people who are further away and who are not blurred. While their faces might not be directly identifiable due to the distance, other physical traits remain recognizable. This raises the question of whether the scope of personal data considered may be too narrow and whether the extent of blurring is sufficient to prevent privacy risks. A further important concern is that even when pictures are effectively blurred, people may still be identifiable by means of advanced facial recognition software: research has found that only 10 fully visible example images of a person’s face are needed to identify a blurred image of that person with 91.5% accuracy (Oh et al., 2016). Finally, what may not be immediately clear to outsiders is that blurring images first requires detecting people and license plates in them, a capability that closely resembles facial recognition technology. Blurring images therefore requires implementing technology similar to very high-risk privacy-invasive technology, implicitly encouraging its development, and may give a false sense of privacy protection.
Discrimination
Another ethical concern typical of machine learning systems relates to bias and discrimination, which the algorithm register mentions as an important area of concern. The accuracy of blurring may differ between demographic groups due to underrepresentation in the training dataset. In the case of BaaS, the detection accuracy for children who are further away is lower than for other groups. While this is recognized as an important concern within the project, the protected categories currently considered are limited to age, sex and skin colour (Computer Vision Team Amsterdam, n.d.). Other groups may also be underrepresented yet are not accounted for: for instance, visually disabled people or people in religious attire are not part of the protected groups. Moreover, intersectional fairness does not seem to be accounted for either. Note that a fairness analysis document is referenced on the website, but the link does not open, so the detailed reasoning behind bias mitigation is not available.
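To make this concern concrete: if each annotated person in the test set carried attribute labels, per-group and intersectional detection rates could be compared along the lines sketched below. The attribute columns, categories and values are assumptions, precisely because the project’s own fairness analysis is not accessible.

```python
# Illustrative fairness check: detection (i.e. blurring) rates per group and per
# intersection of groups. All column names, categories and values are invented,
# since the project's fairness analysis document could not be opened.
import pandas as pd

records = pd.DataFrame({
    "age_group": ["adult", "child", "adult", "child", "adult", "child"],
    "skin_tone": ["light", "light", "dark",  "dark",  "light", "dark"],
    "distance":  ["near",  "far",   "near",  "far",   "far",   "near"],
    "detected":  [True,    False,   True,    False,   True,    True],
})

# Per-group detection rate: who actually receives the privacy protection?
print(records.groupby("age_group")["detected"].mean())

# Intersectional view: rates per combination of attributes, which
# single-attribute breakdowns can hide.
print(records.groupby(["age_group", "skin_tone", "distance"])["detected"].mean())
```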
Legitimization of Surveillance Infrastructure, Dual Use and Function Creep
The last concern is broader and more fundamental, and inherent to the use of BaaS itself: the legitimization of surveillance infrastructure.
The algorithm register’s website states that BaaS is currently used within two applications:
- Collecting panoramic images twice a year with the purpose of keeping core registrations up-to-date and reliable.
- In the short term, the algorithm is expected to be used to anonymise images that may show illegally placed (heavy) containers on vulnerable quay walls and bridges. (This involves 2000 images per year.)
They add that in the future the number of applications may increase. Moreover, what is not immediately apparent from the website is that the panoramic image database of Amsterdam is also used to create a central data platform and a virtual data map of the city, in which the images are blurred (Data en informatie, n.d.). This application context differs significantly from the first, since here access to the panoramic images, along with other city information (e.g., transport routes, cultural heritage), is available online to anyone, not only to municipal workers.
As its name suggests, BaaS is presented as a privacy solution and is already used in various applications, the number of which will likely increase. At the same time, however, BaaS may normalize the use of object and people detection algorithms, which closely resemble facial recognition systems. The increased use of BaaS thus indirectly legitimizes the creation of surveillance infrastructure. While access to the algorithm and its training data may currently be well managed, the fact that this infrastructure is increasingly being built poses a risk in terms of dual use and function creep.
Once people detection algorithms are in place, there is a risk that their functions expand beyond the original purpose, which may intentionally or unintentionally contribute to harmful consequences. For instance, a people detection algorithm could be repurposed to quickly detect protestors in images instead of blurring people. Moreover, in order to combat discrimination, the developers of the algorithm have to actively collect data in public spaces on protected groups, such as children or people with darker skin tones, in their neighbourhoods so that the algorithm performs better for them, which again leads to increased surveillance.
4. Recommendations
One straightforward recommendation to improve privacy and the effectiveness of blurring is to increase the level of blur so that other sensitive traits also become non-visible. A step further would be to apply blacked-out or greyed-out boxes instead of a blur, to lower the risk of facial recognition software identifying people in public space. To better prevent discrimination in terms of who is blurred, the scope of the fairness analysis should be expanded and continuously reconsidered; for instance, other protected groups as well as intersectional fairness should be added to the evaluation. To address the concern of legitimizing surveillance infrastructure, strong safeguards that prevent illegitimate use of the technology should be put in place: every technology should be evaluated in terms of the extent to which it allows different uses, and tailored guardrails should be created, for instance around access. Finally, long-term risks need to be considered during the development of such projects; specifically, it should be made clear from the beginning which types of use should be prohibited under any circumstance.
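The second recommendation, opaque boxes instead of blur, would only require a small change to the anonymization step; a sketch, again assuming OpenCV:

```python
# Illustrative alternative to blurring: overwrite each detected region with a
# solid box, removing the pixel information entirely. Coordinates follow the
# same (x_min, y_min, x_max, y_max) convention as the earlier sketches.
import cv2

def mask_regions(image, boxes, color=(0, 0, 0)):
    """Replace each detected region with an opaque box instead of a blur."""
    for x_min, y_min, x_max, y_max in boxes:
        cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, thickness=-1)
    return image

image = cv2.imread("panorama_0001.jpg")
masked = mask_regions(image, [(1200, 640, 1260, 820)])  # same made-up box as above
cv2.imwrite("panorama_0001_masked.jpg", masked)
```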
References
Computer Vision Team Amsterdam. (n.d.). Blurring-as-a-Service: Removing personal data from imagery [GitHub repository]. GitHub. https://github.com/Computer-Vision-Team-Amsterdam/Blurring-as-a-Service
Data en informatie. (n.d.). https://data.amsterdam.nl/data/geozoek/?center=52.3496702%2C4.8561457&fov=27&heading=-144&lagen=geldzn-mgpindg|geldzn-mgpindgz&legenda=true&locatie=52.3733935%2C4.8935746&pitch=4
Mobile Mapping. (2021). https://www.amsterdam.nl/bestuur-organisatie/organisatie/dii/basisinformatie/basisinformatie/mobile-mapping/
Oh, S. J., Benenson, R., Fritz, M., & Schiele, B. (2016). Faceless person recognition: Privacy implications in social media. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14 (pp. 19-35). Springer International Publishing.
Team AI Regulation. (2021, March 12). Amsterdam and Helsinki launch Algorithm and AI Register. MIAI. https://ai-regulation.com/amsterdam-and-helsinki-launch-algorithm-and-ai-register/