Computer Vision for European Urban Sidewalk Accessibility

Author: Kasper Verhavert
Sep 19, 2023  •  8 min read

Cities should be easy for everyone to reach and move around in, but this is not always the case. While cities are putting effort into creating a more accessible environment for people with disabilities, these efforts do not always suffice. This is not just due to a lack of resources, but also to the way those resources are used.

Right now, inspectors have to go out and physically check the sidewalks and streets to gather information about how accessible they are. While this is the best way to learn about the condition of different parts of a city, it takes a great amount of time. But what if there were a way to avoid this slow and tedious process?

This is where Project Sidewalk comes in. The project researches whether AI can perform these inspections instead of doing them manually. However, building a system that can do this has its drawbacks. One major hurdle to overcome is that training an AI model requires a lot of data.

Project Sidewalk is a public crowdsourcing tool, made by the University of Washington, designed to overcome this hurdle. Using this tool, anyone, whether an expert or just an interested individual, can start validating the sidewalk accessibility features of a given city. By following the labeling guide, users can virtually walk through these cities and point out inadequate accessibility features for people with disabilities.

This is a screenshot of the Project Sidewalk crowdsourcing tool. It works just like Google Street View, but the user can also place accessibility labels according to the labeling guide.

This platform has yielded a ton of crowdsourced data, which can be used directly to train an AI model for automatic feature detection. Using such a model, a city council could potentially reduce the number of costly and time-consuming inspections and instead focus its resources on what really matters: fixing these accessibility issues.

The city of Amsterdam saw the immense potential and has therefore joined this initiative. By collaborating with the people behind Project Sidewalk, the City wanted to find out how the platform could work for it.

Quantity vs quality of data

One important aspect to keep in mind when training an AI model is the quality of the data that will be used. It makes sense that a model originally trained on data from Seattle will perform better when tested on Seattle itself than on, for example, Chicago. The model is simply more used to the surface materials, road markings, curb ramps, and general look of Seattle than to those of Chicago.

This is what we mean by the quality of training data. If the AI model is fed more relevant data, the model itself will also be more accurate when used. However, there is a catch: sometimes there is not enough good-quality training data available. In that case, the model can be fed more, but less relevant, data to improve its decision making. There is a clear trade-off researchers have to deal with: do we use more, dissimilar data, or less but more relevant data to train our model? The researchers at the University of Washington tested both.¹ To do so, they trained AI models on data from four American cities in three experiments:

1. In the first experiment, they created a model for each city, trained on that city's own dataset.

2. In the second experiment, they trained a single model on the combined data of all four cities and tested it on each city.

3. In the last experiment, they created a model for each city, trained on the data of all cities except the city the model is tested on (a minimal sketch of these three setups follows below).
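To make the structure of the three setups concrete, here is a hypothetical sketch of the experimental design. It uses synthetic feature vectors and a scikit-learn logistic regression as stand-ins for the real image crops and deep network from the paper, and the city names are placeholders rather than the actual study cities; only the train/test structure of the three experiments is meant to be accurate.

```python
# Sketch of the three cross-city setups with synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

cities = ["city_a", "city_b", "city_c", "city_d"]  # placeholder names

def make_split(n, seed):
    """Synthetic stand-in for a set of labeled crops from one city."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 16))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

train = {c: make_split(500, i) for i, c in enumerate(cities)}
test = {c: make_split(200, 100 + i) for i, c in enumerate(cities)}

def fit(parts):
    """Train one binary classifier on the pooled data of the given cities."""
    X = np.vstack([train[c][0] for c in parts])
    y = np.concatenate([train[c][1] for c in parts])
    return LogisticRegression(max_iter=1000).fit(X, y)

for city in cities:
    X_test, y_test = test[city]
    per_city = fit([city])                              # experiment 1: own data only
    combined = fit(cities)                              # experiment 2: all four cities
    leave_out = fit([c for c in cities if c != city])   # experiment 3: every city except the test city
    print(city, *(round(accuracy_score(y_test, m.predict(X_test)), 3)
                  for m in (per_city, combined, leave_out)))
```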

The results were not really surprising. The cities with the most crowdsourced data performed best in the first experiment. In the second experiment, cities with a lack of data performed better when the data of all cities was combined. However, there was also a decrease in performance for cities with a lot of available training data, such as Seattle.

The most interesting conclusion, however, comes from the third experiment. While the performance of those models was worse in every aspect than the general model of the second experiment, they did not perform badly at all. This is surprising, to say the least, as those models were tested on a dataset from a completely different city than the ones they were trained on. This is really good news, because it opens up the possibility of creating a general AI model that can be used in any city without needing data for that city.

Amsterdam compared to American cities

While a general model certainly seems like a good idea on paper, it is not going to be that easy in practice. The models in the previous experiments were all trained and tested on American cities. While these cities all differ from one another, they still look largely similar. The problem is that not every city in the world looks like an American city.

Take, for example, the city of Amsterdam. The city was founded in the thirteenth century as a small fishing village, which makes it roughly 500 years older than the United States itself. The city council has put a lot of resources into preserving the city. The city is therefore obviously going to look different from a newer American city like Seattle.

Apart from age, cities have other reasons to look different. Climate and geographical location also shape a city, and there are many more factors: the political situation, economic growth over the decades, and, most importantly, cultural differences between cities all play a role.

The fact is that cities in different countries or on different continents look massively different, and this could have a significant influence when exporting Project Sidewalk beyond North America. The general model trained on American data could potentially perform much worse on cities on other continents.

Another question we could ask is whether a model trained on a small dataset such as Amsterdam's can be improved by adding this vastly different data, or whether the drop in quality is so significant that it is not worth feeding the model this data at all.

The experiments

To get an idea of how difficult it is for an AI model to detect accessibility features in a vastly different city, we used the same general model as in the second experiment by the researchers at the University of Washington, that is, a neural network trained on data from four American cities. This way we could apply it directly to Amsterdam and compare the results with the American experiments.
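As an illustration, evaluating an already-trained classifier on Amsterdam crops could look roughly like the sketch below. The ResNet backbone, the checkpoint name, and the folder layout are assumptions made for this example, not details from the original experiments.

```python
# Hedged sketch: apply a pre-trained binary classifier to Amsterdam crops.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Binary classifier: crop contains the feature (1) or not (0).
model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, 2)
# Hypothetical checkpoint name for an American-trained model.
model.load_state_dict(torch.load("general_american_curb_ramp.pt", map_location=device))
model.to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical layout: amsterdam_crops/curb_ramp/{negative,positive}/*.jpg
test_set = datasets.ImageFolder("amsterdam_crops/curb_ramp", transform=preprocess)
loader = DataLoader(test_set, batch_size=64)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"accuracy on Amsterdam crops: {correct / total:.3f}")
```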

Here we can see some mistakes the Amsterdam model made on the “Obstacle” feature. The model classified these crops as positive for containing an obstacle. One could argue there is an obstacle in the first and last crop, but since there is plenty of space to go around it, the crops should not be classified as such.

The answer, however, is not black and white. The general American model performs significantly worse on Amsterdam data for almost all accessibility labels. These labels, such as “Surface Problem” or “Missing Curb Ramp”, all depend on the general look of the city. A surface problem looks vastly different on concrete than on red bricks, so this label strongly depends on the type of surface. It therefore makes sense that, for these labels, the general American model does not perform as well on a different-looking city like Amsterdam.

However, there was one label that performed just as well as on the other cities in experiment 3: “Obstacle”. This also makes sense, since this label does not depend on, for example, the type of surface. For the obstacle feature, we could say that the quantity of data is more important than the quality.

To see whether quality is more important than quantity for the other accessibility features, we can compare the results of the last experiment with those of a model trained only on Amsterdam data. To do this, we trained a model on 20,000 panorama images from the streets of Amsterdam. This is significantly less than the general model, which was trained on over 500,000 panoramas of North American cities.
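For reference, training such an Amsterdam-only classifier for a single feature could be sketched as below. The backbone, hyperparameters, and directory layout are illustrative assumptions; the essential point is that a separate binary model (feature present vs. absent) is trained per accessibility label on crops taken from the Amsterdam panoramas.

```python
# Hedged sketch: fine-tune one binary classifier per accessibility feature.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical layout: crops from the ~20,000 Amsterdam panoramas,
# grouped per feature into positive/negative folders.
train_set = datasets.ImageFolder("amsterdam_crops/surface_problem/train", transform=train_tf)
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# ImageNet-pretrained backbone is an assumption, not a detail from the study.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # feature present / absent
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
```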

The results were, again, not very clear. The “Curb Ramp” feature, for example, was classified significantly better than by the general model. For the “Obstacle” feature there was no improvement, but performance did not drop either; the results were comparable. However, the “Surface Problem” feature, surprisingly, performed significantly worse than with the general model. Since the model was trained on the smaller but more relevant Amsterdam data, this decrease in performance can only be explained by one thing: a lack of training data.

This brings us to our third and final experiment, in which we tested whether we could increase performance by training on a combination of Amsterdam and American data while evaluating on an Amsterdam dataset.

To do this, we started from the second experiment and trained eight more models on the Amsterdam dataset combined with different amounts of American data. Starting from the 20,000 cropped images in the Amsterdam dataset, we added 5,000 American crops for each subsequent model, extending the Amsterdam dataset by 25% at each step. The final model was thus trained on twice as much American data as Amsterdam data.
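The mixing itself can be sketched as follows. The directory names are again hypothetical, and each mixed dataset would then be used to train a model in the same way as the Amsterdam-only sketch above.

```python
# Sketch: build eight mixed training sets, adding 5,000 American crops per step.
import random
from torch.utils.data import ConcatDataset, Subset
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

amsterdam = datasets.ImageFolder("amsterdam_crops/train", transform=tf)  # ~20,000 crops (hypothetical path)
american = datasets.ImageFolder("american_crops/train", transform=tf)    # crops from the American cities (hypothetical path)

indices = list(range(len(american)))
random.seed(0)
random.shuffle(indices)

mixtures = []
for step in range(1, 9):                   # eight extra models
    n_extra = 5_000 * step                 # +25% of the Amsterdam set per step
    mixed = ConcatDataset([amsterdam, Subset(american, indices[:n_extra])])
    mixtures.append(mixed)
    print(f"model {step}: {len(mixed)} training crops "
          f"({len(amsterdam)} Amsterdam + {n_extra} American)")
```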

These results were not as groundbreaking as we had hoped. Again, to better understand them, we should look at each accessibility feature individually. The “Curb Ramp” feature, for example, did not benefit from the extra training data; in fact, it performed worse with the added data. The “Obstacle” feature did benefit a little from the added data, but since every model already performed very well on this feature, the gains were negligible.

Performance on the other two labels, “Missing Curb Ramp” and “Surface Problem”, was significantly harder to assess, since there were very few positive labels. Accuracy can give a distorted picture here, because the best-scoring models were simply overpredicting the majority class while doing poorly on the positive one. In terms of accuracy, the best result for “Missing Curb Ramp” came from the general American model; for “Surface Problem”, the best results were split between the Amsterdam and the American training data. But, as said before, these models were not good at classifying the positive cases.
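To illustrate why accuracy is misleading for such rare labels, consider the small, made-up example below: a model that always predicts “no missing curb ramp” gets high accuracy while finding nothing, whereas per-class precision and recall expose the difference. All numbers here are invented for illustration.

```python
# Why accuracy misleads on imbalanced labels: compare a majority-class
# predictor with a model that actually finds positives.
from sklearn.metrics import accuracy_score, classification_report

# Illustrative test set: 95 negatives, 5 positives (made-up numbers).
y_true = [0] * 95 + [1] * 5
y_majority = [0] * 100                          # always predicts the majority class
y_useful = [0] * 88 + [1] * 7 + [1, 1, 1, 1, 0]  # 7 false positives, 4 true positives, 1 miss

print(accuracy_score(y_true, y_majority))        # 0.95, yet recall on positives is 0
print(classification_report(y_true, y_majority, zero_division=0))

print(accuracy_score(y_true, y_useful))          # 0.92, lower accuracy...
print(classification_report(y_true, y_useful, zero_division=0))  # ...but it finds most positives
```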

Conclusion

The main goal of these experiments was to find out whether the American model would be heavily affected by testing on Amsterdam data instead of American data. We can confidently say it is. However, since there is a lack of Amsterdam data, we saw that for some accessibility features it is preferable to have a lot of data that is dissimilar to the test set than to have only a little relevant training data.

So, should we use Amsterdam data, American data, or a combination of both? There is no clear-cut answer. Since in this particular case the model is split into binary classifiers (one per accessibility feature), we should decide feature by feature which data to train on.

Some classifiers did improve when the dissimilar data was added to the training set. However, the safest bet for improving the classification of sidewalk accessibility features is still to gather more data specifically for Amsterdam.

References

[1] Duan, Michael, et al. "Scaling Crowd+AI Sidewalk Accessibility Assessments: Initial Experiments Examining Label Quality and Cross-city Training on Performance." Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility. 2022.

Note: This research was done as part of a Bachelor thesis internship project.

* Header image taken from Google Street View.

Author: Kasper Verhavert

Data Science and AI bachelor student at the University of Brussels.
