Wednesday, November 26, 2025

How Can Your Hotel Photos Help Combat Human Trafficking?


Abby Stylianou built an app that asks its users to upload photos of hotel rooms they stay in when they travel. It may seem like a simple act, but the resulting database of hotel room images helps Stylianou and her colleagues assist victims of human trafficking.

Traffickers often post photos of their victims in hotel rooms as online advertisements, evidence that can be used to find the victims and prosecute the perpetrators of these crimes. But to use this evidence, analysts must be able to determine where the photos were taken. That’s where TraffickCam comes in. The app uses the submitted images to train an image search system currently in use by the U.S.-based National Center for Missing and Exploited Children (NCMEC), aiding in its efforts to geolocate posted images—a deceptively hard task.

Stylianou is currently working with Nathan Jacobs’ group at Washington University in St. Louis to push the model even further, developing multimodal search capabilities that allow for video and text queries.


Which came first, your interest in computers or your desire to help provide justice to victims of abuse, and how did they coincide?

Abby Stylianou: It’s a crazy story.

I’ll go back to my undergraduate degree. I didn’t really know what I wanted to do, but I took a remote sensing class my second semester of senior year that I just loved. When I graduated, [George Washington University professor (then at Washington University in St. Louis)] Robert Pless hired me to work on a program called Finder.

The goal of Finder was to say, if you have a picture and nothing else, how can you figure out where that picture was taken? My family knew about the work that I was doing, and [in 2013] my uncle shared an article in the St. Louis Post-Dispatch with me about a young murder victim from the 1980s whose case had run cold. [The St. Louis Police Department] never figured out who she was.

What they had was pictures from the burial in 1983. They wanted to exhume her remains to do modern forensic analysis and figure out what part of the country she was from. But they had exhumed the remains underneath her headstone at the cemetery and it wasn’t her.

And they [dug up the wrong remains] two more times, at which point the medical examiner for St. Louis said, “You can’t keep digging until you have evidence of where the remains actually are.” My uncle sends this to me, and he’s like, “Hey, could you figure out where this picture was taken?”

And so we actually ended up consulting for the St. Louis Police Department to take this tool we were building for geolocalization to see if we could find the location of this lost grave. We submitted a report to the medical examiner for St. Louis that said, “Here is where we believe the remains are.”

And we were right. We were able to exhume her remains. They were able to do modern forensic analysis and figure out she was from the Southeast. We’ve still not figured out her identity, but we have a lot better genetic information at this point.

For me, that moment was like, “This is what I want to do with my life. I want to use computer vision to do some good.” That was a tipping point for me.


So how does your algorithm work? Can you walk me through how a user-uploaded photo becomes usable data for law enforcement?

Stylianou: There are two really key pieces when we think about AI systems today. One is the data, and one is the model you’re using to operate. For us, both of those are equally important.

First is the data. We’re really lucky that there’s tons of imagery of hotels on the Internet, and so we’re able to scrape publicly available data in large volume. We have millions of these images that are available online. The problem with a lot of those images, though, is that they’re advertising images. They’re perfect images of the nicest room in the hotel—they’re really clean, and that isn’t what the victim images look like.

A victim image is often a selfie that the victim has taken themselves. They’re in a messy room. The lighting is imperfect. This is a problem for machine learning algorithms. We call it the domain gap. When there is a gap between the data that you trained your model on and the data that you’re running through at inference time, your model won’t perform very well.

This idea to build the TraffickCam mobile application was in large part to supplement that Internet data with data that actually looks more like the victim imagery. We built this app so that people, when they travel, can submit pictures of their hotel rooms specifically for this purpose. Those pictures, combined with the pictures that we have off the Internet, are what we use to train our model.

Then what?

Stylianou: Once we have a big pile of data, we train neural networks to learn to embed it. If you take an image and run it through your neural network, what comes out on the other end isn’t explicitly a prediction of what hotel the image came from. Rather, it’s a numerical representation [of image features].

What we have is a neural network that takes in images and spits out vectors—small numerical representations of those images—where images that come from the same place hopefully have similar representations. That’s what we then use in this investigative platform that we have deployed at [NCMEC].

We have a search interface that uses that deep learning model: an analyst can put in their image, run it through, and get back a set of visually similar images, which they can use to infer the location.
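In rough terms, the retrieval step described above can be sketched in a few lines. This is a minimal, hypothetical example, not the deployed system: it assumes the trained network has already turned each gallery image into a short vector, and it uses cosine similarity to rank matches. The hotel names and 4-dimensional vectors are invented for illustration.

```python
import numpy as np

# Hypothetical gallery: one embedding vector per indexed hotel image.
# In the real system these would come from the trained neural network;
# here they are stand-in 4-d vectors.
gallery = np.array([
    [0.9, 0.1, 0.0, 0.1],   # hotel A, room photo 1
    [0.8, 0.2, 0.1, 0.0],   # hotel A, room photo 2
    [0.0, 0.9, 0.3, 0.1],   # hotel B
    [0.1, 0.0, 0.8, 0.5],   # hotel C
])
labels = ["hotel_A", "hotel_A", "hotel_B", "hotel_C"]

def search(query: np.ndarray, gallery: np.ndarray, labels, k: int = 2):
    """Return the labels of the k gallery images whose embeddings are
    most similar to the query, ranked by cosine similarity."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = g @ q                    # cosine similarity to each image
    top = np.argsort(-sims)[:k]     # indices of the best matches first
    return [labels[i] for i in top]

query = np.array([0.85, 0.15, 0.05, 0.05])  # embedding of an analyst's image
print(search(query, gallery, labels))       # the hotel A photos rank first
```

Because images from the same place are trained to land near each other in the embedding space, the top-ranked results cluster around the likely hotel even when no single image is an exact match.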


Identifying Hotel Rooms Using Computer Vision

Many of your papers mention that matching hotel room images can actually be more difficult than matching photos of other types of locations. Why is that, and how do you deal with those challenges?

Stylianou: There are a handful of things that are really unique about hotels compared to other domains. Two different hotels may actually look really similar—every Motel 6 in the country has been renovated so that it looks virtually identical. That’s a real challenge for these models that are trying to come up with different representations for different hotels.

On the flip side, two rooms in the same hotel may look really different. You have the penthouse suite and the entry-level room. Or a renovation has happened on one floor and not another. That’s really a challenge when two images should have the same representation.

Another thing that makes our queries unique is that usually a very, very large part of the image has to be erased first. We’re talking about child pornography images. That has to be erased before it ever gets submitted to our system.

We trained the first version by pasting in people-shaped blobs to try and get the network to ignore the erased portion. But [Temple University professor and close collaborator Richard Souvenir’s team] showed that if you actually use AI in-painting—you actually fill in that blob with a sort of natural-looking texture—you actually do a lot better on the search than if you leave the erased blob in there.

So when our analysts run their search, the first thing they do is they erase the image. The next thing that we do is that we actually then go and use an AI in-painting model to fill that back in.
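The erase-then-fill step can be sketched as follows. This is a toy stand-in: where the production pipeline uses an AI in-painting model to synthesize natural-looking texture, this sketch simply fills the erased region with the mean of the remaining pixels. The 4×4 "image" and the mask shape are invented for illustration.

```python
import numpy as np

def erase_and_fill(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Erase the masked region, then fill it in -- here with the mean
    of the unerased pixels, a toy stand-in for the AI in-painting
    model the real pipeline uses."""
    out = image.astype(float).copy()
    fill_value = out[~mask].mean()   # average of the pixels that remain
    out[mask] = fill_value           # paint over the erased blob
    return out

# 4x4 grayscale "room photo" with a 2x2 person-shaped blob to erase.
img = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

filled = erase_and_fill(img, mask)
```

The point of the filled-in version is purely to help the search model: a smooth, plausible region distracts the network far less than a hard-edged blank blob does.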


Some of your work involved object recognition rather than image recognition. Why?

Stylianou: The [NCMEC] analysts that use our tool have shared with us that oftentimes, in the query, all they can see is one object in the background, and they want to run a search on just that. But the models we train typically operate on the scale of the full image, and that’s a problem.

And there are things in a hotel that are unique and things that aren’t. Like a white bed in a hotel is totally non-discriminative. Most hotels have a white bed. But a really unique piece of artwork on the wall, even if it’s small, might be really important to recognizing the location.

[NCMEC analysts] can sometimes only see one object, or know that one object is important. Just zooming in on it in the types of models that we’re already using doesn’t work well. How could we support that better? We’re doing things like training object-specific models. You can have a couch model and a lamp model and a carpet model.
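One way to picture the object-specific idea is a separate embedding gallery per object class, so that a cropped lamp is only ever compared against other lamps. The sketch below assumes exactly that setup; the class names, hotel names, and 2-dimensional vectors are all invented.

```python
# Hypothetical per-class galleries: each object class keeps its own
# set of embeddings, one per hotel where that object was indexed.
GALLERIES = {
    "lamp":    {"hotel_A": (0.9, 0.1), "hotel_B": (0.1, 0.9)},
    "artwork": {"hotel_A": (0.5, 0.5), "hotel_C": (0.0, 1.0)},
}

def nearest_hotel(object_class: str, query_vec: tuple) -> str:
    """Search only the gallery for this object class and return the
    hotel whose stored embedding is closest (squared distance)."""
    gallery = GALLERIES[object_class]
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query_vec, v))
    return min(gallery, key=lambda hotel: dist(gallery[hotel]))

print(nearest_hotel("lamp", (0.8, 0.2)))  # closest lamp match: hotel_A
```

Restricting the comparison to one object class is what lets a small but distinctive item, like a unique piece of artwork, carry the search instead of being drowned out by the rest of the room.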


How do you evaluate the success of the algorithm?

Stylianou: I have two versions of this answer. One is that there’s no real world dataset that we can use to measure this, so we create proxy datasets. We have our data that we’ve collected via the TraffickCam app. We take subsets of that and we put big blobs into them that we erase and we measure the fraction of the time that we correctly predict what hotel those are from.

So those images look as much like the victim images as we can make them. That said, they still don’t necessarily look exactly like the victim images, right? That’s as good a quantitative metric as we can come up with.
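The proxy metric described above, the fraction of masked test images whose hotel is correctly retrieved, can be computed like this. The ranked result lists and hotel IDs below are invented for illustration; only the metric itself follows the description in the interview.

```python
def topk_accuracy(ranked_labels, true_labels, k: int = 1) -> float:
    """Fraction of queries whose correct hotel appears in the top-k
    retrieved results. ranked_labels[i] is the ranked list of hotel
    IDs returned for query i."""
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_labels, true_labels))
    return hits / len(true_labels)

# Hypothetical results for three masked query images.
ranked = [
    ["hotel_A", "hotel_B", "hotel_C"],   # correct hotel ranked first
    ["hotel_C", "hotel_B", "hotel_A"],   # correct hotel ranked second
    ["hotel_B", "hotel_A", "hotel_C"],   # correct hotel ranked first
]
truth = ["hotel_A", "hotel_B", "hotel_B"]

print(topk_accuracy(ranked, truth, k=1))  # 2 of 3 correct at top-1
print(topk_accuracy(ranked, truth, k=2))  # all 3 correct within top-2
```

Reporting accuracy at several values of k reflects how the tool is actually used: an analyst scans a page of candidate matches, so a correct hotel anywhere near the top of the list is still a success.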

And then we do a lot of work with the [NCMEC] to understand how the system is working for them. We get to hear about the instances where they’re able to use our tool successfully and not successfully. Honestly, some of the most useful feedback we get from them is them telling us, “I tried running the search and it didn’t work.”

Have positive hotel image matches actually been used to help trafficking victims?

Stylianou: I always struggle to talk about these things, in part because I have young kids. This is upsetting and I don’t want to take things that are the most horrific thing that will ever happen to somebody and tell it as our positive story.

With that said, there are cases we’re aware of. There’s one that I’ve heard from the analysts at NCMEC recently that really has reinvigorated for me why I do what I do.

There was a case of a live stream that was happening. And it was a young child who was being assaulted in a hotel. NCMEC got alerted that this was happening. The analysts who have been trained to use TraffickCam took a screenshot of that, plugged it into our system, got a result for which hotel it was, sent law enforcement, and were able to rescue the child.

I feel very, very lucky that I work on something that has real world impact, that we are able to make a difference.

