Understanding Human Trafficking with data

Analysis of the CTDC global trafficking data to derive meaningful insights.

Vijayasri Iyer
Apr 7, 2021
Photo credit: Unsplash

Ever watched a movie on human trafficking and fervently wished you could do something to help? Well, I have. So, I’m making a small start by educating myself about trafficking, diving deep into some open source datasets available on the internet. The idea is to look at the data, come up with relevant questions and then answer those questions.

To start with, let’s check out this dataset on Kaggle, originally derived from the Counter Trafficking Data Collaborative (CTDC) global human trafficking data. It contains 48.8k records of trafficking incidents across the globe. While going through this data, I came up with some questions of my own based on the fields provided.

What is the primary demographic of victims being trafficked?
How are these victims being trafficked?
Who is the enabler for these trafficking events?
What happens to the victims once they are trafficked?
How are the victims forced to stay in the trade?

Now that the questions have been identified, let’s begin exploring the data. The dataset has 63 features for each data point. Below is an image of all the features, listed using the Python pandas library.

Features present in the dataset
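A minimal pandas sketch of this first look at the data. It assumes the Kaggle download is a single CSV; the filename `human_trafficking.csv` in the comment is hypothetical, and a tiny in-memory CSV with made-up values stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Real use (hypothetical filename, check your Kaggle download):
# df = pd.read_csv("human_trafficking.csv")

# Tiny made-up CSV so the snippet runs without the real file.
toy_csv = io.StringIO(
    "gender,ageBroad,citizenship\n"
    "Female,9--17,UA\n"
    "Male,30--38,MD\n"
)
df = pd.read_csv(toy_csv)

print(df.shape)             # (rows, columns); the full dataset has 63 columns
print(df.columns.tolist())  # the feature names
df.info()                   # dtype and non-null count for every column
```

`df.info()` is handy here because its non-null counts already hint at how sparsely each feature is populated.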

There are plenty of features describing the demographics of the victims, the country of exploitation, the means of control and the type of exploitation they are subjected to once trafficked.

Q1: What is the primary demographic of the victims?

Referring to the above columns, some of the features relevant to this question are ‘gender’, ‘ageBroad’, ‘citizenship’, ‘majorityStatus’, ‘majorityStatusAtExploit’, ‘majorityEntry’ and ‘yearofRegistration’. I began my analysis by exploring each of these features and visualizing the results.

Gender distributions of the victims

Here, we can see that 72.81% of the victims identify as female. Now, plotting the age group distribution of the victims gives us the following pie chart.

Distribution of age groups of the victims
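Percentages like the 72.81% above come from a normalized value count. The sketch below is one way to compute them, not necessarily how the original notebook did it, and the toy rows are invented. Whether missing values are counted in the denominator (the `dropna` argument of pandas’ `value_counts`) changes the percentages, so it is worth knowing which convention a chart uses.

```python
import pandas as pd


def percent_distribution(df: pd.DataFrame, column: str, dropna: bool = True) -> pd.Series:
    """Percentage of rows in each category of `column`, largest first.

    With dropna=True, missing values are excluded from the denominator;
    with dropna=False they are counted as their own category.
    """
    shares = df[column].value_counts(normalize=True, dropna=dropna) * 100
    return shares.round(2)


# Invented rows for illustration only.
toy = pd.DataFrame({"gender": ["Female", "Female", "Female", "Male", None]})

print(percent_distribution(toy, "gender"))                # Female 75.0, Male 25.0
print(percent_distribution(toy, "gender", dropna=False))  # Female 60.0, Male 20.0, missing 20.0
```

The same helper works for the age-group pie chart by passing `"ageBroad"` as the column.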

Combining the gender attribute with the age group and majority status columns gives us a sense of the victims’ current majority status. Of the victims whose age group was recorded, 83% are between the ages of 9 and 38.

Majority Status of the Victims

From the sunburst plot, we can also notice that most of the victims in each gender group identify as “adults”.
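The sunburst combines several categorical columns, and the 83% figure is a share of recorded age groups. Both can be sketched with plain pandas: a grouped count gives the nested totals a sunburst chart needs (a library such as Plotly Express can then draw it), and an `isin` check over the relevant age bins gives the share. The bin labels and toy rows below are assumptions for illustration; check the exact labels in the `ageBroad` column before reusing them.

```python
import pandas as pd


def share_in_bins(df: pd.DataFrame, column: str, bins: list) -> float:
    """Percentage of non-missing values in `column` that fall in `bins`."""
    known = df[column].dropna()
    return 100 * known.isin(bins).mean()


def nested_counts(df: pd.DataFrame, levels: list) -> pd.DataFrame:
    """Row counts for every combination of `levels`.

    groupby drops rows where any grouping column is missing by default.
    """
    return df.groupby(levels).size().reset_index(name="count")


# Invented rows; the age-bin labels are hypothetical.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Female", None],
    "ageBroad": ["9--17", "30--38", "48+", None, "18--20"],
    "majorityStatus": ["Minor", "Adult", "Adult", "Adult", "Adult"],
})
bins_9_to_38 = ["9--17", "18--20", "21--23", "24--26", "27--29", "30--38"]

print(share_in_bins(toy, "ageBroad", bins_9_to_38))  # 75.0: 3 of the 4 recorded age groups
print(nested_counts(toy, ["gender", "majorityStatus"]))
```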

Taking this further, we also analyze the citizenship status of the victims.

Citizenship status of victims visualization

Note that 18.72% of the victims had unknown citizenship status. Below is a summary of the percentage distribution of citizenship of victims in the top 10 countries according to the dataset.

Citizenship Status of the victims
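The unknown-citizenship share and the top-10 summary follow the same pattern: the percentage of missing values, then normalized counts of the most common values. A sketch with invented rows; here the top-10 shares are computed over known citizenships only, which may or may not match the convention behind the figure above.

```python
import pandas as pd


def missing_percent(series: pd.Series) -> float:
    """Percentage of entries that are missing (NaN/None), rounded to 2 decimals."""
    return round(100 * series.isna().mean(), 2)


def top_n_shares(series: pd.Series, n: int = 10) -> pd.Series:
    """Percentage share of the n most frequent values among non-missing entries."""
    return (series.value_counts(normalize=True) * 100).head(n)


# Invented citizenship codes for illustration.
citizenship = pd.Series(["PH", "UA", "UA", "MD", None])

print(missing_percent(citizenship))    # 20.0
print(top_n_shares(citizenship, n=2))  # UA 50.0, then one of the tied 25.0 entries
```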

Demographic Profile

1. Around 73% (72.81%) of the victims were female.
2. The number of female victims appears to be higher than the number of male victims across all age groups.
3. The two most common age groups of the victims were 9–17 (23.72%) and 30–38 (19.46%). Overall, 83% of the victims with a recorded age group are between the ages of 9 and 38.
4. Most victims in each gender category identified as ‘adults’.
5. More than 50% of the victims in the data are citizens of the Philippines, Ukraine, the Republic of Moldova or the United States, followed by Cambodia and Indonesia. 18.72% have unknown citizenship status.

Q2: How are the victims being trafficked?

To answer this question, we can look at the ‘isAbduction’ column. First, let us identify the number of missing values in this column.

Value counts for ‘isAbduction’

Assuming 0.0 to be a ‘No’ and 1.0 to be a ‘Yes’, 99% of the responders (approximately 33% of all 48.8k victims) say that they were not abducted. Since only about a third of the records answer this question, this is not conclusive, but it is interesting!
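The gap between “99% of responders” and “about 33% of all victims” is just a change of denominator: if 99% of the answered rows make up 33% of all rows, then only about 33 / 0.99 ≈ 33.3% of the records answer the question at all, so roughly two thirds are blank. A small sketch of that bookkeeping, assuming (as above) that 0.0 means ‘No’ and 1.0 means ‘Yes’; the toy values are invented.

```python
import pandas as pd


def yes_no_summary(series: pd.Series) -> dict:
    """Share of answered rows, and of 'No' (0.0) answers, under both denominators."""
    total = len(series)
    answered = int(series.notna().sum())
    no = int((series == 0.0).sum())  # NaN never equals 0.0, so blanks are not counted
    return {
        "answered_pct_of_all": 100 * answered / total,
        "no_pct_of_answered": 100 * no / answered,
        "no_pct_of_all": 100 * no / total,
    }


# Invented values: 4 answered rows, 4 blanks.
is_abduction = pd.Series([0.0, 0.0, 0.0, 1.0, None, None, None, None])
print(yes_no_summary(is_abduction))
# {'answered_pct_of_all': 50.0, 'no_pct_of_answered': 75.0, 'no_pct_of_all': 37.5}
```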

Q3: Who is the enabler for these trafficking events?

In this section, we are trying to identify the enablers or “recruiters” for these trafficking victims. These are usually the people who identify and recruit prospects for trafficking. For this, we turn our attention to the recruiter relationship columns. Let us again identify the percentage of missing values in these columns.

Recruiter Relationship

Visualizing the ‘recruiterRelationship’ column alongside the separate yes/no columns that record each type of recruiter relationship, we can see a strong indication that the recruiter is usually someone “unknown” to the victim.
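Both steps in this section, the missing-value percentages and the counts of ‘Yes’ answers across the separate relationship columns, can be sketched by selecting columns by name prefix. Only `recruiterRelationship` is named in the text; the yes/no column names below (`recruiterRelationFriend`, `recruiterRelationUnknown`) are hypothetical stand-ins, and the rows are invented.

```python
import pandas as pd


def columns_with_prefix(df: pd.DataFrame, prefix: str) -> list:
    return [c for c in df.columns if c.startswith(prefix)]


def missing_by_prefix(df: pd.DataFrame, prefix: str) -> pd.Series:
    """Percentage of missing values in each column whose name starts with `prefix`."""
    cols = columns_with_prefix(df, prefix)
    return (df[cols].isna().mean() * 100).round(2)


def yes_counts(df: pd.DataFrame, cols: list) -> pd.Series:
    """Number of rows answering 1 ('Yes') in each yes/no column, most common first."""
    return df[cols].eq(1).sum().sort_values(ascending=False)


# Invented rows; the two yes/no column names are hypothetical.
toy = pd.DataFrame({
    "recruiterRelationship": ["Unknown", "Friend", None, "Unknown"],
    "recruiterRelationFriend": [0, 1, None, 0],
    "recruiterRelationUnknown": [1, 0, None, 1],
})

print(missing_by_prefix(toy, "recruiterRelation"))  # 25.0 for each of the 3 columns
print(yes_counts(toy, ["recruiterRelationFriend", "recruiterRelationUnknown"]))
```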

Q4: What happens to the victims once they are trafficked?

To answer this question, we will first analyze the ‘countryofExploitation’ column, followed by the columns indicating the type of labour and other practices the victims are forced into once they are in the trade.

Country of Exploitation

The percentage of victims with an unknown country of exploitation is 22.58%.

Top 10 countries of exploitation
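The unknown share and the observation that the top five countries cover more than half of the responders can both be checked with a missing-value percentage and a cumulative sum over normalized counts. A sketch with invented rows; the column name follows the spelling used in the text (`countryofExploitation`) and may differ in the actual file. The cumulative shares here are over rows with a known country.

```python
import pandas as pd


def missing_percent(series: pd.Series) -> float:
    """Percentage of entries that are missing, rounded to 2 decimals."""
    return round(100 * series.isna().mean(), 2)


def cumulative_top_share(series: pd.Series, n: int = 5) -> pd.Series:
    """Running total of the percentage share of the n most common known values."""
    shares = series.value_counts(normalize=True) * 100
    return shares.head(n).cumsum()


# Invented country codes: 10 rows, 2 unknown.
country = pd.Series(
    ["US", "US", "US", "UA", "UA", "RU", "MD", "KH", None, None],
    name="countryofExploitation",
)

print(missing_percent(country))          # 20.0
print(cumulative_top_share(country, 2))  # US 37.5, UA 62.5
```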

More than 50% of the responders were exploited in one of the top five countries of exploitation mentioned above. Now, let us visualize the types of exploitation and labour.

Type of Exploitation

The columns describing the type of exploitation show that a large number of female respondents report having been sexually exploited.

Type of Labour

The columns indicating the type of labour show that female victims are most often made to do domestic/household work, whereas male victims are most often pushed into construction.
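Splitting the exploitation and labour columns by gender amounts to summing yes/no flags within each gender group. The sketch assumes the type-of-labour flags share a common name prefix and are coded 1 for ‘Yes’ and 0 for ‘No’; the column names `typeOfLabourDomesticWork` and `typeOfLabourConstruction` are hypothetical, as are the rows. The same function would work for the type-of-exploitation flags with a different prefix.

```python
import pandas as pd


def flag_counts_by_group(df: pd.DataFrame, group_col: str, prefix: str) -> pd.DataFrame:
    """Per-group sum of every column starting with `prefix`.

    With 0/1 coding, the sum is the number of 'Yes' answers; NaN is skipped.
    """
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df.groupby(group_col)[cols].sum()


# Invented rows; the flag column names are hypothetical.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Female"],
    "typeOfLabourDomesticWork": [1, 1, 0, 0, 0],
    "typeOfLabourConstruction": [0, 0, 1, 1, 0],
})

table = flag_counts_by_group(toy, "gender", "typeOfLabour")
print(table)
print(table.idxmax(axis=1))  # most common labour type per gender
```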

Q5: How are the victims forced to take part and stay in the trade?

Here, we look at the columns indicating the various means of control.

Means of Control

Among female respondents, the most common means of control listed is “unspecified”. Other frequent responses in the data are “psychological abuse”, “restricting movement” and “threats”. Since the means of control columns are sparsely populated, it is difficult to draw a definitive conclusion.
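“Sparsely populated” can be made concrete with two numbers: the fill rate of each means-of-control column, and the share of rows in which every one of those columns is blank. A sketch; the column names `meansOfControlThreats` and `meansOfControlRestrictsMovement` and the rows are hypothetical.

```python
import pandas as pd


def sparsity_report(df: pd.DataFrame, prefix: str):
    """Fill rate per column, and share of rows with all prefixed columns blank."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    fill_rate = df[cols].notna().mean() * 100             # % of rows where the column has a value
    all_blank = df[cols].isna().all(axis=1).mean() * 100  # % of rows with no answer in any column
    return fill_rate, all_blank


# Invented rows; the column names are hypothetical.
toy = pd.DataFrame({
    "meansOfControlThreats": [1, None, None, None],
    "meansOfControlRestrictsMovement": [None, 1, None, None],
})
fill_rate, all_blank = sparsity_report(toy, "meansOfControl")
print(fill_rate)  # 25.0 for each column
print(all_blank)  # 50.0
```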

Bonus

Visualizing the number of cases registered each year gives us an interesting line plot.

Number of cases registered over the years

The plot shows a sharp spike in the number of registered cases between 2015 and 2017, which suggests that 2016 had the highest number of registrations.
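The yearly line plot is a count of rows per registration year, sorted chronologically. A sketch with invented years; the column name follows the text’s spelling (`yearofRegistration`), and plotting needs matplotlib, so that call is left as a comment.

```python
import pandas as pd


def registrations_per_year(df: pd.DataFrame, year_col: str = "yearofRegistration") -> pd.Series:
    """Number of registered cases per year, in chronological order."""
    return df[year_col].value_counts().sort_index()


# Invented years for illustration.
toy = pd.DataFrame({"yearofRegistration": [2010, 2011, 2011, 2012]})
per_year = registrations_per_year(toy)
print(per_year.idxmax())  # 2011, the year with the most registrations
# per_year.plot() would draw the line chart (requires matplotlib).
```

Sorting by the index rather than by count is what makes the result usable as a time series.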

Next, by combining the country of origin and the country of exploitation, we can find the countries with the largest inflows and outflows of victims.

Flow of victims from country of origin to country of exploitation
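A flow diagram like this one needs a table of (origin, destination, count) triples. A sketch assuming the origin is taken from the `citizenship` column (the text does not say which column was used) and the destination from `countryofExploitation`; the rows are invented. Summing the same counts by origin or by destination gives each country’s outflow and inflow.

```python
import pandas as pd


def flow_table(df: pd.DataFrame, origin_col: str, dest_col: str) -> pd.DataFrame:
    """Count victims per (origin, destination) pair, largest flows first."""
    return (
        df.dropna(subset=[origin_col, dest_col])
        .groupby([origin_col, dest_col])
        .size()
        .reset_index(name="victims")
        .sort_values("victims", ascending=False, ignore_index=True)
    )


# Invented rows; using citizenship as the origin is an assumption.
toy = pd.DataFrame({
    "citizenship": ["UA", "UA", "MD", "UA", None],
    "countryofExploitation": ["RU", "RU", "RU", "PL", "RU"],
})
flows = flow_table(toy, "citizenship", "countryofExploitation")
outflow = flows.groupby("citizenship")["victims"].sum()           # victims leaving each country
inflow = flows.groupby("countryofExploitation")["victims"].sum()  # victims arriving in each country
print(flows)
print(outflow, inflow, sep="\n")
```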

Call To Action

In this dataset, we could estimate a victim demographic, possible countries of origin and exploitation, the relationship of the enablers to the victims and the means of control used to keep victims in the trade. Although the sample is limited and the data is sparsely populated, these estimates can help make us more aware of the people who are at risk of being trafficked. In the future, I will push this pet project further by finding more data and combining it with the analysis done here.

All the code for my analysis is available on my GitHub repo.

References

  1. https://www.kaggle.com/andrewmvd/global-human-trafficking/code


Vijayasri Iyer

Machine Learning Scientist @ Pi School. MTech in AI. Musician. Yoga Instructor. Learnaholic. I write about anything that makes me curious.