Names
Sep 7, 2023

What's in a Name, Part 3 - Analyzing Names

It turns out there's an awful lot in a given name! Age, gender, and race are all encoded in forenames, if you just know where to look.

A rose by any other name would smell as sweet.
- Juliet, from Romeo and Juliet, by William Shakespeare

The introduction to this series explored the "name problem" and how HumanGraphics uses data and domain knowledge to crack it. The second post showed how HumanGraphics models names, and what each component is. This post discusses how HumanGraphics decodes the latent demographic information encoded in every name.

Analyzing Names

Personal names are very much a product of their time and place. Although "Dionysios" was one of the most popular names in Ancient Greece, you'd be hard-pressed to meet a Dionysios born in the US! Similarly, although "Nikau" was the most popular name among newborn Maori boys in New Zealand in the 2010s, it doesn't appear anywhere in the Social Security Administration's dataset of popular US birth names!

Like a fingerprint, a person's name is deeply characteristic of the person, while also ultimately the product of their parents and ancestors. Through careful study, it's possible to decode these metaphorical loops and whorls into a meaningful demographic profile of a person.

Decoding Age

Given names' popularities wax and wane with time. For many names, the popularity profile over time is distinctive enough that one can make a good educated guess of a person's age from their given name alone!

There aren't a lot of US-born Aidens over the age of 30 running around...

HumanGraphics integrates the Social Security Administration's datasets for baby name popularity over time. This allows its demographics engine to estimate the ages of people believed to be born in the US from their given names. HumanGraphics is adding and updating datasets regularly, so this capability should come online for new countries over time.
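To make the idea concrete, here is a minimal sketch of how per-year birth counts for a single name could be turned into an age estimate. It is not HumanGraphics's implementation: the function names are hypothetical, the counts are made-up toy numbers rather than real Social Security Administration figures, and the sketch ignores mortality and migration, which a real estimate would need to account for.

```python
def age_distribution(births_by_year: dict[int, int], as_of_year: int) -> dict[int, float]:
    """Map each age to the share of the name's recorded births in the matching year.

    Assumption: everyone counted is still alive and living in the country,
    so the share of births in a year equals the share of people of that age.
    """
    total = sum(births_by_year.values())
    if total == 0:
        return {}
    return {as_of_year - year: count / total for year, count in births_by_year.items()}


def median_age(distribution: dict[int, float]) -> int:
    """Return the smallest age at which the cumulative share reaches one half."""
    if not distribution:
        raise ValueError("empty distribution")
    cumulative = 0.0
    for age in sorted(distribution):
        cumulative += distribution[age]
        if cumulative >= 0.5:
            return age
    return max(distribution)  # guards against floating-point shortfall


# Toy counts for a hypothetical name whose popularity rose sharply over time.
toy_counts = {1990: 100, 2000: 300, 2010: 600}
toy_distribution = age_distribution(toy_counts, as_of_year=2023)
toy_median = median_age(toy_distribution)
```

With these toy numbers, most of the probability mass sits on the youngest age, which is the same pattern the Aiden example above describes.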

Decoding Gender

It's common knowledge that in most cultures, given names are gendered, or given primarily to individuals of one sex versus the other. Of course, not all names are gendered (e.g., Alex, Jamie, Jesse), and gender conventions for a particular name are not the same everywhere. As a result, estimating gender based on a given name requires a global dataset of given names and their associated genderedness, and in the presence of country hints also requires this same information broken down by country.

Italian Andrea Bocelli and American Andrea Bowen

It is less commonly known that in some cultures, family names can also be gendered! For example, in Ukraine, a man might carry the surname "Ivanov," whereas his daughter would carry the feminine form "Ivanova." As with given names, estimating gender based on a family name requires a global dataset of family names and their associated genderedness. In the presence of country hints, the availability of per-country data is particularly important because different countries observe these naming traditions to different extents.

HumanGraphics combines official data from sources like the US Census with massive, proprietary datasets collected from social media with inferred gender and location to inform its gender analyses. This capability is available for all countries.
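The role of country hints described above can be sketched as a lookup that prefers per-country counts and falls back to global counts when the country is unknown or too thinly sampled. Everything here is an assumption for illustration: the tables, the counts, the function name, and the sample-size threshold are hypothetical and do not describe HumanGraphics's actual data or engine.

```python
from typing import Optional

# Illustrative (female_count, male_count) pairs; the numbers are made up.
GLOBAL_COUNTS = {"andrea": (700, 300)}
COUNTRY_COUNTS = {
    ("andrea", "IT"): (5, 95),
    ("andrea", "US"): (97, 3),
}


def p_female(name: str, country: Optional[str] = None, min_samples: int = 50) -> Optional[float]:
    """Estimate the probability that a bearer of `name` is female.

    Uses the per-country table when a country hint is given and that entry
    has at least `min_samples` observations; otherwise uses the global table.
    Returns None when the name is unknown.
    """
    key = name.casefold()
    if country is not None:
        counts = COUNTRY_COUNTS.get((key, country.upper()))
        if counts is not None and sum(counts) >= min_samples:
            return counts[0] / sum(counts)
    counts = GLOBAL_COUNTS.get(key)
    if counts is None or sum(counts) == 0:
        return None
    return counts[0] / sum(counts)


italy_estimate = p_female("Andrea", "IT")
us_estimate = p_female("Andrea", "US")
no_hint_estimate = p_female("Andrea")
```

In this toy data the same given name leans male with an Italian hint and female with a US hint, while the global estimate sits in between, which is why the per-country breakdown matters.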

Decoding Race

Certainly, there is a cultural component to naming, and a racial component to cultures as well, if only due to geography. While modern science generally understands race as a social construct, it is in many situations (e.g., consumer behavior) a predictive characteristic, and therefore useful. Thus, HumanGraphics reports racial estimates for names using the US Census model for race, for want of a better model.

HumanGraphics incorporates multiple datasets for modeling race, including the US Census. At this time, all of these datasets are US-centric. HumanGraphics is adding and updating datasets regularly, so this capability should come online for new countries over time.

Decoding Geography

HumanGraphics leverages its large social media datasets to provide informed estimates of location at the country level.

Conclusion

Hopefully this makes it clearer how HumanGraphics is able to leverage its reverse template-based statistical parser with its proprietary world-spanning datasets to provide the most accurate name-based demographics data available in the market today. Thanks for reading!

Andy Boothe

Andy Boothe is the owner of HumanGraphics