It turns out there's an awful lot in a given name! Age, gender, and race are all encoded in forenames, if you just know where to look.
A rose by any other name would smell as sweet.
- Juliet, from Romeo and Juliet, by William Shakespeare
The introduction to this series explored the "name problem" and how HumanGraphics uses data and domain knowledge to crack it. The second post showed how HumanGraphics models names and described each component of that model. This post will discuss how HumanGraphics decodes the latent demographic information encoded in every name.
Personal names are very much a product of their time and place. Although "Dionysios" was one of the most popular names in Ancient Greece, you'd be hard-pressed to meet a Dionysios born in the US! Similarly, although "Nikau" was the most popular name among newborn Maori boys in New Zealand in the 2010s, it doesn't appear anywhere in the Social Security Administration's dataset of popular US birth names!
Like a fingerprint, a person's name is deeply characteristic of that person, while also being, ultimately, the product of their parents and ancestors. Through careful study, it's possible to decode these metaphorical loops and whorls into a meaningful demographic profile.
Given names' popularities wax and wane over time. For many names, the popularity profile over time is distinctive enough that one can make a good educated guess of a person's age from their given name alone!
HumanGraphics integrates the Social Security Administration's datasets of baby name popularity over time. This allows its demographics engine to estimate, from their given names, the ages of people believed to have been born in the US. HumanGraphics is adding and updating datasets regularly, so this capability should come online for other countries over time.
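To make the idea concrete, here is a minimal sketch of age estimation from yearly birth counts for a single name, like the per-year counts in the SSA's baby-name data. It is not HumanGraphics' implementation: the function name `estimate_age`, the toy counts, and the optional `survival` weights (which in practice might come from actuarial life tables) are all hypothetical. Without survival weights the sketch ignores mortality, which biases estimates for older names upward.

```python
from __future__ import annotations


def estimate_age(
    births_by_year: dict[int, int],
    current_year: int,
    survival: dict[int, float] | None = None,
) -> tuple[float, int]:
    """Return (mean age, median age) for people with one given name.

    births_by_year: births per year for the name (hypothetical input format).
    survival: optional P(alive in current_year | born in year). If omitted,
              everyone is assumed alive, i.e. mortality is ignored.
    """
    weights = {}
    for year, count in births_by_year.items():
        alive = 1.0 if survival is None else survival.get(year, 0.0)
        weights[year] = count * alive
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no (surviving) births recorded for this name")

    # Normalize into a probability distribution keyed by age.
    age_dist = {current_year - year: w / total for year, w in weights.items()}

    mean_age = sum(age * p for age, p in age_dist.items())

    # Median: walk ages from youngest to oldest until half the mass is covered.
    median_age = max(age_dist)
    cumulative = 0.0
    for age in sorted(age_dist):
        cumulative += age_dist[age]
        if cumulative >= 0.5:
            median_age = age
            break
    return mean_age, median_age


toy_counts = {1950: 100, 1960: 300, 1970: 400, 1980: 150, 1990: 50}  # hypothetical
mean_age, median_age = estimate_age(toy_counts, current_year=2020)
print(f"mean ≈ {mean_age:.1f}, median = {median_age}")  # mean ≈ 52.5, median = 50
```

A name whose popularity peaked sharply in one decade produces a narrow age distribution, while a perennially popular name produces a wide one, so the spread of `age_dist` is as informative as its center.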
It's common knowledge that in most cultures, given names are gendered, that is, given primarily to individuals of one sex rather than the other. Of course, not all names are gendered (e.g., Alex, Jamie, Jesse), and the gender conventions for a particular name are not the same everywhere. As a result, estimating gender from a given name requires a global dataset of given names and their associated genderedness and, when a country hint is present, the same information broken down by country.
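The lookup this paragraph describes can be sketched as a table of per-country counts with a worldwide fallback. This is an illustrative sketch, not HumanGraphics' data model: the table layout, the `"*"` key for worldwide totals, and the function `p_female` are hypothetical, and the counts are invented. The toy entry is loosely modeled on "Andrea", a name that is usually female in the US but usually male in Italy.

```python
from __future__ import annotations

# Hypothetical counts: name -> country code -> (female, male) observations.
# The "*" entry holds worldwide totals and is assumed to exist for every name;
# it is used when no country hint is given or the hinted country has no data.
GIVEN_NAME_COUNTS = {
    "andrea": {"*": (9_000, 3_000), "US": (8_000, 200), "IT": (100, 2_700)},
    "jamie": {"*": (5_000, 5_000)},
}


def p_female(
    name: str,
    country: str | None = None,
    counts: dict = GIVEN_NAME_COUNTS,
) -> float | None:
    """Return P(female | given name[, country]), or None if the name is unknown."""
    by_country = counts.get(name.lower())
    if by_country is None:
        return None
    # Prefer country-specific counts; fall back to the worldwide totals.
    key = country if country in by_country else "*"
    female, male = by_country[key]
    total = female + male
    return female / total if total else None


print(p_female("Andrea", "US"))  # mostly female in this toy US data
print(p_female("Andrea", "IT"))  # mostly male in this toy Italian data
print(p_female("Jamie", "US"))   # no US row, so the worldwide 0.5 is used
```

Returning `None` for unknown names, rather than 0.5, keeps "no evidence" distinct from "evidence of a gender-neutral name".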
It is less commonly known that in some cultures, family names can also be gendered! For example, in Ukraine, a man surnamed Ivanov would typically pass the surname "Ivanov" on to his son, whereas his daughter would be surnamed "Ivanova." As with given names, estimating gender from a family name requires a global dataset of family names and their associated genderedness. When a country hint is present, per-country data is particularly important, because different countries observe these naming traditions to different extents.
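Per-country data is valuable but often sparse for rare family names. One common way to handle that, shown here as a swapped-in technique rather than HumanGraphics' method, is to shrink each country's rate toward the worldwide rate using pseudo-counts. The function `shrunk_p_female`, the `prior_strength` knob, and the "Ivanova" counts are all hypothetical.

```python
from __future__ import annotations


def shrunk_p_female(
    country_counts: tuple[int, int],
    global_counts: tuple[int, int],
    prior_strength: float = 50.0,
) -> float:
    """Estimate P(female | family name, country) with shrinkage toward the global rate.

    country_counts / global_counts are (female, male) observation counts.
    prior_strength acts like that many pseudo-observations at the global rate:
    countries with little data stay near the global rate, while countries with
    lots of data are dominated by their own counts.
    """
    g_female, g_male = global_counts
    if g_female + g_male == 0:
        raise ValueError("need at least one global observation")
    global_rate = g_female / (g_female + g_male)
    c_female, c_male = country_counts
    return (c_female + prior_strength * global_rate) / (c_female + c_male + prior_strength)


global_ivanova = (99_000, 1_000)  # hypothetical worldwide counts
print(shrunk_p_female((0, 0), global_ivanova))     # no local data: about 0.99, the global rate
print(shrunk_p_female((980, 20), global_ivanova))  # plenty of local data: about 0.98
```

The design choice here is that a country only overrides the global picture once it has enough observations to justify it, which matters when a naming tradition is strong in some countries and weak in others.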
HumanGraphics combines official data from sources like the US Census with massive, proprietary datasets collected from social media, annotated with inferred gender and location, to inform its gender analyses. This capability is available for all countries.
Certainly, there is a cultural component to naming, and a racial component to cultures as well, if only due to geography. While modern science generally understands race as a social construct, it is in many situations (e.g., consumer behavior) a predictive characteristic, and therefore useful. Thus, HumanGraphics reports racial estimates for names using the US Census model for race, for want of a better model.
HumanGraphics incorporates multiple datasets for modeling race, including the US Census. At this time, all of these datasets are US-centric. HumanGraphics is adding and updating datasets regularly, so this capability should come online for other countries over time.
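When more than one dataset speaks to the same person, for example a surname table and a given-name table, their evidence has to be combined. A standard approach, and the same idea behind BISG-style estimators, is a naive-Bayes combination that assumes surname and given name are independent given race. This is a sketch of that textbook technique, not HumanGraphics' model; the function name and all probabilities are hypothetical. The six categories mirror those used in the Census Bureau's surname tables.

```python
from __future__ import annotations

# Six categories mirroring the US Census Bureau surname tables: White, Black,
# Asian/Pacific Islander, American Indian/Alaska Native, two or more races, Hispanic.
CATEGORIES = ("white", "black", "api", "aian", "multiracial", "hispanic")


def combine_name_evidence(
    p_race_given_surname: dict[str, float],
    p_race_given_first: dict[str, float],
    p_race: dict[str, float],
) -> dict[str, float]:
    """Naive-Bayes combination of surname and given-name race distributions.

    Assumes surname and given name are conditionally independent given race:
        P(r | surname, first) ∝ P(r | surname) * P(r | first) / P(r)
    All three arguments map each category in CATEGORIES to a probability.
    """
    unnormalized = {}
    for r in CATEGORIES:
        if p_race[r] <= 0:
            raise ValueError(f"prior for {r!r} must be positive")
        unnormalized[r] = p_race_given_surname[r] * p_race_given_first[r] / p_race[r]
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("surname and given-name evidence are incompatible")
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical distributions, for illustration only.
prior = {"white": 0.60, "black": 0.12, "api": 0.06, "aian": 0.01, "multiracial": 0.03, "hispanic": 0.18}
surname = {"white": 0.05, "black": 0.01, "api": 0.01, "aian": 0.005, "multiracial": 0.005, "hispanic": 0.92}
first = {"white": 0.30, "black": 0.05, "api": 0.02, "aian": 0.01, "multiracial": 0.02, "hispanic": 0.60}

posterior = combine_name_evidence(surname, first, prior)
print(max(posterior, key=posterior.get))  # hispanic
```

Dividing by the prior prevents the base rate from being counted twice, since both conditional distributions already include it. If the given name is uninformative (its distribution equals the prior), the result reduces to the surname distribution.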
HumanGraphics leverages its large social media datasets to provide informed estimates of location at the country level.
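A country-level location estimate can be sketched as normalizing per-country counts of a name into P(country | name). The optional reweighting by population over sample size is an assumption added here to correct for uneven social media coverage across countries; the function `country_distribution` and the counts (loosely inspired by "Nikau") are hypothetical, not HumanGraphics' method or data.

```python
from __future__ import annotations


def country_distribution(
    name_counts: dict[str, int],
    sample_sizes: dict[str, int] | None = None,
    populations: dict[str, float] | None = None,
) -> dict[str, float]:
    """Estimate P(country | name) from per-country counts of one name.

    If sample_sizes (users observed per country) and populations are both given,
    each count is rescaled by population / sample_size, so countries that are
    over- or under-represented in the sample are corrected for (assumption).
    Returns countries sorted from most to least likely.
    """
    weights = {}
    for country, count in name_counts.items():
        weight = float(count)
        if sample_sizes is not None and populations is not None:
            weight *= populations[country] / sample_sizes[country]
        weights[country] = weight
    total = sum(weights.values())
    if total == 0:
        return {}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {country: weight / total for country, weight in ranked}


nikau_counts = {"NZ": 300, "AU": 100, "US": 100}  # hypothetical counts
dist = country_distribution(nikau_counts)
print(next(iter(dist)))  # NZ
```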
Hopefully this makes it clearer how HumanGraphics combines its reverse template-based statistical parser with its proprietary world-spanning datasets to provide the most accurate name-based demographics data available in the market today. Thanks for reading!