Last week I blogged about MyHeritage’s special free offer of assistance to adoptees who are interested in finding their birth families. I was also pleased that in the past few days MyHeritage announced several other improvements: their DNA Matches are now 1-to-many instead of 1-to-1, enabling more connections, and members of the LDS faith get a new sync with FamilySearch.org. These innovations are positive and important to the genealogy community.
Unfortunately, their latest “scientific” data analysis, recently published in Science, is hogwash. You can read about it here and here. I have several problems with the study:
“The tree is based on data assembled by roughly 3 million genealogy enthusiasts who have identified the familial relationships of more than 86 million individuals.” The key word here is enthusiasts. I’m enthusiastic about many things, but that does not make my attempts at the arts, dance, cooking, etc. well done or accurate. Using inaccurate data does not provide an accurate result. If the work of professional genealogists had been used, I’d be more inclined to believe the findings.
The study’s authors claimed their data was accurate because they cleaned it. “The researchers found that on average there was a 2% error when listing a person’s father, and a 0.3% error for a mother. They also found that about 0.3% of profiles included clear mistakes such as a person having more than two parents, or someone being the parent and offspring of the same person.” Removing the obvious errors does not mean that the resulting information is correct. Did they check the validity of the remaining source citations? Actually, were there any source citations? Did they use DNA? No, they did not. After eliminating the obvious mistakes, they took the remaining data and analyzed it. That is a major mistake. Anyone can place any info online, but that does not make it factual; I would think a computer scientist would be aware of that.
The study is clearly biased and Eurocentric. First of all, only users who have placed info on the website are included. The majority of the site’s users would most likely be middle- to upper-class individuals from the U.S. who have access to a computer. Most of those individuals are not people of color and most would have European ancestry. So, duh, they’re going to see this result: “By comparing people in the system with 80,000 death records from Vermont spanning from 1985 to 2000, the authors also found that the people included in their family tree were not any more likely to be rich or poor than the general population. They were, however, much more likely to be white.” Did they know that 96.7% of Vermont is white?1 Are they aware that the people who input the information were probably middle class, as Vermont’s sizable middle class population grew rapidly from 1990 to 2010?2 Drawing conclusions from faulty data is irrational.
One of their “findings” was that social norms, more than increased modes of travel, led to Americans marrying unrelated individuals, i.e., someone other than a cousin, after 1875. The time period they were exploring was 1825-1875. For 40 of those 50 years, slavery prohibited large numbers of people from using any form of transportation to go a-courtin’. Native Americans were increasingly confined to life on reservations. The Irish potato famine led large numbers of very poor individuals to scrape together the fare for passage and settle in the large cities where their ships landed, like Boston and New York, staying until they could earn enough to relocate elsewhere. Only after becoming established in their new homeland did people have the opportunity to move from Chinatown, Little Italy and other ethnic neighborhoods that had provided support to the new immigrant. And let’s not forget the Civil War during this time period. Unfortunately, the authors excluded all of these important influences from their study. The social norms did change by 1875, allowing more movement, and along with the increased modes of transportation, migrations farther from place of birth to marry did occur. Claiming that an analysis of data cobbled together from Geni supports this conclusion is laughable.
I first read of the study on one of my genealogy listservs, and then friends and family began to contact me about it. Here’s my analogy for the Geni database. Imagine asking every kindergartner in a private school in the U.S. what their favorite ice cream is. Now take all of their favorite flavors and extrapolate the findings to every other kindergartner: those in public, charter and home schools. Now take it further and apply it to every individual in every state. Without including other groups, you cannot generalize from the private school kindergartners’ results to everyone else. I would say it was simply silly, but the scary part is that the study is being given press by legitimate media outlets on both coasts. If the headlines and the stories explained that the most novel thing about the study is that it is one of the first to explore freely provided, crowdsourced information, I would be okay with it, but that is not what the headlines state.
One outcome I am applauding is that, as I understand it, some folks are concerned that their data was used in a way they had not intended. IMHO, that is their own fault for not reading the fine print of the Terms of Service. This is the beginning of the use of large crowdsourced data. If you are uncomfortable with your information being used, then it’s a wake-up call for you to take the time to read the company’s rights instead of merely clicking the box to accept. Yes, it is boring and time consuming, but it is important.
I am extremely disappointed in MyHeritage. I expected better from an organization that has been making such positive strides.
1 “% Vermont white,” abcnews.go.com, accessed 4 March 2018.
2 “% Vermont middle class,” http://publicassets.org, accessed 4 March 2018.