Long train of derivations

First I have my handy dandy citing genetic sources quick sheet. Here is my pretty graph

network graph

Here are the layers. I have downloaded, via DNAGedCom, the ICWs from Ancestry. I then uploaded those results (after editing the headers) into Gephi and used Gephi to create a network graph of those ICWs. I further filtered the graph to highlight the matches that are of interest for my case. A bit of a tangled mess.
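For readers unfamiliar with the "editing the headers" step, here is a minimal sketch of what it can look like. Gephi's spreadsheet import reads an edge table with columns named Source and Target. The input column names (match_name, icw_name) are hypothetical, since the thread does not give the actual headers of the downloaded ICW file. The function name and the de-duplication of reversed pairs are also assumptions made for illustration, not a description of the poster's exact steps.

```python
import csv
import io

# Hypothetical column names for the downloaded ICW file; check the real header row.
ICW_SOURCE_COLUMN = "match_name"
ICW_TARGET_COLUMN = "icw_name"


def icw_to_gephi_edges(icw_csv_text):
    """Return CSV text with Source,Target headers, one row per ICW pair."""
    reader = csv.DictReader(io.StringIO(icw_csv_text))
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for row in reader:
        a = row[ICW_SOURCE_COLUMN].strip()
        b = row[ICW_TARGET_COLUMN].strip()
        # Assumption: the pair is treated as undirected, so (a, b) and (b, a)
        # count as the same edge and self-pairs are dropped.
        pair = tuple(sorted((a, b)))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        writer.writerow(pair)
    return out.getvalue()


# Tiny made-up sample, not real match data.
sample = "match_name,icw_name\nAnn,Bob\nBob,Ann\nAnn,Cara\n"
edges_csv = icw_to_gephi_edges(sample)
```

Keeping a script like this alongside the downloaded files is one way to document the header edits so they can be described later in the write-up.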

The layers as I see them

  • Author: me as I've edited and selected options along the way so:
    • David Grawrock, editor, Net1.png, selected network graph of in common with (ICW) atDNA matches;
    • Editor seems a bit off as it implies book editing. Is creator better?
  • Original source: Ancestry along with their ICW calculations, so:
    • ICW sources, "Shared matches: David Grawrock," [collected by DNAGedCom October 2021, 16382 matches, 49683 ICW] Ancestry [files held by David Grawrock];
  • Network graph created by Gephi, so:
    • network graph, Gephi [software], options selected by David Grawrock.

I'm not sure about the third layer, as I can't list out all of the options, nor, since I used the random setting, can I even recreate it exactly. Plus, as you let it run, it continues to move nodes around, so you would have to know exactly how many cycles you ran, and I really have no clue.
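The reproducibility worry can be illustrated with a toy example. This is only a sketch: it is not Gephi's layout algorithm, and it makes no claim about whether Gephi lets you set a random seed. It shows the general principle that a layout starting from random positions and running for some number of cycles can be recreated only if both the seed and the cycle count are recorded. The function name, step size, and sample edges are invented for illustration.

```python
import random


def toy_layout(edges, seed, iterations):
    """Toy iterative layout: random start from `seed`, then a fixed number of cycles."""
    rng = random.Random(seed)
    nodes = sorted({name for edge in edges for name in edge})
    pos = {name: [rng.random(), rng.random()] for name in nodes}
    for _ in range(iterations):
        for a, b in edges:
            # Pull each connected pair a small step toward each other.
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            pos[a][0] += 0.05 * dx
            pos[a][1] += 0.05 * dy
            pos[b][0] -= 0.05 * dx
            pos[b][1] -= 0.05 * dy
    return {name: (round(x, 6), round(y, 6)) for name, (x, y) in pos.items()}


# Made-up edges, not real match data.
edges = [("Ann", "Bob"), ("Ann", "Cara"), ("Bob", "Dee")]

first = toy_layout(edges, seed=42, iterations=50)
second = toy_layout(edges, seed=42, iterations=50)      # same seed, same cycles
longer = toy_layout(edges, seed=42, iterations=51)      # one more cycle
other_seed = toy_layout(edges, seed=7, iterations=50)   # different random start
```

Here first and second are identical, while longer and other_seed place the nodes differently. The connections between nodes are the same in every run. Only the picture changes.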

Submitted by EE on Tue, 11/09/2021 - 17:01

Cryptoref, it seems to me that what you are trying to cite (with the exception of the original DNA data from AncestryDNA) is not a source or a series of sources, but a process and a set of tools you have used.

If this were my citation I'd focus it on the original AncestryDNA "report." Then I'd explain, free form, the process and the tools that led to thus-and-such conclusion.

Submitted by cryptoref on Tue, 11/09/2021 - 18:35

So the citation would only refer to the "blob" of records, and its provenance, and the rest is text I add either in a separate sentence or up above in the text?

I understand that, but it feels wrong. The analogy feels like I will cite a huge 10-volume set (which would look nice on one of your bookshelves) and not worry about the internal details (I know it's a bad analogy). There is this vat of stuff and I'm just going to chant "Double double, toil and trouble" and out pops my graph :)

There just seems to be the need for a bit of something in the citation between just the blob and the resulting graph.

Submitted by EE on Wed, 11/10/2021 - 08:36

Cryptoref, you write:

"There is this vat of stuff and I'm just going to chant 'Double double, toil and trouble' and out pops my graph. There just seems to be the need for a bit of something [more] in the citation."

Isn't that the case with all serious research? A simple citation to the source does in no way reflect all the work that went into finding that source.

In this case, your discussion would explain that work and the tools you used much more clearly than a formal citation to a tool. Those tools are the same for every person, but the results are different. You can identify the tool and the website; but the result is unique to you alone and no one can go to the website for that tool and verify your results.

In EE's view, the article you are writing to defend your conclusion will or should provide the details the reader needs to understand and evaluate your results. Or, it may be necessary to write a much-more detailed "technical paper" that explains your work.

Submitted by cryptoref on Wed, 11/10/2021 - 14:06

Said that way, I see the point. We normally don't see the sausage making that went into the kielbasa of our report. We imply it in many places. This is just another place where we say here is the result. We just need to do a good job of stating why we believe our process. 

Submitted by EE on Thu, 11/11/2021 - 10:18

Yes, and the explanation we make would then enable anyone, who has access to the original AncestryDNA report that we cite, to replicate the processes of our "experiment" with that data and, theoretically, get the same results.

Submitted by EE on Thu, 11/11/2021 - 11:39

Cryptoref, I also should have asked you another question.  In your proposed first layer, you cite a title as Net1.png.  Given that italics for a title mean that we are citing a standalone publication, did you mean to use italics?

Submitted by yhoitink on Thu, 11/11/2021 - 15:51

We visualize data in different ways in our reports. This visualization of the DNA matches is just another way to present the evidence from the source, similar to a table that correlates data from three census records. You cite the census resources as the source, not the table.

Submitted by Michael Hait on Fri, 11/12/2021 - 12:37

As Elizabeth and Yvette have both stated, the graph is simply a visualization of sources. It does not provide any value as a source, so a citation to the image itself would not be appropriate for any textual assertion. Furthermore, its value even as a graph is limited by the sheer unwieldy volume of matches depicted. A far more informative graphic would be to localize the graph to the matches in question, so that the connections among them could actually be read. However, in this case, the graphic itself is simply a representation and would be referenced as (for example) "Figure 1. The In-Common-With Connections of Select DNA Matches," itself citing the underlying test results.

Beyond this, as a long-time student of Social Network Analysis and user of Gephi (I lectured on the subject of historical/ancestral SNA, demonstrating Gephi, at the 2016 APG Professional Management Conference), I would note that the image in question could have been produced by nearly any graphing tool. Gephi itself, on the other hand, is a powerful SNA tool that is capable of conducting complex mathematical functions related to the network structure itself. If you had been using any of these functions as part of a proof argument, those calculations could potentially have value and the tool calculating them could *possibly* be cited as a source---based on the complexity of the math involved and the fact that the software was doing them rather than you yourself. (It is, of course, possible for some of the more mathematics-oriented genealogists to read the underlying literature and perform those mathematical calculations manually.) Whether using a tool like Gephi or manually calculating the metrics oneself, however, the text would require a deep understanding of the meaning of the various metrics and their applicability to the genealogical argument being produced, in addition to the ability to express this in a comprehensible manner to an audience that may not understand any of it. In this way, SNA metrics would be similar to the use of other mathematical functions like measures of probability or, on a profoundly simpler level, calculating date of birth from a reported age. Ultimately, though, any of these mathematical calculations would be at best a form of analysis and/or correlation and insufficient to meet the Genealogical Proof Standard without meeting each criterion, esp. including conducting reasonably exhaustive research and performing other types of analysis and correlation well beyond just that represented by the mathematics.

(I apologize if I rambled a bit here, but it is a subject that I greatly enjoy discussing.)


Michael, that presentation about social network analysis of yours at the PMC in 2016 was one of the most mind-opening presentations I have ever seen. It's like you connected all sorts of vague ideas in my head and gave it a name. Sorry to digress, but I thought you should know how important that lecture was for me. I have since been learning Gephi and applying the concepts of social network analysis in my work.
I hope you will consider giving a presentation about that again, at a place where it can be re-re-rewatched, as it deserves to be.

I plan on doing much deeper analysis. Yes, this one isn’t that exciting, but it was the one I had for the question on the citation. If I understand you, then the “deeper” you dig into the details, and the higher the math you use, the closer you get to needing a citation. BUT the math still remains something that you don’t normally cite.

I’d love to at least read your talk on the subject.


Submitted by EE on Fri, 11/12/2021 - 16:45

Thanks, Yvette and Michael, for weighing in. Your perspectives greatly enrich my own humanities-based approach to the question.