Citing my work creating graphs

I'm creating, for myself and others, network graphs of DNA relationships. I want to include information on when I downloaded the data and when I created the graph(s). Those dates are likely close, but not always. There are two types of graph: the first is the graph of all the matches; the second is a subgraph that focuses on some aspect of the All graph (a particular cluster number or a surname). The graphs may use one or multiple testers as the focal point. The process creates both a PNG graph and an associated CSV file.

So here are my thoughts:

David Grawrock, CG, "Sherrill Surname All Clusters," network graph, PNG and CSV formats, created 1 January 2024; DNA matches, Ancestry (https://www.ancestry.com : downloaded 1 December 2023), testers, Sherrill Surname, William Surname. Network graph creation from all matches associated with tester[s].

David Grawrock, CG, “Sherrill Surname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; DNA matches, Ancestry (https://www.ancestry.com : downloaded 1 December 2023), tester[s] Sherrill Surname, William Surname; sub cluster 5 of, “Sherrill Surname All Clusters,” 1 January 2024. Network graph creation is for the subset of matches associated with cluster 5 in “Sherrill Surname All Clusters.”

I added the description sentences as I'm not sure how many people understand the basics of the network graphs.

Submitted by EE on Tue, 10/22/2024 - 09:48

Cryptoref, as a starting point, I assume you've seen EE4 §5.11: Genetic Databases.

Three questions came to mind when I read your draft citation: 

  • When I go to the website in your Layer 2, what do I look for? Those who are not into DNA studies (or users of your work in future generations, after things evolve beyond what's standard today) will wonder.
  • What's the relationship between the information in your Layer 1 and the website you cite in your Layer 2?  Bridge words to explain this would be helpful.
  • How and where do I access that graph? If you don't say, then you leave readers with the impression: I'm waving a magic wand here, so take my word for it; you can't see this or question it.

Expanding upon the pattern at 5.11, you might do this:

David Grawrock, CG, “Sherrill Surname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; created from "Name of DNA Tool," on-request report, Ancestry (https://www.ancestry.com : downloaded 1 December 2023), tester[s] Sherrill Surname, William Surname; sub cluster 5 of, “Sherrill Surname All Clusters,” 1 January 2024; report available at [either your personal contact info or an online site where you have posted it]. Network graph creation is for the subset of matches associated with cluster 5 in “Sherrill Surname All Clusters.”

Submitted by cryptoref on Tue, 10/22/2024 - 18:15

Yep, I wasn't clear enough in the citation. Layer 2 is now the magic wand of how the graph is generated. Layer 3 is where the DNA data comes from. Note that Layer 3 could have multiple companies. Layer 4 is the reference to the previous report, of which this graph is a subset. Layer 5 is where to find the graph.

David Grawrock, CG, “Sherrill Surname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; cluster identified by Gephi, statistics > community detection > modularity, matches restricted to cM range 90-400; DNA data from, Ancestry (https://www.ancestry.com: downloaded 1 December 2023 [using DNAGedCom]), tester[s] Sherrill Surname, William Surname; sub cluster 5 of, “Sherrill Surname All Clusters,” 1 January 2024; report available at Grawrock Archives, [ADDRESS FOR PRIVATE USE], Ivins, Ut. 

While the process seems to be a bit much to put into the citation, with AI and other tools around, I think this is mandatory. 
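The layer ordering described above can be sketched in code as a small record that renders one semicolon-separated citation. This is only an illustration of the structure under discussion, not an EE-endorsed format: the class name `GraphCitation`, its field names, and the exact wording of each layer are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def fmt(d: date) -> str:
    """Format a date as day-month-year, e.g. '5 January 2024'."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


@dataclass
class GraphCitation:
    # All field names are hypothetical; they mirror the layers in the post.
    author: str
    title: str
    created: date            # Layer 1: when the graph was created
    method: str              # Layer 2: how the graph was generated
    source_name: str         # Layer 3: where the DNA data came from
    source_url: str
    downloaded: date         # Layer 3: when the data was downloaded
    testers: List[str]
    location: str            # Layer 5: where to find the graph
    parent_title: Optional[str] = None  # Layer 4: graph this one is a subset of

    def render(self) -> str:
        layers = [
            f"{self.author}, “{self.title},” network graph, "
            f"PNG and CSV formats, created {fmt(self.created)}",
            self.method,
            f"DNA data from {self.source_name} ({self.source_url} : "
            f"downloaded {fmt(self.downloaded)}), "
            f"tester[s] {', '.join(self.testers)}",
        ]
        if self.parent_title:
            layers.append(f"sub cluster of “{self.parent_title}”")
        layers.append(f"available at {self.location}")
        return "; ".join(layers) + "."


citation = GraphCitation(
    author="David Grawrock, CG",
    title="Sherrill Surname Sub Cluster 5",
    created=date(2024, 1, 5),
    method="cluster identified by Gephi, statistics > community detection > "
           "modularity, matches restricted to cM range 90-400",
    source_name="Ancestry",
    source_url="https://www.ancestry.com",
    downloaded=date(2023, 12, 1),
    testers=["Sherrill Surname", "William Surname"],
    location="Grawrock Archives, Ivins, Utah",
    parent_title="Sherrill Surname All Clusters",
)
print(citation.render())
```

Keeping the download date and the creation date as separate fields reflects the point that the two are usually close but not always the same.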

Submitted by EE on Wed, 10/23/2024 - 09:13

David, this version is definitely more informative. Yes, some might say "overkill." But as genealogists, we know we have to explain how we arrived at our conclusions. It seems to me that the same applies when we use tools and platforms that will produce different results when different options are used. Evidence Style citations would definitely hold that the citation (i.e., the description) of your graph does need to briefly explain tools and methods.

Another approach that might make things easier to grasp for newer researchers might be to put the basic citation (Author, "Title," date; where report/graph can be accessed) in one sentence, then add the technical details of what the graph represents in a separate sentence (or more, as needed).

David Grawrock, CG, “Sherrill Surname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; available at Grawrock Archives, [ADDRESS FOR PRIVATE USE], Ivins, Utah. 

The cluster was identified using the Gephi platform > statistics > community detection > modularity, with matches restricted to cM range 90-400; DNA data from Ancestry (https://www.ancestry.com: downloaded 1 December 2023); parsed by DNAGedCom.com > Tool Name?, for all tester[s] Sherrill Surname, William Surname; sub cluster 5 of, “Sherrill Surname All Clusters,” 1 January 2024.

Now that I've parsed through it myself, in this fashion, I'm left with a couple of uncertainties:

  1. In the "tester[s]" phrase, you say "Sherrill Surname, William Surname."  Given that "William" is usually a given name, with Williams being the surname, the reader will wonder whether you do mean Surname in both instances or whether there should be an "s" on "William." Is there another way to rephrase this? (Possibly:  "tester[s] with surnames Sherrill and William," or does that change the meaning of what you intend?)
  2. It's not clear what your last statement, the "sub cluster 5 of" phrase rendered above in green, applies to. Considering the sequence of the elements, are you also saying that "sub cluster 5 of 'Sherrill Surname All Clusters' " was a DNAGedCom result?

 

Submitted by cryptoref on Wed, 10/23/2024 - 12:50

Ok, I like the simplicity of making it a "normal" citation and then adding the details in subsequent sentences. 

On the names, I was just doing a quick anonymization. I'll make that explicit in this next one.

The green refers to the fact that this cluster is a subset of the All Clusters graph. That is, after creating the All graph, the analysis took me to review cluster 5 from that graph. The goal is to provide the breadcrumbs of how I decided to create this subset of the All graph. In practice there would be at least two graphs in the report: the All graph and the sub graph.

David Grawrock, CG, “Sherrill Lastname All Clusters,” network graph, PNG and CSV formats, created 5 January 2024; available at Grawrock Archives, [ADDRESS FOR PRIVATE USE], Ivins, Utah. Cluster identification used the Gephi program > statistics > community detection > modularity. DNA data from Ancestry (https://www.ancestry.com: downloaded 1 December 2023); downloaded using the DNAGedCom program, for testers Sherrill Lastname, William Lastname; cM range 15-3600.

David Grawrock, CG, “Sherrill Lastname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; available at Grawrock Archives, [ADDRESS FOR PRIVATE USE], Ivins, Utah. Cluster identification used the Gephi program > statistics > community detection > modularity, with matches restricted to cM range 90-400. DNA data from Ancestry (https://www.ancestry.com: downloaded 1 December 2023); downloaded using the DNAGedCom program, for all testers Sherrill Lastname, William Lastname. Sub Cluster 5 identified from David Grawrock, "Sherrill Lastname All Clusters."

One way to make the sub cluster citation smaller would be to remove the download information and reference the All citation directly, like so:

David Grawrock, CG, “Sherrill Lastname Sub Cluster 5,” network graph, PNG and CSV formats, created 5 January 2024; available at Grawrock Archives, [ADDRESS FOR PRIVATE USE], Ivins, Utah. Cluster identification used the Gephi program > statistics > community detection > modularity, with matches restricted to cM range 90-400. Sub cluster 5 is a subset of, David Grawrock, "Sherrill Lastname All Clusters."
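The "subset of the All graph" relationship in these citations can be illustrated with a short sketch. It assumes the All graph's CSV has one row per match with hypothetical columns `match`, `shared_cm`, and `cluster`, and that the sub graph is simply the rows in cluster 5 whose shared cM falls in the 90-400 range. The real Gephi workflow may differ (for example, modularity might be re-run on the restricted set), and the sample rows are made up for the example.

```python
import csv
import io

# Made-up stand-in for the CSV exported with the All graph.
# Column names are assumptions, not the actual export format.
SAMPLE_ALL_CSV = """match,shared_cm,cluster
Match A,250.0,5
Match B,45.5,5
Match C,120.0,3
Match D,400.0,5
Match E,1800.0,5
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select_subcluster(rows, cluster, min_cm, max_cm):
    """Keep rows in the given cluster with min_cm <= shared cM <= max_cm."""
    selected = []
    for row in rows:
        cm = float(row["shared_cm"])
        if row["cluster"] == str(cluster) and min_cm <= cm <= max_cm:
            selected.append(row)
    return selected


rows = load_rows(SAMPLE_ALL_CSV)
subset = select_subcluster(rows, cluster=5, min_cm=90, max_cm=400)
print([r["match"] for r in subset])  # ['Match A', 'Match D']
```

Because the subset is derived entirely from the All graph's data plus two parameters (cluster number and cM range), the shorter citation only needs to record those parameters and point back to the All citation.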