URLs as sources

I am wondering just how much straight-up technical knowledge one would require to be able to properly analyze the evidentiary value of information contained only in a URL. MJN recently publicized such a case on the APG list and there was a discussion of how to cite such a thing. I'm wondering how (or whether) a non-techie could analyze and evaluate it.


Submitted by mjneill on Mon, 03/26/2012 - 22:32

I would think that it would really depend upon the complexity of the URL and the sophistication of the user. The example I gave was pretty straightforward in how the information was obtained from the URL itself. I hadn't really thought about more complicated examples where "hidden" information in the URL might only be discernible to someone who had a fairly good knowledge of programming, etc.

In my case, the URL was this:

http://www.lakelandfuneralhome.com/obituaries/Creagen-Perry-Neill2792018738/

and I thought it pretty obvious from the URL that the middle name of this person was "Perry." The problem with this is that I really don't know who submitted the information for the website, who typed it, etc., but that's typical of information contained in a lot of records that we use. This may be a case where the only place I get the middle name is the URL. I'll have to think about other instances where the URL may contain "hidden" information.



Submitted by dhubbard on Mon, 03/26/2012 - 22:35

This is a bit of thinking out loud but perhaps there is a useful thought in it. I wonder if more cases aren't needed to have a general answer. There are certainly things that we won't imagine that come up.

The more I think about this the more I think there are really two questions—Can we even say with confidence what the information in the url is supposed to represent? Can we judge the likely accuracy of that information?

In this particular case (an apparent middle name contained in a url but not on the actual web page) the technical knowledge that comes to mind is that it is possible to automatically generate urls from a database. That would be enough to trigger contacting the funeral home to ask if the apparent middle name came out of their data. Depending on the quality of the answer, that might mean that the evidence from the funeral home is what would be used in building a case and the url as evidence is no longer really needed; it would not be independent evidence, assuming the information can be had from the funeral home's data. Even a general statement that "the url is created from our record of the name of the deceased" would certainly help the case.

Another technical thought is the experimental approach. When I read about it, I went to the site and checked other urls. Everywhere that I looked that had a middle name stated in the text of the page, also had that name in the url for the page. I also couldn't find anyone with anything inexplicable inserted between their first and last names. Those checks would seem to strengthen the case that at least the creator of the url has and uses that information in a predictable manner and that when something is in that place in the url, it ought to be a middle name.
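A spot check like this could be recorded along the following lines. This is only a sketch: the function name `compare_names` and its result labels are made up for illustration, and it assumes the name parts from the url and the name printed on the page have been copied out by hand. The one pair shown is the example from this thread.

```python
def compare_names(url_tokens, page_name):
    """Compare name parts taken from a url with the name printed on the page.

    Returns one of four illustrative labels (hypothetical, not a standard):
    "match", "url expands an initial", "different number of name parts",
    or "conflict".
    """
    # Strip trailing periods and commas so "P." compares as "P".
    page_tokens = [t.strip(".,") for t in page_name.split()]
    if len(url_tokens) != len(page_tokens):
        return "different number of name parts"

    expands = False
    for url_part, page_part in zip(url_tokens, page_tokens):
        if url_part.lower() == page_part.lower():
            continue
        # A single letter on the page that begins the url part is treated
        # as an initial that the url spells out in full.
        if len(page_part) == 1 and url_part.lower().startswith(page_part.lower()):
            expands = True
            continue
        return "conflict"
    return "url expands an initial" if expands else "match"


# The pair discussed in this thread: url name parts vs. the name on the page.
result = compare_names(["Creagen", "Perry", "Neill"], "Creagen P. Neill")
print(result)  # url expands an initial
```

Tallying the labels over many pages from the same site would show whether the site fills that position in a predictable way. It would say nothing about who supplied the name.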

If the url was just a dead link to a dead website, then aside from looking at the domain to determine if it was for a funeral home or a defunct repository for unverified family trees, I'm not sure what could be done to gauge the value of the information.


Submitted by ldegrazia on Tue, 03/27/2012 - 06:56

Interesting subject, Harold!

As Michael said, the example he used is one in which the clue fairly jumps out at us—even at those of us without technical knowledge of URLs. If we study other examples (as Dan did) and discover a pattern we can infer meaning, but we still don’t have enough information to determine the informant’s identity or level of knowledge, or the extent of processing leading up to our seeing the online obituary and URL. But isn’t that the case with many sources? Sometimes we can’t determine exactly who provided information. Sometimes the meanings of written marks and notations are not known.  

In my view, a clue gleaned from a URL is really no different from a clue picked up from any other source. I consider the clue in its context, recognize the “unknown” aspects of that clue, look for additional relevant information, and analyze and correlate the findings at each step of the process.


Submitted by dhubbard on Wed, 03/28/2012 - 13:48

Laura made something explicit that I didn't, so I'll update my two questions to three:

  1. Can we even say with confidence what the information in the url is supposed to represent?
  2. Can we dig deeper to the underlying source or are we limited to the url?
  3. If limited to the url, can we judge the likely accuracy of that information?

If we can dig deeper (e.g. contact the funeral home) and move on to analyzing the accuracy of that information, then what we would do follows normal evidence analysis.

If truly limited to the url with no way to get to any underlying source and with no independent source for the same information, then I can't think of much that can be done other than getting a very crude impression from the domain in the url. If anyone has any other thought in this really limited situation, it would be interesting.


Submitted by mhait on Fri, 03/30/2012 - 09:54

As Elizabeth noted on the mailing list, URLs can provide evidence just as any other source (even bathroom wallpaper!) can provide evidence.

But I would add another question or two to ask ourselves in addition to Dan's, as to how we weigh and evaluate the evidence that a URL can and might provide. And I am of course limited to the very explicit example Michael used, as I have no technical knowledge whatsoever. ;)

- Would we consider the URL an original or derivative source, or merely a finding aid? As Dan mentioned, do we use the URL as evidence in its own right, or use the domain name to identify the creator of the website, and retrieve the source records from that institution?

- How would we evaluate the evidence provided by a URL? We have a date of access, but the date of creation is the more important aspect in analysis. Can we determine a date of creation? Can we identify an informant?

I'm not certain I have any specific answers, at least right now. One analogy as far as the informant goes might be a death certificate where "hospital records" is listed as the informant. We might have an idea who the "original" informant was, but we really don't know for certain.

The URL itself contains information not in the obituary/death notice in this case. URLs are finding aids in the sense that they get one to the webpage, but I'd be hesitant to consider the URL just a finding aid as it contains information itself. I guess an analogy here might be a marriage index where the indexer (being familiar with the families) added notations in the index not in the actual record itself.

Is the URL original or derivative? Hmmm. That's a good question. 

Even if we don't arrive at specific answers, the discussion has got me thinking and that's never bad...


Submitted by sevans on Sat, 03/31/2012 - 14:53

Like Michael Hait, I have no technical knowledge about URLs. It would seem that the ways we evaluate the information provided by a URL would vary depending on the information it offers and on how the information relates to our research question. But in the end, would we not evaluate *as best we can* the URL's form, its information, and its applicability as evidence? And then tease out the separate bits of information it offers and comment on or question how each piece impacts the research question?

Submitted by Rondina on Fri, 04/13/2012 - 19:32


Michael (Hait) asked:

- Would we consider the URL an original or derivative source, or merely a finding aid? As Dan mentioned, do we use the URL as evidence in its own right, or use the domain name to identify the creator of the website, and retrieve the source records from that institution?

- How would we evaluate the evidence provided by a URL? We have a date of access, but the date of creation is the more important aspect in analysis. Can we determine a date of creation? Can we identify an informant?

In response:

It seems to me that a URL might be equated with a repository. We don't consider whether a repository is an original or derivative source, but whether it contains original or derivative sources. The evaluation of the records that the URL takes us to goes through the normal process.

This reminds me of how I approach a URL when nothing else is provided. I dissect it by removing each element after a slash, one at a time, to see what is there. Only when I have looked at each web page behind the original URL can I make some determination of the validity of the information found in the source the URL took me to.
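For anyone who wants to do that dissection systematically, here is a minimal sketch using Python's standard `urllib.parse` module. The function name `parent_urls` is made up for illustration, and the sketch assumes each shortened address should end in a slash, which not every site requires. The URL used is the one from the example in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List the URL and each shorter URL made by dropping the last path element."""
    parts = urlsplit(url)
    # Path elements between slashes, ignoring empty pieces.
    segments = [s for s in parts.path.split("/") if s]

    urls = []
    for n in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:n])
        if n > 0:
            path += "/"  # assumption: keep a trailing slash on every level
        # Query string and fragment are dropped; only the path is dissected.
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls


for candidate in parent_urls(
    "http://www.lakelandfuneralhome.com/obituaries/Creagen-Perry-Neill2792018738/"
):
    print(candidate)
```

The code only builds the list of addresses. Visiting each one and judging what is found there still has to be done by the researcher.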



Submitted by mhait on Tue, 04/17/2012 - 00:18

In reply to Rondina

Under normal circumstances, when the URL takes you to the source, I would agree with you.

In this example, however, the URL itself contains evidence that is not contained on the page that the URL points to. In this example the URL/webpage is not the repository address, it is the source itself.

The URL (quoted here from above) is "http://www.lakelandfuneralhome.com/obituaries/Creagen-Perry-Neill2792018738/". The relevant part of the URL is the last segment, "Creagen-Perry-Neill2792018738." The page itself refers only to "Creagen P. Neill." The URL provides a full middle name. This makes the URL a form of a statement of information in and of itself: an original source.

When we assess evidence two factors are most important: who provided the information, and when was the record created? Neither of these can be answered certainly from information within the URL or on the corresponding webpage, at least to my knowledge. I don't know what those numbers after Neill mean.
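For what it is worth, the last segment of the URL can be pulled apart mechanically. The sketch below uses Python's standard library; the function name `split_slug` is made up for illustration, and the sketch assumes only what the example shows: name parts joined by hyphens, followed by a run of digits. It separates the pieces but cannot say what the digits mean.

```python
import re
from urllib.parse import urlsplit


def split_slug(url):
    """Split the last path segment into hyphen-separated parts and trailing digits."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return [], ""
    slug = segments[-1]

    # Trailing run of digits, if any (meaning unknown; it is only separated out).
    match = re.search(r"\d+$", slug)
    digits = match.group() if match else ""

    name_part = slug[: len(slug) - len(digits)]
    tokens = [t for t in name_part.split("-") if t]
    return tokens, digits


tokens, digits = split_slug(
    "http://www.lakelandfuneralhome.com/obituaries/Creagen-Perry-Neill2792018738/"
)
print(tokens)  # ['Creagen', 'Perry', 'Neill']
print(digits)  # 2792018738
```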


Submitted by Beth on Fri, 09/28/2012 - 22:24

The format of a URL cannot reliably be depended on to mean anything. There are too many technical ways to construct a URL, and the way it is done says more about how the website is structured behind the scenes than anything.

For example, on this website, if you go to Sample Book Pages, the URL is https://www.evidenceexplained.com/content/sample-text-pages. This could mean that Elizabeth has a folder called content where she places the HTML page called sample-text-pages.

On the other hand, there are "vanity URLs" where the actual URL is something like www.evidenceexplained.com/gobbledygook/nonsense, but a search-engine-friendly label is also applied so that it looks sensible to the reader. This is often the case when the page you see is served up from a database at the moment you request it, and does not physically exist on the server at all times.

In your obituary case, I suspect the latter scenario. The funeral home has a database of records which is serving up pages. To keep their records straight, each new "customer" gets a record number assigned, which always consists of "deceased name-plus-randomly generated unique number". You would need a combination, because there could be multiple people of the same name. I suspect that the funeral home is being very careful to create record numbers from firstname-middlename-lastname-plus-random number, as there could be nothing worse for customer service than mixing up the recently deceased.
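To make that guess concrete, here is a sketch of how such a record number might be built. It is purely hypothetical: the function name `make_record_id`, the ten-digit length (chosen only because the example URL ends in ten digits), and the use of a random number are all assumptions, and nothing here is known about the funeral home's actual software.

```python
import random


def make_record_id(first, middle, last, rng=random):
    """Build a hypothetical record ID: name parts joined by hyphens plus ten digits."""
    # Skip any empty name part, e.g. when no middle name was recorded.
    name_parts = [part for part in (first, middle, last) if part]
    # Assumption: a random ten-digit number keeps same-named people apart.
    number = rng.randrange(10**9, 10**10)
    return "-".join(name_parts) + str(number)


record_id = make_record_id("Creagen", "Perry", "Neill")
print(record_id)  # same shape as the example; the digits differ on each run
```

If the system works anything like this, the middle name in the URL would come from whatever was typed into the database record, not from the text of the obituary.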

If this is the case, the text of the obituary is a separate matter altogether from the record identifier. Whoever wrote the obit merely decided to use the middle initial only. That decision probably was editorial and had nothing whatever to do with the ID system which is generating URLs.

All this being said, the middle name in the record ID most likely is accurate and does come from the same informant who gives the information for the death certificate and pays the bill. So yes, definitely a clue, but I would venture to say that getting the death cert or funeral home record is going to establish your source for citation.