A Citation Assistant

A rather unusual question for this forum. I'm sure I've asked this before, but I cannot remember where.

I see a problem with EE being used as a "set of citation templates", rather than as a guide to evidence analysis and citation creation, and when combined with the software industry's pursuit of formulaic templates -- to the exclusion of layers, analytical notes, and "edge cases" -- then methodology is being lost at the expense of simplicity.

A tool that could really help would be an interactive citation assistant: one that grasped a particular situation through the combination of menus and question-answer sequences. For instance, selecting a main category, such as a book, and then being asked for the relevant details. However, rather than being a fixed set of questions for each category, it could vary the question according to the actual case in hand (I'm thinking of the recent image-in-a-book forum question as an example). This is quite a different approach from simply selecting a template from a huge list, and then filling in a form with details.

Creating such a piece of software would be relatively easy, and it could be either a standalone program or some component that other programs could include. The difficult bit would be the questions, and the sequencing between them. Depending on how comprehensive the questions were, such a tool could generate accurate citations for a much greater set of cases than the existing template systems.

Has this been considered before? Is it something that might be of interest? Although the software side of this is simple (more on that in a second) it would require a lot of input in handling the various cases.

A strong recommendation that I would make, if any other software people fancy having a go at this, is to create the questions first. For instance, a simple text file with labelled questions that direct control to further questions. If that data was easy to process then software programs could each put their own UI around it.
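
As a rough illustration only, such a question file and a reader for it might look like the Python sketch below. The line format (label | question | answer=next label), the labels, and the END prefix for terminal nodes are all invented for this example; nothing in the thread prescribes them.

```python
# Hypothetical question-file format (an assumption, not an agreed standard):
#   label | question text | answer=next_label, answer=next_label
# A target label starting with "END" marks a terminal node.
QUESTIONS = """\
start | Is the source a book? | yes=book, no=END_other
book | Does the book have a named editor? | yes=END_edited_book, no=END_book
"""


def parse_questions(text):
    """Return {label: (question, {answer: next_label})} from the text format."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        label, question, routes = (part.strip() for part in line.split("|"))
        branches = {}
        for route in routes.split(","):
            answer, target = route.split("=")
            branches[answer.strip()] = target.strip()
        table[label] = (question, branches)
    return table


def run(table, answers, start="start"):
    """Follow a scripted list of answers until a terminal label is reached."""
    label = start
    for answer in answers:
        _question, branches = table[label]
        label = branches[answer]
        if label.startswith("END"):
            break
    return label


table = parse_questions(QUESTIONS)
print(run(table, ["yes", "no"]))
```

Because the questions live in plain data, a desktop program, a web page, or a command-line tool could each read the same file and put its own UI around it.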

Tony

Submitted by EE on Tue, 05/17/2016 - 08:30

Tony, your concept is an excellent one. Yes, an interactive citation assistant is on our wish list as well. Thanks for raising the question here, where—optimistically—we'll have helpful feedback from other EE users.

Submitted by CrankyToday on Tue, 05/17/2016 - 09:04

Computer codes to do this vary widely in complexity. At one end is the "dichotomous key" used to identify plants and other living things. An example of a page from such a key appears here: http://3ubio.weebly.com/uploads/1/6/8/1/16813674/7481061_orig.png

These are simple enough that they can actually be printed in books. The key factor in this sort of simplicity is figuring out an unvarying sequence of questions--questions that the reader should be able to answer, because there is no way to skip a question.

At the other end of the range of complexity are artificial intelligence programs that may be used to assist doctors in diagnosing obscure conditions. In these programs, doctors can skip a question because (1) the answer is unknown or vague; (2) the test to get the answer is expensive or potentially harmful. When the doctor skips a question, the program figures out another question that would be helpful in narrowing down the possibilities.

There is a tremendous difference in the cost of developing these two types of programs, and the difference rests solely on the issue of whether one can figure out sequences of questions that are always answerable and always lead to one solution. If one can figure out such a sequence, then the good news is that one doesn't even need to write a program; one can write a handbook instead.

I think the AI direction would be overkill for this application. If the questions (which could be yes/no ones, or soliciting pieces of information) form a tree then every terminal node (the ends of those branches) must have amassed sufficient details that it could create a unique citation.
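
To make the tree idea concrete, here is a minimal Python sketch in which some nodes branch on a yes/no answer and others collect a piece of information, so that the details have been amassed by the time a terminal node is reached. The node names, the dictionary layout, and the sample questions are assumptions made up for this illustration.

```python
# Hypothetical tree. A node with "field" stores the reply and moves to "next";
# a node without "field" branches on a yes/no reply. No "next" means terminal.
TREE = {
    "title": {"ask": "Title?", "field": "title", "next": "author"},
    "author": {"ask": "Author?", "field": "author", "next": "is_translated"},
    "is_translated": {"ask": "Is it a translation?", "yes": "translator", "no": "year"},
    "translator": {"ask": "Translator?", "field": "translator", "next": "year"},
    "year": {"ask": "Year of publication?", "field": "year"},
}


def walk(tree, start, answer_for):
    """Walk from `start` to a terminal node, returning the details collected."""
    details = {}
    name = start
    while name is not None:
        node = tree[name]
        reply = answer_for(node["ask"])
        if "field" in node:
            details[node["field"]] = reply
            name = node.get("next")  # None once a terminal node is reached
        else:
            name = node[reply]  # branch on "yes" or "no"
    return details


# Scripted answers stand in for a real user interface.
scripted = {
    "Title?": "Example Title",
    "Author?": "Jane Doe",
    "Is it a translation?": "no",
    "Year of publication?": "1900",
}
print(walk(TREE, "title", scripted.get))
```

The translator question is asked only when the answer to the preceding yes/no question makes it relevant, which is the difference from a fixed form.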

Tony

Submitted by CrankyToday on Tue, 05/17/2016 - 09:33

Yes, ACProctor, that's correct. The "tree/terminal" tech-jargon that you've used is precisely equivalent to my statement that the simple approach is enabled by "whether one can figure out sequences of questions that are always answerable and always lead to one solution". This is not a matter of opinion; the choice between the two approaches depends on whether or not one can figure out the questions.

Submitted by gllovelace on Tue, 05/17/2016 - 10:23

Tony wrote:  "... when combined with the software industry's pursuit of formulaic templates ... then methodology is being lost at the expense of simplicity."

Tony, while I would also like to have a tool like you describe, by proposing one, aren't you playing into the loss of methodology you mention?  With complex citations, are you ever going to be able to design software which will meet our needs?  Or would it be better to have EE on our desks to help us figure out how we need to formulate our citations?

As a retired biologist, my first thought agreed with CrankyToday's...  a key akin to a biological key:

"Does such-and-such have this characteristic?  If so go here (a pointer to a section of the key).  If not, here (a pointer to another section of the key)."  But, as Elizabeth has said repeatedly, citation is an art, not a science, and there isn't a "correct" methodology in crafting citations.  How are you going to rely on computer code to make the decisions necessary to craft a good citation?

 

Submitted by ACProctor on Tue, 05/17/2016 - 11:37

Re: "How are you going to rely on computer code to make the decisions necessary to craft a good citation?"

I wouldn't, Greg! The code wouldn't be making decisions all by itself. It would just guide the user through a sequence of questions corresponding to their current source scenario. The scope of those questions could be as extensive or simplistic as you want, and the code would eventually put together a citation corresponding to its identification of that particular scenario.

I take your point about it being an art rather than a craft -- meaning that there will always be more scenarios than you can generate questions for -- but even a simplistic stab at this would be a lot more powerful than the template approach. Rather than having thousands of pre-prepared forms to cover book-with-one-author, book-with-multiple-authors, translated-book, reprinted-book, edited-book, blah, blah ... and all combinations thereof, I'd rather be asked the right questions having indicated that I want to cite information in a book.

The degree to which it covers all the main scenarios would be determined by the scope of those questions; adding refinements would simply require an updated data file -- the program itself would be entirely unchanged.

Tony

Submitted by gllovelace on Tue, 05/17/2016 - 12:32

Tony, sounds like we agree on the basics here.  So we would need a combination of computer code and writer experience?  The program would handle basics, and the writer would need to tweak?  That's kinda what I'm doing now with the templates from "The Master Genealogist."  I continue to add custom source templates and tweak them however necessary to obtain what I need.

Submitted by yhoitink on Tue, 05/17/2016 - 15:30

I think the tree idea would be one step up from the list of templates, but in essence is still the same thing. A tree where the nodes are defined by a series of closed questions is just a way to guide people to the correct template.  

I think the real challenge is in creating dynamic templates. I'm thinking more in terms of a template grammar. What are the core concepts that we see in many citations? What are the rules to combine the core concepts?

Core concepts could be:

  • citation
  • publication
  • a specific item within a source 
  • unpublished source 
  • source-of-the-source 

Grammar rules could be:

  • A citation for a source can be followed by a specific item by separating them with a comma
  • A source that is republished in another publication can be described using a multi-layer citation by separating them with a semicolon
  • A citation for an unpublished or published source can describe its source-of-the-source by separating it with a semicolon

Real-world examples are more complex than this, but I hope you get the idea.
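
A minimal Python sketch of how the three grammar rules above could be expressed as composable functions follows. The function names and the placeholder strings are invented for illustration, and the punctuation follows only the rules as listed, not any published style guide.

```python
def with_item(source, item):
    """Rule 1: a source followed by a specific item, separated by a comma."""
    return f"{source}, {item}"


def layered(*layers):
    """Rules 2 and 3: citation layers (a republication, or a
    source-of-the-source) separated by semicolons."""
    return "; ".join(layers)


# Placeholder layers, not real references.
original = with_item("Original Source", "specific item")
republished = with_item("Republishing Publication", "specific item")
print(layered(original, republished, "citing Source-of-the-Source"))
```

Because each rule returns a plain string, the rules can be nested in any order that the collected details call for, rather than being fixed in advance by a template.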

I think you might be missing the point here, Yvette. There is no set of templates. That guidance through the questions ensures that there is enough information amassed by the terminal node to generate an appropriate citation, and from that point of view it constitutes a "dynamic template" with its own grammar.

Tony

Ah, I had interpreted your set of questions as questions to guide you, not to help build the citation. It seems like we're thinking in the same direction after all :-) I don't code anymore, but if you ever need a tester who knows how to write a bug report, I'm game!

Submitted by ACProctor on Wed, 05/18/2016 - 10:16

Yes Yvette, once the question sequences have reached a terminal node then the system will have a set of details (determined by the question author) that collectively describe the current citation scenario. That allows a number of things to happen: (a) it can generate a formatted citation from those details (for each of source-list, first-ref-note, subsequent-ref-note), and (b) it can generate the "citation elements" that FHISO are currently working on.
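
As a hedged illustration of point (a), the Python sketch below renders one set of collected details in three ways. The field names and the layouts are simplified placeholders invented for this example; they are not the Evidence Explained formats, and the sample values are not a real reference.

```python
def render(details):
    """Return three renderings of one set of collected details.

    The layouts are illustrative placeholders, not an authoritative style.
    """
    author = details["author"]
    given, surname = " ".join(author.split()[:-1]), author.split()[-1]
    title = details["title"]
    imprint = f'{details["place"]}: {details["publisher"]}, {details["year"]}'
    page = details["page"]
    return {
        "source_list": f"{surname}, {given}. {title}. {imprint}.",
        "first_ref_note": f"{author}, {title} ({imprint}), {page}.",
        "subsequent_ref_note": f"{surname}, {title}, {page}.",
    }


# Placeholder details, as they might stand at a terminal node.
sample = {
    "author": "Jane Doe",
    "title": "Example Title",
    "place": "Anytown",
    "publisher": "Example Press",
    "year": "1900",
    "page": "p. 12",
}
for name, text in render(sample).items():
    print(f"{name}: {text}")
```

The point of the sketch is only that all three outputs come from the same collected details, so the user answers the questions once.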

Tony

Submitted by rworthington on Wed, 05/18/2016 - 20:43

Tony,

Great conversation. Thank you.

The biggest problem that I see is the step just before the Questions start, at least in the online world. That is, the information that we see from the provider of the information we are using. For example, the Source information of a record that has an image should not be labeled as an "online database". IF there is an image, along with a transcription (for indexing purposes), the description should contain "digital image".

With the cooperation between our online repositories, I am seeing more and more records on one website, while the record is really on another.

I look at the Inside Cover of Evidence Explained, 3rd Edition first, then the first 2 chapters of that book. Once I understand that part of the book then I try to find the right section of the book for the information that I need for the Citation. That is when my problem occurs.

I have to then fight my genealogy database management program. (I have posted questions here on that issue). Because I have that fight on my hands, the examples in Evidence Explained tell me what pieces of information need to be included in the Citation. But, that is a bad term. The program I use ONLY has a Reference Note for output. But, in most cases, I can come close, not exact.

Back to the first problem, is that the description of the Record or Record group should help GUIDE me to the right template (No bypassing the Inside Cover and first two chapters).

I had done a brief study of my Sources and realize that of the 100+ Template options I have been provided, I only have used 5 or 6 of the templates, but am able to come close to the Reference Note format in Evidence Explained.

My approach is simple, can't deal with much more, 1) What am I looking at, and 2) Where did I get it from. But with our online world, and this new era of cooperation between vendors, I have to ask a 3rd question. Where did THEY get it from?

Somehow, we the users of this online data have to find a way to get our online repositories to give us clear and accurate information in the description of what we are looking at. IF I had that, I could answer the questions that your proposal would ask.

Great discussion. Thank you.

 

Russ

Submitted by ACProctor on Thu, 05/19/2016 - 02:45

I agree that determining the origin and provenance of the source information may not be possible, Russ. None of the online providers are perfect as regards indicating the source of their source, and some don't actually seem to know. If the user at least indicates the what, where and when then it is better than nothing.

However, if I were using a "citation assistant", as described here, then I would directly use the formatted citation generated by it, and would not be allowing any genealogy database program (which I don't use anyway) to override it, or modify it. I might make a small edit to it, if necessary --  we've already heard that it could not be 100% guaranteed because citations are an "art" -- but it would effectively be my final citation format.

Regarding images, there is a technical detail that I've raised before. Modern databases can incorporate images, and nearly all of them have data-types such as image/blob/etc for holding them. Whether a particular database takes advantage of this functionality, or merely constitutes an index to images held externally to it, is something that's deliberately opaque to us. Our convention is for "database" to mean the indexed textual extracts from the source, "image" to mean images of the source (either browsed or found via an index), and "database with images" to mean the combination (as with many census databases). In reality, though, those images could be held in the database -- it's just unlikely for an online database because it's easier to display an image in a web page if you have a specific filename for it (e.g. http:// ... /xyzzy.jpg).

Tony

Submitted by ACProctor on Sat, 05/21/2016 - 04:22

I was recently asked, off-list, why an interactive session is more powerful than a form-fill approach. Apologies for mentioning the dreaded "mathematics" word, but bear with me.

If an interactive session asked just 7 yes/no questions then it could address 128 different scenarios, and each new question would multiply that by another factor of 2. Comparing that with a more static approach, you would need either 128 different source-type forms to fill in, or forms that each carry a huge number of optional fields -- the choice really depends on whether the fields are inclusive or exclusive of each other.
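
The arithmetic can be checked with a few lines of Python. This is only a counting sketch: it assumes every one of the questions is asked on every path, so 2 to the power n is an upper bound on the scenarios that n yes/no questions can distinguish. The function name is invented for the example.

```python
from itertools import product


def answer_sequences(n_questions):
    """Enumerate every possible sequence of yes/no answers to n questions."""
    return list(product(("yes", "no"), repeat=n_questions))


# Each additional question doubles the number of distinguishable scenarios.
for n in (1, 2, 7, 8):
    print(n, len(answer_sequences(n)))
```

For 7 questions the count is 2**7 = 128, and an eighth question takes it to 256.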

Regarding ergonomics, those interactive sessions could use simple questions to solicit yes/no responses, or to request discrete data values. But if there are more than two options then a menu list can be used in place of multiple yes/no questions. Also, if there are multiple inclusive data values then a smaller form can be used to solicit them rather than sequential questions.

The essential element in all cases is that the session is interactive, and the system can determine more about your specific requirement than the static approach can.

Tony