The records and summaries were presented as paper documents, and the questions on a computer. The participants were not told that the summaries were automatically generated. Each session started with a ‘dummy’ practice question to allow the participant to become familiar with the question interface. Questions were presented one at a time on the computer screen and consisted of two parts presented on consecutive screens: a free-text box in which participants could write their answers, followed by a multiple-choice set of answers from which they had to choose one. They were able to proceed to the next question
or question-part by clicking on a ‘Next’ button that appeared on the screen. They were told that it was important to perform this action immediately on answering the first part of each question, as their responses were being timed, and that they should select the same answer in the second (multiple-choice) part or, if it was not one of the given options, select “None of the above”. They were not allowed to return to the first part of any question to change their original answers. They could, if they wished, take a break between questions by clicking on an on-screen ‘Pause’ button. At the end of the experiment, we asked
the participating clinicians to complete a questionnaire aimed at capturing their general impressions of the utility of the generated summaries. When this was completed, we told them that the summaries were computer-generated by an AI-based natural language generation system whose input was the facts presented in the hospital records. They all expressed surprise (and in some cases, bewilderment) that the summaries were not written by a human author. We report here our findings with regard to the effect of the generated summaries (compared to the collection of documents that comprise the hospital records) on the accuracy of the assessments that the clinicians made of the histories of the individual patients and on the efficiency of the clinicians in making their assessments. The results show that clinicians are slightly better at answering the set
of key questions when using the automatically generated record summaries than when using the (traditional) full records. They provide the correct answers 80% of the time when using the summaries, and only 75% of the time when using the full records (see Table 3). However, this difference is not significant (see Table 4). In other words, the use of generated summaries did not degrade the clinicians’ performance, even though record summaries are an entirely unfamiliar tool to them. Interestingly, there was no effect of level of experience (i.e., doctors vs students). The results also show that use of the summaries significantly reduced the time taken to respond to the set of questions for each patient. Overall, using the summaries cut the time taken to answer all the questions by just over 50% compared with using the records (see Table 5).