Index: openacs-4/packages/assessment/www/doc/data_collection.html
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/assessment/www/doc/data_collection.html,v
diff -u -r1.3 -r1.4
--- openacs-4/packages/assessment/www/doc/data_collection.html 29 Jul 2004 09:35:11 -0000 1.3
+++ openacs-4/packages/assessment/www/doc/data_collection.html 3 Aug 2004 23:53:03 -0000 1.4
@@ -3,191 +3,266 @@
- Data Collection
-
-
-Overview
-
-The schema for the entities that actually collect, store and retrieve
-Assesment data parallels the hierarchical structure of the Metadata Data Model. In the antecedent
-"complex survey" and "questionnaire" systems, this schema was simple
-two-level structure:
-
-
-
- - survey_responses which capture information about which
-survey was completed, by whom, when, etc
- - survey_question_responses which capture the actual user
-data in a "long skinny table" mechanism
-
-
-This suffices for one-shot surveys but doesn't support the fine
-granularity of user-action tracking, "save&resume" capabilities,
-and other requirements identified for the enhanced Assessment package.
-Consequently, we use a more extended hierarchy:
-
-
-
- - Assessment Session which captures information about
-which Assessment, which Subject, when, etc
- - Section Data which holds information about the status of
-each Section
- - Item Data which holds the actual data extracted from the
-Assessment's html forms; this is the "long skinny table"
-
-To support user modification of submitted data (of which
-"store&resume" is a special case), we base all these entities in
-the CR. In fact, we use both cr_items and cr_revisions in our schema,
-since for any given user's Assessment submission, there indeed is a
-"final" or "live" version. (In contrast, recall that for any Assessment
-itself, different authors may be using different versions of the
-Assessment. While this situation may be unusual, the fact that it must
-be supported means that the semantics of cr_items don't fit the
-Assessment itself. They do fit the semantics of a given user's
-Assessment "session" however.)
-We distinguish here between "subjects" which are users whose
-information is the primary source of the Assessment's responses, and
-"users" which are real OpenACS users who can log into the system.
-Subjects may be completing the Assessment themselves or may have
-completed some paper form that is being transcribed by staff people who
-are users. We thus account for both the "real" and one or more "proxy"
-respondents via this mechanism. To make live not too complicated we
-will assume that all subjects have a user_id in OpenACS.
-
-Note that we assume that there is only one "real"
-respondent. Only one student can take a test for a grade. Even if
-multiple clinical staff enter data about a patient, all those values
-still pertain to that single patient.
-
-Synopsis of Data-Collection Datamodel
-
-Here's the schema for this subsystem:
-
-
-
-
-
-
-
-Specific Entities
-This section addresses the attributes the most important entities
-have in the data-collection data model -- principally the various
-design issues and choices we've made. We omit here literal SQL snippets
-since that's what the web interface to CVS is for. ;-)
-
-
-
- - Assessment Sessions (as_sessions) are the top of the
-data-collection entity hierarchy. They provide the central definition
-of a given subject's performance of an Assessment. Attributes include:
+ Data Collection
+
+
+
Overview
+
+ The schema for the entities that actually collect, store and retrieve
+ Assessment data parallels the hierarchical structure of the Metadata Data Model. In the antecedent
+ "complex survey" and "questionnaire" systems, this schema was a simple
+ two-level structure:
+
+
- - session_id
- - cr::name - Identifier, format "$session_id-$last_mod_datetime"
+ - survey_responses which capture information about which
+ survey was completed, by whom, when, etc
+ - survey_question_responses which capture the actual user
+ data in a "long skinny table" mechanism
+
+
+ This suffices for one-shot surveys but doesn't support the fine
+ granularity of user-action tracking, "save&resume" capabilities,
+ and other requirements identified for the enhanced Assessment package.
+ Consequently, we use a more extended hierarchy:
+
+
+
+ - Assessment Session which captures information about
+ which Assessment, which Subject, when, etc
+ - Section Data which holds information about the status of
+ each Section
+ - Item Data which holds the actual data extracted from the
+ Assessment's html forms; this is the "long skinny table"
+
+ To support user modification of submitted data (of which
+ "save&resume" is a special case), we base all these entities on
+ the CR. In fact, we use both cr_items and cr_revisions in our schema,
+ since for any given user's Assessment submission, there indeed is a
+ "final" or "live" version. In contrast, recall that for any Assessment
+ itself, different authors may be using different versions of the
+ Assessment. While this situation may be unusual, the fact that it must
+ be supported means that the semantics of cr_items don't fit the
+ Assessment itself. They do fit the semantics of a given user's
+ Assessment "session" however.
+ We distinguish here between "subjects", the people whose
+ information is the primary source of the Assessment's responses, and
+ "users", who are real OpenACS users able to log into the system.
+ Subjects may be completing the Assessment themselves or may have
+ completed some paper form that is being transcribed by staff people who
+ are users. We thus account for both the "real" and one or more "proxy"
+ respondents via this mechanism. Note that subjects may or may not be
+ OpenACS users who can log into the system running Assessment. Thus subject_id
+ will be a foreign key to persons, not users. If the responding
+ user is completing the assessment for herself, the staff_id will be identical
+ to the subject_id. But if the user completing the assessment is doing it by proxy
+ for the "real" subject, then the staff_id will be hers while the subject_id will belong
+ to the "real" subject.
+
+
+ We've simplified this subsection of Assessment considerably from earlier versions, and
+ here is how and why:
+
+
+ - Annotations: We previously had a separate table to capture any type of
+ ad hoc explanations/descriptions/etc that a user would need to attach to a given
+ data element (either an item or section). Instead, we will use the OpenACS General Comments
+ package, which is based on the CR and thus can support multiple comments attached to a given
+ revision of a data element. The integration between Assessment and GC thus will need to
+ be at the UI level, not the data model level. Using GC will support post-test "discussions" between
+ student and teacher, for example, about individual items, sections or sessions.
+ - Scoring-grading: This has been a rather controversial area because of the
+ wide range of derived calculations/evaluations that different applications need
+ to perform on the raw submitted data. In many cases, no calculations are needed at all;
+ frequency reports ("74% of responders chose this option") suffice (see the illustrative
+ query after this list). In other cases, a given
+ item response may itself have some measure of "correctness" ("Your answer was 35% right.") or
+ a section may be the relevant scope of scoring ("You got six of ten items correct -- 60%."). At the
+ other extreme, complex scoring algorithms may be defined to include multiple scales consisting
+ of arbitrary combinations of items among different sections or even of arithmetic means
+ of already calculated scale scores.
+
+ Because of this variability, as well as the recognition that Assessment should be primarily
+ a data collection package, we've decided to abstract all scoring-grading functions to one
+ or more additional packages. A grading package
+ (evaluation)
+ is under development now by part of our group, but no documentation is yet available for it.
+ How such client packages will interface with Assessment has not yet been worked out, but this
+ is a crucial issue to work through; presumably it will involve service contracts.
+ Such a package will need to interact both with Assessment metadata (to define which items are to be
+ "scored" and how they are to be scored) and with Assessment collected data (to do the actual
+ calculations and mappings-to-grades).
+ - Signatures: The purpose of this is to provide identification and nonrepudiation during
+ data submission. An assessment should optionally be configurable to require a pass-phrase from the user
+ at the individual item level, the section level, or the session level. This pass-phrase would be used
+ to generate a hash of the data that, along with the system-generated timestamp when the data return to
+ the server, would uniquely mark the data and prevent subsequent revisions. For most simple applications
+ of Assessment, all this is overkill. But for certification exams (for instance) or for clinical data or
+ financial applications, this kind of auditing is essential.
+
+ We previously used a separate table for this, since most assessments probably won't use it (at least,
+ that is the opinion of most of the educational folks here). However, since we're generating separate
+ revisions of each of these collected data types, we decided it would be far simpler and more appropriate
+ to include the signed_data field directly in the as_item_data table. Note that for complex
+ applications, the need to "sign the entire form" or "sign the section" could be met by concatenating
+ all the items contained by the section or assessment and storing that in a "signed_data" field in as_section_data
+ or as_sessions. However, this would presumably result in duplicate hashing of the data -- once for the
+ individual items and then collectively. Instead, we'll only "sign" the data at the atomic, as_item level, and
+ procedurally sign all as_item_data at once if the assessment author requires only a section-level or assessment-level
+ signature.
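+
+ As an illustration of the simplest scoring case mentioned above -- a pure
+ frequency report that needs no grading machinery at all -- a query along the
+ following lines over as_item_data would do. This is a hedged sketch only: it
+ borrows the item_id and choice_id_answer column names from the attribute lists
+ below, and :item_id is a placeholder bind variable.
+
+     -- "74% of responders chose this option"
+     select choice_id_answer,
+            count(*) as n_responses,
+            round(100.0 * count(*)
+                  / (select count(*) from as_item_data where item_id = :item_id),
+                  1) as pct_of_responses
+       from as_item_data
+      where item_id = :item_id
+      group by choice_id_answer;
+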
- - assessment_id (note that this is actually a revision_id)
- - subject_id - references a Subjects entity that we don't
-define in this package. Should reference the users table as there is no
-concept of storing persons in OpenACS in general (yet)
- - staff_id - references Users if someone is doing the
-Assessment as a proxy for the real subject
- - target_datetime - when the subject should do the Assessment
- - creation_datetime - when the subject initiated the
-Assessment
- - first_mod_datetime - when the subject first sent something
-back in
- - last_mod_datetime - the most recent submission
- - completed_datetime - when the final submission produced a
-complete Assessment
- - session_status - Status of the session (and therefore of the
-assessment with regards to the subject)
- - ip_address - IP Address of the entry
-
- - percent_score - Current percentage of the subject achieved so
-far
- - consent_timestamp - Time when the consent has been given.
-
-
-
- - Assessment Section Data (as_section_data) tracks the
-state of each Section in the Assessment. Attributes include:
+
Synopsis of Data-Collection Datamodel
+
+ Here's the schema for this subsystem:
+
+
+
+
+
+
+
+ Specific Entities
+ This section addresses the attributes of the most important entities
+ in the data-collection data model -- principally the various
+ design issues and choices we've made. We omit here literal SQL snippets
+ since that's what the web interface to CVS is for. ;-)
+
+
- - section_data_id
- - cr::name - Identifier, format "$session_id-$last_mod_datetime"
+ - Assessment Sessions (as_sessions) are the top of the
+ data-collection entity hierarchy. They provide the central definition
+ of a given subject's performance of an Assessment. Attributes include
+ (a rough SQL sketch follows the list):
+
+ - session_id
+ - cr::name - Identifier, format "$session_id-$last_mod_datetime"
+
+ - assessment_id (note that this is actually a revision_id)
+ - subject_id - references a Subjects entity that we don't
+ define in this package. Should reference the parties table as there is no
+ concept of storing persons in OpenACS in general. Note: this cannot
+ reference users, since in many cases, subjects will not be able (or should not
+ be able) to log into the system. The users table requires email addresses.
+ Subjects in Assessment cannot be required to have email addresses. If they
+ can't be "persons" then Assessment will have to define an as_subjects
+ table for its own use.
+ - staff_id - references Users if someone is doing the
+ Assessment as a proxy for the real subject
+ - event_id - this is a foreign key to the "event" during which this assessment is
+ being performed -- e.g. "second term final" or "six-month follow-up visit" or "Q3 report".
+ - target_datetime - when the subject should do the Assessment
+ - creation_datetime - when the subject initiated the
+ Assessment
+ - first_mod_datetime - when the subject first sent something
+ back in
+ - last_mod_datetime - the most recent submission
+ - completed_datetime - when the final submission produced a
+ complete Assessment
+ - session_status - Status of the session (and therefore of the
+ assessment with regards to the subject)
+ - ip_address - IP Address of the entry
+
+ - percent_score - the percentage score the subject has achieved so
+ far
+ - consent_timestamp - Time when consent was given. Note, this is a
+ denormalization introduced for the educational application. For clinical trials apps,
+ in contrast, a complete, separate "Enrollment" package will be necessary and would
+ capture consent information. Actually, it's not clear that this belongs here even for
+ education apps, since consent happens only once for a given assessment while
+ the user may complete the assessment over multiple sessions (if save&resume is enabled,
+ for instance). In fact, I've removed this from the graffle (SK).
+
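+
+ Pulling the attributes above together, here is a rough SQL sketch of as_sessions.
+ It is illustrative only: column types are guesses, the CR integration (keying the
+ table off cr_revisions) is simplified, and the open subject_id question (persons vs.
+ parties vs. a package-private as_subjects table) is flagged in a comment rather than
+ resolved.
+
+     create table as_sessions (
+         session_id          integer
+                             constraint as_sessions_session_id_fk
+                             references cr_revisions(revision_id)
+                             constraint as_sessions_session_id_pk primary key,
+         assessment_id       integer,       -- actually a revision_id of the Assessment
+         subject_id          integer,       -- persons/parties/as_subjects: still open, see above
+         staff_id            integer
+                             constraint as_sessions_staff_id_fk
+                             references users(user_id),
+         event_id            integer,       -- the "event" this session belongs to
+         target_datetime     timestamptz,
+         creation_datetime   timestamptz,
+         first_mod_datetime  timestamptz,
+         last_mod_datetime   timestamptz,
+         completed_datetime  timestamptz,
+         session_status      varchar(50),
+         ip_address          varchar(50),
+         percent_score       numeric
+     );
+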
+
+
- - session_id
- - section_id
- - section_status
+ - Assessment Section Data (as_section_data) tracks the
+ state of each Section in the Assessment. Attributes include
+ (again, a sketch follows the list):
+
+ - section_data_id
+ - cr::name - Identifier, format "$session_id-$last_mod_datetime"
+
+ - session_id
+ - section_id
+ - subject_id
+ - staff_id
+ - section_status
+
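+
+ Analogously, a minimal sketch of as_section_data; names and types are
+ illustrative, not final DDL.
+
+     create table as_section_data (
+         section_data_id  integer
+                          constraint as_section_data_id_fk
+                          references cr_revisions(revision_id)
+                          constraint as_section_data_id_pk primary key,
+         session_id       integer
+                          constraint as_section_data_session_fk
+                          references as_sessions(session_id),
+         section_id       integer,        -- the Section (metadata) this row tracks
+         subject_id       integer,        -- same open question as in as_sessions
+         staff_id         integer,
+         section_status   varchar(50)
+     );
+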
+
+
+ - Assessment Item Data (as_item_data) is the heart
+ of the data collection piece. This is the "long skinny table" where all
+ the primary data go -- everything other than "scale" data, i.e. calculated
+ scoring results derived from these primary responses from subjects.
+ Attributes include (a sketch follows the list):
+
+ - item_data_id
+ - session_id
+ - cr::name - identifier in the format "$item_id-$subject_id"
+ - event_id - this is a foreign key to the "event" during which this assessment is
+ being performed -- e.g. "second term final" or "six-month follow-up visit" or "Q3 report". Note:
+ adding this here is a denormalization justified by the fact that lots of queries will depend
+ on this key, and not having to join against as_sessions will be a Very Good Thing, especially
+ when a given data submission occurs through multiple sessions (the save&resume situation).
+ - subject_id
+ - staff_id
+ - item_id
+ - Possible Extension (nope, this is a definite, IMHO -- SK): item_status - Status of the answer.
+ This might be "unanswered, delayed, answered, final". It can be paired with
+ is_unknown_p - defaults to "f" - which is important to clearly
+ distinguish an Item value that is unanswered from a value that means
+ "We've looked for this answer and it doesn't exist" or "I don't know
+ the answer to this". Put another way, if none of the other "value"
+ attributes in this table have values, did the subject just decline to
+ answer it? Or is the "answer" actually this: "there is no answer"? This
+ attribute toggles that clearly when set to "t".
+ - choice_id_answer - references as_item_choices
+ - boolean_answer
+ - numeric_answer
+ - integer_answer
+ - text_answer -- presumably can store both varchar and text datatypes -- or do we want to separate
+ these as we previously did?
+ - timestamp_answer
+ - content_answer - references cr_revisions
+ - signed_data - This field stores the signed form of the entered data; see above and
+ below for explanation
+ - percent_score
+
+ - To do: figure out how attachment answers should be supported; the Attachment
+ package is still in need of considerable help. Can we rely on it here?
+
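+
+ A corresponding rough sketch of as_item_data, the "long skinny table" itself.
+ As above, this is an illustration under assumptions, not the actual DDL: types are
+ guesses, the item_status/is_unknown_p extension is included since the list above
+ treats it as definite, and attachment answers are left out pending the To-do above.
+
+     create table as_item_data (
+         item_data_id      integer
+                           constraint as_item_data_id_fk
+                           references cr_revisions(revision_id)
+                           constraint as_item_data_id_pk primary key,
+         session_id        integer
+                           constraint as_item_data_session_fk
+                           references as_sessions(session_id),
+         event_id          integer,       -- denormalized copy of the session's event
+         subject_id        integer,
+         staff_id          integer,
+         item_id           integer,       -- the Item (metadata) being answered
+         item_status       varchar(50),   -- e.g. unanswered, delayed, answered, final
+         is_unknown_p      char(1) default 'f'
+                           constraint as_item_data_unknown_ck
+                           check (is_unknown_p in ('t','f')),
+         choice_id_answer  integer,       -- references as_item_choices
+         boolean_answer    char(1),
+         numeric_answer    numeric,
+         integer_answer    integer,
+         text_answer       text,
+         timestamp_answer  timestamptz,
+         content_answer    integer,       -- references cr_revisions
+         signed_data       text,
+         percent_score     numeric
+     );
+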
+
+
+ - Assessment Scales: As discussed above, this will for the time being be handled by
+ external grading-scoring-evaluation packages. Assessment will only work
+ with percentages internally. It might be necessary to add scales into
+ Assessment as well, but we will revisit that when the time arrives; we expect that
+ a more elegant (and more appropriate, given the OpenACS toolkit design) approach will be to define
+ service contracts to interface with these packages.
+
+
+ - Assessment Annotations provides a
+ flexible way to handle the variety of ways in which we need to be able to
+ "mark up" an Assessment. Subjects may modify a response they've already
+ made and need to provide a reason for making that change. Teachers may
+ want to attach a reply to a student's answer to a specific Item or make
+ a global comment about the entire Assessment. This will be achieved by
+ using the General Comments system of OpenACS.
-
-
- - Assessment Item Data (as_item_data) is the heart
-of the data collection piece. This is the "long skinny table" where all
-the primary data go -- everything other than "scale" data ie calculated
-scoring results derived from these primary responses from subjects.
-Attributes include:
- - item_data_id
- - session_id
- - cr::name - identifier in the format "$item_id-$subject_id"
- - subject_id
- - staff_id
- - item_id
- - choice_id_answer - references as_item_choices
- - boolean_answer
- - numeric_answer
- - integer_answer
- - text_answer
- - timestamp_answer
- - content_answer - references cr_revisions
- - signed_data - This field stores the signed entered data, see
-below for explanation
- - percent_score
+ - Signing of content makes it possible
+ to verify that submitted data actually come from the person they
+ purport to come from. This assumes a public-key environment where the
+ public key is stored along with the user information (e.g. with the
+ users table) and the data stored in as_item_data are additionally stored
+ in a signed version (signed with the secret key of the user). To verify
+ the data in as_item_data, the system only has to
+ check the signed_data against the public key and see whether it matches the
+ data.
- - Possible Extension: item_status - Status of the answer. This
-might be
-"unanswered, delayed, answered, final". This can be put together with is_unknown_p - defaults to "f" -
-important to clearly
-distinguish an Item value that is unanswered from a value that means
-"We've looked for this answer and it doesn't exist" or "I don't know
-the answer to this". Put another way, if none of the other "value"
-attributes in this table have values, did the subject just decline to
-answer it? Or is the "answer" actually this: "there is no answer". This
-attribute toggles that clearly when set to "t".
-
-
- - Assessment Scales will for the time being be handled by
-an external grading / evaluation package. Assessment will only work
-with percentages internally. It might be necessary to add scales into
-assessment as well, but we will think about this once the time arrives.
-
-
- - Assessment Annotations provides a
-flexible way to handle a variety of ways that we need to be able to
-"mark up" an Assessment. Subjects may modify a response they've already
-made and need to provide a reason for making that change. Teachers may
-want to attach a reply to a student's answer to a specific Item or make
-a global comment about the entire Assessment. This will be achieved by
-using the General Comments System of OpenACS
-
-
- - Signing of content allows
-to verify that the data submitted is actually from the person it is
-pretended to be from. This assumes an public key environment where the
-public key is stored along with the user information (e.g. with the
-users table) and the data stored in as_item_data is additionally stored
-in a signed version (signed with the secret key of the user). To verify
-if the data in as_item_data is actually verified the system only has to
-check the signed_data with the public key and see if it matches the
-data.
-
-
-
-
+
+