EvalWEB Instructional Design

Formative Evaluation and Revisions

Introduction

This project analyzed, designed, and developed an instructional unit to teach eighth grade students to critically evaluate the suitability of individual world wide web pages for middle school research use. The world wide web enables people to easily and inexpensively publish materials for a global audience. Since there is usually no editing stage in online publishing, material which is published on the internet has a higher probability of being inaccurate, incomplete, or misrepresentative of the subject. As access to the world wide web increases in the schools, and as students increase their use of it as a resource for research, they will need to evaluate the information they find to ensure that reliable sources are being used.

The instruction consists of an interactive world wide web site (which is located at http://www.hudson.edu/hms/comp/evalweb/). The web site shows students examples of pages which are unsuitable or of questionable value for research use, pointing out their inadequacies. The students are asked specific questions about other sample web sites, for which feedback is given. The instruction includes a pre-test and a post-test, both of which are taken online, using example web pages. The test results are submitted electronically to the teacher.

Methods

The main evaluative instrument for this instruction was a pilot test. The instructional materials were tried on four classes of eighth grade students, 78 students in all. The data from students' pre-tests and post-tests, along with personal observations of students participating in the instruction, were used to evaluate the instruction. Since my course, which is nine weeks long, is taught to over 350 students annually, a pilot including 78 students provided an adequate cross section of the targeted learners.

In an attempt to obtain more qualitative data, one student from each of the four classes was closely observed while participating in the instruction. The focus students were selected based on their prior success in the class. One student was chosen with an A average, two students with B averages, and one with a C average. Three of these students are male, and one is female. This provides data from a reasonable cross section of the classes. So that they would not act unnaturally during the instruction and testing, the students were not told that they were being individually observed.

During each observation, notes were taken on when the student began each section of the instruction, any questions the student asked, and any body language that might indicate understanding, frustration, or success. This data was then compared with the students' performance on the tests.

The unit began with a brief explanation of why the unit is important and how it fits into the class as a whole. The pre-test was then administered, giving the students one 42-minute period to complete it. The next day, students took the instruction, using the interactive web-based instructional tool. Again, they were given one 42-minute period. The third day, students completed the post-test. Because they had great difficulty completing the pre-test in one period, the post-test was shortened to eliminate lack of time as a factor. Specifically, sections four and five were removed, meaning the students were assessed on their evaluations of two web sites instead of four. Since most of the students did not complete those sections of the pre-test, those sections were not graded on it. For both the pre-test and the post-test, students who had not finished the test within the first 35 minutes were asked to submit their incomplete tests. For these students, unanswered questions were not used in the assessment, because time should not be a factor in assessing student performance. Following the instruction, students began a unit in which they had to do independent research online for a class presentation.
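The rule for incomplete tests can be illustrated with a short sketch. This is not the code used by the online tests; the function name and the representation of a response as True, False, or None are assumptions made only for illustration. It shows one way to exclude unanswered questions so that they neither raise nor lower a student's score.

```python
from typing import Optional, Sequence


def percent_correct(responses: Sequence[Optional[bool]]) -> Optional[float]:
    """Return the percentage of answered questions that were correct.

    Each entry is True (correct), False (incorrect), or None (unanswered).
    Unanswered questions are dropped from both the numerator and the
    denominator, so running out of time does not affect the score.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None  # nothing was answered, so there is nothing to grade
    return 100.0 * sum(answered) / len(answered)


# Hypothetical student: three questions answered, two left blank.
score = percent_correct([True, False, True, None, None])
print(f"{score:.1f}% of answered questions correct")
```

Under this rule, a student who answers only a few questions is graded on those questions alone, which is why the shortened post-test and the full-length pre-test can still be compared.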

Results and Discussion

The average score on the pre-test was 60%, with a standard deviation of 13%. The highest score was 84%, while the lowest was 32%. On questions measuring prerequisite skills, 71% were answered correctly. From the questions asked during the assessment, it is clear that some students did not know the meaning of the word "bias" though they did understand the idea of opinion. A few students appeared to have trouble identifying mistakes in spelling and grammar. Several students understood the concept of a web page's address, but did not know the meaning of "URL," and therefore could not identify the URL of a web page.

Most of the remaining questions were answered correctly 20-50% of the time on the pre-test. One question was answered correctly more often ("Does this page indicate when it was last updated?"). The questions about links to other sites were answered correctly less often, possibly indicating a confusion between links to other pages on the same site and links to other web sites.

On the post-test, the average score was 70%, with a standard deviation of 12%. The highest score was 93%, and the lowest was 47%. This represents an increase of about 10 percentage points for the students, with those at the lower end improving slightly more than those at the top. While these results do indicate an increase in student achievement, they are at least 10 percentage points below expectations.

The following table represents student achievement by objective cluster. Cluster A includes the assessment of a web page's domain and the identification of personal web pages. Cluster B includes the assessment of a web page's content. Cluster C teaches students to assess the author and his or her credibility. Cluster D involves the assessment of the revision date of a web page. Cluster E includes the assessment of a document's links. Cluster F includes the assessment of the web page as a whole, after the first five clusters have been completed. For more information, see the Instructional Design.

Cluster       % of Answers Correct     Increase
              Pre-Test    Post-Test    (Decrease)
Cluster A     57%         82%          25%
Cluster B     54%         46%          (8%)
Cluster C     69%         75%          6%
Cluster D     71%         67%          (4%)
Cluster E     22%         42%          20%
Cluster F     46%         43%          (3%)
Whole Test    60%         70%          10%
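The last column of the table is simply the post-test percentage minus the pre-test percentage, in percentage points, with decreases shown in parentheses. The sketch below recomputes that column from the figures in the table; the variable and function names are chosen here for illustration and are not part of the instruction.

```python
# Percent of answers correct (pre-test, post-test), copied from the table above.
scores = {
    "Cluster A": (57, 82),
    "Cluster B": (54, 46),
    "Cluster C": (69, 75),
    "Cluster D": (71, 67),
    "Cluster E": (22, 42),
    "Cluster F": (46, 43),
    "Whole Test": (60, 70),
}


def format_change(pre: int, post: int) -> str:
    """Format post minus pre, putting decreases in parentheses as the table does."""
    diff = post - pre
    return f"({-diff}%)" if diff < 0 else f"{diff}%"


changes = {name: format_change(pre, post) for name, (pre, post) in scores.items()}

for name, change in changes.items():
    print(f"{name:<12}{change:>6}")
```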

In the first cluster, students appear to have done reasonably well. The significant increase in the percent of questions answered correctly indicates that learning did indeed take place there. Cluster E included significant gains, but students still performed poorly. Scores on specific questions within that cluster indicate that students had trouble on both the pre- and post-tests correctly identifying links to other sites. When the links were correctly identified, students' ability to draw appropriate conclusions based on that information rose significantly between the pre-test (31% correct) and the post-test (67% correct).

Cluster C saw modest gains in achievement. Students increased in their ability to identify contact information for the author, but decreased slightly in their ability to assess the author's expertise and the overall assessment of the author's credibility. This may be due to confusion created when no credential information is available about the author. In such cases, the author's credentials are unknown, whereas the students may be indicating that the absence of such information means the author is not a credible source.

Clusters D and F both saw slight decreases in student achievement. Cluster D involved the recognition of time-sensitive material and the implications of a date of revision for the reliability of a web page. Students were far better able to identify time-sensitive information on the post-test (87% versus 56%). On the other hand, students declined significantly in their ability to identify a date of revision in a web page (92% on the pre-test compared to 74% on the post-test). Upon further review of the tests, it was discovered that those pages on the pre-test without revision dates fell in the later parts of the test (which were not completed by most students) whereas the pages without revision dates on the post-test appeared earlier in the test. This may indicate that the pre-test score is artificially high, since it is easier to identify a revision date that is there than it is to conclude there isn't one.

Cluster F is arguably the most important of the group. This is where the students, armed with the assessments of the various aspects of the web pages, finally draw a conclusion about the suitability of the page. It could be argued that the test questions do not adequately measure the objective, because the students determine whether a page is suitable for research as part of a complete assessment of the page. Since the answer to this question depends on the answers to the previous questions, students who have made mistakes on previous sections could make decisions about the site which are incorrect but consistent with their answers to previous questions. For example, if a student mistakenly states that the page's content matches the domain name when it actually doesn't, they may conclude that the page can be used for research with caution when in fact it is unsuitable for research. It's clear that a better assessment of this cluster is needed.

Cluster B saw the worst performance, dropping 8% between the pre-test and the post-test. This cluster includes the assessment of the content of the page, including whether the document is well-written, whether it contains evidence of author bias, and whether it coincides with prior knowledge. This drop is attributable to the students' unexpected failure to see the humor in an article on Dihydrogen monoxide. The article points out the dangers of the chemical, implicating it as a cause of death in many accidents and a major component of acid rain. It was expected that students would suspect the document, because it claims the chemical is widely available, and the students have not heard about its danger from other sources. Dihydrogen monoxide is actually water, a point which few students realized. This page was incorrectly classified as one of reliable content by more than half of the students. Since there were few questions asked for this cluster of objectives, this bad question was enough to skew the results.

Looking at the individual students observed during the instruction, the average pre-test score was 69% and the average post-test score was 75%. Of the four, however, two students declined between the pre-test and the post-test. The first student observed, who had an A average in the class, was not at all interested in the instruction. He chose to skip most of the instructional portions, completing only the sections where he was asked to provide answers, and where he was given feedback. Even after it was suggested that he read the materials, his primary objective was simply to finish as quickly as possible. His post-test score was 73%, significantly lower than his class average of 94%.

The second student seemed to take the instruction much more seriously. With a B average in the class, this student is highly motivated by multimedia, and has been anxiously awaiting the class' venture onto the world wide web. He seemed to follow intently, reading the instruction carefully. In a few cases, he did not read the associated example web pages as carefully, but that didn't seem to affect his progress. His post-test score of 93% was significantly higher than his pre-test score of 53%, and slightly higher than his class average of 91%.

The third student struggles in the class. Her class average of 82% has been earned through hard work, patience, and determination. She is not well motivated to use computers and technology, but she does want to get a good grade in the class, so she tries hard. Throughout the instruction, she read carefully, but frequently missed the answers in the sections where she was asked to try her skills. She didn't quite finish the instruction, but was more than halfway through the last section (cluster F) at the end of the 42-minute period. On the post-test, she missed all of the questions on cluster F. Her score of 73% was still a slight increase over her pre-test score of 68%, and probably would have been higher had she finished the instruction.

The fourth student is intelligent but has behavior problems. His 90% average in the class could be higher if he spent more time in class on-task. He had trouble focusing his attention on the instruction for more than a few minutes. After fifteen minutes of instruction, he was off task, having completed most of the section on evaluating content. When asked to move back to the instruction after he started disturbing other students, he worked much more quickly, skipping the instruction and concentrating only on those sections where he was asked to answer questions. He finished the final four sections of instruction in fifteen minutes. His score on the post-test was 60%, which is significantly lower than the 74% he received on the pre-test.

Looking at these four cases, it is apparent that students who complete the instruction as it is intended do better than those who skip the instruction in favor of the interactive questions. Clearly, some greater incentive is needed for the students to complete the instruction as intended.

Recommendations

The analysis of the data provided by the pre- and post-test scores of the 78 participants, along with the item analyses of the two tests and the individual observations of four students participating in the instruction, leads to several recommendations for improving the instruction:

  1. Students had problems defining "bias" and "URL" even though they seemed to be familiar with the concepts. Add some instruction explaining the term "URL," because its definition should be understood by students using the world wide web, so they can accurately document their sources. Replace references to "bias" with references to "opinion," which is a word with which the students are more familiar.
  2. Better explain the difference between links within a web site and links to other web sites. If an author provides links between two of his pages on his own site, that does not add credibility to his work. If he provides links to other sites on the same subject, it may. Students were generally confused between these two cases.
  3. A better explanation of "author's credentials" is needed. Since the credentials of the author are so infrequently included on each of his or her pages, a deeper level of complexity is probably appropriate. The page may have links to other pages which contain information about the author, for example. An author's credibility may also be inferred from the text of the page, but not explicitly stated. Also, when the author is an organization, the credibility might be evident but not explicitly stated, as in the case of USA Today.
  4. The first web page used in the post-test should be replaced by another page with similar characteristics. The students' inability to recognize the page as a humorous one caused them to make several mistakes on the post-test. The page could still be used as an example within the instruction. The page used to replace it in the post-test should be one which is well-written, but contains obvious factual mistakes.
  5. The assessment for cluster F should give students a web page, along with conclusions from the first five clusters for that page, and ask them to draw a conclusion. The current questions assume that the student has correctly assessed all five categories, which may not be the case.
  6. The instruction should be redesigned to better motivate the students. The use of grades is not sufficient, nor is the intrinsic student motivation from wanting to find accurate information online. The use of shorter explanations with more examples of web pages may help. The example pages should be shorter and more interesting for the students. The use of clipart, graphics, and visual design elements, omitted because of the time necessary to create them, should be included in the product.
  7. An error in the coding for the assessments caused some student essays to be erroneously truncated. The data available indicates that this is related to the students' use of the apostrophe in their essays. This problem should be investigated and fixed. The lost essays were not used in the assessment or the formative evaluation.
  8. Some students refused to read the instruction presented, choosing instead to skip ahead to the parts of the instruction requiring a user response. When feedback is given on such responses, the instruction should determine whether the student's responses are correct or incorrect, and provide appropriate feedback. As it is, only the correct answers are displayed along with the student answers. Students who score below a certain level on each section should receive remedial instruction. Only students who receive appropriate scores on the practice exercises should continue with the instruction.
  9. The pre-test and post-test should be shortened. Even though they will not provide data which is as reliable, the tests must be short enough to allow the students to complete them in a class period. This was done out of necessity on the post-test, and should also be done on the pre-test.
  10. The numerous typographical, spelling, and grammatical mistakes in the instruction should be corrected.
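Recommendation 8 can be sketched in code. The following is only an illustration of the idea, not the instruction's actual implementation: the names, the 70% threshold, and the simple text comparison are all assumptions, since the report does not specify a passing level or how answers are stored. The sketch marks each practice response as correct or incorrect, instead of merely displaying the correct answer beside the student's answer, and then decides whether the student continues or receives remedial instruction.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical passing level; the report does not specify one.
PASSING_FRACTION = 0.7


@dataclass
class ItemFeedback:
    question: str
    student_answer: str
    correct_answer: str
    is_correct: bool


def grade_section(
    questions: Sequence[str],
    student_answers: Sequence[str],
    answer_key: Sequence[str],
) -> List[ItemFeedback]:
    """Compare each response with the key and record whether it was correct.

    Answers are compared as trimmed, case-insensitive text, which only suits
    short-answer or multiple-choice items (an assumption of this sketch).
    """
    feedback = []
    for question, given, expected in zip(questions, student_answers, answer_key):
        is_correct = given.strip().lower() == expected.strip().lower()
        feedback.append(ItemFeedback(question, given, expected, is_correct))
    return feedback


def next_step(feedback: Sequence[ItemFeedback]) -> str:
    """Return "continue" if the section was passed, otherwise "remediate"."""
    if not feedback:
        return "remediate"  # no responses given, so the section was not passed
    fraction = sum(item.is_correct for item in feedback) / len(feedback)
    return "continue" if fraction >= PASSING_FRACTION else "remediate"


# Hypothetical practice section with two questions.
section = grade_section(
    ["Does this page indicate when it was last updated?", "Is an author named?"],
    ["yes", "no"],
    ["Yes", "Yes"],
)
for item in section:
    print(item.question, "->", "correct" if item.is_correct else "incorrect")
print(next_step(section))
```

A gate like this would also address the pattern seen in the first and fourth observed students, who skipped the explanations and went straight to the questions.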

Despite their poor performance on the post-test, it is believed that most students learned something about evaluating web sites from this instruction. Though there are improvements to be made, the instruction is still worthwhile. Further improvements will make it a more valuable tool, and hopefully one which will be used by eighth graders for the next few years.


EvalWEB Instructional Design -- last updated 8 January 1998 by J. Schinker.
The finished project is located at http://www.hudson.edu/hms/comp/evalweb.

© 2001-2005 Albert L. Ingram, Ph.D.