EvalWEB Instructional Design
Formative Evaluation and Revisions

Introduction
This project analyzed, designed, and developed an
instructional unit to teach eighth grade students to
critically evaluate the suitability of individual world
wide web pages for middle school research use. The world
wide web enables people to easily and inexpensively
publish materials for a global audience. Since there is
usually no editing stage in online publishing, material
which is published on the internet has a higher
probability of being inaccurate, incomplete, or
misrepresentative of the subject. As access to the world
wide web increases in the schools, and as students
increase their use of it as a resource for research, they
will need to evaluate the information they find to ensure
that reliable sources are being used.
The instruction consists of an interactive world wide
web site (which is located at
http://www.hudson.edu/hms/comp/evalweb/). The web site
shows students examples of pages which are unsuitable or
of questionable value for research use, pointing out
their inadequacies. The students are asked specific
questions about other sample web sites, for which
feedback is given. The instruction includes a pre-test
and a post-test, both of which are taken online, using
example web pages. The test results are submitted
electronically to the teacher.
Methods
The main evaluative instrument for this instruction
was a pilot test. The instructional materials were tried
with four classes of eighth grade students, 78 students
in all. The data from students' pre-tests and
post-tests, along with personal observations of students
participating in the instruction, were used to evaluate the
instruction. Since my course, which is nine weeks long,
is taught to over 350 students annually, a pilot
including 78 students provided an adequate cross section
of the targeted learners.
In an attempt to obtain more qualitative data, one
student from each of the four classes was closely
observed while the student participated in the
instruction. The focus students were selected based on
their prior success in the class. One student was chosen
with an A average, two students with B averages, and one
with a C average. Three of these students were male, and
one was female. This provided data from a reasonable cross
section of the classes. To discourage them from acting
unnaturally during the instruction and testing, the
students were not told that they were being individually
observed.
During each observation, notes were taken on when the
student began each section of the instruction, any
questions the student asked, and any body language that
might indicate understanding, frustration, or success.
This data was then compared with the students'
performance on the tests.
The unit began with a brief explanation of why the
unit is important and how it fits into the class as a
whole. The pre-test was then administered, giving the
students one 42-minute period to complete it. The next
day, students took the instruction, using the interactive
web-based instructional tool. Again, they were given one
42-minute period. The third day, students
completed the post-test. Because they had great
difficulty completing the pre-test in one period, the
post-test was shortened to eliminate lack of time as a
factor. Specifically, sections four and five were
removed, meaning the students were assessed on their
evaluations of two web sites instead of four. Since most
of the students did not complete those sections of the
pre-test, those sections were not graded on it. For both
the pre-test and the post-test, students who had not
finished the test within the first 35 minutes were asked
to submit their incomplete tests. For these students,
unanswered questions were not used in the assessment,
because time should not be a factor in assessing student
performance. Following the instruction, students began a
unit in which they do independent research online
for a class presentation.
Results and Discussion
The average score on the pre-test was 60%, with a
standard deviation of 13%. The highest score was 84%,
while the lowest was 32%. On questions measuring
prerequisite skills, 71% were answered correctly. From
the questions asked during the assessment, it is clear
that some students did not know the meaning of the word
"bias," though they did understand the idea of
opinion. A few students appeared to have trouble
identifying mistakes in spelling and grammar. Several
students understood the concept of a web page's address,
but did not know the meaning of "URL," and
therefore could not identify the URL of a web page.
Most of the remaining questions were answered
correctly 20-50% of the time on the pre-test. One
question was answered correctly more often ("Does
this page indicate when it was last updated?"). The
questions about links to other sites were answered
correctly less often, possibly indicating a confusion
between links to other pages on the same site and links
to other web sites.
On the post-test, the average score was 70%, with a
standard deviation of 12%. The highest score was 93%, and
the lowest was 47%. This represents an average increase
of about 10 percentage points, with students at the lower
end improving slightly more than those at the top. While
these results do indicate an increase in student
achievement, they are at least 10% below expectations.
The following table represents student achievement by
objective cluster. Cluster A includes the assessment of a
web page's domain and the identification of personal web
pages. Cluster B includes the assessment of a web page's
content. Cluster C teaches students to assess the author
and his or her credibility. Cluster D involves the
assessment of the revision date of a web page. Cluster E
includes the assessment of a document's links. Cluster F
includes the assessment of the web page as a whole, after
the first five clusters have been completed. For more
information, see the Instructional Design.
| Cluster    | Pre-Test (% of Answers Correct) | Post-Test (% of Answers Correct) | Increase (Decrease) |
|------------|---------------------------------|----------------------------------|---------------------|
| Cluster A  | 57%                             | 82%                              | 25%                 |
| Cluster B  | 54%                             | 46%                              | (8%)                |
| Cluster C  | 69%                             | 75%                              | 6%                  |
| Cluster D  | 71%                             | 67%                              | (4%)                |
| Cluster E  | 22%                             | 42%                              | 20%                 |
| Cluster F  | 46%                             | 43%                              | (3%)                |
| Whole Test | 60%                             | 70%                              | 10%                 |
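The Increase (Decrease) column in the table above is the post-test percentage minus the pre-test percentage, in percentage points. The short Python sketch below reproduces that arithmetic from the reported figures. The names `scores`, `change_in_points`, and `format_change` are illustrative only and are not part of the EvalWEB materials.

```python
# Percent of answers correct (pre-test, post-test), copied from the table above.
scores = {
    "Cluster A": (57, 82),
    "Cluster B": (54, 46),
    "Cluster C": (69, 75),
    "Cluster D": (71, 67),
    "Cluster E": (22, 42),
    "Cluster F": (46, 43),
    "Whole Test": (60, 70),
}


def change_in_points(pre, post):
    """Return the post-test minus pre-test difference in percentage points."""
    return post - pre


def format_change(delta):
    """Format a change the way the table does: decreases in parentheses."""
    return f"({-delta}%)" if delta < 0 else f"{delta}%"


# Hypothetical helper dictionary: change per cluster, in percentage points.
changes = {name: change_in_points(pre, post) for name, (pre, post) in scores.items()}

for name, delta in changes.items():
    print(f"{name}: {format_change(delta)}")
```

Because these are differences between percentages, they are percentage-point changes rather than relative gains. Cluster E, for example, rose 20 points, but from a base of only 22%.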
In the first cluster, students appear to
have done reasonably well. The significant
increase in the percent of questions answered
correctly indicates that learning did indeed take
place there. Cluster E included significant
gains, but students still performed poorly.
Scores on specific questions within that cluster
indicate that students had trouble on both the
pre- and post-tests correctly identifying links
to other sites. When students did identify such links
correctly, their ability to draw appropriate conclusions
from that information rose significantly between the
pre-test (31% correct) and the post-test (67%
correct).
Cluster C saw modest gains in achievement.
Students increased in their ability to identify
contact information for the author, but decreased
slightly in their ability to assess the author's
expertise and the overall assessment of the
author's credibility. This may be due to
confusion created when no credential information
is available about the author. In such cases, the
author's credentials are unknown, whereas the
students may be indicating that the absence of
such information means the author is not a
credible source.
Clusters D and F both saw slight decreases in
student achievement. Cluster D involved the
recognition of time-sensitive material and
implications of a date of revision on the
reliability of a web page. Students were far
better able to identify time-sensitive
information on the post-test (87% versus 56%). On
the other hand, students declined significantly
in their ability to identify a date of revision
in a web page (92% on the pre-test compared to
74% on the post-test). Upon further review of the
tests, it was discovered that those pages on the
pre-test without revision dates fell in the later
parts of the test (which were not completed by
most students) whereas the pages without revision
dates on the post-test appeared earlier in the
test. This may indicate that the pre-test score
is artificially high, since it is easier to
identify a revision date that is there than it is
to conclude there isn't one.
Cluster F is arguably the most important of
the group. This is where the students, armed with
the assessments of the various aspects of the web
pages, finally draw a conclusion about the
suitability of the page. It could be argued that
the test questions do not adequately measure the
objective, because the students determine whether
a page is suitable for research as part of a
complete assessment of the page. Since the answer
to this question depends on the answers to the
previous questions, students who have made
mistakes on previous sections could make
decisions about the site which are incorrect but
consistent with their answers to previous
questions. For example, if a student mistakenly
states that the page's content matches the domain
name when it actually doesn't, they may conclude
that the page can be used for research with
caution when in fact it is unsuitable for
research. It's clear that a better assessment of
this cluster is needed.
Cluster B saw the worst performance, dropping
8% between the pre-test and the post-test. This
cluster includes the assessment of the content of
the page, including whether the document is
well-written, whether it contains evidence of
author bias, and whether it coincides with prior
knowledge. This drop is attributable to the
students' unexpected failure to see the humor in
an article on dihydrogen monoxide. The article
points out the dangers of the chemical,
implicating it as a cause of death in many
accidents and a major component of acid rain. It
was expected that students would suspect the
document, because it claims the chemical is
widely available, and the students have not heard
about its danger from other sources. Dihydrogen
monoxide is actually water, a point which few
students realized. This page was incorrectly
classified as one of reliable content by more
than half of the students. Since there were few
questions asked for this cluster of objectives,
this bad question was enough to skew the results.
Looking at the individual students observed
during the instruction, the average pre-test
score was 69% and the average post-test score was
75%. Of the four, however, two students declined
between the pre-test and the post-test. The first
student observed, who had an A average in the
class, was not at all interested in the
instruction. He chose to skip most of the
instructional portions, completing only the
sections where he was asked to provide answers,
and where he was given feedback. Even after it was
suggested that he read the materials, his primary
objective was simply to finish as quickly as
possible. His post-test score was 73%,
significantly lower than his class average of
94%.
The second student seemed to take the
instruction much more seriously. With a B average
in the class, this student is highly motivated by
multimedia, and has been anxiously awaiting the
class' venture onto the world wide web. He seemed
to follow intently, reading the instruction
carefully. In a few cases, he did not read the
associated example web pages as carefully, but
that didn't seem to affect his progress. His
post-test score of 93% was significantly higher
than his pre-test score of 53%, and slightly
higher than his class average of 91%.
The third student struggles in the class. Her
class average of 82% has been earned through hard
work, patience, and determination. She is not
well motivated to use computers and technology,
but she does want to get a good grade in the
class, so she tries hard. Throughout the
instruction, she read carefully, frequently
missing the answers in the sections where she's
asked to try her skills. She didn't quite finish
the instruction, but was more than halfway
through the last section (cluster F) at the end
of the 42-minute period. On the post-test, she
missed all of the questions on cluster F. Her
score of 73% was still a slight increase over her
pre-test score of 68%, and probably would have
been higher had she finished the instruction.
The fourth student is intelligent but a
behavior problem. His 90% average in the class
could be higher if he spent more time in class
on-task. He had trouble focusing his attention on
the instruction for more than a few minutes.
After fifteen minutes of instruction, he was off
task, after having completed most of the section
on evaluating content. When asked to move back to
the instruction after he started disturbing other
students, he worked much more quickly, skipping
the instruction and concentrating only on those
sections where he was asked to answer questions.
He finished the final four sections of
instruction in fifteen minutes. His score on the
post-test was 60%, which is significantly lower
than the 74% he received on the pre-test.
Looking at these four cases, it is apparent
that students who complete the instruction as it
is intended do better than those who skip the
instruction in favor of the interactive
questions. Clearly, some greater incentive is
needed for the students to complete the
instruction as intended.
Recommendations
The analysis of the data provided by the pre-
and post-test scores of the 78 participants,
along with the item analyses of the two tests and
the individual observations of four students
participating in the instruction, leads to several
recommendations for improving the instruction:
- Students had problems defining
"bias" and "URL" even
though they seemed to be familiar with
the concepts. Add some instruction
explaining the term "URL,"
because its definition should be
understood by students using the world
wide web, so they can accurately document
their sources. Replace references to
"bias" with references to
"opinion," which is a word with
which the students are more familiar.
- Better explain the difference between
links within a web site and links to
other web sites. If an author provides
links between two of his pages on his own
site, that does not add credibility to
his work. If he provides links to other
sites on the same subject, it may.
Students were generally confused between
these two cases.
- A better explanation of "author's
credentials" is needed. Since the
credentials of the author are so
infrequently included on each of his or
her pages, a deeper level of complexity
is probably appropriate. The page may
have links to other pages which contain
information about the author, for
example. An author's credibility may also
be inferred from the text of the page,
but not explicitly stated. Also, when the
author is an organization, the
credibility might be evident but not
explicitly stated, as in the case of USA
Today.
- The first web page used in the post-test
should be replaced by another page with
similar characteristics. The students'
inability to recognize the page as a
humorous one caused them to make several
mistakes on the post-test. The page could
still be used as an example within the
instruction. The page used to replace it
in the post-test should be one which is
well-written, but contains obvious
factual mistakes.
- The assessment for cluster F should give
students a web page, along with
conclusions from the first five clusters
for that page, and ask them to draw a
conclusion. The current questions assume
that the student has correctly assessed
all five categories, which may not be the
case.
- The instruction should be redesigned to
better motivate the students. The use of
grades is not sufficient, nor is the
intrinsic student motivation from wanting
to find accurate information online. The
use of shorter explanations with more
examples of web pages may help. The
example pages should be shorter and more
interesting for the students. The use of
clipart, graphics, and visual design
elements, omitted because of the time
necessary to create them, should be
included in the product.
- An error in the coding for the
assessments caused some student essays to
be erroneously truncated. The data
available indicates that this is related
to the students' use of the apostrophe in
their essays. This problem should be
investigated and fixed. The lost essays
were not used in the assessment or the
formative evaluation.
- Some students refused to read the
instruction presented, choosing instead
to skip ahead to the parts of the
instruction requiring a user response.
When feedback is given on such responses,
the instruction should determine whether
the student's responses are correct or
incorrect, and provide appropriate
feedback. As it is, only the correct
answers are displayed along with the
student answers. Students who score below
a certain level on each section should
receive remedial instruction. Only
students who receive appropriate scores
on the practice exercises should continue
with the instruction.
- The pre-test and post-test should be
shortened. Even though they will not
provide data which is as reliable, the
tests must be short enough to allow the
students to complete them in a class
period. This was done out of necessity on
the post-test, and should also be done
on the pre-test.
- The numerous typographical, spelling, and
grammatical mistakes in the instruction
should be corrected.
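The recommendation above about essays truncated at an apostrophe does not say how the assessment forms were coded, so the cause remains to be investigated. One common way this kind of truncation arises is that free text is stored between single quotes without escaping, so the first apostrophe in the essay is read as the end of the field. The Python sketch below illustrates that failure and one possible fix under that assumption. The functions `store_naive`, `store_escaped`, and `read_field` are hypothetical and are not taken from the EvalWEB code.

```python
import html


def store_naive(essay):
    """Hypothetical storage step: wrap the essay in single quotes, unescaped."""
    return f"essay='{essay}'"


def store_escaped(essay):
    """Same storage step, but escape quotes (and &, <, >) before wrapping."""
    return f"essay='{html.escape(essay, quote=True)}'"


def read_field(record):
    """Hypothetical reader: take the text between the first pair of single
    quotes, then convert any character references back to plain text."""
    start = record.index("'") + 1
    end = record.index("'", start)
    return html.unescape(record[start:end])


essay = "The page isn't reliable because the author's name is missing."

truncated = read_field(store_naive(essay))    # stops at the first apostrophe
recovered = read_field(store_escaped(essay))  # full essay survives

print(repr(truncated))
print(repr(recovered))
```

If the real cause turns out to be different, the same check still applies: submit an essay containing an apostrophe and confirm that the stored text matches what the student typed.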
Despite their poor performance on the
post-test, it is believed that most students
learned something about evaluating web sites from
this instruction. Though there are improvements
to be made, the instruction is still worthwhile.
Further improvements will make it a more valuable
tool, and hopefully one which will be used by
eighth graders for the next few years.

EvalWEB Instructional
Design -- last updated 8 January 1998 by J.
Schinker.
The finished project is located at
http://www.hudson.edu/hms/comp/evalweb.