OCR'd Text & Stop Words: Introduction
Because the content in British Online Archives is complex, and includes the OCR'd text from thousands of images, the search engine ignores certain common keywords, known as stop words.
The OCR Process
Microform Academic Publishers use optical character recognition (OCR) software to generate searchable text files from the scanned images, particularly those of printed or typed text. The content of these OCR'd text files is then added to the description field of the metadata record for each image. However, the software works best on images of clean documents. Most of the images in British Online Archives are of pages which may be, for example, discoloured carbon copies of documents typed in wartime on poor-quality paper, so errors in character recognition are frequent (e.g. 'Atlrintis' for 'Atlantis', 'commerito.:ry' for 'commentary', or 'appreci~ti~' for 'appreciation').
Note that, although obvious nonsense strings of characters are stripped out automatically to minimise the effect of these errors, many errors inevitably remain.
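The kind of automatic clean-up described above can be sketched in Python. This is only an illustration under stated assumptions: the actual filtering rules used by British Online Archives are not documented here, and the function name `strip_nonsense` and its heuristic are hypothetical. The heuristic drops any token that, once surrounding punctuation is trimmed, still contains characters other than letters, digits, apostrophes or hyphens.

```python
import re

# Hypothetical rule: a plausible word is made of letters or digits,
# optionally joined by single apostrophes or hyphens (e.g. "don't", "war-time").
WORD_RE = re.compile(r"^[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*$")

# Punctuation that may legitimately surround a word in running text.
EDGE_PUNCT = ".,;:!?()[]\"'"


def strip_nonsense(text: str) -> str:
    """Return text with tokens that fail the plausible-word test removed."""
    kept = []
    for token in text.split():
        core = token.strip(EDGE_PUNCT)  # ignore leading/trailing punctuation
        if core and WORD_RE.match(core):
            kept.append(token)          # keep the token as originally written
    return " ".join(kept)


sample = "The Atlrintis commerito.:ry was appreci~ti~ received."
print(strip_nonsense(sample))
```

Under this heuristic 'commerito.:ry' and 'appreci~ti~' would be dropped, but 'Atlrintis' would be kept because it looks like an ordinary word. This illustrates why many recognition errors survive any filter of this kind.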
Stop Word List
Any stop word entered in your search will be ignored. These stop words include conjunctions, pronouns, prepositions and other incidental words such as 'also', 'their', 'below' and 'mostly'.
If you are having trouble finding relevant items, it is worth checking the list below to make sure you are not using stop words in your search.
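To see how stop words affect a query, the following minimal Python sketch separates the terms that would be searched from those that would be ignored. It is an assumption-laden illustration, not the search engine's actual code: the function name `split_query` is hypothetical, matching is assumed to be case-insensitive, and `STOP_WORDS` holds only a small sample of the full list.

```python
# A small sample of the stop words; the full list is much longer.
STOP_WORDS = {"also", "their", "below", "mostly", "the", "of", "and", "in"}


def split_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (terms that are searched, terms that are ignored)."""
    kept, ignored = [], []
    for term in query.lower().split():  # assumed: case-insensitive matching
        if term in STOP_WORDS:
            ignored.append(term)
        else:
            kept.append(term)
    return kept, ignored


kept, ignored = split_query("The history of their trade")
print("searched:", kept)
print("ignored:", ignored)
```

In this example only 'history' and 'trade' would contribute to the search. A query made up entirely of stop words would leave nothing to search for.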
The following are all the stop words defined in British Online Archives, which are selected from the MySQL default list:
a's
about
above
after
again
ain't
all
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anywhere
apart
are
aren't
around
as
aside
at
away
be
because
been
before
behind
below
beside
besides
between
beyond
both
but
by
did
do
does
doesn't
doing
don't
done
during
each
eg
either
else
elsewhere
enough
et
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
except
for
formerly
forth
from
he
he's
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
him
himself
his
hither
how
howbeit
however
i'd
i'll
i'm
i've
ie
if
in
inasmuch
inc
indeed
insofar
instead
into
inward
is
isn't
it
it'd
it'll
it's
its
itself
latter
latterly
lest
let
let's
likely
little
ltd
many
maybe
me
meanwhile
merely
more
moreover
most
mostly
much
must
my
myself
namely
nd
nearly
neither
never
nevertheless
next
no
nobody
non
none
noone
nor
not
now
nowhere
of
off
often
oh
ok
okay
on
once
only
onto
or
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
per
perhaps
plus
que
quite
qv
rather
rd
re
really
same
shall
she
should
shouldn't
since
so
some
somebody
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sub
such
t's
th
than
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
theres
thereupon
these
they
they'd
they'll
they're
they've
this
thoroughly
those
though
through
throughout
thru
thus
to
together
too
toward
towards
truly
under
unless
until
unto
up
upon
us
very
via
viz
vs
was
wasn't
we
we'd
we'll
we're
we've
were
weren't
what
what's
whatever
when
whence
whenever
where
where's
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
who's
whoever
whom
whose
why
with
within
without
won't
would
wouldn't
yes
yet
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves