BHL Survey 2010, evaluation of the results

15 March - 03 May 2010


Main tasks addressed in this survey

1 - Not to lose our current user groups (for which it is necessary to understand who they are)

2 - To improve our service so that our current user groups can use BHL to a greater extent and more comfortably than at present

3 - To attract new user groups (for which it is necessary to understand what the users need among the things we could provide)

Total number of participants: 1113 (successful answers for Question 16; the other questions had fewer)


Question 1: use frequencies

Question 1: How often do you use the Biodiversity Heritage Library BHL http://www.biodiversitylibrary.org

 

Most participants of the survey used BHL less than twice a week. This group accounts for approximately 20 % of the page requests at BHL.

1/3 of the participants were frequent users, the others occasional users or had not used BHL before. The "frequent users" group (A and B) constituted by far the most important section among the participants in terms of usage.

There were only minor differences between participants before and after 06 April (the survey was open 15 March to 03 May). We tested this to rule out the possibility that after a few weeks responses would be restricted to occasional users.

This question was important for the evaluations of several questions of the survey.
Frequent users have more experience with BHL functions, so their voice should be considered more important when questions are asked about the current quality of BHL.
Occasional users could eventually convert into more frequent users, if BHL is improved, so their special needs and desires must not be neglected. The answers of this group are important when questions about new ideas for new functionalities are asked. See also Q2: occasional users do not work often with digitized literature, so improving the service has limitations. 

Total number: 1026
Frequent users total: 339
Participants after 06 April: 303


Method of evaluation of the following questions:

Credit points in these questions were calculated with the following method:
Answers were always given on a 5-point scale (total agreement to total disagreement), with a neutral option (no opinion or "I don't understand") which was not considered in the evaluation.
I calculated the percentage shares of scale points 1-5 (all five together totalling 100), then multiplied the share by 4 for scale point 1 (total agreement), by 3 for point 2 (moderate agreement), by 2 for point 3 (middle option), by 1 for point 4 (moderate disagreement) and by 0 for point 5 (total disagreement), summed the products, and finally divided the sum by 2.
This yielded a maximum value of 200 credit points for total agreement by all participants, and a minimum value of 0 credit points for total disagreement by all participants. Values around 150 corresponded to relatively strong agreement, values around 100 to the middle option, and values around 50 to relatively strong disagreement. Green coloured bars reflect more or less strong levels of agreement, orange and yellow bars reflect more or less strong levels of disagreement.

Example:
Question 2, bullet point 2D "for researching biology":
Results:
81 "almost always" + 194 "often" + 253 "sometimes" + 204 "rarely" + 230 "never" (+ 5 "I don't know", not considered) (= total 967 answers for this bullet point)
% Proportions: 8.4 + 20.2 + 26.3 + 21.2 + 23.9 = 100 %
Credit points: ((8.4 x 4) + (20.2 x 3) + (26.3 x 2) + 21.2) / 2 = 84.0
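
The calculation described above can be sketched in Python as follows. This is only an illustration of the stated method, not the tool actually used for the evaluation; the function name credit_points and the list layout of the counts are assumptions made for this sketch. The input is the number of answers for scale points 1-5, with the neutral option already excluded.

```python
def credit_points(counts):
    """Credit points for one bullet point.

    counts: answers for scale points 1..5 (total agreement first,
    total disagreement last); the neutral option is left out.
    """
    if len(counts) != 5:
        raise ValueError("expected counts for exactly five scale points")
    total = sum(counts)
    if total == 0:
        raise ValueError("no valid answers")
    weights = (4, 3, 2, 1, 0)
    # Percentage share of each scale point (the five shares sum to 100).
    shares = [100.0 * c / total for c in counts]
    # Weighted sum of the shares, divided by 2, gives a 0..200 scale.
    return sum(w * s for w, s in zip(weights, shares)) / 2


# Bullet point 2D from the example above (5 "I don't know" answers excluded).
q2d = [81, 194, 253, 204, 230]
print(round(credit_points(q2d), 1))  # 84.0
```

Working from the raw counts avoids rounding the percentage shares before weighting them; for bullet point 2D both routes give 84.0.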


Question 2: use functions

Question 2: How often do you use the Biodiversity Heritage Library for the following?

[Scale: almost always, often, sometimes, rarely, never, I don't know/understand]

Results Question 2:
The highest value was obtained for the control question "work with digitized literature" (177 credits in frequent users).

BHL is mostly used for verifying nomenclatural questions (141 credits in frequent users, a very high value), less frequently for researching the publication history of species and for verifying the correct citations of literature sources. BHL is substantially less frequently used to find illustrations, and even less for research on the biology behind the taxonomic name.

Many users download files at high resolution (124 credits), some provide links to BHL sources (for which they need stable URLs) or use BHL for data mining. The bullet point "fulfil a library user's request" obtained the lowest ranking.

These answers were more or less expected.
The biology behind a name of a species is significantly less frequently requested than the names and acts as such. Information on the biology is substantially more quickly outdated, and for looking for information on the biology it is necessary to consult more recent literature. Indirectly this means that BHL should provide more literature after 1920.
Finding illustrations is not as highly requested as text (86 credits by all users), but must not be neglected. The lower rating might have to do with plate numbers not being given in the page level metadata - this makes it difficult even for specialists to find illustrations. Even if you know on which plate the animal is figured, it will take you a long time to find it. General interest readers (28 persons) were significantly more interested in finding illustrations than average (132 credit points).

Bullet points H (stable URLs), I (data mining) and J (library requests) addressed particular interest groups, who responded more positively to the points than average (the "all users" value for database providers in bullet point H was 64). The low rates of affirmation do not mean that the service was not requested. They mean that the service is only requested by particular target user groups.

Occasional users would not do different things with BHL, but just less frequently in every single bullet point.

Total number (bullet point A): 1026
Frequent users total (bullet point A): 366


Question 3: levels of satisfaction

Question 3: We would like to know how satisfied you are with various features of the Biodiversity Heritage Library.

[Scale: strongly agree, agree, neither agree or disagree, disagree, strongly disagree, I don't know/understand]

Users were moderately content with all mentioned features; differences between items were small.
Frequent users were generally slightly more satisfied with the functions.
The highest levels of satisfaction were recorded for the PDF downloading function (entire book) and - surprisingly - for the scan quality. Slightly lower but still very high levels of satisfaction were registered for the search function (bullet points A and B yielded exactly the same values, the only difference was that A obtained slightly higher values for "I don't understand"). The levels of agreement with OCR download functions, high resolution image downloads and the taxon finder function ranked lower. Support was lower still for the create-my-own-PDF function and the data mining access.
The online viewer had by far the lowest level of agreement.

This question had the highest ratings of answers "I don't understand/I don't know". This had partly to do with "first" users not knowing much about BHL. Frequent users knew substantially better what was asked here.
Ratings in % for "I don't understand/I don't know" for bullet points A-J:
All users: 7, 6, 15, 6, 6, 11, 23, 14, 23, 33
Frequent users: 0, 0, 7, 0, 0, 5, 18, 7, 16, 26

Scan quality. Scan quality was rated as fine or good. Surprised by this result, we asked the community why they had not selected lower ratings of satisfaction for current BHL functions. Being specialists ourselves, we know that the scan quality of current BHL libraries (Smithsonian, Harvard, Missouri, Natural History Museum London) is far below that of other providers outside the current BHL group, and that the image quality of the plate figures by those providers is certainly not sufficient if one really needs to work with these figures.

We obtained feedback that many people were so thankful that BHL provides this free service at all that the idea of complaining about quality simply did not arise. When users were asked more directly about the scan quality, several experienced users affirmed that several things in the scanning process could indeed be improved. We will see if ratings will increase next time.

Conclusions

This question was important as a baseline for comparison with answers collected after the BHL web presentation has been improved. These are areas where improvements are under way.

Ease of reading a book in the online viewer had the lowest rating. The online viewer would need improvement most urgently.

 

Total number (bullet point A): 1023
Frequent users total (bullet point A): 373


Question 4: PDF download reasons

Question 4: In this question we would like to understand why you need to download a PDF file.

[Scale: almost always, often, sometimes, rarely, never, I don't know/understand]

PDFs are downloaded for a variety of reasons.
Highest rankings were recorded for shortcomings in the online viewer and the BHL search function. Computer offline was also important. All other reasons were less important, but none was of minor importance.
Frequent users had slightly higher concerns about long lasting free access to BHL contents, which corresponds to higher rates of persons who consulted BHL contents without downloading PDF files. Occasional users needed PDFs for printing more frequently than frequent users, and they tended to find books again even more rapidly on their own hard disk than frequent users.

4E was unique in this block in that 25 % did not understand this question. This also applied to the German survey (25 % there as well), where the term "searchable" was worded even more precisely, so that misunderstandings should have been excluded. The results suggest that searching within the full text of a digitized PDF file is not done by many participants.

4C: Skeptical users (fearing that free access may not be long lasting) were unevenly distributed among participants (average 77 credits). Frequent users were generally more skeptical (82 credits). Surprisingly large differences were recorded between languages.
English-language participants (344 persons) were the least skeptical (52 credits; North Americans had 55 credits). French, Italians and South Americans (140 participants) were more skeptical than average (93-95 credits), German-language participants (143 persons) were very skeptical (103 credits), and eastern Europeans and Russians were the most skeptical (45 persons, 108 credits). Librarians (92 persons) were much less skeptical (49 credits), which contributed to the low value for English participants in general.

4H suggests that in roughly 30-35 % of the cases PDFs are not downloaded.

Conclusions

Users download PDF files for a variety of reasons. Some reasons can be neutralized by improving BHL functions (PDF reader against online viewer, making it easier to find a book again by improving the search functions) or simply by time (long lasting free access), others not (computer offline).
Bullet point H (I don't download PDFs) received surprisingly high rates, even higher by frequent users. It will be important to compare the rating for this bullet point with future surveys.

 

Total number (bullet point A): 1015
Frequent users total (bullet point A): 350


Question 5: referrers

Question 5: We would like to know from which website you come to the Biodiversity Heritage Library.

[Scale: almost always, often, sometimes, rarely, never, I don't know/understand]

Most users seem to have bookmarked BHL, especially frequent BHL users.
Important referrers were Google, Wikipedia, EoL, occasionally also library catalogs.
Other paths were rarely used. Bing, Yahoo and others can be neglected, and the BHL blog seems to be used by only few participants. Species 2000 was inserted to obtain a negative calibration (Species 2000 does not provide links to BHL). We were surprised that positive responses were obtained at all. It is possible that Species 2000 is used by some participants who get to BHL indirectly via other providers to which Species 2000 provides links.
Frequent users have more commonly bookmarked BHL than the others, and use Internet Archive more frequently, as well as library catalogs. Frequent users use Google exactly as frequently as do occasional users.

We compared the results with those of Google Analytics in the same time period (15 March - 03 May 2010). The results differed markedly:

Direct traffic: 18.4 %
Google search engine: 49.7 %
Other search engines: 2.5 %
Referring sites: 29 %
This striking difference is difficult to explain. Google Analytics is a service provided by Google, a commercial company which makes money with such tools. It is also possible that the results are correct and that Google counts an extremely high number of useless clicks as successful traffic (people who came from Google, saw immediately that BHL did not provide what they were looking for and left the site again quickly).

Conclusions

It will be more important than previously expected to develop strategies for higher rankings in Google. Even for frequent users the Google search engine is much more important than we had thought. Other search engines can be neglected, but it will be important to keep an eye on these, too, since Google's star might be sinking some day.
Internet Archive is also important. We received feedback that users tend to look up material found at BHL in Internet Archive in the hope of finding the same work in higher quality, for example a Google book.

Other search engines and the BHL blog yielded only slightly higher rates than Species2000, so those were close to zero.

The library catalogues item was difficult to evaluate more closely. North Americans had 46 credits, South Americans 24 and Germans 25.

Total number (bullet point A): 1000
Frequent users total (bullet point A): 367


Question 6: search methods

Question 6: Help us to understand your preferred method of searching for books online. Please rate each of the following search strategies.

[Scale: totally prefer, very much prefer, moderately prefer, slightly prefer, not at all prefer, I don't know/understand]

Most users searched for the author, but not always and not exclusively.
There were hardly any differences between frequent and occasional users. Those who were used to the current BHL default function tended to rank this method higher. But they also ranked higher the wildcard option search for titles and authors.
Google-like search was not preferred, even less so by frequent users. Google returns too many insignificant results and is not able to search for an exact sequence of letters, and incorrect spellings are automatically corrected (it is not possible to search for an uncommon spelling of a name). These are shortcomings in the Google search function and annoying for professional BHL users. BHL users seem to know exactly what they are looking for.
Not many users are friends of advanced search options with Boolean terms like "and", "or", "not" etc. This means that they prefer an effective and quick default search function. If they do not find immediately what they have been looking for, they prefer to start a new search with other keywords instead of using a Boolean search function. Some comments in the freetext questions however suggest that some users are quite happy with Boolean search terms - it would be convenient to provide these as an additional option.
Scientific names of species and genera are extremely often searched for. This means that linking taxonomic names with literature sources, as done by uBio tools (taxon finder) or AnimalBase is very important and highly requested.
Common/vernacular names of animals and plants are very rarely looked for (50 credits). A special analysis of the general interest readers and artists (29 persons) gave a value of 95 credits - considerably higher, but still not very high.

Conclusions

Following these results, the preferred default search function would be as follows: author, year and some words of the title, with a wildcard option, yielding as few results as possible, plus an independent search function for finding taxonomic names.

The current BHL default should be maintained as a possible option "exact letter combination in title", so that "relle des mo" could be inserted and return only extremely few results (of a title "Histoire naturelle des mollusques").
Boolean terms are not preferred and should only be offered as an extra function on request. The default search function should be able to understand the word "and" as a word belonging to the title.
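
The "exact letter combination in title" option can be illustrated with a minimal Python sketch. It is not BHL's actual search code; the function name exact_title_search, the case-insensitive comparison and all titles except "Histoire naturelle des mollusques" are assumptions made for illustration. The query is treated as one literal character sequence, so the word "and" is matched as part of the title and not as a Boolean operator.

```python
def exact_title_search(titles, query):
    """Return the titles that contain `query` as an exact character sequence.

    The comparison ignores case (an assumption of this sketch); no word
    in the query is interpreted as a Boolean operator.
    """
    needle = query.casefold()
    return [title for title in titles if needle in title.casefold()]


titles = [
    "Histoire naturelle des mollusques",  # title from the text above
    "Histoire naturelle des oiseaux",     # illustrative entry
    "Birds and beasts",                   # hypothetical entry
]

print(exact_title_search(titles, "relle des mo"))
# ['Histoire naturelle des mollusques']
print(exact_title_search(titles, "birds and"))
# ['Birds and beasts']
```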

Total number (bullet point A): 1055
Frequent users total (bullet point A): 372


Question 7: future developments

Question 7: Help us prioritize developments for the future of the Biodiversity Heritage Library.

[Scale: very important, quite important, moderately important, slightly important, not at all important, no opinion, I don't understand]

Frequent users tended to rank priorities for improvements slightly higher.
Non-English users tended to rank priorities higher.

The highest priorities were given to proposals to submit requests for scanning literature, improvements of the online viewer and high resolution downloads.
Proposals for improving metadata were rated higher by non-English participants (121 credits by all non-English users, 92 by English occasional users).

Login functions ranked considerably lower; the majority rated these as not very important. The highest rating among them went to 7K (saving favorites), obviously in line with responses from other questions that the search function in BHL is not optimal.
The weakest values were recorded for search by collection, quicker download of low resolution PDFs and (extremely low) tagging content with keywords.

Conclusions and thoughts

Submitting requests for scanning literature was the most preferred item for improving BHL.
The question is how to realize that. We would need a compound catalogue from which titles could be selected. This would be the bidlist's catalogue which would not give results in the default search function, but only in an extra search function "bidlist/list of desired titles".
Requests for download of high resolution images go in line with improving the create-my-own-pdf service. The big problem is that high resolution images will mostly be requested from colour plates - which in turn have not been marked in the page-level metadata. So there is a question how to realize this. Suggesting metadata improvements is the next important item - and a prerequisite for downloading high-res images on demand.

Quick download of low resolution PDFs had low ratings. This is in contrast with personal feedback, and possibly based on misunderstandings. Users desire well readable text, independent of the image resolution. Experienced users criticized that text pages scanned by BHL libraries were often brown on tan (Smithsonian style, also Harvard, London, Missouri and the others), instead of black on white. The tan is not needed for understanding the text, neither is the brown colour of the letters. It would be possible to convert these pages into black-on-white (Bielefeld style); this would automatically bring a significant reduction in file size, and also much faster loading times in the online viewer.

Login functions ranked generally lower. Users prefer a powerful service by default, not restricted to users who are logged in. Login has several disadvantages; many think that login is tedious to manage. Occasional users tend to forget their passwords. Even if the password is known, it takes time to log in and enter the password. "Another account, another password", said one participant in the freetext answers.
Many participants may also have feared that login is the first step for ceasing the free service. But the results suggest that this was not so. The ratings for bullet points 7A, 7K and 7L did not differ in the "skeptical users" group. We analysed 223 persons who responded positively (radio button options 1 or 2) to Question 4C "I am skeptical that free access will be long lasting". These had 99 credit points in 7A, 103 in 7K and 95 in 7L, so no visible differences.
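
A subgroup analysis of this kind can be sketched in Python as follows. This is a sketch only: the record layout (one dictionary per respondent, scale points 1-5 or None for the neutral option), the function names and the four sample records are hypothetical and do not reproduce the survey data. The sketch first selects respondents who chose option 1 or 2 in one question and then computes the credit points of that subgroup for other bullet points.

```python
# Weight per scale point: 1 = total agreement ... 5 = total disagreement.
WEIGHTS = {1: 4, 2: 3, 3: 2, 4: 1, 5: 0}


def credit_points(answers):
    """Credit points (0..200) from individual answers; neutral answers are skipped."""
    valid = [a for a in answers if a in WEIGHTS]
    if not valid:
        raise ValueError("no valid answers")
    # Same result as weighting the percentage shares and dividing by 2.
    return 50.0 * sum(WEIGHTS[a] for a in valid) / len(valid)


def subgroup(respondents, question, accepted=(1, 2)):
    """Respondents whose answer to `question` is one of the accepted options."""
    return [r for r in respondents if r.get(question) in accepted]


# Hypothetical records, for illustration only.
respondents = [
    {"4C": 1, "7A": 3, "7K": 2},
    {"4C": 2, "7A": 3, "7K": 4},
    {"4C": 5, "7A": 1, "7K": 1},
    {"4C": None, "7A": 2, "7K": 3},
]

skeptical = subgroup(respondents, "4C")
for item in ("7A", "7K"):
    print(item, credit_points(r.get(item) for r in skeptical))
```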

Total number (bullet point B): 1029
Frequent users total (bullet point B): 358


Questions 7 and 16: default portal language

Question 7: Help us prioritize developments for the future of the Biodiversity Heritage Library.

[Scale: very important, quite important, moderately important, slightly important, not at all important, no opinion, I don't understand]

Question 16: My language is that of the country where I am living/working.

All user groups ranked English default considerably higher than local language default (except the 37 Spanish participants who ranked 111/109). German native speakers ranked local language extremely low. English native speakers ranked English-default slightly lower than non-English users.
No visible difference was spotted between frequent and occasional users.

Total number (Q7 bullet point G): 1039

13 % of the participants worked in countries where a different language was official.
This applied less to Italian and English native speakers, and considerably more to German, Spanish and French native speakers.
The results suggest that from all points of view and in all countries, English should be the default language of the BHL portal.

Total number (Q16): 1113


Question 8: APIs

Question 8: Are you aware that BHL allows you to download all of the data on taxonomic names and book information (such as titles and authors) through the use of APIs (application programming interfaces) and other exportable formats?

APIs were only used by very few participants. Only 30 persons were recorded who actually used APIs. 48 % responded that they did not know that BHL offers APIs.
Frequent users knew slightly better that BHL offered APIs.
English native speakers understood better what APIs were and more frequently knew that BHL offered APIs, but only 6 % (12 recorded persons) actually used these APIs.
The ratings for A and B among Germans, French and Italians were below 10 %, less than average (17.5 %).
53 % of the Italian, Spanish and French participants did not understand this question, many more than average (35 %).

Total number: 1071
Frequent users total: 364


Questions 9 and 10: freetext answers

Question 9: What features do you find most helpful to use in other digital library websites such as Google Scholar, Google Book Search, Gallica, Botanicus, AnimalBase, etc.?

[Free text answer]

Question 10: Please provide any additional comments you may have about the Biodiversity Heritage Library.

[Free text answer]

These questions were mainly intended to detect new ideas and items we had failed to ask about in the questions above.
We received several hundred answers. Most participants either praised our work, or gave a comment that they were not able to give a comment here, or repeated or refined/explained in more detail aspects or subjects raised in the above questions.
The latter comments can be important to understand better the results of the above questions.
Most concerned the search options. The problem with the scan quality presentation was more explicitly explained, several users expressed that they would prefer to read a textbook "black on white" instead of "brown on tan".

New ideas that were brought up by several participants were restricted to the following points:
Content: the need to fill gaps in serial runs, and the need to expand the digitized contents to paleontological works.
Better access to articles of serials: metadata should be present for articles, articles should be searchable/be returned in result sets.
Some other items that were brought up concerned functions that are already available at BHL but that users just did not know about, which we identified primarily as a problem of communication.


Question 11: user profile, profession

Question 11: In the context of your research needs, what best describes your profession?

[Check-boxes that allow respondents to select multiple options]

Most participants were bioscientists, either paid or unpaid (many professionals added comments that they were retired, others worked full-time but were unpaid due to the lack of funding, and did not fail to complain about that). Multiple answers were possible in this checkbox question.
The proportion of unpaid amateur researchers was considerably higher among frequent users (21 % vs. 16 %).
Teachers, librarians and students were moderately important groups. Students were more important in the frequent users group.
Database providers, librarians and students had higher proportions among frequent users.
All other target user groups participated in extremely low numbers (artists and publishers had less than 10 persons).
Special figure to highlight the more generally interested readers:

Library staff was a special target user group largely restricted to North America.
All languages: all users: 11 %, frequent users: 15 %
English natives: all users 19 %, frequent users 25 %
non-English natives: all users 3 %, frequent users 5 %
See also the library staff figure under Question 14.

Total number of answers: 1335
Frequent users: 431


Question 12: user profile, specialisation

Question 12: My special group of organisms:

[Select one]

This question was important to know which kind of literature should be digitized.

Of those who worked on special groups, most participants had only one special group (97 % of the frequent users). Zoology had 54 %, botany 40 % (among the frequent users). Other organisms (algae, lichen, fungi, bacteria) taken together had ratings below 5 % of all users (bacteria only 0.2 %).
Botanists worked mostly on angiosperms (36 % of the frequent users).
Zoologists were specialized in insects (19 % of the frequent users), molluscs (16 %), vertebrates (12 %), and others (8 %). The proportion of entomologists was lower than could be expected from the number of species they have to deal with (75-80 % of the animals). Most entomologists worked on Coleoptera, but the other insect groups were also important, without exception.
It is possible that BHL is considerably weaker in providing insect literature than literature on other animal groups. It is also possible that the importance of pre-1900 literature is lower in insects than it is in vertebrates and molluscs.

Total number: 917
Frequent users total: 311


Question 13: user profile, disciplines

Question 13: My special interest is:

[Check-boxes that allow respondents to select multiple options]

Checkbox options allowed multiple answers. Figures for all users and frequent users exclude the participants of the German survey (because the German survey had a scrollbox instead of checkbox options).
The "Other, please specify" option was an open textbox, used to detect further potential user groups.

The participants were interested in various different fields of biodiversity research.
More than 80 % were interested in taxonomy, systematics and nomenclature (91 % of the frequent users). Many participants selected more than one option (on average 3.0 options were selected, 2.8 by frequent users, 3.3 by German users when they had checkbox options).
Next to taxonomy, participants were mainly interested in biogeography, morphology, history of science, evolution and nature conservation, more rarely in paleontology and molecular biology, only very few in physiology and other fields (informatics, ethnobotany, bibliography, horticulture, archaeology/anthropology, developmental biology and ethology - these disciplines would probably have yielded more responses, had we explicitly given these as options).

We observed differences between various user groups.
Frequent users were much more interested in taxonomy (and nomenclature) than occasional users (only 9 % of the frequent users were not interested in taxonomy, 18 % of all users, 36 % in the 166 participants of the "I have not used BHL before" group). Besides taxonomy, only in history of science the proportion was higher in frequent users (26 %) than in the occasional users group (20 % in all users).
In other words, BHL is also consulted by people interested in biogeography, ecology, evolution and nature conservation, but visibly less frequently. Those who are interested in taxonomy and history of science consult BHL more frequently than the others.

Regional differences in fields of interest:

Germans, South Americans and Italians had a broader range of fields of interest than average (and selected more checkboxes - this is why the average in each line is not zero), North Americans selected fewer checkboxes than average in this question.
North Americans were slightly less interested than average in biogeography, ecology and nature conservation, slightly more in evolution and phylogeny.
South Americans were more interested than average in evolution, morphology and biogeography.
Eastern central Europeans and Russians were much more interested in ecology, also more than average in nature conservation and paleontology, and less than average in history of science, taxonomy and evolution.
Germans (when they had checkbox options) were more interested than average in biogeography, morphology, nature conservation and paleontology.
Italians had their special interests in biogeography, ecology, taxonomy and physiology, and were much less interested than average BHL users in evolution and phylogeny.

Primary field of interest:

In the German survey no checkbox answers were possible and participants were forced to select only one item in a scrollbox. This allowed us to determine the primary field of interest of these researchers.
Many participants (38, = 33 % of the 116 German participants) felt forced to add other fields of interest in the free text box below, much more than in the other surveys.
73 % of the Germans saw themselves primarily as being interested in taxonomy/nomenclature (64 %) and morphology (9 %).
The third most important field was paleontology (8 %), which ranked much lower in the overall image above (3-5 %).
The proportions derived from the German survey were used to recalculate the other surveys, and to answer the question of what the participants' primary field of interest was. Among the frequent users group we would thus expect that 74 % see their primary interest in taxonomy and nomenclature, 8 % would have their primary interest in morphology, 5 % in evolution/phylogeny, 5 % in paleontology, 2 % in history of science, ecology and molecular biology, and only 1 % in biogeography and nature conservation.

This suggests that we have four major independent groups among BHL users, accounting for 93 % of the audience.
1 - Taxonomists (74 %)
2 - Researchers studying morphology, presumably species identification (8 %)
3 - Researchers studying evolutionary biology and phylogeny (5 %)
4 - Paleontologists, paleobotanists (5 %) 

Total number of answers: 2212
Frequent users answers total: 731
Total number of users excluding German survey: 920
Total number of frequent users excl. German survey: 311
German users English and international survey: 58 persons, 191 answers
German survey participants: 116

Fields of interest and organism groups

Special analysis: distribution of disciplines (fields of interest) among specialists of certain organism groups.
Five groups of specialists were selected for a closer analysis to know more about the distribution of the fields of interest among bioscientists: fishes (29 persons), birds (34 persons), molluscs (119 persons), coleopteran insects (73 persons) and angiosperm plants (262 persons).

Table: proportions listed by specialists, "all" means all users average (from the above Q13 figure), in bold proportions recorded above the average values.

Taxonomy: all 83 %, insects 100 %, plants 89 %, molluscs 85 %, fishes 72 %, birds 62 %
Morphology: all 31 %, fishes 38 %, insects 26 %, molluscs 25 %, plants 23 %, birds 15 %
Biogeography: all 39 %, insects 45 %, fishes 38 %, birds 38 %, molluscs 38 %, plants 33 %
Ecology: all 29 %, birds 29 %, molluscs 29 %, insects 26 %, plants 22 %, fishes 21 %
Nature conservation: all 20 %, fishes 31 %, plants 21 %, birds 18 %, insects 16 %, molluscs 15 %
Evolution: all 26 %, molluscs 29 %, plants 25 %, fishes 21 %, insects 21 %, birds 18 %
Paleontology: all 12 %, molluscs 27 %, fishes 10 %, birds 9 %, insects 7 %, plants 6 %
History of science: all 20 %, birds 24 %, plants 17 %, molluscs 16 %, fishes 10 %, insects 10 %

Verbal interpretation of these data:
Taxonomy: strongest in insects, above average in plants and molluscs, much less in birds and fishes.
Morphology: above average in fishes, the others slightly below, very weak in birds.
Biogeography: highest rating in insects, but not much lower in the others, least in plants.
Ecology: most interesting for birds and molluscs, but not much less for the others.
Nature conservation: most important in fishes, average in plants, slightly less in the others.
Evolution and phylogeny: molluscs slightly above average, the others slightly below.
Paleontology: most interesting and much above average in molluscs, much less in the other groups. It would presumably also be high among specialists of dinosaurs and trilobites.
History of science: most important in birds, less interesting in the others.
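
The above-average entries can be recomputed directly from the table, which helps where the bold marking is not visible. The following is a minimal Python sketch: the dictionary layout and the function name above_average are illustrative assumptions, the numbers are copied from the table above, and "above average" is read here as strictly greater than the "all" value.

```python
# Percentages copied from the table above; "all" is the average of all users.
# The data layout and function name are illustrative, not part of the survey tooling.
TABLE = {
    "Taxonomy": {"all": 83, "insects": 100, "plants": 89, "molluscs": 85, "fishes": 72, "birds": 62},
    "Morphology": {"all": 31, "fishes": 38, "insects": 26, "molluscs": 25, "plants": 23, "birds": 15},
    "Biogeography": {"all": 39, "insects": 45, "fishes": 38, "birds": 38, "molluscs": 38, "plants": 33},
    "Ecology": {"all": 29, "birds": 29, "molluscs": 29, "insects": 26, "plants": 22, "fishes": 21},
    "Nature conservation": {"all": 20, "fishes": 31, "plants": 21, "birds": 18, "insects": 16, "molluscs": 15},
    "Evolution": {"all": 26, "molluscs": 29, "plants": 25, "fishes": 21, "insects": 21, "birds": 18},
    "Paleontology": {"all": 12, "molluscs": 27, "fishes": 10, "birds": 9, "insects": 7, "plants": 6},
    "History of science": {"all": 20, "birds": 24, "plants": 17, "molluscs": 16, "fishes": 10, "insects": 10},
}


def above_average(table):
    """Return, per field, the specialist groups strictly above the 'all' value."""
    result = {}
    for field, row in table.items():
        average = row["all"]
        result[field] = [
            group for group, value in row.items()
            if group != "all" and value > average
        ]
    return result


if __name__ == "__main__":
    for field, groups in above_average(TABLE).items():
        print(f"{field}: {', '.join(groups) if groups else 'none'}")
```

With a strict comparison, Ecology has no above-average group, because birds and molluscs only equal the overall 29 %.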

Fields of interest and professional status

Special analysis: distribution of special interests among professional and amateur bioscientists. This analysis has the general problem that the boundary between amateur/hobby scientists and professional scientists is poorly defined. Retired scientists did not know how to classify themselves (considering their skills they should have selected professional, but since they were unpaid many selected amateur/hobby scientist).

Professional bioscientists (paid):
- Taxonomy: 85.9 %
- Morphology: 32.8 %
- Biogeography: 38.2 %
- Ecology: 26.3 %
- Nature conservation: 16.4 %
- Evolution and phylogeny: 29.4 %
- Molecular biology: 12.4 %
- Paleontology: 12.0 %
- History of science: 14.0 %

Amateur/hobby bioscientists (unpaid):
- Taxonomy: 90.7 %
- Morphology: 30.6 %
- Biogeography: 41.5 %
- Ecology: 30.6 %
- Nature conservation: 17.6 %
- Evolution and phylogeny: 16.1 %
- Paleontology: 13.0 %
- Molecular biology: 6.2 %
- History of science: 20.7 %

Major differences (> 5 %) are marked in bold. The two groups hardly differed, except that professional scientists were more interested in phylogeny/evolution and molecular analyses (presumably because they have the funds for molecular studies), and that amateur scientists were even more interested in taxonomy, systematics and nomenclature than the professionals. History of science was rated higher by amateur scientists, possibly because many retired scientists classified themselves as amateurs.
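
The comparison of the two lists can be reproduced with a short Python sketch. The names PROFESSIONAL, AMATEUR and major_differences are illustrative assumptions, and the "> 5 %" criterion is read here as an absolute difference of more than 5 percentage points; the figures are copied from the two lists above.

```python
# Percentages copied from the two lists above.
PROFESSIONAL = {
    "Taxonomy": 85.9, "Morphology": 32.8, "Biogeography": 38.2,
    "Ecology": 26.3, "Nature conservation": 16.4,
    "Evolution and phylogeny": 29.4, "Molecular biology": 12.4,
    "Paleontology": 12.0, "History of science": 14.0,
}
AMATEUR = {
    "Taxonomy": 90.7, "Morphology": 30.6, "Biogeography": 41.5,
    "Ecology": 30.6, "Nature conservation": 17.6,
    "Evolution and phylogeny": 16.1, "Molecular biology": 6.2,
    "Paleontology": 13.0, "History of science": 20.7,
}

# Assumption: "> 5 %" means more than 5 percentage points in absolute terms.
THRESHOLD = 5.0


def major_differences(professional, amateur, threshold=THRESHOLD):
    """Return {field: professional - amateur} where the gap exceeds the threshold."""
    return {
        field: round(professional[field] - amateur[field], 1)
        for field in professional
        if abs(professional[field] - amateur[field]) > threshold
    }


if __name__ == "__main__":
    # Positive values: higher among professionals; negative: higher among amateurs.
    for field, diff in major_differences(PROFESSIONAL, AMATEUR).items():
        print(f"{field}: {diff:+.1f} points")
```

Read this way, evolution and phylogeny, molecular biology and history of science exceed the threshold, while the taxonomy gap (4.8 points) falls just below it.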

Fields of interest and general interest users

Special analysis: distribution of special interests among general interest readers and artists (29 persons) (all = all users for comparison).
- Taxonomy: all 83 %, gen. int. = 48 %
- Morphology: all 31 %, gen. int. = 28 %
- Physiology: all 4 %, gen. int. = 17 %
- Biogeography: all 39 %, gen. int. = 52 %
- Ecology: all 29 %, gen. int. = 41 %
- Nature conservation: all 20 %, gen. int. = 41 %
- Evolution and phylogeny: all 26 %, gen. int. = 24 %
- Paleontology: all 12 %, gen. int. = 24 %
- History of science: all 20 %, gen. int. = 40 %

The differences are striking (but note the low number of persons who described themselves as general interest readers in this survey). Compared with our average audience, general interest readers were much less interested in taxonomy/systematics/nomenclature, and much more interested in nature conservation, history of science, ecology, paleontology, biogeography and physiology. The needs of this group would be met to a greater extent if more modern literature with information on ecology and conservation status, and more paleontological literature, were provided.

Conclusions

Specialists of different groups of organisms use BHL for slightly different reasons. Those interested in birds use BHL more to gain information on ecology and the history of science; for those interested in fishes, information on the morphology of the species is very important; malacologists have a broader variety of interests; entomologists are almost all taxonomists with a strong interest in biogeography; and botanists are mainly interested in taxonomy and systematics.

If the data represent the community adequately, this would suggest, for example, that more ornithologists could be attracted if more modern literature with ecological information were available, more ichthyologists could be attracted by including red data lists and other information on nature conservation, and more entomologists could be attracted by more efficient analyses of the digitized literature in terms of geographical data.


Question 14: user profile, regions

Question 14: The region where I am working is:

[Select one]

Most participants came from North America and Europe. Europeans tended to consult BHL slightly more frequently than North Americans.

Within Europe, participants came from Britain and from various countries in central and southern Europe (Germany, Netherlands, France, Spain and Italy). Less participation was recorded from Scandinavia, eastern Europe, Greece and Turkey.
Most eastern Europeans were occasional users. The proportion of frequent users was unusually high among participants from Spain and Italy.

We detected two main groups of users: bioscientists and librarians. They were unevenly distributed across the globe.

Librarians came mostly from North America, with some from Europe and Australia, while bioscientists came from many different regions: many from Europe, one-third from North America, and also many from South America.

I found no explanation for why the proportion of North Americans was so high in the library staff user group.

Total number: 892
Frequent users total: 310


Question 15: user profile, languages

Question 15: My native language is:

[Select one]

50 % of the participants were native English speakers, 50 % spoke other languages. There were only minor differences between frequent and occasional users.
Among the most frequently recorded other languages were German (16 % all users, 15 % frequent users), French (7 and 9 %), Spanish (5 and 6 %), Italian (5 %), Dutch (4 and 5 %) and Portuguese (3 %). Czech and Russian had 1-2 %, Scandinavian languages together 1 %, Chinese 1 %, all others together 5 and 3 %.

We had anticipated such a distribution, so we translated the survey into all frequently spoken languages except Dutch, covering 86 % of the participants. Evidently the only language whose importance we underestimated was Dutch.
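
The 86 % coverage figure follows from the language shares reported for all users. A minimal Python sketch, assuming the six survey language versions were English, German, Italian, French, Spanish and Portuguese (as listed in the section on survey languages); the variable names are illustrative.

```python
# Shares of native languages among all participants (percent), from the text above.
LANGUAGE_SHARE = {
    "English": 50, "German": 16, "French": 7, "Spanish": 5,
    "Italian": 5, "Dutch": 4, "Portuguese": 3,
}

# Language versions offered until 06 April (Dutch was not among them).
SURVEY_LANGUAGES = ["English", "German", "Italian", "French", "Spanish", "Portuguese"]

# Share of participants who could answer in their native language.
coverage = sum(LANGUAGE_SHARE[lang] for lang in SURVEY_LANGUAGES)

if __name__ == "__main__":
    print(f"Covered by a translated version: {coverage} %")
    print(f"Coverage if Dutch had been included: {coverage + LANGUAGE_SHARE['Dutch']} %")
```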

Total number: 1082
Frequent users total: 360


Survey history: answers by date

Total number of answered surveys: 1877 (1563 until 06 April)
Total number of successfully answered surveys: 1063 (= 57 %) (759 until 06 April (= 49 %))

Due to a bug in the SurveyMonkey program, 50 % of the answers until 06 April were not recorded by the program and were lost. We did not find the cause of this bug, and SurveyMonkey support did not know the exact cause either. It was related to the different language versions and to the fact that some questions had to be skipped depending on the language choice. After 06 April we set up one single English survey with some multilanguage components and removed the skip options. This solved the problem: the success rates immediately increased to more than 90 %.

Total number of answered surveys after 06 April: 314
Total number of successfully answered surveys: 304 (Question 1) (= 97 %)
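
The success rates and the figures for the period after 06 April can be cross-checked from the totals given above. A minimal Python sketch; the function and variable names are illustrative assumptions, and rates are rounded to whole percent as in the text.

```python
def success_rate(successful, answered):
    """Share of successfully recorded surveys, in whole percent."""
    return round(100 * successful / answered)


# Totals from the text above.
total_answered, total_successful = 1877, 1063
early_answered, early_successful = 1563, 759  # until 06 April

# Figures for the period after 06 April follow by subtraction.
late_answered = total_answered - early_answered
late_successful = total_successful - early_successful

if __name__ == "__main__":
    print("Whole survey:", success_rate(total_successful, total_answered), "%")
    print("Until 06 April:", success_rate(early_successful, early_answered), "%")
    print("After 06 April:", late_answered, "answered,", late_successful, "successful,",
          success_rate(late_successful, late_answered), "%")
```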


Survey languages: levels of understanding

We analysed the survey with the aim of finding out whether, and to what extent, non-native English speakers had more difficulty understanding the questions of the survey.
Each bullet point of questions 2, 3, 4, 5, 6 and 7 had the option to select "I don't know/I don't understand". (We should perhaps have inserted somewhere a bullet point that made no sense at all, as a calibration of how many participants were actually willing to admit that they did not understand.)
34.5 % selected option D in Question 8 "I don't understand/I don't know what APIs are". Higher rates were recorded for this option among Italian, Spanish and French participants, but this reflected knowledge of APIs as well as language and cannot be attributed to language alone.

The highest levels of non-understanding were recorded for bullet points 3J (data mining, 33 % of all participants), 4E (full text not searchable in online viewer, 25 %), 3I (create-my-own-PDF, 23 %), 3G (download OCR file, 23 %), 3C (taxon name finding functionality, 15 %), 3H (download high resolution images, 14 %) and 3F (download PDF, 11 %); all others were below 10 %. Surprisingly, the equivalent bullet points in Question 7 did not obtain high rates for "I don't know/I don't understand" (the worst understood bullet point in this question was 7F, tagging BHL content with keywords like Flickr, 6 %).
Bullet points 2B-2G, 6A and 7G obtained below 1 % and were best understood.
Average levels of non-understanding by question were as follows:
Question 2 (2 %), Question 3 (14 %), Question 4 (3 %, excluding 4E), Question 5 (2 %), Question 6 (2 %), Question 7 (2 %). Question 3 "how satisfied are you with BHL features" was the least well understood.

Levels of understanding of the English survey were analysed by native language of the participants. These analyses yielded only very weak effects; hardly any difference was detected between native and non-native English speakers (negative values indicate that the level of understanding was lower, i.e. the proportion of persons who selected "I don't understand" was higher):

Difference English native speakers against average: -0.1 %
Difference Eastern Europeans and Russians against average: -2.8 %
Difference German native speakers against average: +2.6 %
Difference French native speakers against average: -3.5 % 

This suggests that French and Eastern European participants had slightly more difficulty understanding the English questions, that native English speakers did not understand the bullet points better than non-native speakers, and that, surprisingly, the Germans understood the questions better than the others.

The survey was modified on 06 April: until 06 April six language versions had been offered (English, German, Italian, French, Spanish and Portuguese); after 06 April all bullet points were available exclusively in English. The French and German responses were analysed for differences before and after the change.

Germans: German survey (113 persons) against English/international survey (35 persons): +4.0 %
French: French survey (17 persons) against English/international survey (50 persons): -8.0 %

This suggests that 8 % of the French did not understand the questions when these were asked in English (but note that the numbers for the French surveys were low). Surprisingly, the Germans seemed to have understood the questions better when they were asked in English rather than in German, whatever this implies.

The general conclusion is that the language skills of participants were not a significant limiting factor for the understanding of the survey. German and French native speakers enjoyed the opportunity to use the German and French surveys, but this did not mean that they understood the questions better than if they had been asked in English.


Main final conclusions

1 - The search function must be improved (and we have a precise guide how).
2 - The set of results must be refined, metadata must be improved.
3 - The default language of the portal should be English in all countries.
4 - The online viewer is important and must be improved.
5 - The main target user group is taxonomists (55 % zoologists, 40 % botanists).
6 - Scientific users come from Europe (45 %), North America (35 %), South America (8 %) and Australia (5 %).
7 - For attracting new user groups it is indispensable to scan more recent literature, published after 1920.
8 - The Google search engine is an important referrer.
9 - Google books and Google scholar are important competitors.


Comparison with the previous BHL-Europe survey (Oct-Nov 2009)

- The main results and conclusions were confirmed.
- The BHL-Europe survey gave us slightly more detailed instructions (search functions, website design) that can be used by our technicians. In view of the present results, the BHL-Europe summary section can serve as a good guideline; it is up to date.
- Some issues raised in the BHL-Europe survey have already been improved in the meantime (for example speed).


Outlines for the next survey

1 - We can repeat some questions and compare responses, to see whether any improvements have been acknowledged.
2 - We will be able to see whether new target user groups have been attracted.
3 - We would not need to set up several different language versions of the survey, but it would be convenient to allow freetext responses in various languages.
4 - To learn more about our potential to attract general interest readers, and about our limits, we might ask from which time period participants currently use material digitized by BHL, and from which time period they would need material.


Data compiled by Francisco Welter-Schultes, in collaboration with Bianca Libscomb, mainly for the presentation at the Vienna meeting 26 May 2010

Links to AnimalBase contributions for:

Meeting Berlin May 2009 (distribution of languages in early zoological literature)

Meeting Leiden Aug 2009 (comparisons of viewers, proposals for improving the portal)

Meeting Prague Nov 2009 (evaluation of the BHL-Europe internal survey 2009, with instructions for portal design)