Center for Electronic Texts in the Humanities

CETH (Center for Electronic Texts in the Humanities) has moved, virtually. Our new email address is:

Subject: Russian corpora available by ftp/gopher/www

Recently I have been spending a lot of time distributing Russian text corpora that I have collected; I have about 14 MB of various literary and non-literary texts, and word has gotten out. I'm happy to do this, but one-by-one distribution is not very efficient! I have now found a home for them on the ftp/gopher/www server at Via ftp, they are in the directory pub/central_eastern_europe/russian/corpora; explicit directions for retrieval by all three methods are given below. Along with the texts I have posted an inventory of files (which will be updated periodically as I acquire and post more texts), an ascii character map of the Cyrillic coding used, and a set of bitmapped Mac fonts that I use to display these files. Questions about the texts and their preparation should be addressed to me; technical questions about the server and file retrieval can be addressed to the person handling the /russian directory, Dr. Jan Labanowski, at

I would be delighted to receive any and all additional Russian corpora, or news of where more can be found. World Wide Web:

     George Fowler                       GFowler@Indiana.Edu [Email]
     Dept. of Slavic Languages           (812) 855-2829 [office]
     Ballantine 502                      (317) 726-1482 [home]
     Indiana University                  (812) 855-2624/-2608/-9906 [dept.]
     Bloomington, IN  47405  USA         (812) 855-2107 [dept. fax]

Alex Catalogue of Electronic Texts on the World-Wide Web!

The Alex Catalogue of Electronic Texts on the Internet is now available on the World-Wide Web at or gopher:// or gopher://

Alex helps users to find and retrieve the full text of documents on the Internet. It currently indexes over 1800 books and shorter texts by author and title, incorporating texts from Project Gutenberg, Wiretap, the On-line Book Initiative, the Eris system at Virginia Tech, the English Server at Carnegie Mellon University, Project Bartleby, CCAT, the on-line portion of the Oxford Text Archive, and many others. Alex includes no serials.

Project Gutenberg's own FTP site is at You can log in as anonymous; use your email address as the password. All the etexts are located in pub/etext, in directories named for the year in which each etext was released. An index of available etexts is in the files INDEX100.GUT and INDEX200.GUT (mrcnext is a UNIX machine, so file names are case-sensitive; you must use the capital letters).

Please note!! mrcnext is no longer the default server for the Project Gutenberg Etexts!! Please use, as explained in detail below:

ftp or ftp
login:  anonymous
password:  yourname@your.machine
cd pub
cd etext
cd gutenberg
cd etext95  [or 94, 93, 92, 91 or 90.  70's and 80's are in /etext90]
get filename  (be sure to set bin, if you get the .zip files)
get more files
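
For readers who would rather script the session above than type it, here is a minimal Python sketch using the standard ftplib module. The function names etext_dir and fetch_etext are illustrative only and are not part of any Project Gutenberg tool. The server name is left as a parameter, and the accepted year range is an assumption based solely on the directories the listing above mentions.

```python
from ftplib import FTP


def etext_dir(year: int) -> str:
    """Return the directory holding etexts released in `year`.

    Assumption: only the directories named in the listing above exist
    (etext90 through etext95, with the 70's and 80's filed in etext90).
    """
    if not 1970 <= year <= 1995:
        raise ValueError(f"no directory listed for {year}")
    # Releases from the 70's and 80's are kept together with 1990.
    suffix = 90 if year <= 1990 else year % 100
    return f"pub/etext/gutenberg/etext{suffix:02d}"


def fetch_etext(host: str, year: int, filename: str, email: str) -> None:
    """Download one file by anonymous FTP into the current directory."""
    with FTP(host) as ftp:
        # Anonymous login, with the email address as the password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(etext_dir(year))
        # retrbinary always transfers in binary mode, which is what
        # "set bin" asks for when fetching .zip files.
        with open(filename, "wb") as out:
            ftp.retrbinary(f"RETR {filename}", out.write)
```

Calling fetch_etext with the server name, a release year, a file name and your email address would perform the same steps as the manual session; nothing is contacted until that function is called.
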
To subscribe to the Project Gutenberg Newsletter:
Send the following message from your login:

sub gutnberg Firstname Lastname

send to:  listserv@uiucvmd (bitnet) (internet)

Or to volunteer, use this message.

sub gutvol-l Firstname Lastname

A new WWW edition of Laurence Sterne's _A Sentimental Journey through France and Italy_ is now available from Stony Run, Richard Bear's home page. It makes heavy use of entities, nested font commands, and , so it may not look quite right, especially in the French passages, if you are not using Netscape.

Stony Run: Sterne:

Richard Bear

  1. OBI, the Online Book Initiative. ftp to, directory: obi Also via gopher to, choose the OBI from menus. A nice point about this, apart from having lots of stuff, is that they call their directories by the name of the author (eg Charles.Dickens) rather than the year the book happened to be put on the net as the GP does (etext/93, etext/94 etc. -- *terribly* helpful... :-) )

  2. OTA, the Oxford Text Archive. C'mon, Humanists, we all forgot it exists? They used to charge for service/postage in the days when they shipped you magnetic tapes, but since they made their stuff available for ftp they charge only for things that are still in copyright (which I assume does not include our friend Dickens). Last time I looked up the statistics they had 1,300 titles in 28 languages; I'm sure it's more now. Ftp to, directory: ota; there's a list of what's on it in the file textarchive.list. The files themselves are in subdirs. of ota; there's also an "info" file in the ota directory which presumably tells you how to tame this beastie.

  3. CETH, Center for Electronic Texts in the Humanities: joint project of Rutgers and Princeton. When I heard about it, it had just a small collection and its aim was both to provide etexts and to provide software for text analysis. Sorry, don't have an address but I'm sure Elaine can help me out here, 'cos one of the things it did have was the texts of the Brown Univ. Women Writers' Project. I doubt it's got Dickens but it's getting to be late at night and my editorial judgement is working slower than my fingers; I figured Humanists in general might like to know about it.

Incidentally, while we're on the subject, there's a project to catalog e-text projects: CPET, Catalog of Projects in Electronic Text, run by Georgetown Univ.'s Center for Text and Technology (CTT -- I just discovered that PCMCIA "really" stands for People Can't reMember Computer Industry Acronyms, and I'm starting to get their point.) Includes info. on several hundred projects, arranged by subject, all in the humanities. Ftp to, directory: cpet_projects_in_electronic_text Yes, this is a vax and those are underline characters. You can also try gophering to Georgetown U.'s gopher, it's supposed to be available from there too.


On behalf of the BNC Consortium, OUCS is very happy to announce that we expect to start distributing copies of the long-awaited British National Corpus to licence holders during the week beginning 22 May.

This corpus contains 100 million words, from over 4000 different texts carefully selected to give maximal coverage of the varieties of modern British English, both spoken and written. The corpus is automatically tagged for part of speech, using the CLAWS stochastic parser developed at UCREL, and marked up in SGML, following the TEI Guidelines for corpus encoding.

The corpus is currently available under academic licence within the European Union only. The first release, comprising three CDs and a detailed technical manual, currently costs under 200 pounds. For full details, including ordering and licensing information, please see our web pages at or write to the address below.

INSTITUUT VOOR NEDERLANDSE LEXICOLOGIE
On-line access to the 27 Million Words Dutch Newspaper Corpus for non-commercial purposes.

The Institute for Dutch Lexicology INL offers you the possibility to consult a text corpus of over 27 million words of Dutch newspaper text over the international computer network. In 1994, a 5 Million Words Corpus with diversified composition was made accessible in a similar way.

The retrieval system is essentially the same as that for the 5 Million Words Corpus 1994. It allows you to search for single words or for word patterns, including some predefined syntactic patterns that can be changed by the user. Searches can address word form, part of speech (POS), and head word, separately or in combination, using Boolean operators and proximity searches. During the search, data on frequency and distribution over the texts are provided at several levels. The output is most often a list of items, or a series of concordances (words in context) with a variable, user-defined textual context. Sorting facilities may support your analysis of the output data. With some limitations due to copyright, the output of your searches can be transferred to your own computer by e-mail. Transferring complete texts or substantial parts of texts is not permitted.

Most of the data has not been corrected, either at the level of the text or at the level of POS and headword. POS and headword have been assigned automatically to the word forms in the electronic text by lingware developed at the INL.

The provider of the texts has given permission for use of the materials for non-commercial, research purposes only.

Please note that for optimal use of the retrieval system, a VT 220 (or higher) terminal, or an appropriate terminal emulator (e.g. Kermit), is recommended. To get access to this corpus, you must sign an individual user agreement. An electronic user agreement form can be obtained from our mailserver Mailserv@Rulxho.Leidenuniv.NL. Type in the body of your e-mail message: SEND [27MLN95]AGREEMNT.USE. For access to the 5 Million Words Corpus 1994, a separate user agreement is required, which can be obtained from the same mailserver with the message SEND [5MLN94]AGREEMNT.USE .

Please make a hard copy of the agreement form, sign it, keep a copy yourself, and return a signed copy to: Institute for Dutch Lexicology INL, P.O. Box 9515, 2300 RA Leiden. Fax: 31 71 27 2115.

After receipt of the signed user agreement, you will be informed about your username and password.

If you need additional information, please send an e-mail message to Helpdesk@Rulxho.Leidenuniv.NL, or send a fax to Mrs. dr. J.G. Kruyt.

We have the following CD-ROMs which may be of interest to the list :

TITLE                           PRICE (UK sterling)

20,000 leagues under the sea     24
American poetry                  31
Bookshelf (dict.,thes.,quotes)   27
Bronte Sisters                   19
Christmas carol                  19
Classic Library                  19
Collins electronic dictionary    59
Complete bookshop                19
Concise Oxford dictionary        43
Crucible                         99
Dickens                          25
Don Quixote                      19
Electronic Home Library          19
Fall of the house of Usher       19
Famous novels                    39

Ken Gourlay
3 Hayfield
Edinburgh EH12 8UJ

Tel & fax +44 (0)131 339 5374 (24 hours)
Internet Worldwide Web :

Home page

There are two new texts in the Edmund Spenser Home Page:
Edmund Spenser's doleful dirge Daphnaida [1591,1596] is now available on the Edmund Spenser Home Page.

URL of home page:
URL of Daphnaida:

Prothalamion Colin Clout comes home againe


Richard Bear

New publication at the CETEDOC: the Thesaurus Pseudo-Dionysii Areopagitae, versiones latinae cum textu graeco

See: Thanks in advance.

Jean Schumacher

The Victorian Women Writers Project is an electronic collection of texts by British women writers of the late Victorian period. Currently, the collection includes works by Louisa Bevington, Amy Levy, Eliza Keary, Maud Keary and Dollie Radford, with works by Mathilde Blind, Dinah Maria Mulock Craik and Louise Guiney in preparation. So far it consists of volumes of poetry and verse drama, with plans to include other literary and critical texts in the future. Considerable attention will be given to the accuracy and completeness of the texts, and to accurate bibliographical descriptions of them. The Victorian Women Writers Project is supported by Indiana University's Library Electronic Text Resource Service (LETRS) and is available for use through the World Wide Web at .

Perry Willett
General Editor, Victorian Women Writers Project
Main Library
Indiana University

Coleridge and Wordsworth's landmark 1798 Lyrical Ballads has been updated from ASCII to html and is now accessible from the URL:

Richard Bear

Leibniz in WWW critical edition

First-ever critical edition made expressly for the Net. It's Leibniz. URL:

The etext of Gay's Beggar's Opera has been updated to html with linked notes.

The TLG's new web page can be accessed at We invite suggestions for further information which we might provide in order to assist TLG users.
Theodore F. Brunner, Director
Thesaurus Linguae Graecae

The Consortium for Latin Lexicography would like to announce the Home Page for the Electronic Thesaurus Linguae Latinae, located at:

These web pages describe the planned development of a TLL in electronic form. We hope to continue to publish progress reports on the Electronic TLL at this site as work proceeds.

For more information on these web pages, the Electronic TLL project, or the Consortium for Latin Lexicography, please contact CLL Director Patrick Sinclair at or CLL Systems Analyst Ann DeVito at

The Consortium for Latin Lexicography would like to announce that the Home Page for the Electronic Thesaurus Linguae Latinae has moved. The new URL is:

The Duke Papyrus Archive on the World Wide Web at has now virtually completed the task, which began in September of 1992, of making the Duke papyri more accessible. Available are records and images of all 1373 inventory numbers of papyri in the Duke University Collection. (About 200 images remain to be added.) The approximately 2000 images of these texts are presented in three ways: a "thumbnail," a 72 dpi image and a 150 dpi version. All images are linked to catalogue records.

On-line hypertext edition of the Diderot-d'Alembert _Encyclopedie_ in the ARTFL database at the Univ. of Chicago.

I have mounted a small sample of the Encyclopedie with a couple of experimental images. Additional images for the Encyclopedie can be found at:
We have begun full data entry and hope to have the first volumes ready sometime in the summer.

While I'm at it, let me plug our exhibition of Renaissance Dante in Print, 1472-1629 which contains some 450 images of every Italian edition of Dante printed during the Renaissance:

In its current form, the site contains an English-Latin HTML edition of Descartes' "Meditations on First Philosophy".

The URL is:

The texts contain only navigational links. We encourage anyone interested to download these texts and to create annotated editions and then to share their new editions by loading them on their Web servers. To facilitate this collaboration we have added a page "DesCartes' Myriogon" where we will provide links to editions based upon our primary sources.

On-line Sample Database of the Dictionnaire de l'Academie francaise.

A component of the Dictionnaire de l'Academie francaise Database Project (computerization of the eight complete editions, 1694-1935), the Sample Academie Database has been put on the internet at the following address:

The Sample Database includes a selection of articles, the same for each edition, an index of metalinguistic keywords, an index of hidden occurrences, GIF images of the title pages and explanatory notes. It is conceived both as a model for criticism and as a didactic, linguistic and metalexicographical resource.

The undersigned invites comments and corrections.

Russon Wooldridge
Department of French, Trinity College
University of Toronto, Toronto M5S 1H8, Canada
Tel: 1-416-978-2885 -- Fax: 1-416-978-4949

In late 1995, the UM Humanities Text Initiative mounted the most recent and now complete version of the Patrologia Latina Database. While the PLD is restricted to access by UM faculty, staff, and students, the web-based support resources such as the list of authors by volume and (for other implementors) the editorial policy, are unrestricted. These resources and search screens can be found at
A sample GIF of a result screen is at:

For more information on the HTI and access to publicly available collections, please use

John Price-Wilkin

UM HTI American Verse:

The University of Michigan Humanities Text Initiative, along with the University of Michigan Press, is proud to announce the release of a new textual resource, the American Verse Project. American Verse is a growing collection of texts encoded in SGML using the TEI Guidelines. The collection is made accessible in SGML, dynamically rendered HTML, and as a searchable database. As with all of the other Humanities Text Initiative resources, simple word and phrase searches are supported, as well as proximity searches, and searches for verses or paragraphs containing two or three words/phrases. The project uses an unusual model for rights for a project involving a university press: no restrictions or costs are placed on individual and research use of the materials, while the texts are available for sale to other publishers and agencies who wish to provide access to the texts from their own systems. We will continue to expand the collection as time and resources allow and hope to add ten more volumes in the next month.

The following (10) texts have been added to the American Verse Project collection, bringing the total collection to 35. As before, all are part of a searchable collection; also, each can be browsed in HTML or can be retrieved in its entirety in SGML (TEI encoding). In this release we include two more works by African-American women, and will soon release three more works (noted at the end of this announcement).

We are also, with this release, including a list of nearly 400 American poets who have published material before 1920. We will continue to add names to the list and hope to gradually expand the list to include bibliographies for the poets and to link to other materials on the 'net which are not a part of the American Verse Project.

The trilingual HTML edition of Rene Descartes' "Meditations on First Philosophy" is now available at:

The texts are:

1) The 1641 Latin

2) The 1647 Duc de Luynes French Translation [corrected by Descartes]

3) The 1901 John Veitch English Translation

Paragraph by paragraph cross-navigational links are provided. The paragraphs have also been numbered -- using the Latin edition for "paragraph authority" -- to facilitate references to these texts.

Further information about the Linguistic Data Consortium and its available corpora can be accessed on the Linguistic Data Consortium WWW Home Page at Information is also available via ftp at under pub/ldc; for ftp access, please use "anonymous" as your login name, and give your email address when asked for password.

Edmund Spenser's 1591 Complaints has been completed in html and is now accessible from the URL:

The Center for Electronic Texts in the Humanities (CETH) is pleased to announce the availability on the World-Wide Web of three pilot projects in SGML markup according to the guidelines of the TEI (Text Encoding Initiative). These are the first in what we hope will be a continuing series of projects to demonstrate various ways of using TEI encoding to create Humanities resources.

The projects' front page is at URL:

A new WWW edition of Sir Philip Sidney's pageant The Lady of May is now available at the URL: It includes an introduction, which can be skipped over with a click :-), and clickable notes.

Date: Sun, 26 May 1996
From: Humanist
To: Humanist Discussion Group
Subject: 10.0056 ARTEM: new project in e-text
X-To: Humanist

Humanist Discussion Group, Vol. 10, No. 56.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at

  [1]   From:
        Subject: new project

This is to announce a new project launched by the "Centro Linceo Interdisciplinare" of the "Accademia Nazionale dei Lincei" [via della Lungara, 10 - 00165 Roma]

The project, named "Archivio Testuale Multimediale" (ARTEM), will pursue three main goals:

1) To build a repository of electronic texts in the Italian language, selected for their editorial reliability and fully encoded according to the best standards available. The repository will be freely accessible on the World Wide Web.

2) To link the repository to other similar ones, offering the same scientific reliability.

3) To build a catalogue of existing electronic texts in the Italian language, providing a statement of their editorial reliability and encoding methodology, and stating whether and how they are available.

Special attention is devoted to the problems of encoding, following SGML procedures and the standards proposed by the TEI. The preliminary analysis of textual features, carried out to obtain the full list of elements to encode, will be stated explicitly and discussed.

Collaboration is envisaged with the Oxford Text Archive, Princeton's CETH, the Tresor de la Langue Francaise, the Institut fur deutsche Sprache of Mannheim, and all academic institutions dealing with electronic texts and interested in this project.

All those interested in the project, and especially those who can provide information on existing e-texts in Italian, may contact the following e-address:

Tito Orlandi,
Accademia dei Lincei,
and Universita di Roma La Sapienza
