Text Analysis Tools and Resources



Scholar:

AN ONLINE LISTSERV FOR TEXT ANALYSIS
AND NATURAL LANGUAGE APPLICATIONS

Sponsored by Queens College and the City University of New York

Funded by the Andrew W. Mellon Foundation

Director: Joseph Raben jqrqc@cunyvm.cuny.edu

Associate Directors:
Clement Dunbar cadlc@cunyvm.cuny.edu
Peter Batke l64a3779@jhuvm.bitnet

Technical Advisor:
Lusi N. Altman lnaqc@cunyvm.cuny.edu


TEXT Technology
The Journal of Computer Text Processing

With volume 4 (1994), TEXT Technology has continued its unexpected success in a new format, with perfect binding and 7-by-9-inch pages. Now published four times a year, the journal includes articles and reviews on all facets of using computers to create, process, communicate, and analyze texts. It is written for academic and corporate researchers, writers, editors, and teachers. Each issue contains timely reviews of books and software, discussions of applications for analyzing literary works and other texts, bibliographic citations, and more.

Recent issues of TEXT Technology have contained articles about analyzing the novels of Jane Austen, collating variant texts, programming in Icon and SPITBOL-386, as well as a directory of electronic text centers, and reviews of FrameMaker and of five new word processors including WordPerfect for DOS, for Windows, and for the Macintosh.

Submissions of articles are welcome. They should be sent to the Editor as ASCII files by email to JohnsonE@columbia.dsu.edu. Writers of book or software reviews are encouraged to contact the Editor before submitting. Authors will normally receive notices of acceptance and referees' comments promptly by email. Yearly subscription rates remain unchanged: in the U.S., $45.00 for individuals and $72.00 for institutions. Canadian orders add $7.00; orders from all other countries add $15.00 (all prices in U.S. funds).

To subscribe using a MasterCard or Visa credit card, send name and address, card number and expiration date via email to

LangnerS@columbia.dsu.edu.

To subscribe by regular mail, send credit card information, check, or institutional purchase order to TEXT Technology

               114 Beadle Hall
               Dakota State University
               Madison, SD 57042-1799 USA.

The Center for Electronic Texts in the Humanities invites you to visit the CETH WWW server at
http://www.ceth.rutgers.edu

for information about Electronic Texts in the Humanities, Cataloging and Documenting Electronic Texts, The Rutgers Inventory of Machine-Readable Texts in the Humanities, SGML and the Text Encoding Initiative, The CETH Summer Seminar, ETEXTCTR Archives, Bibliography for Humanities Computing and more.

The CETH newsletter is also available on our new WWW server at http://cethmac.princeton.edu.
CETH also moderates etextctr@lists.princeton.edu, a general discussion list about electronic texts and electronic text centers. Earlier this year, ETEXTCTR began posting brief reviews of recent articles in library and humanities computing publications. The ETEXTCTR archives and the directory of electronic text centers are also on our WWW server.

TACT 2.1 gamma may be obtained by Gopher from gopher.epas.utoronto.ca in the Centre for Computing in the Humanities subdirectory. You may have to dig around a little (our gopher is still under construction and things get moved around from time to time), but it's in there, either under "Humanities research material" or under "software".

You may also use anonymous ftp to epas.utoronto.ca (log in as "anonymous" and use your Internet or Bitnet address as the password). Move into the /pub/cch/tact/tact2.1gamma directory with the "cd" command. Make sure the file type is set to "binary", then "get" each of the compressed TACT files. When you download these files to your microcomputer, also be sure to set the file type to "binary".
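The anonymous-FTP steps above (log in, "cd", binary mode, "get") can also be scripted. Below is a minimal sketch in Python using the standard ftplib module. It assumes the server still answers at that address, which may no longer be true; the function name fetch_binary_files and the example e-mail address are illustrative only and are not part of the TACT distribution. If no file names are passed, it fetches every name the server lists in the directory, which assumes that directory holds only the compressed TACT files.

```python
from pathlib import Path


def fetch_binary_files(ftp, email, remote_dir, filenames=None, dest_dir="."):
    """Anonymous login, "cd", then a binary-mode "get" of each file.

    `ftp` is an ftplib.FTP connection that has not logged in yet.
    Returns the list of local paths written.
    """
    # Log in as "anonymous" with your e-mail address as the password.
    ftp.login(user="anonymous", passwd=email)
    # Equivalent of: cd /pub/cch/tact/tact2.1gamma
    ftp.cwd(remote_dir)
    if filenames is None:
        # Assumption: every name in the directory is a file we want.
        filenames = ftp.nlst()
    written = []
    for name in filenames:
        local_path = Path(dest_dir) / name
        # retrbinary sends TYPE I (binary mode) before RETR, so no line-ending
        # translation happens; "wb" keeps the bytes unchanged locally as well.
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        written.append(local_path)
    return written


# Example use (needs network access and a server that still answers):
#   from ftplib import FTP
#   with FTP("epas.utoronto.ca") as ftp:
#       fetch_binary_files(ftp, "you@your.university.edu",
#                          "/pub/cch/tact/tact2.1gamma")
```

Passing in the connection, rather than opening it inside the function, keeps the login and transfer steps separate from the choice of host and lets the function be tried without a network.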

You can also order TACT (software and manual) by post from

        Centre for Computing in the Humanities
        130 St. George Street, 14th Floor
        Toronto, Ontario  M5S 1A5  Canada

Note: prepayment of $30 Canadian or $25 US is required. Cheques are acceptable if drawn on a US or Canadian account; otherwise, please remit payment by international money order.

Please let me know if you have further questions.

In the meantime, if you'd like to try out TACTweb to get a sense of what it is like, you can connect Netscape, Lynx or Mosaic to the TACTweb demonstration site. The URL is:

http://tactweb.humanities.mcmaster.ca/

TACTweb is experimental software that allows an individual to publish TACT textual databases on the Web. It requires an IBM PC on the Internet running Windows 3.1 (we haven't tried TACTweb with Windows 95 yet), some public-domain software, and the skills of a single moderately advanced Windows user. Any individual with this equipment can make their own TACT databases (TDBs) available to any other Web user. Through WWW forms, these users get access to many of the interactive services that TACT provides, without having to use TACT itself or keep a copy of the published TACT database on their own machine. They can formulate queries against a database using the same query language used in TACT/UseBase, and receive results similar to those UseBase produces. Because WWW forms act as the interface, TACTweb users need to learn nothing about the program beyond how to formulate queries. Because they reach the database through a WWW browser such as Netscape, Mosaic or Lynx, they can use it whether they work on a PC, a Macintosh or a Unix machine.

Tom Horton asks about text analysis software. Those interested might want to look at TACTweb, a version of TACT implemented as a CGI program that runs on a PC-based Web server. To try it or to learn how you can set up your own copy, check out the following URL:

http://tactweb.mcmaster.ca/

All the best,

Geoffrey Rockwell

NEW RELEASE from the

                     LINGUISTIC DATA CONSORTIUM
                             and the
                   CENTRE FOR LEXICAL INFORMATION

This message announces the Second Release of the CELEX CD-ROM with lexical data from the Dutch Centre for Lexical Information and the Linguistic Data Consortium.

This CD-ROM contains an enhanced, expanded version of the German lexical database (2.5), featuring approximately 1000 new lemma entries, revised morphological parses, verb argument structures, inflectional paradigm codes, and a corpus type lexicon. A complete PostScript version of the German Linguistic Guide is also included, in both European A4-format and American Letter format. For German, the total number of lemmas included is now 51,728, while all their inflected forms number 365,530.

Institutions that hold LDC membership for the 1995 or 1996 Membership Years can receive CELEX, for research purposes only, at no additional charge, in the same manner as all other text and speech corpora published by the LDC.

Non-members can receive a copy of CELEX, for research purposes only, for a fee of $150. To order a copy of this corpus, please email your request to ldc@unagi.cis.upenn.edu or fax it to (215) 573-2175. If you need additional information before placing your order, or would like to inquire about LDC membership, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be accessed on the Linguistic Data Consortium WWW Home Page at URL http://www.cis.upenn.edu/~ldc. More information specific to CELEX can be reached via hyperlinks from this Home Page. Information is also available via ftp at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use "anonymous" as your login name and give your email address as your password.

ETEXTCTR Review provides abstracts of current articles from journals of interest to those working with electronic texts in a research setting. If you would like to contribute to ETEXTCTR Review or recommend an article for review, write to Mary Mallery, Moderator of ETEXTCTR, at e-mail: <mallery@gandalf.rutgers.edu>.

The organizers of the 5th Biennial Conference of the International Society for the Empirical Approach to Literature are accepting abstracts of papers for presentation at the conference, to be held August 21-25, 1996, in Banff, Alberta. Please see the conference Call for Papers at http://www.ualberta.ca/ARTS/ricl.html

With my best regards,

Steven TOTOSY de ZEPETNEK Ph.D.
Adjunct Professor of Comparative Literature
Department of Modern Languages and Comparative Studies
University of Alberta, Edmonton, Alta., T6G 2E6
Ph.: 403-492-4776; Fax: 403-492-5662; Home Ph.: 403-438-6486
E-mail: stotosy@gpu.srv.ualberta.ca

This is to announce

          Who's Who in the Metamorphoses of Ovid:
            The Analytical Onomasticon Project

at the URL http://www.epas.utoronto.ca/~mccarty/wlm/Onomasticon/, which describes research in progress toward a comprehensive reference work, in print and electronic form, to persons and places in Ovid's Metamorphoses. (Those who know Ovid will understand how problematic the idea of a "person" is in his poem, and so, I hope, will be particularly interested to find out how such persons can be computerized.)

The Web document includes an overview of the Project, brief discussion of its literary critical basis, explanation of the approach it takes, summary of the tagging scheme for the electronic text, sample indexes, and a simulation of an electronic-book version of the Onomasticon using Netscape "frames".

> I am compiling an annotated list of concordance packages and would
> appreciate assistance in identifying worthy items and their notable
> attributes.

For Mac software, it seems to me that Etienne Brunet's Hyperbase should be mentioned. See:

http://lolita.unice.fr/~brunet/hyperbase.html

Best regards,

  Remi JOLIVET                  Tel. +41 21 692 30 07
  Faculte des Lettres           Fax. +41 21 692 30 45
  Linguistique              E-mail:
  BFSH2/4087                rjolivet@ulys.unil.ch
  CH-1015 Lausanne
Remi.Jolivet@ling.unil.ch

I have been developing a project for quite some time on visualizing text in three-dimensional space. It has some of the properties of a concordance, and you might find it helpful. The web site for the project is:
http://www.tc.cornell.edu/~tonyg/Language3D/Language.Viz1.html

The site looks and works best on a workstation of some kind, but it should also work well on a good PC. I would appreciate your perspective on the project, and that of others on the Humanist list. Could a notice about the Web site be posted to the list? I hope you find the work of some value and interest.

Respectfully,

Antonio Gonzalez-Walker
Cornell University


I am working on a Windows concordance program, which I hope to publish in a couple of weeks' time. I would like to have the program tested on a variety of systems, so I am making a beta version available. Probably the best way to get hold of the program is to start from the software section of a Corpus Linguistics page: http://www.ruf.rice.edu/~barlow/corpus.html#Software.

I know about concordance use in linguistics and in language teaching, but I am ignorant about its use in the humanities in general. Are concordance programs being used in literature studies? In what way? Any pointers to articles, etc. would be appreciated.

Michael Barlow
Rice University/Athelstan


WordCruncher for Windows is shipping. For more information, please contact Johnston & Company. Please note that we have moved our offices to:

        Johnston & Company
        Electronic Publishers & Consultants
        P.O. Box 6627
        Bloomington, IN 47407
        (812) 339-9996
        (812) 339-9997 (fax)
        e-mail: johnston@ansel.intersource.com

WordNet is an online lexical reference system. Word forms in WordNet are represented in their familiar orthography; word meanings are represented by synonym sets (synsets), that is, lists of synonymous word forms that are interchangeable in some context. Two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms; semantic relations hold between word meanings.
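To make the distinction between word forms, synsets, and the two kinds of relations concrete, here is a minimal sketch of that data model in Python. The entries, identifiers such as "car.n.1", and the helper names are hypothetical toy data; this is not WordNet's file format or the interface of any WordNet package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A word meaning, represented by the word forms that can express it."""
    id: str
    words: frozenset


# Toy entries (hypothetical data, not copied from WordNet's files).
CAR = Synset("car.n.1", frozenset({"car", "auto", "automobile"}))
MOTOR_VEHICLE = Synset("motor_vehicle.n.1", frozenset({"motor vehicle"}))
HEAVY = Synset("heavy.a.1", frozenset({"heavy", "weighty"}))
LIGHT = Synset("light.a.1", frozenset({"light"}))
ALL_SYNSETS = [CAR, MOTOR_VEHICLE, HEAVY, LIGHT]

# Semantic relations hold between meanings: synset -> set of synsets.
HYPERNYMS = {CAR: {MOTOR_VEHICLE}}

# Lexical relations hold between word forms: word -> set of words.
# "heavy" and "light" are antonyms; "weighty", though it shares a synset
# with "heavy", has no antonym entry of its own.
ANTONYMS = {"heavy": {"light"}, "light": {"heavy"}}


def synsets_for(word):
    """Every meaning (synset) that a word form can express."""
    return [s for s in ALL_SYNSETS if word in s.words]


def synonyms(word):
    """Word forms that share at least one synset with `word`."""
    return {w for s in synsets_for(word) for w in s.words if w != word}


def hypernyms_of(word):
    """Broader meanings, found by following a semantic relation from each sense."""
    return [h for s in synsets_for(word) for h in HYPERNYMS.get(s, set())]
```

For example, synonyms("auto") returns the other members of its synset, {"car", "automobile"}, while hypernyms_of("automobile") follows the semantic link from that synset to MOTOR_VEHICLE. Antonymy is stored between individual words, which is why "weighty" has no antonym entry even though it shares a synset with "heavy".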

To learn more about WordNet, read "Five Papers on WordNet", available via anonymous ftp and in printed form. WordNet is available in several different packages, based on computer platform. This message contains instructions for obtaining the WordNet system and "Five Papers on WordNet".
We are also establishing a 'contrib' directory. If you have a package that you would like to have considered for addition, please send email to wordnet@princeton.edu.
WordNet is available via ftp, as described below, or you may browse and/or download WordNet using a World Wide Web browser such as Mosaic or Netscape. Our URL is: "http://www.cogsci.princeton.edu/~wn/". We will add links to user applications and papers; please send suggestions to wordnet@princeton.edu.


Announcing a NEW RELEASE from the

LINGUISTIC DATA CONSORTIUM:

THE PENN TREEBANK PROJECT

Release 2

Detailed questions about the corpus may be sent to treebank@unagi.cis.upenn.edu, while questions and requests for obtaining Treebank Release 2 should be sent to ldc@unagi.cis.upenn.edu

International Quantitative Linguistics Association (IQLA)

In recent years, Quantitative Linguistics has undergone rapid and promising development, in both theory and application, and quantitative methods are steadily gaining importance in all branches of language and text research.
IQLA, Universitaet Trier, FB II, LDV, D-54286 Trier, Germany e-mail: koehler@LDV01.Uni-Trier.de

Journal:

The official organ of the Association is:

The Journal of Quantitative Linguistics Swets & Zeitlinger, P.O. Box 825,
NL-2160 SZ Lisse, The Netherlands

Membership:

IQLA membership includes a subscription to The Journal of Quantitative Linguistics. Current IQLA annual fees are:

60 US $ (including JQL) for non-student members
20 US $ for students
200 US $ (including JQL) for institutions

There are plenty of other collation programs around, and they generally try to address this problem of dealing with many texts. (If you've only got two texts, after all, it may be less work to collate them by eye, given the effort involved in entering them into the computer accurately.) A look through the MLA Bibliography, or even just back issues of Computers and the Humanities and Literary & Linguistic Computing, will turn some up. Because it's very new, I'll mention the recent Mac version of Peter Shillingsburg's CASE program, which is more oriented towards working with prose texts than many collation programs are; it's available at

ftp://ftp.adfa.oz.au/pub/mac/MacCASE
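For readers new to computer collation, the basic step (aligning a base text with each witness and reporting where they differ) can be sketched with Python's standard difflib module. This is a minimal word-level illustration under simplifying assumptions: whitespace tokenization and no normalization of spelling, case, or punctuation. It is not how CASE or any other particular collation program works, and the sample readings are invented.

```python
import difflib


def collate(base, witness):
    """Word-level comparison of one witness against a base text.

    Returns (tag, base_reading, witness_reading) tuples for every stretch
    where the two differ; tag is difflib's 'replace', 'delete', or 'insert'.
    """
    a, b = base.split(), witness.split()
    # autojunk=False so frequent words such as "the" are not discarded
    # as junk when the texts are long.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


def collate_all(base, witnesses):
    """Collate many witnesses, each against the same base text."""
    return {name: collate(base, text) for name, text in witnesses.items()}


# Invented sample readings, for illustration only.
base = "the quick brown fox jumps over the dog"
witnesses = {
    "B": "the quick red fox jumped over the lazy dog",
    "C": "the quick brown fox jumps over the dog",
}
for siglum, variants in collate_all(base, witnesses).items():
    print(siglum, variants)
```

Run as shown, it prints three variants for witness B (brown/red, jumps/jumped, and an inserted "lazy") and an empty list for C, which matches the base exactly. Collating every witness against one base keeps the output simple; dedicated collation programs may also normalize readings and align several witnesses at once.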

Readers of this list may be interested to know that I have linked the full texts of some articles that I have published about computers, writing, and literary study to my Web home page. To read them, connect with

http://www.dsu.edu/~johnsone/eric.html

Then click on "scholarship" and select one of the highlighted titles.

http://www.en.utexas.edu/~cwrl/index.html

CWRL: an electronic journal devoted to the intersections of computers, writing, rhetoric, and literature.

CWRL is published by the Computer Writing Research Lab, a facility of the Division of Rhetoric and Composition at the University of Texas at Austin. CWRL publishes articles that address computer-aided pedagogy in the fields of rhetoric, composition, and literature, and is available solely on the World Wide Web.

As of Thursday afternoon, the UC Irvine Online Critical Theory Resource (CTR) is available on the World Wide Web. You may access it from the UCI Libraries' home page <http://www.lib.uci.edu> by following the links to UCI Sources, Special Collections, and the UCI Online Critical Theory Resource. Alternatively, you can jump directly to the Special Collections Page by pointing your web browser to <http://sun3.lib.uci.edu/~scctr/>.

On my Web site you can find a document on ways to use CONC 1.76 for the Macintosh with Chinese texts, and a link to download the option file with the settings I use for this purpose:

http://vega.unive.it/~pregadio/conc/conc.html

Those settings work with Japanese texts as well.

Fabrizio Pregadio


Many thanks to those Humanists who sent me notes about concordancers and text-analytic software. I have incorporated all the items into my page, again at the URL

http://www.cch.epas.utoronto.ca:8080/cch/1001h/06soft.html

Corrections and suggestions about the contents of this page are most welcome. Note that I have incorporated a local copy of Etienne Brunet's page on HyperBase, with his blessing. (Transmission from Nice to Toronto is very, very poor during most of my waking hours; I assume others in N. America have the same problem with trans-Atlantic communication via the Web.)

During the summer of 1996, Dakota State University will offer CHUM 650 Computing for the Humanities: a course that students can complete by receiving and sending materials via Internet. The three-semester-hour course is offered for graduate credit.

INSTRUCTOR: Eric Johnson, Ph.D.

COURSE DESCRIPTION:
A study of computer applications in the humanities such as analysis of texts, arranging data from research, and formatting for printing and desktop publishing.

The focus of the course in 1996 will be on analysis of texts using computer programs created by Prof. Johnson. The programs and instructions for their use will be provided to all enrolled students.
Students should register for the course prior to May 15. They may register by completing a form on the Web at:

http://www.dsu.edu/distance-ed/interapp.html

or by sending email to

dsuinfo@columbia.dsu.edu

ADDITIONAL INFORMATION:
Answers to frequently asked questions about CHUM 650 can be found on the Web at

http://www.dsu.edu/~johnsone/chumfaq.html

Information can be requested from the Admissions Office and Registrar by sending email to

dsuinfo@columbia.dsu.edu

A Web page similar to the description you are reading can be found at

http://www.dsu.edu/~johnsone/chum.html

Anyone seeking information about the CHUM 650 may, of course, send email to the instructor, Eric Johnson, at

johnsone@jupiter.dsu.edu

LETRS (The Library Electronic Text Resource Center) at Indiana University has a collection of guides to electronic texts and tools (including one for TACT) available on its web pages at:

<URL http://www.indiana.edu/~letrs/text-tools/softwareoverview.html>

EPITEST is experimental software for the structural analysis of action structures in literary texts. It is based on concepts central to formalist and structuralist narratology, including the models developed by Vladimir Propp, Claude Bremond, Tzvetan Todorov, Algirdas Greimas and Thomas Pavel. The aim of EPITEST is to facilitate the detailed differential analysis of action structures in larger samples of pre-encoded literary texts. A detailed, updated description of the EPITEST project, including graphs, is now available at the following address:

URL: http://ourworld.compuserve.com/homepages/Jan_C_Meister/homepage.htm

Any comments, criticism, hints and questions will be greatly appreciated!

Thanks,

Prof.Dr. Jan Christoph Meister
Arbeitsstelle zur Sozialgeschichte der Literatur Literaturwissenschaftliches Seminar
Universitaet Hamburg
Von Melle Park 6
20 146 Hamburg
Tel / Fax: 04532-7166
E-Mail: 100306.3565@compuserve.com or fs6a029@server2.rrz.uni-hamburg.de

Enjoy Nathan S. Kline, M.D., "Factifuging", at the URL http://www.princeton.edu/~mccarty/misc/factifuging.html
