The Stoa Consortium » Blog Archive » BICS Supplement 122: THE DIGITAL CLASSICIST 2013

From the Stoa Consortium blog:

We are very pleased to announce the publication of the latest Digital Classicist volume, The Digital Classicist 2013, published by the Institute of Classical Studies, London as part of their BICS series.

This edited volume collects together peer-reviewed papers that initially emanated from presentations at Digital Classicist seminars and conference panels.

For full details see the publisher’s site and the promotional flyer.

Please ask your library to order a copy.


I was one of the peer reviewers for this volume, so I am very happy to see it being published.

Unicode Code Converter

http://rishida.net/tools/conversion/

You can paste in text in Unicode, the tool will convert to several different codes that you might need for development.

This is the above text in decimal code points:
89 111 117 32 99 97 110 32 112 97 115 116 101 32 105 110 32 116 101 120 116 32 105 110 32 85 110 105 99 111 100 101 44 32 116 104 101 32 116 111 111 108 32 119 105 108 108 32 99 111 110 118 101 114 116 32 116 111 32 115 101 118 101 114 97 108 32 100 105 102 102 101 114 101 110 116 32 99 111 100 101 115 32 116 104 97 116 32 121 111 117 32 109 105 103 104 116 32 110 101 101 100 32 102 111 114 32 100 101 118 101 108 111 112 109 101 110 116 46
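If you only need this one conversion and would rather not paste text into a web page, the same list can be produced with a few lines of Python. This is a minimal sketch and not the code behind the online tool; the function names are my own.

```python
def to_decimal_code_points(text: str) -> str:
    """Return the decimal Unicode code points of text, separated by spaces."""
    return " ".join(str(ord(ch)) for ch in text)


def from_decimal_code_points(codes: str) -> str:
    """Rebuild the text from a space-separated list of decimal code points."""
    return "".join(chr(int(code)) for code in codes.split())


# The first three words of the sentence converted above.
print(to_decimal_code_points("You can paste"))
# prints: 89 111 117 32 99 97 110 32 112 97 115 116 101
```

Running `from_decimal_code_points` on the full list above should give back the original sentence.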

The tool is especially useful when you receive text in Unicode and want to know exactly how it is encoded. If you are dealing with polytonic Greek, it helps you find accented letters that can be encoded in two different Unicode blocks: Greek and Coptic versus Greek Extended. For instance:

μ U+03BC: GREEK SMALL LETTER MU (Greek and Coptic)
α U+03B1: GREEK SMALL LETTER ALPHA (Greek and Coptic)
λ U+03BB: GREEK SMALL LETTER LAMDA (Greek and Coptic)
α U+03B1: GREEK SMALL LETTER ALPHA (Greek and Coptic)
κ U+03BA: GREEK SMALL LETTER KAPPA (Greek and Coptic)
ί U+03AF: GREEK SMALL LETTER IOTA WITH TONOS (Greek and Coptic)
ε U+03B5: GREEK SMALL LETTER EPSILON (Greek and Coptic)
ς U+03C2: GREEK SMALL LETTER FINAL SIGMA (Greek and Coptic)

versus

μ U+03BC: GREEK SMALL LETTER MU (Greek and Coptic)
U+0020: SPACE (Basic Latin)
α U+03B1: GREEK SMALL LETTER ALPHA (Greek and Coptic)
U+0020: SPACE (Basic Latin)
λ U+03BB: GREEK SMALL LETTER LAMDA (Greek and Coptic)
U+0020: SPACE (Basic Latin)
α U+03B1: GREEK SMALL LETTER ALPHA (Greek and Coptic)
U+0020: SPACE (Basic Latin)
κ U+03BA: GREEK SMALL LETTER KAPPA (Greek and Coptic)
U+0020: SPACE (Basic Latin)
ί U+1F77: GREEK SMALL LETTER IOTA WITH OXIA (Greek Extended)
U+0020: SPACE (Basic Latin)
ε U+03B5: GREEK SMALL LETTER EPSILON (Greek and Coptic)
U+0020: SPACE (Basic Latin)
ς U+03C2: GREEK SMALL LETTER FINAL SIGMA (Greek and Coptic)

The difference lies in the accented iota, which has two encodings in Unicode: GREEK SMALL LETTER IOTA WITH TONOS (which is what most operating systems produce from the keyboard when Greek is selected) and GREEK SMALL LETTER IOTA WITH OXIA, which is found in some electronic texts on the Web. Most fonts display both with the same glyph, so confusion is the order of the day for those who are not aware of the problem. You can read more in Nick Nicholas’s pages about Unicode. Limiting yourself to the fonts Nick suggests is rather impractical, and in practice the WITH TONOS characters are the ones generally in use.
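If you want to detect or unify the two encodings in your own texts, Unicode normalization can help: the WITH OXIA characters of the Greek Extended block are canonically equivalent to the WITH TONOS characters, and the NFC normalization form maps the former onto the latter. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names are my own, and it assumes that you actually want the WITH TONOS forms.

```python
import unicodedata

IOTA_TONOS = "\u03af"  # GREEK SMALL LETTER IOTA WITH TONOS (Greek and Coptic)
IOTA_OXIA = "\u1f77"   # GREEK SMALL LETTER IOTA WITH OXIA (Greek Extended)


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def unify_accents(text: str) -> str:
    """Apply NFC, which replaces the WITH OXIA characters by WITH TONOS ones."""
    return unicodedata.normalize("NFC", text)


def find_replaced(text: str) -> list[tuple[int, str]]:
    """List position and description of every character that NFC would change."""
    return [
        (index, describe(ch))
        for index, ch in enumerate(text)
        if unicodedata.normalize("NFC", ch) != ch
    ]


word = "\u03ba\u03b1" + IOTA_OXIA  # "και" with the accent encoded WITH OXIA
print(find_replaced(word))
print(unify_accents(word) == "\u03ba\u03b1" + IOTA_TONOS)
```

Two strings that look identical on screen but differ in this way will not be found by a plain string search, so it is usually worth normalizing both the corpus and the query the same way.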

One Era’s Nonsense, Another’s Norm: Diachronic study of Greek and the Computer

This paper sets out to explore how and why digital editions of texts or text-versions could facilitate a truly diachronic study of the Greek language. It points out shortcomings of existing digital infrastructure and argues in favour of a general shift of focus towards linguistic analysis of transmitted texts with the help of electronic corpora that primarily model medieval manuscripts rather than modern editions.

Published in: Digital Research in the Study of Classical Antiquity, edited by Gabriel Bodard (King’s College London, UK) and Simon Mahony (University College London, UK), Ashgate 2010, ISBN 978-0-7546-7773-4 £ 55.00

For full details and the publisher’s blurb, see:
http://www.ashgate.com/isbn/9780754677734

You can download the PDF-file here (with thanks to Ashgate for allowing self-archiving of my contribution).

 

Photo by Nemo (Pixabay)

Four useful online tools/resources for Translation into Modern Greek

1

Monolingual Greek dictionary (Use to explore the exact meaning of words, find examples of use and some information on register): Λεξικό της κοινής νεοελληνικής (Ινστιτούτο Νεοελληνικών Σπουδών – Ίδρυμα Μανόλη Τριανταφυλλίδη)
Πύλη για την ελληνική γλώσσα: http://www.greek-language.gr/ (Follow the link: Νέα Ελληνική > Εργαλεία > Ηλεκτρονικά λεξικά on the front page) or bookmark the URL
http://www.greek-language.gr/greekLang/modern_greek/tools/lexica/index.html

Tips:
Use the buttons on the top left to toggle between different headword views.
Tick the checkbox “Αναζήτηση και στο σώμα των λημμάτων” to search within the body of each lemma.
Use the “Σύνθετη αναζήτηση” (link to the left of the main search box) for more sophisticated, combined queries (Τύπος λήμματος = part of speech, κλιτικό παράδειγμα = declension paradigm, παράδειγμα χρήσης = example of use, etc.)

2

Lexiscope: Compound online tool (Hyphenator, Speller, Lemmatizer, Morphological Lexicon/Parser and Thesaurus)
http://www.neurolingo.gr/online_tools/lexiscope.htm
You can perform a limited number of searches as an unregistered user.

3

Online English-Greek/Greek-English dictionary
Offered by the online portal in.gr: http://www.in.gr (Search button at the bottom left of the page)
Tip:
To access the lexicon easily, bookmark the following URL in your web browser: http://www.in.gr/dictionary/lookup.asp?Word= This will take you directly to the interface of the lexicon (ignore the message and type your query in the search box).
The search-box accepts both Greek and English headwords and displays results accordingly.

4

IATE Multilingual term base:
http://iate.europa.eu/iatediff/SearchByQueryLoad.do?method=load

Multilingual database of technical terms; technical vocabulary from different domains

Wordle – Beautiful Word Clouds

Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends.
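The principle behind such a cloud is a simple word-frequency count. As a rough illustration only (this is not Wordle's own code, and the function name is my own), the counting step could be sketched like this:

```python
import re
from collections import Counter


def word_frequencies(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent words of text with their counts."""
    # \w+ also matches Greek letters, so this works for Greek texts as well.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top)


print(word_frequencies("the cat and the hat and the bat", top=2))
# prints: [('the', 3), ('and', 2)]
```

A real word cloud would normally also drop very common function words before counting; that step is left out here.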


Blogged with the Flock Browser

Tools for converting Beta code to Unicode

Beta Code description:

http://www.tlg.uci.edu/BetaCode.html

 


 

Online tools:

1. Sean Redmond’s Greek Font to Unicode converter: http://www.jiffycomp.com/smr/unicode/

CGI-based conversion tool; supports cut and paste.

2. Cental (Centre du traitement automatique du langage) Beta Code to Unicode Converter: http://130.104.253.20/beta2uni/

Lets you upload and convert whole files from the TLG CD ROM to Unicode.

3. Michael Neuhold’s greekconverter: http://members.aon.at/neuhold/antike/grkconv.html (inactive?)

Java applets and downloadable Java classes for converting between Beta Code and other encodings.

Applications, Java classes, etc.

1. Epidoc collaborative: Transcoder: http://sourceforge.net/projects/epidoc

Java-based converter for plain text files.

2. Lucius Hartmann’s BetaCodeConverter and GreekKeysConverter (Mac OS): http://www.lucius-hartmann.ch/programme/

Mac OS application; converts RTF and TXT files from and to many encodings.

3. Antioch classical languages utility by Ralph Hancock: http://www.users.dircon.co.uk/~hancock/antioch.htm

VBA-based conversion utility.

4. Burkhard Meißner’s View and Find: http://www2.hsu-hh.de/hisalt/projects/viewfind.htm

View & Find is a MS-DOS program to interact with, decode, extract, search and automatically index the beta code files on the Thesaurus Linguae Graecae E and Packard Humanities Institute #5.3 and #7 CD ROMs. (thanks to B. Meißner for the info)

5. betautf8 – a fast, flexible beta code to unicode (utf8) file converter: http://www2.hsu-hh.de/hisalt/projects/betautf8.htm (thanks to B. Meißner for the info)
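To give an idea of what these converters do, here is a deliberately small sketch of the core mapping in Python. It is not taken from any of the tools listed above: it assumes the basic Beta Code conventions (Latin letters for Greek letters, `)` `(` `/` `\` `=` `|` `+` for diacritics, `*` for capitals with their diacritics written before the letter), and it ignores digamma, the numbered sigma variants, punctuation codes and formatting escapes. The function and table names are my own.

```python
import re
import unicodedata

# Beta Code letter -> Greek lowercase letter (basic alphabet only).
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic -> Unicode combining character.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta: str) -> str:
    """Convert a simple Beta Code string to precomposed (NFC) Unicode Greek."""
    out = []
    capital = False  # True after "*" until the next letter
    pending = []     # diacritics written before a capital letter
    for ch in beta:
        key = ch.upper()
        if ch == "*":
            capital = True
        elif key in LETTERS:
            letter = LETTERS[key]
            out.append(letter.upper() if capital else letter)
            out.extend(pending)
            pending.clear()
            capital = False
        elif ch in MARKS:
            (pending if capital else out).append(MARKS[ch])
        else:
            out.append(ch)  # spaces, punctuation and anything unknown
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma at the end of a word becomes a final sigma.
    return re.sub(r"σ(?=\W|$)", "ς", text)


print(beta_to_unicode("MH=NIN A)/EIDE QEA\\"))
print(beta_to_unicode("LO/GOS"))
```

Because the result is normalized to NFC, acute accents come out as the WITH TONOS characters rather than the WITH OXIA ones. For real TLG or PHI files, use one of the tools listed above, which handle the full Beta Code specification.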

TLG and PHI search engines supporting Unicode

 

1. Diogenes: http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/index.php

Perl-based, cross-platform search engine for the PHI and TLG CD-ROMs.

2. Workplace Pack from SilverMountain Software: http://www.silvermountainsoftware.com/workpack.html

Unicode-aware search engine program for the TLG CD-ROM.

Creating a database for the ‘Grammar of Medieval Greek’ project

“Creating a database for the ‘Grammar of Medieval Greek’ project”, in: Ι. Μαυρομάτης (εκδ.), Πρακτικά του διεθνούς συνεδρίου Neograeca Medii Aevi VI: Πρώιμη νεοελληνική δημώδης γραμματεία. Γλώσσα, παράδοση και ποιητική, Οκτώβριος 2005, Ιωάννινα (in press)

The main goal of the “Grammar of Medieval Greek project” is to produce a comprehensive Grammar of Medieval Greek in book form; an electronic publication of the material collected in the process, or in the Grammar itself, is not planned for the time being. However, from its beginning the research project has relied heavily upon the use of electronic resources; this is a reasonable decision when one has to collect and organize large amounts of data. Nowadays it is also often considered a prerequisite for funding a large-scale research project. This paper aims at describing all issues that are related to the creation of a custom-built electronic database and tries to concentrate not on technical aspects (as the interested reader can find a full description of technical matters elsewhere) but on issues concerning modelling of data and research methodology.

You can download the PDF file here.

Concordancers and alternatives for Mac OS X

In my present job I am heavily involved with language description: reading through loads of texts, identifying interesting linguistic features, and storing them in a custom-built database. That’s good for phenomena that you cannot easily identify by other means; sometimes, though, you just have to use the computer and scan a large number of texts for an ending or some other easily identifiable pattern. That’s where you need a concordance program, and that’s where I have a problem with Mac OS X.

There simply isn’t a decent concordancing program that runs natively on Mac OS X; if there is one and I’ve missed it, please let me know! Yes, there are some decent concordancing programs for Windows, and yes, I could use them with Parallels or dual booting – it’s just that a) I am not prepared to purchase a Windows license just for running a concordance program and b) it won’t integrate well with my current workflow. I still haven’t experimented with running all the concordancing programs under Wine derivatives or CrossOver (a first try with AntConc for Windows didn’t really work).

In the days of Mac OS 9, life was much easier. Conc (from SIL) was brilliant (even my wife enjoyed using it); it still runs on some Macs that support Classic, but it doesn’t support Unicode, so this is not an option (and my main Mac in the office runs Mac OS X 10.5).

Ideally, the perfect concordancing program would:

  • fully support Unicode
  • operate on multiple files
  • be aware of a referencing scheme (so that you know where in your texts the string you have identified occurs)

Conc could do the last two – why doesn’t someone at SIL rewrite it for Intel?

So what now? What are the options for someone like me who desperately needs to search a large (ca. 2 million words) corpus of Medieval Greek texts? (I do this kind of job to cover the necessities of life – I enjoy it sometimes, but not always…) Here is a list of programs I am currently using:
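To make the three requirements concrete, here is a minimal keyword-in-context sketch in Python. It is only an illustration of the idea, not a replacement for a real concordancer: the names `Hit`, `concordance_lines` and `concordance_files` are my own, the "referencing scheme" is simply file name plus line number, and the files are assumed to be UTF-8 plain text.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    source: str   # where the text comes from (e.g. a file name)
    line_no: int  # line number within that source, starting at 1
    left: str     # context before the match
    match: str    # the matched string itself
    right: str    # context after the match


def concordance_lines(lines: Iterable[str], pattern: str,
                      source: str = "<memory>", width: int = 30) -> Iterator[Hit]:
    """Yield a Hit for every regex match, with `width` characters of context."""
    regex = re.compile(pattern)
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        for m in regex.finditer(line):
            left = line[max(0, m.start() - width):m.start()]
            right = line[m.end():m.end() + width]
            yield Hit(source, line_no, left, m.group(), right)


def concordance_files(paths: Iterable[str], pattern: str,
                      width: int = 30) -> Iterator[Hit]:
    """Run the same search over several UTF-8 text files."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            yield from concordance_lines(handle, pattern, source=str(path), width=width)


# Example: all words ending in -ος in a two-line sample.
sample = ["ο λόγος του θεού", "και οι λόγοι"]
for hit in concordance_lines(sample, r"\w+ος\b"):
    print(f"{hit.source}:{hit.line_no}  {hit.left:>30}[{hit.match}]{hit.right}")
```

Python's `re` module works on Unicode strings, so Greek letters are matched by `\w`; sorting by left or right context and word lists would have to be added on top.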

Laurence Anthony’s AntConc

AntConc is written in Perl and runs on a Mac under Apple’s X11. Installation is straightforward; performance is not brilliant but marginally acceptable. AntConc is brilliant as a concordancer (click here for a review) and covers all my needs (regular expressions, word lists, normal and reverse sorting).

There is one serious flaw though, which relates to X11: improper support of Mac OS X keyboard layouts. I can’t input accented Greek characters in AntConc’s search box; there are workarounds (like using the list of words to search in advanced mode) but nothing very straightforward for a not-so-organized person in a hurry like myself (who relies heavily on regular expressions because of the diversity of spelling conventions in his corpus). If I can’t integrate a tool in my workflow, I tend not to use it… A further weak point of AntConc is exporting found datasets; you get flat text files with no structure at all.

Having said that, AntConc is a great concordancing tool and it’s free. It’s great on the PC, but on the Mac it is quite time-consuming to use.
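One way around both the keyboard problem and the varying accentuation, outside AntConc, is to strip the accents before matching, so that the search pattern can be typed without any diacritics. The following Python sketch shows the idea; it is my own illustration, not a feature of AntConc, and the function names are hypothetical.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Remove all combining diacritics (accents, breathings, iota subscript)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_unaccented(pattern: str, text: str) -> list[str]:
    """Return the words of text whose accent-free form matches the regex."""
    regex = re.compile(pattern)
    return [word for word in text.split() if regex.search(strip_accents(word))]


print(find_unaccented(r"^λογο[ςν]$", "ο λόγος τὸν λόγον λέγει"))
```

The words are returned in their original spelling, so nothing is lost; only the comparison is done on the stripped forms.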

Jedit: a programmer’s text editor

Well, this is not a concordancer but a text editor written in Java. It’s free, highly configurable and of course supports Unicode and regular expressions. What makes it interesting for my needs is what it calls a “Hyper Search”: you perform a search and you get the results presented in a separate window – one line of text at a time. You click on a result and you are taken to the actual passage in the file. Perfect for my needs, almost like a concordance program. It also works with multiple files, and I can input my Greek with as many accents as I like in the search box. It operates on text files, so HTML and XML files are covered. Jedit was my favorite for a long time, until I found something that does exactly the same but with added features. Interestingly enough, it has an almost identical name: Jedit X.

Jedit X from Artman 21

Jedit X has the same functionality as Jedit but with two major improvements: it’s a native Cocoa application, optimized for Leopard, and it supports a huge range of formats, including RTF & RTFD (RTF with pictures), MS Word, Open Office and others. This means that you can use Jedit X to perform regular-expression-based searches in a bunch of Word documents – your search results are displayed in a separate window, you click on a result and the file opens in Jedit X with the search result highlighted. Not bad at all for a shareware program that will cost you $29! It integrates perfectly with Leopard, can display HTML files as rich text documents and performs multi-file searches with a very well designed interface. In short, the perfect tool for a not-so-techie person who works with Word files and wants to search for information in them. Perfect for my needs (I am a techie person but love the easy solution).

Conclusion:
Only AntConc is a full-fledged concordance program; Jedit and Jedit X are just alternatives. If your needs are covered by an application that performs regular expression searches across multiple files and your source files are something other than flat text files (or, like me, you don’t always bother converting everything to XML or are not prepared to spend half your life explaining to your colleagues why they should convert everything to XML), Jedit X is the perfect solution. Until someone rewrites Conc for Intel-based Macs, that is.

Update:

Casual conc by Yasu Imao

Casual conc is a native, Unicode-compliant concordancer for Mac OS X 10.5, built with Ruby and RubyCocoa.

My first impressions are only positive: it handles Greek very well, sorting is fine, all major functionalities are available. It would be even better if it could handle XML files directly but I am sure that the application will only improve in the future.
