Concordancers and alternatives for Mac OS X

11 April 2008, Author: Notis Toufexis

In my present job I am heavily involved with language description: reading through loads of texts, identifying interesting linguistic features, and storing them in a custom-built database. That works well for phenomena that cannot easily be identified by other means; sometimes, though, you just have to use the computer and scan a large amount of text for an ending or some other easily identifiable pattern. That’s where you need a concordance program, and that’s where I have a problem with Mac OS X.

There simply isn’t a decent concordancing program that runs natively on Mac OS X; if there is one and I’ve missed it, please let me know! Yes, there are some decent concordancing programs for Windows, and yes, I could use them with Parallels or by dual booting – it’s just that a) I am not prepared to purchase a Windows license just to run a concordance program and b) it wouldn’t integrate well with my current workflow. I haven’t yet tried all the concordancing programs under WINE derivatives or CrossOver (a first try with AntConc for Windows didn’t really work).

In the days of Mac OS 9, life was much easier. Conc (from SIL) was brilliant (even my wife enjoyed using it); it still runs on some Macs that support Classic, but it doesn’t support Unicode, so it is not an option (and my main Mac in the office runs Mac OS X 10.5).

Ideally the perfect concordancing program would

  • fully support Unicode
  • operate on multiple files
  • be aware of a referencing scheme (so that you know where in your texts the string you have identified occurs)

Conc could do the last two – why doesn’t someone at SIL rewrite it for Intel?
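To make the three requirements concrete, here is a minimal sketch in Python of a keyword-in-context search. It is not how Conc or any of the programs below work; the function names `kwic` and `format_hit` are made up for illustration, and I assume the texts are UTF-8 files where a file name plus a line number is a good enough reference.

```python
import re
from pathlib import Path


def kwic(paths, pattern, width=30):
    """Yield (file name, line number, left context, match, right context)."""
    regex = re.compile(pattern)
    for path in paths:
        # Read as UTF-8 so Greek and other non-ASCII text survives intact.
        text = Path(path).read_text(encoding="utf-8")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in regex.finditer(line):
                left = line[max(0, m.start() - width):m.start()]
                right = line[m.end():m.end() + width]
                # File name and line number act as a simple referencing scheme.
                yield Path(path).name, line_no, left, m.group(), right


def format_hit(hit, width=30):
    """Format one hit as 'file:line', then the match in its context."""
    name, line_no, left, match, right = hit
    return f"{name}:{line_no}\t{left:>{width}} [{match}] {right}"
```

Calling `kwic` with a list of file paths and a pattern such as `r"\w+ους\b"` (any word with that ending) and printing `format_hit(hit)` for each result gives one line per occurrence, each prefixed with where it was found.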

So what now? What are the options for someone like me who desperately needs to search a large (ca. 2 million words) corpus of Medieval Greek texts? (I do this kind of job to cover the necessities of life – I enjoy it sometimes, but not always…) Here is a list of programs I am currently using:

Laurence Anthony’s AntConc

AntConc is written in Perl and runs on a Mac under Apple’s X11. Installation is straightforward; performance is not brilliant but marginally acceptable. AntConc is brilliant as a concordancer (click here for a review) and covers all my needs (regular expressions, word lists, normal and reverse sorting).


There is one serious flaw though, which relates to X11: improper support for Mac OS X keyboard layouts. I can’t input accented Greek characters in AntConc’s search box; there are workarounds (like using the list of words to search in advanced mode) but nothing very straightforward for a not very organized person in a hurry like myself (who relies heavily on regular expressions because of the diversity of spelling conventions in his corpus). If I can’t integrate a tool into my workflow, I tend not to use it… A further weak point of AntConc is the export of search results; you get flat text files with no structure at all.

Having said that, AntConc is a great concordancing tool and it’s free. It’s great on the PC but on the Mac it is quite time consuming to use.
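As an aside on the accent problem: one possible way around a search box that refuses accented input is to search an accent-free copy of the text with an accent-free pattern. The following is only a sketch of that idea in Python, not something AntConc offers; `strip_accents` and `find_unaccented` are hypothetical helper names, and the approach assumes that ignoring accents and breathings is acceptable for the search at hand.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents, breathings) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_unaccented(pattern, line):
    """Search an accent-free pattern against an accent-free copy of the line."""
    return [m.group() for m in re.finditer(pattern, strip_accents(line))]
```

Note that the hits come from the stripped copy, so they are returned without their accents; mapping them back to the exact position in the original line is left out of this sketch.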

Jedit: a programmer’s text editor

Well, this is not a concordancer but a text editor written in Java. It’s free, highly configurable and of course supports Unicode and regular expressions. What makes it interesting for my needs is what is called a «Hyper Search»: you perform a search and you get the results presented in a separate window – one line of text at a time. You click on a result and you are taken to the actual passage in the file. Perfect for my needs, almost like a concordance program. It also works with multiple files, and I can input my Greek with as many accents as I like in the search box. It operates on text files, so HTML and XML files are covered. Jedit was my favorite for a long time, until I found something that does exactly the same but with added features. Interestingly enough, it has an almost identical name: Jedit X.

Jedit X from Artman 21

Jedit X has the same functionality as Jedit but with two major improvements: it’s a native Cocoa application optimized for Leopard, and it supports a huge range of formats, including RTF & RTFD (RTF with pictures), MS Word, Open Office and others. This means that you can use Jedit X to perform regular expression based searches in a bunch of Word documents – your search results are displayed in a separate window, you click on a result and the file opens in Jedit X with the search result highlighted. Not bad at all for a shareware program that will cost you $29! It integrates perfectly with Leopard, can display HTML files as rich text documents and performs multi-file searches with a very well designed interface. In short, the perfect tool for a not so techie person who works with Word files and wants to search for information in them. Perfect for my needs (I am a techie person but love the easy solution).

Only AntConc is a full-fledged concordance program; Jedit and Jedit X are just alternatives. If your needs are covered by an application that performs regular expression searches across multiple files and your source files are something other than flat text files (or, like me, you don’t always bother converting everything to XML or are not prepared to spend half your life explaining to your colleagues why they should convert everything to XML), Jedit X is the perfect solution. Until someone rewrites Conc for Intel-based Macs, that is.


Casual conc by Yasu Imao

Casual conc is a native, Unicode-compliant concordancer for Mac OS X 10.5, built with Ruby and RubyCocoa.

My first impressions are entirely positive: it handles Greek very well, sorting is fine, and all the major functions are available. It would be even better if it could handle XML files directly, but I am sure that the application will only improve in the future.
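Until a concordancer reads XML directly, one stopgap is to strip the markup first and feed it the plain text. A minimal sketch in Python, assuming well-formed XML and that all text content should be kept; `xml_to_text` is a hypothetical helper name and has nothing to do with Casual conc itself:

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string):
    """Return the text content of an XML document with all markup removed."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order and yields every text node.
    return "".join(root.itertext())
```

This throws away attributes such as line or page numbers, so any referencing information stored in the markup would have to be carried over separately.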
