toufexis.info

The TLG and copyright

Posted by in Blog

This window comes up when you search for a word or browse a text on the Thesaurus Linguae Graecae website. I haven’t visited the site for a while, so I can’t really tell when this was introduced. I am just wondering whether it is technically possible to prevent copying of the text while browsing the TLG. Colleagues have reported that the TLG suspended their access to the site due to suspicious browsing behavior, but I have no idea how these mechanisms work or whether they really exist in the first place.

My issue is this: Maybe I am wrong, but the introduction of this pop-up window shows that users do try to copy the texts digitised by the TLG for their own use and that the TLG-Project is trying to secure its rights to the electronic texts it makes available. It would be interesting to know why users try to do this. If it is because they want to use other digital tools that the TLG doesn’t offer, why not let them do so? If it’s a cost-related issue, why not introduce a download fee or something similar? Or do a user survey and try to build the tools users really want. Why not allow users to pass the text to other concordancers available on the net, like the Voyeur tools? There must be a way to combine the sustainability of the TLG-project with the actual needs of the user community… What do you think?


Concordancers and alternatives for Mac OS X


In my present job I am heavily involved with language description: reading through loads of texts, identifying interesting linguistic features, storing them in a custom-built database. That’s good for some phenomena that you cannot easily identify by other means; sometimes, though, you just have to use the computer and scan a large amount of text for an ending or some other easily identifiable pattern. That’s where you need a concordance program, and that’s where I have a problem with Mac OS X.

There simply isn’t a decent concordancing program that runs natively on Mac OS X; if there is one and I’ve missed it, please let me know! Yes, there are some decent concordancing programs for Windows and yes, I could use them with Parallels or dual booting – it’s just that a) I am not prepared to purchase a Windows license just to run a concordance program and b) it wouldn’t integrate well with my current workflow. I haven’t yet tried WINE derivatives or CrossOver with all the concordancing programs (a first attempt with AntConc for Windows didn’t really work).

In the days of Mac OS 9, life was much easier. Conc (from SIL) was brilliant (even my wife enjoyed using it); it still runs on some Macs that support Classic, but it doesn’t support Unicode, so it is not an option (and my main Mac in the office runs Mac OS X 10.5).

Ideally the perfect concordancing program would

  • fully support Unicode
  • operate on multiple files
  • be aware of a referencing scheme (so that you know where in your texts the string you have identified occurs)

Conc could do the last two – why doesn’t someone at SIL rewrite it for Intel?
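To make the three requirements concrete, here is a minimal keyword-in-context sketch in Python. It is not how Conc or any of the programs below work; the names `Hit` and `concordance` are made up for this illustration, and the "referencing scheme" is assumed to be nothing more than text name plus line number.

```python
import re
import unicodedata
from typing import Dict, Iterator, NamedTuple


class Hit(NamedTuple):
    """One concordance line together with its reference."""
    source: str   # name of the text the hit comes from (assumed reference scheme)
    line_no: int  # 1-based line number inside that text
    left: str     # context to the left of the match
    match: str    # the matched string itself
    right: str    # context to the right of the match


def concordance(texts: Dict[str, str], pattern: str, width: int = 30) -> Iterator[Hit]:
    """Yield every regex match in every text with `width` characters of context."""
    # Normalize pattern and texts to NFC so that precomposed and decomposed
    # accented Greek letters compare equal.
    regex = re.compile(unicodedata.normalize("NFC", pattern))
    for source, text in texts.items():
        text = unicodedata.normalize("NFC", text)
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in regex.finditer(line):
                left = line[max(0, m.start() - width):m.start()]
                right = line[m.end():m.end() + width]
                yield Hit(source, line_no, left, m.group(0), right)
```

Calling `concordance(texts, r"\w+ει\b")` on a dictionary that maps file names to their contents would list every word ending in -ει, each with the name of its text and the line it occurs on.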

So what now? What are the options for someone like me who desperately needs to search a large (ca. 2 million words) corpus of Medieval Greek texts (I do this kind of job to cover the necessities of life – I enjoy it sometimes but not always…)? Here is a list of programs I am currently using:

Laurence Anthony’s AntConc

AntConc is written in Perl and runs in a Mac under Apple’s X11. Installation is straightforward, performance not brilliant but marginally acceptable. AntConc is brilliant as a concordancer (click here for a review) and covers all my needs (regular expressions, word lists, normal and reverse sorting).

[Screenshots of AntConc]

There is one serious flaw though, which relates to X11: improper support of Mac OS X keyboard layouts. I can’t input accented Greek characters in AntConc’s search box; there are workarounds (like using the list of words to search in advanced mode) but nothing very straightforward for a not very organized person in a hurry like myself (who relies heavily on regular expressions because of the diversity of spelling conventions in his corpus). If I can’t integrate a tool in my workflow, I tend not to use it… A further weak point of AntConc is exporting the results you have found; you get flat text files with no structure at all.
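One way around the keyboard problem, outside AntConc, is to search without accents at all. The following is only a sketch of that idea in Python, not an AntConc feature: it assumes that stripping every combining mark (accents, breathings, iota subscript) is acceptable for the search at hand, and the function names are invented for the example.

```python
import unicodedata
from typing import Iterable, List


def strip_accents(text: str) -> str:
    """Remove accents, breathings and other combining marks from a string."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers Greek accents, breathings,
    # diaeresis and the combining iota subscript.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matching_words(words: Iterable[str], query: str) -> List[str]:
    """Return the words that contain `query`, ignoring accents and case."""
    needle = strip_accents(query).lower()
    return [w for w in words if needle in strip_accents(w).lower()]
```

With this, typing the unaccented `ανθρωπ` finds both `ἄνθρωπος` and `ἀνθρώπου`. The price is that distinctions carried only by the accent are lost, so it suits a first pass rather than a final count.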

Having said that, AntConc is a great concordancing tool and it’s free. It’s great on the PC but on the Mac it is quite time consuming to use.

Jedit: a programmer’s text editor

Well, this is not a concordancer but a text editor written in Java. It’s free and highly configurable and of course supports Unicode and regular expressions. What makes it interesting for my needs is what is called «Hyper Search»: you perform a search and you get the results presented in a separate window, one line of text at a time. You click on a result and you are taken to the actual passage in the file. Perfect for my needs, almost like a concordance program. It also works with multiple files and I can input my Greek with as many accents as I like in the search box. It operates on text files, so HTML and XML files are covered. Jedit was my favorite for a long time, until I found something that does exactly the same but with added features. Interestingly enough, it has an almost identical name: Jedit X.
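The same kind of result list (one line per hit, with enough information to jump back to the passage) can be approximated in a few lines of Python. This is a rough sketch, not Jedit's implementation; `search_files` is a hypothetical helper, and the files are assumed to be UTF-8 encoded.

```python
import re
import unicodedata
from pathlib import Path
from typing import List, Tuple


def search_files(root: Path, pattern: str, glob: str = "*.txt") -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for every line matching `pattern`."""
    # NFC-normalize pattern and lines so accented letters compare consistently.
    regex = re.compile(unicodedata.normalize("NFC", pattern))
    results = []
    # rglob walks the directory tree; sorting keeps the output order stable.
    for path in sorted(root.rglob(glob)):
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                line = unicodedata.normalize("NFC", line.rstrip("\n"))
                if regex.search(line):
                    results.append((path.name, line_no, line))
    return results
```

Passing `glob="*.xml"` or `glob="*.html"` covers marked-up files as well, since the search only ever sees plain lines of text.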

Jedit X from Artman 21

Jedit X has the same functionality as Jedit, but with two major improvements: it’s a native Cocoa application, optimized for Leopard, and it supports a huge range of formats, including RTF & RTFD (RTF with pictures), MS Word, Open Office and others. This means that you can use Jedit X to perform regular expression based searches in a bunch of Word documents – your search results are displayed in a separate window, you click on a result and the file opens in Jedit X with the search result highlighted. Not bad at all for a shareware program that will cost you $29! It integrates perfectly with Leopard, can display HTML files as rich text documents and performs multi-file searches with a very well designed interface. In short, the perfect tool for a not so techie person who works with Word files and wants to search for information in them. Perfect for my needs (I am a techie person but love the easy solution).

Conclusion:
Only AntConc is a full-fledged concordance program; Jedit and Jedit X are just alternatives. If your needs are covered by an application that performs regular expression searches across multiple files and your source files are something other than flat text files (or, like me, you don’t always bother converting everything to XML or are not prepared to spend half your life explaining to your colleagues why they should convert everything to XML), Jedit X is the perfect solution. Until someone rewrites Conc for Intel based Macs, that is.

Update:

Casual conc by Yasu Imao

Casual conc is a native, Unicode-compliant concordancer for Mac OS X 10.5, based on Ruby and RubyCocoa.

[Screenshots of Casual conc]

My first impressions are entirely positive: it handles Greek very well, sorting is fine, and all the major functions are available. It would be even better if it could handle XML files directly, but I am sure that the application will only improve in the future.
