Initiative to write a Coptic dictionary app

Nofri everyone,

I was recently thinking of writing a cross-platform Coptic-Arabic/English and English/Arabic-Coptic dictionary app. I started typing up the words from Moawad Dawood's dictionary, but it took me a long time to finish just one page of its 1000+ pages. Our congregation is small, so I don't expect much help from there.

Does anyone know where to find a very comprehensive dictionary in formats such as Excel?

Alternatively, if we start a group open-source project with more people, each finishing a page a day, could the apps ship before the end of the summer?

Any suggestions will be highly appreciated. :)

Ougai




Comments

  • Ekhrestos anesty
    @tenacpi
    I'm so glad you chose Muawad Dawood's dictionary; it shows you seriously care about the authenticity of the source. However, while you can read Arabic, remember that not many people can. I'm so sorry that I cannot promise to help or take part in any projects until at least the end of this year. God bless your efforts and everyone else's. I'd truly like to help in any way I can, but I cannot do typing like this.
    Oujai
  • Typing (or retyping) a dictionary will take years; it won't work. You need to find a suitable OCR or some other automation program. Secondly, Excel is a bad choice because it is not designed for database storage or input: the file will get too big and prone to corruption. A program backed by SQL Server or a similar database will work better. Finally, Crum's dictionary is already available in database form in Marcion. If you're willing to overlook the fact that the program is designed for Gnostic works, you'll do just fine with Crum's dictionary.

    Ideally, we would want a collaborative effort that draws on multiple dictionaries; I have seen words found in other dictionaries but not in Dawood's, and vice versa. A Coptic Wiktionary-style wiki would work well, and it should be integrated with the current Coptic Wiki site. (My only concern is that a dictionary is hard to format in wiki markup, but a wiki offers the best integration and search features.)
  • I wasn't planning to keep the data in an Excel sheet, but rather to convert it to CSV that I can deserialize. OCR won't work because of the Coptic font; I couldn't find any OCR tool that handles a custom font.
    I honestly have a hard time using Crum's.
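    Deserializing such a CSV could look like the minimal Python sketch below. It assumes a hypothetical header row with `coptic`, `arabic`, and `english` columns and uses two sample headwords purely for illustration; the real column layout would depend on how the entries are typed up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical fields; one row per headword/translation pair.
    coptic: str
    arabic: str
    english: str


def load_entries(csv_text: str) -> list[Entry]:
    """Deserialize CSV text (with a header row) into Entry objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Entry(row["coptic"].strip(), row["arabic"].strip(), row["english"].strip())
        for row in reader
    ]


# Illustrative sample data, not taken from any dictionary file.
sample = "coptic,arabic,english\nⲛⲟⲩϯ,الله,God\nⲣⲱⲙⲓ,إنسان,man\n"
entries = load_entries(sample)
print(entries[0].english)  # God
```

    The `csv` module handles quoted fields, so a definition containing commas survives as long as it is wrapped in double quotes.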


  • edited May 2014
    Thank you for all your hard work, I'm sure this project will be a blessing for the whole community.

    Unfortunately I don't understand Arabic, so I can't be of any practical assistance; however I may be able to offer some technical assistance.

    Excel/OpenOffice CSV formats are good for data that is normally presented as a table, with rows and columns. Dictionary data is not normally displayed this way, so data entry becomes awkward.

    I definitely understand the desire to use CSV as a medium for importing into an SQL database; however, there are plenty of ways to load data into SQL besides CSV.
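    As a sketch of loading rows without a CSV step, the Python below inserts entries through parameterized statements. It uses SQLite (from the standard library) only as a stand-in for whatever SQL server the project settles on, and the `entry` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical entries; in practice these would come from the entry tool.
entries = [
    ("ⲛⲟⲩϯ", "الله", "God"),
    ("ⲣⲱⲙⲓ", "إنسان", "man"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entry (
           id INTEGER PRIMARY KEY,
           coptic TEXT NOT NULL,
           arabic TEXT,
           english TEXT
       )"""
)
# Parameterized inserts: no CSV intermediate and no manual quoting/escaping.
conn.executemany(
    "INSERT INTO entry (coptic, arabic, english) VALUES (?, ?, ?)", entries
)
conn.commit()

row = conn.execute("SELECT english FROM entry WHERE coptic = ?", ("ⲛⲟⲩϯ",)).fetchone()
print(row[0])  # God
```

    Because the values are passed separately from the SQL text, Coptic and Arabic strings go in unchanged, with no delimiter or escaping rules to get wrong.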

    Typically, I've seen dictionaries stored as XML. The "standard" format is called XDXF; however, I'm not sure it would make data entry any less awkward than CSV.
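    A minimal sketch of writing entries as XML, loosely modeled on XDXF's article (`<ar>`) and key (`<k>`) elements, is below. It is not a complete XDXF file: the header metadata and root attributes the format defines are omitted, so check the XDXF specification before relying on this layout. The sample entries are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative (headword, definition) pairs.
entries = [("ⲛⲟⲩϯ", "God"), ("ⲣⲱⲙⲓ", "man")]

root = ET.Element("xdxf")  # real XDXF also needs attributes/metadata, omitted here
for headword, definition in entries:
    ar = ET.SubElement(root, "ar")  # one article per headword
    k = ET.SubElement(ar, "k")      # the key (headword) element
    k.text = headword
    k.tail = " " + definition       # definition text follows the key inside <ar>

# encoding="unicode" returns a str and keeps Coptic characters unescaped.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```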

    JSON might be an option for structured data, and it would allow ETL into SQL or NoSQL data stores fairly easily using common technologies like JavaScript, PHP, Python, Ruby, etc.
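    To illustrate JSON-to-SQL ETL, the sketch below extracts a hypothetical JSON layout (one object per headword with a list of senses), transforms it into rows, and loads them into two SQLite tables. The layout, table names, and the use of SQLite are all assumptions; the point is that nested senses, which are awkward in CSV, map naturally onto a headword/sense pair of tables.

```python
import json
import sqlite3

# Hypothetical JSON layout: one object per headword, each with a list of senses.
raw = """
[
  {"coptic": "ⲛⲟⲩϯ", "senses": [{"arabic": "الله", "english": "God"}]},
  {"coptic": "ⲣⲱⲙⲓ", "senses": [{"arabic": "إنسان", "english": "person"},
                                 {"arabic": "رجل", "english": "man"}]}
]
"""


def load(conn: sqlite3.Connection, entries: list) -> None:
    """Transform nested JSON entries into headword and sense rows, then load them."""
    conn.execute("CREATE TABLE headword (id INTEGER PRIMARY KEY, coptic TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE sense (headword_id INTEGER REFERENCES headword(id), "
        "arabic TEXT, english TEXT)"
    )
    for entry in entries:
        cur = conn.execute("INSERT INTO headword (coptic) VALUES (?)", (entry["coptic"],))
        conn.executemany(
            "INSERT INTO sense (headword_id, arabic, english) VALUES (?, ?, ?)",
            [(cur.lastrowid, s["arabic"], s["english"]) for s in entry["senses"]],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, json.loads(raw))  # extract (json.loads), then transform + load
count = conn.execute("SELECT COUNT(*) FROM sense").fetchone()[0]
print(count)  # 3
```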

    If you use one of these storage formats, it might be easier to create a simple WYSIWYG interface for your volunteers to enter data, which could then either write to the SQL server directly or store entries in an intermediate format like XML, CSV, or JSON.
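    The storage side of such an entry tool could be as small as the sketch below: each form submission is validated and appended to a JSON Lines file (one JSON object per line) as the intermediate format. The required fields, the optional `page` field for tracking the source page, and the function names `save_entry` and `read_entries` are all hypothetical; a real interface would call something like `save_entry` from its submit handler.

```python
import json
import os
import tempfile

REQUIRED = ("coptic", "english")  # hypothetical required fields


def save_entry(path: str, entry: dict) -> None:
    """Validate one form submission and append it as a line of JSON."""
    missing = [f for f in REQUIRED if not str(entry.get(f) or "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Coptic and Arabic readable in the file.
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_entries(path: str) -> list[dict]:
    """Read every saved entry back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "entries.jsonl")
save_entry(path, {"coptic": "ⲛⲟⲩϯ", "arabic": "الله", "english": "God", "page": 1})
try:
    save_entry(path, {"coptic": "", "english": "man"})
except ValueError as err:
    print(err)  # missing fields: coptic
print(len(read_entries(path)))  # 1
```

    Appending one line per entry means a crash mid-save loses at most the last entry, and the file can be loaded into a database later in one pass.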

    If you go that route, I'd definitely suggest setting up some kind of backup, whether raw SQL dumps or intermediate storage as XML or JSON.
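    For the raw-SQL-dump option, a minimal sketch is below, again using SQLite as a stand-in for the actual server. `Connection.iterdump()` from Python's standard `sqlite3` module emits the schema and data as SQL statements; the example restores the dump into a fresh database to check that it round-trips. The `entry` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (coptic TEXT, english TEXT)")
conn.execute("INSERT INTO entry VALUES (?, ?)", ("ⲛⲟⲩϯ", "God"))
conn.commit()


def dump_sql(conn: sqlite3.Connection) -> str:
    """Return a plain-text SQL dump (schema plus INSERT statements)."""
    return "\n".join(conn.iterdump())


dump = dump_sql(conn)

# Restore into a fresh database to prove the dump round-trips.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
print(restored.execute("SELECT english FROM entry").fetchone()[0])  # God
```

    In practice you would write `dump` to a dated file on a schedule and keep copies somewhere other than the database server.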

    As for OCR, I've seen some work done on Greek and Coptic OCR; I'd be more concerned about the Arabic. You can find some information about Coptic OCR at http://www.moheb.de/ocr.html

    I hope you can forgive my boldness, but I would like to make two requests: 

    1) Please encode the Coptic as Unicode instead of using the common transliteration fonts. This is the correct way to handle Coptic, and you're going to have to store the Arabic in Unicode anyway... Moheb.de also has some open source Unicode Coptic Fonts available, and I've written a basic HTML Unicode Coptic Keyboard using one of them if you need it.
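    One way to catch text entered with a transliteration font is to check code points, as in the sketch below. It assumes Coptic letters come from the Unicode Coptic block (U+2C80 to U+2CFF) or from the Coptic letters in the Greek and Coptic block (U+03E2 to U+03EF). Text typed with a legacy font is stored mostly as Latin characters, so it fails the check even though it displays as Coptic. The function names are hypothetical.

```python
import unicodedata


def is_coptic_char(ch: str) -> bool:
    """True for code points in the Coptic block or the Coptic range of Greek and Coptic."""
    cp = ord(ch)
    return 0x2C80 <= cp <= 0x2CFF or 0x03E2 <= cp <= 0x03EF


def looks_like_unicode_coptic(text: str) -> bool:
    """True if text has letters and every letter is a Unicode Coptic letter.

    Combining marks (e.g. an overline, U+0305) are not letters, so they are ignored.
    Text typed with a legacy transliteration font is stored as Latin letters
    and therefore fails this check.
    """
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if unicodedata.category(ch).startswith("L")]
    return bool(letters) and all(is_coptic_char(ch) for ch in letters)


print(looks_like_unicode_coptic("ⲛⲟⲩϯ"))   # True
print(looks_like_unicode_coptic("nouti"))  # False
```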

    2) Please consider licensing the final database under some kind of open data license, so everyone can benefit from it. Coptic technology is neglected and the more resources we have, the better! Two good options are: 


    A license that includes a "Share-Alike" clause or provision might be ideal, since it would mean any derivatives of the database would also be open licensed.

    Thank you again, and may God bless your efforts!