UNIPEN Frequently Asked Questions (FAQ)

  1. Database-related questions
  2. Miscellaneous questions
  3. Benchmark-related questions
  4. Format-related questions
  5. Software-related questions



1. Database-related questions

1.1 Where is this huge database and can I get it?

HERE!

Furthermore, since nothing is more perishable than an unused test set, there will be future UNIPEN calls for data.

The UNIPEN data format is free, and public samples are available at:

      ftp://ftp.nici.kun.nl/pub/UNIPEN/forum

1.2 What is this UNIPEN consortium?

The consortium consists of research groups at 19 universities and research institutes and at 21 companies worldwide, 40 members in total. By country:
      1 (Belgium)
      1 (Korea)
      1 (Spain)
      2 (Canada)
      2 (France)
      2 (Italy)
      2 (Japan)
      2 (Netherlands)
      3 (UK)
      4 (Israel)
      6 (Germany)
     14 (USA)

1.3 How much data is in the current UNIPEN collection?

The totals (counted October 1995) are: 40 institutions, donating over 5 million characters, from more than 2200 writers.

1.4 How would I set up my own UNIPEN-compatible collection?

There are some collection tips for word-oriented recognition. However, the rules for character-based or page-based collection are slightly different.

The most important rule is: good annotation. Describe the writer, the writing conditions, the devices used and their settings, the software, the content of the handwritten material, etc. The UNIPEN format offers many keyworded fields to include such information.
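
As a minimal sketch of what such annotation might look like in practice, the snippet below writes a few keyworded header lines. The keyword names (.COMMENT, .WRITER_ID, .COUNTRY, .HANDEDNESS, .PAD) are given from memory as examples and should be checked against the UNIPEN 1.0 definition; all values and the helper name format_header are hypothetical.

```python
def format_header(fields):
    """Render (keyword, value) pairs as one keyworded line each.

    Every keyword is expected to start with a dot, as UNIPEN keywords do.
    """
    lines = []
    for keyword, value in fields:
        if not keyword.startswith("."):
            raise ValueError(f"keyword must start with '.': {keyword!r}")
        lines.append(f"{keyword} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical annotation; keyword names are assumptions to verify
# against the format definition before use.
annotation = [
    (".COMMENT", "Word-oriented collection, seated writer, office lighting"),
    (".WRITER_ID", "writer_0001"),
    (".COUNTRY", "Netherlands"),
    (".HANDEDNESS", "R"),
    (".PAD", "example tablet, default driver settings"),
]

header = format_header(annotation)
print(header, end="")
```

Keeping the annotation as data rather than as hand-typed text makes it easier to fill in the same fields consistently for every writer and session.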



2. Miscellaneous questions

2.1 How can I refer to UNIPEN in my publications?

Guyon, I., Schomaker, L., Plamondon, R., Liberman, M. & Janet, S. (1994). UNIPEN project of on-line data exchange and recognizer benchmarks, Proceedings of the 12th International Conference on Pattern Recognition, ICPR'94, pp. 29-33, Jerusalem, Israel, October 1994. IAPR-IEEE.

A PostScript version of this paper is available.

2.2 How can I learn more about UNIPEN?

(From the original E-mail announcement on Scrib-L, Mon, 06 Oct 1997 13:51:41 +0200)

"This is to announce:

           open-unipen@unipen.nici.kun.nl
a mailing list for researchers who are interested in the UNIPEN data format but who are not a member of the UNIPEN consortium.

Awaiting the release of the UNIPEN/NIST database for the public, the UNIPEN data format can already be used and processed by public domain software tools (http://unipen.nici.kun.nl/uptools3/). Some public data already exists: (ftp://ftp.nici.kun.nl/pub/UNIPEN/forum/).

The main entrance for UNIPEN in general is http://unipen.nici.kun.nl/

The open-unipen list allows for the exchange of ideas, data, and software. The directory ftp://ftp.nici.kun.nl/pub/UNIPEN/forum/ can be extended by uploads of data and software.

The unmoderated list is organized by Lambert Schomaker (schomaker@computer.org) at the Nijmegen Institute for Cognition and Information (NICI), The Netherlands.

Subscription to the open-unipen mailing list:

Send an E-mail message to:

           Majordomo@unipen.nici.kun.NL
Containing the line:
           subscribe open-unipen [your.email@address]

We hope that this service will meet the interests of on-line handwriting recognition researchers who cannot get to the official UNIPEN data."
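
The subscription request described in the announcement can be sketched as follows. This only builds the message and does not send it; the subscriber address is a placeholder, the function name is hypothetical, and whether the Majordomo address is still served is not stated here.

```python
from email.message import EmailMessage


def build_subscribe_message(subscriber):
    """Build (but do not send) the one-line Majordomo subscription request."""
    msg = EmailMessage()
    msg["From"] = subscriber
    msg["To"] = "Majordomo@unipen.nici.kun.NL"
    # The announcement asks for a single line in the message body.
    msg.set_content(f"subscribe open-unipen {subscriber}")
    return msg


msg = build_subscribe_message("researcher@example.org")  # placeholder address
print(msg)
```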

Archives of the open-unipen list are kept here.



3. Benchmark-related questions

3.1 Why these benchmarks?

Currently there is no 'calibration' of on-line handwriting recognizer performance. Experience has shown that the performance figures typically claimed in academic work have little to do with real-life performance in pen-computing applications. The idea is to give RECmarks to existing on-line handwriting recognition systems.

3.2 What are these benchmarks?

UNIPEN Benchmark overview

Benchmark  Description

1a         isolated digits
1b         isolated upper case
1c         isolated lower case
1d         isolated symbols (punctuations etc.)
2          isolated characters, mixed case
3          isolated characters in the context of words or texts
4          isolated printed words, not mixed with digits and symbols
5          isolated printed words, full character set
6          isolated cursive or mixed-style words (without digits and symbols)
7          isolated words, any style, full character set
8          text: (minimally two words of) free text, full character set
Handwritten version

Note that only Benchmark #8 is a realistic, application-oriented test, because the recognizer must also solve the word segmentation problem. No manual word segmentation is allowed in Benchmark #8.

The benchmarks will be arbitrated by the National Institute of Standards and Technology (NIST) in the USA.

3.3 When are these benchmarks?

A first round has been planned for the beginning of 1998.



4. Format-related questions

4.1 What is the UNIPEN format definition?

The UNIPEN format was developed by Isabelle Guyon at AT&T in cooperation with the industrial UNIPEN members, with the goal of a format that met all or most of their requirements. This was not a trivial task, given the broad range of interests and backgrounds. At the time no data standard existed at all: everyone had their own in-house formats and databases, and there were only a few device-dependent protocols and formats. The result is the UNIPEN format for on-line handwriting data. Data formats can always be improved, but now that we have a standard on which most experts agree, we will stick to it for quite a while!

Read the UNIPEN Version 1.0 Definition

4.2 What is the most powerful attribute of the UNIPEN format? (See screendumps!)

The most powerful aspect of the UNIPEN format comes from the fact that it allows multiple views on the same coordinates. This is achieved by defining hierarchical levels. Furthermore, the 'views', which are defined by a set of .SEGMENT records can be .INCLUDEd from separate files. A good example is given in this handwritten text on the topic of BSE (mad cow disease). The UNIPEN format also allows for the representation of Kanji.
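
The idea of several views on the same coordinates can be sketched as below. This is an illustration under stated assumptions, not the reference parser: the records, the word "to", the CHARACTER level and the single-index delineation "4" are made up for the example, and only the simple "first-last" form of a delineation is handled.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Segment:
    level: str   # hierarchy level, e.g. WORD or CHARACTER
    first: int   # zero-based index of the first pen stream
    last: int    # index of the last pen stream, inclusive
    label: str


def parse_segment(line):
    """Parse '.SEGMENT LEVEL first-last QUALITY "label"' (simple ranges only)."""
    keyword, level, delineation, _quality, label = shlex.split(line)
    if keyword != ".SEGMENT":
        raise ValueError(f"not a .SEGMENT record: {line!r}")
    first, _, last = delineation.partition("-")
    # Assumption: a bare index such as "4" means a one-stream segment.
    return Segment(level, int(first), int(last or first), label)


def view(streams, segment):
    """Return the pen streams that a segment refers to."""
    return streams[segment.first:segment.last + 1]


# One shared list of pen streams (placeholders instead of coordinates).
streams = ["down0", "up1", "down2", "up3", "down4"]

# Two hierarchical levels describing the same streams (hypothetical records).
records = [
    '.SEGMENT WORD 0-4 ? "to"',
    '.SEGMENT CHARACTER 0-2 ? "t"',
    '.SEGMENT CHARACTER 4 ? "o"',
]
segments = [parse_segment(record) for record in records]
for seg in segments:
    print(seg.level, seg.label, view(streams, seg))
```

Because the segments only hold indices, the records could equally live in a separate file from the coordinates, which is what .INCLUDE makes possible.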

4.3 How do I count the delineations?

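Delineations are counted in pen streams: each .PEN_DOWN or .PEN_UP block is one PEN_STREAM.
   <PEN_STREAM> ::= <.PEN_DOWN> | <.PEN_UP>
Numbering starts at zero. In the example below there are five pen streams, numbered 0 to 4, which is why the .SEGMENT record reads 0-4.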

Example

        .SEGMENT WORD 0-4 ? "fives"
        .PEN_DOWN
            123 567 
            123 567 
            123 567 
            123 567 
        .PEN_UP
            123 890
            123 890
            123 890
        .PEN_DOWN
            123 567 
            123 567 
            123 567 
            123 567 
        .PEN_UP
            123 890
            123 890
            123 890
        .PEN_DOWN
            123 567 
            123 567 
            123 567 
            123 567 
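
The counting rule can be checked with a short sketch such as the one below. It assumes a file that contains only the constructs shown in the example above and a delineation in the simple "first-last" form; the function names are hypothetical and the sample is shortened to one coordinate line per stream.

```python
import re


def pen_streams(text):
    """Return the pen-stream keywords in file order, one per stream."""
    streams = []
    for line in text.splitlines():
        tokens = line.split(maxsplit=1)
        if tokens and tokens[0] in (".PEN_DOWN", ".PEN_UP"):
            streams.append(tokens[0])
    return streams


def parse_simple_range(delineation):
    """Parse a 'first-last' delineation into two zero-based indices."""
    match = re.fullmatch(r"(\d+)-(\d+)", delineation)
    if match is None:
        raise ValueError(f"unsupported delineation: {delineation!r}")
    return int(match.group(1)), int(match.group(2))


SAMPLE = """\
.SEGMENT WORD 0-4 ? "fives"
.PEN_DOWN
    123 567
.PEN_UP
    123 890
.PEN_DOWN
    123 567
.PEN_UP
    123 890
.PEN_DOWN
    123 567
"""

streams = pen_streams(SAMPLE)
first, last = parse_simple_range("0-4")

# Numbering starts at zero, so the last valid index is len(streams) - 1.
print(len(streams), "pen streams; segment covers", first, "to", last)
print("range fits the data:", 0 <= first <= last <= len(streams) - 1)
```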



5. Software-related questions

5.1 Where can I find software to process data in UNIPEN format?

Public domain software tools for processing the UNIPEN data format are available at http://unipen.nici.kun.nl/uptools3/ (see also question 2.2).





schomaker@computer.org