[Opendnssec-develop] HSMs use UTF-8 characters
Rick van Rein
rick at openfortress.nl
Tue May 20 14:20:50 UTC 2014
Hi Matthijs and Sion,
I am auditing the libhsm code, and one thing I keep running into is character sets. PKCS #11 uses RFC 2279 strings (the older UTF-8 definition), while the rest of the code assumes ASCII.
There are two ways out of this:
- only support ASCII, thus constraining token labels and PIN codes
- pass UTF-8 strings on to the libhsm user as wide characters
If we only wish to support ASCII, we should reject other content, or strip byte values >= 0x80, because we do not interpret them along the lines of RFC 2279.
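A minimal sketch of the reject variant of the ASCII-only option. The helper name hsm_is_ascii is hypothetical and not an existing libhsm function. It takes an explicit length because PKCS #11 token labels are fixed-size, blank-padded fields rather than NUL-terminated C strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libhsm: returns 1 when every byte of
 * buf[0..len) is 7-bit ASCII (0x00..0x7F), and 0 otherwise. */
int hsm_is_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80) {
            /* High bit set: part of a multi-byte UTF-8 sequence. */
            return 0;
        }
    }
    return 1;
}
```

A caller would refuse a token label or PIN for which this returns 0, instead of silently passing the bytes on as if they were ASCII.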
If we decide to support RFC 2279, we should use the facilities in C for representing Unicode strings, namely wchar_t. This type is supported by many standard library functions, including printf ("%ls", my_wide_string). Its definition is implementation-dependent, but it must be able to represent every character of all locales that the implementation supports.
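A rough sketch of the wchar_t route, using only standard C library calls. The helper names hsm_to_wide and hsm_print_label are hypothetical, not libhsm API. The conversion relies on mbstowcs(), which interprets its input according to the current LC_CTYPE locale, so this only decodes UTF-8 when the program has selected a UTF-8 locale (an assumption about the calling program).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of libhsm: convert a NUL-terminated
 * multibyte string to wide characters using the current LC_CTYPE locale.
 * Returns the number of wide characters written (excluding the
 * terminator), or -1 if the input is invalid in this locale or does not
 * fit, terminator included, into out[0..outlen). */
long hsm_to_wide(const char *in, wchar_t *out, size_t outlen)
{
    size_t n;

    assert(in != NULL && out != NULL);
    if (outlen == 0) {
        return -1;
    }
    n = mbstowcs(out, in, outlen);
    if (n == (size_t)-1) {
        return -1;              /* invalid multibyte sequence */
    }
    if (n == outlen) {
        return -1;              /* no room left for the terminator */
    }
    return (long)n;
}

/* Hypothetical helper: print a wide string with %ls instead of %s. */
int hsm_print_label(const wchar_t *label)
{
    assert(label != NULL);
    return printf("token label: %ls\n", label);
}
```

A program would typically call setlocale(LC_CTYPE, "") once at startup, from <locale.h>, so that both the conversion and the %ls output follow the user's locale.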
We cannot keep ignoring UTF-8 as we have to date. There are a few openings for potential abuse, possibly in token labels or entered PINs:
* Encode the '\0' character as a UTF-8 sequence of more than one byte (an overlong encoding such as 0xC0 0x80), none of which is 0x00, and cause confusion elsewhere
* Place a more-bytes-to-follow code right before the '\0' (ASCII NUL) that ends a C string, so that a (bad but imaginable) UTF-8 interpreter consumes the terminator as a continuation byte and reads on past the end
* Strings may be provided under RFC 2279 and interpreted under RFC 3629 or ASCII (which are both stricter, a subset of RFC 2279)
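The abuse cases above can be closed off by validating input strictly before it is used anywhere else. What follows is a sketch only: hsm_utf8_is_strict is a hypothetical name, not an existing libhsm function. It checks well-formedness per RFC 3629 (no overlong forms, no truncated sequences, no surrogates, nothing above U+10FFFF, no 5- or 6-byte RFC 2279 forms). As an extra design choice of this sketch, it also rejects an embedded NUL byte, since the result is assumed to end up in a C string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libhsm: returns 1 when buf[0..len)
 * is well-formed UTF-8 per RFC 3629 and contains no NUL byte, else 0. */
int hsm_utf8_is_strict(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char b = buf[i];
        unsigned long cp, min;
        size_t need, k;

        if (b == 0x00) {
            return 0;                   /* embedded NUL (sketch's choice) */
        }
        if (b < 0x80) {
            i++;                        /* plain ASCII */
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or 5/6-byte lead byte */
        }

        if (len - i - 1 < need) {
            return 0;                   /* sequence runs past the buffer */
        }
        for (k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) {
                return 0;               /* not a continuation byte */
            }
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }
        if (cp < min) {
            return 0;                   /* overlong, e.g. 0xC0 0x80 for NUL */
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                   /* UTF-16 surrogate */
        }
        if (cp > 0x10FFFF) {
            return 0;                   /* beyond the RFC 3629 range */
        }
        i += need + 1;
    }
    return 1;
}
```

Because the length is passed explicitly, a lead byte at the end of the buffer is reported as truncated rather than being allowed to swallow whatever byte follows in memory.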
I think we should continue to accept the UTF-8 coding of PKCS #11, but then communicate with programs that use libhsm through wchar_t instead of char, and change the routines that print such strings to use %ls instead of %s. Perhaps a few other changes are needed to integrate with the locale. Does this sound like the right choice?
Cheers,
-Rick