Last update Mon Aug 22 2005

std.utf

Encode and decode UTF-8, UTF-16 and UTF-32 strings. For more information on UTF-8, see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8.

Note: For Win32 systems, the C wchar_t type is UTF-16 and corresponds to the D wchar type. For linux systems, the C wchar_t type is UTF-32 and corresponds to the D utf.dchar type.

UTF character support is restricted to (0 <= character <= 0x10FFFF).

class UtfError
Exception class thrown on any UTF encoding or decoding error. The members are:
idx
Set to the index of the start of the offending UTF sequence.

alias ... dchar
An alias for a single UTF-32 character. This may become a D basic type in the future.

bit isValidDchar(dchar c)
Test if c is a valid UTF-32 character. Returns true if it is, false if not.
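As a rough illustration of the range rule, the sketch below probes a few code points. It assumes the signature documented here; that surrogate code points (0xD800 to 0xDFFF) are rejected is an assumption based on the Unicode standard, not something this page states.

```d
import std.utf;

void main()
{
    // Ordinary character inside the supported range.
    assert(isValidDchar(cast(dchar)'A'));

    // Top of the supported range (0x10FFFF).
    assert(isValidDchar(cast(dchar)0x10FFFF));

    // One past the supported range.
    assert(!isValidDchar(cast(dchar)0x110000));

    // Assumption: surrogate code points are not valid UTF-32 characters.
    assert(!isValidDchar(cast(dchar)0xD800));
}
```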

dchar decode(char[] s, inout uint idx)
dchar decode(wchar[] s, inout uint idx)
dchar decode(dchar[] s, inout uint idx)
Decodes and returns the character starting at s[idx]. idx is advanced past the decoded character. If the character is not well formed, a UtfError is thrown and idx remains unchanged.
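A minimal sketch of walking a UTF-8 string one character at a time with decode, assuming the signatures listed above. The helper name decodeAll is hypothetical and is not part of std.utf.

```d
import std.utf;

// Hypothetical helper: decode every character of a UTF-8 string.
dchar[] decodeAll(char[] s)
{
    dchar[] result;
    uint idx = 0;

    // decode advances idx past each character, so the loop
    // ends once idx reaches the end of the string.
    while (idx < s.length)
        result ~= decode(s, idx);

    return result;
}

void main()
{
    char[] s = "a\u00E9z";   // 'a', U+00E9 (two UTF-8 code units), 'z'
    assert(s.length == 4);   // four code units

    dchar[] chars = decodeAll(s);
    assert(chars.length == 3);   // three characters
    assert(chars[0] == 'a');
    assert(chars[1] == 0xE9);
    assert(chars[2] == 'z');
}
```

Note that s.length counts code units, not characters, which is why the index has to be advanced by decode rather than incremented by one.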

void encode(inout char[] s, dchar c)
void encode(inout wchar[] s, dchar c)
void encode(inout dchar[] s, dchar c)
Encodes character c and appends it to array s.
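The following sketch shows how many code units encode appends for the same kind of input in different encodings. It assumes the signatures listed above; the code points are chosen only for illustration.

```d
import std.utf;

void main()
{
    // UTF-8: one code unit for ASCII, three for U+20AC (EURO SIGN).
    char[] s;
    encode(s, cast(dchar)'a');
    encode(s, cast(dchar)0x20AC);
    assert(s.length == 4);

    // UTF-16: a character outside the Basic Multilingual Plane
    // is appended as a surrogate pair.
    wchar[] w;
    encode(w, cast(dchar)0x1D11E);
    assert(w.length == 2);
    assert(w[0] == 0xD834 && w[1] == 0xDD1E);

    // UTF-32: always exactly one code unit per character.
    dchar[] d;
    encode(d, cast(dchar)0x1D11E);
    assert(d.length == 1);
}
```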

void validate(char[] s)
void validate(wchar[] s)
void validate(dchar[] s)
Checks whether string s is well formed. Throws a UtfError if it is not. Use it to check all untrusted input for correctness.
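One possible way to gate untrusted input is to wrap validate in a helper that reports where the string goes wrong. This is a sketch under the signatures documented here; firstBadIndex is a hypothetical name, and the cast of idx to int assumes the string is short enough for the index to fit.

```d
import std.utf;

// Hypothetical helper: returns -1 if s is well-formed UTF-8,
// otherwise the index of the start of the offending sequence.
int firstBadIndex(char[] s)
{
    try
    {
        validate(s);
        return -1;
    }
    catch (UtfError e)
    {
        return cast(int) e.idx;
    }
}

void main()
{
    char[] good = "hello";
    assert(firstBadIndex(good) == -1);

    // Build a malformed string: 0xFF can never appear in UTF-8.
    char[] bad;
    bad ~= 'a';
    bad ~= 'b';
    bad ~= cast(char) 0xFF;
    assert(firstBadIndex(bad) == 2);
}
```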

char[] toUTF8(char[] s)
char[] toUTF8(wchar[] s)
char[] toUTF8(dchar[] s)
Encodes string s into UTF-8 and returns the encoded string.

wchar[] toUTF16(char[] s)
wchar* toUTF16z(char[] s)
wchar[] toUTF16(wchar[] s)
wchar[] toUTF16(dchar[] s)
Encodes string s into UTF-16 and returns the encoded string. toUTF16z is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument.
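As a hedged sketch of what toUTF16z hands back, the example below inspects the returned pointer directly instead of calling into the Win32 API, so it does not depend on any platform headers. It assumes the result is terminated by a zero code unit, which is what an LPCWSTR argument requires.

```d
import std.utf;

void main()
{
    char[] title = "Caf\u00E9";     // 4 characters, 5 UTF-8 code units
    wchar* p = toUTF16z(title);

    // Each character here fits in a single UTF-16 code unit.
    assert(p[0] == 'C');
    assert(p[1] == 'a');
    assert(p[2] == 'f');
    assert(p[3] == 0xE9);

    // Assumption: the terminating zero follows the last code unit,
    // so p could be passed where a 'W' function expects an LPCWSTR.
    assert(p[4] == 0);
}
```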

dchar[] toUTF32(char[] s)
dchar[] toUTF32(wchar[] s)
dchar[] toUTF32(dchar[] s)
Encodes string s into UTF-32 and returns the encoded string.
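To show how the three conversion families relate, here is a small round-trip sketch, assuming the signatures listed above. The sample string is an arbitrary choice that mixes one-, two- and four-unit UTF-8 sequences.

```d
import std.utf;

void main()
{
    // 'a', U+00E9, and U+1D11E (outside the Basic Multilingual Plane).
    char[] u8 = "a\u00E9\U0001D11E";
    wchar[] u16 = toUTF16(u8);
    dchar[] u32 = toUTF32(u8);

    // Same three characters, different numbers of code units.
    assert(u8.length == 7);    // 1 + 2 + 4
    assert(u16.length == 4);   // 1 + 1 + 2 (surrogate pair)
    assert(u32.length == 3);   // 1 + 1 + 1

    // Converting back yields the original UTF-8 string.
    assert(toUTF8(u16) == u8);
    assert(toUTF8(u32) == u8);
}
```

Typed variables are used rather than bare string literals so that it is clear which overload is selected.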

Copyright © 1999-2005 by Digital Mars, All Rights Reserved