
std::codecvt


Defined in header <locale>
template< class InternT, class ExternT, class State > class codecvt;

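Class std::codecvt encapsulates the conversion of character strings, including wide and multibyte strings, from one encoding to another. All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.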

Inheritance: std::codecvt derives from std::locale::facet and std::codecvt_base.

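The standard library provides four standalone (locale-independent) specializations:

Defined in header <locale>

std::codecvt<char, char, std::mbstate_t>        identity conversion
std::codecvt<char16_t, char, std::mbstate_t>    conversion between UTF-16 and UTF-8 (since C++11)
std::codecvt<char32_t, char, std::mbstate_t>    conversion between UTF-32 and UTF-8 (since C++11)
std::codecvt<wchar_t, char, std::mbstate_t>     conversion between the system's native wide and the single-byte narrow character sets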

In addition, every locale object constructed in a C++ program implements its own (locale-specific) versions of these four specializations.

Member types

Member type    Definition
intern_type    InternT
extern_type    ExternT
state_type     State

Member functions

(constructor)    constructs a new codecvt facet (public member function)
(destructor)     destructs a codecvt facet (protected member function)
out              invokes do_out (public member function)
in               invokes do_in (public member function)
unshift          invokes do_unshift (public member function)
encoding         invokes do_encoding (public member function)
always_noconv    invokes do_always_noconv (public member function)
length           invokes do_length (public member function)
max_length       invokes do_max_length (public member function)

Member objects

Member name    Type
id (static)    std::locale::id

Protected member functions

do_out [virtual]              converts a string from InternT to ExternT, such as when writing to file (virtual protected member function)
do_in [virtual]               converts a string from ExternT to InternT, such as when reading from file (virtual protected member function)
do_unshift [virtual]          generates the termination character sequence of ExternT characters for incomplete conversion (virtual protected member function)
do_encoding [virtual]         returns the number of ExternT characters necessary to produce one InternT character, if constant (virtual protected member function)
do_always_noconv [virtual]    tests if the facet encodes an identity conversion for all valid argument values (virtual protected member function)
do_length [virtual]           calculates the length of the ExternT string that would be consumed by conversion into given InternT buffer (virtual protected member function)
do_max_length [virtual]       returns the maximum number of ExternT characters that could be converted into a single InternT character (virtual protected member function)

Inherited from std::codecvt_base

Member type                                   Definition
enum result { ok, partial, error, noconv };   Unscoped enumeration type

Enumeration constant    Definition
ok                      conversion was completed with no error
partial                 not all source characters were converted
error                   encountered an invalid character
noconv                  no conversion required, input and output types are the same

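The following example reads a UTF-8 file using a locale that implements UTF-8 conversion in codecvt<wchar_t, char, mbstate_t>, and converts a UTF-8 string to UTF-16 using one of the standard specializations of std::codecvt.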

    #include <codecvt>
    #include <fstream>
    #include <iomanip>
    #include <iostream>
    #include <locale>
    #include <string>
    #include <utility>

    // utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
    template<class Facet>
    struct deletable_facet : Facet
    {
        template<class... Args>
        deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
        ~deletable_facet() {}
    };

    int main()
    {
        // UTF-8 narrow multibyte encoding, spelled as raw bytes so that it
        // compiles as std::string in every standard (u8"" is char8_t in C++20)
        std::string data = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c";
                           // same as u8"z\u00df\u6c34\U0001f34c"
                           // or u8"zß水🍌"
        std::ofstream("text.txt") << data;

        // using system-supplied locale's codecvt facet
        std::wifstream fin("text.txt");
        // reading from wifstream will use codecvt<wchar_t, char, mbstate_t>
        // this locale's codecvt converts UTF-8 to UCS4 (on systems such as Linux)
        fin.imbue(std::locale("en_US.UTF-8"));
        std::cout << "The UTF-8 file contains the following UCS4 code points: \n";
        // the casts print numeric values; streaming wchar_t or char16_t
        // directly to a narrow stream is deleted in C++20
        for (wchar_t c; fin >> c; )
            std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                      << static_cast<unsigned long>(c) << '\n';

        // using standard (locale-independent) codecvt facet
        std::wstring_convert<
            deletable_facet<std::codecvt<char16_t, char, std::mbstate_t>>, char16_t> conv16;
        std::u16string str16 = conv16.from_bytes(data);

        std::cout << "The UTF-8 file contains the following UTF-16 code points: \n";
        for (char16_t c : str16)
            std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                      << static_cast<unsigned long>(c) << '\n';
    }

Output:

    The UTF-8 file contains the following UCS4 code points:
    U+007a
    U+00df
    U+6c34
    U+1f34c
    The UTF-8 file contains the following UTF-16 code points:
    U+007a
    U+00df
    U+6c34
    U+d83c
    U+df4c

See also

Character conversions (each row lists the facilities converting to and from the locale-defined multibyte encoding (UTF-8, GB18030), UTF-8, and UTF-16)

UTF-16
    locale-defined multibyte:  mbrtoc16 / c16rtomb (with C11's DR488)
    UTF-8:                     codecvt<char16_t, char, mbstate_t>, codecvt_utf8_utf16<char16_t>, codecvt_utf8_utf16<char32_t>, codecvt_utf8_utf16<wchar_t>
    UTF-16:                    N/A

UCS2
    locale-defined multibyte:  c16rtomb (without C11's DR488)
    UTF-8:                     codecvt_utf8<char16_t>, codecvt_utf8<wchar_t> (Windows)
    UTF-16:                    codecvt_utf16<char16_t>, codecvt_utf16<wchar_t> (Windows)

UTF-32
    locale-defined multibyte:  mbrtoc32 / c32rtomb
    UTF-8:                     codecvt<char32_t, char, mbstate_t>, codecvt_utf8<char32_t>, codecvt_utf8<wchar_t> (non-Windows)
    UTF-16:                    codecvt_utf16<char32_t>, codecvt_utf16<wchar_t> (non-Windows)

system wide: UTF-32 (non-Windows), UCS2 (Windows)
    locale-defined multibyte:  mbsrtowcs / wcsrtombs, use_facet<codecvt<wchar_t, char, mbstate_t>>(locale)
    UTF-8:                     No
    UTF-16:                    No

codecvt_base                                        defines character conversion errors (class)
codecvt_byname                                      creates a codecvt facet for the named locale (class template)
codecvt_utf8 (C++11)(deprecated in C++17)           converts between UTF-8 and UCS2/UCS4 (class template)
codecvt_utf16 (C++11)(deprecated in C++17)          converts between UTF-16 and UCS2/UCS4 (class template)
codecvt_utf8_utf16 (C++11)(deprecated in C++17)     converts between UTF-8 and UTF-16 (class template)

© cppreference.com

Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.

http://en.cppreference.com/w/cpp/locale/codecvt