Difference between revisions of "Character encoding"

From ArchWiki

Revision as of 23:26, 16 August 2024 (Fri)

Character encoding is the process of interpreting bytes as readable characters. UTF-8 has been the dominant encoding since 2009 and is promoted as the de facto standard[1].

UTF-8

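Terminal

The following is a list of terminals that support UTF-8.

Gnome-terminal or rxvt-unicode

These applications must be started from a UTF-8 locale, otherwise they will have no UTF-8 support. Enable the en_US.UTF-8 locale (or your local UTF-8 alternative) following the instructions above, set it as the default locale, and then reboot.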

URL encoding

URIs accept US-ASCII characters only and use percent-encoding to encode non-ASCII characters. This can result in very long and human-unreadable URIs.

In Firefox, it is possible to copy decoded URLs by enabling the browser.urlbar.decodeURLsOnCopy flag in about:config, or by inserting a space at the start of the URL, then selecting it (including the space) and copying it. This trick does not work in Chromium, which has no equivalent flag. Alternatively, select from the end of the URL back to just after the https:// part, then copy.

For command line usage, you can use the urllib.parse module from the Python standard library to decode percent-encoded URLs read from stdin:

$ python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read().strip()))"
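The same decoding can also live in a small script. The following is a minimal sketch using only the standard library; the function name decode_url and the example.org address are illustrative assumptions, not part of any tool mentioned here.

```python
from urllib.parse import quote, unquote


def decode_url(url: str) -> str:
    """Turn percent-encoded UTF-8 sequences back into readable characters."""
    return unquote(url.strip())


if __name__ == "__main__":
    # Build a percent-encoded URL from a non-ASCII page title (hypothetical address).
    encoded = "https://example.org/wiki/" + quote("文字コード")
    print(encoded)              # ASCII-only, percent-encoded form
    print(decode_url(encoded))  # human-readable form
```

unquote assumes the escaped bytes are UTF-8 by default; a different encoding can be passed through its encoding parameter.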

Troubleshooting

Encoding problems are usually due to two programs communicating with different encodings, with one side typically not using UTF-8, resulting in mojibake.
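The effect can be reproduced in a few lines. This is an illustrative sketch, assuming one side writes UTF-8 and the other reads the same bytes as Latin-1; the helper name misread is hypothetical.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "日本語"
    garbled = misread(original, "utf-8", "latin-1")
    print(garbled)  # mojibake: every UTF-8 byte shown as its own Latin-1 character
    # Latin-1 maps every byte to a character, so this particular mix-up is reversible.
    print(misread(garbled, "latin-1", "utf-8"))
```

Not every mix-up can be undone this way: if the reading side replaces or drops bytes it cannot decode, the original text is gone.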

Warning: It is highly recommended to set the codeset of your locale to .UTF-8. Otherwise, conversion from UTF-8 to a non-Unicode encoding can result in a loss of information.
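The loss of information mentioned in the warning can be seen directly. The sketch below assumes Latin-1 as the non-Unicode target encoding; the helper name to_legacy is hypothetical.

```python
def to_legacy(text: str, encoding: str) -> bytes:
    """Convert to a legacy encoding, replacing characters it cannot represent."""
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    text = "日本語 and English"
    legacy = to_legacy(text, "latin-1")
    print(legacy)                    # the CJK characters have become "?"
    print(legacy.decode("latin-1"))  # decoding cannot bring them back
```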

Incorrect archive encoding

On older versions of Windows (XP, Vista, and 7), File Explorer stores file names in a locale-dependent, non-Unicode encoding when creating a zip archive under certain locales. To extract such an archive properly, use unzip -O followed by the encoding the archive was created with. For example, CP936 is a common encoding on old Simplified Chinese versions of Windows:

$ unzip -O CP936 file.zip

If unsure about the needed charset, dry-run without extraction by adding the -l flag:

$ unzip -lO SJIS file.zip

Japanese versions of Windows encode ZIP archives with Shift-JIS. Use shift-jis:

$ unzip -O shift-jis nihongo.zip

Chinese versions of Windows encode ZIP archives with gbk:

$ unzip -O gbk file.zip

Alternatively, use unzip-natspec (AUR), which detects the encoding automatically.
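The same recovery can be sketched with Python's zipfile module, which decodes member names as CP437 when the archive does not flag them as UTF-8. This is a minimal sketch under that assumption; recover_name and list_names are hypothetical helper names, and the encoding still has to be supplied by the user.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(name: str, flag_bits: int, encoding: str) -> str:
    """Undo zipfile's CP437 decoding and decode with the intended encoding."""
    if flag_bits & UTF8_FLAG:
        return name  # zipfile already decoded this name as UTF-8
    return name.encode("cp437").decode(encoding)


def list_names(path, encoding: str) -> list:
    """List member names as they would look under the given encoding."""
    with zipfile.ZipFile(path) as archive:
        return [recover_name(info.filename, info.flag_bits, encoding)
                for info in archive.infolist()]
```

Calling list_names("file.zip", "cp932") roughly corresponds to the unzip -l dry run: nothing is extracted, so several candidate encodings can be tried until the names look right.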

Incorrect file name encoding

Use convmv to rename files by converting the encoding of their names:

$ convmv -f SOURCE_ENCODING -t UTF-8 --nosmart file

By default convmv only prints what it would do, so the command above is a dry run. After figuring out the original encoding with -f (e.g. GBK for Chinese), add --notest to actually rename the files. Note that without --nosmart, convmv skips names that are already UTF-8-encoded. Use convmv --list to find the supported encodings.
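The idea of skipping names that are already UTF-8 can be sketched as follows. This is an illustration of the check on raw file name bytes, not convmv's actual implementation; convert_name is a hypothetical helper and it renames nothing.

```python
from typing import Optional


def convert_name(raw: bytes, source_encoding: str) -> Optional[bytes]:
    """Return the UTF-8 form of a file name, or None if it is already UTF-8."""
    try:
        raw.decode("utf-8")
        return None  # valid UTF-8 (including plain ASCII): leave untouched
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    legacy = "中文.txt".encode("gbk")
    print(convert_name(legacy, "gbk"))                      # new name as UTF-8 bytes
    print(convert_name("中文.txt".encode("utf-8"), "gbk"))   # None: skipped
```

Working on bytes rather than strings matters here, because a file name in a legacy encoding is usually not valid text in a UTF-8 locale.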

Incorrect file encoding

Use the iconv command to convert the contents of a file from one encoding to another. For example:

$ iconv -f SOURCE_ENCODING -t UTF-8 -o new-file origin-file

-f specifies the original encoding and -t specifies the output encoding. Use iconv -l to query all supported encodings and -o to specify the output file.
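For scripted use, an equivalent conversion can be sketched in Python. This is a minimal sketch, assuming the whole file fits in memory; convert_file is a hypothetical helper, and its parameters mirror the -f, -t and -o options.

```python
def convert_file(src, dst, source_encoding, target_encoding="utf-8"):
    """Re-encode a text file, leaving line endings as they are."""
    # newline="" disables newline translation on both read and write.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=target_encoding, newline="") as fout:
        fout.write(text)
```

A wrong source_encoding either raises UnicodeDecodeError or silently produces mojibake, so the result should be checked before the original is deleted.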

Vim

If the locale is UTF-8, files in other encodings may appear garbled when opened. You can define fallback encodings by adding a line similar to the following to vimrc:

set fileencodings=utf8,cp936,gb18030,big5

Alternatively, you can set it explicitly with :set fileencoding=ansi. Vim will do the conversion via iconv automatically. See :h charset-conversion.
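The fallback list works by trying each encoding in order and keeping the first one that decodes without error. The sketch below is a simplification of that idea, not Vim's actual detection code; decode_with_fallback is a hypothetical helper.

```python
def decode_with_fallback(data: bytes, encodings):
    """Return (encoding, text) for the first encoding that decodes cleanly."""
    for name in encodings:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings matched")


if __name__ == "__main__":
    candidates = ["utf-8", "cp936", "gb18030", "big5"]
    print(decode_with_fallback("中文".encode("gbk"), candidates))
```

Order matters: strict encodings such as UTF-8 should come first, because a permissive encoding accepts almost any byte sequence and would hide the entries after it.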

Incorrect MP3 ID3 tag encoding

To convert the tags stored in the MP3 file itself, use mid3iconv from python-mutagen, or mp3unicode:

$ mid3iconv -e SOURCE_ENCODING XXX.mp3
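To see why the source encoding has to be given, it helps to look at a tag directly. The sketch below reads only the simple ID3v1 tag (the last 128 bytes of the file, starting with "TAG") using the standard library; it is an illustration, not how mid3iconv is implemented, and read_id3v1 is a hypothetical helper.

```python
def read_id3v1(data: bytes, encoding: str):
    """Decode title/artist/album from a trailing ID3v1 tag, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def field(start: int, length: int) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode(encoding, errors="replace").rstrip()

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
    }
```

ID3v1 records no encoding for its text fields, so the same bytes read as "gbk" or as "latin-1" give different titles; only the right guess yields readable text.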

If modifying the files is undesired, you can adjust the behavior of media players instead. For players that use GStreamer as the backend, such as Rhythmbox and Totem, set the environment variables:

export GST_ID3_TAG_ENCODING=GBK:UTF-8:GB18030
export GST_ID3V2_TAG_ENCODING=GBK:UTF-8:GB18030

For Beep Media Player, select the MPEG Audio plugin under preferences > plugins > media > title, then tick Disable ID3v2 and Convert non-UTF8 ID3 tags to UTF8, and choose the correct encoding.

Quod Libet player supports tag editing and setting ID3v2 encoding. This can be set in ~/.quodlibet/config:

~/.quodlibet/config
...
id3encoding = gbk
...

Note: Quod Libet supports UTF-8 encoding by default.

Incorrect mount encoding

The character set used for file names on a mounted file system can differ from that of the locale. It can be set with a mount option in fstab. If the locale is UTF-8, modify the line to:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=utf8 0 0

If the locale is GBK, it should be:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=cp936 0 0
...

Incorrect Samba encoding

When using Arch as a Samba server, adding the following line to /etc/samba/smb.conf can fix garbled file names seen by Windows clients:

/etc/samba/smb.conf
...
unix charset=gb2312
...

Incorrect FTP encoding

If you use a UTF-8 locale, the names of files downloaded from a server that uses a non-Unicode encoding might be garbled. For lftp, add the following settings to .lftp/rc:

.lftp/rc
...
set ftp:charset "gbk"
set file:charset "UTF-8"
...

For gftp, add the following setting to .gftp/gftprc:

.gftp/gftprc
...
remote_charsets=gb2312
...

However, downloaded file names remain garbled unless gftp is patched and recompiled. The patch is available at: https://www.teatime.com.tw/%7Etommy/linux/gftp_remote_charsets.patch