The file BOM header

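The BOM is commonly used by Windows programs when they handle text, but it is rarely used on Linux.
BOM is short for byte-order mark. It is often used as a marker that a file is encoded in UTF-8, UTF-16, or UTF-32. Unicode has a character called ZERO WIDTH NO-BREAK SPACE, whose code point is U+FEFF. For UTF-16, a byte stream that starts with FE FF is big-endian, and a byte stream that starts with FF FE is little-endian.
UTF-8 has no byte-order problem, so there the character only marks the file as UTF-8 and says nothing about byte order. The UTF-8 encoding of ZERO WIDTH NO-BREAK SPACE is EF BB BF, so a byte stream that starts with EF BB BF can be taken to be a UTF-8 file.
To view a file in hexadecimal on Linux, run hexdump $path_to_file (hexdump -C prints the bytes one at a time in file order). On Windows, you can open the file in vim and run :%!xxd, which simply calls the xxd program to do the conversion. On Windows you can also use VS Code: install the Hex Editor extension, then right-click the tab of the open file and invoke it from there.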
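The byte signatures above can be checked mechanically. The following is a minimal sketch, not a full encoding detector; `detectBom` is a hypothetical helper name, and it assumes the input is a `Uint8Array` (or a plain array) holding the first bytes of the file.

```javascript
// Hypothetical helper: guess the encoding from a leading BOM, if there is one.
function detectBom(bytes) {
  // True when the buffer begins with every byte of the signature, in order.
  const startsWith = (signature) => signature.every((b, i) => bytes[i] === b);

  // Check the 4-byte UTF-32 signatures first: FF FE 00 00 also begins with FF FE.
  if (startsWith([0x00, 0x00, 0xFE, 0xFF])) return 'UTF-32BE';
  if (startsWith([0xFF, 0xFE, 0x00, 0x00])) return 'UTF-32LE';
  if (startsWith([0xEF, 0xBB, 0xBF])) return 'UTF-8';
  if (startsWith([0xFE, 0xFF])) return 'UTF-16BE';
  if (startsWith([0xFF, 0xFE])) return 'UTF-16LE';
  return null; // no BOM found; the encoding has to be determined some other way
}
```

One known ambiguity: a little-endian UTF-16 file whose first character after the BOM is U+0000 would be reported as UTF-32LE by this sketch.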

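The client-side JS download function is shown below. The key point is that it prepends the character U+FEFF ("\ufeff") to the data; when the browser encodes the string as UTF-8 for the download, the file ends up starting with EF BB BF.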

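export function downloadExportUtf8Csv(url, fileName) {
  request({url: url, method: 'get'}).then(response => {
    // Prepend U+FEFF so the saved file starts with the UTF-8 BOM (EF BB BF)
    let data = "\ufeff" + response.data
    // MIME type parameters are separated by ';', not ','
    let blob = new Blob([data], {type: 'text/csv;charset=UTF-8'})
    // Named objectUrl so that it does not shadow the url parameter
    let objectUrl = window.URL.createObjectURL(blob)
    let a = document.createElement("a")
    a.href = objectUrl
    a.download = fileName
    a.click()
    window.URL.revokeObjectURL(objectUrl)
    a.remove()
  })
}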
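To see why a single "\ufeff" character turns into three bytes in the downloaded file, the UTF-8 encoding can be inspected directly. This is a small sketch using the standard `TextEncoder`, which always produces UTF-8 and is available in modern browsers and in Node.js; `utf8Hex` is a hypothetical helper name and the sample CSV text is made up.

```javascript
// Hypothetical helper: encode a string as UTF-8 and return its bytes as hex.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// U+FEFF followed by the sample CSV text "a,b".
console.log(utf8Hex('\ufeff' + 'a,b')); // ef bb bf 61 2c 62
```

The first three bytes are the UTF-8 BOM; the remaining bytes are the ASCII characters of the data.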

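Follow-up
FileSaver, a client-side library found in node_modules, also handles this automatically: it prepends the BOM for matching blob types.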

auto_bom = function(blob) {
  // prepend BOM for UTF-8 XML and text/* types (including HTML)
  // note: your browser will automatically convert UTF-16 U+FEFF to EF BB BF
  if (/^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i.test(blob.type)) {
      return new Blob([String.fromCharCode(0xFEFF), blob], {type: blob.type});
  }
  return blob;
}
FileSaver = function(blob, name, no_auto_bom) {
   if (!no_auto_bom) {
      blob = auto_bom(blob);
   }
   // ...
}
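Whether a blob gets the BOM depends entirely on the type test in auto_bom. The sketch below copies that regular expression as quoted above and tries it against a few type strings; `wouldAddBom` is a hypothetical name used only for this illustration.

```javascript
// The pattern is copied from the auto_bom snippet quoted above.
const bomTypePattern = /^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i;

// Hypothetical helper: would auto_bom prepend a BOM for this blob type?
function wouldAddBom(type) {
  return bomTypePattern.test(type);
}

console.log(wouldAddBom('text/csv;charset=utf-8'));         // true
console.log(wouldAddBom('image/svg+xml; charset=UTF-8'));   // true
console.log(wouldAddBom('text/csv,charset=UTF-8'));         // false: ',' instead of ';'
console.log(wouldAddBom('application/json;charset=utf-8')); // false: not text/* or XML
```

Going by the snippet, auto_bom prepends the character without checking whether the data already starts with one, so data that already carries "\ufeff" would presumably end up with two unless no_auto_bom is passed.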
