Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8:

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?

Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

    $MyFile = Get-Content $MyPath
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)
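
To check whether a given approach really left out the BOM, one can look at the first three bytes of the resulting file. What follows is only a minimal sketch: the function name Test-Utf8Bom is made up for illustration and is not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    # Resolve against PowerShell's current location, which may differ from
    # the working directory that .NET sees.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)

    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 3
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }

    return ($read -eq 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF)
}
```

Calling Test-Utf8Bom $MyPath after each variant should make it easy to see which one writes a BOM on a particular machine.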

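The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments on @M. Dudley's answer:

    [IO.File]::WriteAllLines($filename, $content)

(I've also shortened it a bit by dropping the unnecessary System namespace qualification; PowerShell supplies it automatically.)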

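I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

    Get-Content path/to/file.ext | Out-File -Encoding ASCII targetFile.ext

For me, this results in a UTF-8 file without a BOM regardless of the source file's format, at least as long as the content is plain ASCII.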
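
A likely explanation for why this appears to work: ASCII is a byte-for-byte subset of UTF-8, so a file that contains only ASCII characters is also a valid BOM-less UTF-8 file. The catch is that the ASCII encoder replaces every non-ASCII character with "?". A small sketch of the difference at the byte level (the variable names are arbitrary):

```powershell
# Build the string "café" without putting a non-ASCII literal in the script.
$text = 'caf' + [char]0xE9

$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = (New-Object System.Text.UTF8Encoding $false).GetBytes($text)

$asciiBytes.Length    # 4: the last byte is 0x3F, i.e. '?'
$utf8Bytes.Length     # 5: the accented character becomes 0xC3 0xA9

# Decoding the ASCII bytes shows that the accented character is gone.
[System.Text.Encoding]::ASCII.GetString($asciiBytes)    # caf?
```

So the ASCII route is only safe when the data is known to contain no characters outside the ASCII range.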

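To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

For convenience, here is an advanced function, Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

  • You can use it just like Out-File in a pipeline.
  • Input objects that aren't strings are formatted as they would be if you sent them to the console, just as with Out-File.

Example:

    (Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before the result is sent through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for two reasons: (a) the whole file must fit into memory, and (b) if the command is interrupted, data will be lost.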
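
One way to soften both drawbacks is to stream the lines into a temporary file and replace the original only once writing has finished. The following is a rough sketch under stated assumptions, not a hardened implementation: the function name Convert-FileToUtf8NoBom and the ".tmp" suffix are invented here, and the source encoding is whatever Get-Content detects by default.

```powershell
# Hypothetical helper: rewrites a file as BOM-less UTF-8 via a temporary file.
function Convert-FileToUtf8NoBom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    $fullPath = (Resolve-Path -LiteralPath $LiteralPath).ProviderPath
    $tempPath = "$fullPath.tmp"    # assumption: this name is free to use
    $encoding = New-Object System.Text.UTF8Encoding $false

    # StreamWriter(path, append, encoding): overwrite, UTF-8 without a BOM.
    $writer = New-Object System.IO.StreamWriter -ArgumentList $tempPath, $false, $encoding
    try {
        # Without the enclosing (...), Get-Content passes lines on one by one,
        # so the whole file is never held in memory at once.
        Get-Content -LiteralPath $fullPath | ForEach-Object { $writer.WriteLine($_) }
    }
    finally {
        $writer.Dispose()
    }

    # Replace the original only after the new content has been fully written.
    Move-Item -LiteralPath $tempPath -Destination $fullPath -Force
}
```

If the command is interrupted while writing, the original file is still intact and only the temporary file is left behind. Note that line endings are normalized to the platform default by WriteLine.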

A note on memory use:

  • M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
  • The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.

Source code of Out-FileUtf8NoBom (also available as an MIT-licensed Gist):

    <#
    .SYNOPSIS
      Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

    .DESCRIPTION
      Mimics the most important aspects of Out-File:
      * Input objects are sent to Out-String first.
      * -Append allows you to append to an existing file, -NoClobber prevents
        overwriting of an existing file.
      * -Width allows you to specify the line width for the text representations
        of input objects that aren't strings.
      However, it is not a complete implementation of all Out-String parameters:
      * Only a literal output path is supported, and only as a parameter.
      * -Force is not supported.

      Caveat: *All* pipeline input is buffered before writing output starts,
              but the string representations are generated and written to the target
              file one by one.

    .NOTES
      The raison d'être for this advanced function is that, as of PowerShell v5,
      Out-File still lacks the ability to write UTF-8 files without a BOM:
      using -Encoding UTF8 invariably prepends a BOM.
    #>
    function Out-FileUtf8NoBom {
      [CmdletBinding()]
      param(
        [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
        [switch] $Append,
        [switch] $NoClobber,
        [AllowNull()] [int] $Width,
        [Parameter(ValueFromPipeline)] $InputObject
      )

      #requires -version 3

      # Make sure that the .NET framework sees the same working dir. as PS
      # and resolve the input path to a full path.
      [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
      $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)

      # If -NoClobber was specified, throw an exception if the target file already
      # exists.
      if ($NoClobber -and (Test-Path $LiteralPath)) {
        Throw [IO.IOException] "The file '$LiteralPath' already exists."
      }

      # Create a StreamWriter object.
      # Note that we take advantage of the fact that the StreamWriter class by default:
      # - uses UTF-8 encoding
      # - without a BOM.
      $sw = New-Object IO.StreamWriter $LiteralPath, $Append

      $htOutStringArgs = @{}
      if ($Width) {
        $htOutStringArgs += @{ Width = $Width }
      }

      # Note: By not using begin / process / end blocks, we're effectively running
      #       in the end block, which means that all pipeline input has already
      #       been collected in automatic variable $Input.
      #       We must use this approach, because using | Out-String individually
      #       in each iteration of a process block would format each input object
      #       with an individual header.
      try {
        $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
      } finally {
        $sw.Dispose()
      }

    }

This script converts all .txt files in DIRECTORY1 to UTF-8 without a BOM and writes them to DIRECTORY2:

    foreach ($i in ls -name DIRECTORY1\*.txt)
    {
        $file_content = Get-Content "DIRECTORY1\$i";
        [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
    }

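One technique I use is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output is UTF-16, which SQLPlus does not recognize. To work around this:

    sqlplus -s / as sysdba "@create_sql_script.sql" | Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed in another SQLPlus session without any Unicode worries:

    sqlplus / as sysdba "@new_script.sql" | tee new_script.log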

Change multiple files, selected by extension, to UTF-8 without a BOM:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
    foreach ($i in ls -recurse -filter "*.java") {
        $MyFile = Get-Content $i.fullname
        [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
    }

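For whatever reason, the WriteAllLines calls were still producing a BOM for me, both with the BOM-less UTF8Encoding argument and without it. But the following worked for me:

    $bytes = gc -Encoding byte BOMthetorpedoes.txt
    [IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])

I had to make the file path absolute for it to work. Otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know your BOM is 3 bytes. I have no idea how reliable it is to expect a given BOM format/length based on the encoding.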
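
On the question of BOM length: in .NET, each encoding object reports its own BOM through GetPreamble(), so the length does not have to be guessed. A small sketch that lists a few common cases (the labels and the variable name $encodings are arbitrary choices made here):

```powershell
# Encodings constructed so that they emit a BOM; the labels are just display names.
$encodings = [ordered]@{
    'UTF-8'     = (New-Object System.Text.UTF8Encoding $true)
    'UTF-16 LE' = (New-Object System.Text.UnicodeEncoding -ArgumentList $false, $true)
    'UTF-16 BE' = (New-Object System.Text.UnicodeEncoding -ArgumentList $true, $true)
    'UTF-32 LE' = (New-Object System.Text.UTF32Encoding -ArgumentList $false, $true)
}

foreach ($name in $encodings.Keys) {
    # GetPreamble() returns the BOM bytes this encoding would write.
    $bom = $encodings[$name].GetPreamble()
    $hex = ($bom | ForEach-Object { $_.ToString('X2') }) -join ' '
    '{0,-10} {1} byte(s): {2}' -f $name, $bom.Length, $hex
}
```

The UTF-8 BOM is always the three bytes EF BB BF, while UTF-16 uses two bytes (FF FE or FE FF) and UTF-32 LE uses four (FF FE 00 00). Skipping exactly three bytes is therefore only correct for files that are known to be UTF-8 with a BOM.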

Also, as written, this probably only works if your file fits into a PowerShell array, which seems to have a length limit somewhat below [int32]::MaxValue on my machine.
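
The array size limit can be sidestepped by copying the file through streams instead of loading all bytes at once. The following is a sketch only: the function name Copy-WithoutUtf8Bom is hypothetical, both paths are assumed to be absolute, and the destination must be a different file from the source.

```powershell
# Hypothetical helper: copies a file and skips a leading UTF-8 BOM if present.
# Assumes absolute paths and a destination that differs from the source.
function Copy-WithoutUtf8Bom {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $DestinationPath
    )

    $source = [System.IO.File]::OpenRead($SourcePath)
    try {
        $head = New-Object byte[] 3
        $read = $source.Read($head, 0, 3)
        $hasBom = ($read -eq 3 -and $head[0] -eq 0xEF -and $head[1] -eq 0xBB -and $head[2] -eq 0xBF)

        # No BOM: rewind so that the first bytes are copied as well.
        if (-not $hasBom) { $source.Position = 0 }

        $target = [System.IO.File]::Create($DestinationPath)
        try {
            # CopyTo moves the data in fixed-size chunks, not as one big array.
            $source.CopyTo($target)
        }
        finally {
            $target.Dispose()
        }
    }
    finally {
        $source.Dispose()
    }
}
```

Because the bytes are copied unchanged, this only removes the BOM and does not convert between encodings.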

    [System.IO.FileInfo] $file = Get-Item -Path $FilePath
    $sequenceBOM = New-Object System.Byte[] 3
    $reader = $file.OpenRead()
    $bytesRead = $reader.Read($sequenceBOM, 0, 3)
    $reader.Dispose()
    # A UTF-8 file with a BOM starts with the following three bytes. Hex: 0xEF 0xBB 0xBF, decimal: 239 187 191
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191)
    {
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding)
        Write-Host "Remove UTF-8 BOM successfully"
    }
    Else
    {
        Write-Warning "Not UTF-8 BOM file"
    }

Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell

If you want to use [System.IO.File]::WriteAllLines(), you should cast the second argument to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like this:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
    [System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

If you want to use [System.IO.File]::WriteAllText(), you should sometimes pipe the second argument through | Out-String | to add a CRLF to the end of each line explicitly (especially when you use it with ConvertTo-Csv):

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
    [System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

    Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

One could use the following to get UTF-8 without a BOM:

    $MyFile | Out-File -Encoding ASCII

This one works for me (use "Default" instead of "UTF8"):

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without a BOM.