How to remove the nth column from a text table in PowerShell?
Let’s say I am working with a nicely formatted table. Take kubectl
output for example:
NAME           READY  STATUS   RESTARTS  AGE    IP         NODE   NOMINATED NODE  READINESS GATES
me-pod-name    2/2    Running  0         6s     10.0.0.10  node1  <none>          <none>
me-pod-name-2  1/1    Running  0         6d18h  10.0.1.20  node2  <none>          <none>
me-pod-name-3  1/1    Running  0         11d    10.0.0.30  node3  <none>          <none>
I tend to watch such output and log changes a lot. In this case, I would like to remove one of the middle columns from the table and still get a nice output.
To demonstrate, suppose I want to remove the AGE
column, as it changes a lot and is useless for watching when the resource is young:
NAME           READY  STATUS   RESTARTS  IP         NODE   NOMINATED NODE  READINESS GATES
me-pod-name    2/2    Running  0         10.0.0.10  node1  <none>          <none>
me-pod-name-2  1/1    Running  0         10.0.1.20  node2  <none>          <none>
me-pod-name-3  1/1    Running  0         10.0.0.30  node3  <none>          <none>
My question is: how do I easily remove such a column and still output a nicely formatted table, with all other columns intact?
The columns are not always the same size (e.g. AGE is not always 8 characters wide).
- I’d like to find a reusable, one-liner solution. I work with a lot of CLI tools (not just k8s-related) and often need to filter output when passing data between them.
- Also, I’d like to avoid a pattern-based solution where I have to produce a regex for each column I want to remove. (I am able to do that, but it means writing a specific solution for each use case.)
I tried to work with ConvertFrom-String and Format-Table, but this messes a lot with the data format (e.g. "1/1" is treated as a date, which is wrong in this case).
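For illustration, a minimal repro sketch of that behaviour (the sample line is made up, and the exact typing may vary by PowerShell version; ConvertFrom-String exists in Windows PowerShell):
'me-pod-name-2 1/1 Running' | ConvertFrom-String
# ConvertFrom-String splits on whitespace into properties P1, P2, P3 and guesses the types;
# P2 typically comes back as a [datetime] rather than the literal string "1/1"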
Solution 1:[1]
What you are looking at is Fixed-Width data. To handle that, Import-Csv won't do, so I made a couple of functions some time ago to convert from and to Fixed-Width data.
function ConvertFrom-FixedWidth {
[CmdletBinding()]
[OutputType([PSObject[]])]
Param(
[Parameter(Mandatory = $true, Position = 0, ValueFromPipeline = $true)]
[string[]]$Data,
[int[]]$ColumnWidths = $null, # an array of integers containing the width in characters for each field
[string[]]$Header = $null, # if given, a string array containing the columnheaders
[switch]$AllowRaggedContent # if set, the function accepts lines where the last field has been trimmed (is shorter than its declared width)
)
# If the data is sent through the pipeline, use $input to collect it as an array
if ($PSCmdlet.MyInvocation.ExpectingInput) { $Data = @($Input) }
# or use : $Data = $Input | ForEach-Object { $_ }
if (!$ColumnWidths) {
Write-Verbose "Calculating column widths using first row"
# Try and determine the width of each field from the top (header) line.
# This can only work correctly if the fields in that line do not contain space
# characters OR if each field is separated from the next by more than 1 space.
# temporarily replace single spaces in the header row with underscore
$row1 = $Data[0] -replace '(\S+) (\S+)', '$1_$2'
# Get the starting index of each field and add the total length for the last field
$indices = @(([regex] '\S+').Matches($row1) | ForEach-Object {$_.Index}) + $row1.Length
# Calculate individual field widths from their index
$ColumnWidths = (0..($indices.Count -2)) | ForEach-Object { $indices[$_ + 1] - $indices[$_] }
}
# Combine the field widths integer array into a regex string like '^(.{10})(.{50})(.{12})'
$values = $ColumnWidths | ForEach-Object { "(.{$_})" }
if ($AllowRaggedContent) {
# account for fields that are too short (possibly by trimming trailing spaces)
# set the last item to be '(.*)$' to capture any characters left in the string.
$values[-1] = '(.*)$'
}
$regex = '^{0}' -f ($values -join '')
Write-Verbose "Splitting fields and generating output"
# Execute a scriptblock to convert each line in the array.
$csv = & {
switch -Regex ($Data) {
$regex {
# Join what the capture groups matched with a comma and wrap the fields
# between double-quotes. Double-quotes inside a fields value must be doubled.
($matches[1..($matches.Count - 1)] | ForEach-Object { '"{0}"' -f ($_.Trim() -replace '"', '""') }) -join ','
}
}
}
if ($Header) { $csv | ConvertFrom-Csv -Header $Header }
else { $csv | ConvertFrom-Csv }
}
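To see the width-inference step above in isolation, here is a standalone sketch using a shortened, made-up header (not part of the function itself):
$header  = 'NAME  AGE  NOMINATED NODE'
$row1    = $header -replace '(\S+) (\S+)', '$1_$2'   # 'NAME  AGE  NOMINATED_NODE'
$indices = @(([regex]'\S+').Matches($row1) | ForEach-Object { $_.Index }) + $row1.Length
(0..($indices.Count - 2)) | ForEach-Object { $indices[$_ + 1] - $indices[$_] }   # -> 6, 5, 14
The underscore trick protects single spaces inside header names (like NOMINATED NODE), so they are not mistaken for column boundaries.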
function ConvertTo-FixedWidth {
[CmdletBinding()]
[OutputType([String[]])]
Param (
[Parameter(Mandatory = $true, Position = 0, ValueFromPipeline = $true)]
[PSObject[]]$Data,
[Parameter(Mandatory = $false)]
[ValidateRange(1, 8)]
[int]$Gap = 2
)
# if the data is sent through the pipeline, use $input to collect it as an array
if ($PSCmdlet.MyInvocation.ExpectingInput) { $Data = @($Input) }
# or use : $Data = $Input | ForEach-Object { $_ }
# get the headers from the first object in the correct order
Write-Verbose "Retrieving column headers"
$headers = $Data[0].PSObject.Properties | ForEach-Object {$_.Name }
# calculate the maximum width for each column
Write-Verbose "Calculating column widths"
$columnWidths = @{}
foreach ($item in $Data) {
foreach ($column in $headers) {
$length = [Math]::Max($item.$column.Length, $column.Length)
if ($column -ne $headers[-1]) { $length += $Gap }
if ($columnWidths[$column]) { $length = [Math]::Max($columnWidths[$column], $length) }
$columnWidths[$column] = $length
}
}
# output the headers, all left-aligned
$line = foreach ($column in $headers) {
"{0, -$($columnWidths[$column])}" -f $column
}
# output the first (header) line
$line -join ''
# regex to test for numeric values
$reNumeric = '^[+-]?\s*(?:\d{1,3}(?:(,?)\d{3})?(?:\1\d{3})*(\.\d*)?|\.\d+)$'
# next go through all data lines and output formatted rows
foreach ($item in $Data) {
$line = foreach ($column in $headers) {
$padding = $columnWidths[$column]
# if the value is numeric, align right, otherwise align left
if ($item.$column -match $reNumeric) {
$padding -= $Gap
"{0, $padding}{1}" -f $item.$column, (' ' * $Gap)
}
else {
"{0, -$padding}" -f $item.$column
}
}
# output the line
$line -join ''
}
}
Having these functions in place, using your example, you could do this:
# get the fixed-width data from file and convert it to an array of PSObjects
$data = (Get-Content -Path 'D:\Test.txt') | ConvertFrom-FixedWidth -AllowRaggedContent -Verbose
# now you can remove any column like this
$data = $data | Select-Object * -ExcludeProperty 'AGE'
# show on screen
$data | Format-Table -AutoSize
# save to disk as new fixed-width file
$data | ConvertTo-FixedWidth -Gap 3 -Verbose | Set-Content -Path 'D:\Output.txt'
# or you can save it as regular CSV to disk
$data | Export-Csv -Path 'D:\Output.csv' -NoTypeInformation
Result on screen:
NAME          READY STATUS  RESTARTS IP        NODE  NOMINATED NODE READINESS GATES
----          ----- ------  -------- --        ----  -------------- ---------------
me-pod-name   2/2   Running 0        10.0.0.10 node1 <none>         <none>
me-pod-name-2 1/1   Running 0        10.0.1.20 node2 <none>         <none>
me-pod-name-3 1/1   Running 0        10.0.0.30 node3 <none>         <none>
Result as saved to Fixed-Width file:
NAME            READY   STATUS    RESTARTS   IP          NODE    NOMINATED NODE   READINESS GATES
me-pod-name     2/2     Running          0   10.0.0.10   node1   <none>           <none>
me-pod-name-2   1/1     Running          0   10.0.1.20   node2   <none>           <none>
me-pod-name-3   1/1     Running          0   10.0.0.30   node3   <none>           <none>
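With both functions loaded, the original kubectl scenario then becomes a reusable one-liner along these lines (the kubectl arguments are only an example):
kubectl get pods -o wide | ConvertFrom-FixedWidth -AllowRaggedContent | Select-Object * -ExcludeProperty AGE | ConvertTo-FixedWidth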
Solution 2:[2]
A rather late answer...
I have a solution similar to @Theo's, in the form of a PowerShell cmdlet called ConvertFrom-SourceTable, which I try to maintain. It automatically recognizes a lot of known table formats and converts them to objects accordingly; for details, see the related project site.
I agree with @mklement0 that you should try to tackle the problem at the source, but that is not always possible. Besides, if there were a standard for fixed-width text tables, it would put this statement in a completely different perspective, knowing that the closest thing to a fixed-width text table is probably a CSV file, whose format is not fully standardized, which only supports strings (in PowerShell) and which is, above all, hard for a human to read. From my perspective: if a human can read it, a program should be able to interpret it. Programs should help humans where possible, not vice versa.
In contrast to Theo's solution, my cmdlet doesn't determine the column boundaries from 2 or more spaces (there is no guarantee that columns are separated by more than a single space), but from a combination of the header, the ruler and the data that follows. Anyway, this was actually the cause of an issue I was dealing with for the table in this question: the NOMINATED NODE column was split into two separate columns, causing a duplicate NODE column.
Even though the columns aren't simply split on 2 or more spaces, it is obvious in this table that NOMINATED NODE and READINESS GATES are single columns, as there is no data aligned under the NODE and GATES parts of the header.
This constraint kept me busy for a while, but it is now incorporated in the latest update of ConvertFrom-SourceTable:
Install-Script -Name ConvertFrom-SourceTable # https://www.powershellgallery.com/packages/ConvertFrom-SourceTable
. .\ConvertFrom-SourceTable.ps1 # Load (dot-source) the script
$Data = Get-Content .\Table.txt -Raw | ConvertFrom-SourceTable
Note: you might also consider omitting the -Raw switch and streaming each individual line. This uses less memory, but then the ConvertFrom-SourceTable cmdlet can only base its decisions (such as column recognition) on the first data row rather than on the whole table.
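A minimal sketch of that streaming variant, assuming the same Table.txt file:
Get-Content .\Table.txt | ConvertFrom-SourceTable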
# now you can remove any column, as per Theo's answer
$Data = $Data | Select-Object * -ExcludeProperty 'AGE'
# and create a table without horizontal ruler
$Header = [Ordered]@{}
$Data[0].PSObject.Properties | ForEach-Object {$Header[$_.Name] = $_.Name}
[PSCustomObject]$Header, $Data | Format-Table -HideTableHeaders | Set-Content .\NewTable.txt
Result
NAME          READY STATUS  RESTARTS IP        NODE  NOMINATED NODE READINESS GATES
me-pod-name   2/2   Running 0        10.0.0.10 node1 <none>         <none>
me-pod-name-2 1/1   Running 0        10.0.1.20 node2 <none>         <none>
me-pod-name-3 1/1   Running 0        10.0.0.30 node3 <none>         <none>
Note that the ConvertFrom-SourceTable cmdlet is still able to read back the above resulting table, despite the fact that the columns are separated by just a single space.
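For example, a round trip of the file written above:
Get-Content .\NewTable.txt -Raw | ConvertFrom-SourceTable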
Solution 3:[3]
As an aside: for a given utility, you may be able to tackle the problem at the source by using its options to emit a machine-parseable data format (e.g., JSON) rather than pretty-printed text. PowerShell may be able to parse that into objects (e.g., with ConvertFrom-Json), from which you can then easily select properties and format them for display with PowerShell's Format-* cmdlets.
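For example, a sketch of that approach with kubectl (the JSON property names below are assumptions based on kubectl's usual API output, not something given in the question):
# Out-String joins kubectl's output lines into one string so ConvertFrom-Json sees a single JSON document
kubectl get pods -o json | Out-String | ConvertFrom-Json |
    Select-Object -ExpandProperty items |
    Select-Object @{ n = 'NAME'; e = { $_.metadata.name } },
                  @{ n = 'STATUS'; e = { $_.status.phase } },
                  @{ n = 'NODE'; e = { $_.spec.nodeName } } |
    Format-Table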
Parsing and transforming fixed-width columnar text output in PowerShell:
Generally, it's best to parse the input text into custom objects ([pscustomobject] instances), which makes them suitable for further programmatic processing and also for reformatting using PowerShell's flexible Format-* cmdlets.
To what extent this parsing can be automated without advance knowledge depends on the specifics of a given utility's fixed-width output (the assumption is that trailing whitespace in each column should be trimmed):
Automated parsing:
If the following conditions are met, you can automatically determine the extent of the columns and values:
- The columns are separated by separator strings (at least one character).
- and the column names and row values themselves have no embedded instances of this separator string (except for trailing spaces).
Custom function Select-Column, whose source code is printed below, can perform this automated parsing via a regex that describes the separator string.
Otherwise, you have to parse by known-in-advance column widths / starting positions:
- That is, you have to know the width of all individual columns / their starting positions and parse based on that. Theo's answer shows how to do that.
In the case of kubectl, the columns are separated by 2 spaces, whereas the column names contain only 1 space, and the values none (in your sample output), so the prerequisite for automated parsing is met.
Custom function Select-Column, whose source code is below, can parse the text into custom objects ([pscustomobject] instances) using regex ' {2,}' (2 or more spaces) as the column-separator expression, and the resulting objects are suitable for further programmatic processing and formatting in PowerShell. You can simply pipe kubectl output to it.
With up to 4 selected columns, the resulting objects are implicitly formatted as a table (implied Format-Table); with 5 or more, as a list (implicit Format-List). You can pipe to Format-* cmdlets explicitly to control the formatting.
# Parse all columns except 'Age' into custom objects and format them as a table:
kubectl ... | Select-Column ' {2,}' -Exclude Age | Format-Table
The above yields:
NAME          READY STATUS  RESTARTS IP        NODE  NOMINATED NODE READINESS GATES
----          ----- ------  -------- --        ----  -------------- ---------------
me-pod-name   2/2   Running 0        10.0.0.10 node1 <none>         <none>
me-pod-name-2 1/1   Running 0        10.0.1.20 node2 <none>         <none>
me-pod-name-3 1/1   Running 0        10.0.0.30 node3 <none>         <none>
You could wrap such calls in a custom function, including piping to Format-Table, but note that doing the latter means that the output is again not suitable for programmatic processing, because Format-* calls output formatting instructions, not data.
If you wanted the output to be table-formatted by default even for 5 or more properties, more work would be needed: you'd have to assign a distinct (virtual) type name to the output objects and define formatting data for that type.
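For example, a minimal wrapper sketch (the function name and kubectl arguments are hypothetical), assuming the Select-Column function below has been dot-sourced:
function Get-PodTable {
    param([string[]] $Exclude = 'AGE')    # columns to drop; AGE by default
    kubectl get pods -o wide |
        Select-Column ' {2,}' -ExcludeName $Exclude |
        Format-Table    # display only; drop this to keep the objects for further processing
}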
Select-Column source code:
function Select-Column {
<#
.SYNOPSIS
Parses columnar text data into custom objects.
.DESCRIPTION
Parses line-based columnar text data into custom objects, based on a
column-separator pattern specified as a regular expression.
By default, the values from all columns are returned as properties of the output
objects.
Use -Name to extract only given columns, or -ExcludeName to exclude columns.
.PARAMETER SeparatorPattern
A regular expression specifying what text separates the column names / values
in the input text.
The default is ' +', i.e. any run of one or more spaces, which works with
fixed-width columns whose column names and values have no embedded spaces.
.PARAMETER Name
The names of one or more columns to extract from the input text.
These names must match existing columns.
By default, all columns are returned.
.PARAMETER Exclude
The names of one or more input columns to exclude from the properties of the output
objects.
These names must match existing columns.
.PARAMETER InputObject
This parameter receives the individual lines of input text via the pipeline.
Alternatively, you can pass the input text as a single multi-line string, both
via the pipeline and directly to this parameter.
.EXAMPLE
'col1 col2', 'val1 val2' | Select-Column
Converts the line-by-line input into custom objects with properties 'col1'
and 'col2' whose values are 'val1' and 'val2', based on runs of 1 or more
spaces acting as separators.
.EXAMPLE
'col 1 col 2', 'val1 val2' | Select-Column ' {2,}' -Exclude 'col 2'
Converts the line-by-line input into custom objects based on two or more spaces
as separators, excluding values from column 'col 2'.
#>
[CmdletBinding(DefaultParameterSetName = 'All')]
param(
[Parameter(Position = 0)]
[Alias('s')]
[string[]] $SeparatorPattern = ' +'
,
[Parameter(ParameterSetName = 'Include', Position = 1)]
[Alias('n')]
[string[]] $Name
,
[Parameter(ParameterSetName = 'Exclude')]
[Alias('x')]
[string[]] $ExcludeName
,
[Parameter(Mandatory, ValueFromPipeline)]
[string] $InputObject
)
begin {
Set-StrictMode -Version 1
$lineIndex = 0
}
process {
foreach ($line in $InputObject -split '\r?\n' -ne '') {
# Split the line into column names / fields.
$fields = $line -split $SeparatorPattern
# Process the header row
if ($lineIndex++ -eq 0) {
$ndx = 0
# Map column names to their indices.
$nameToIndex = @{ }
foreach ($n in $fields) {
$nameToIndex[$n] = $ndx++
}
# Based on the given column names, build a list of indices
# to extract, and create an ordered hashtable with the specified
# column names to serve as a template for the output objects.
$unknownName = $null
if ($Name) {
# only the specified columns
$unknownName = (Compare-Object -PassThru $fields $Name).Where({ $_.SideIndicator -eq '=>' })
}
elseif ($ExcludeName) {
# all but the specified columns
$Name, $unknownName = (Compare-Object -PassThru $fields $ExcludeName).Where({ $_.SideIndicator -eq '<=' }, 'Split')
}
else {
$Name = $fields # default to all columns
}
if ($unknownName) { Throw "Unknown column name(s): $unknownName" }
if (-not $Name) { Throw "You have selected no output columns." }
$oht = [ordered] @{ }
$outColIndices = foreach ($n in $Name) {
$oht[$n] = $null
$nameToIndex[$n] # output the index
}
}
# Process a data row.
else {
# Fill in the ordered hashtable with this line's field values...
$ndx = 0
foreach ($n in $Name) {
$oht[$n] = $fields[$outColIndices[$ndx++]]
}
# ... and construct and output a custom object from it.
[pscustomobject] $oht
}
}
}
}
Solution 4:[4]
Here's a method. First, put all the data in a plain text file ("C:\Temp.txt"). Then this script will first convert all the data into a CSV file and then format it.
# convert runs of 2 or more spaces to commas so the table becomes a CSV
Get-Content "C:\Temp.txt" | ForEach-Object { $_ -replace ' {2,}', ',' } | Set-Content "C:\Temp.csv"
Import-CSV "C:\Temp.csv" | Select-Object NAME, READY, STATUS, RESTARTS, IP, NODE, 'NOMINATED NODE', 'READINESS GATES'
Remove-Item "C:\Temp.txt"
Remove-Item "C:\Temp.csv"
Or, if a command generates the output directly, there is no need to put it into a text file first.
TheCommand | Export-CSV "C:\Temp.csv"
Import-CSV "C:\Temp.csv" | Select-Object NAME, READY, STATUS, RESTARTS, IP, NODE, 'NOMINATED NODE', 'READINESS GATES'
Remove-Item "C:\Temp.csv"
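For instance, with a command that already emits objects (Get-Process is used here purely as a stand-in; pick properties that exist on your command's output):
Get-Process | Export-CSV "C:\Temp.csv" -NoTypeInformation
Import-CSV "C:\Temp.csv" | Select-Object Name, Id, CPU
Remove-Item "C:\Temp.csv"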
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Theo
Solution 2 | iRon
Solution 3 | mklement0
Solution 4 | Wasif