How to detect and remove indentation of a piped text
I'm looking for a way to remove the indentation of a piped text. Below is a solution using cut -c 9-, which assumes the indentation is 8 characters wide.
I'm looking for a solution that can detect the number of spaces to remove. This implies reading the whole (piped) file to find the minimum number of spaces (tabs?) used to indent it, then removing that many from each line.
run.sh
help() {
    awk '
    /esac/{b=0}
    b
    /case "\$arg" in/{b=1}' \
    "$me" \
    | cut -c 9-
}

while [[ $# -ge 1 ]]
do
    arg="$1"
    shift
    case "$arg" in
        help|h|?|--help|-h|'-?')
            # Show this help
            help;;
    esac
done
$ ./run.sh --help
help|h|?|--help|-h|'-?')
    # Show this help
    help;;
Note: echo $'    4\n  2\n   3' | python3 -c 'import sys; import textwrap as tw; print(tw.dedent(sys.stdin.read()), end="")'
works, but I expect there is a better way (I mean, one that depends only on software more common than Python). Maybe awk? I wouldn't mind seeing a Perl solution either.
Note 2: echo $'    4\n  2\n   3' | python -c 'import sys; import textwrap as tw; print tw.dedent(sys.stdin.read()),'
also works (Python 2.7.15rc1).
Solution 1:[1]
The following is pure bash, with no external tools or command substitutions:
#!/usr/bin/env bash
all_lines=( )
min_spaces=9999 # start with something arbitrarily high
while IFS= read -r line; do
    all_lines+=( "$line" )
    if [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]]; then
        continue # this line has at least as much whitespace as those preceding it
    fi
    # this line has *less* whitespace than those preceding it; we need to know how much.
    [[ $line =~ ^([[:space:]]*) ]]
    line_whitespace=${BASH_REMATCH[1]}
    min_spaces=${#line_whitespace}
done
for line in "${all_lines[@]}"; do
    printf '%s\n' "${line:$min_spaces}"
done
Its output is:
  4
2
 3
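As a usage sketch, the same loop can be wrapped in a function so it can sit at the end of a pipeline. The name dedent_bash is hypothetical, and the `|| [[ -n $line ]]` guard is an addition not present in the answer above: it keeps a final line that has no trailing newline, which a plain `read` loop would drop.

```shell
#!/usr/bin/env bash
# dedent_bash is a hypothetical name; the algorithm is the loop shown above.
dedent_bash() {
    local line line_whitespace min_spaces=9999
    local -a all_lines=()
    # "|| [[ -n $line ]]" keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        all_lines+=( "$line" )
        # A line whose first min_spaces characters are all whitespace
        # (this includes blank lines) cannot lower the minimum.
        if [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]]; then
            continue
        fi
        [[ $line =~ ^([[:space:]]*) ]]
        line_whitespace=${BASH_REMATCH[1]}
        min_spaces=${#line_whitespace}
    done
    for line in "${all_lines[@]}"; do
        printf '%s\n' "${line:$min_spaces}"
    done
}

# Sample input from the question, deliberately without a final newline.
printf '    4\n  2\n   3' | dedent_bash
```

Because `[[:space:]]` also matches tabs, a tab counts as one character of indentation here rather than as several columns.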
Solution 2:[2]
Suppose you have:
$ echo $'    4\n  2\n   3\n\ttab'
    4
  2
   3
	tab
You can use the Unix expand utility to expand the tabs to spaces, then pipe the result through awk to find the minimum number of leading spaces on a line:
$ echo $'    4\n  2\n   3\n\ttab' |
expand |
awk 'BEGIN{min_indent=9999999}
     {lines[++cnt]=$0
      match($0, /^[ ]*/)
      if(RLENGTH<min_indent) min_indent=RLENGTH
     }
     END{for (i=1;i<=cnt;i++)
             print substr(lines[i], min_indent+1)}'
  4
2
 3
      tab
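expand assumes tab stops every 8 columns unless told otherwise with -t. A minimal sketch of the same pipeline as a function with a configurable tab width follows; the name dedent_expand and its optional argument are hypothetical, and the first line seeds the minimum instead of a large starting constant.

```shell
#!/bin/sh
# dedent_expand [TABWIDTH]: hypothetical wrapper around the pipeline above.
dedent_expand() {
    expand -t "${1:-8}" |
    awk '{ lines[++cnt] = $0
           match($0, /^ */)          # always matches; RLENGTH = leading spaces
           if (cnt == 1 || RLENGTH < min_indent) min_indent = RLENGTH }
         END { for (i = 1; i <= cnt; i++)
                   print substr(lines[i], min_indent + 1) }'
}

# With 4-column tab stops the tab expands to 4 spaces instead of 8.
printf '\ttab\n  2\n' | dedent_expand 4
```

Like the answer above, this counts blank lines as having zero indentation, so a single empty line disables the dedent.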
Solution 3:[3]
Here's the (semi-)obvious temp file solution.
#!/bin/sh
t=$(mktemp -t dedent.XXXXXXXXXX) || exit
trap 'rm -f "$t"' EXIT
# min is the 1-based column of the leftmost non-space character (0 for a blank line)
awk '{ n = match($0, /[^ ]/); if (NR == 1 || n<min) min = n }1
    END { exit (min > 0 ? min : 1) }' >"$t"
cut -c $?- "$t"
This obviously fails if every line has 255 or more leading spaces, because then the result won't fit into the exit code from Awk.
This has the advantage that we are not restricting ourselves to the available memory. Instead, we are restricting ourselves to the available disk space. The drawback is that the disk might be slower, but the advantage of not reading big files into memory will IMHO trump that.
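One way around the exit-code limit, sketched here under the assumption that reading the temp file twice is acceptable, is to save the input first and let awk print the column instead of returning it. The function name dedent_tmp is hypothetical, and unlike the script above this variant ignores blank lines when looking for the minimum.

```shell
#!/bin/sh
# dedent_tmp is a hypothetical name. The body runs in a subshell, so the
# EXIT trap removes the temp file as soon as the function finishes.
dedent_tmp() (
    t=$(mktemp "${TMPDIR:-/tmp}/dedent.XXXXXXXXXX") || exit
    trap 'rm -f "$t"' EXIT
    cat >"$t"
    # First pass: smallest 1-based column holding a non-space character.
    # Blank lines (n == 0) are skipped; with no usable line, fall back to 1.
    min=$(awk '{ n = match($0, /[^ ]/) }
               n && (!min || n < min) { min = n }
               END { print (min ? min : 1) }' "$t")
    # Second pass: keep everything from that column onwards.
    cut -c "$min"- "$t"
)

printf '    4\n\n  2\n   3\n' | dedent_tmp
```

The column travels through a command substitution, so it is no longer capped by the 8-bit exit status.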
Solution 4:[4]
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(text="$(cat)"; echo "$text" \
| cut -c "$(echo "$text" | sed 's/[^ ].*$//' | awk 'NR == 1 {a = length} length < a {a = length} END {print a + 1}')-"\
)
With explanations:
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
    text="$(cat)" # Obtain the input in a variable
    echo "$text" | cut -c "$(
        # `cut` removes the first n-1 characters of each line of the input, where n is:
        echo "$text" | \
        sed 's/[^ ].*$//' | \
        awk 'NR == 1 || length < a {a = length} END {print a + 1}'
        # sed: keep only the initial spaces, remove the rest
        # awk:
        # At the first line `NR == 1`, get the length of the line `a = length`.
        # For any shorter line `length < a`, update the length `a = length`.
        # At the end of the piped input, print the shortest length + 1.
        # ... we add 1 because in `cut`, characters of the line are indexed at 1.
    )-"
)
Update:
It is possible to avoid spawning sed. As per tripleee's comment, sed's s/// can be replaced by awk's sub(). Here is an even shorter option, using n = match() as in tripleee's answer.
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
    text="$(cat)" # Obtain the input in a variable
    echo "$text" | cut -c "$(
        # `cut` removes the first a-1 characters of each line of the input, where a is:
        echo "$text" | \
        awk '
            {n = match($0, /[^ ]/)}
            n == 0 {n = length($0) + 1}
            NR == 1 || n < a {a = n}
            a == 1 {exit}
            END {print a}'
        # awk:
        # At every line, get the position n of the first non-space character
        # (for a line with no such character, use its length + 1).
        # At the first line `NR == 1`, copy that position to `a`.
        # For any line with fewer leading spaces (`n < a`), update `a` (`a = n`).
        # If `a` reaches 1 there is nothing to remove, so `exit` jumps straight to END.
        # At the end of the piped input, print a.
        # a is then 1 + the number of leading spaces common to all lines, which is
        # the first column `cut` must keep (its columns are indexed from 1).
        #
        # I have not tested whether the early `exit` could let "$text" leak to the script stdout (which would not be desirable at all).
    )-"
)
Apparently, it's also possible to do this in Perl 6 with the function my &f = *.indent(*);
Solution 5:[5]
Another solution with awk, based on dawg’s answer. Major differences include:
- No need to set an arbitrarily large number for the indentation, which feels hacky.
- Works on text with empty lines, by ignoring them when looking for the least-indented line.
awk '
  {
    lines[++count] = $0
    if (NF == 0) next
    match($0, /[^ ]/)
    if (length(min) == 0 || RSTART < min) min = RSTART
  }
  END {
    for (i = 1; i <= count; i++) print substr(lines[i], min)
  }
' <<< $'    4\n  2\n   3'
Or all on the same line:
awk '{ lines[++count] = $0; if (NF == 0) next; match($0, /[^ ]/); if (length(min) == 0 || RSTART < min) min = RSTART; } END { for (i = 1; i <= count; i++) print substr(lines[i], min) }' <<< $'    4\n  2\n   3'
Explanation:
Add the current line to an array, and increment the count variable.
{
    lines[++count] = $0
If the line is empty (or contains only whitespace), skip to the next iteration.
    if (NF == 0) next
Set RSTART to the start index of the first non-space character.
    match($0, /[^ ]/)
If min isn’t set or is higher than RSTART, set the former to the latter.
    if (length(min) == 0 || RSTART < min) min = RSTART
}
Run after all input is read.
END {
Loop over the array, and for each line print only a substring going from the index set in min to the end of the line.
    for (i = 1; i <= count; i++) print substr(lines[i], min)
}
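To tie this back to the question, the awk program can be wrapped in a shell function and used in place of the fixed cut -c 9-. This is only a sketch: the function name dedent is hypothetical, and the sample input imitates the case block from run.sh.

```shell
#!/bin/sh
# dedent is a hypothetical name; the awk program is the one explained above.
dedent() {
    awk '
      {
        lines[++count] = $0
        if (NF == 0) next                    # blank lines do not count
        match($0, /[^ ]/)                    # RSTART = first non-space column
        if (length(min) == 0 || RSTART < min) min = RSTART
      }
      END {
        for (i = 1; i <= count; i++) print substr(lines[i], min)
      }
    '
}

# Lines indented by 8 and 12 spaces, as in the case block of run.sh.
printf '%s\n' \
    "        help|h)" \
    "            # Show this help" \
    "            help;;" | dedent
```

In the question's help() function, this would mean ending the pipeline with `| dedent` instead of `| cut -c 9-`, so the help text keeps working if the case block is re-indented.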
Solution 6:[6]
solution using bash
#!/usr/bin/env bash
cb=$(xclip -selection clipboard -o)
firstchar=${cb::1}
if [ "$firstchar" == $'\t' ];then
    tocut=$(echo "$cb" | awk -F$'\t' '{print NF-1;}' | sort -n | head -n1)
else
    tocut=$(echo "$cb" | awk -F '[^ ].*' '{print length($1)}' | sort -n | head -n1)
fi
echo "$cb" | cut -c$((tocut+1))- | xclip -selection clipboard
Note: assumes the first line has the left-most indent.
Works for both spaces and tabs.
Copy some text (Ctrl+C), run the script, and the dedented text is saved to your clipboard.
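Note that NF-1 with a tab separator counts every tab on a line, not only the leading ones. A possible variant, sketched below without the clipboard handling, measures just the leading run of spaces and tabs (each counted as one character) and skips blank lines. The name dedent_ws is hypothetical.

```shell
#!/bin/sh
# dedent_ws is a hypothetical name; it reads stdin and writes stdout.
dedent_ws() {
    awk '{ lines[++n] = $0 }
         /[^ \t]/ {                       # only lines with visible content
             match($0, /^[ \t]*/)         # RLENGTH = leading spaces and tabs
             if (!seen++ || RLENGTH < min) min = RLENGTH
         }
         END { for (i = 1; i <= n; i++) print substr(lines[i], min + 1) }'
}

# The tab inside the second line is kept; only one leading tab is removed.
printf '\t\ta\n\tb\tc\n' | dedent_ws
```

It could replace the tocut computation and the cut call, for example as xclip -selection clipboard -o | dedent_ws | xclip -selection clipboard.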
solution using python
detab.py
import sys
import textwrap
data = sys.stdin.read()
# write() avoids the extra trailing newline that print() would add
sys.stdout.write(textwrap.dedent(data))
use with pipes
xclip -selection clipboard -o | python detab.py | xclip -selection clipboard
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Charles Duffy |
Solution 2 | |
Solution 3 | |
Solution 4 | |
Solution 5 | user137369 |
Solution 6 | Seth Foster |