String slicing

Sometimes you need to remove x characters from a string.
There are multiple ways to do this. This chapter covers the simplest ones; regular expressions and sed commands are covered later on this page.

If the case is simple and you know exactly where to cut, you can use substring expansion. It expands to up to length characters of the value of parameter, starting at the character specified by offset (the first character has offset 0). If length is omitted, it expands to the rest of the value, starting at offset. A negative length counts back from the end of the value (Bash 4.2 and newer).

Syntax: ${parameter:offset:length}

#!/bin/bash

VAR1="im an example string"

#remove the last 3 characters (negative length: Bash 4.2+)
echo "${VAR1::-3}"
#output: im an example str

#remove the first 3 characters
echo "${VAR1:3}"
#output: an example string

#remove 1 character from the left and right and save the result to a variable
result=${VAR1:1:-1}
echo "$result"
#output: m an example strin


#the POSIX way: # removes the shortest matching prefix, each ? matches one character
# remove first 3 chars
example=testing
echo ${example#???}
#output: ting
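The ${parameter:offset:length} form is a Bash feature, and the negative lengths used above need Bash 4.2 or newer. In a plain POSIX sh, the # and % pattern-removal expansions cover many of the same cases. A minimal sketch; the file variable and its value are made up for illustration:

```shell
#!/bin/bash
# POSIX pattern removal: also works in sh/dash, unlike ${VAR:offset:length}
VAR1="im an example string"

no_last3=${VAR1%???}    # % removes the shortest suffix matching ??? (3 characters)
no_first3=${VAR1#???}   # # removes the shortest prefix matching ???
echo "$no_last3"        # im an example str
echo "$no_first3"       # an example string

# Doubling the operator (## or %%) removes the longest match instead
file="archive.tar.gz"   # hypothetical file name
echo "${file%.*}"       # archive.tar
echo "${file%%.*}"      # archive
echo "${file#*.}"       # tar.gz
```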

Another very useful command is cut. According to its manual, it removes sections from each line of files.
If no file is given, cut reads standard input, so to cut a string stored in a variable you need to pipe the string to it.

The basic syntax is quite simple: cut OPTION... [FILE]...

The most useful options are:
  • -d = delimiter, the character that separates the fields
  • -f = fields, print the selected field(s), for example 1,4 or 2-
  • -c = cut by characters

#!/bin/bash

VAR1="dog wolf cow horse"

#display the 1st field
echo "$VAR1" | cut -d ' ' -f 1
#output: dog

#display the 1st and 4th fields
echo "$VAR1" | cut -d ' ' -f 1,4
#output: dog horse

#show the 2nd to 5th characters
echo "imjustaexample" | cut -c 2-5
#output: mjus

#remove the first character
echo "imjustaexample" | cut -c 2-
#output: mjustaexample

#get the 2nd field of a .csv line (here just a variable value) and save it to a variable
VAR1="dog;horse;cow"
result=$(echo "$VAR1" | cut -d ';' -f 2)
echo "$result"
#output: horse
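In Bash, a here-string (<<<) passes a variable to cut's standard input without echo and a pipe. A small sketch using a semicolon-separated value:

```shell
#!/bin/bash
VAR1="dog;horse;cow"

# Here-string: the value of VAR1 (plus a newline) becomes cut's stdin
second=$(cut -d ';' -f 2 <<< "$VAR1")
echo "$second"    # horse

# A range of fields keeps the original delimiter between them
rest=$(cut -d ';' -f 2- <<< "$VAR1")
echo "$rest"      # horse;cow
```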


File handling

When you're scripting with Bash, sometimes you need to read data from or write data to a file.
Sometimes a file may contain configuration options, and other times the file is the data your user is creating with your application.

Reading from a file

The most common way to read a file is to print its content to standard output with a command such as cat and then, if needed, pipe that output to another process.
Like any other standard output, it can be saved to a variable with command substitution. In Bash, $(< file) is an equivalent but faster form of $(cat file): it uses stdin redirection (<) instead of starting cat.

#!/bin/bash

#---file.txt has 2 lines in it: 
#cat
#dog

#---stdin redirect to a variable
variable1=$(< file.txt)
echo ${variable1}
#output: cat dog (echo is unquoted, so the newline becomes a space)

#---save stdout to a variable

variable1=$(cat file.txt)
echo ${variable1}
#output: cat dog

#---pipe directly to the desired command
#(grep dog file.txt would do the same without cat)

cat file.txt | grep dog
#or
variable2=$(cat file.txt | grep dog)
echo ${variable2}
#output: dog
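Whether the lines stay separate depends on quoting: unquoted, the expansion is split into words and echo joins them with spaces, which is why the examples above print cat dog. A sketch that recreates the example file in a temporary directory (the paths are illustrative) and counts the printed lines:

```shell
#!/bin/bash
# Throwaway copy of the two-line example file
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'cat\ndog\n' > "$tmpdir/file.txt"

variable1=$(< "$tmpdir/file.txt")

echo ${variable1}      # unquoted: word splitting -> cat dog
echo "${variable1}"    # quoted: the newline is kept -> two lines

# Count the lines each form prints
unquoted_lines=$(echo ${variable1} | wc -l)
quoted_lines=$(echo "${variable1}" | wc -l)
echo "$unquoted_lines vs $quoted_lines"
```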

Sometimes it's better to read a file line by line. The usual way is the read builtin.
read reads one line per call, so if the file has multiple lines, you need to combine it with a while loop.
By default each line is read into a variable, and the newline character is used as the delimiter. Option -r keeps backslashes as they are instead of treating them as escape characters.

Also note that the file is given to the while loop after the done keyword. This can be done with the redirection operator directly from a file, or from the output of a command with process substitution, <(command).

#!/bin/bash

#---file.txt has 2 lines in it:
#cat miau
#dog wuff

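#IFS= keeps leading/trailing whitespace, -r keeps backslashes
while IFS= read -r line
do
  echo "${line}"
done < file.txt
#or, with process substitution: done < <(cat file.txt)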

#output:
#cat miau
#dog wuff
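Where the input comes from matters when the loop sets variables. In Bash, each command of a multi-command pipeline runs in a subshell, so a counter updated inside cmd | while ... is lost when the loop ends, while redirection from a file or from process substitution keeps the loop in the current shell. A sketch, where the temporary file stands in for file.txt and its second line has leading spaces on purpose:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'cat miau\n  dog wuff\n' > "$tmpdir/file.txt"

# Redirect from the file: the loop runs in the current shell
count=0
while IFS= read -r line
do
  count=$((count + 1))
  last="$line"          # IFS= keeps the leading spaces
done < "$tmpdir/file.txt"
echo "$count lines, last one: '$last'"

# Pipe into the loop: it runs in a subshell, so piped stays 0 here
piped=0
cat "$tmpdir/file.txt" | while IFS= read -r line; do piped=$((piped + 1)); done
echo "$piped"

# Process substitution: the loop runs in the current shell again
substituted=0
while IFS= read -r line; do substituted=$((substituted + 1)); done < <(cat "$tmpdir/file.txt")
echo "$substituted"
```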

If you want to read every item on the line into its own variable, list several variable names after the read command. The line is split into words, and the last variable gets the rest of the line.

#!/bin/bash

#---file.txt has 2 lines in it:
#cat miau
#dog wuff

while read -r animal sound
do
  echo "animal: ${animal}"
  echo "sound: ${sound}"
done < file.txt

#output:
#animal: cat
#sound: miau
#animal: dog
#sound: wuff
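When the number of words and variable names differ, read fills the variables from left to right: the last variable receives the remainder of the line, and variables without a word are set to empty. A quick sketch with made-up input lines:

```shell
#!/bin/bash
# Three words, two variables: sound gets "moo moo"
read -r animal sound <<< "cow moo moo"
echo "animal: ${animal}"   # animal: cow
echo "sound: ${sound}"     # sound: moo moo

# Two words, three variables: age is left empty
read -r pet noise age <<< "cat miau"
echo "pet: ${pet}, noise: ${noise}, age: '${age}'"
```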

If you need another delimiter, the IFS variable comes in handy. IFS stands for Internal Field Separator, a special shell variable.

As the name states, it defines what is used to split words. By default these are spaces, tabs and newlines.
To use another delimiter, set IFS to a new value in front of the read command. An assignment written this way applies only to that read command, not to the rest of the loop or the script. If you want it to apply to the whole script (to for loops, for instance), assign it on its own line.

#!/bin/bash

#---file.csv has 4 lines in it:
#animal;sound;age
#dog;wuff;10
#chicken;kotkot;0,5
#cow;mooo;5

while IFS=';' read -r animal sound age
do
  echo "animal: ${animal}"
  echo "sound: ${sound}"
  echo "age: ${age}"
  echo "new iteration round"
done < file.csv

#output:
#animal: animal
#sound: sound
#age: age
#new iteration round
#animal: dog
#sound: wuff
#age: 10
#new iteration round
#animal: chicken
#sound: kotkot
#age: 0,5
#new iteration round
#animal: cow
#sound: mooo
#age: 5
#new iteration round
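The header line of file.csv goes through the loop as if it were data. One way to skip it is to read it once before the loop, with the whole brace group redirected from the file (a brace group runs in the current shell). A sketch under that assumption, where the temporary file recreates file.csv; it also checks that IFS=';' did not change the shell's own IFS:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'animal;sound;age\ndog;wuff;10\nchicken;kotkot;0,5\ncow;mooo;5\n' > "$tmpdir/file.csv"

rows=0
{
  read -r header                      # consume the header line first
  while IFS=';' read -r animal sound age
  do
    rows=$((rows + 1))
    echo "${animal} says ${sound} (age ${age})"
  done
} < "$tmpdir/file.csv"

echo "header was: ${header}, data rows: ${rows}"
# IFS=';' applied only to each read command; the shell's IFS is unchanged
[ "$IFS" = $' \t\n' ] && echo "IFS is still the default"
```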

Sometimes it's useful to read the lines directly into an array. With this method you know exactly which index holds which line (index 0 holds the header), and therefore how to handle each value.

#!/bin/bash

#---file.csv has 4 lines in it:
#animal;sound;age
#dog;wuff;10
#chicken;kotkot;0,5
#cow;mooo;5

ARRAY1=()

while IFS= read -r line
do
  ARRAY1+=("${line}")
done < file.csv

for x in "${ARRAY1[@]}"
do
  echo "$x"
done

#output:
#animal;sound;age
#dog;wuff;10
#chicken;kotkot;0,5
#cow;mooo;5

#To parse the desired value, cut is used and the first field is saved to a variable
#(cut reads from stdin, which is why array[index] is printed and piped to it)

value=$(echo "${ARRAY1[1]}" | cut -d ';' -f1)
echo "$value"

#output:
#dog
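Bash 4 and newer also provide mapfile (also called readarray), which reads all lines into an array in one step, and read -a can split one element on ; instead of calling cut. A sketch that recreates file.csv in a temporary directory:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'animal;sound;age\ndog;wuff;10\nchicken;kotkot;0,5\ncow;mooo;5\n' > "$tmpdir/file.csv"

# One line per array element; -t strips the trailing newline
mapfile -t ARRAY1 < "$tmpdir/file.csv"
echo "${#ARRAY1[@]} lines"      # 4 lines

# Split the second line on ';' into its own array
IFS=';' read -r -a fields <<< "${ARRAY1[1]}"
echo "${fields[0]}"             # dog
echo "${fields[2]}"             # 10
```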

If you just want to print single values from a line whose fields are separated by spaces, you don't need a separate variable for each field. As the loop section showed, the space-separated words of a string can be used one by one.
With read -a, the words of the line are stored in an indexed array, and printf can then print the items by index.

#!/bin/bash

#---file.txt has 2 lines in it:
#cat miau
#dog wuff

while read -r -a fields
do
  printf "animal: %s\nsound: %s\n" "${fields[0]}" "${fields[1]}"
done < file.txt

#output:
#animal: cat
#sound: miau
#animal: dog
#sound: wuff
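printf reuses its format string as long as arguments remain. That is also why an unquoted $line, which the shell splits into separate words, can fill several %s slots. A short sketch with made-up values:

```shell
#!/bin/bash
# Four arguments, two %s per pass: the format is applied twice
out=$(printf 'animal: %s\nsound: %s\n' cat miau dog wuff)
printf '%s\n' "$out"

# Unquoted $line is split into "cat" and "miau" before printf sees it
line="cat miau"
one=$(printf 'animal: %s\nsound: %s\n' $line)
printf '%s\n' "$one"
```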

Writing to a file

To write to a file, use the I/O redirection operators.
There are three standard I/O channels: stdin, stdout and stderr. As their names state, stdin is for process input, stdout for output and stderr for error messages.

  • > = Redirect stdout to the file (overwrite)
  • >> = Redirect stdout to the end of file
  • 2> = Redirect stderr to the file (overwrite)
  • 2>> = Redirect stderr to the end of file
  • &> = Redirect both channels to the file (overwrite; Bash syntax, POSIX form: > file 2>&1)
  • &>> = Redirect both channels to the end of file (Bash syntax, POSIX form: >> file 2>&1)
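A sketch of these operators, writing into a temporary directory (the file names are illustrative). &>> is Bash syntax; in POSIX sh, write >> file 2>&1 instead.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
log="$tmpdir/out.log"
err="$tmpdir/err.log"

echo "first"  > "$log"     # create or overwrite
echo "second" >> "$log"    # append
echo "third"  > "$log"     # overwrite again: only "third" remains

{ echo "oops" >&2; } 2> "$err"                          # only stderr goes to err.log
{ echo "to stdout"; echo "to stderr" >&2; } &>> "$log"  # both channels appended

cat "$log"
```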

Another option is the tee command.
tee reads stdin and writes it both to stdout and to one or more files. Because it reads stdin, you usually feed it through a pipe (|).
tee can write to multiple files at the same time and can append instead of overwriting (-a).

#!/bin/bash

echo "just an example" | tee "$HOME/test.txt"
#echo's output is piped to tee, which writes "just an example" to the file and to stdout

echo "just an example" | tee -a "$HOME/test.txt"
#echo's output is piped to tee, which appends "just an example" to the end of the file and writes it to stdout
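A sketch of tee writing the same line to two files and then appending, with its stdout copy discarded (the temporary paths are illustrative):

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Same line into two files; > /dev/null hides tee's copy on stdout
echo "just an example" | tee "$tmpdir/a.txt" "$tmpdir/b.txt" > /dev/null

# -a appends instead of overwriting
echo "second line" | tee -a "$tmpdir/a.txt" > /dev/null

cat "$tmpdir/a.txt"
```

A common use of tee is writing to a file your user cannot write to: in sudo echo text > file the redirection is still done by your own shell, whereas echo text | sudo tee file performs the write as root.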

Regular Expressions

Regular expressions are a notation that lets you search for text that fits a particular criterion, such as "starts with the letter a". A single expression can select, or match, multiple strings.
The POSIX standard defines regular expressions in the following way:

Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale. While many regular expressions can be interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual invariance across locales.

There are multiple regex implementations, depending on the programming language or tool. Many of them build on the POSIX standard, but most add extensions of their own, which leads to syntax differences.
We focus on POSIX BRE (basic) and ERE (extended) regular expressions, which formalize the two basic "flavors" found in most Unix utilities: grep and sed use BRE by default, grep -E and sed -E use ERE. The main difference is in the characters ?, +, |, {} and (): in BRE they are ordinary characters and a backslash gives them their special meaning (\{ \} and \( \) are standard, while \?, \+ and \| are GNU extensions), whereas in ERE they are special by default and a backslash takes the special meaning away.

BRE vs ERE
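As an illustration, the same pattern, "ab repeated exactly twice", written in both flavors and tested with grep, which uses BRE by default and ERE with -E:

```shell
#!/bin/bash
# BRE: \( \) and \{ \} are the special forms
bre=$(echo "abab" | grep -c '^\(ab\)\{2\}$')
# ERE: ( ) and { } are special without backslashes
ere=$(echo "abab" | grep -Ec '^(ab){2}$')
# In BRE, unescaped ( ) { } are ordinary characters
literal=$(echo "(ab){2}" | grep -c '(ab){2}')

echo "$bre $ere $literal"    # each grep finds one matching line
```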


Special characters

There are some special characters/metacharacters that have a special meaning when defining a regex.
Remember that the syntax depends on the implementation! If a special character is part of the actual string you are searching for, it must be escaped with a backslash (\). In ERE this applies to all characters listed below; in BRE, ?, +, |, {} and () are already literal.

Special characters are: .*^$?\+[]{}|()

Character  Description
.          matches any single character
*          matches zero or more occurrences of the preceding character
^          matches the following regular expression at the beginning of the line or string
$          matches the preceding regular expression at the end of the line or string
?          the preceding character is optional (matched zero or one time)
[...]      bracket expression, matches any single character from the enclosed list
{n,m}      the preceding character is matched n to m times
\          backslash, turns off the special meaning of the following character
+          the preceding character occurs one or more times
|          alternation (logical or)
()         grouping, (ab)* matches an empty string, ab, abab, ababab, ...

See the GNU sed manual for syntax examples.
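A small sketch of some of these metacharacters with grep -E (ERE) and grep -c, which counts matching lines; the word list is made up:

```shell
#!/bin/bash
words=$(printf '%s\n' color colour cat cot ct file.txt filetxt)

n_color=$(grep -Ec '^colou?r$' <<< "$words")    # ?: color, colour
n_dot=$(grep -Ec '^c.t$' <<< "$words")          # . : cat, cot
n_star=$(grep -Ec '^co*t$' <<< "$words")        # * : cot, ct
n_escaped=$(grep -Ec '\.txt$' <<< "$words")     # \.: only file.txt
n_unescaped=$(grep -Ec '.txt$' <<< "$words")    # . : file.txt and filetxt

echo "$n_color $n_dot $n_star $n_escaped $n_unescaped"
```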


Bracket expression []

Square brackets match any one of the enclosed characters. A range of characters is given by separating its endpoints with a hyphen (-). In other words, a bracket expression lists the characters allowed at one position: like ., it matches exactly one character, but only one from the list.
Bracket expressions are case-sensitive. For example, [aeiou] matches the lowercase vowels, [a-c] matches the letters a, b and c, and [0-4] matches the digits 0 to 4. To match both cases, list both ([a-cA-C]) or use a case-insensitive option such as grep -i.
If the first character inside the brackets is a ^, any character that is not enclosed is matched.

For example:

  • [hc]at = matches with "hat" and "cat"

There are multiple predefined character classes available. A class is only valid inside a bracket expression, so write [[:digit:]], not [:digit:].

The most common ones are:

  • [:alpha:] = matches letters (lower or upper case)
  • [:digit:] = matches digits
  • [:lower:] = matches lowercase letters
  • [:upper:] = matches uppercase letters
  • [:space:] = matches whitespace (space, tab, newline, ...)
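A sketch of bracket expressions and a character class with grep, on a made-up word list. The match is case-sensitive unless -i is given:

```shell
#!/bin/bash
words=$(printf '%s\n' hat cat bat Cat)

n_hc=$(grep -c '^[hc]at$' <<< "$words")       # hat, cat (Cat does not match)
n_not=$(grep -c '^[^hc]at$' <<< "$words")     # bat, Cat
n_both=$(grep -c '^[hcHC]at$' <<< "$words")   # hat, cat, Cat
n_ignore=$(grep -ic '^[hc]at$' <<< "$words")  # -i ignores case: hat, cat, Cat

# A character class goes inside a bracket expression
n_digit=$(printf '%s\n' a1 bb 42 | grep -c '[[:digit:]]')   # a1, 42

echo "$n_hc $n_not $n_both $n_ignore $n_digit"
```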

Sed

Sed is a stream editor. It receives text input, whether from stdin or from a file, performs certain operations on specified lines of the input, one line at a time, then outputs the result to stdout or to a file.

Sed determines which lines of its input it will operate on from the address range passed to it.
Specify this address range either by line number or by a pattern to match. For example, 3d signals sed to delete line 3 of the input.

Sed has several commands, but the most used one is the substitute command, s.
The substitute command tries to match the pattern space (the current line) against the supplied regular expression regexp; if the match succeeds, the matched part of the pattern space is replaced with replacement.
Note that by default the original file is not changed; the result is written to stdout.

Syntax: sed "s/regexp/replacement/flags" file

In this syntax, the slashes (/) are delimiters.
However, sed accepts other delimiters, such as #, | or ;. They are useful when / is part of the regexp or replacement, as in a file path; pick a character that does not occur in your strings.

The most common options are:

Option    Description
-i        edit files in place instead of printing to standard output
-r or -E  use extended regular expressions (ERE) in the script
-n        suppress automatic printing (only lines printed explicitly, e.g. with p, are output)

The most common flags are:

  • g = global, replacement is applied to every match, not just the first on the line
  • number = Only replace the numbered match of the regexp
  • i = case-insensitive matching (GNU extension)
  • p = print the changed lines (combine with -n)
  • w file = write the changed lines to the given file
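A few of these flags and an alternative delimiter in action, with input from a pipe instead of a file (the strings are made up):

```shell
#!/bin/bash
# | as delimiter: no need to escape the slashes of a path
path=$(echo "/usr/local/bin" | sed 's|/usr/local|/opt|')
echo "$path"      # /opt/bin

# Number flag: replace only the 2nd match on the line
second=$(echo "a-b-c-d" | sed 's/-/+/2')
echo "$second"    # a-b+c-d

# g flag: replace every match on the line
all=$(echo "a-b-c-d" | sed 's/-/+/g')
echo "$all"       # a+b+c+d

# -n with p: print only the lines where a substitution happened
changed=$(printf 'cat miau\ndog wuff\n' | sed -n 's/wuff/WOOF/p')
echo "$changed"   # dog WOOF
```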

Examples:

#!/bin/bash

#---file.txt has 3 lines in it:
#cat miau
#dog wuff
#cow moo

#replace all words "dog" with a "wolf"
sed -r 's/dog/wolf/g' file.txt

#remove the string "wuff", ignoring case (first match on each line; the i flag is a GNU extension)
sed -r 's/WuFF//i' file.txt

#replace every line that starts with the character c with the word "test"
sed -r 's/^c.*/test/' file.txt

#replace the first "dog" on each line with "wolf" and save the changes in place
sed -r -i 's/dog/wolf/' file.txt

#replace the word moo with "test" and write the changed lines to another file
#(all lines are still printed to stdout)
sed -r 's/moo/test/w testing.txt' file.txt
#OR print only the changed lines and redirect them to a file
sed -rn 's/moo/test/p' file.txt > example.txt

#remove the lines that start with "ca"
sed -r '/^ca/d' file.txt

#You can run multiple sed commands in one script by separating them with a semicolon.
#delete the first line and substitute on lines 2 to 5
sed -r '1 d; 2,5 s/target/replacement/g' file.txt
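Addresses, described at the start of this section, can restrict any command to certain lines: a line number, a pattern, or a range of either. A sketch on the same three lines as above, fed through a here-string instead of file.txt:

```shell
#!/bin/bash
input=$(printf 'cat miau\ndog wuff\ncow moo\n')

# Line-number address: print only line 2
line2=$(sed -n '2p' <<< "$input")

# Pattern address: substitute only on lines that start with "c"
c_lines=$(sed '/^c/s/m/M/' <<< "$input")

# Range from one pattern to another: delete the dog and cow lines
ranged=$(sed '/^dog/,/^cow/d' <<< "$input")

printf '%s\n---\n%s\n---\n%s\n' "$line2" "$c_lines" "$ranged"
```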

jq

jq is a tool for processing JSON data. Shells don't offer this kind of functionality directly: you must either combine several tools (grep, sed, awk) or use a third-party program like jq.
The basic idea behind jq is filters that work on a stream of JSON. Each filter takes an input and emits JSON to standard output.

On Debian and Ubuntu, install jq with the following command: sudo apt install jq

Syntax:
jq 'filter' file OR command | jq 'filter'

There are many filters available; the most common ones are shown below, but check the manual for the rest.

#prettify the JSON
echo '{"Car":{"manufacturer":"Volvo","model":"S60","color":"black"}}' | jq '.'
#{
#  "Car": {
#    "manufacturer": "Volvo",
#    "model": "S60",
#    "color": "black"
#  }
#}

#------------------------------------------------

#access the properties
echo '{"Car":{"manufacturer":"Volvo","model":"S60","color":"black"}}' | jq '.Car.manufacturer'
#"Volvo"

#------------------------------------------------

#access the multiple properties, use comma
echo '{"Car":{"manufacturer":"Volvo","model":"S60","color":"black"}}' | jq '.Car.manufacturer,.Car.model'
#"Volvo"
#"S60"

#------------------------------------------------

#the keys function returns the keys (sorted)
echo '{"Car":{"manufacturer":"Volvo","model":"S60","color":"black"}}' | jq '.Car | keys'
#[
#  "color",
#  "manufacturer",
#  "model"
#]

#------------------------------------------------

#access array elements by index
echo '[{"manufacturer": "Volvo","model": "S60"},{"manufacturer": "Volvo","model": "S80"},{"manufacturer": "Volvo","model": "V50"}]' | jq '.[0]'
#{
#  "manufacturer": "Volvo",
#  "model": "S60"
#}

#------------------------------------------------

#access a property of an array element
echo '[{"manufacturer": "Volvo","model": "S60"},{"manufacturer": "Volvo","model": "S80"},{"manufacturer": "Volvo","model": "V50"}]' | jq '.[0].manufacturer'
#"Volvo"


#access the same property of every array element
echo '[{"manufacturer": "Volvo","model": "S60"},{"manufacturer": "Volvo","model": "S80"},{"manufacturer": "Volvo","model": "V50"}]' | jq '.[].model'
#"S60"
#"S80"
#"V50"
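When the result goes into a shell variable, the quotes around JSON strings are usually unwanted. jq's -r option prints raw strings, and select() keeps only the elements that match a condition. A sketch on the same array (the variable names are illustrative; mapfile needs Bash 4+):

```shell
#!/bin/bash
json='[{"manufacturer": "Volvo","model": "S60"},{"manufacturer": "Volvo","model": "S80"},{"manufacturer": "Volvo","model": "V50"}]'

# -r: print S60 instead of "S60"
first_model=$(echo "$json" | jq -r '.[0].model')
echo "$first_model"

# select(): keep only the element whose model is S80
s80_maker=$(echo "$json" | jq -r '.[] | select(.model == "S80") | .manufacturer')
echo "$s80_maker"

# One model per line, read into a Bash array
mapfile -t models < <(echo "$json" | jq -r '.[].model')
echo "${#models[@]} models"
```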