Home | 簡體中文 | 繁體中文 | 雜文 | 知乎專欄 | Github | OSChina 博客 | 雲社區 | 雲棲社區 | Facebook | Linkedin | 視頻教程 | 打賞(Donations) | About

5.8. Text Processing

5.8.1. iconv - Convert encoding of given files from one encoding to another

5.8.1.1. cconv - A iconv based simplified-traditional chinese conversion tool

cconv是建立在iconv之上,可以UTF8編碼直接轉換,並增加了詞轉換。

sudo apt-get install cconv
			

使用cconv進行簡繁轉換的方法為:

cconv -f UTF8-CN -t UTF8-HK zh-cn.txt -o zh-hk.txt
			

5.8.1.2. uconv - convert data from one encoding to another

安裝

sudo apt-get install libicu-dev
			

例子

$ uconv -f cp1252 -t UTF-8 -o file_in_utf8.txt file_in_cp1252_encoding.txt
			

5.8.2. 字元串處理命令expr

		
字元串處理命令expr用法簡介:
名稱:expr
用途:求表達式變數的值。
語法: expr Expression
實例如下:
例子1:字串長度
shell>> expr length "this is a test content";
22
例子2:求餘數
shell>> expr 20 % 9
2
例子3:從指定位置處截取字元串
shell>> expr substr "this is a test content" 3 5
is is
例子4:指定字元串第一次出現的位置
shell>> expr index "testforthegame" s
3
例子5:字元串真實重現
shell>> expr quote thisisatestformela
thisisatestformela
		
		

5.8.3. cat - concatenate files and print on the standard output

-b	不對空白行編號。
-e	使用 $ 字元顯示行尾。
-n	從 1 開始對所有輸出行編號。
-q	使用靜默操作(禁止錯誤消息)。
-r	將所有多個空行替換為單行(“壓縮”空白)。
-t	將製表符顯示為 ^I。
-u	不對輸出進行緩衝。
-v	可視地顯示非打印控制字元。
		

5.8.3.1. -s, --squeeze-blank suppress repeated empty output lines

-S 將多個空白行壓縮到單行中(與 -r 相同)

			
$ cat >> /tmp/test <<EOF
Line1

Line2


Line3




Line4


Line5

EOF

$ cat -s /tmp/test
Line1

Line2

Line3

Line4

Line5

			
			

5.8.3.2. -v, --show-nonprinting use ^ and M- notation, except for LFD and TAB

顯示控制字元。例如Tab等,下面例子查看檔案結尾換行符類型

			
[neo@netkiller ~]# cat -v file.txt
GRANT USAGE ON *.* TO 'esauser'@'localhost' IDENTIFIED BY xxxxxxx; ^M
^M
file^M
2059^M
			
			

5.8.4. nl - number lines of files

$ nl /etc/issue
     1  CentOS release 5.4 (Final)
     2  Kernel \r on an \m
		

5.8.5. tr - translate or delete characters

		
[:alnum:] :所有字母字元與數字
[:alpha:] :所有字母字元
[:blank:] :所有水平空格
[:cntrl:] :所有控制字元
[:digit:] :所有數字
[:graph:] :所有可打印的字元(不包含空格符)
[:lower:] :所有小寫字母
[:print:] :所有可打印的字元(包含空格符)
[:punct:] :所有標點字元
[:space:] :所有水平與垂直空格符
[:upper:] :所有大寫字母
[:xdigit:] :所有 16 進位制的數字		
		
		

5.8.5.1. 替換字元

":"替換為"\n"

			
$ cat /etc/passwd |tr ":" "\n"
			
			

5.8.5.2. 英文大小寫轉換

使用 tr '[:lower:]' '[:upper:]' 將小寫字母替換成大寫字母

			
[root@localhost ~]# echo "Helloworld" | tr '[:lower:]' '[:upper:]'
HELLOWORLD			
			
			

替換整段文字

						
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 

[root@localhost ~]# cat /etc/redhat-release | tr '[:lower:]' '[:upper:]'
CENTOS LINUX RELEASE 7.5.1804 (CORE) 
			
			

			
[root@localhost ~]# echo "Netkiller" |  tr ‘a-z’ ‘A-Z’
NETKILLER			
			
			

5.8.5.3. [CHAR*] 和 [CHAR*REPEAT]

			
[root@localhost ~]# echo "1234567890" | tr ‘1-5′ ‘[A*]‘ 
AAAAA67890

[root@localhost ~]# echo "1234567890" | tr ‘1-9′ ‘[A*5]BCDE’
AAAAABCDE0			
			
			

5.8.5.4. -s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character

刪除重複的字元

			
[root@localhost ~]# echo "My      nickname  is      netkiller." | tr -s ' '
My nickname is netkiller.	

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a'
abbbbccccdddd.

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a-z'
abcd.		
			
			

5.8.5.5. -d, --delete delete characters in SET1, do not translate

刪除字元

			
[root@localhost ~]# echo "My nickname is netkiller" | tr -d ' '
Mynicknameisnetkiller
			
[root@localhost ~]# md5sum /etc/issue | tr -d [0-9] 
ffedfcfbdcaebdec  /etc/issue			
			
			

5.8.6. cut - remove sections from each line of files

列操作

$ last | grep  'neo' | cut -d ' ' -f1
        
$ cat /etc/passwd | cut -d ':' -f1
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy

$ cat /etc/passwd | cut -d ':' -f1,3,4

# cat /etc/passwd | cut -d ':' -f1,6
root:/root
bin:/bin
daemon:/sbin
adm:/var/adm
lp:/var/spool/lpd
sync:/sbin
shutdown:/sbin
halt:/sbin
mail:/var/spool/mail
uucp:/var/spool/uucp
operator:/root
games:/usr/games
gopher:/var/gopher
ftp:/var/ftp
nobody:/
vcsa:/dev
saslauth:/var/empty/saslauth
postfix:/var/spool/postfix
sshd:/var/empty/sshd
rpc:/var/cache/rpcbind
rpcuser:/var/lib/nfs
nfsnobody:/var/lib/nfs
ntp:/etc/ntp
nagios:/var/log/nagios

        

行操作

$ cat /etc/passwd | cut -c 1-4
root
daem
bin:
sys:
sync
game
man:

$ echo "No such file or directory"| cut -c4-7
such

$ echo "No such file or directory"| cut -c -8
No such

$ echo "No such file or directory"| cut -c-8
No such

        

5.8.7. printf - format and print data

printf "%d\n" 1234
		
$ printf "\033[1;33m TEST COLOR \n\033[m"
		

5.8.8. Free `recode' converts files between various character sets and surfaces.

Following will convert text files between DOS, Mac, and Unix line ending styles:

		
$ recode /cl../cr <dos.txt >mac.txt
$ recode /cr.. <mac.txt >unix.txt
$ recode ../cl <unix.txt >dos.txt
		
		

5.8.9. /dev/urandom 隨機字元串

		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
GidAuuNN
[neo@test .deploy]$ echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
UyGaWSKr
		
		

我常常使用這樣的隨機字元初始化密碼

		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:alnum:] | head -c 8`
xig8Meym
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:alnum:] | head -c 8`
23Ac1vZg
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:digit:] | head -c 8`
73652314
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 8`
GO_o>OnJ
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 10`
iGy0FS/aO5
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 50`
;`E^{5(T4v~5$YovW.?%_?9la<`+qPcRh@7mD\!Whx;MJZVQ\K
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:print:] | head -c 50`
fy$[#:'(')jt'gp1/g-)d~p]8 :r9i;MO2d!8M<?Qs3t:QgK$O
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 50`
6SivJ5y$/FTi8mf}rrqE&s0"WkA}r;uK-=MT!Wp0UlL_lF0|bL
		
		

批量生成

		
for i in {1..10}
do
echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
done
		
		
		
# cat /dev/urandom | tr -cd [:alnum:] | fold -w30 | head -n 20
AVqROzjF6ZATJGv2J6PzDHp3jLpKV4
ONt68UFNDwgXpSnLBV7oRDX3VLRYsX
EZTWCGvZc3mIEeuw9sxMtV8ZkzVRJv
BhUiv0a7utsjZFLYpKGZrY5aDXcZL4
5YfUl2hmDT1O9X61DRYg4wSp4lXoXX
ykyPJxH47PzxnNGlujIUF98ZtB01H0
QyP53mksQN8bCNNo1fSD3RtqhhEGfa
u2RkT1M9GUQF4a6O18tG5WD97OOXze
Whm5X7398Q8L9BONN8k2oLy9CL37JO
TmGQz7WB6WnkjhyB4wrBHBJ3HMIRyf
hww43yvddUDYUnbNOKjhv3sLhCA4YD
uY6zQtBC6miwLUl3jkCVVA0Xu8ASgj
jv58qu46VW7LvRIq4txNE8bG9NBlZl
pzaMkydAiCHCF5H2oQVqMn4DTTYgNL
yoN2A9LyrCwLfjP1ad9HMAwxExJL5i
J27iy2L90m9dpcPLJ8tl46GGb9xqmQ
6YwFCvuPHyyEwnctUTpqLFcvUafVZ2
Nuq9XgIgRQGynjlVqGLMOpO0MkGpsn
tChkRG7eoRuKVXgW7ccTGx45E54K3Y
qPv48XqdGlOrdULCOGZ45kwJ1v5kVX		
		
		

5.8.10. col - filter reverse line feeds from input

清除 ^M 字元

$ cat oldfile | col -b > newfile
		

5.8.11. apg - generates several random passwords

sudo apt-get install apg

$ apg

Please enter some random data (only first 16 are significant)
(eg. your old password):>
imlogNukcel5 (im-log-Nuk-cel-FIVE)
Drocdaf1 (Droc-daf-ONE)
fagJook0 (fag-Jook-ZERO)
heabugJer4 (heab-ug-Jer-FOUR)
5OsEsudy (FIVE-Os-Es-ud-y)
IrjOgneagOc9 (Irj-Og-neag-Oc-NINE)


$ apg -M SNCL -m 16
WoidWemFut6dryn,
byRowpEus-Flutt0
|QuogCagFaycsic0
ojHoadCyct4Freg_
Vir9blir`orhohoo
bapOip?Ibreawov2
		

5.8.12. head/tail

head -c 17 | tail -c 1
		

5.8.13. 反轉字元串或檔案內容

rev - reverse lines of a file or files

反轉字元串

# echo hello | rev
olleh

# echo "hello world" | rev
dlrow olleh
		

反轉檔案內容

# rev /etc/passwd
hsab/nib/:toor/:toor:0:0:x:toor
nigolon/nibs/:nib/:nib:1:1:x:nib
nigolon/nibs/:nibs/:nomead:2:2:x:nomead
nigolon/nibs/:mda/rav/:mda:4:3:x:mda
nigolon/nibs/:dpl/loops/rav/:pl:7:4:x:pl
cnys/nib/:nibs/:cnys:0:5:x:cnys
nwodtuhs/nibs/:nibs/:nwodtuhs:0:6:x:nwodtuhs
tlah/nibs/:nibs/:tlah:0:7:x:tlah
nigolon/nibs/:liam/loops/rav/:liam:21:8:x:liam
nigolon/nibs/:pcuu/loops/rav/:pcuu:41:01:x:pcuu
nigolon/nibs/:toor/:rotarepo:0:11:x:rotarepo
nigolon/nibs/:semag/rsu/:semag:001:21:x:semag
nigolon/nibs/:rehpog/rav/:rehpog:03:31:x:rehpog
nigolon/nibs/:ptf/rav/:resU PTF:05:41:x:ptf
nigolon/nibs/:/:ydoboN:99:99:x:ydobon
nigolon/nibs/:ved/:renwo yromem elosnoc lautriv:96:96:x:ascv
nigolon/nibs/:ptn/cte/::83:83:x:ptn
nigolon/nibs/:htualsas/ytpme/rav/:"resu dhtualsaS":67:994:x:htualsas
nigolon/nibs/:xiftsop/loops/rav/::98:98:x:xiftsop
nigolon/nibs/:dhss/ytpme/rav/:HSS detarapes-egelivirP:47:47:x:dhss
hsab/nib/:lqsym/bil/rav/:revres LQSyM:994:894:x:lqsym
hsab/nib/:www/:noitacilppA beW:08:08:x:www
nigolon/nibs/:xnign/ehcac/rav/:resu xnign:894:794:x:xnign
		

5.8.14. TAB符號與空格處理

5.8.14.1. expand - convert tabs to spaces

轉換 TAB 字元為空格
root@netkiller /var/log % yum --showduplicates list httpd | expand
Repository epel is listed more than once in the configuration
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Available Packages
httpd.x86_64                    2.4.6-67.el7.centos                      os     
httpd.x86_64                    2.4.6-67.el7.centos.2                    updates
			

5.8.14.2. unexpand - convert spaces to tabs

轉換空格為TAB符
root@netkiller /var/log % cat /etc/fstab | unexpand -t 16
/dev/vda1	     /	          ext3	     noatime,acl,user_xattr 1 1
proc	     /proc	          proc	     defaults	           0 0
sysfs	     /sys	          sysfs	     noauto	           0 0
debugfs	     /sys/kernel/debug    debugfs    noauto	           0 0
devpts	     /dev/pts	          devpts     mode=0620,gid=5       0 0
			

將16個空格替換為一個TAB符