SAS Training Vol. 5: Details Every New-Generation "Migrant Worker" Must Know

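Today we look at string handling in SAS, focusing on two groups of functions: those that handle blanks and those that concatenate character strings.
Category 1
The main functions for handling blanks in a string are TRIM, TRIMN, STRIP, LEFT, COMPRESS, and COMPBL.
Let's go through them one by one with examples:
trim vs. trimn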
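data sample;
/* $char14. keeps leading blanks and reads only columns 1-14; */
/* the annotations in the data lines start in column 16 and are never read */
input string $char14.;
datalines;
Mary Smith     /* contains trailing blanks */
   John Brown  /* contains leading blanks */
  Alice Park   /* contains leading and trailing blanks */
 Tom    Wang   /* contains leading, trailing and multiple blanks in between */
               /* contains a blank string */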
;
run;
data sample;
set sample;
original= "*" || string || "*";
trim= "*" || trim(string) || "*";
trimn= "*" || trimn(string) || "*";
run;
proc print data=sample(drop= string) noobs;
title2 "Output of TRIM and TRIMN";
run;

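We wrap each value in asterisks to mark where the string starts and ends. This is a reminder that when a character variable has a declared length and the value is shorter, the remaining positions are padded with blanks. Keeping this in mind often helps you locate a problem quickly when debugging. Both functions remove trailing blanks; they differ only on the blank string in the fifth row: TRIM returns a single blank, while TRIMN returns a zero-length string.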
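To see the difference on a blank value in isolation, here is a minimal sketch in a DATA _NULL_ step. The variable names `blank`, `with_trim`, and `with_trimn` are made up for this illustration; the results are written to the log rather than to a data set.

```sas
data _null_;
    length with_trim with_trimn $5;
    blank = ' ';
    with_trim  = "*" || trim(blank)  || "*";   /* TRIM returns one blank: * * */
    with_trimn = "*" || trimn(blank) || "*";   /* TRIMN returns a zero-length string: ** */
    put with_trim= / with_trimn=;
run;
```

The log should show `* *` for TRIM and `**` for TRIMN.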
strip
data sample;
set sample;
strip= "*" || strip(string) || "*";
trim_left= "*" || trim(left(string)) || "*";
trimn_left= "*" || trimn(left(string)) || "*";
run;
proc print data= sample noobs;
title2 "Output of STRIP, TRIM(LEFT) and TRIMN(LEFT)";
var original strip trim_left trimn_left;
run;

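Note that STRIP removes both leading and trailing blanks, whereas LEFT only left-aligns the string: it moves leading blanks to the end and leaves the length unchanged. STRIP therefore gives the same result as TRIMN(LEFT(string)); TRIM(LEFT(string)) differs only for a blank string, where it returns a single blank instead of a zero-length string. STRIP is also the more efficient choice.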
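The following sketch isolates the two points above: what LEFT does with leading blanks, and how STRIP and TRIM(LEFT) treat a blank value. The variable names (`padded`, `blank`, `left_only`, `via_strip`, `via_trim_left`) are hypothetical and chosen only for this example.

```sas
data _null_;
    length left_only via_strip via_trim_left $8;
    padded = '  ab';                                   /* two leading blanks */
    blank  = ' ';
    left_only     = "*" || left(padded) || "*";        /* *ab  * : blanks moved to the end */
    via_strip     = "*" || strip(blank) || "*";        /* ** */
    via_trim_left = "*" || trim(left(blank)) || "*";   /* * * */
    put left_only= / via_strip= / via_trim_left=;
run;
```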
compress vs. compbl
First, COMPRESS:
data zipcode;
input zipcode $14.;
zipcode1= compress(zipcode); /* to remove blanks */
zipcode2= compress(zipcode,' ()?'); /* to remove blanks, () and ? */
zipcode3= compress(zipcode,'- ()?'); /* to remove dash, blanks, () and ? */
datalines;
22168- 12 34
22168- (1234?)
;
run;
proc print data= zipcode noobs;
title2 "Listing of Zipcodes";
run;

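COMPRESS removes the characters you specify, such as blanks, hyphens, and parentheses; by default it removes blanks. Be careful, though: once you supply a second argument, blanks are no longer removed by default. To remove blanks as well, you must include a blank explicitly in that argument.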
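A small sketch of this pitfall, reusing the first ZIP code value from the example above. The variable names `no_hyphen` and `no_hyphen_or_blank` are invented for the illustration.

```sas
data _null_;
    length no_hyphen no_hyphen_or_blank $14;
    zipcode = '22168- 12 34';
    no_hyphen          = compress(zipcode, '-');    /* 22168 12 34 : blanks stay */
    no_hyphen_or_blank = compress(zipcode, '- ');   /* 221681234 : blank listed explicitly */
    put no_hyphen= / no_hyphen_or_blank=;
run;
```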
data sample;
set sample;
compress= "*" || compress(string) || "*";
compbl= "*" || compbl(string) || "*";
run;
proc print data=sample noobs;
title2 "Output of COMPRESS and COMPBL";
var original compress compbl;
run;

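COMPBL is similar, with one difference: it never removes blanks entirely. It collapses each run of two or more consecutive blanks into a single blank and leaves single blanks untouched.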
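The contrast can be checked on a single value. This is a minimal sketch; `name`, `squeezed`, and `removed` are illustrative variable names, not part of the example data set.

```sas
data _null_;
    name = 'Tom    Wang';          /* four blanks between the words */
    squeezed = compbl(name);       /* Tom Wang : run of blanks collapsed to one */
    removed  = compress(name);     /* TomWang  : every blank removed */
    put squeezed= / removed=;
run;
```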
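Category 2
Once the blanks are dealt with, how do we concatenate these strings freely? With CAT, CATT, CATS, and CATX.
The simplest tool is still the concatenation operator ||, which we have already seen at work in the examples above. It does nothing about leading or trailing blanks in its operands. It is convenient enough for small jobs, but it becomes tedious when many strings have to be joined in complex ways.
cat, catt, cats, catx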
data sample;
set sample;
length cat catt cats $16 catx $20;
text='Hello';
cat= cat('*',string,'*'); /* (= ||) */
catt= catt('*',string,'*'); /* (= TRIM || or TRIMN ||) */
cats= cats('*',string,'*'); /* (= STRIP ||) */
catx= catx('!',text,string); /* (= STRIP || separator) */
run;
proc print data=sample noobs; var cat catt cats catx;
title2 "Output of Concatenation Functions";
run;

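CAT behaves like ||: it does nothing about leading or trailing blanks. Think of CATT as TRIMN plus ||, and CATS as STRIP plus ||. CATX is the interesting one: it does everything CATS does and also inserts a separator between the strings it joins.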
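One detail of CATX that is easy to miss: an item that is entirely blank is skipped together with its separator. A minimal sketch, with illustrative variable names (`name`, `blank`, `joined_both`, `joined_blank`):

```sas
data _null_;
    length joined_both joined_blank $20;
    text  = 'Hello';
    name  = '  Tom  ';
    blank = ' ';
    joined_both  = catx('!', text, name);    /* Hello!Tom : blanks stripped, separator inserted */
    joined_blank = catx('!', text, blank);   /* Hello : blank item and its separator skipped */
    put joined_both= / joined_blank=;
run;
```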
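Note: if the result variable has not been given a length, these four functions give it a default length of 200 in a DATA step. To control the actual length, declare it with a LENGTH statement beforehand.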
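One way to check the assigned length is the VLENGTH function, which returns the storage length of a variable. The sketch below compares a result variable with and without a LENGTH statement; `sized`, `unsized`, `len_sized`, and `len_unsized` are names made up for the example.

```sas
data _null_;
    length sized $8;
    unsized = cats('a', 'b');          /* no LENGTH statement for this variable */
    sized   = cats('a', 'b');          /* declared above as $8 */
    len_unsized = vlength(unsized);    /* expected: 200 */
    len_sized   = vlength(sized);      /* expected: 8 */
    put len_unsized= / len_sized=;
run;
```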
Finally, a handy shortcut: use OF with a variable list.
data mailing;
length mail_1-mail_3 $35;
address1= '123 Main St.';
address2= 'Eden';
address3= 'VT';
address4= '05060';
mail_1= catx(' ', of address1-address4); /* Insert space as separator */
mail_2= catx(',', of address1-address4); /* Insert comma as separator */
mail_3= catx(', ', of address1-address4); /* Insert comma and space as separator */
run;
proc print data=mailing noobs;
var mail_1 mail_2 mail_3;
title2 "Listing of Data Set: Mailing";
run;

