Fedora Users — Re: Text processing

Dan Track wrote:

Hi

I've got the following output

Col1    Col2   Col3       Col5
1         000    001        Yes
2         000    001
3         000    001
4         Yes                 Yes
4         000    001
4         000    001
5         000    001
5         Yes    001
6         000    001        Yes

As you can see, the column widths vary in size. What I need to do is find
the number in Col1 that is associated with each of those "Yes"
occurrences in Col5. How can I do this?
I've tried the following:
cat file | tr -s ' ' ' ' | tr -s '\t' ' ' | cut -d ' ' -f 6

But I get a result like this:

Col1 Col2 Col3 Col5
1 000 001 Yes
2 000 001
3 000 001
4 Yes Yes
4 000 001
4 000 001
5 000 001
5 Yes 001
6 000 001 Yes

As you can see, one of the "Yes" values has moved into the third
column, which is wrong.

Any help would be appreciated.

The problem here, I think, is that some of your columns are empty, so for instance:

Col1    Col2   Col3       Col5
4         Yes                 Yes

appears the same as:

Col1    Col2   Col3       Col5
4       Yes    Yes

to most Unix text-processing tools that separate fields based on whitespace.
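To see this for yourself, here is a small sketch that prints how awk splits each line with its default field separator. `show_fields` is just a name made up for this example; both of the rows above should come out as the same three fields:

```shell
# show_fields (hypothetical helper): print the field count and each field
# as awk sees it, using awk's default splitting (runs of blanks/tabs
# count as one separator, leading/trailing blanks are ignored).
show_fields() {
  awk '{ printf "NF=%d:", NF; for (i = 1; i <= NF; i++) printf " [%s]", $i; print "" }'
}

# The row with an empty Col3, followed by the same row with no gap at all.
printf '4         Yes                 Yes\n4       Yes    Yes\n' | show_fields
```

Both lines print `NF=3: [4] [Yes] [Yes]`, so nothing after splitting tells awk that Col3 was empty in the first one.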

If you're actually looking for lines where the last field is "Yes", you could just do:

$ awk '$NF == "Yes"' file
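As a quick check, here is that filter run over the sample from your message, wrapped in a made-up function `yes_lines` so it reads standard input instead of a file. It should print the rows for 1, 4 and 6 unchanged; the second row for 5 is skipped because its last field is 001, even though Col2 says "Yes":

```shell
# yes_lines (hypothetical name): print every input line whose last
# whitespace-separated field is exactly "Yes".
yes_lines() {
  awk '$NF == "Yes"'
}

# The sample output from the original message, fed on standard input.
yes_lines <<'EOF'
Col1    Col2   Col3       Col5
1         000    001        Yes
2         000    001
3         000    001
4         Yes                 Yes
4         000    001
4         000    001
5         000    001
5         Yes    001
6         000    001        Yes
EOF
```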

If all you want is the number in the first field, you'd have:

$ awk '$NF == "Yes" { print $1 }' file
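One caveat: `$NF` only works because no row in your sample has Col5 empty while Col3 holds "Yes". If your file really is tab-separated (the `tr -s '\t'` in your attempt suggests there are tabs in it, but that is a guess), you can tell awk to split on single tabs, so an empty column becomes an empty field instead of disappearing. A sketch, assuming exactly one tab between columns; `yes_ids_tsv` is a hypothetical name, and row 7 is invented to show the difference:

```shell
# yes_ids_tsv (hypothetical name): ASSUMES exactly one tab between columns.
# With a single-tab field separator, adjacent tabs produce an empty field,
# so Col5 (the fourth column in the header) is always $4.
yes_ids_tsv() {
  awk -F '\t' '$4 == "Yes" { print $1 }'
}

# Row 7 is made up: Col3 is "Yes" but Col5 is empty.
printf '1\t000\t001\tYes\n4\tYes\t\tYes\n5\tYes\t001\t\n7\t000\tYes\t\n' | yes_ids_tsv
```

This should print 1 and 4 only. With the default whitespace splitting, row 7 would be reported too, because once blanks are collapsed its last field is "Yes".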

Paul.