Fedora Users — Re: Text processing

On 2/3/06, Paul Howarth <paul@xxxxxxxxxxxx> wrote:
> Dan Track wrote:
> > Hi
> >
> > I've got the following output
> >
> > Col1    Col2   Col3       Col5
> > 1         000    001        Yes
> > 2         000    001
> > 3         000    001
> > 4         Yes                 Yes
> > 4         000    001
> > 4         000    001
> > 5         000    001
> > 5         Yes    001
> > 6         000    001        Yes
> >
> > As you can see, the column widths vary. What I need to do is find
> > the number in Col1 for every row that has "Yes" in Col5. How can I
> > do this?
> > I've tried the following:
> > cat file | tr -s ' ' ' ' | tr -s '\t' ' ' | cut -d ' ' -f 6
> >
> > But I get a result like this:
> >
> > Col1 Col2 Col3 Col5
> > 1 000 001 Yes
> > 2 000 001
> > 3 000 001
> > 4 Yes Yes
> > 4 000 001
> > 4 000 001
> > 5 000 001
> > 5 Yes 001
> > 6 000 001 Yes
> >
> > As you can see, one of the "Yes" values has moved into the third
> > column, which is wrong.
> >
> > Any help would be appreciated.
>
> I think the problem here is that some of your columns are empty, so for
> instance:
>
> Col1    Col2   Col3       Col5
> 4         Yes                 Yes
>
> appears the same as:
>
> Col1    Col2   Col3       Col5
> 4       Yes    Yes
>
> to most Unix text-processing tools that separate fields based on whitespace.
>
> If you're actually looking for lines where the last field is "Yes", you
> could just do:
>
> $ awk '$NF == "Yes"' file
>
> If all you want is the number in the first field, you'd have:
>
> $ awk '$NF == "Yes" { print $1 }' file
>


Great, excellent. That did the trick.

Thanks
Dan
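
For anyone finding this in the archives, here is a minimal sketch of the point above: it shows how awk counts whitespace-separated fields on each line (so empty columns simply disappear), and then runs Paul's $NF test. The file name sample.txt is made up for illustration; the data is the table from the original post.

```shell
# Recreate the sample table from the original post.
# (sample.txt is a hypothetical file name, not one used in the thread.)
cat > sample.txt <<'EOF'
Col1    Col2   Col3       Col5
1         000    001        Yes
2         000    001
3         000    001
4         Yes                 Yes
4         000    001
4         000    001
5         000    001
5         Yes    001
6         000    001        Yes
EOF

# Print the field count awk sees for each line, followed by the line itself.
# The row "4  Yes  <empty>  Yes" is reported with only 3 fields.
awk '{ print NF ": " $0 }' sample.txt

# Print Col1 for every row whose last non-empty field is "Yes".
awk '$NF == "Yes" { print $1 }' sample.txt
```

With this data the last command prints 1, 4 and 6. Note that $NF is the last non-empty field, not Col5 as such, so a row with "Yes" in Col2 and nothing after it would match as well. No such row appears in the sample, but it is worth checking for in real data.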