On 2/3/06, Paul Howarth <paul@xxxxxxxxxxxx> wrote:
> Dan Track wrote:
> > Hi
> >
> > I've got the following output
> >
> > Col1 Col2 Col3 Col5
> > 1 000 001 Yes
> > 2 000 001
> > 3 000 001
> > 4 Yes Yes
> > 4 000 001
> > 4 000 001
> > 5 000 001
> > 5 Yes 001
> > 6 000 001 Yes
> >
> > As you can see, the column widths vary in size. What I need to do is
> > find out the number in Col1 that is associated with all those "Yes"
> > occurrences in Col5. How can I do this?
> > I've tried the following:
> > cat file | tr -s ' ' ' ' | tr -s '\t' ' ' | cut -d ' ' -f 6
> >
> > But I get a result like this:
> >
> > Hi
> >
> > I've got the following output
> >
> > Col1 Col2 Col3 Col5
> > 1 000 001 Yes
> > 2 000 001
> > 3 000 001
> > 4 Yes Yes
> > 4 000 001
> > 4 000 001
> > 5 000 001
> > 5 Yes 001
> > 6 000 001 Yes
> >
> > As you can see, one of the "Yes" statements has moved into the third
> > column, so that's a wrong move.
> >
> > Any help would be appreciated.
>
> The problem here, I think, is that some of your columns are empty, so for
> instance:
>
> Col1 Col2 Col3 Col5
> 4 Yes Yes
>
> appears the same as:
>
> Col1 Col2 Col3 Col5
> 4 Yes Yes
>
> to most Unix text-processing tools that separate fields based on whitespace.
>
> If you're actually looking for lines where the last field is "Yes", you
> could just do:
>
> $ awk '$NF == "Yes"' file
>
> If all you want is the number in the first field, you'd have:
>
> $ awk '$NF == "Yes" { print $1 }' file

Great, excellent. That did the trick.

Thanks
Dan
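For anyone finding this thread later, here is a small self-contained sketch of Paul's second command run against the sample rows above. It assumes the rows are separated by single spaces (the original column alignment was lost in the archive) and writes them to a temporary file created with `mktemp`; the variable name `file` is just for illustration. Note that the "4 Yes Yes" row also matches: awk only sees that the last whitespace-separated field is "Yes" and cannot tell which column it originally sat in, which is exactly the ambiguity Paul describes.

```shell
#!/bin/sh
# Sample rows from the thread. Assumption: single spaces between fields,
# since the original column alignment is not recoverable.
file=$(mktemp)
cat > "$file" <<'EOF'
Col1 Col2 Col3 Col5
1 000 001 Yes
2 000 001
3 000 001
4 Yes Yes
4 000 001
4 000 001
5 000 001
5 Yes 001
6 000 001 Yes
EOF

# Print the first field of every line whose last field ($NF) is exactly "Yes".
awk '$NF == "Yes" { print $1 }' "$file"
```

With this input the command prints 1, 4 and 6, one per line. The header line is skipped naturally because its last field is "Col5", not "Yes".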