Working with wide text files at the command line

Use the cut command.

It comes with Linux systems, and you can download it for Windows as part of Gow (Gnu On Windows).

You can see the first 30 characters of the first few lines by piping the output of head to cut.

    head data.csv | cut -c -30

This shows

    "GEO_ID","NAME","DP05_0001E","
    "id","Geographic Area Name","E
    "8600000US01379","ZCTA5 01379"
    "8600000US01440","ZCTA5 01440"
    "8600000US01505","ZCTA5 01505"
    "8600000US01524","ZCTA5 01524"
    "8600000US01529","ZCTA5 01529"
    "8600000US01583","ZCTA5 01583"
    "8600000US01588","ZCTA5 01588"
    "8600000US01609","ZCTA5 01609"

which is much more useful.

The syntax -30 says to show up to the 30th character.

You could do the opposite with 30- to show everything starting with the 30th character.

And you can show a range, such as 20-30 to show the 20th through 30th characters.
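Here is a small sketch of the three forms side by side. The sample line is made up for illustration (it is not from data.csv), chosen so that it is easy to count character positions.

```shell
# Hypothetical 36-character sample line: a-z followed by 0-9
line='abcdefghijklmnopqrstuvwxyz0123456789'

first30=$(printf '%s\n' "$line" | cut -c -30)   # characters 1 through 30
from30=$(printf '%s\n' "$line" | cut -c 30-)    # character 30 to end of line
mid=$(printf '%s\n' "$line" | cut -c 20-30)     # characters 20 through 30

printf '%s\n' "$first30" "$from30" "$mid"
# abcdefghijklmnopqrstuvwxyz0123
# 3456789
# tuvwxyz0123
```

Note that both ends of a range are inclusive, so -30 and 30- both include the 30th character.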

You can also use cut to pick out fields with the -f option.

The default delimiter is tab, but our file is delimited with commas, so we need to add -d, (the -d option followed by a comma) to tell cut to split fields on commas.

We could see just the second column of data, for example, with

    head data.csv | cut -d, -f 2

This produces

    "NAME"
    "Geographic Area Name"
    "ZCTA5 01379"
    "ZCTA5 01440"
    "ZCTA5 01505"
    "ZCTA5 01524"
    "ZCTA5 01529"
    "ZCTA5 01583"
    "ZCTA5 01588"
    "ZCTA5 01609"

You can also specify a range of fields, say by replacing 2 with 3-4 to see the third and fourth columns.
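To make the field options concrete, here is a minimal sketch using a few made-up rows rather than the real data.csv. The column names and values are hypothetical, chosen only so the result is easy to check by eye.

```shell
# Hypothetical four-column sample, made up for illustration
rows=$(printf '%s\n' 'id,name,count,area' 'a1,alpha,10,2.5' 'b2,beta,20,3.5')

second=$(printf '%s\n' "$rows" | cut -d, -f 2)          # second field only
third_fourth=$(printf '%s\n' "$rows" | cut -d, -f 3-4)  # third and fourth fields

printf '%s\n' "$second" "$third_fourth"
# name
# alpha
# beta
# count,area
# 10,2.5
# 20,3.5
```

When you select more than one field, cut keeps the delimiter between them. One caveat: cut splits on every comma and knows nothing about CSV quoting, so a quoted field that itself contains a comma would be split in two. For a file like that, a CSV-aware tool is probably the safer choice.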

The humble cut command is a good one to have in your toolbox.
