Change delimiter in a csv file and remove line breaks in fields

I wrote a script to convert delimiters in CSV files, eg, commas to pipes. I prefer pipe-delimited files because the the pipe-delimiter (|) will not clash data in the different fields 99.999% of the time. I also added the option to convert newline () and carriage return () characters in the data fields to spaces. This comes in handy when I use PROC IMPORT in SAS as line breaks cause it to choke.

Here’s my csvconvert.py script:

#! /usr/bin/env python

#### Command line arguments
import argparse
parser = argparse.ArgumentParser(description="Convert delimited file from one delimiter to another; defaults to converting CSV to pipe-delimited.")
parser.add_argument("--dlm-input", action="store", dest="dlm_in", default=",", required=False, help="delimiter of the input file; defaults to comma (,)", nargs='?', metavar="','")
parser.add_argument("--dlm-output", action="store", dest="dlm_out", default="|", required=False, help="delimiter of the output file; defaults to pipe (|)", nargs='?', metavar="'|'")
parser.add_argument("--remove-line-char", action="store_true", dest="remove_line_char", default=False, help="remove \\n and \\r characters in fields and replace with spaces")
parser.add_argument("--quote-char", action="store", dest="quote_char", default='"', required=False, help="quote character; defaults to double quote (\")", nargs='?', metavar="\"")
parser.add_argument("-i", "--input", action="store", dest="input", required=False, help="input file; if not specified, take from standard input.", nargs='?', metavar="file.csv")
parser.add_argument("-o", "--output", action="store", dest="output", required=False, help="output file; if not specified, write to standard output", nargs='?', metavar="file.pipe")
parser.add_argument("-v", "--verbose", action="store_true", dest="verbose", default=False, help="increase verbosity")
args  =  parser.parse_args()
# print args

# http://snipplr.com/view/45759/convert-csv-file-to-pipe-delineated-file/
import argparse
import csv
import sys
from signal import signal, SIGPIPE, SIG_DFL # http://stackoverflow.com/questions/14207708/ioerror-errno-32-broken-pipe-python
signal(SIGPIPE,SIG_DFL) ## no error when exiting a pipe like less

if args.input:
    csv_reader = csv.reader(open(args.input, 'rb'), delimiter=args.dlm_in, quotechar=args.quote_char)
else:
    csv_reader = csv.reader(sys.stdin, delimiter=args.dlm_in, quotechar=args.quote_char)

if args.output:
    h_outfile = open(args.output, 'wb')
else:
    h_outfile = sys.stdout

for row in csv_reader:
    row = args.dlm_out.join(row)
    if args.remove_line_char:
        row  =  row.replace('\n', ' ').replace('\r', ' ')
    h_outfile.write("%s\n" % (row))
    h_outfile.flush()
    # print row

Help description:

usage: csvconvert.py [-h] [--dlm-input [',']] [--dlm-output ['|']]
                     [--remove-line-char] [--quote-char ["]] [-i [file.csv]]
                     [-o [file.pipe]] [-v]

Convert delimited file from one delimiter to another; defaults to converting CSV to pipe-delimited.

optional arguments: -h, --help show this help message and exit --dlm-input [','] delimiter of the input file; defaults to comma (,) --dlm-output ['|'] delimiter of the output file; defaults to pipe (|) --remove-line-char remove \n and \r characters in fields and replace with spaces --quote-char ["] quote character; defaults to double quote (") -i [file.csv], --input [file.csv] input file; if not specified, take from standard input. -o [file.pipe], --output [file.pipe] output file; if not specified, write to standard output -v, --verbose increase verbosity

Usage:

cat myfile.csv | csvconvert.py --remove-line-char > myfile.pipe

About Vinh Nguyen

Statistician

3 comments

Leave a Reply to Vinh Nguyen Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>