четверг, 11 апреля 2019 г.

How to combine multiple CSV files with 8 lines of code


Why do you need this?

Manually copy-pasting is fine if you don’t have too many files to work with.
But imagine if you have 100+ files to concatenate — are you willing to do it manually? Doing this repetitively is tedious and error-prone.
If all the files have the same table structure (same headers & number of columns), let this tiny Python script do the work.

Step 1: Import packages and set the working directory

Change “/mydir” to your desired working directory.
import os
import glob
import pandas as pd
os.chdir("/mydir")
Step 2: Use glob to match the pattern ‘csv’
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

Match the pattern (‘csv’) and save the list of file names in the ‘all_filenames’ variable. You can check out this link to learn more about regular expression matching.
Step 3: Combine all files in the list and export as CSV
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

Use pandas to concatenate all files in the list and export as CSV. The output file is named “combined_csv.csv” located in your working directory.
encoding = ‘utf-8-sig’ is added to overcome the issue when exporting ‘Non-English’ languages.
And…it’s done!

1 коммент.:

  1. # -*- coding: utf-8 -*-
    """
    Created on Sat Jan 12 06:56:16 2019
    @author: Chris
    """
    #credited:
    #https://stackoverflow.com/questions/9234560/find-all-csv-files-in-a-directory-using-python/12280052

    import os
    import glob
    import pandas as pd
    #set working directory
    os.chdir("/mydir")

    #find all csv files in the folder
    #use glob pattern matching -> extension = 'csv'
    #save result in list -> all_filenames
    extension = 'csv'
    all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
    #print(all_filenames)

    #combine all files in the list
    combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
    #export to csv
    combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

    ОтветитьУдалить