Migrating SAS files to CSV and database
Many organizations have been using SAS Software to analyse their data for many decades. As with most technologies organisations will move to alternative technologies are some point. Recently I’ve experienced this kind of move. In doing so any data stored in one format for the older technology needed to be moved or migrated to the newer technology. SAS Software can process data in a variety of format, one of which is their own internal formats. Thankfully Pandas in Python as a function to read such SAS files into a pandas dataframe, from which it can be saved into some alternative format such as CSV. The following code illustrates this conversion, where it will migrate all the SAS files in a particular directory, converts them to CSV and saves the new file in a destination directory. It also copies any existing CSV files to the new destination, while ignore any other files. The following code is helpful for batch processing of files.
import os import pandas as pd #define the Source and Destination directories source_dir='/Users/brendan.tierney/Dropbox/4-Datasets/SAS_Data_Sets' dest_dir=source_dir+'/csv' #What file extensions to read and convert file_ext='.sas7bdat' #Create output directory if it does not already exist if not os.path.exists(dest_dir): os.mkdir(dest_dir) else: #If directory exists, delete all files for filename in os.listdir(source_dir): os.remove(filename) #Process each file for filename in os.listdir(source_dir): #Process the SAS file if filename.endswith(file_ext): print('.processing file :',filename) print('...converting file to csv') df=pd.read_sas(os.path.join(source_dir, filename)) df.to_csv(os.path.join(dest_dir, filename)) print('.....finished creating CSV file') elif filename.endswith('csv'): #Copy any CSV files to the Destination Directory print('.copying CSV file') cmd_copy='cp '+os.path.join(source_dir, filename)+' '+os.path.join(dest_dir, filename) os.system(cmd_copy) print('.....finished copying CSV file') else: #Ignore any other type of files print('.ignoring file :',filename) print('--Finished--')
That’s it. All the files have now been converted and located in the destination directory.
For most, the above might be all you need, but sometimes you’ll need to move the the newer technology. In most cases the newer technology will easily use the CSV files. But in some instance your final destination might be a database. In my scenarios I use the CSV2DB app developed by Gerald Venzi. You can download the code from GitHub. You can use this to load CSV files into Oracle, MySQL, PostgreSQL, SQL Server and Db2 databases. Here’s and example of the command line to load into an Oracle Database.
csv2db load -f /Users/brendan.tierney/Dropbox/4-Datasets/SAS_Data_Sets/csv -t pva97nk -u analytics -p analytics -d PDB1