Monday 24 July 2017

Data Science & Machine Learning - 4.7 Pandas Input Output

Hi friends,

Welcome to another post under Data Science & Machine Learning. In the previous post, we discussed various important methods supported by Pandas DataFrames. In this post, we will look at another important feature: reading data into Pandas DataFrames from various sources and writing it back out.

Note: All the commands discussed below are run in the Jupyter Notebook environment. See this post on Jupyter Notebook to learn about it in detail.

Pandas Input Output

To see the list of sources from which we can read data into Pandas DataFrames, type pd.read_ in a Jupyter Notebook cell and press the Tab key. The completion list shows the functions available for reading data into a Pandas DataFrame.


Similarly, typing <df_name>.to_ and pressing the Tab key shows the list of functions to write data to various sources from a Pandas DataFrame.
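Outside a notebook, roughly the same two lists can be produced in plain Python. This is only a minimal sketch, assuming pandas is imported as pd; the names readers and writers are my own, and the exact entries depend on the installed pandas version.

```python
import pandas as pd

# Functions on the pandas module whose names start with "read_",
# i.e. what Tab completion offers after typing pd.read_
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

# DataFrame methods whose names start with "to_",
# i.e. what Tab completion offers after typing <df_name>.to_
writers = sorted(name for name in dir(pd.DataFrame) if name.startswith("to_"))

print(readers)
print(writers)
```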



Let's now see how to use the important ones.
  1. Using CSV files: 
    • The read_csv function reads data from CSV files. If you pass only the file name, the CSV file must be present in the current working directory; otherwise, pass the full path. For this example, I have a CSV file named sample, which I read using read_csv.


    • The to_csv method, on the other hand, writes data from a Pandas DataFrame to a CSV file.
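Here is a small round trip through a CSV file, as a sketch rather than the exact commands from my notebook. The column names and values are made up for illustration, and the file is written to a temporary folder (an assumption on my part) so that nothing in your working directory gets overwritten.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the contents of the sample file.
df = pd.DataFrame({"a": [0, 4, 8], "b": [1, 5, 9], "c": [2, 6, 10]})

# Build a path inside a fresh temporary folder.
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# index=False keeps the row labels out of the file; without it,
# reading the file back would produce an extra unnamed column.
df.to_csv(csv_path, index=False)

# Read the file back into a new DataFrame.
df_from_csv = pd.read_csv(csv_path)
print(df_from_csv)
```

If the file sits in the current working directory, passing just the file name, as in pd.read_csv("sample.csv"), is enough.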


  2. Using Excel files: 

    • The read_excel function reads data from Microsoft Excel files. Once again, if you pass only the file name, the Excel file must be present in the current working directory. For this example, I have an Excel file named sample2, which I read using read_excel.


    • The to_excel method, on the other hand, writes data from a Pandas DataFrame to an Excel file.
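The same round trip works for Excel files. This is a sketch under a few assumptions: the data and the sheet name Sheet1 are made up for illustration, the file goes to a temporary folder, and an Excel engine such as openpyxl is installed alongside pandas (pandas needs one for .xlsx files).

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the contents of the sample2 file.
df = pd.DataFrame({"a": [0, 4, 8], "b": [1, 5, 9], "c": [2, 6, 10]})

# Build a path inside a fresh temporary folder.
xlsx_path = os.path.join(tempfile.mkdtemp(), "sample2.xlsx")

# Write the DataFrame to a named sheet, leaving out the row labels.
df.to_excel(xlsx_path, sheet_name="Sheet1", index=False)

# Read that sheet back into a new DataFrame.
df_from_excel = pd.read_excel(xlsx_path, sheet_name="Sheet1")
print(df_from_excel)
```

Only cell values survive the trip. Formulas, charts, and formatting in a workbook are not loaded into the DataFrame.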


  3. Using HTML files: 

    • We can even read data from a webpage, provided it is contained within the table HTML tag. The read_html function reads data from the tables in a webpage and returns them as a list of DataFrames, one per table. Here is an example of read_html which reads data from the following Wikipedia URL.


      The given Wikipedia page contains nine tables, which we can confirm by checking the length of the df variable.


    We can view each of them by indexing, just as with Python lists. For example, df[2] is the third table, and a call such as df[2].head() in a Jupyter Notebook cell shows a portion of it.
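To try read_html without a network connection, it can also be pointed at HTML held in memory. The sketch below does not use the Wikipedia page: the two tables and the variable names are made up for illustration, with tables playing the role of the df variable above. It assumes an HTML parser library such as lxml is installed, which read_html needs.

```python
import io

import pandas as pd

# Hypothetical page content with two <table> elements.
html = """
<html><body>
<table>
  <thead><tr><th>name</th><th>score</th></tr></thead>
  <tbody>
    <tr><td>Asha</td><td>91</td></tr>
    <tr><td>Ravi</td><td>85</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>city</th><th>state</th></tr></thead>
  <tbody>
    <tr><td>Pune</td><td>Maharashtra</td></tr>
    <tr><td>Mysuru</td><td>Karnataka</td></tr>
  </tbody>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))

# The length of the list is the number of tables.
print(len(tables))

# Index into the list to pick one table; head() shows its first rows.
print(tables[1].head())
```

Passing a URL string instead of the StringIO object makes read_html download the page first and then parse it in the same way.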

We can also load data into a Pandas DataFrame from a SQL database, but I'll leave that for you to explore in case you are interested. In the next post under Data Science & Machine Learning, we will use the concepts we have learned so far to explore the Kaggle SF Salaries Dataset.