Sunday 23 July 2017

Data Science & Machine Learning - 4.5 Pandas SQL Functionalities

Hi friends,

In the previous post under the series of posts under Data Science & Machine Learning, we learnt how to fill in the missing values in the Pandas DataFrames. In this post, we will see the SQL functionalities of Groupby, Merge and Join supported by Pandas DataFrames.

Note: All the commands discussed below are run in the Jupyter Notebook environment. See this post on Jupyter Notebook to know about it in detail.

Pandas SQL Functionalities

So, let's begin with the Groupby functionality first. Groupby helps in grouping large data frames by different columns and applying summary functions on each group. Let's go through an example now to understand it better.


In the above example taken from here, we can see that the first table is grouped based on the column state and then various aggregate function like the max(), min() and average() and applied on it.

Let's now go through some examples using Jupyter Notebook. So, let's first create a sample DataFrame.


In the sample DataFrame created above, we have four columns namely, company, name, profile and CTC. Now, if we want to compute the average CTC of employees based on the company column then we would do the following:


We can also find the average CTC based on the profile:


We can also apply other aggregate functions like sum(), standard deviation (std), min(), max(), etc.




Pandas provide a describe() method that lists these basic measures all together: 


Let's now see the concatenate and join operations on DataFrames. Before we can perform the concatenate and join operations, we first need to create a couple of sample DataFrames.


Now, we concatenate these two DataFrames using the concat() method:


Notice that the DataFrames are concatenated along the rows by default. We can concatenate them along the columns as well by setting axis parameter to 1.


We can also merge the two columns based on the SQL notion of join operation using the merge() method by providing the key parameter:


We can also apply advanced SQL operations of joining the DataFrames such as inner join, outer join and so on. You can refer this link in case you are interested. In the next post, we will see the list of useful methods we can apply on DataFrames.
Share:

0 comments:

Post a Comment

Contact Me

Name

Email *

Message *

Popular Posts

Blog Archive