{"id":11037,"date":"2018-11-24T18:20:34","date_gmt":"2018-11-24T15:20:34","guid":{"rendered":"https:\/\/railsware.com\/blog\/?p=11037"},"modified":"2021-08-17T11:00:48","modified_gmt":"2021-08-17T08:00:48","slug":"python-for-machine-learning-pandas-axis-explained","status":"publish","type":"post","link":"https:\/\/railsware.com\/blog\/python-for-machine-learning-pandas-axis-explained\/","title":{"rendered":"Python for Machine Learning: Pandas Axis Explained"},"content":{"rendered":"\n<p class=\"intro-text\">Pandas is a powerful library in the toolbox of every Machine Learning engineer. It provides two main data structures: <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/generated\/pandas.Series.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Series<\/a> and <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/generated\/pandas.DataFrame.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">DataFrame<\/a>.<\/p>\n\n\n\n<p>Many API calls of these types accept a cryptic &#8220;axis&#8221; parameter. This parameter is poorly described in Pandas\u2019 documentation, though it is key to using the library efficiently. 
The goal of the article is to fill in this gap and to provide a solid understanding of what the &#8220;axis&#8221; parameter is and how to use it in various use cases including leading-edge <a href=\"https:\/\/railsware.com\/services\/ai-machine-learning-consulting\/\" target=\"_blank\" rel=\"noopener noreferrer\">artificial intelligence applications<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"2400\" height=\"1260\" src=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration.jpg\" alt=\"Pandas Axis Usage in Machine Learning\" class=\"wp-image-11106\" srcset=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration.jpg 2400w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration-360x189.jpg 360w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration-768x403.jpg 768w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration-1024x538.jpg 1024w\" sizes=\"auto, (max-width: 2400px) 100vw, 2400px\" \/><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Axis in Series<\/h2>\n\n\n\n<p><em>Series<\/em> is a one-dimensional array of values. Under the hood, it uses <a href=\"https:\/\/www.numpy.org\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">NumPy<\/a> <a href=\"https:\/\/www.numpy.org\/devdocs\/reference\/arrays.ndarray.html\">ndarray<\/a>. That is where the term &#8220;axis&#8221; came from. 
<em>NumPy<\/em> uses it quite frequently because <em>ndarray<\/em> can have a lot of dimensions.<\/p>\n\n\n\n<p>A <em>Series<\/em> object has only &#8220;axis 0&#8221; because it has only one dimension.<br><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11041\" src=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/pandas-series-axis.png\" alt=\"\" width=\"1138\" height=\"486\" srcset=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/pandas-series-axis.png 1138w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/pandas-series-axis-360x154.png 360w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/pandas-series-axis-768x328.png 768w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/pandas-series-axis-1024x437.png 1024w\" sizes=\"auto, (max-width: 1138px) 100vw, 1138px\" \/><br>The arrow in the image shows &#8220;axis 0&#8221; and its direction for the <em>Series<\/em> object.<\/p>\n\n\n\n<p>Usually, in Python, one-dimensional structures are displayed as a row of values. In contrast, a <em>Series<\/em> is displayed as a column of values.<\/p>\n\n\n\n<p>Each cell in a <em>Series<\/em> is accessible via its index value along &#8220;axis 0&#8221;. For our <em>Series<\/em> object, the indexes are 0, 1, 2, 3, 4. Here is an example of accessing different values:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> srs = pd.Series(['red', 'green', 'blue', 'white', 'black'])\n>>> srs[0]\n'red'\n>>> srs[3]\n'white'\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Axes in DataFrame<\/h2>\n\n\n\n<p>A <em>DataFrame<\/em> is a two-dimensional data structure akin to an SQL table or an Excel spreadsheet. It has columns and rows. 
Its columns are made of separate <em>Series<\/em> objects. Let&#8217;s see an example:<br><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11040\" src=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/data-frame-axes.png\" alt=\"\" width=\"1172\" height=\"634\" srcset=\"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/data-frame-axes.png 1172w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/data-frame-axes-360x195.png 360w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/data-frame-axes-768x415.png 768w, https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/data-frame-axes-1024x554.png 1024w\" sizes=\"auto, (max-width: 1172px) 100vw, 1172px\" \/><\/p>\n\n\n\n<p>A <em>DataFrame<\/em> object has two axes: &#8220;axis 0&#8221; and &#8220;axis 1&#8221;. &#8220;axis 0&#8221; represents rows and &#8220;axis 1&#8221; represents columns. Now it&#8217;s clear that <em>Series<\/em> and <em>DataFrame<\/em> share the same direction for &#8220;axis 0&#8221; &#8211; it runs along the row direction.<\/p>\n\n\n\n<p>Our <em>DataFrame<\/em> object has the indexes 0, 1, 2, 3, 4 along &#8220;axis 0&#8221;, and additionally it has the &#8220;axis 1&#8221; indexes <em>&#8216;a&#8217;<\/em> and <em>&#8216;b&#8217;<\/em>.<\/p>\n\n\n\n<p>To access an element within a <em>DataFrame<\/em>, we need to provide two indexes (one per axis). 
Also, instead of bare brackets, we need to use the <em>.loc<\/em> indexer:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> srs_a = pd.Series([1,3,6,8,9])\n>>> srs_b = pd.Series(['red', 'green', 'blue', 'white', 'black'])\n>>> df = pd.DataFrame({'a': srs_a, 'b': srs_b})\n>>> df.loc[2, 'b']\n'blue'\n>>> df.loc[3, 'a']\n8\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Using the \u201caxis\u201d parameter in API calls<\/h2>\n\n\n\n<p>Many API calls on <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/generated\/pandas.Series.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Series<\/a> and <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/generated\/pandas.DataFrame.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">DataFrame<\/a> objects accept an &#8220;axis&#8221; parameter. A <em>Series<\/em> object has only one axis, so this parameter is always <em>0<\/em> for it. You can therefore omit it without affecting the result:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import numpy as np\n>>> import pandas as pd\n>>> srs = pd.Series([1, 3, np.nan, 4, np.nan])\n>>> srs.dropna()\n0    1.0\n1    3.0\n3    4.0\ndtype: float64\n>>> srs.dropna(axis=0)\n0    1.0\n1    3.0\n3    4.0\ndtype: float64\n<\/pre>\n\n\n\n<p>In contrast, a <em>DataFrame<\/em> has two axes, and the &#8220;axis&#8221; parameter determines along which axis an operation is performed. For example, <em>.sum<\/em> can be applied along &#8220;axis 0&#8221;. 
That means the <em>.sum<\/em> operation calculates a sum for each column:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> srs_a = pd.Series([10,30,60,80,90])\n>>> srs_b = pd.Series([22, 44, 55, 77, 101])\n>>> df = pd.DataFrame({'a': srs_a, 'b': srs_b})\n>>> df\n    a    b\n0  10   22\n1  30   44\n2  60   55\n3  80   77\n4  90  101\n>>> df.sum(axis=0)\na    270\nb    299\ndtype: int64\n<\/pre>\n\n\n\n<p>We see that summing with <em>axis=0<\/em> collapsed all values along the direction of &#8220;axis 0&#8221; and left only the columns (<em>&#8216;a&#8217;<\/em> and <em>&#8216;b&#8217;<\/em>) with their sums.<\/p>\n\n\n\n<p>With <em>axis=1<\/em> it produces a sum for each row:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df.sum(axis=1)\n0     32\n1     74\n2    115\n3    157\n4    191\ndtype: int64\n<\/pre>\n\n\n\n<p>If you prefer names to numbers, each axis has a string alias. &#8220;axis 0&#8221; has two aliases: <em>&#8216;index&#8217;<\/em> and <em>&#8216;rows&#8217;<\/em>. &#8220;axis 1&#8221; has only one: <em>&#8216;columns&#8217;<\/em>. 
You can use these aliases instead of numbers:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df.sum(axis='index')\na    270\nb    299\ndtype: int64\n>>> df.sum(axis='rows')\na    270\nb    299\ndtype: int64\n>>> df.sum(axis='columns')\n0     32\n1     74\n2    115\n3    157\n4    191\ndtype: int64\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Dropping NaN values<\/h3>\n\n\n\n<p>Let&#8217;s build a simple <em>DataFrame<\/em> with <em>NaN<\/em> values and observe how the axis affects the <em>.dropna<\/em> method:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> import numpy as np\n>>> df = pd.DataFrame({'a': [2, np.nan, 8, 3], 'b': [np.nan, 32, 15, 7], 'c': [-3, 5, 22, 19]})\n>>> df\n     a     b   c\n0  2.0   NaN  -3\n1  NaN  32.0   5\n2  8.0  15.0  22\n3  3.0   7.0  19\n>>> df.dropna(axis=0)\n     a     b   c\n2  8.0  15.0  22\n3  3.0   7.0  19\n<\/pre>\n\n\n\n<p>Here <em>.dropna<\/em> filters out any row (we are moving along &#8220;axis 0&#8221;) that contains a <em>NaN<\/em> value.<\/p>\n\n\n\n<p>Let&#8217;s use the &#8220;axis 1&#8221; direction:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df.dropna(axis=1)\n    c\n0  -3\n1   5\n2  22\n3  19\n<\/pre>\n\n\n\n<p>Now <em>.dropna<\/em> collapsed &#8220;axis 1&#8221; and removed all columns with <em>NaN<\/em> values. 
Columns <em>&#8216;a&#8217;<\/em> and <em>&#8216;b&#8217;<\/em> contained <em>NaN<\/em> values, so only the <em>&#8216;c&#8217;<\/em> column was left.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Concatenation<\/h3>\n\n\n\n<p>The concatenation function with <em>axis=0<\/em> stacks the first <em>DataFrame<\/em> on top of the second:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> df1 = pd.DataFrame({'a': [1,3,6,8,9], 'b': ['red', 'green', 'blue', 'white', 'black']})\n>>> df2 = pd.DataFrame({'a': [0,2,4,5,7], 'b': ['jun', 'jul', 'aug', 'sep', 'oct']})\n>>> pd.concat([df1, df2], axis=0)\n   a      b\n0  1    red\n1  3  green\n2  6   blue\n3  8  white\n4  9  black\n0  0    jun\n1  2    jul\n2  4    aug\n3  5    sep\n4  7    oct\n<\/pre>\n\n\n\n<p>With <em>axis=1<\/em>, the two DataFrames are placed side by side:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> pd.concat([df1, df2], axis=1)\n   a      b  a    b\n0  1    red  0  jun\n1  3  green  2  jul\n2  6   blue  4  aug\n3  8  white  5  sep\n4  9  black  7  oct\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p><em>Pandas<\/em> borrowed the &#8220;axis&#8221; concept from the <em>NumPy<\/em> library. The &#8220;axis&#8221; parameter has no effect on a <em>Series<\/em> object because it has only one axis. 
In contrast, the <em>DataFrame<\/em> API relies heavily on the parameter, because it&#8217;s a two-dimensional data structure, and many operations can be performed along different axes, producing totally different results.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pandas, a powerful library for Python, is a must-have tool for every machine learning developer. Check out the hands-on explanation of the Pandas \u201caxis\u201d parameter and how to use it in various cases<\/p>\n","protected":false},"author":25,"featured_media":11105,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3],"tags":[],"coauthors":["Sergii Boiko"],"class_list":["post-11037","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development"],"acf":[],"aioseo_notices":[],"categories_data":[{"name":"Engineering","link":"https:\/\/railsware.com\/blog?category=development"}],"post_thumbnails":"https:\/\/railsware.com\/blog\/wp-content\/uploads\/2018\/11\/PythonMLPandas-illustration.jpg","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/11037","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/comments?post=11037"}],"version-history":[{"count":20,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/11037\/revisions"}],"predecessor-version":[{"id":14219,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/11037\/revisions\/14219"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media\/11105"}],"wp:attachment":[{"
href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media?parent=11037"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/categories?post=11037"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/tags?post=11037"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/coauthors?post=11037"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}