plt.hist() in matplotlib: parameters explained, with worked examples

1. plt.hist() parameters in detail

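Overview:
plt.hist() draws a histogram, a special kind of bar chart.
The range of the data is divided into a series of consecutive intervals (bins), and the number of values falling into each interval is counted.
A histogram can also be normalized to show relative frequencies, that is, the share of the data falling into each bin; in that case the bar heights sum to 1. Note that this is not the same as density=True, which normalizes the histogram so that the total area of the bars (height times bin width) equals 1.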
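The difference between raw counts, relative frequencies, and a probability density is easier to see with numbers. The sketch below uses `numpy.histogram`, the routine that `plt.hist()` builds on for binning; the sample `values` and the bin `edges` are made up for illustration and are not from the article's data.

```python
import numpy as np

# Hypothetical sample and deliberately unequal bin edges (widths 2, 1, 1)
values = np.array([0.5, 1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 3.8])
edges = np.array([0.0, 2.0, 3.0, 4.0])

# Raw counts per bin
counts, _ = np.histogram(values, bins=edges)

# Relative frequency: share of the sample in each bin, heights sum to 1
rel_freq = counts / counts.sum()

# Probability density: count / (N * bin width), area sums to 1
density, _ = np.histogram(values, bins=edges, density=True)
widths = np.diff(edges)

print("counts:            ", counts)
print("relative frequency:", rel_freq, "sum =", rel_freq.sum())
print("density:           ", density, "area =", (density * widths).sum())
```

With equal bin widths of 1 the two normalizations coincide; with any other width they differ, which is why the heights of a `density=True` histogram do not generally sum to 1.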

import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib import ticker
# Jupyter/IPython magic; remove this line when running as a plain Python script
%matplotlib inline


plt.hist(x, bins=None, range=None, density=None, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, normed=None, *, data=None, **kwargs)

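Commonly used parameters:
x: the data to plot. A 1-D array, or a sequence of 1-D arrays (one per dataset, not necessarily of equal length); if a multi-dimensional array holds a single dataset, flatten it first. Required;
bins: an integer, a sequence, or a string. An integer is the number of equal-width bins, i.e. the number of groups (default 10, taken from rcParams hist.bins); a sequence gives the bin edges directly;
range: tuple or None. The lower and upper limits of the bins; values outside this range are ignored, which is a way to drop extreme outliers. If None, it defaults to (x.min(), x.max()). It has no effect when bins is a sequence;
density: boolean. If True, the first element of the returned tuple, n, is a probability density instead of the default counts: each count is divided by (number of observations × bin width), so the area under the histogram is 1;
weights: an array of weights with the same shape as x. Each value in x contributes its weight, instead of 1, to the count of its bin. If normed or density is True, the weights are normalized so that the density still integrates to 1. This parameter can be used to draw a histogram of data that has already been binned;
cumulative: boolean. If True, each bin shows its own count plus the counts of all bins for smaller values (cumulative counts); if normed or density is also True, the histogram is normalized so that the last bin equals 1;
bottom: array, scalar, or None. The position of the bottom of each bar relative to y=0. A scalar shifts every bar up or down by the same amount; an array shifts each bar individually and must have one entry per bin. In short, the vertical offset of the histogram;
histtype: {'bar', 'barstacked', 'step', 'stepfilled'}. 'bar' is the traditional bar-type histogram; 'barstacked' stacks multiple datasets on top of each other; 'step' draws an unfilled outline only; 'stepfilled' draws a filled outline. With 'step' or 'stepfilled', rwidth is ignored, so no gap can be set between bins and they are drawn joined together;
align: {'left', 'mid', 'right'}. 'left': bars are centered on the left bin edges; 'mid': bars are centered between the left and right bin edges; 'right': bars are centered on the right bin edges;
orientation: {'horizontal', 'vertical'}. With 'horizontal', the bars start from the y axis and extend horizontally; think of it as switching from bar() to barh(), a 90° rotation;
rwidth: scalar or None. The width of each bar as a fraction of the bin width;
log: boolean. If True, the count axis uses a log scale; according to the documentation quoted in the appendix, if log is True and x is a 1-D array, empty bins are filtered out and only the non-empty (n, bins, patches) are returned;
color: a color, a sequence of colors (one per dataset), or None;
label: a string (or sequence of strings) or None; with multiple datasets, use label to tell them apart in the legend;
stacked: boolean. If True, multiple datasets are stacked on top of each other. If False, multiple datasets are drawn side by side when histtype='bar', and overlaid on each other when histtype='step';
normed: whether to normalize the histogram; off by default. Deprecated and removed from later Matplotlib releases; use density instead;
edgecolor: color of the bar outlines (passed through **kwargs to the underlying patches);
alpha: transparency (also passed through **kwargs);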
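A few of these parameters are easiest to understand side by side. The following is a minimal sketch on synthetic data (the array `x`, the seed, and the bin count are assumptions made for illustration) that compares `density=True`, relative frequencies obtained through `weights`, and `cumulative=True`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # fixed seed, synthetic data
x = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# density=True: n is a probability density, the bar areas sum to 1
n_density, edges, _ = axes[0].hist(x, bins=20, density=True, edgecolor="black")
axes[0].set_title("density=True")

# weights = 1/N for every value: n is a relative frequency, the heights sum to 1
rel_weights = np.full(x.shape, 1.0 / x.size)
n_rel, _, _ = axes[1].hist(x, bins=20, weights=rel_weights, edgecolor="black")
axes[1].set_title("weights=1/N")

# cumulative=True: each bin adds the counts of all bins to its left
n_cum, _, _ = axes[2].hist(x, bins=20, cumulative=True, histtype="step")
axes[2].set_title("cumulative=True")

fig.tight_layout()
area = (n_density * np.diff(edges)).sum()
print(f"area={area:.3f}  sum of heights={n_rel.sum():.3f}  last cumulative bin={n_cum[-1]:.0f}")
```

Passing `weights` is one common way to get bars whose heights sum to 1 when that, rather than a density, is what the plot should show.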

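Return values (capture them in variables, which makes it easy to add data labels):
n: the histogram values, i.e. the statistic for each bin. Whether it is normalized depends on density (normed in old versions); with the default settings, n is the number of elements in each bin (the counts);
bins: the bin edges, an array of length (number of bins + 1);
patches: the graphical objects drawn for the bins (for bar-type histograms, one rectangle per bar), not the data inside each bin. Use them to restyle individual bars.
Other keyword arguments work as in plt.bar().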
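To make the three return values concrete, here is a small sketch on synthetic data (the array `x` and the choice of 8 bins are assumptions for illustration). It checks the lengths of `n`, `bins`, and `patches`, and uses `patches` to recolor the tallest bar.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

rng = np.random.default_rng(1)          # fixed seed, synthetic data
x = rng.normal(size=500)

n, bins, patches = plt.hist(x, bins=8, edgecolor="black")

print(len(n), len(bins), len(patches))  # 8 values, 9 edges, 8 bars
print(all(isinstance(p, Rectangle) for p in patches))

# patches holds the drawn bars, so individual bars can be restyled
tallest = int(np.argmax(n))
patches[tallest].set_facecolor("tab:red")
```

For `histtype='step'` or `'stepfilled'` the returned patches are polygons rather than one rectangle per bin, so per-bar styling like this applies to the bar types only.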

2. Basic usage of plt.hist()

import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain Python script
%matplotlib inline


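# Simplest case: pass only x, the number of bins, the relative bar width, and the range
# (data13 is the author's DataFrame; it is not defined in this article)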
plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left')
plt.show()
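Since `data13` is not defined in the article, the sketch below reproduces the same call on a synthetic stand-in: `carrier_no` is a hypothetical array of integer codes from 1 to 11. It also shows why `bins=11, range=(1, 12), align='left'` suits integer-coded data: the bin edges land exactly on the integers 1 to 12, each code gets its own bin, and each bar is centered on its code.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                 # fixed seed, synthetic data
carrier_no = rng.integers(1, 12, size=500)      # hypothetical integer codes 1..11

n, bins, patches = plt.hist(carrier_no, bins=11, rwidth=0.8,
                            range=(1, 12), align="left")

print(bins)                                     # edges 1, 2, ..., 12
# Each bin [k, k+1) holds exactly the code k (the last bin [11, 12] holds 11)
print(np.array_equal(n, np.bincount(carrier_no, minlength=12)[1:]))

# With align='left' the first bar is centered on the first edge, i.e. on code 1
first = patches[0]
print(first.get_x() + first.get_width() / 2)
```

With the default `align='mid'` the same bars would sit between the integers (centered on 1.5, 2.5, and so on), which is usually not what is wanted for categorical codes.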

3. A fuller plt.hist() example

import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib import ticker
# Jupyter/IPython magic; remove this line when running as a plain Python script
%matplotlib inline


plt.figure(figsize=(8,5), dpi=80)
# Capture hist()'s return values: the per-bin counts and the bin edges are used to place the data labels
n, bins, patches = plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left', label='xx histogram')
for i in range(len(n)):
    # Label each bar with its sample count, slightly above the bar top; with align='left' the bar is centered on bins[i]
    plt.text(bins[i], n[i]*1.02, int(n[i]), fontsize=12, horizontalalignment="center")
plt.ylim(0,16000)
plt.title('Histogram')
plt.legend()
# plt.savefig('histogram'+'.png')
plt.show()
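The labeling loop above places each label at `bins[i]`, which works because `align='left'` centers the bars on the left bin edges. With the default `align='mid'`, the labels belong at the bin centers instead. The following is a minimal sketch of that variant on synthetic data; `scores`, the seed, and the axis limits are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)                   # fixed seed, synthetic data
scores = rng.normal(loc=70, scale=10, size=300)  # hypothetical continuous data

fig, ax = plt.subplots(figsize=(8, 5))
n, bins, patches = ax.hist(scores, bins=10, rwidth=0.8, label="scores")

# With align='mid' (the default) each bar is centered between its two edges
centers = (bins[:-1] + bins[1:]) / 2

labels = []
for center, count in zip(centers, n):
    if count > 0:                                # skip empty bins
        labels.append(ax.text(center, count, str(int(count)),
                              ha="center", va="bottom", fontsize=12))

ax.set_ylim(0, n.max() * 1.15)                   # headroom for the labels
ax.set_title("Histogram with count labels")
ax.legend()
```

Deriving the upper y limit from `n.max()` rather than hard-coding it keeps the labels inside the axes when the data changes.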

Appendix: the official parameter documentation

Parameters
----------
x : (n,) array or sequence of (n,) arrays
    Input values, this takes either a single array or a sequence of
    arrays which are not required to be of the same length.

bins : int or sequence or str, optional
    If an integer is given, ``bins + 1`` bin edges are calculated and
    returned, consistent with `numpy.histogram`.

    If `bins` is a sequence, gives bin edges, including left edge of
    first bin and right edge of last bin.  In this case, `bins` is
    returned unmodified.

    All but the last (righthand-most) bin is half-open.  In other
    words, if `bins` is::

        [1, 2, 3, 4]

    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
    the second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which
    *includes* 4.

    Unequally spaced bins are supported if *bins* is a sequence.

    With Numpy 1.11 or newer, you can alternatively provide a string
    describing a binning strategy, such as 'auto', 'sturges', 'fd',
    'doane', 'scott', 'rice' or 'sqrt', see
    `numpy.histogram`.

    The default is taken from :rc:`hist.bins`.

range : tuple or None, optional
    The lower and upper range of the bins. Lower and upper outliers
    are ignored. If not provided, *range* is ``(x.min(), x.max())``.
    Range has no effect if *bins* is a sequence.

    If *bins* is a sequence or *range* is specified, autoscaling
    is based on the specified bin range instead of the
    range of x.

    Default is ``None``

density : bool, optional
    If ``True``, the first element of the return tuple will
    be the counts normalized to form a probability density, i.e.,
    the area (or integral) under the histogram will sum to 1.
    This is achieved by dividing the count by the number of
    observations times the bin width and not dividing by the total
    number of observations. If *stacked* is also ``True``, the sum of
    the histograms is normalized to 1.

    Default is ``None`` for both *normed* and *density*. If either is
    set, then that value will be used. If neither are set, then the
    args will be treated as ``False``.

    If both *density* and *normed* are set an error is raised.

weights : (n, ) array_like or None, optional
    An array of weights, of the same shape as *x*.  Each value in *x*
    only contributes its associated weight towards the bin count
    (instead of 1).  If *normed* or *density* is ``True``,
    the weights are normalized, so that the integral of the density
    over the range remains 1.

    Default is ``None``.

    This parameter can be used to draw a histogram of data that has
    already been binned, e.g. using `np.histogram` (by treating each
    bin as a single point with a weight equal to its count) ::

        counts, bins = np.histogram(data)
        plt.hist(bins[:-1], bins, weights=counts)

    (or you may alternatively use `~.bar()`).

cumulative : bool, optional
    If ``True``, then a histogram is computed where each bin gives the
    counts in that bin plus all bins for smaller values. The last bin
    gives the total number of datapoints. If *normed* or *density*
    is also ``True`` then the histogram is normalized such that the
    last bin equals 1. If *cumulative* evaluates to less than 0
    (e.g., -1), the direction of accumulation is reversed.
    In this case, if *normed* and/or *density* is also ``True``, then
    the histogram is normalized such that the first bin equals 1.

    Default is ``False``

bottom : array_like, scalar, or None
    Location of the bottom baseline of each bin.  If a scalar,
    the base line for each bin is shifted by the same amount.
    If an array, each bin is shifted independently and the length
    of bottom must match the number of bins.  If None, defaults to 0.

    Default is ``None``

histtype : {'bar', 'barstacked', 'step',  'stepfilled'}, optional
    The type of histogram to draw.

    - 'bar' is a traditional bar-type histogram.  If multiple data
      are given the bars are arranged side by side.

    - 'barstacked' is a bar-type histogram where multiple
      data are stacked on top of each other.

    - 'step' generates a lineplot that is by default
      unfilled.

    - 'stepfilled' generates a lineplot that is by default
      filled.

    Default is 'bar'

align : {'left', 'mid', 'right'}, optional
    Controls how the histogram is plotted.

        - 'left': bars are centered on the left bin edges.

        - 'mid': bars are centered between the bin edges.

        - 'right': bars are centered on the right bin edges.

    Default is 'mid'

orientation : {'horizontal', 'vertical'}, optional
    If 'horizontal', `~matplotlib.pyplot.barh` will be used for
    bar-type histograms and the *bottom* kwarg will be the left edges.

rwidth : scalar or None, optional
    The relative width of the bars as a fraction of the bin width.  If
    ``None``, automatically compute the width.

    Ignored if *histtype* is 'step' or 'stepfilled'.

    Default is ``None``

log : bool, optional
    If ``True``, the histogram axis will be set to a log scale. If
    *log* is ``True`` and *x* is a 1D array, empty bins will be
    filtered out and only the non-empty ``(n, bins, patches)``
    will be returned.

    Default is ``False``

color : color or array_like of colors or None, optional
    Color spec or sequence of color specs, one per dataset.  Default
    (``None``) uses the standard line color sequence.

    Default is ``None``

label : str or None, optional
    String, or sequence of strings to match multiple datasets.  Bar
    charts yield multiple patches per dataset, but only the first gets
    the label, so that the legend command will work as expected.

    default is ``None``

stacked : bool, optional
    If ``True``, multiple data are stacked on top of each other If
    ``False`` multiple data are arranged side by side if histtype is
    'bar' or on top of each other if histtype is 'step'

    Default is ``False``

normed : bool, optional
    Deprecated; use the density keyword argument instead.

Returns
-------
n : array or list of arrays
    The values of the histogram bins. See *density* and *weights* for a
    description of the possible semantics.  If input *x* is an array,
    then this is an array of length *nbins*. If input is a sequence of
    arrays ``[data1, data2,..]``, then this is a list of arrays with
    the values of the histograms for each of the arrays in the same
    order.  The dtype of the array *n* (or of its element arrays) will
    always be float even if no weighting or normalization is used.

bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin).  Always a single array even when multiple data
    sets are passed in.

patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.

Other Parameters
----------------
**kwargs : `~matplotlib.patches.Patch` properties

See also
--------
hist2d : 2D histograms

Notes
-----


.. note::
    In addition to the above described arguments, this function can take a
    **data** keyword argument. If such a **data** argument is given, the
    following arguments are replaced by **data[<arg>]**:

    * All arguments with the following names: 'weights', 'x'.

    Objects passed as **data** must support item access (``data[<arg>]``) and
    membership test (``<arg> in data``).
