R: Drawing Attractive Grouped Scatter Plots with ggplot2

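We use the iris dataset as an example. For each flower it records the sepal length and width, the petal length and width, and the species.

In this article we plot how sepal length and width are distributed within each species, and how the two measurements relate to each other.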

1. First, load the ggplot2 package:

library(ggplot2)

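2. Next, start the plot with ggplot(data = NULL, mapping = aes(), ..., environment = parent.frame()). The first argument is the data and the second is the aesthetic mapping, which applies globally to every layer of the plot. The aesthetics it can hold include x, y, color, size, alpha and shape.

For example, ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) sets up the plot, and the scatter layer is then added with

+ geom_point(size = 2.0, shape = 16), so that each species is drawn in its own color.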
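To make the "global mapping" idea concrete, here is a minimal sketch that builds the same scatter plot twice: once with the mapping in ggplot(), and once with the mapping inside the layer. The object names p_global and p_local are illustrative only.

```r
library(ggplot2)

# Global mapping: every layer added later inherits x, y and color.
p_global <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2.0, shape = 16)

# Layer-local mapping: only this geom_point() layer sees the aesthetics.
p_local <- ggplot(iris) +
  geom_point(aes(Sepal.Length, Sepal.Width, color = Species),
             size = 2.0, shape = 16)

print(p_global)
```

Both objects draw the same points. The difference only shows up once more layers are added: with p_local, a later layer such as geom_smooth() would need its own aes() because nothing is inherited from ggplot().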

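3. That is the most basic scatter plot, with the species distinguished only by color. How can it be improved?

First, facet the plot so that the species are easier to compare, by adding + facet_wrap(~ Species).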

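4. Faceting makes the comparison much clearer. To see how sepal length relates to sepal width within each species, add + geom_smooth(method = "loess"), which fits a smoothed curve in each panel.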
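The per-species relationship can also be checked numerically. The sketch below uses only base R and the built-in iris data to compare the pooled Pearson correlation with the correlation inside each species. The names overall and by_species are illustrative.

```r
# Pearson correlation between sepal length and width, pooled and per species.
overall <- cor(iris$Sepal.Length, iris$Sepal.Width)

by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)

overall     # one number for all 150 flowers
by_species  # one number per species
```

On iris the pooled value comes out negative while each within-species value is positive, which is a good reason to fit the smoother per species rather than over the whole dataset.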

5. The plot is already much better, but it can be polished further with the ggthemes package: load it with library(ggthemes) and change the theme by adding + theme_solarized().

More themes are available, for example + theme_wsj().
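Since a theme is just another component added with +, one convenient pattern is to store the plot without a theme and try several themes on it. This is a sketch that assumes ggthemes is installed; base, p_solarized and p_wsj are illustrative names.

```r
library(ggplot2)
library(ggthemes)  # assumed installed: install.packages("ggthemes")

# Theme-free base plot, reused for each variant below.
base <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2.0, shape = 16) +
  facet_wrap(~ Species)

p_solarized <- base + theme_solarized()
p_wsj       <- base + theme_wsj()

print(p_solarized)
print(p_wsj)
```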

The plot now looks far more polished, so data visualization is not that hard after all. Here is the full source:

library(ggthemes)
library(ggplot2)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2.0, shape = 16) +
  facet_wrap(~ Species) +
  geom_smooth(method = "loess") +
  theme_wsj()

Supplement: ggplot2, R's go-to plotting package

ggplot2

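This is the best plotting package in R. The plots are fairly self-explanatory, so I have not added much text (I may come back and fill it in later). The first few plots are basic, but the later ones may show ggplot usages you have not seen before.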

Install Package

install.packages("ggplot2")
library(ggplot2)

Scatter Plot

For ease of demonstration, we use the gapminder data.

if (!require(gapminder)) install.packages("gapminder")
library(gapminder)
gapminder

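The data has six columns: country, continent, year, lifeExp, pop and gdpPercap.

Suppose we want to know how lifeExp relates to GDP per capita in 2007.

First, filter the data: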

library(dplyr)
gapminder_2007 <- gapminder %>%
 filter(year == 2007)

Draw a scatter plot of lifeExp against gdpPercap, with gdpPercap on the x axis and lifeExp on the y axis.

ggplot(gapminder_2007,aes(x = gdpPercap, y = lifeExp))+geom_point()

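The plot shows that lifeExp grows roughly with log(gdpPercap), so we put the x axis on a log scale. To show more information, color marks the continent each country belongs to and point size represents its population.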

ggplot(gapminder_2007,aes(x = gdpPercap, y = lifeExp, color = continent, size = pop))+
 geom_point()+scale_x_log10()+theme_minimal()+
 labs(x = "GDP per capita",
 y = "Life expectancy",
 title = "Life expectancy increases as GDP per capita increases",
 caption = "Data source: gapminder")
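One rough way to check that the log scale is a sensible choice is to compare the Pearson correlation of lifeExp with gdpPercap against its correlation with log10(gdpPercap). This sketch assumes the gapminder package is installed. The names r_raw and r_log are illustrative.

```r
library(gapminder)  # assumed installed

gapminder_2007 <- subset(gapminder, year == 2007)

# Linear association on the raw scale versus the log10 scale.
r_raw <- cor(gapminder_2007$gdpPercap, gapminder_2007$lifeExp)
r_log <- cor(log10(gapminder_2007$gdpPercap), gapminder_2007$lifeExp)

c(raw = r_raw, log10 = r_log)
```

A clearly higher value on the log scale is consistent with the roughly logarithmic relationship seen in the scatter plot, although correlation alone does not prove the functional form.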

Another way to present the same data adds a regression line and marginal histograms along the axes:

library(ggExtra)  # provides ggMarginal()
plot <- ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + geom_smooth(method = "lm") + scale_x_log10() +
  labs(x = "GDP per capita",
       y = "Life expectancy",
       title = "Life expectancy increases as GDP per capita increases",
       caption = "Data source: gapminder")
ggMarginal(plot, type = "histogram", fill = "transparent")
# ggMarginal(plot, type = "boxplot", fill = "transparent")

Bar Chart

library(forcats)  # provides fct_reorder()
gapminder_gdp2007 <- gapminder %>%
  filter(year == 2007, continent == "Americas") %>%
  mutate(country = fct_reorder(country, gdpPercap, last))
ggplot(gapminder_gdp2007, aes(x=country, y = gdpPercap))+
 geom_col(fill="skyblue", color="black")+
 labs(x = "Country",
 y = "GDP per capita",
 title = "GDP per capita in North America and South America, 2007",
 caption = "Data source: gapminder")+
 coord_flip()+theme_minimal()
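The bars above come out sorted because fct_reorder() reorders the levels of country by gdpPercap. Here is a minimal sketch of that behavior on hypothetical toy data (the values are made up and are not taken from gapminder):

```r
library(forcats)
library(dplyr)  # for last()

# Hypothetical toy data, not taken from gapminder.
country <- factor(c("A", "B", "C"))
gdp <- c(300, 100, 200)

levels(country)                         # default alphabetical order

# Reorder the levels by the (last) gdp value seen for each country.
reordered <- fct_reorder(country, gdp, last)
levels(reordered)                       # now ordered by ascending gdp
```

With one row per country, as in the 2007 data, last simply returns that row's value. After coord_flip() the first level is drawn at the bottom, so the country with the highest value ends up at the top.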

Line Plot

gapminder_pop <- gapminder %>%
 filter(country %in% c("United States","China"))
ggplot(gapminder_pop,aes(x = year, y = pop, color = country))+
 geom_line(lwd = 0.8)+theme_light()+
 labs(x = "Year",
 y = "Population",
 title = "Population in China and the United States, 1952-2007",
 caption = "Data source: gapminder")

Facet Plot

gapminder_gdp <- gapminder %>%
 group_by(year, continent) %>%
 summarize(avg_gdp = mean(gdpPercap))
ggplot(gapminder_gdp,aes(x = year, y = avg_gdp, color = continent))+
 geom_line(lwd = 0.8)+theme_light()+facet_wrap(~continent)+
 labs(x = "Year",
 y = "Average GDP per capita",
 title = "Average GDP per capita by continent over time",
 caption = "Data source: gapminder")+
 scale_x_continuous(breaks=c(1955,1970,1985,2000))

Path Plot

gapminder_lifeexp <- gapminder %>%
 filter(year %in% c(1957,2007), continent == "Europe") %>%
 arrange(year) %>%
 mutate(country = fct_reorder(country,lifeExp,last))
ggplot(gapminder_lifeexp) +geom_path(aes(x = lifeExp, y = country),
 arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
 geom_text(
 aes(x = lifeExp,
 y = country,
 label = round(lifeExp, 1),
 hjust = ifelse(year == 2007,-0.2,1.2)),
 size =3,
 family = "Bookman",
 color = "gray25")+
 scale_x_continuous(limits=c(45, 85))+
 labs(
 x = "Life expectancy",
 y = "Country",
 title = "People live longer in 2007 compared to 1957",
 subtitle = "Life expectancy in European countries",
 caption = "Data source: gapminder"
 )
 

Density Plot

gapminder_1992 <- gapminder %>%
 filter(year == 1992)
ggplot(gapminder_1992, aes(lifeExp))+theme_classic()+
 geom_density(aes(fill=factor(continent)), alpha=0.8) + 
 labs(
 x="Life expectancy",
 title="Life expectancy grouped by continent, 1992", 
 caption="Data source: gapminder",
 fill="Continent")

Slope Chart

gapminder_lifeexp2 <- gapminder %>%
 filter(year %in% c(1977,1987,1997,2007),
 country %in% c("Canada", "United States","Mexico","Haiti","El Salvador",
  "Guatemala","Jamaica")) %>%
 mutate(lifeExp = round(lifeExp))
ylabs <- subset(gapminder_lifeexp2, year==head(year,1))$country
yvals <- subset(gapminder_lifeexp2, year==head(year,1))$lifeExp
ggplot(gapminder_lifeexp2, aes(x=as.factor(year),y=lifeExp)) +
 geom_line(aes(group=country),colour="grey80") +
 geom_point(colour="white",size=8) +
 geom_text(aes(label=lifeExp), size=3, color = "black") +
 scale_y_continuous(name="", breaks=yvals, labels=ylabs)+
 theme_classic()+
 labs(title="Life expectancy in selected North American countries, 1977 to 2007") + 
 theme(axis.title=element_blank(),
 axis.ticks = element_blank(),
 plot.title = element_text(hjust=0.5))
