Learning R: A Comprehensive Summary of Rcpp Basics

Posted on 2021-11-06 by WalkonNet

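1. Setup and Notes

Dirk Eddelbuettel's book Seamless R and C++ Integration with Rcpp was published in 2013. At that time, according to these notes, the Rcpp Attributes feature had not yet been accepted on CRAN, so calling and writing Rcpp functions was fairly cumbersome. Rcpp Attributes (2016) greatly simplifies the process ("provides an even more direct connection between C++ and R"): it keeps inline functions and adds the sourceCpp function for loading an external .cpp file. In other words, we can store a C++ function in a .cpp file and then, from an R script, make it available with sourceCpp in much the same way that source loads R code.

For example, to call a function defined in a file named test.cpp from an R script, we can do the following: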

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-std=c++11")
sourceCpp("test.cpp")

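The second line tells the compiler to build the file using the C++11 standard.

In test.cpp, include the header Rcpp.h and place each function that should be exported to R directly after a // [[Rcpp::export]] line. If an exported function calls other C++ helper functions, define those helpers before the // [[Rcpp::export]] line.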

#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]

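For linear algebra, the companion packages RcppArmadillo and RcppEigen are available. To use one of them, declare the dependency at the top of the source file, for example // [[Rcpp::depends(RcppArmadillo)]], and include the corresponding header: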

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>  // already pulls in Rcpp.h, so a separate #include <Rcpp.h> is not needed
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]

For an introduction to C++ basics, see the references at the end of this post.

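2. Common Data Types

Keyword	Description
int / double / bool / String / auto	integer / floating point / Boolean / string / automatic type deduction (C++11)
IntegerVector	integer vector
NumericVector	numeric vector (elements are of type double)
ComplexVector	complex vector (elements are of type Rcomplex)
LogicalVector	logical vector. An R logical can take three values, TRUE, FALSE and NA, whereas a C++ bool has only two, true and false. Converting R's NA to a C++ bool yields true.
CharacterVector	character vector
ExpressionVector	vector of expression types
RawVector	vector of type raw
IntegerMatrix	integer matrix
NumericMatrix	numeric matrix (elements are of type double)
LogicalMatrix	logical matrix
CharacterMatrix	character matrix
List (aka GenericVector)	list; like an R list, its elements can be of any type
DataFrame	data frame; internally, Rcpp implements a data frame as a list
Function	R function
Environment	R environment; can be used to reference functions in the R environment or in other R packages, and to manipulate variables in the R environment
RObject	generic type for any object that R can recognize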
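
The NA caveat in the LogicalVector row can be illustrated without Rcpp. R stores a logical value as a C int and encodes NA as the smallest int (INT_MIN). The sketch below assumes that encoding; the names kNaLogicalSketch, naive_to_bool and is_na_logical are hypothetical and are not part of Rcpp.

```cpp
#include <climits>

// Assumption: mirrors R's internal encoding, where a logical is a C int
// and NA is stored as INT_MIN. These names are illustrative, not Rcpp API.
const int kNaLogicalSketch = INT_MIN;

// Naive conversion: any nonzero int becomes true, so NA turns into true.
bool naive_to_bool(int r_logical) {
    return r_logical != 0;
}

// Safer pattern: test for NA explicitly before converting.
bool is_na_logical(int r_logical) {
    return r_logical == kNaLogicalSketch;
}
```

In actual Rcpp code, the same idea would be to check an element for NA (for example with LogicalVector::is_na()) before treating it as a bool.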

Notes:

Some R objects can be converted to Rcpp objects with as<Some_RcppObject>(Some_RObject). For example,
fit a linear model in R (the result is a list) and pass it to a C++ function:

mod <- lm(Y ~ X)

NumericVector resid = as<NumericVector>(mod["residuals"]);
NumericVector fitted = as<NumericVector>(mod["fitted.values"]);

A NumericVector can be converted to a std::vector with as<some_STL_vector>(Some_RcppVector). For example:

std::vector<double> vec;
vec = as<std::vector<double>>(x);

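Inside a function, wrap() converts a std::vector to a NumericVector. For example (this snippet assumes RcppArmadillo together with using namespace std and using namespace arma):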

arma::vec long_vec(16,arma::fill::randn);
vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
NumericVector output = wrap(long_vec2);

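When returning from a function, wrap() converts C++ STL types into types that R recognizes. See the input and output examples in Section 10.

Most of the data types above can be used directly as a function's return type and are converted to R objects automatically; these notes flag Environment as a possible exception and leave Function unverified.

Arithmetic and comparison operators: +, -, *, /, ++, --, pow(x,p), <, <=, >, >=, ==, !=. Logical operators: &&, ||, !.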

3. Creating Common Data Types

// 1. Vector
NumericVector V1(n); // a numeric vector of length n, filled with zeros
NumericVector V2 = NumericVector::create(1, 2, 3); // a numeric vector initialized with the three values 1, 2, 3
LogicalVector V3 = LogicalVector::create(true, false, NA_LOGICAL); // a logical vector; as an R object it holds TRUE, FALSE, NA
// 2. Matrix
NumericMatrix M1(nrow, ncol); // an nrow x ncol numeric matrix, filled with zeros
// 3. Multidimensional array
NumericVector out = NumericVector(Dimension(2, 2, 3)); // a 2 x 2 x 3 array, stored as a vector with a dim attribute
//4. List
NumericMatrix y1(2,2);
NumericVector y2(5);
List L=List::create(Named("y1")=y1,
                    Named("y2")=y2);

//5. DataFrame
NumericVector a=NumericVector::create(1,2,3);
CharacterVector b=CharacterVector::create("a","b","c");
std::vector<std::string> c(3);
c[0]="A";c[1]="B";c[2]="C";
DataFrame DF=DataFrame::create(Named("col1")=a,
                               Named("col2")=b,
                               Named("col3")=c);

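4. Accessing Elements

Access	Description
[n]	For a vector or list, accesses the n-th element. For a matrix, the columns are conceptually stacked one below another into a single long column vector (column-major order) and the n-th element of that vector is accessed. Unlike R, n starts at 0.
(i,j)	For a matrix, accesses element (i,j). Unlike R, i and j start at 0. Unlike vector access, parentheses are used here.
List["name1"] / DataFrame["name2"]	Accesses the element of the List named name1 / the column of the DataFrame named name2.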
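
The relationship between [n] and (i,j) on a matrix follows from column-major storage. The sketch below shows that arithmetic on a plain std::vector. It is not Rcpp code, and the helper names column_major_index and element_at are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rcpp): linear index of element (i, j)
// in an nrow-row matrix stored column by column, all indices 0-based.
std::size_t column_major_index(std::size_t i, std::size_t j, std::size_t nrow) {
    return i + j * nrow;
}

// Reads element (i, j) from a flat, column-major buffer.
double element_at(const std::vector<double>& data,
                  std::size_t i, std::size_t j, std::size_t nrow) {
    return data[column_major_index(i, j, nrow)];
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the columns are (1,2), (3,4) and (5,6), so element (0,1) sits at linear index 2.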

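5. Member Functions

Member function	Description
X.size()	Returns the length of X; applies to vectors and matrices (for a matrix, the total number of elements)
X.push_back(a)	Appends a to the end of X; applies to vectors
X.push_front(b)	Inserts b at the front of X; applies to vectors
X.ncol()	Returns the number of columns of X
X.nrow()	Returns the number of rows of X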

6. Syntactic Sugar

6.1 Arithmetic and Logical Operators

+, -, *, /, pow(x,p), <, <=, >, >=, ==, !=, !

All of the operators above are vectorized.

6.2. Common Functions

is_na()
Produces a logical sugar expression of the same length. Each element of the result expression evaluates to TRUE if the corresponding input is a missing value, or FALSE otherwise.

seq_len()
seq_len( 10 ) generates an integer vector from 1 to 10 (note: not from 0 to 9), which is very useful in conjunction with sapply() and lapply().

pmin(a,b) and pmax(a,b)
a and b are two vectors. pmin() (or pmax()) compares the i-th elements of a and b and returns the smaller (larger) one.

ifelse()
ifelse( x > y, x+y, x-y ) works element by element: where x > y is true the result is x+y, otherwise it is x-y.

sapply()
sapply applies a C++ function to each element of the given expression to create a new expression. The type of the resulting expression is deduced by the compiler from the result type of the function.

The function can be a free C++ function such as the overload generated by the template function below:

template <typename T>
T square( const T& x){
    return x * x ;
}
sapply( seq_len(10), square<int> ) ;

Alternatively, the function can be a functor whose type has a nested type called result_type:

template <typename T>
struct square {
    // nested result_type (std::unary_function, which used to supply it, was removed in C++17)
    typedef T result_type;
    T operator()(const T& x) const {
        return x * x ;
    }
};
sapply( seq_len(10), square<int>() ) ;

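lapply()
lapply is similar to sapply except that the result is always a list expression (an expression of type VECSXP).

sign()
Returns the sign of each element of a numeric or integer expression: 1, 0 or -1 (NA for missing values).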

Other Functions

Math functions: abs(), acos(), asin(), atan(), beta(), ceil(), ceiling(), choose(), cos(), cosh(), digamma(), exp(), expm1(), factorial(), floor(), gamma(), lbeta(), lchoose(), lfactorial(), lgamma(), log(), log10(), log1p(), pentagamma(), psigamma(), round(), signif(), sin(), sinh(), sqrt(), tan(), tanh(), tetragamma(), trigamma(), trunc().
Summary functions: mean(), min(), max(), sum(), sd(), and (for vectors) var()
Vector-valued summary functions: cumsum(), diff(), pmin(), and pmax()
Search functions: match(), self_match(), which_max(), which_min()
Duplicate handling: duplicated(), unique()

7. STL

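Rcpp code can use the data structures and algorithms of the C++ Standard Template Library (STL), as well as those provided by Boost.

7.1. Iterators

A single example is given here; for details, see C++ Primer.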

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum3(NumericVector x) {
  double total = 0;
  NumericVector::iterator it;
  for(it = x.begin(); it != x.end(); ++it) {
    total += *it;
  }
  return total;
}
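
The same iterator pattern works on any STL container. The following is a minimal sketch on a std::vector<double> that compiles without Rcpp; the function names sum_with_iterators and sum_with_accumulate are hypothetical. It also shows std::accumulate as the algorithm-based alternative to the hand-written loop.

```cpp
#include <numeric>
#include <vector>

// Explicit iterator loop, mirroring sum3 but on a std::vector.
double sum_with_iterators(const std::vector<double>& x) {
    double total = 0.0;
    for (std::vector<double>::const_iterator it = x.begin(); it != x.end(); ++it) {
        total += *it;  // dereference the iterator to read the element
    }
    return total;
}

// Same result via the standard algorithm std::accumulate.
// The initial value 0.0 (a double) makes the accumulation happen in double.
double sum_with_accumulate(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}
```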

7.2. Algorithms

The <algorithm> header provides many algorithms that work together with iterators.

For example, we could write a basic Rcpp version of findInterval() that takes two arguments, a vector of values and a vector of breaks, and locates the bin that each x falls into.

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector findInterval2(NumericVector x, NumericVector breaks) {
  IntegerVector out(x.size());
  NumericVector::iterator it, pos;
  IntegerVector::iterator out_it;
  for(it = x.begin(), out_it = out.begin(); it != x.end(); 
      ++it, ++out_it) {
    pos = std::upper_bound(breaks.begin(), breaks.end(), *it);
    *out_it = std::distance(breaks.begin(), pos);
  }
  return out;
}
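
The core of findInterval2 is the call to std::upper_bound. To experiment with it outside R, the logic can be sketched on plain std::vector inputs. The name find_interval_sketch is hypothetical, and the sketch assumes that breaks is sorted in ascending order.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical plain-STL counterpart of findInterval2 (no Rcpp needed).
// Assumes breaks is sorted in ascending order.
std::vector<int> find_interval_sketch(const std::vector<double>& x,
                                      const std::vector<double>& breaks) {
    std::vector<int> out;
    out.reserve(x.size());
    for (double value : x) {
        // First break strictly greater than value; its offset from begin()
        // equals the number of breaks that are <= value.
        auto pos = std::upper_bound(breaks.begin(), breaks.end(), value);
        out.push_back(static_cast<int>(std::distance(breaks.begin(), pos)));
    }
    return out;
}
```

With breaks {0, 1, 2}, a value below 0 maps to bin 0 and a value of 2 or more maps to bin 3, because std::upper_bound returns the end iterator when no break is larger.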

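7.3. Data Structures

The data structures provided by the STL can also be used. Rcpp knows how to convert them to R data structures, so they can be returned from a function directly without manual conversion.

7.3.1. Vectors

Creation
vector<int>, vector<bool>, vector<double>, vector<String>

Element access
Use the standard [] notation to access elements.

Adding elements
Use .push_back() to append an element.

Storage allocation
If the length of the vector is known in advance, use .reserve() to allocate sufficient storage up front.

Example: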

The following code implements run length encoding (rle()). It produces two vectors of output: a vector of values, and a vector lengths giving how many times each element is repeated. It works by looping through the input vector x comparing each value to the previous: if it’s the same, then it increments the last value in lengths; if it’s different, it adds the value to the end of values, and sets the corresponding length to 1.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List rleC(NumericVector x) {
  std::vector<int> lengths;
  std::vector<double> values;

  // Initialise first value
  int i = 0;
  double prev = x[0];
  values.push_back(prev);
  lengths.push_back(1);

  NumericVector::iterator it;
  for(it = x.begin() + 1; it != x.end(); ++it) {
    if (prev == *it) {
      lengths[i]++;
    } else {
      values.push_back(*it);
      lengths.push_back(1);

      i++;
      prev = *it;
    }
  }
  return List::create(
    _["lengths"] = lengths, 
    _["values"] = values
  );
}

7.3.2. Sets

std::set in the STL does not allow duplicate elements, whereas std::multiset does. Sets are useful for detecting duplicates and for finding the distinct elements (like unique, duplicated, or in).

Ordered sets: std::set and std::multiset.

Unordered set: std::unordered_set
Unordered sets are generally faster because they use a hash table rather than a tree.
unordered_set<int>, unordered_set<bool>, etc.
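
As an illustration of duplicate detection with a hash-based set, here is a minimal plain-STL sketch in the spirit of R's duplicated(). The name duplicated_sketch is hypothetical, and the function operates on a std::vector<int> rather than an Rcpp vector.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical plain-STL analogue of R's duplicated():
// out[k] is true when x[k] has already appeared earlier in x.
std::vector<bool> duplicated_sketch(const std::vector<int>& x) {
    std::unordered_set<int> seen;
    std::vector<bool> out;
    out.reserve(x.size());
    for (int value : x) {
        // insert() returns a pair; .second is false if value was already present.
        bool inserted = seen.insert(value).second;
        out.push_back(!inserted);
    }
    return out;
}
```

Each insert is a hash-table operation with constant expected cost, so the whole pass is linear in the length of x on average.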

7.3.3. Maps

Maps are closely related to table() and match().

Ordered map: std::map

Unordered map: std::unordered_map

Since maps have a value and a key, you need to specify both types when initialising a map:

map<double, int>, unordered_map<int, double>.
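
To make the connection to table() concrete, the sketch below counts how often each value occurs using an ordered std::map. The name table_sketch is hypothetical and the input is a plain std::vector<double>.

```cpp
#include <map>
#include <vector>

// Hypothetical plain-STL analogue of R's table(): counts occurrences.
std::map<double, int> table_sketch(const std::vector<double>& x) {
    std::map<double, int> counts;  // key type double, value type int
    for (double value : x) {
        // operator[] value-initializes a missing count to 0 before ++.
        ++counts[value];
    }
    return counts;
}
```

Because std::map keeps its keys sorted, iterating over the result visits the distinct values in ascending order; std::unordered_map would drop that ordering in exchange for hash-based lookup.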

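8. Interacting with the R Environment

Through Environment, Rcpp can access the variables and loaded functions in the current R global environment and can modify variables there. Environment can also be used to obtain functions from other R packages and call them from Rcpp.

Obtaining a function from another R package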

Rcpp::Environment stats("package:stats");
Rcpp::Function rnorm = stats["rnorm"];
return rnorm(10, Rcpp::Named("sd", 100.0));

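Reading and modifying a variable in the R global environment
Suppose the R global environment contains a vector x=c(1,2,3) and we want to change its value from Rcpp.

Rcpp::Environment global = Rcpp::Environment::global_env(); // get the global environment
Rcpp::NumericVector tmp = global["x"]; // fetch x
tmp=pow(tmp,2); // square each element
global["x"]=tmp; // assign the new value back to x in the global environment

Calling a function defined in the R global environment
Suppose the global environment contains the R function funR and the R variable x, defined as:

x=c(1,2,3);
funR<-function(x){
  return (-x);
}

We want to call this function from Rcpp and apply it to the vector x.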

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector funC() {
  Rcpp::Environment global =
    Rcpp::Environment::global_env();
  Rcpp::Function funRinC = global["funR"];
  Rcpp::NumericVector tmp = global["x"];
  return funRinC(tmp);
}

9. Creating an R Package with Rcpp

See the separate post "Creating an R Package with Rcpp and RcppArmadillo".

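10. Input and Output Examples

Passing arrays

To pass a higher-dimensional array, store it as a vector and attach the dimension information. There are two ways to do this:

Setting the dimensions with .attr("dim")

A NumericVector can carry dimension information, so an array can be returned to R as a NumericVector whose dimensions are set through .attr("dim").

// Dimension accepts at most three extents
output.attr("dim") = Dimension(3,4,2);
// Assigning a vector to .attr("dim") allows more than three dimensions
NumericVector dim = NumericVector::create(2,2,2,2);
output.attr("dim") = dim;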

Example:

// Returns a 3*3*2 array
RObject func(){
  arma::vec long_vec(18,arma::fill::randn);
  vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  output.attr("dim")=Dimension(3,3,2);
  return wrap(output);
}

// Returns a 2*2*2*2 array
// Note the use of conv_to<>::from()
RObject func(){
  arma::vec long_vec(16,arma::fill::randn);
  vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  NumericVector dim = NumericVector::create(2,2,2,2);
  output.attr("dim")=dim;
  return wrap(output);
}
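
When an array is flattened into a vector with a dim attribute, R varies the first index fastest. The sketch below shows how a 0-based multi-index maps to a position in the flat vector for any number of dimensions. The helper name array_offset is hypothetical and is not an Rcpp function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Rcpp API): position of a 0-based multi-index
// inside the flat vector behind an R array with the given dim attribute.
// R varies the first index fastest (column-major order).
std::size_t array_offset(const std::vector<std::size_t>& index,
                         const std::vector<std::size_t>& dim) {
    std::size_t offset = 0;
    std::size_t stride = 1;  // number of elements spanned by one step in dimension d
    for (std::size_t d = 0; d < dim.size(); ++d) {
        offset += index[d] * stride;
        stride *= dim[d];
    }
    return offset;
}
```

For the 3*3*2 array above, the last element has multi-index (2,2,1) and offset 2 + 2*3 + 1*9 = 17, which is the final position of the 18-element vector.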

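Alternatively, return the dimensions in a separate vector and set them in R afterwards, for example with attr(x, "dim") <- dims.

Returning a one-dimensional STL vector

The result is converted automatically to an R vector. The three variants below are alternative definitions of the same function: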

vector<double> func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return vec;
}
NumericVector func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return wrap(vec);
}
RObject func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return wrap(vec);
}

Returning a two-dimensional STL vector

The result is converted automatically to an R list in which each element is a vector.

vector<vector<double>> func(NumericVector x) {
  vector<vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<vector<double>>(x));
  }
  return mat;
}
RObject func(NumericVector x) {
  vector<vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<vector<double> >(x));
  }
  return wrap(mat);
}

Returning an Armadillo matrix, cube, or field

A matrix is converted automatically to an R matrix:

NumericMatrix func(){
  arma::mat A(3,4,arma::fill::randu);
  return wrap(A);
}
arma::mat func(){
  arma::mat A(3,4,arma::fill::randu);
  return A;
}

A cube is converted automatically to a three-dimensional R array:

arma::cube func(){
  arma::cube A(3,4,5,arma::fill::randu);
  return A;
}
RObject func(){
  arma::cube A(3,4,5,arma::fill::randu);
  return wrap(A);
}

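A field is converted automatically to an R list. Each element is stored as an R vector that carries dimension information (which can be checked with .Internal(inspect())).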

RObject func() {
  arma::cube A(3,4,2,arma::fill::randu);
  arma::cube B(3,4,2,arma::fill::randu);
  arma::field <arma::cube> F(2,1);
  F(0)=A;
  F(1)=B;
  return wrap(F);
}

References:

Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. Springer.

Allaire, J.J. (2016). Rcpp Attributes.

Eddelbuettel, D. (2016). Rcpp syntactic sugar.

http://adv-r.had.co.nz/Rcpp.html

http://www.rcpp.org/

http://blog.csdn.net/a358463121

http://www.runoob.com/cplusplus/cpp-operators.html

If you quote this post, please credit the source.
