PostgreSQL 自定義自動類型轉換操作(CAST)

背景

PostgreSQL是一個強類型數據庫,因此你輸入的變量、常量是什麼類型,是強綁定的,例如

在調用操作符時,需要通過操作符邊上的數據類型,選擇對應的操作符。

在調用函數時,需要根據輸入的類型,選擇對應的函數。

如果類型不匹配,就會報操作符不存在,或者函數不存在的錯誤。

postgres=# select '1' + '1'; 
ERROR: operator is not unique: unknown + unknown 
LINE 1: select '1' + '1'; 
     ^ 
HINT: Could not choose a best candidate operator. You might need to add explicit type casts. 

那麼使用起來是不是很不方便呢?

PostgreSQL開放瞭類型轉換的接口,同時也內置瞭很多的自動類型轉換。來簡化操作。

查看目前已有的類型轉換:

postgres=# \dC+ 
            List of casts 
   Source type   |   Target type   |  Function  | Implicit? | Description 
-----------------------------+-----------------------------+--------------------+---------------+------------- 
 "char"      | character     | bpchar    | in assignment | 
 "char"      | character varying   | text    | in assignment | 
 "char"      | integer      | int4    | no   | 
 "char"      | text      | text    | yes   | 
 abstime      | date      | date    | in assignment | 
 abstime      | integer      | (binary coercible) | no   | 
 abstime      | time without time zone  | time    | in assignment | 
 
 ................................ 
 
 timestamp without time zone | timestamp with time zone | timestamptz  | yes   | 
 timestamp without time zone | timestamp without time zone | timestamp   | yes   | 
 xml       | character     | (binary coercible) | in assignment | 
 xml       | character varying   | (binary coercible) | in assignment | 
 xml       | text      | (binary coercible) | in assignment | 
(246 rows) 

如果你發現有些類型轉換沒有內置,怎麼辦呢?我們可以自定義轉換。

當然你也可以使用這種語法,對類型進行強制轉換:

CAST(x AS typename) 
 
 or 
 
x::typename 

如何自定義類型轉換(CAST)

自定義CAST的語法如下:

CREATE CAST (source_type AS target_type) 
 WITH FUNCTION function_name [ (argument_type [, ...]) ] 
 [ AS ASSIGNMENT | AS IMPLICIT ] 
 
CREATE CAST (source_type AS target_type) 
 WITHOUT FUNCTION 
 [ AS ASSIGNMENT | AS IMPLICIT ] 
 
CREATE CAST (source_type AS target_type) 
 WITH INOUT 
 [ AS ASSIGNMENT | AS IMPLICIT ] 

解釋:

1、WITH FUNCTION,表示轉換需要用到什麼函數。

2、WITHOUT FUNCTION,表示被轉換的兩個類型,在數據庫的存儲中一致,即物理存儲一致。例如text和varchar的物理存儲一致。不需要轉換函數。

Two types can be binary coercible, 
which means that the conversion can be performed “for free” without invoking any function. 
 
This requires that corresponding values use the same internal representation. 
 
For instance, the types text and varchar are binary coercible both ways. 
 
Binary coercibility is not necessarily a symmetric relationship. 
 
For example, the cast from xml to text can be performed for free in the present implementation, 
but the reverse direction requires a function that performs at least a syntax check. 
 
(Two types that are binary coercible both ways are also referred to as binary compatible.) 

3、WITH INOUT,表示使用內置的IO函數進行轉換。每一種類型,都有INPUT 和OUTPUT函數。使用這種方法,好處是不需要重新寫轉換函數。

除非有特殊需求,我們建議直接使用IO函數來進行轉換。

        List of functions 
 Schema |  Name  | Result data type | Argument data types | Type 
------------+-----------------+------------------+---------------------+-------- 
 pg_catalog | textin   | text    | cstring    | normal 
 pg_catalog | textout   | cstring   | text    | normal 
 pg_catalog | date_in   | date    | cstring    | normal 
 pg_catalog | date_out  | cstring   | date    | normal 
You can define a cast as an I/O conversion cast by using the WITH INOUT syntax. 
 
An I/O conversion cast is performed by invoking the output function of the source data type, 
and passing the resulting string to the input function of the target data type. 
 
In many common cases, this feature avoids the need to write a separate cast function for conversion. 
 
An I/O conversion cast acts the same as a regular function-based cast; only the implementation is different. 

4、AS ASSIGNMENT,表示在賦值時,自動對類型進行轉換。例如字段類型為TEXT,輸入的類型為INT,那麼可以創建一個 cast(int as text) as ASSIGNMENT。

If the cast is marked AS ASSIGNMENT then it can be invoked implicitly when assigning a value to a column of the target data type. 
 
For example, supposing that foo.f1 is a column of type text, then: 
 
INSERT INTO foo (f1) VALUES (42); 
 
will be allowed if the cast from type integer to type text is marked AS ASSIGNMENT, 
otherwise not. 
 
(We generally use the term assignment cast to describe this kind of cast.) 

5、AS IMPLICIT,表示在表達式中,或者在賦值操作中,都對類型進行自動轉換。(包含瞭AS ASSIGNMENT,它隻對賦值進行轉換)

If the cast is marked AS IMPLICIT then it can be invoked implicitly in any context, 
whether assignment or internally in an expression. 
 
(We generally use the term implicit cast to describe this kind of cast.) 
 
For example, consider this query: 
 
SELECT 2 + 4.0; 
 
The parser initially marks the constants as being of type integer and numeric respectively. 
 
There is no integer + numeric operator in the system catalogs, but there is a numeric + numeric operator. 
 
The query will therefore succeed if a cast from integer to numeric is available and is marked AS IMPLICIT — 
which in fact it is. 
 
The parser will apply the implicit cast and resolve the query as if it had been written 
 
SELECT CAST ( 2 AS numeric ) + 4.0; 

6、註意,AS IMPLICIT需要謹慎使用,為什麼呢?因為操作符會涉及到多個算子,如果有多個轉換,目前數據庫並不知道應該選擇哪個?

Now, the catalogs also provide a cast from numeric to integer. 
 
If that cast were marked AS IMPLICIT — (which it is not — ) 
 
then the parser would be faced with choosing between the above interpretation and 
the alternative of casting the numeric constant to integer and applying the integer + integer operator. 
 
Lacking any knowledge of which choice to prefer, it would give up and declare the query ambiguous. 
 
The fact that only one of the two casts is implicit is the way in which we teach the parser to prefer resolution of 
a mixed numeric-and-integer expression as numeric; 
 
there is no built-in knowledge about that. 

因此,建議謹慎使用AS IMPLICIT。建議使用AS IMPLICIT的CAST應該是非失真轉換轉換,例如從INT轉換為TEXT,或者int轉換為numeric。

而失真轉換,不建議使用as implicit,例如numeric轉換為int。

It is wise to be conservative about marking casts as implicit. 
 
An overabundance of implicit casting paths can cause PostgreSQL to choose surprising interpretations of commands, 
or to be unable to resolve commands at all because there are multiple possible interpretations. 
 
A good rule of thumb is to make a cast implicitly invokable only for information-preserving 
transformations between types in the same general type category. 
 
For example, the cast from int2 to int4 can reasonably be implicit, 
but the cast from float8 to int4 should probably be assignment-only. 
 
Cross-type-category casts, such as text to int4, are best made explicit-only. 

註意事項 + 例子

不能嵌套轉換。例子

1、將text轉換為date

錯誤方法

create or replace function text_to_date(text) returns date as $$ 
 select cast($1 as date); 
$$ language sql strict; 
 
create cast (text as date) with function text_to_date(text) as implicit; 

嵌套轉換後出現死循環

postgres=# select text '2017-01-01' + 1; 
ERROR: stack depth limit exceeded 
HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate. 
CONTEXT: SQL function "text_to_date" during startup 
SQL function "text_to_date" statement 1 
SQL function "text_to_date" statement 1 
SQL function "text_to_date" statement 1 
...... 

正確方法

create or replace function text_to_date(text) returns date as $$   
 select to_date($1,'yyyy-mm-dd'); 
$$ language sql strict; 
 
create cast (text as date) with function text_to_date(text) as implicit; 
postgres=# select text '2017-01-01' + 1; 
 ?column? 
------------ 
 2017-01-02 
(1 row) 

我們還可以直接使用IO函數來轉換:

postgres=# create cast (text as date) with inout as implicit;
CREATE CAST
 
postgres=# select text '2017-01-01' + 1;
 ?column? 
------------
 2017-01-02
(1 row)

補充:PostgreSQL 整型int與佈爾boolean的自動轉換設置(含自定義cast與cast規則介紹)

背景

在使用數據庫時,經常會遇到一些因為客戶端輸入的類型與數據庫定義的類型不匹配導致的錯誤問題。

例如數據庫定義的是佈爾類型,而輸入的是整型:

postgres=# create table cas_test(id int, c1 boolean); 
CREATE TABLE 
 
postgres=# \set VERBOSITY verbose 
postgres=# insert into cas_test values (1, int '1'); 
ERROR: 42804: column "c1" is of type boolean but expression is of type integer 
LINE 1: insert into cas_test values (1, int '1'); 
           ^ 
HINT: You will need to rewrite or cast the expression. 
LOCATION: transformAssignedExpr, parse_target.c:591 

又或者數據庫定義的是時間,用戶輸入的是字符串:

postgres=# create table tbl123(id int, crt_time timestamp); 
CREATE TABLE 
 
postgres=# insert into tbl123 values (1, text '2017-01-01 10:00:00'); 
ERROR: column "crt_time" is of type timestamp without time zone but expression is of type text 
LINE 1: insert into tbl123 values (1, text '2017-01-01 10:00:00'); 
           ^ 
HINT: You will need to rewrite or cast the expression. 

從錯誤提示來看,數據庫已經很清晰的告訴你為什麼瞭。那麼怎麼讓數據庫自動轉換呢?

PostgreSQL有一個語法,支持數據類型的轉換(賦值、參數、表達式 等位置的自動轉換)。

postgres=# \h create cast 
Command:  CREATE CAST 
Description: define a new cast 
Syntax: 
CREATE CAST (source_type AS target_type) 
 WITH FUNCTION function_name [ (argument_type [, ...]) ] 
 [ AS ASSIGNMENT | AS IMPLICIT ] 
 
CREATE CAST (source_type AS target_type) 
 WITHOUT FUNCTION 
 [ AS ASSIGNMENT | AS IMPLICIT ] 
 
CREATE CAST (source_type AS target_type) 
 WITH INOUT 
 [ AS ASSIGNMENT | AS IMPLICIT ] 

數據庫內置瞭很多轉換法則:

postgres=# \dC 
            List of casts 
   Source type   |   Target type   |   Function   | Implicit?  
-----------------------------+-----------------------------+---------------------------+--------------- 
 abstime      | date      | date      | in assignment 
 abstime      | integer      | (binary coercible)  | no 
 abstime      | timestamp without time zone | timestamp     | yes 
 ........ 
 integer      | boolean      | bool      | no 

類型的自動轉換實際上也是有一定的規則的,例如 賦值、參數 算是兩種規則。具體含義見如下文檔:

《PostgreSQL 自定義自動類型轉換(CAST)》

我們看到整型轉佈爾是有內置的轉換規則的,那麼為什麼沒有自動轉呢?

postgres=# \dC 
            List of casts 
   Source type   |   Target type   |   Function   | Implicit?  
-----------------------------+-----------------------------+---------------------------+--------------- 
 integer      | boolean      | bool      | no 

和自動轉換的規則有關,no表示不會自動轉換,隻有當我們強制指定轉換時,才會觸發轉換的動作:

postgres=# select cast ((int '1') as boolean); 
 bool 
------ 
 t 
(1 row) 

pg_cast裡面的context轉換為可讀的內容(e表示no, a表示assignment, 否則表示implicit)

如果讓數據庫賦值時自動將字符串轉換為時間,自動將整型轉換為佈爾

1、如果數據庫已經內置瞭轉換規則,那麼可以通過更新系統表的方式,修改自動轉換規則。

例如,將這個INT轉BOOLEAN的規則,修改為assignment的規則。

postgres=# update pg_cast set castcontext='a' where castsource ='integer'::regtype and casttarget='boolean'::regtype; 
UPDATE 1 

修改後,我們再查看這個轉換規則,就變成這樣瞭

\dC 
            List of casts 
   Source type   |   Target type   |   Function   | Implicit?  
-----------------------------+-----------------------------+---------------------------+--------------- 
 integer      | boolean      | bool      | in assignment 

現在你可以將int自動寫入為BOOLEAN瞭。

postgres=# create table cas_test(id int, c1 boolean); 
CREATE TABLE 
postgres=# insert into cas_test values (1, int '1'); 
INSERT 0 1 

2、如果系統中沒有兩種類型轉換的CAST規則,那麼我們需要自定義一個。

例如

postgres=# create cast (text as timestamp) with inout as ASSIGNMENT; 
CREATE CAST 
 
            List of casts 
   Source type   |   Target type   |   Function   | Implicit?  
-----------------------------+-----------------------------+---------------------------+--------------- 
 text      | timestamp without time zone | (binary coercible)  | in assignment 

這樣就可以自動將TEXT轉換為TIMESTAMP瞭。

postgres=# insert into tbl123 values (1, text '2017-01-01 10:00:00'); 
INSERT 0 1 
postgres=# select * from tbl123; 
 id |  crt_time   
----+--------------------- 
 1 | 2017-01-01 10:00:00 
(1 row) 

刪掉這個轉換,就會報錯。

postgres=# drop cast (text as timestamp); 
DROP CAST 
postgres=# insert into tbl123 values (1, text '2017-01-01 10:00:00'); 
ERROR: column "crt_time" is of type timestamp without time zone but expression is of type text 
LINE 1: insert into tbl123 values (1, text '2017-01-01 10:00:00'); 
           ^ 
HINT: You will need to rewrite or cast the expression. 

3、如果沒有內置的轉換函數,我們可能需要自定義轉換函數來支持這種轉換。

例子

自定義一個函數,用於輸入TEXT,返回TIMESTAMPTZ

postgres=# create or replace function cast_text_to_timestamp(text) returns timestamptz as $$ 
 select to_timestamp($1, 'yyyy-mm-dd hh24:mi:ss'); 
$$ language sql strict ; 
CREATE FUNCTION 

建立規則

postgres=# create cast (text as timestamptz) with function cast_text_to_timestamp as ASSIGNMENT; 
CREATE CAST 
 
postgres=# \dC 
            List of casts 
   Source type   |   Target type   |   Function   | Implicit?  
-----------------------------+-----------------------------+---------------------------+--------------- 
 text      | timestamp with time zone | cast_text_to_timestamp | in assignment 

現在,輸入TEXT,就可以自定轉換為timestamptz瞭。

postgres=# create table tbl1234(id int, crt_time timestamptz); 
CREATE TABLE 
postgres=# insert into tbl1234 values (1, text '2017-01-01 10:10:10'); 
INSERT 0 1 

當然,這些類型實際上內部都有內部的存儲格式,大多數時候,如果存儲格式通用,就可以直接使用INOUT來轉換,不需要寫轉換函數。

僅僅當兩種類型在數據庫的內部存儲格式不一樣的時候,需要顯示的寫函數來轉換。

以上為個人經驗,希望能給大傢一個參考,也希望大傢多多支持WalkonNet。如有錯誤或未考慮完全的地方,望不吝賜教。

推薦閱讀: