如何用Django處理gzip數據流

Posted on 2021-01-30 by WalkonNet

最近在工作中遇到一個需求，就是要開一個接口來接收供應商推送的數據。項目采用的python的django框架，我是想也沒想，就直接一梭哈，寫出瞭如下代碼：

class XXDataPushView(APIView):
  """
  接收xx數據推送
  """
		# ...
  @white_list_required
  def post(self, request, **kwargs):
    req_data = request.data or {}
				# ...

但隨後，發現每日數據並沒有任何變化，質問供應商是否沒有做推送，在忽悠我們。然後對方給的答復是，他們推送的是gzip壓縮的數據流，接收端需要主動進行解壓。此前從沒有處理過這種壓縮的數據，對方具體如何做的推送對我來說也是一個黑盒。

因此，我要求對方給一個推送的簡單示例，沒想到對方不講武德，仍過來一段沒法單獨運行的java代碼：

private byte[] compress(JSONObject body) {
  try {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    gzip.write(body.toString().getBytes());
    gzip.close();
    return out.toByteArray();
  } catch (Exception e) {
    logger.error("Compress data failed with error: " + e.getMessage()).commit();
  }
  return JSON.toJSONString(body).getBytes();
}

public void post(JSONObject body, String url, FutureCallback<HttpResponse> callback) {
  RequestBuilder requestBuilder = RequestBuilder.post(url);
  requestBuilder.addHeader("Content-Type", "application/json; charset=UTF-8");
  requestBuilder.addHeader("Content-Encoding", "gzip");

  byte[] compressData = compress(body);

  int timeout = (int) Math.max(((float)compressData.length) / 5000000, 5000);

  RequestConfig.Builder requestConfigBuilder = RequestConfig.custom();
  requestConfigBuilder.setSocketTimeout(timeout).setConnectTimeout(timeout);

  requestBuilder.setEntity(new ByteArrayEntity(compressData));

  requestBuilder.setConfig(requestConfigBuilder.build());

  excuteRequest(requestBuilder, callback);
}

private void excuteRequest(RequestBuilder requestBuilder, FutureCallback<HttpResponse> callback) {
  HttpUriRequest request = requestBuilder.build();
  httpClient.execute(request, new FutureCallback<HttpResponse>() {
    @Override
    public void completed(HttpResponse httpResponse) {
      try {
        int responseCode = httpResponse.getStatusLine().getStatusCode();
        if (callback != null) {
          if (responseCode == 200) {
            callback.completed(httpResponse);
          } else {
            callback.failed(new Exception("Status code is not 200"));
          }
        }
      } catch (Exception e) {
        logger.error("Get error on " + requestBuilder.getMethod() + " " + requestBuilder.getUri() + ": " + e.getMessage()).commit();
        if (callback != null) {
          callback.failed(e);
        }
      }

      EntityUtils.consumeQuietly(httpResponse.getEntity());
    }

    @Override
    public void failed(Exception e) {
      logger.error("Get error on " + requestBuilder.getMethod() + " " + requestBuilder.getUri() + ": " + e.getMessage()).commit();
      if (callback != null) {
        callback.failed(e);
      }
    }

    @Override
    public void cancelled() {
      logger.error("Request cancelled on " + requestBuilder.getMethod() + " " + requestBuilder.getUri()).commit();
      if (callback != null) {
        callback.cancelled();
      }
    }
  });
}

從上述代碼可以看出，對方將json數據壓縮為瞭gzip數據流stream。於是搜索django的文檔，隻有這段關於gzip處理的裝飾器描述:

django.views.decorators.gzip 裡的裝飾器控制基於每個視圖的內容壓縮。

gzip_page()

如果瀏覽器允許 gzip 壓縮，那麼這個裝飾器將壓縮內容。它相應的設置瞭 Vary 頭部，這樣緩存將基於 Accept-Encoding 頭進行存儲。

但是，這個裝飾器隻是壓縮請求響應至瀏覽器的內容，我們目前的需求是解壓縮接收的數據。這不是我們想要的。

幸運的是，在flask中有一個擴展叫flask-inflate，安裝瞭此擴展會自動對請求來的數據做解壓操作。查看該擴展的具體代碼處理：

# flask_inflate.py
import gzip
from flask import request

GZIP_CONTENT_ENCODING = 'gzip'


class Inflate(object):
  def __init__(self, app=None):
    if app is not None:
      self.init_app(app)

  @staticmethod
  def init_app(app):
    app.before_request(_inflate_gzipped_content)


def inflate(func):
  """
  A decorator to inflate content of a single view function
  """
  def wrapper(*args, **kwargs):
    _inflate_gzipped_content()
    return func(*args, **kwargs)

  return wrapper


def _inflate_gzipped_content():
  content_encoding = getattr(request, 'content_encoding', None)

  if content_encoding != GZIP_CONTENT_ENCODING:
    return

  # We don't want to read the whole stream at this point.
  # Setting request.environ['wsgi.input'] to the gzipped stream is also not an option because
  # when the request is not chunked, flask's get_data will return a limited stream containing the gzip stream
  # and will limit the gzip stream to the compressed length. This is not good, as we want to read the
  # uncompressed stream, which is obviously longer.
  request.stream = gzip.GzipFile(fileobj=request.stream)

上述代碼的核心是:

 request.stream = gzip.GzipFile(fileobj=request.stream)

於是，在django中可以如下處理：

class XXDataPushView(APIView):
  """
  接收xx數據推送
  """
		# ...
  @white_list_required
  def post(self, request, **kwargs):
    content_encoding = request.META.get("HTTP_CONTENT_ENCODING", "")
    if content_encoding != "gzip":
      req_data = request.data or {}
    else:
      gzip_f = gzip.GzipFile(fileobj=request.stream)
      data = gzip_f.read().decode(encoding="utf-8")
      req_data = json.loads(data)
    # ... handle req_data

ok, 問題完美解決。還可以用如下方式測試請求：

import gzip
import requests
import json

data = {}

data = json.dumps(data).encode("utf-8")
data = gzip.compress(data)

resp = requests.post("http://localhost:8760/push_data/",data=data,headers={"Content-Encoding": "gzip", "Content-Type":"application/json;charset=utf-8"})

print(resp.json())

以上就是如何用Django處理gzip數據流的詳細內容，更多關於Django處理gzip數據流的資料請關註WalkonNet其它相關文章！

如何用Django處理gzip數據流

推薦閱讀：

發佈留言取消回覆

近期文章

推薦閱讀：

發佈留言 取消回覆

近期文章

標籤

發佈留言取消回覆