使用cfgrib将数据保存为GRIB文件

This article was last updated on <span id="expire-date"></span> days ago, the information described in the article may be outdated.

最近想要将导师的WRF工具包里的一些功能使用Python重写，因为想要使用优雅的方式实现，所以不可避免的涉及到了GRIB文件的读写。花了两天时间摸清楚了如何将数据写入GRIB文件，期间还遇到一些比较奇怪的问题。

使用cfgrib将数据写入GRIB文件

GRIB文件的读取是比较简单的，使用cfgrib.open_dataset或者cfgrib.open_datasets就可以将GRIB文件中的grib message读取成xarray.Dataset的格式。

但是GRIB文件的写入比较麻烦，需要设置一大堆的参数。通过查看cfgrib读取的数据，可以看到attrs里面有众多以GRIB_开头的属性，这里以我下载的ERA5 surface数据为例。

import cfgrib as cf
from pprint import pprint

ds = cf.open_datasets("surface.grib")[-1]
pprint(ds.attrs)
pprint(ds["skt"].attrs)

{'Conventions': 'CF-1.7',
 'GRIB_centre': 'ecmf',
 'GRIB_centreDescription': 'European Centre for Medium-Range Weather Forecasts',
 'GRIB_edition': 1,
 'GRIB_subCentre': 0,
 'institution': 'European Centre for Medium-Range Weather Forecasts'}
{'GRIB_NV': 0,
 'GRIB_Nx': 321,
 'GRIB_Ny': 201,
 'GRIB_cfName': 'unknown',
 'GRIB_cfVarName': 'skt',
 'GRIB_dataType': 'an',
 'GRIB_gridDefinitionDescription': 'Latitude/Longitude Grid',
 'GRIB_gridType': 'regular_ll',
 'GRIB_iDirectionIncrementInDegrees': 0.25,
 'GRIB_iScansNegatively': 0,
 'GRIB_jDirectionIncrementInDegrees': 0.25,
 'GRIB_jPointsAreConsecutive': 0,
 'GRIB_jScansPositively': 0,
 'GRIB_latitudeOfFirstGridPointInDegrees': 60.0,
 'GRIB_latitudeOfLastGridPointInDegrees': 10.0,
 'GRIB_longitudeOfFirstGridPointInDegrees': 80.0,
 'GRIB_longitudeOfLastGridPointInDegrees': 160.0,
 'GRIB_missingValue': 3.4028234663852886e+38,
 'GRIB_name': 'Skin temperature',
 'GRIB_numberOfPoints': 64521,
 'GRIB_paramId': 235,
 'GRIB_shortName': 'skt',
 'GRIB_stepType': 'instant',
 'GRIB_stepUnits': 1,
 'GRIB_totalNumber': 0,
 'GRIB_typeOfLevel': 'surface',
 'GRIB_units': 'K',
 'GRIB_uvRelativeToGrid': 0,
 'long_name': 'Skin temperature',
 'standard_name': 'unknown',
 'units': 'K'}

这些GRIB_开头的属性即从GRIB文件中转换得到的参数，也是想把数据写成GRIB文件时所需要设置的参数。其中有一些参数是非常关键的：

GRIB_centre：GRIB文件只记录了编码后的信息，解码时需要对照码表文件将文件中的编码翻译为对应的信息，而不同的机构会有自己定义的码表。GRIB_centre这个参数即定义了编码和解码时使用哪个机构定义的码表。所有机构的定义可以在这里查阅。
GRIB_edition：GRIB文件有GRIB1和GRIB2两个版本，这两个版本间存在字段定义的区别，设置不同的版本会导致文件记录的信息不一致，这个问题稍后会提到。
GRIB_paramId：记录的变量对应的ID号，可以在这里查看ECMWF所有的ID定义。

也有一些参数是不用手动设置的：

GRIB_missingValue：无论你定义的值是多少，cfgrib软件包内部都会将这个数值重新设置

要想保证数据准确无误的写进GRIB文件，其他参数当然也是必不可少的，但是这里就不再展开说明了。

下面直接以一个示例来说明，如何将数据写入GRIB文件。

import cfgrib as cf
from cfgrib.xarray_to_grib import to_grib
import numpy as np
from xarray import DataArray, Dataset
from pandas import to_datetime


if __name__ == '__main__':
    # 生成假数据
    data = np.random.randint(280, 303, (2, 10, 10))

    # 定义一个DataArray变量，并顺便定义GRIB参数
    data_array = DataArray(
        data=data, dims=["time", "latitude", "longitude"],
        coords={
            "time": to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"]),
            "latitude": np.arange(10, 0, -1),
            "longitude": np.arange(0, 100, 10)
        }, attrs={
            "units": "K",
            "long_name": "Sea surface temperature",
            "standard_name": "Sea surface temperature",
            # The following keys and values will be used in GRIB.
            "GRIB_paramId": 34,		# sst在ECMWF的GRIB码表中的ID是34
            "GRIB_shortName": "sst",	# shortName应确保是正确对应的
            "GRIB_units": "K",	# units应确保是正确对应的
            "GRIB_name": "Sea surface temperature",		# name应确保是正确对应的
            "GRIB_stepUnits": 1,
            "GRIB_stepType": "instant",
            "GRIB_gridType": "regular_ll",	# 投影方式，regular_ll代表等经纬度投影
            "GRIB_iDirectionIncrementInDegrees": 10,	# 经度方向格点间隔，这个值的正负和下面的iScanNegatively相关。
            "GRIB_iScanNegatively": 0,
            "GRIB_jDirectionIncrementInDegrees": 1,		# 纬度方向格点间隔，这个值的正负和下面的jScanPositively相关。
            "GRIB_jScanPositively": 0,
            "GRIB_latitudeOfFirstGridPointInDegrees": 10,	# 区域的经纬度起始点数值
            "GRIB_latitudeOfLastGridPointInDegrees": 1,		# 区域的经纬度起始点数值
            "GRIB_longitudeOfFirstGridPointInDegrees": 0,	# 区域的经纬度起始点数值
            "GRIB_longitudeOfLastGridPointInDegrees": 90,	# 区域的经纬度起始点数值
            "GRIB_Ny": 10,		# 纬度方向的格点数
            "GRIB_Nx": 10,		# 经度方向的格点数
            "GRIB_typeOfLevel": "surface",		# 变量的level类型
            # The following keys and values can't be found at ECMWF websites.
            "GRIB_cfName": "unknown",
            "GRIB_cfVarName": "sst",
            "GRIB_dataType": "an",  # Analysis data, defined at https://codes.ecmwf.int/grib/format/mars/type/
            "GRIB_gridDefinitionDescription": "Latitude/Longitude Grid",
            "GRIB_numberOfPoints": 100,		# 每一时刻数据场的总格点数，也就是 Nx * Ny
            "GRIB_totalNumber": 0,
            "GRIB_uvRelativeToGird": 0
        }
    )

    dataset = Dataset({
        "sst": data_array
    }, attrs={
        "GRIB_centre": "ecmf",		# 定义使用哪个机构的码表
        "GRIB_edition": 1			# 定义GRIB版本号
    })

    to_grib(dataset, "data/sst_test.grib")

    ds = cf.open_dataset("data/sst_test.grib")
    sst = ds["sst"]

    print(np.where(data_array.to_numpy() != sst.to_numpy()))
	print("")
    pprint(ds.attrs)
	print("")
    pprint(sst.attrs)

(array([], dtype=int64), array([], dtype=int64), array([], dtype=int64))

{'Conventions': 'CF-1.7',
 'GRIB_centre': 'ecmf',
 'GRIB_centreDescription': 'European Centre for Medium-Range Weather Forecasts',
 'GRIB_edition': 1,
 'GRIB_subCentre': 0,
 'history': '2024-09-23T21:21 GRIB to CDM+CF via '
            'cfgrib-0.9.14.1/ecCodes-2.37.0 with {"source": '
            '"data/sst_test.grib", "filter_by_keys": {}, "encode_cf": '
            '["parameter", "time", "geography", "vertical"]}',
 'institution': 'European Centre for Medium-Range Weather Forecasts'}
 
{'GRIB_NV': 0,
 'GRIB_Nx': 10,
 'GRIB_Ny': 10,
 'GRIB_cfName': 'unknown',
 'GRIB_cfVarName': 'sst',
 'GRIB_dataType': 'an',
 'GRIB_gridDefinitionDescription': 'Latitude/Longitude Grid',
 'GRIB_gridType': 'regular_ll',
 'GRIB_iDirectionIncrementInDegrees': 10.0,
 'GRIB_iScansNegatively': 0,
 'GRIB_jDirectionIncrementInDegrees': 1.0,
 'GRIB_jPointsAreConsecutive': 0,
 'GRIB_jScansPositively': 0,
 'GRIB_latitudeOfFirstGridPointInDegrees': 10.0,
 'GRIB_latitudeOfLastGridPointInDegrees': 1.0,
 'GRIB_longitudeOfFirstGridPointInDegrees': 0.0,
 'GRIB_longitudeOfLastGridPointInDegrees': 90.0,
 'GRIB_missingValue': 3.4028234663852886e+38,
 'GRIB_name': 'Sea surface temperature',
 'GRIB_numberOfPoints': 100,
 'GRIB_paramId': 34,
 'GRIB_shortName': 'sst',
 'GRIB_stepType': 'instant',
 'GRIB_stepUnits': 1,
 'GRIB_totalNumber': 0,
 'GRIB_typeOfLevel': 'surface',
 'GRIB_units': 'K',
 'GRIB_uvRelativeToGrid': 0,
 'long_name': 'Sea surface temperature',
 'standard_name': 'unknown',
 'units': 'K'}

numpy的输出表示读出的数据都是相同的，同时可以看到解码得到的属性值与我们设置的是相同的，同时cfgrib还为我们添加了一些对应属性的设置，例如GRIB_centreDescription，history，GRIB_missingValue等。

坑人的问题

前面说道，GRIB_edition设置不同的版本会导致文件记录的信息不一致，起初我还以为这是cfgrib或者eccodes的bug，但是后来才发现是我对GRIB理解不足导致的。直接来看代码。

还是使用上面的例子定义的data_array变量，只不过这次我们对dataset.attrs的设置有所不同。

dataset = Dataset({
    "sst": data_array
}, attrs={
    "GRIB_centre": "my_centre",		# 换成一个自定义的centre名称
})
    
dataset = Dataset({
    "sst": data_array		# 干脆不设置attrs
})

这两个dataset写入GRIB以后，读取出来的信息是不一样的。

一开始我是想自定义GRIB_centre的名称的，因为我并没有想到GRIB_centre会影响到文件的解码，所以了解了GRIB_centre的用途之后，自然也就想到了这有可能是GRIB_centre设置不正确影响的，但其实不是的。my_centre并不是一个有效的机构名称，cfgrib会自动进行纠正。

ecmf也是我下载的数据中GRIB_centre的设置，这说明并不是GRIB_centre导致的这个不同。这其实是GRIB_edition设置导致的，缺省情况下，cfgrib会以GRIB2的格式来写数据，而GRIB2和GRIB1的差异导致了我们写入和读取出的信息的不同，手动设置版本为1后，这个问题就可以解决了。当然为了规范期间，GRIB_centre还是设置一个有效的名称更好。