Using FluentD with Columnify, running on Kubernetes, to push logs to S3 in Parquet.
Issues arise when trying to use an Avro schema with a nested map or list of strings.
According to the kubernetes metadata filter plugin's docs (https://github.com/ViaQ/fluent-plugin-kubernetes_metadata_input/blob/master/README.md#kubernetes-labels-and-annotations), I believe columnify ends up receiving the labels as nested key/value string pairs.
I got this schema to work so far, but I'm still running into issues upstream with Athena:
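To illustrate what the filter does to label keys before columnify sees them: per the docs linked above, dots in label keys are replaced (visible in the example record later in this issue, e.g. "statefulset_kubernetes_io/pod-name"). A minimal sketch of that transformation; the function name de_dot is mine, not the plugin's actual Ruby code:

```python
def de_dot(key: str, separator: str = "_") -> str:
    """Replace dots in a Kubernetes label key with the separator,
    mirroring the de-dotting behaviour described in the
    kubernetes metadata filter plugin's README."""
    return key.replace(".", separator)

# Label keys as Kubernetes reports them vs. what fluentd emits:
labels = {
    "app.kubernetes.io/managed-by": "Helm",
    "statefulset.kubernetes.io/pod-name": "argocd-redis-ha-server-0",
}
de_dotted = {de_dot(k): v for k, v in labels.items()}
# de_dotted == {"app_kubernetes_io/managed-by": "Helm",
#               "statefulset_kubernetes_io/pod-name": "argocd-redis-ha-server-0"}
```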
{
  "type": "record",
  "name": "record",
  "fields": [
    { "name": "message", "type": "string" },
    { "name": "logtag", "type": "string" },
    { "name": "stream", "type": "string" },
    { "name": "time", "type": ["null", "string"] },
    {
      "name": "docker",
      "type": {
        "type": "record",
        "name": "docker",
        "fields": [
          { "name": "container_id", "type": "string" }
        ]
      }
    },
    {
      "name": "kubernetes",
      "type": {
        "type": "record",
        "name": "kubernetes",
        "fields": [
          { "name": "container_name", "type": "string" },
          { "name": "host", "type": ["null", "string"] },
          { "name": "master_url", "type": ["null", "string"] },
          { "name": "namespace_name", "type": ["null", "string"] },
          { "name": "pod_id", "type": ["null", "string"] },
          { "name": "pod_name", "type": ["null", "string"] },
          {
            "name": "labels",
            "type": {
              "type": "array",
              "items": {
                "name": "label",
                "type": "record",
                "fields": [
                  { "type": ["null", "string"] }
                ]
              }
            }
          }
        ]
      }
    }
  ]
}
Specifically, the issue is with the labels part. Since labels arrive as arbitrary key/value string pairs, I think an Avro map of strings should work instead of the record with an array of records (note: Avro has no "list" type; the map type with "values" is the one that models a JSON object):
{
  "name": "labels",
  "type": {
    "type": "map",
    "values": "string"
  }
}
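To make the map proposal concrete: an Avro {"type": "map", "values": "string"} accepts exactly a JSON object whose values are all strings, which is what the labels look like in the example record below. A stdlib-only sketch of that constraint (no Avro library here; this just mirrors what a map-of-strings schema would accept):

```python
def matches_string_map(value) -> bool:
    """True if value is a dict whose keys and values are all strings,
    i.e. the shape an Avro {"type": "map", "values": "string"} accepts."""
    return (
        isinstance(value, dict)
        and all(isinstance(k, str) for k in value)
        and all(isinstance(v, str) for v in value.values())
    )

labels = {
    "app": "redis-ha",
    "argocd-redis-ha": "replica",
    "release": "argocd",
}
# matches_string_map(labels) → True
# matches_string_map(["redis-ha"]) → False (a JSON array has no keys,
# which is why an Avro array type is a poor fit for labels)
```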
Example data before fluentd filters:
{
  "stream": "stdout",
  "logtag": "F",
  "message": " Tue Nov 22 23:51:12 UTC 2022 Found redis master (172.20.203.160)",
  "time": 1669161072.283568,
  "docker": {
    "container_id": "29e32e64745530e7a1c5e9174f9e266e051707aec6a76d4556871532157a"
  },
  "kubernetes": {
    "container_name": "split-brain-fix",
    "namespace_name": "argocd",
    "pod_name": "argocd-redis-ha-server-0",
    "container_image": "docker.io/library/redis:6.2.6-alpine",
    "container_image_id": "docker.io/library/redis@sha256:132337b9d7744ffee4fae83fde53c3530935ad3ba528b7110f2d805f55cbf5",
    "pod_id": "ee5af2aa-14d8-446c-9755-",
    "pod_ip": "10.64.124.43",
    "host": "ip-10-64-116-85.us-west-2.compute.internal",
    "labels": {
      "app": "redis-ha",
      "argocd-redis-ha": "replica",
      "controller-revision-hash": "argocd-redis-ha-server-7cd67685d6",
      "release": "argocd",
      "statefulset_kubernetes_io/pod-name": "argocd-redis-ha-server-0"
    },
    "master_url": "https://172.20.0.1:443/api",
    "namespace_id": "f3d1453d-d227-4c54-982a-457d5b99cc8b",
    "namespace_labels": {
      "app_kubernetes_io/managed-by": "Helm",
      "kubernetes_io/metadata_name": "argocd"
    }
  },
  "tag": "kubernetes.var.log.containers.argocd-redis-ha-server-0_argocd_split-brain-fix-29e32e64745530e7a171e08251707aec6a76d4556871532157a.log"
}
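One thing worth ruling out before blaming the labels schema alone: the example record carries kubernetes keys that the Avro schema above doesn't declare. I'm not certain whether columnify silently drops undeclared keys or whether they contribute to the panic below, but the mismatch is easy to see with a quick stdlib comparison (field names copied from the schema and record in this issue):

```python
# Fields declared under "kubernetes" in the Avro schema above.
schema_fields = {
    "container_name", "host", "master_url",
    "namespace_name", "pod_id", "pod_name", "labels",
}

# Keys present under "kubernetes" in the example fluentd record.
record_keys = {
    "container_name", "namespace_name", "pod_name",
    "container_image", "container_image_id", "pod_id", "pod_ip",
    "host", "labels", "master_url", "namespace_id", "namespace_labels",
}

extra = sorted(record_keys - schema_fields)    # in the data, not in the schema
missing = sorted(schema_fields - record_keys)  # in the schema, not in the data
# extra == ['container_image', 'container_image_id',
#           'namespace_id', 'namespace_labels', 'pod_ip']
# missing == []
```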
But getting this error:
2022-11-23 00:29:43 +0000 [warn]: #0 [out_s3] got unrecoverable error in primary and no secondary error_class=Fluent::UnrecoverableError error="failed to execute columnify command. stdout= stderr=panic: runtime error: index out of range [0] with length 0\n\ngoroutine 1 [running]:\ngithub.com/xitongsys/parquet-go/layout.PagesToChunk(0x10ea6d8, 0x0, 0x0, 0x20)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/parquet-go@v1.5.2/layout/chunk.go:24 +0x90d\ngithub.com/xitongsys/parquet-go/writer.(*ParquetWriter).Flush(0xc00074fcc0, 0xc00010e001, 0x10, 0xa3abc0)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/parquet-go@v1.5.2/writer/writer.go:285 +0x3d5\ngithub.com/xitongsys/parquet-go/writer.(*ParquetWriter).WriteStop(0xc00074fcc0, 0x0, 0xc00010e050)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/parquet-go@v1.5.2/writer/writer.go:120 +0x37\ngithub.com/reproio/columnify/columnifier.(*parquetColumnifier).Close(0xc00000c6c0, 0xc00086fe18, 0x9d5cff)\n\t/home/runner/work/columnify/columnify/columnifier/parquet.go:122 +0x2e\nmain.columnify.func1(0xc2d760, 0xc00000c6c0, 0xc00086fec0)\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:24 +0x35\nmain.columnify(0xc2d760, 0xc00000c6c0, 0xc00013a0f0, 0x1, 0x1, 0x0, 0x0)\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:36 +0xe2\nmain.main()\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:71 +0x545\nstatus=#<Process::Status: pid 48 exit 2>"
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-s3-1.7.2/lib/fluent/plugin/s3_compressor_parquet.rb:60:in `compress'
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-s3-1.7.2/lib/fluent/plugin/out_s3.rb:352:in `write'
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:1180:in `try_flush'
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:1501:in `flush_thread_run'
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:501:in `block (2 levels) in start'
2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'