extend filesource to handle files with tetragon and kubescape extension #10
base: main
Conversation
StatusOr<BackedDataElements> DataElementsFromKubescape() {
  BackedDataElements data_elements(11);
  data_elements.emplace_back("time_", "", types::DataType::STRING);
Is there a reason you didn't use a types::DataType::TIME64NS column? You might run into compatibility issues (PxL, etc.) later since this differs from how all of Pixie's other tables work (every data table has a time_ TIME64NS column).
In addition, types::DataType::TIME64NS is significantly more memory efficient than types::DataType::STRING. The former is a fixed-size 64-bit value, while the latter is variable length (roughly one byte for every character in the timestamp string).
Separately, have you tested this end to end? I ask because, if I remember correctly, our in-memory database requires a proper time column.
I've seen issues in the past where a query never completes in that situation. When I experienced it, the data table in question didn't have any time_ column at all, so the problem may not manifest the same way here.
Ah, that was my mistake. I was trying to assign the ISO-format time string from the log and conflated time_ with time. I'll differentiate the naming.
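For reference, here is a minimal, self-contained sketch (not an existing Pixie helper, and simplified around fractional seconds and timezones) of how the ISO-8601 time string from the log could be turned into the 64-bit nanosecond value a TIME64NS column stores:

#include <cstdint>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parses a UTC timestamp such as "2024-05-01T12:34:56Z" into nanoseconds since
// the epoch. Fractional seconds are ignored in this sketch.
int64_t IsoStringToNanos(const std::string& iso_time) {
  std::tm tm = {};
  std::istringstream ss(iso_time);
  ss >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%S");
  const std::time_t secs = timegm(&tm);  // Interpret the broken-down time as UTC (glibc extension).
  return static_cast<int64_t>(secs) * 1'000'000'000LL;
}

// The parsed value could then go into the time_ column, assuming the schema
// declares it as types::DataType::TIME64NS:
//   r.Append(0, types::Time64NSValue(IsoStringToNanos(doc["time"].GetString())));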
// Fill each column — make sure your schema matches this order!
r.Append(0, types::StringValue(stringify(doc["time"], doc.GetAllocator())), kMaxStringBytes);
r.Append(1, types::StringValue(stringify(doc["level"], doc.GetAllocator())), kMaxStringBytes);
r.Append(2, types::StringValue(stringify(doc["RuleID"], doc.GetAllocator())), kMaxStringBytes);
r.Append(3, types::StringValue(stringify(doc["message"], doc.GetAllocator())), kMaxStringBytes);
r.Append(4, types::StringValue(stringify(doc["msg"], doc.GetAllocator())), kMaxStringBytes);
r.Append(5, types::StringValue(stringify(doc["event"], doc.GetAllocator())), kMaxStringBytes);
r.Append(6, types::StringValue(stringify(doc["BaseRuntimeMetadata"], doc.GetAllocator())), kMaxStringBytes);
r.Append(7, types::StringValue(stringify(doc["CloudMetadata"], doc.GetAllocator())), kMaxStringBytes);
r.Append(8, types::StringValue(stringify(doc["RuntimeK8sDetails"], doc.GetAllocator())), kMaxStringBytes);
r.Append(9, types::StringValue(stringify(doc["RuntimeProcessDetails"], doc.GetAllocator())), kMaxStringBytes);
sole::uuid u = sole::uuid4();
r.Append(10, types::UInt128Value(u.ab, u.cd));
This works, but it seems brittle if the schema ever changes (hard-coded column offsets, hard-coded column names, etc.).
Is there a material difference between this and the FileSourceConnector::TransferDataFromJSON function? It seems to me that if the data table schema is created properly, ::TransferDataFromJSON would behave the same (assuming the time_ column can be changed; see my comment above).
Actually, for kubescape the current FileSource can already process the data. I split it out under the .kubescape file extension with its own function so it can be extended in the future, e.g. adding a ClickHouse write path.
But if you'd rather we stick with the plain .json extension for kubescape and not implement it separately for now, I'm okay with that 😄 I just wanted to show explicitly what we are trying to do.
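For illustration, here is a sketch of driving the appends from a single key list instead of hard-coding each offset. The keys mirror the kubescape schema in this PR, and r, stringify, and kMaxStringBytes come from the diff; the loop itself is illustrative rather than Pixie's actual ::TransferDataFromJSON implementation. It would replace the block of Append calls inside the transfer function:

// Keys in schema/column order; adding or reordering a column only touches this list.
static constexpr const char* kKubescapeJsonKeys[] = {
    "time", "level", "RuleID", "message", "msg", "event",
    "BaseRuntimeMetadata", "CloudMetadata", "RuntimeK8sDetails",
    "RuntimeProcessDetails"};

for (size_t col = 0; col < std::size(kKubescapeJsonKeys); ++col) {  // std::size: <iterator>
  // The column index is derived from the key's position in the schema list.
  r.Append(col, types::StringValue(stringify(doc[kKubescapeJsonKeys[col]], doc.GetAllocator())),
           kMaxStringBytes);
}
// The uuid column (index 10) would still be appended separately, as in the diff.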
std::string timeStr = "empty";
if (doc.HasMember("time") && doc["time"].IsString()) {
  timeStr = doc["time"].GetString();
}
r.Append(0, types::StringValue(timeStr), kMaxStringBytes);
In what situations will time be missing?
I think in the regular workflow it won't be missing. I'll remove the default.
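For reference, a sketch of the stricter handling, assuming the surrounding transfer function returns a Status (the error::Internal call follows the usual Pixie error pattern, but the exact call and wording here are illustrative):

// Fail the record instead of silently writing a placeholder value.
if (!doc.HasMember("time") || !doc["time"].IsString()) {
  return error::Internal("tetragon record is missing the required 'time' field");
}
r.Append(0, types::StringValue(doc["time"].GetString()), kMaxStringBytes);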
// Extract "node_name"
std::string nodeNameStr = "empty";
if (doc.HasMember("node_name") && doc["node_name"].IsString()) {
  nodeNameStr = doc["node_name"].GetString();
}
r.Append(1, types::StringValue(nodeNameStr), kMaxStringBytes);
I think we should consider how to extend the DynamicRecordBuilder to support default values for columns. It would be nice if the schema declared whether a column has a default value, so this case could be handled without the TransferData function special-casing specific columns.
I don't think this should block you from moving forward, depending on priorities, but I think TransferDataFromJSON could be enhanced to handle this use case.
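Purely as a sketch of that idea (ColumnSpec and the default handling below are hypothetical, not an existing Pixie interface):

#include <optional>
#include <string>

// Hypothetical: each schema element carries an optional default, so a generic
// JSON transfer loop can fall back to it instead of special-casing columns.
struct ColumnSpec {
  std::string name;
  types::DataType type;
  std::optional<std::string> default_value;  // Used when the JSON key is absent.
};

// Inside a generic TransferDataFromJSON-style loop (illustrative):
//   if (!doc.HasMember(col.name.c_str())) {
//     if (!col.default_value.has_value()) {
//       return error::Internal("missing required column: " + col.name);
//     }
//     r.Append(i, types::StringValue(*col.default_value), kMaxStringBytes);
//     continue;
//   }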
if (key != "time" && key != "node_name") {
  typeStr = key;
  payloadVal = &it->value;
  break;
Depending on the order of rapidjson's object iteration feels dangerous. Can you explain in more detail how this type field is identified and why it isn't known up front?
As far as I know, tetragon events always have three root keys: time, node_name, and a third key that alternates within a fixed list: process_exec, process_exit, process_kprobe, process_uprobe, process_tracepoint, process_loader.
I agree that a fourth property would make this dangerous. Maybe I can check whether the key is in [process_exec, process_exit, process_kprobe, process_uprobe, process_tracepoint, process_loader] instead.
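That allow-list check would make the intent explicit. A sketch of it, reusing the variable names from the diff (the absl::flat_hash_set and its include are assumptions about what is convenient to use here):

#include "absl/container/flat_hash_set.h"

// Known tetragon event payload keys, per the list above.
static const absl::flat_hash_set<std::string> kTetragonEventKeys = {
    "process_exec",   "process_exit",       "process_kprobe",
    "process_uprobe", "process_tracepoint", "process_loader"};

for (auto it = doc.MemberBegin(); it != doc.MemberEnd(); ++it) {
  const std::string key = it->name.GetString();
  // Match against the fixed set instead of relying on rapidjson's member order.
  if (kTetragonEventKeys.contains(key)) {
    typeStr = key;
    payloadVal = &it->value;
    break;
  }
}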