NetNomos: Logic Rule Mining for Network Data 📏

NetNomos is a scalable and expressive rule-mining framework for network data.

It mines logical formulas from datasets such as NetFlow records, PCAP traces, and pre-aggregated telemetry. Users define the input data, the predicate search space, and the rule-learning strategy through configuration files.

You provide:

a dataset schema, which tells NetNomos how to load, window, and annotate the data
a grammar, which defines the logic predicates NetNomos is allowed to construct
a rule learner, which combines predicates into candidate rules

The Python package is netnomos. The shorter CLI alias is netn.

Citation

@inproceedings{he2026netnomos,
  author    = {Hongyu H{\`e} and Minhao Jin and Maria Apostolaki},
  title     = {{Making Logic a First-Class Citizen in Network Data Generation with {ML}}},
  booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
  pages     = {801--824},
  year      = {2026},
}

1. Introduction

What NetNomos Does

NetNomos supports declarative rule mining over network datasets, including:

NetFlow records
PCAP traces
pre-aggregated telemetry

Small example datasets are provided under ./data/, including:

CIDDS NetFlow records
Netflix PCAP trace
MAWI PCAP trace
preprocessed datacenter logs from Meta [IMC '22]

The typical workflow is:

Describe the dataset using a schema JSON file.
Describe the allowed predicate space using a grammar JSON file.
Run netn learn ....
Inspect the generated predicates, learned rules, and semantic mappings in the run directory.

Key Concepts

Dataset Schema

The dataset schema defines how NetNomos interprets and preprocesses input data. It specifies:

where the data is loaded from
which fields are available and what types they have
semantic roles such as size, time, sequence, src, and dst
preprocessing steps such as filtering, mapping, casting, and hex parsing
context windows for packet-local or time-local reasoning
derived variables such as interarrival statistics

Grammar

The grammar defines the predicate space that NetNomos may explore. It specifies:

which fields may appear in predicates
which operators are allowed
how constants are selected
whether predicates are simple comparisons, scalar terms, addition terms, or quantified window predicates

Rule Learning

NetNomos currently provides two rule learners:

hitting-set: enumerates disjunctive rules from evidence sets, with both a native pybind11/C++ search core and a pure Python fallback backend
tree: learns implication-style rules using entropy-based decision trees

2. Installation & Setup

Requirements

Python >=3.10
uv for dependency management and command execution
a C++ toolchain if you want the native hitting-set backend built locally during uv sync

Setup

git clone <your-repo-url>
cd NetNomos
uv sync

Verify the CLI:

uv run netnomos --help
#* Or just use the alias:
uv run netn --help

uv sync builds the native hitting-set extension automatically. If the extension is unavailable, NetNomos falls back to the pure Python backend unless you explicitly request --hittingset-backend native.

Expected repository locations:

dataset specs: examples/datasets/
grammar specs: examples/grammars/
sample inputs: data/
learning outputs: runs/

3. CLI Usage

`netn --help`

Expand CLI help output

usage: netn [-h] [--log-level LOG_LEVEL] COMMAND ...

Inspect specs, prepare datasets, learn rules, validate rule sets, interpret saved artifacts, and run entailment queries.

positional arguments:
  COMMAND               Subcommand to run
    show-dataset        Print a dataset schema JSON file.
    show-grammar        Print a grammar JSON file.
    prepare             Load and materialize a dataset.
    learn               Generate predicates and learn rules.
    validate            Validate a learned or saved rule set against data.
    interpret           Render rules into human-readable formulas.
    entails             Check whether a query is entailed by a rule set.

options:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        Logging verbosity for diagnostic messages written to
                        stderr. (default: INFO)

Examples:
  netn learn --dataset-spec examples/datasets/cidds.json --grammar-spec examples/grammars/network_flow.json --input data/cidds_wk2_normal_10k.csv
  netn learn --dataset-spec examples/datasets/pcap_tcp.json --grammar-spec examples/grammars/pcap_window.json --input data/netflix.pcap
  netn entails --dataset-spec examples/datasets/cidds.json --grammar-spec examples/grammars/network_flow.json --rules runs/<run>/rules.json --query "Packets * 65535 >= Bytes"

Subcommands

Expand subcommand reference

Command	Purpose	Typical output
`show-dataset`	Print a dataset schema JSON file after model validation.	Schema JSON on stdout
`show-grammar`	Print a grammar JSON file after model validation.	Grammar JSON on stdout
`prepare`	Load data, apply preprocessing, build context windows, and derived variables.	Prepared schema summary JSON
`learn`	Generate predicates and learn rules.	Run summary JSON and saved artifacts
`validate`	Validate saved or freshly learned rules against data.	Satisfaction statistics JSON
`interpret`	Render rules into readable formulas.	One interpreted rule per line
`entails`	Ask whether a formula satisfies learned theory.	`{"entailed": true/false}`

Flag reference

Expand flag reference

Global flags

Flag	Commands	Default	Purpose	Example
`--log-level`	all	`INFO`	Controls stderr diagnostics from dataset loading, learning, warnings, and early stopping.	`uv run netn --log-level DEBUG prepare ...`

Dataset and grammar selection

Flag	Commands	Default	Purpose	Example
`--dataset-spec`	all except `show-grammar`	required	Path to a dataset schema JSON file.	`--dataset-spec examples/datasets/cidds.json`
`--grammar-spec`	`show-grammar`, `learn`, `validate`, `interpret`, `entails`	required	Path to a grammar JSON file.	`--grammar-spec examples/grammars/network_flow.json`
`--input`	`prepare`, `learn`, `validate`, `interpret`, `entails`	schema default	Overrides `source.path` from the dataset spec.	`--input data/netflix.pcap`
`--limit`	`prepare`, `learn`, `validate`, `interpret`, `entails`	`None`	Loads only the first N raw rows or packets before preprocessing. Useful for smoke tests.	`--limit 200`

Learning and artifact control

Flag	Commands	Default	Purpose	Example
`--learner`	`learn`, `validate`, `interpret`, `entails`	`hitting-set`	Selects the learning backend when the command needs to learn rules.	`--learner tree`
`--stall-timeout`	`learn`, `validate`, `interpret`, `entails`	`None`	Stops the hitting-set learner after this many seconds without a new rule. Partial results are returned and saved. Ignored by `tree`.	`--stall-timeout 60`
`--hittingset-backend`	`learn`, `validate`, `interpret`, `entails`	`auto`	Selects the hitting-set implementation: `native` uses the C++ core, `python` keeps the original Python search, and `auto` prefers `native` when available. Ignored by `tree`.	`--hittingset-backend native`
`--runs-dir`	`learn`, `validate`, `interpret`, `entails`	`runs`	Directory where learned artifacts and cache metadata are stored.	`--runs-dir tmp/runs`
`--rules`	`validate`, `interpret`, `entails`	`None`	Uses an existing `rules.json` artifact instead of learning a fresh rule set.	`--rules runs/<run>/rules.json`
`--query`	`entails`	required	Formula string to check against a theory.	`--query "Packets * 65535 >= Bytes"`

Example commands

Inspect specs

Expand inspect-spec commands

uv run netn show-dataset --dataset-spec examples/datasets/pcap_tcp.json
uv run netn show-grammar --grammar-spec examples/grammars/pcap_window.json

Prepare data

Expand prepare command

uv run netn prepare \
  --dataset-spec examples/datasets/pcap_tcp.json \
  --input data/netflix.pcap \
  --limit 10

Learn rules for the shipped datasets

Expand learning recipes

CIDDS flow records:

uv run netn learn \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv

Netflix PCAP:

uv run netn learn \
  --dataset-spec examples/datasets/pcap_tcp.json \
  --grammar-spec examples/grammars/pcap_window.json \
  --input data/netflix.pcap

MAWI PCAP:

uv run netn learn \
  --dataset-spec examples/datasets/pcap_tcp.json \
  --grammar-spec examples/grammars/pcap_window.json \
  --input data/mawi_2025july19_tcp100k.pcap

MetaDC aggregated statistics:

uv run netn learn \
  --dataset-spec examples/datasets/metadc.json \
  --grammar-spec examples/grammars/metadc_agg.json \
  --input data/metadc_test_10racks_5ctx.csv

Example of stall-aware early stopping:

uv run netn learn \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv \
  --limit 200 \
  --stall-timeout 60

If the search stalls, NetNomos:

logs a warning on stderr
returns the rules found so far
records search_stopped_early, stop_reason, and stall_timeout_seconds in fit_metadata

Force the original Python search backend:

uv run netn learn \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv \
  --hittingset-backend python

Validate, interpret, and query saved artifacts

Expand artifact workflows

uv run netn validate \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv \
  --rules runs/<run>/rules.json

uv run netn interpret \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv \
  --rules runs/<run>/rules.json

uv run netn entails \
  --dataset-spec examples/datasets/cidds.json \
  --grammar-spec examples/grammars/network_flow.json \
  --input data/cidds_wk2_normal_10k.csv \
  --rules runs/<run>/rules.json \
  --query "Packets * 65535 >= Bytes"

Where outputs go

Every learning run creates a directory under runs/:

runs/<timestamp>_<dataset-name>_<grammar-name>/

Important files in each run directory:

Expand run artifact listing

File	Meaning
`dataset_spec.json`	Resolved dataset spec used for the run
`grammar_spec.json`	Resolved grammar spec used for the run
`fields.json`	Prepared field metadata after windows and derived variables
`derived_variables.json`	Derived-variable definitions and provenance
`configured_exclude_fields.json`	Columns removed by the dataset spec's `exclude_fields` denylist
`excluded_fields.json`	Columns auto-removed because of `NaN` or empty values
`manifest.json`	Run summary including dataset, grammar, source type, rule counts, and `fit_metadata`
`predicates.jsonl`	Raw generated predicates with AST and provenance
`interpreted_predicates.clj`	Human-readable predicate forms using semantic labels such as `p50` and `top1`
`rules.json`	Raw learned rules with AST, provenance, and support
`interpreted_rules.clj`	Human-readable rule forms
`semantic_values.json`	Mapping from semantic labels to raw values for reproducibility

Evidence caches are stored separately under:

runs/.cache/evidence/

4. Dataset Schema Specification

Dataset schema files are DatasetSpec JSON documents. They define how NetNomos should interpret raw input data before predicate generation.

Top-level schema fields

Expand top-level dataset schema fields

Field	Meaning	Valid values	Effect
`name`	Logical dataset name.	any string	Used in run directory names and manifests
`description`	Human-readable summary.	any string	Documentation only
`source`	Input source description.	`SourceSpec` object	Chooses CSV or PCAP loading
`fields`	Explicit field metadata.	list of `FieldSpec`	Controls types, roles, constants, and enum labels
`include_fields`	Allowlist after preprocessing.	list of field names	Keeps only the listed columns
`exclude_fields`	Denylist after preprocessing.	list of field names	Removes columns after `include_fields`
`entity_keys`	Entity-level metadata.	list of field names	Currently metadata only
`grouping_keys`	Group-level metadata.	list of field names	Currently metadata only
`ordering_keys`	Preferred ordering metadata.	list of field names	Documentation and config consistency
`preprocessing`	Ordered source transformations.	list of `PreprocessStepSpec`	Rewrites, filters, maps, and casts raw input
`context_window`	Sliding-window specification.	`ContextWindowSpec` or `null`	Builds `_ctx0`, `_ctx1`, ... columns
`derived_variables`	Derived columns computed after loading or windowing.	list of `DerivedVariableSpec`	Adds new fields such as interarrival statistics

`source`

source is a SourceSpec object:

Expand source fields

Field	Meaning	Valid values	Effect
`type`	Physical input format.	`auto`, `csv`, `pcap`	Chooses the loader
`path`	Default input path.	string or `null`	Used when `--input` is omitted
`csv_read_options`	Extra options passed to `pandas.read_csv`.	JSON object	CSV-specific loading tweaks

source.type = "auto" is the recommended setting when the same logical schema should handle both raw PCAPs and legacy CSV exports. NetNomos infers the loader from the file suffix:

.csv -> CSV loader
.pcap, .pcapng, .cap -> PCAP loader

`fields`

Each entry in fields is a FieldSpec object.

Expand FieldSpec fields

Field	Meaning	Valid values	Effect
`name`	Canonical field name used everywhere else.	string	Referenced by grammars and artifacts
`source_name`	Original column name before renaming.	string or `null`	Lets a schema normalize raw source names
`value_type`	Storage and comparison type.	`integer`, `real`, `categorical`, `boolean`, `string`	Controls predicate generation, constant profiling, and solver lowering
`roles`	Semantic tags.	list of role names	Restricts selectors and numeric compatibility
`bounds`	Optional numeric range metadata.	`{lower, upper}` or `null`	Informational today, available for future checks
`domain`	Explicit categorical domain.	list or `null`	Used by `domain` constant selection and solver typing
`constants`	Field-specific reusable constants.	list of `FieldConstantSpec`	Drives `field_constants` selectors and arithmetic terms
`enum_labels`	Mapping from raw values to human-readable labels.	object	Used in interpreted predicates and rules
`context_family`	Base field family for windowed columns.	string or `null`	Usually auto-filled after windowing
`context_index`	Position inside the context window.	integer or `null`	Usually auto-filled after windowing

`value_type`

Use value_type to tell NetNomos how a field should behave:

Expand value_type reference

Value	Use for	Notes
`integer`	counters, sizes, IDs encoded as integers, timestamps encoded as integers	Can participate in numeric predicates
`real`	durations, rates, floating-point measurements	Can participate in numeric predicates
`categorical`	enums such as protocol classes, mapped subnets, port classes	Treated symbolically
`boolean`	true/false flags	Generates equality predicates
`string`	raw text or high-cardinality identifiers	Treated symbolically

`roles`

Roles connect dataset meaning to grammar selectors.

Expand supported roles and effects

Supported roles in the current schema model:

src
dst
proto
time
sequence
measurement
identifier
window
count
size
flag
derived

Roles matter because they affect:

selector matching in grammars
which numeric fields are considered comparable
whether arithmetic predicates are allowed between two fields

Example:

Bytes and MTU should both carry the size role if you want Bytes + Header <= MTU
Duration should carry time, which prevents meaningless predicates like Bytes <= Duration

`constants`

constants is a list of FieldConstantSpec objects:

Expand constant kinds and example

`kind`	Meaning	Typical use
`assignment`	Symbolic equality constants	mapped subnets, mapped ports, class IDs
`limit`	Numeric thresholds	zero payload, MTU-like bounds
`scalar`	Multipliers for `SCALAR` predicates	`Packets * 65535 <= Bytes`
`addition`	Additive offsets for `ADDITION` predicates	`tcp.seq + 1 = tcp.seq_next`

Example:

{
  "name": "Packets",
  "value_type": "integer",
  "roles": ["count"],
  "constants": [
    {
      "kind": "scalar",
      "values": [65535],
      "description": "Maximum payload size per packet"
    }
  ]
}

`enum_labels`

enum_labels maps raw values to readable names in interpreted outputs.

Expand enum_labels example

Example:

{
  "name": "SrcPortClass",
  "value_type": "integer",
  "constants": [
    {"kind": "assignment", "values": [80, 443, 70000, 71000, 72000]}
  ],
  "enum_labels": {
    "80": "http",
    "443": "https",
    "70000": "well_known",
    "71000": "registered",
    "72000": "dynamic"
  }
}

Variable selection

include_fields and exclude_fields operate after preprocessing.

Expand variable selection behavior

include_fields keeps only the listed columns
exclude_fields removes columns from the post-include set

After field selection, NetNomos automatically removes selected columns that still contain NaN or empty values. It:

logs a warning on stderr
records denylist-driven removals in configured_exclude_fields.json
records incomplete-column removals in excluded_fields.json
records the same information in manifest.json

Preprocessing

Each preprocessing step is a PreprocessStepSpec.

Expand preprocessing steps and fields

Supported kind values:

Kind	Meaning
`rename`	Rename columns
`drop`	Drop columns
`cast`	Cast columns to a dtype such as `int64`
`parse_hex`	Parse hexadecimal strings like `0x0012` into integers
`fillna`	Replace missing values
`map_values`	Value-to-value lookup mapping
`map_rules`	Rule-based mapping using equality, ranges, prefixes, regexes, and defaults
`filter_equals`	Keep rows where a column equals a value
`filter_in`	Keep rows where a column belongs to a set
`filter_present`	Keep rows where a column is present and non-empty
`sort`	Sort rows by one or more columns

Important PreprocessStepSpec fields:

Field	Meaning
`columns`	Source columns affected by the step
`target_column`	Output column name for mapping steps
`mapping`	Explicit lookup table for `map_values`
`rules`	Ordered rule list for `map_rules`
`value`	Filter or fill value
`dtype`	Target dtype for `cast`
`by`	Sort key override for `sort`

Mapping rules support:

equals
in
range
prefix
regex
default

Context windows

context_window is a ContextWindowSpec:

Expand context window fields and example

Field	Meaning	Effect
`size`	Number of rows or packets per window	creates `_ctx0 ... _ctxN` columns
`stride`	Step size between windows	controls overlap
`partition_by`	Group keys	prevents windows from crossing entity boundaries
`order_by`	Sort order inside each partition	ensures stable window order
`column_template`	Output naming template	defaults to `{name}_ctx{index}`

Example:

{
  "size": 3,
  "stride": 1,
  "partition_by": ["ip.src", "ip.dst", "tcp.srcport", "tcp.dstport"],
  "order_by": ["frame.time_epoch", "frame.number"],
  "column_template": "{name}_ctx{index}"
}

Derived variables

Each derived_variables entry is a DerivedVariableSpec.

Expand derived-variable fields and operations

Field	Meaning
`name`	Output column name
`operation`	Derived operation to apply
`inputs`	Source fields consumed by the operation
`value_type`	Output type
`roles`	Semantic tags for the derived field
`literal`	Reserved for literal-driven derivations
`numerator`	Explicit numerator for `ratio`
`denominator`	Explicit denominator for `ratio`
`description`	Free-text metadata

Supported operation values:

copy
sum
min
max
avg
std
diff
ratio
count_nonzero
exists
forall

Example from the shared PCAP schema:

{
  "name": "interarrival_std",
  "operation": "std",
  "inputs": ["interarrival_01", "interarrival_12"],
  "value_type": "real",
  "roles": ["time", "measurement", "derived"]
}

5. Grammar Specification

Grammar files are GrammarSpec JSON documents. They define the search space of predicates and quantifier projections.

Top-level grammar fields

Expand top-level grammar fields

Field	Meaning	Effect
`name`	Grammar name	Used in run directory names and manifests
`description`	Human-readable summary	Documentation only
`max_clause_size`	Maximum disjunct size for the hitting-set learner	Limits rule complexity
`max_rules`	Maximum number of rules to keep before pruning	Caps search size
`predicate_templates`	Allowed predicate-generation patterns	Builds propositional candidates
`quantifier_templates`	Allowed quantified window patterns	Builds projected quantifier predicates

The hitting-set grammar limits apply to both backends. The native backend only replaces the core enumeration step; evidence construction, rule assembly, interpretation, and artifact writing remain in Python.

Operators

Supported comparison operators: =, !=, >, >=, <, <=

These operators are accepted both in grammar files and in formula strings passed to entails.

Variable selectors

Selectors are used in lhs, rhs_field, term templates, and quantifier templates.

VariableSelectorSpec fields:

Expand variable selector fields

Field	Meaning	Interaction with dataset schema
`names`	Explicit field allowlist	Matches exact `FieldSpec.name` values
`regex`	Regex-based allowlist	Matches field names after preprocessing/windowing
`types`	Allowed value types	Matches `FieldSpec.value_type`
`roles`	Required semantic roles	Matches `FieldSpec.roles`
`derived_only`	Restrict to derived or non-derived fields	Matches the `derived` role
`context_family`	Restrict to a window family such as `tcp.seq`	Matches `FieldSpec.context_family`
`window_only`	Restrict to context-window columns	Matches fields with a `context_family`
`exclude`	Explicit denylist	Removes fields after the positive filters

Example:

{
  "roles": ["size"],
  "window_only": true
}

This selects windowed fields that are tagged as size, such as frame.len_ctx0 or tcp.len_ctx2.

Constant selectors

ConstantSelectorSpec controls where constants come from.

Expand constant selector modes

Field	Meaning	Valid values
`mode`	Constant source	`explicit`, `domain`, `profile`, `field_constants`
`values`	Explicit values for `explicit`	list
`kinds`	Constant kinds for `field_constants`	any subset of `assignment`, `limit`, `scalar`, `addition`
`top_k`	Number of categorical values for `profile`	integer
`quantiles`	Numeric quantiles for `profile`	list of floats in `[0, 1]`

Mode behavior:

Mode	Meaning
`explicit`	Use `values` exactly as written
`domain`	Use explicit field domains or observed categorical values
`profile`	Use numeric quantiles or categorical top-k values from the prepared data
`field_constants`	Reuse constants from the dataset schema

profile mode also drives semantic labels:

numeric constants are labeled p25, p50, p75, p90, ...
categorical profile constants are labeled top1, top2, ...

Those labels appear in:

interpreted_predicates.clj
interpreted_rules.clj
semantic_values.json

Term templates

lhs_term and rhs_term are TermTemplateSpec objects.

Expand term template kinds and fields

Supported kind values:

Kind	Meaning	Example
`field`	Plain field reference	`Bytes`
`constant`	Plain literal term	`1500`
`scalar`	Field multiplied by a constant	`Packets * 65535`
`addition`	Field plus another field or constant	`Bytes + Header`, `tcp.seq + 1`

TermTemplateSpec fields:

Field	Meaning
`kind`	Term shape
`field`	Primary field selector
`other_field`	Secondary field selector for `addition`
`constant`	Constant selector for `constant`, `scalar`, or `addition`
`allow_same_field`	Allows `X + X` or `X op X` when meaningful
`description`	Free-text metadata

Predicate templates

Each predicate_templates entry is a PredicateTemplateSpec.

Expand predicate template fields

Field	Meaning	Notes
`name`	Template name	Appears in predicate provenance
`lhs`	Left-hand field selector	Used in simple field-field or field-constant predicates
`operators`	Allowed comparators	Must be a non-empty list
`rhs_field`	Right-hand field selector	Use for variable-variable predicates
`rhs_constant`	Right-hand constant selector	Use for variable-constant predicates
`lhs_term`	Left-hand term template	Use for arithmetic predicates
`rhs_term`	Right-hand term template	Use for arithmetic predicates
`allow_same_field`	Allows comparisons like `X <= X` when desired	Defaults to `false`
`description`	Free-text metadata	Optional

Valid shapes:

lhs + rhs_field
lhs + rhs_constant
lhs_term + rhs_term
lhs_term + legacy rhs_field or rhs_constant via compatibility conversion

Variable-variable example

Expand variable-variable predicate example

{
  "name": "numeric-pairs",
  "lhs": {"roles": ["size"]},
  "operators": ["<=", ">="],
  "rhs_field": {"roles": ["size"]}
}

Possible generated predicates:

Bytes <= MTU
frame.len_ctx0 >= tcp.len_ctx1

Variable-constant example

Expand variable-constant predicate example

{
  "name": "zero-payload",
  "lhs": {
    "names": ["tcp.len_ctx0", "tcp.len_ctx1", "tcp.len_ctx2"]
  },
  "operators": ["=", "!="],
  "rhs_constant": {
    "mode": "field_constants",
    "kinds": ["limit"]
  }
}

Possible generated predicates:

tcp.len_ctx0 = 0
tcp.len_ctx2 != 0

`SCALAR` example

Expand SCALAR predicate example

{
  "name": "packet-capacity",
  "lhs_term": {
    "kind": "scalar",
    "field": {"names": ["Packets"]},
    "constant": {
      "mode": "field_constants",
      "kinds": ["scalar"]
    }
  },
  "operators": ["<=", ">="],
  "rhs_term": {
    "kind": "field",
    "field": {"names": ["Bytes"]}
  }
}

Possible generated predicates:

Packets * 65535 <= Bytes
Packets * 65535 >= Bytes

`ADDITION` examples

Expand ADDITION predicate examples

Field plus field:

{
  "name": "frame-budget",
  "lhs_term": {
    "kind": "addition",
    "field": {"names": ["Bytes"]},
    "other_field": {"names": ["Header"]}
  },
  "operators": ["<="],
  "rhs_term": {
    "kind": "field",
    "field": {"names": ["MTU"]}
  }
}

Field plus constant:

{
  "name": "seq-offset",
  "lhs_term": {
    "kind": "addition",
    "field": {
      "context_family": "tcp.seq",
      "window_only": true
    },
    "constant": {
      "mode": "field_constants",
      "kinds": ["addition"]
    }
  },
  "operators": ["="],
  "rhs_term": {
    "kind": "field",
    "field": {
      "context_family": "tcp.seq",
      "window_only": true
    }
  }
}

Possible generated predicates:

Bytes + Header <= MTU
tcp.seq_ctx0 + 1 = tcp.seq_ctx1

Quantifier templates

Each quantifier_templates entry is a QuantifierTemplateSpec.

Expand quantifier template fields and example

Field	Meaning	Notes
`name`	Template name	Appears in predicate provenance
`quantifier`	Quantifier kind	`forall` or `exists`
`selector`	Context-family selector	Usually points at windowed numeric families
`operators`	Allowed comparators	Same operator set as predicate templates
`constant`	Constant selector	Often `profile`
`aggregator_projection`	Optional projection hint	Accepted by the schema; current v1 lowering derives the projection automatically
`description`	Free-text metadata	Optional

NetNomos projects monotone quantified window predicates into finite predicates:

forall X[k] >= c -> min(X_*) >= c
exists X[k] >= c -> max(X_*) >= c
equality and inequality forms fall back to finite conjunction or disjunction

Example:

{
  "name": "payload-forall",
  "quantifier": "forall",
  "selector": {
    "context_family": "tcp.len",
    "window_only": true
  },
  "operators": [">=", "<="],
  "constant": {
    "mode": "profile",
    "quantiles": [0.25, 0.5, 0.75]
  }
}

Possible generated predicate:

min(tcp.len_ctx0, tcp.len_ctx1, tcp.len_ctx2) >= p50

How grammars interact with dataset schemas

The grammar does not operate on raw files directly. It operates on the prepared schema. That means:

selectors see post-rename, post-preprocessing field names
window_only only works if the dataset defines context_window
context_family only works on windowed fields
field_constants only works when the dataset schema declares matching constants
role-based numeric comparisons only work if the dataset fields are tagged consistently
profiled constants are computed from the prepared dataset, not from the raw source

Example rule shapes

A learned rule is built from generated predicates. Examples you should expect in artifacts:

Bytes > Mtu
Packets * 65535 >= Bytes
frame.len_ctx0 <= p50 or tcp.len_ctx2 = 0
(tcp.seq_ctx0 + 1) = tcp.seq_ctx1

Interpreted artifacts use:

enum_labels for categorical readability
semantic_values.json for profiled constants such as p50 and top1

Shipped example specs

CIDDS: examples/datasets/cidds.json + examples/grammars/network_flow.json
Netflix PCAP: examples/datasets/pcap_tcp.json + examples/grammars/pcap_window.json
MAWI PCAP: examples/datasets/pcap_tcp.json + examples/grammars/pcap_window.json
MetaDC: examples/datasets/metadc.json + examples/grammars/metadc_agg.json

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
cpp		cpp
data		data
examples		examples
netnomos		netnomos
rules		rules
scripts		scripts
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

NetNomos: Logic Rule Mining for Network Data 📏

Citation

1. Introduction

What NetNomos Does

Key Concepts

Dataset Schema

Grammar

Rule Learning

2. Installation & Setup

Requirements

Setup

3. CLI Usage

netn --help

Subcommands

Flag reference

Global flags

Dataset and grammar selection

Learning and artifact control

Example commands

Inspect specs

Prepare data

Learn rules for the shipped datasets

Validate, interpret, and query saved artifacts

Where outputs go

4. Dataset Schema Specification

Top-level schema fields

source

fields

value_type

roles

constants

enum_labels

Variable selection

Preprocessing

Context windows

Derived variables

5. Grammar Specification

Top-level grammar fields

Operators

Variable selectors

Constant selectors

Term templates

Predicate templates

Variable-variable example

Variable-constant example

SCALAR example

ADDITION examples

Quantifier templates

How grammars interact with dataset schemas

Example rule shapes

Shipped example specs

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages

`netn --help`

`source`

`fields`

`value_type`

`roles`

`constants`

`enum_labels`

`SCALAR` example

`ADDITION` examples