Extending Splunk Stream Vocabularies using IPFIX


Splunk Stream, NetFlow and IPFIX

One of my favorite tools in my Splunk arsenal is Splunk Stream. Stream lets you capture and analyze network traffic and then index that data in Splunk. It works great for analyzing DNS, email, DHCP, and more. But what if you have your own types of traffic that you want to capture? After a recent Splunk Application Development engagement for a customer, we wanted to share what we learned. In this post we focus on extending Stream's vocabulary so that it can receive custom IPFIX and NetFlow elements.

What are NetFlow and IPFIX?

NetFlow was developed by Cisco for collecting records of network traffic statistics from network equipment. NetFlow collectors receive these exported records for searching and analysis. A record typically contains when a network connection started, how long it lasted, the source IP and port, the destination IP and port, and the amount of data transferred. There are several versions of NetFlow, and Stream supports a few of them.

IPFIX (IP Flow Information Export) is very similar to NetFlow, but it is newer and was developed by a working group of the Internet Engineering Task Force (IETF) as a more open standard for network flow data collection. IPFIX flows are made up of information elements, and the standard elements are registered with the Internet Assigned Numbers Authority (IANA); more information on them can be found in IANA's IPFIX Information Elements registry. Vendors can also define their own elements. IANA assigns each company an Enterprise ID (a Private Enterprise Number), and the full list is published in IANA's Private Enterprise Numbers registry. Each enterprise then defines the element ID, name, and description for its own elements. For this post, we will use a made-up element for illustrative purposes.
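To make the Enterprise ID / element ID pairing concrete, here is a minimal Python sketch of how an exporter encodes one template field specifier following the IPFIX protocol specification (RFC 7011, section 3.2). It is illustrative only, since Stream does this decoding for you, and the function name field_specifier is our own. Enterprise-specific elements set the high bit of the 16-bit element ID and carry the 32-bit Enterprise ID right after the field length.

```python
import struct
from typing import Optional

# High bit of the 16-bit Information Element ID marks an enterprise-specific element.
ENTERPRISE_BIT = 0x8000


def field_specifier(element_id: int, length: int, enterprise_id: Optional[int] = None) -> bytes:
    """Encode one IPFIX template field specifier in network byte order.

    Standard IANA elements take 4 bytes (element ID, field length).
    Enterprise-specific elements set the high bit of the element ID and
    append the 32-bit Enterprise ID, for 8 bytes in total.
    """
    if not 0 <= element_id < ENTERPRISE_BIT:
        raise ValueError("element ID must fit in 15 bits")
    if enterprise_id is None:
        return struct.pack("!HH", element_id, length)
    return struct.pack("!HHI", ENTERPRISE_BIT | element_id, length, enterprise_id)


# The made-up parsecs element: Enterprise ID 50198, element ID 100, 8-byte unsigned value.
print(field_specifier(100, 8, 50198).hex())  # 806400080000c416
# Standard IANA element 8 (sourceIPv4Address), 4 bytes, no Enterprise ID.
print(field_specifier(8, 4).hex())           # 00080004
```

The 8-byte length assumes the value is sent as an unsigned 64-bit integer, which matches the uint64 type used for this element later in the post.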

Before we jump in

This is an advanced configuration technique; if you run into trouble, we suggest contacting Professional Services.

Preparing

First, gather the elements you want to add to Stream and build a table describing the data you are adding. The row below describes our IANA element. Splunk Stream uses the element term ID internally, so it is good to document exactly what we are configuring. Once we have these items, we can extend Stream.

| IANA Enterprise ID | Element ID | Element Name | Element Term ID | Element Description |
|---|---|---|---|---|
| 50198 | 100 | parsecs | aplura.parsecs | The number of parsecs required to navigate a specified navigational route. |

We assume that Splunk Stream is installed. If you need help with that, you should check out Splunk’s excellent documentation.

Once Stream is installed, open a command line to continue adding the extension. These instructions are written for Linux, but you can extend Stream on Windows the same way.

streamfwd.conf

The initial setup is to create or modify streamfwd.conf. This file lives in both splunk_app_stream/local and Splunk_TA_stream/local; if it doesn't exist, create and edit it in both locations. The first step is to listen for NetFlow data. Below is a sample configuration. Each of these settings goes under the [streamfwd] stanza.

[streamfwd]
netflowReceiver.0.ip = ##IP
netflowReceiver.0.port = ##PORT
netflowReceiver.0.protocol = udp
netflowReceiver.0.decoder = netflow

Replace ##IP with the IP address of the interface to listen on, and ##PORT with the port to listen on. These settings make your Splunk Stream instance listen for NetFlow data.

Next, in the same file, we add the IANA element.

netflowElement.0.enterpriseid = 50198
netflowElement.0.id = 100
netflowElement.0.termid = aplura.parsecs

We use the values from the table to populate these settings. When you add more elements, increment the index (0 in this case) by one for each new element. As a best practice, don't skip numbers.

NOTE: When extending Stream to collect non-IPFIX proprietary elements over NetFlow v9 (from technologies such as Cisco HSEL/NSEL), you must omit the netflowElement.X.enterpriseid option for the elements specific to that technology. Refer to that technology's documentation for details.

UPDATED NOTE: If you are using version 7.1.2 of the Stream binaries, you can also add netflowElement.X.termtype = ipaddress to the configuration for any element whose value is an IP address.
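Once you have more than a couple of elements, numbering them by hand gets error prone. Below is a minimal Python sketch, assuming you keep the element table as data, that renders the netflowElement lines with gap-free indexes and applies both notes above (omitting enterpriseid when none is given, and adding termtype when set). FlowElement and netflow_element_lines are hypothetical helpers, not part of Stream, and the output still has to be pasted under the [streamfwd] stanza yourself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FlowElement:
    """One row of the element table from the Preparing section (hypothetical helper)."""
    element_id: int
    term_id: str
    enterprise_id: Optional[int] = None  # None: omit enterpriseid (non-IPFIX NetFlow v9 elements)
    term_type: Optional[str] = None      # e.g. "ipaddress" with the 7.1.2 Stream binaries


def netflow_element_lines(elements: Iterable[FlowElement]) -> List[str]:
    """Render netflowElement.N.* settings, numbering N from 0 with no gaps."""
    lines: List[str] = []
    for index, element in enumerate(elements):
        prefix = f"netflowElement.{index}"
        if element.enterprise_id is not None:
            lines.append(f"{prefix}.enterpriseid = {element.enterprise_id}")
        lines.append(f"{prefix}.id = {element.element_id}")
        lines.append(f"{prefix}.termid = {element.term_id}")
        if element.term_type is not None:
            lines.append(f"{prefix}.termtype = {element.term_type}")
    return lines


if __name__ == "__main__":
    table = [FlowElement(100, "aplura.parsecs", enterprise_id=50198)]
    print("\n".join(netflow_element_lines(table)))
```

For the single row in our table this produces exactly the three settings shown earlier. Generating the index from enumerate() is what guarantees the numbering never skips.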

aplura.xml

Stream uses a set of vocabularies to define the elements. Each technology can have its own XML vocabulary file. In our case we are using aplura.xml, but it is best to name the file after the technology you are configuring. For consistency, we place this file (aplura.xml) in both splunk_app_stream/default/vocabularies and Splunk_TA_stream/default/vocabularies. Because these files live in default, a Stream upgrade will delete them, so keep backup copies to restore after upgrading.

<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="7.1.1">
  <Vocabulary id="aplura">
    <Locked>true</Locked>
    <Name>Aplura Netflow Protocol Vocabulary</Name>
    <Term id="aplura.parsecs">
      <Type>uint64</Type>
      <Comment>The number of parsecs required to navigate a specified navigational route.</Comment>
    </Term>
  </Vocabulary>
</CmConfig>

There are a few points to consider here:

  1. The id attribute of the Vocabulary tag should match the name of the file (here, aplura for aplura.xml).

  2. The version attribute of the CmConfig tag needs to match the Splunk Stream version installed.

  3. The Type tag must contain one of the following supported data types: uint8, uint16, uint32, uint64, shortstring, string, longstring, blob.

  4. All element values sent via this protocol should be UTF-8 encoded; otherwise they will not be decoded correctly.
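Points 1 to 3 are easy to get wrong when you copy a vocabulary between Stream versions. Here is a minimal Python sketch that checks them with the standard library's xml.etree.ElementTree before you restart Splunk. The supported-type set is copied from point 3, the namespace comes from the CmConfig tag above, and check_vocabulary is a hypothetical helper name; it does not replace Stream's own validation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Default namespace declared on the CmConfig tag.
NS = {"cm": "http://purl.org/cloudmeter/config"}
# Supported data types, as listed in point 3.
SUPPORTED_TYPES = {"uint8", "uint16", "uint32", "uint64",
                   "shortstring", "string", "longstring", "blob"}


def check_vocabulary(xml_text: str, filename: str, stream_version: str) -> List[str]:
    """Check a vocabulary file against points 1-3; return problems (empty list if none)."""
    problems: List[str] = []
    # Parse bytes so the UTF-8 XML declaration is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Point 2: CmConfig version must match the installed Stream version.
    if root.get("version") != stream_version:
        problems.append(f"CmConfig version {root.get('version')!r} does not match {stream_version!r}")
    vocab = root.find("cm:Vocabulary", NS)
    if vocab is None:
        problems.append("no Vocabulary element found")
        return problems
    # Point 1: Vocabulary id must match the file name without its .xml extension.
    expected_id = Path(filename).stem
    if vocab.get("id") != expected_id:
        problems.append(f"Vocabulary id {vocab.get('id')!r} does not match file name {expected_id!r}")
    # Point 3: every Term must use a supported Type.
    for term in vocab.findall("cm:Term", NS):
        term_type = term.findtext("cm:Type", default="", namespaces=NS).strip()
        if term_type not in SUPPORTED_TYPES:
            problems.append(f"Term {term.get('id')!r} has unsupported Type {term_type!r}")
    return problems
```

A typical call is check_vocabulary(Path("aplura.xml").read_text(encoding="utf-8"), "aplura.xml", "7.1.1"), substituting your installed Stream version; an empty list means the three checks passed.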

netflow

You will find the netflow file in splunk_app_stream/default/streams. This is a default stream file that you need to edit, and upgrading Stream will overwrite it, so keep a backup copy. The file is JSON, and the array we are extending is called fields; we will add a new object to it. Below we show only the object being added. Since this is JSON, watch your commas.

{
    "aggType": "value",
    "desc": "The number of parsecs required to navigate a specified navigational route.",
    "enabled": true,
    "name": "parsecs",
    "term": "aplura.parsecs"
}
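If you would rather not hand-edit the file and risk a stray comma, a JSON library can do the insertion for you. This is a minimal Python sketch, assuming the netflow stream file parses as a single JSON object with a top-level fields array as described above; add_field and update_stream_file are hypothetical helpers. Rewriting the file normalizes its whitespace (indent=4 is a guess at the original layout), so diff the result against your backup before restarting.

```python
import json
from pathlib import Path

# The field object shown above, as a Python dict.
NEW_FIELD = {
    "aggType": "value",
    "desc": "The number of parsecs required to navigate a specified navigational route.",
    "enabled": True,
    "name": "parsecs",
    "term": "aplura.parsecs",
}


def add_field(stream: dict, field: dict) -> bool:
    """Append field to stream["fields"] unless a field with the same term exists.

    Returns True if the stream definition changed.
    """
    fields = stream["fields"]  # assumed top-level array; KeyError if the layout differs
    if any(existing.get("term") == field["term"] for existing in fields):
        return False
    fields.append(field)
    return True


def update_stream_file(path: Path, field: dict) -> bool:
    """Load the stream file, add the field if missing, and write it back as JSON."""
    stream = json.loads(path.read_text(encoding="utf-8"))
    changed = add_field(stream, field)
    if changed:
        # indent=4 is an assumption about the original formatting.
        path.write_text(json.dumps(stream, indent=4) + "\n", encoding="utf-8")
    return changed
```

Because add_field skips terms that are already present, running the script twice (for example, after restoring from backup) does not create duplicate entries.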

Finishing Up

Again, some points to consider. The term key holds the element term ID from the table above. The name key is the field name that Splunk will store. It is a best practice to name this field according to the Common Information Model (CIM). If no CIM field fits, you can name it pretty much anything.

And that's it! Make sure these files are in their proper locations, then restart Splunk. Once Splunk has restarted, configure a new stream of the netflow type. After configuring and enabling the stream, you should start seeing your custom elements right in the stream data. If you have questions or problems, post a question on https://answers.splunk.com, ask in the Slack channel, or contact Splunk Support.
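To check the whole pipeline before your real exporter is ready, you can send a single hand-built IPFIX message to the receiver. Below is a minimal Python sketch that follows the RFC 7011 message layout: a 16-byte message header, a template set (set ID 2) describing sourceIPv4Address, destinationIPv4Address, and the enterprise-specific parsecs element, then one data record. The template ID 256, the 10.0.0.x addresses, and the parsecs value are placeholder assumptions, and build_message, ipfix_set, and send are hypothetical helpers. Real exporters resend templates periodically; this sketch puts the template and data in the same message so a freshly started collector has what it needs to decode the record.

```python
import ipaddress
import socket
import struct
import time

TEMPLATE_ID = 256             # first ID available for templates (RFC 7011); placeholder choice
ENTERPRISE_BIT = 0x8000       # marks an enterprise-specific element
PARSECS_ENTERPRISE_ID = 50198
PARSECS_ELEMENT_ID = 100


def ipfix_set(set_id: int, body: bytes) -> bytes:
    """Prefix a set body with its 4-byte header (set ID, total set length)."""
    return struct.pack("!HH", set_id, 4 + len(body)) + body


def build_message(src: str, dst: str, parsecs: int, sequence: int = 0, domain_id: int = 1) -> bytes:
    """Build one IPFIX message carrying a template set and one matching data record."""
    # Template record header: template ID and field count.
    template = struct.pack("!HH", TEMPLATE_ID, 3)
    template += struct.pack("!HH", 8, 4)    # sourceIPv4Address, 4 bytes
    template += struct.pack("!HH", 12, 4)   # destinationIPv4Address, 4 bytes
    template += struct.pack("!HHI", ENTERPRISE_BIT | PARSECS_ELEMENT_ID, 8, PARSECS_ENTERPRISE_ID)
    # Data record fields in the same order as the template.
    record = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!Q", parsecs))   # uint64, matching the vocabulary Type
    sets = ipfix_set(2, template) + ipfix_set(TEMPLATE_ID, record)
    # Message header: version 10, total length, export time, sequence number, observation domain.
    header = struct.pack("!HHIII", 10, 16 + len(sets), int(time.time()), sequence, domain_id)
    return header + sets


def send(message: bytes, host: str, port: int) -> None:
    """Send the message over UDP, matching the udp receiver protocol configured above."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

For example, send(build_message("10.0.0.1", "10.0.0.2", 12), HOST, PORT), with HOST and PORT set to the ##IP and ##PORT values from your streamfwd.conf (4739 is IANA's registered IPFIX port, if you have no other preference). If the vocabulary and netflow stream are set up as described, a parsecs field with the value 12 should show up in the stream data.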
