Network debugging on Windows

Sometimes things do not work as you expect. The most common problem is probably an error when you try to connect to a database through Gallium Data, but many other scenarios can also result in an error.

To debug the problem in depth on Windows, you can capture the network traffic to and from Gallium Data into a file, using Microsoft's pktmon tool, which is normally included in modern versions of Windows. This network capture file will almost always allow Gallium Data support to help you figure out what is going wrong.


1 - Start a Gallium Data instance

Make sure this instance runs on a different machine from both the database server and the database client. This is because pktmon cannot capture loopback traffic, so connections that never leave the machine would not show up in the capture.

Next, we need to know the two ports we're going to monitor.

In Gallium Data, take a look at the connection you want to debug, and note the incoming and outgoing ports (they might be the same).

Now, take a look at how you map ports in the Docker command you use to run Gallium Data. We're interested in the incoming port, which is mapped in Docker to the port labelled "Local port" in the Gallium Data connection. In the diagram below, that would be port 1432, assuming the Gallium Data instance is run with a command along the lines of:

docker run -d -p 8089:8080 -p 1432:1431 galliumdata/gallium-data-engine:X.X.X

So in this example, the two ports we're interested in are 1432 (the host port to which the database client connects, which Docker maps to port 1431 inside the container) and 1433 (the port that Gallium Data uses to connect to the database server). It's possible for them to be the same number. We'll need these two ports in a minute.
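If the Docker command is long, the host port can be picked out mechanically. The following is a minimal Python sketch, not part of Gallium Data: the function name published_ports is made up for illustration, and it only handles the simple HOST:CONTAINER and IP:HOST:CONTAINER forms of the -p option (no port ranges).

```python
import shlex


def published_ports(docker_command: str) -> dict[int, int]:
    """Return {container_port: host_port} for each -p / --publish option.

    Hypothetical helper: handles only single-port mappings such as
    1432:1431 or 127.0.0.1:1432:1431, optionally followed by /tcp.
    """
    tokens = shlex.split(docker_command)
    mappings: dict[int, int] = {}
    for i, token in enumerate(tokens):
        if token in ("-p", "--publish") and i + 1 < len(tokens):
            spec = tokens[i + 1]
        elif token.startswith("--publish="):
            spec = token.split("=", 1)[1]
        else:
            continue
        parts = spec.split("/")[0].split(":")  # drop an optional /tcp suffix
        if len(parts) < 2:
            continue  # container port only: Docker picks the host port
        host, container = parts[-2], parts[-1]
        if host.isdigit() and container.isdigit():
            mappings[int(container)] = int(host)
    return mappings


command = "docker run -d -p 8089:8080 -p 1432:1431 galliumdata/gallium-data-engine:X.X.X"
ports = published_ports(command)
# 1431 is the "Local port" of the connection in this example
print(ports[1431])  # 1432
```

The value printed is the port that the database client connects to, which is the one to pass to pktmon later.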

2 - Make sure you can reproduce the problem

Typically that means logging into the database from a database client, or performing a specific database operation.

Whatever it is, make sure you can reproduce the problem consistently. The less traffic we capture, the easier it will be to debug, so ideally the problem should be reproducible with just one operation, e.g. logging in, or running a specific query.

Once you are satisfied that the problem can be reproduced consistently, set things up so that you can reproduce it one more time, but don't perform the operation that causes an error yet.

3 - Turn on network tracing

In step 1, you took note of which ports are in use. We'll call these <portA> and <portB> -- in real life, they might be 1432 and 1433, for instance.

On the Windows machine that is running Gallium Data in Docker, open an administrator command prompt and create a directory in a convenient place to hold the network traffic files. You should not typically need much disk space; a few megabytes should be enough. Make this directory the current directory.

Run the following commands:

pktmon filter add -p <portA> -t tcp psh fin

If <portA> and <portB> are not the same, you'll also need to run:

pktmon filter add -p <portB> -t tcp psh fin
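To avoid typing mistakes, the filter commands above can be generated from the two port numbers. This is only a sketch: filter_commands is a hypothetical helper name, and it simply reproduces the command shown above once per distinct port.

```python
def filter_commands(port_a: int, port_b: int) -> list[str]:
    """Build one 'pktmon filter add' command per distinct port."""
    commands = []
    # dict.fromkeys de-duplicates while keeping the original order,
    # so identical ports yield a single command
    for port in dict.fromkeys((port_a, port_b)):
        if not 1 <= port <= 65535:
            raise ValueError(f"not a valid TCP port: {port}")
        commands.append(f"pktmon filter add -p {port} -t tcp psh fin")
    return commands


for command in filter_commands(1432, 1433):
    print(command)
```

The printed lines can then be pasted into the administrator command prompt.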

To make sure these filters are in place, run:

pktmon filter list

The output should be similar to the following (you may have only one filter, and the port numbers may be different, of course):

Packet Filters:
     # Name    Protocol       Port
     - ----    --------       ----
     1 <empty> TCP (FIN PSH)  1432
     2 <empty> TCP (FIN PSH)  1433

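If you see other filters that you did not just create, you may want to remove them; otherwise they will pollute the capture with irrelevant traffic. Note that pktmon filter remove deletes all filters, so you will need to add yours again afterwards.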
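Checking the filter list by eye is usually enough, but it can also be scripted. The sketch below assumes that the listing looks like the sample above, with a numeric index in the first column and the port in the last column; filters defined without a port would not be recognized. The function names and the third row of the sample (port 443) are invented for illustration.

```python
def filter_ports(listing: str) -> list[int]:
    """Extract the port of each filter row from 'pktmon filter list' output."""
    ports = []
    for line in listing.splitlines():
        tokens = line.split()
        # a filter row starts with its index and (by assumption) ends with the port
        if len(tokens) >= 2 and tokens[0].isdigit() and tokens[-1].isdigit():
            ports.append(int(tokens[-1]))
    return ports


def unexpected_ports(listing: str, expected: set[int]) -> set[int]:
    """Return the filtered ports that are not in the expected set."""
    return set(filter_ports(listing)) - expected


sample = """\
Packet Filters:
     # Name    Protocol       Port
     - ----    --------       ----
     1 <empty> TCP (FIN PSH)  1432
     2 <empty> TCP (FIN PSH)  1433
     3 <empty> TCP            443
"""
print(unexpected_ports(sample, {1432, 1433}))  # {443}
```

In practice the listing would come from running pktmon filter list, for example by capturing its standard output with Python's subprocess module.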

We're now ready to start tracing with the following command:

pktmon start -c --pkt-size 9000

From that point forward, all network traffic on the specified ports will be saved into a file.

You should see a status message similar to the following:

Logger Parameters:
    Logger name:        PktMon
    Logging mode:       Circular
    Log file:           C:\Users\Administrator\Documents\Capture\PktMon.etl
    Max file size:      512 MB
    Memory used:        128 MB

Collected Data:
    Packet counters, packet capture

Capture Type:
    All packets

Monitored Components:
    All

Packet Filters:
     # Name    Protocol       Port
     - ----    --------       ----
     1 <empty> TCP (FIN PSH)  1432
     2 <empty> TCP (FIN PSH)  1433

Important: there is no need to rush, but you should avoid having pktmon active for more than a minute or two; otherwise it may capture a lot of network traffic unnecessarily and fill up your disk.

Now reproduce the problem by performing whatever operation causes it, like you did in step 2.

As soon as the problem has been reproduced, run the following commands to stop the capture, convert the log file to pcapng format, and remove the filters:

pktmon stop

pktmon pcapng PktMon.etl -o traffic.pcapng

pktmon filter remove

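At this point, you should have a file called traffic.pcapng containing the network traffic on the ports you specified. You can zip it if desired; it should typically compress by over 90%. If pktmon does not recognize the pcapng subcommand, your Windows version may name it etl2pcap instead; pktmon help lists the available subcommands.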
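If no zip tool is at hand, the capture can be compressed with a few lines of Python. This is a minimal sketch using the standard zipfile module; the function name zip_capture is made up, and the file is assumed to be in the current directory under the name used above.

```python
import zipfile
from pathlib import Path


def zip_capture(capture: Path) -> Path:
    """Compress the capture file into a .zip archive next to it."""
    archive = capture.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # store only the file name, not the full directory path
        zf.write(capture, arcname=capture.name)
    return archive


if __name__ == "__main__":
    capture = Path("traffic.pcapng")
    if capture.exists() and capture.stat().st_size > 0:
        archive = zip_capture(capture)
        saved = 1 - archive.stat().st_size / capture.stat().st_size
        print(f"{archive} written, {saved:.0%} smaller than the original")
    else:
        print("traffic.pcapng not found or empty in the current directory")
```

The resulting traffic.zip is the file to send to Gallium Data support.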