This entry is part 4 of 4 in the series Intro to PyShark

So far in this series we’ve done a lot with capturing packets and working with the capture object, but we’re finally going to get to the fun part and start playing with some PACKETS!!!!

When we have captured packets in a capture object, they are stored as a list of packet objects. These packet objects have methods and attributes that give us access to the header and payload info of each packet. As stated in a previous post, we control how much info is stored in each packet object through the only_summaries argument of the LiveCapture and FileCapture classes.

Packet Summary Attributes

Setting only_summaries to True during capture will give us a fixed set of attributes, regardless of the protocols present in the packet. The most useful attributes available are:

  • delta: Delta (difference) time between the current packet and the previous captured packet
  • destination: The Layer 3 (IP, IPv6) destination address
  • info: A brief application layer summary (e.g. ‘HTTP GET /resource_folder/page.html’)
  • ip id: IP Identification field used for uniquely identifying packets from a host
  • length: Length of the packet in bytes
  • no: Index number of the packet in the list
  • protocol: The highest layer protocol recognized in the packet
  • source: The Layer 3 (IP, IPv6) source address
  • stream: Index of the TCP stream this packet is a part of (TCP packets only)
  • summary_line: All the summary attributes in one tab-delimited string
  • time: Time elapsed between the first packet in the capture and the current packet
  • window: The TCP window size (TCP packets only)

There’s a lot that you can do with just these items; printing out packet summaries is just the beginning! Great visual charts can be made to illustrate IP conversations, bandwidth usage, protocol breakdowns, and application performance measurements (round-trip times in the same TCP stream). Wow, that’s a lot of useful analysis... but wait, there’s more?!?
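One way to start on the protocol-breakdown and bandwidth charts mentioned above is to aggregate the summary attributes first and then hand the totals to a charting library. The sketch below assumes summary objects that expose protocol, source, destination and length as listed earlier, with values stored as strings (hence the int() call). The function names and the file name in the usage note are hypothetical, and only_summaries may not be accepted by every PyShark release.

```python
from collections import Counter, defaultdict


def protocol_breakdown(summaries):
    """Count packets per highest-layer protocol (the summary `protocol` attribute)."""
    return Counter(pkt.protocol for pkt in summaries)


def bytes_per_conversation(summaries):
    """Sum packet lengths per IP conversation, ignoring direction."""
    totals = defaultdict(int)
    for pkt in summaries:
        # Sorting the address pair makes A->B and B->A count toward one conversation.
        pair = tuple(sorted((pkt.source, pkt.destination)))
        totals[pair] += int(pkt.length)  # summary values are assumed to be strings
    return dict(totals)


def load_summaries(path):
    """Hypothetical loader: read a pcap with PyShark, keeping only packet summaries."""
    import pyshark  # imported lazily so the helpers above work without PyShark

    cap = pyshark.FileCapture(path, only_summaries=True)
    try:
        return list(cap)
    finally:
        cap.close()
```

For example, protocol_breakdown(load_summaries("capture.pcap")).most_common(5) would list the five busiest protocols, and the dictionary from bytes_per_conversation() maps naturally onto a bar chart of IP conversations.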

Full Packet Attributes

If you want more than just the summary info from the captured packets, then you’re in the right place. Using the dissectors available in Wireshark and tshark, PyShark is able to break out all packet details by layer. For example, let’s dig into this DNS packet, starting with the attributes of the parent packet object:

There are several generic packet info attributes for length, frame_info, and time, and a pretty_print() method for displaying the packet in a very readable format (similar to Wireshark’s packet detail view). If you look closely, though, you’ll see attributes that represent absolute layers (eth and ip) as well as attributes that change based on the protocols present in each packet (transport_layer, highest_layer, dns). If you’re looking for specific traffic, these attributes make it very easy to locate the packets containing interesting information, like this script that will print out all DNS query and response names:

Which gives the output:
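A minimal sketch of a script along those lines follows. It assumes PyShark exposes Wireshark's dns.qry.name and dns.resp.name fields as the qry_name and resp_name attributes (PyShark strips the dns. prefix and turns the remaining dots into underscores), and that a missing layer or field raises AttributeError. The function names are hypothetical, and resp_name appears to return only the first answer name when a response carries several.

```python
def dns_names(packets):
    """Yield (query_name, response_name or None) for each packet with a DNS layer."""
    for pkt in packets:
        try:
            dns = pkt.dns          # AttributeError if the packet has no DNS layer
            query = dns.qry_name   # AttributeError if there is no query section
        except AttributeError:
            continue
        # Responses with at least one answer carry resp_name; queries do not.
        yield query, getattr(dns, "resp_name", None)


def print_dns_names(packets):
    """Print one line per DNS packet: the queried name and the first answer name."""
    for query, answer in dns_names(packets):
        print(query, "->", answer if answer is not None else "(no answer name)")
```

Passing a capture such as pyshark.FileCapture("dns.pcap", display_filter="dns") (file name hypothetical) to print_dns_names() would print one line per DNS packet.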

Dynamic Layer References

Using the dynamic layer attributes I mentioned earlier gives us some flexibility when analyzing packets. If you try to access the pkt.dns.qry_resp attribute of every packet, you will receive an AttributeError for any packet that doesn’t have DNS info. The same applies to the transport layer, since a packet may have a TCP layer, a UDP layer, or neither. We can print out the source and destination addresses and ports (for IP conversation mapping) and use a try/except block to protect against the AttributeError if the packet is neither TCP nor UDP:

Which gives the output:
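Here is a sketch of that try/except approach in Python 3. It assumes transport_layer returns 'TCP', 'UDP' or None and that each layer is reachable as a lowercase attribute (pkt.tcp, pkt.udp). The function names are hypothetical. Because it reads pkt.ip, IPv6 packets land in the except branch along with non-TCP/UDP traffic, so this sketch maps IPv4 conversations only.

```python
def conversation_header(pkt):
    """Return 'PROTO src:port --> dst:port', or None if the packet can't be mapped."""
    try:
        protocol = pkt.transport_layer               # 'TCP', 'UDP' or None
        transport = getattr(pkt, protocol.lower())   # None.lower() raises AttributeError
        return "%s %s:%s --> %s:%s" % (
            protocol, pkt.ip.src, transport.srcport, pkt.ip.dst, transport.dstport)
    except AttributeError:
        # No transport layer, no IPv4 layer (pkt.ip), or missing port fields.
        return None


def print_conversations(packets):
    """Print a header line for every packet that has IPv4 plus TCP or UDP."""
    for pkt in packets:
        header = conversation_header(pkt)
        if header is not None:
            print(header)
```

As a usage example, print_conversations(pyshark.LiveCapture(interface="eth0").sniff_continuously(packet_count=20)) would map the next 20 packets; the interface name is an assumption about your machine.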

Endless Possibilities

As you can see from just these few examples, PyShark gives easy access to all packet details. Conditional statements could be used to create dynamic logic for categorizing and working with many different protocols, or you can search for certain types of traffic that have a specific attribute (make sure to protect against AttributeErrors).
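As a sketch of the "search for a specific attribute" idea, the generator below yields only the packets whose chosen layer carries a chosen field, skipping AttributeErrors as suggested. The function name is hypothetical; the layer and field names you pass in are assumed to be the lowercase PyShark attribute names (for example 'http' and 'user_agent').

```python
def packets_with_field(packets, layer, field):
    """Yield (packet, value) for packets whose `layer` carries `field`.

    A missing layer or a missing field both raise AttributeError, which is skipped.
    """
    for pkt in packets:
        try:
            value = getattr(getattr(pkt, layer), field)
        except AttributeError:
            continue
        yield pkt, value
```

For instance, looping over packets_with_field(cap, "http", "user_agent") would visit only HTTP packets that actually carry a User-Agent header.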


I hope you’ve enjoyed these last few posts. I’d love to hear about projects that you come up with using PyShark, so feel free to share! If you’d like to see one example of PyShark out in the wild, check out my project Cloud-Pcap.


This article has 24 comments

  1. Pingback: PyShark – Using the capture Object | thePacketGeek

  2. m007averick

    I am using pyshark and I like it very much. However I am trying to generate visual charts and this line of yours got me interested –
    “Great visual charts can be made to illustrate IP conversations”. Can you tell me how to do this? What utility are you using? 

    1. Mat

      Since there are no graphing utilities built into PyShark, it’s really up to you to pick a graphing utility that works with the platform you are developing on. I mostly work on web projects and I’ve had a good experience with HighCharts, but you can also check out Google Charts or D3.js. If you’re making a native Python app with a Qt GUI, something like PyQtGraph might work for you.

      Something I had imagined when writing that line about IP conversation graphing are some charts similar to the ‘IP Conversations’ chart in Wireshark. Also, depending on what you’re analyzing, it’d be interesting to see packet size or TCP window size over time (TCP slow-start). PyShark would be the means to get the numerical data and then you can tie that into a chart using a chart library similar to the ones referenced above.

      If you’re interested, this topic might make a great blog post that I can write up sometime soon. I’d probably use the Flask web framework and HighCharts for an easy and quick start into packet graphing.
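To make the "TCP window size over time" idea concrete, the sketch below pulls (time, window) points for one TCP stream out of packet summaries and serializes them as [[x, y], ...] pairs, one of the data formats a HighCharts line series accepts. It assumes your summaries expose the stream, time and window attributes listed in the post, as strings; the function names are hypothetical, and serving the JSON from Flask is left out.

```python
import json


def window_series(summaries, stream):
    """Return [(seconds_since_start, window_size), ...] for one TCP stream.

    Packets without stream/time/window values (e.g. non-TCP traffic) are skipped.
    """
    points = []
    for pkt in summaries:
        try:
            if pkt.stream != str(stream):  # summary values are assumed to be strings
                continue
            points.append((float(pkt.time), int(pkt.window)))
        except (AttributeError, ValueError):
            continue  # missing attribute or an empty/non-numeric field
    return points


def to_chart_json(points):
    """Serialize points as [[x, y], ...] for a charting library."""
    return json.dumps([list(p) for p in points])
```

Plotting several streams side by side is then a matter of calling window_series() once per stream index.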

      1. Adam

        This topic has really caught my attention; can we expect your new blog post series to come up soon? I’m trying to learn how to work with PyShark better, and I believe the blog post you mentioned would be extremely helpful!

        Thanks a lot

          1. aaamod

            Great article, it has helped me a lot! I’ve put together the functions I need to analyse the relevant packets from my pcap files, but was struggling to find ways to output the information I needed effectively through a Flask web app.

            Looking forward to the next part. Thanks 😀

  3. aaamod

    Hi Mat,

    Been playing around with your cloud-pcap project a little bit lately as I have been trying to learn a little bit more for both Uni and work. I’ve been attempting to start a new hobby project that involves performing some simple analysis on pcap files. What process did you go through to gather data from the pcaps (I’ve generated a few pcaps for different scenarios: malware use, DoS attack, etc.) and store the data in a database? I’ve just started learning Python and can’t find any resources online on getting pcap data into a DB.

    Are you able to give me some tips, or guide me on the process you used?

    Cheers

    1. Mat

      Hi David,

      For the analysis done in cloud-pcap I process the packets in real time and don’t store the individual packets in a DB. That’s not to say it can’t be done, though; I can definitely understand a need to store packets in a DB to tie together relationships and for quicker/easier filtering. I actually used a document-store database (MongoDB) for a different project called meteorshark. That was more of a PoC, since sending packets to a remote DB in real time doesn’t scale 🙂

      I highly recommend using an ORM to ease DB creation, insertion, querying, etc. of packets. The hardest part is deciding which fields you want to create as columns since there are so many fields to choose from. I’ve been using SQLAlchemy with PostgreSQL and absolutely love it. I will probably be writing up a couple of articles soon about parsing packets and representing the data in graphs.
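A minimal sketch of the "pick a few fields and store them as columns" approach follows. To stay self-contained it uses Python's built-in sqlite3 module rather than the SQLAlchemy and PostgreSQL setup described in the reply; the table layout, column choice and function name are hypothetical, and rows are assumed to come from packet summary objects like the ones described in the post.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS packets (
    id       INTEGER PRIMARY KEY,
    ts       REAL,      -- seconds since the first packet (summary `time`)
    src      TEXT,
    dst      TEXT,
    protocol TEXT,
    length   INTEGER
)
"""


def store_summaries(conn, summaries):
    """Insert one row per packet summary and return the number of rows written."""
    conn.execute(SCHEMA)
    rows = [
        (float(p.time), p.source, p.destination, p.protocol, int(p.length))
        for p in summaries
    ]
    # Parameter placeholders (?) let sqlite3 handle quoting safely.
    conn.executemany(
        "INSERT INTO packets (ts, src, dst, protocol, length) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Once stored, a query such as SELECT protocol, SUM(length) FROM packets GROUP BY protocol gives a protocol breakdown without re-reading the pcap.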

      1. aaamod

        Thanks for the quick response! One of the things I had originally considered in my initial design was how large the database could potentially become. I only need to grab roughly 5 fields from the packets and perform some simple analysis/produce graphs from them. I’m interested in how you did your analysis. Would the best place to look be the pcap_helper.py file?

        Thanks again.

        1. Mat

          It’s good to keep that in mind, as you have to consider the performance and storage impact of reading fields from a DB vs. reading the pcap each time. Let’s say per packet you’re storing three strings (Src IP, Dst IP, Packet Summary) at 32 or 64 bytes each, three integers (Src port, Dst port, protocol) at 4 bytes each, and a timestamp at 8 bytes. That is up to about 200 bytes per packet, which is far more efficient than storing entire packets with their payload. Each 10,000-packet sniff would cost you only ~2 MB of storage, and since every row is the same size, the storage projections are very linear.
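The storage estimate above is easy to recompute for your own field choices. This sketch simply multiplies the per-packet row size by the packet count; the parameter defaults mirror the three strings, three integers and timestamp from the reply, and the function name is hypothetical. It ignores database row headers, indexes and variable-length encoding, so treat the result as a rough lower bound.

```python
def estimate_storage_bytes(n_packets, n_strings=3, string_bytes=64,
                           n_ints=3, int_bytes=4, timestamp_bytes=8):
    """Rough storage for fixed-size packet rows (no DB or index overhead)."""
    per_packet = n_strings * string_bytes + n_ints * int_bytes + timestamp_bytes
    return n_packets * per_packet
```

With 64-byte strings a row is 212 bytes, so 10,000 packets come to 2,120,000 bytes (about 2.1 MB); with 32-byte strings the total drops to 1,160,000 bytes.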

          If you look at the decode_packet() function inside the decode_capture_file_summary() function in pcap_helper.py, you can see how easy PyShark makes accessing the packet fields you’re interested in. In order to find out what’s available, I highly recommend using a python REPL with tab auto-complete (like iPython, bpython, or ptpython) to help you dig into the packets and see all the available fields. Also, the python built-in dir() function is very helpful for expanding all the available attributes and methods of an object.

          1. aaamod

            Will definitely have a look into what you said! Thanks again for the help, looking forward to any other future articles you write.

            Cheers.

    1. Mat

      I’m glad you find the blog useful, thanks for your kind words!

      If you’re using pyshark to sniff using the LiveCapture() object (like in the examples above), the packets will be stored in the object. So, in the example above, cap is the LiveCapture object with a sniff() method. When you execute the sniff (cap.sniff()), the packets will be accessible from the LiveCapture object much like accessing items in a list. You can use a for loop to iterate through all packets or access by index for a specific packet number.

      If you assign a packet to a variable, like packet = cap[0], you will be able to access the packet attributes just like any other python object: packet.length or packet.transport_layer to access the packet data.

      1. Zubair Khalid

        Thanks Mat, your reply is very helpful and I think I am almost there.
        Now when I run the following, I am unable to access the DATA field.
        All other objects are accessible, e.g. pkt.transport_layer. What is the data object?

        import pyshark
        cap = pyshark.LiveCapture()
        def print_conversation_header(pkt):
            try:
                print pkt
                #print pkt.transport_layer
                print "===================="
            except AttributeError as e:
                #ignore packets that aren't TCP/UDP or IPv4
                pass
        cap.apply_on_packets(print_conversation_header, timeout=100)

        1. Mat

          Oh, sorry, I think I misunderstood your question. So, you are trying to access the payload of the packet? 

          The data in the payload is going to be different depending on the highest protocol in the packet. For example, if the packet is DNS, you can see the DNS query/response info in the pkt.dns.qry_name and pkt.dns.resp_name fields. If the packet is HTTP, you will see the HTTP data fields in packet attributes like pkt.http.request_uri and pkt.http.user_agent.
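A sketch of that per-protocol idea: check highest_layer and pull a few fields from the matching layer. The field names come from the reply above (qry_name, resp_name, request_uri, user_agent); the function name and the DNS/HTTP-only mapping are assumptions, and real captures may report other highest layers for HTTP traffic (for example a body-type layer such as DATA-TEXT-LINES), which this sketch would simply skip.

```python
def payload_fields(pkt):
    """Return a dict of interesting payload fields chosen by the packet's highest layer.

    Only DNS and HTTP are handled; any other highest layer returns an empty dict.
    """
    # Map a highest-layer name to the layer attribute and the fields to read from it.
    wanted = {
        "DNS": ("dns", ["qry_name", "resp_name"]),
        "HTTP": ("http", ["request_uri", "user_agent"]),
    }
    layer_name, fields = wanted.get(pkt.highest_layer.upper(), (None, []))
    if layer_name is None:
        return {}
    layer = getattr(pkt, layer_name)
    result = {}
    for field in fields:
        # Missing fields (a DNS query has no resp_name, an HTTP response
        # has no user_agent) are skipped rather than raising.
        value = getattr(layer, field, None)
        if value is not None:
            result[field] = value
    return result
```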

          1. Zubair Khalid

            Dear Mat

            Thanks for your reply, I am so thankful to you.
            I am sending the packets from one PC to the other in infrastructure mode, and I want to access the whole
            packet on the other side. I can see the packet in Wireshark; a screenshot is at this link:

            https://www.dropbox.com/lightbox/home/Public

            I use the code below to read the packets in Python, I can access all the fields but unable to access the Payload (Data).  

            import pyshark
            cap = pyshark.LiveCapture()

            def print_conversation_header(pkt):
                try:
                    if pkt.highest_layer == "DATA":
                        protocol = pkt.transport_layer
                        src_addr = pkt.ip.src
                        src_port = pkt[pkt.transport_layer].srcport
                        dst_addr = pkt.ip.dst
                        dst_port = pkt[pkt.transport_layer].dstport
                        if src_port in ('5555', '5556', '5558', '5559'):
                            print '%s  %s:%s --> %s:%s' % (protocol, src_addr, src_port, dst_addr, dst_port)
                            print pkt.tcp
                            print "======================================="

                except AttributeError as e:
                    #ignore packets that aren't TCP/UDP or IPv4
                    pass
            cap.apply_on_packets(print_conversation_header, timeout=100)

          2. Zubair Khalid

            Dear Mat

            Thanks for this article, you are doing a great job. I have finally found the command for data access:

            Data = pkt[pkt.highest_layer].data

            This will provide the data in hex format.

            Data = Data.decode('hex')

            This will convert Hex to ASCII and you can read the data (Y)
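The .decode('hex') call above only exists on Python 2 strings. On Python 3 a rough equivalent uses bytes.fromhex(); the sketch below also strips colons in case your tshark version prints the data field as colon-separated bytes (e.g. 48:65:6c), which is an assumption worth checking against your own output. The helper name is hypothetical.

```python
def hex_payload_to_text(hex_data, encoding="ascii"):
    """Convert a hex string such as pkt.data.data into readable text (Python 3)."""
    raw = bytes.fromhex(hex_data.replace(":", ""))
    # errors="replace" keeps binary payloads from raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")
```

Usage would look like text = hex_payload_to_text(pkt[pkt.highest_layer].data); non-text bytes show up as the Unicode replacement character.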

          3. Mat

            Thanks for posting your solution. I was trying out some other methods but didn’t find what you posted above. That’s great!

  4. Huy Giang

    Hi Mat, thank you very much for these PyShark articles.
    Could you please share the full attribute list for the Packet object (like packet.dns.xxx.xxx …)? I could not find them anywhere, or please point me to the correct path.
    For ex, I am trying to get Packet’s dns.a (as on the wireshark docs on https://www.wireshark.org/docs/dfref/d/dns.html) but PyShark packet.dns.a does not output anything.
    Thank you.

    1. Mat

      There’s no defined list anywhere because the available attributes vary with the protocols and fields present in each packet. The way to view all available attributes of a specific packet is Python’s dir() function. If you have a DNS packet stored in the variable packet, you can view the attributes available at each level like this:
      dir(packet)
      dir(packet.dns)
      etc.

      That will list out the available attributes. Also, if you have tab-complete set up in your Python REPL or use a tool like IPython that has tab-complete built in, you can simply use the Tab key to see available attributes:

      packet. #press tab button to complete
      packet.dns. #press tab button
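If the dunder names in dir() output get in the way, a small helper can filter them out so the list looks closer to what tab-completion shows. The helper name is hypothetical, and it simply hides anything starting with an underscore.

```python
def public_attributes(obj):
    """List an object's attributes (sorted, as dir() returns them) minus _private names."""
    return [name for name in dir(obj) if not name.startswith("_")]
```

For example, public_attributes(packet.dns) would list the DNS field names along with the layer's public methods.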
