
Pythoning with Pandas

Introduction

I usually work on my data (i.e. stats files coming from the network elements we maintain) using spreadsheets. It's a routine task: analyze trends, check for possible anomalies and spot optimization opportunities. So I thought maybe I should automate it. The search led me to Python's data management library, pandas. I think I'm now addicted to it 🙂

Since then, I have moved from Excel-based analysis to automated analysis in Python, powered by the pandas data management library.

My Use Cases

Case 1:

My team manages several network nodes; the network elements are geo-redundant. To evaluate the performance, traffic occupancy and stability of these nodes, the stats files generated by the nodes are collected and analyzed.

For this case, we want to generate a new DataFrame (or CSV file) by merging the counters generated by a pair of nodes and summing them per link.
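
Conceptually, this is just a pd.concat of the two nodes' tables followed by a groupby().sum() on the date and instance columns. Here is a minimal toy sketch (made-up numbers and a shortened column name, purely for illustration) before we look at the real counters:

import pandas as pd

# toy counters from two geo-redundant nodes (made-up numbers)
node1 = pd.DataFrame({'date': ['2020-10-26-00:05', '2020-10-26-00:05'],
                      'instance': ['LKSET_A', 'LKSET_B'],
                      'msu_tx': [100, 20]})
node2 = pd.DataFrame({'date': ['2020-10-26-00:05', '2020-10-26-00:05'],
                      'instance': ['LKSET_A', 'LKSET_B'],
                      'msu_tx': [150, 30]})

# stack the two tables, then sum the counters per (date, instance) pair
combined = pd.concat([node1, node2])
print(combined.groupby(['date', 'instance']).sum())
#                            msu_tx
# date             instance
# 2020-10-26-00:05 LKSET_A      250
#                  LKSET_B       50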

Let's have a look at the (sample) data/counters of the nodes' signaling links.

Sample Data 1 (node1)

version=9.0
date,instance,MSU's transmitted,MSU's received,MSU's octets transmitted,MSU's octets received,MSUs received requiring GTT,MSU octets received for MSUs requiring GTT,Duration of link set inactivity
2020-10-26-00:05,LKSET_1_TEAS-TEAS,528,570,11000,26459,0,0,0
2020-10-26-00:05,LKSET_5_TIHO-TIHO,18,10,375,228,0,0,0
2020-10-26-00:05,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,202,179,23462,16303,159,15883,0
2020-10-26-00:05,LKSET_1_LISTP6017NLS-LISTP6017NLS,21495,19685,2183024,3279609,19675,3279259,0
2020-10-26-00:05,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:05,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,586,511,16769,12248,0,0,0
2020-10-26-00:10,LKSET_1_TEAS-TEAS,337,410,6623,19562,0,0,0
2020-10-26-00:10,LKSET_5_TIHO-TIHO,18,10,369,225,0,0,0
2020-10-26-00:10,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,187,191,21560,16699,170,16261,0
2020-10-26-00:10,LKSET_1_LISTP6017NLS-LISTP6017NLS,21669,19157,2185071,3187426,19147,3187076,0
2020-10-26-00:10,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:10,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,349,289,11126,6949,0,0,0
2020-10-26-00:15,LKSET_1_TEAS-TEAS,324,337,6438,17652,0,0,0
2020-10-26-00:15,LKSET_5_TIHO-TIHO,30,10,600,211,0,0,0
2020-10-26-00:15,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,142,143,15848,12923,123,12503,0
2020-10-26-00:15,LKSET_1_LISTP6017NLS-LISTP6017NLS,20341,18522,2053465,3086120,18512,3085770,0
2020-10-26-00:15,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:15,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,247,295,7634,7535,0,0,0

Sample Data 2 (node2)

version=9.0
date,instance,MSU's transmitted,MSU's received,MSU's octets transmitted,MSU's octets received,MSUs received requiring GTT,MSU octets received for MSUs requiring GTT,Duration of link set inactivity
2020-10-26-00:05,LKSET_1_TEAS-TEAS,978,440,40479,21912,0,0,0
2020-10-26-00:05,LKSET_5_TIHO-TIHO,25,31,510,757,0,0,0
2020-10-26-00:05,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,186,184,21524,16797,164,16377,0
2020-10-26-00:05,LKSET_1_LISTP6017NLS-LISTP6017NLS,21618,22411,2184153,3382136,19615,3269682,0
2020-10-26-00:05,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:05,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,542,583,14488,13280,0,0,0
2020-10-26-00:10,LKSET_1_TEAS-TEAS,753,337,31402,17535,0,0,0
2020-10-26-00:10,LKSET_5_TIHO-TIHO,23,30,478,821,0,0,0
2020-10-26-00:10,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,200,176,22367,15239,156,14819,0
2020-10-26-00:10,LKSET_1_LISTP6017NLS-LISTP6017NLS,21613,22072,2189857,3325865,19293,3213985,0
2020-10-26-00:10,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:10,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,331,446,9410,10521,0,0,0
2020-10-26-00:15,LKSET_1_TEAS-TEAS,756,357,32251,17968,0,0,0
2020-10-26-00:15,LKSET_5_TIHO-TIHO,27,48,561,1910,0,0,0
2020-10-26-00:15,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,161,146,17802,12141,125,11703,0
2020-10-26-00:15,LKSET_1_LISTP6017NLS-LISTP6017NLS,19821,20965,2022165,3179670,18516,3080600,0
2020-10-26-00:15,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,20,20,700,700,0,0,0
2020-10-26-00:15,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,304,419,8419,10012,0,0,0

We want a result file that sums the counters of each instance (i.e. link) across the two nodes.

Output

date,instance,MSU's transmitted,MSU's received,MSU's octets transmitted,MSU's octets received,MSUs received requiring GTT,MSU octets received for MSUs requiring GTT,Duration of link set inactivity
2020-10-26-00:05,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,1128,1094,31257,25528,0,0,0
2020-10-26-00:05,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,388,363,44986,33100,323,32260,0
2020-10-26-00:05,LKSET_1_LISTP6017NLS-LISTP6017NLS,43113,42096,4367177,6661745,39290,6548941,0
2020-10-26-00:05,LKSET_1_TEAS-TEAS,1506,1010,51479,48371,0,0,0
2020-10-26-00:05,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,40,40,1400,1400,0,0,0
2020-10-26-00:05,LKSET_5_TIHO-TIHO,43,41,885,985,0,0,0
2020-10-26-00:10,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,680,735,20536,17470,0,0,0
2020-10-26-00:10,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,387,367,43927,31938,326,31080,0
2020-10-26-00:10,LKSET_1_LISTP6017NLS-LISTP6017NLS,43282,41229,4374928,6513291,38440,6401061,0
2020-10-26-00:10,LKSET_1_TEAS-TEAS,1090,747,38025,37097,0,0,0
2020-10-26-00:10,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,40,40,1400,1400,0,0,0
2020-10-26-00:10,LKSET_5_TIHO-TIHO,41,40,847,1046,0,0,0
2020-10-26-00:15,LKSET_1_BOMAR2675NLS-BOMAR2675NLS,551,714,16053,17547,0,0,0
2020-10-26-00:15,LKSET_1_BOOKSCP92NLS-BOOKSCP92NLS,303,289,33650,25064,248,24206,0
2020-10-26-00:15,LKSET_1_LISTP6017NLS-LISTP6017NLS,40162,39487,4075630,6265790,37028,6166370,0
2020-10-26-00:15,LKSET_1_TEAS-TEAS,1080,694,38689,35620,0,0,0
2020-10-26-00:15,LKSET_4_PAXMAR01NLS-PAXMAR01NLS,40,40,1400,1400,0,0,0
2020-10-26-00:15,LKSET_5_TIHO-TIHO,57,58,1161,2121,0,0,0

Let’s go through the code

# import pandas (aliased as 'pd'), glob and os so they are available for use
import pandas as pd
import glob
import os
# glob to retrieve the files in each node's data directory
list_of_files1 = glob.glob('/data/SUBOK/sample_data_1/*.csv')
list_of_files2 = glob.glob('/data/SUBOK/sample_data_2/*.csv')
# each directory contains daily files and we are only interested in the latest one
latest_file1 = max(list_of_files1, key=os.path.getctime)
latest_file2 = max(list_of_files2, key=os.path.getctime)
# pd.read_csv : read each node's latest file
# skiprows=1 : skip row 0 (the 'version=9.0' line) so the next row is used as the header
# pd.concat : concatenate the two (latest) files into a single DataFrame
df = pd.concat([pd.read_csv(latest_file1, skiprows=1), pd.read_csv(latest_file2, skiprows=1)])
# df.groupby : group the rows by the selected columns (date and instance)
# sum() : sum up the counters within each group
result = df.groupby(['date', 'instance']).sum()
# write the output to a result file
result.to_csv('/data/SUBOK/results/result_linkset.csv')
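
If you end up doing this for several node pairs, the same steps can be wrapped in a small helper. The sketch below is just one way to do it; the function name merge_latest_pair is hypothetical, and the directory/output paths simply reuse the layout from the example above:

import glob
import os
import pandas as pd

def merge_latest_pair(dir1, dir2, out_path):
    # hypothetical helper, not part of the original script:
    # pick the latest CSV from each node's directory, sum the counters
    # per (date, instance) and write the merged result to out_path
    latest1 = max(glob.glob(os.path.join(dir1, '*.csv')), key=os.path.getctime)
    latest2 = max(glob.glob(os.path.join(dir2, '*.csv')), key=os.path.getctime)
    df = pd.concat([pd.read_csv(latest1, skiprows=1),
                    pd.read_csv(latest2, skiprows=1)])
    result = df.groupby(['date', 'instance']).sum()
    result.to_csv(out_path)
    return result

# example call for the node pair used in this post
merge_latest_pair('/data/SUBOK/sample_data_1',
                  '/data/SUBOK/sample_data_2',
                  '/data/SUBOK/results/result_linkset.csv')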
