Saturday, July 19, 2014

VBA Useragent Parser Project - Part 1 Iterate through the rows in a column & Part 2 Adding Regex to parse for information

Recently I have web logs handed to me in Excel format, not wanting to export it to text for processing, i thought i would work on a Excel VBA for parsing useragent data.

Apparently there isnt any that i can find from the internet, its probably a terrible idea to use excel for parsing web logs.

Anyway i will post my progress for the coding on this site for my future reference.

The code below is for looping through column B in the excel sheet.
 Sub Parse()  
 '  
 ' Parse Macro  
 '  
 '  
  Dim i As Long  
  Do While True  
  Count = Count + 1  
   If IsEmpty(Cells(Count, "B").Value) Then  
     Cells(Count, "B").Value = "end"  
     Exit Do  
   End If  
  Cells(Count, "B").Font.Color = vbRed  
  Loop  
 End Sub  


Based on the code above, i have added regular expression to parse for OS, Webkit and Browser information. Feel free to try it: (Place Useragent information in column B of your excel sheet and execute this macro)
 Sub Parse()  
  '  
  ' Parse Macro  
  '  
  '  
  Dim i As Long  
  Dim UACol As String  
  'Browser type and version'  
  Dim Browser(3) As String  
  Dim BrowserVer(4) As String  
  Dim BrowserCol As String  
  Dim BrowserVerCol As String  
  'OS type and version'  
  Dim OSVer(4) As String  
  Dim OSCol As String  
  Dim OSVerCol As String  
  'Device Type'  
  Dim DEVICEVer(0) As String  
  Dim DEVICEVerCol As String  
  'Webkit Version'  
  Dim WEBKITVer(0) As String  
  Dim WEBKITVerCol As String  
  UACol = "B"  
  BrowserCol = "C"  
  BrowserVerCol = "D"  
  OSCol = "E"  
  OSVerCol = "F"  
  DEVICEVerCol = "G"  
  WEBKITVerCol = "H"  
  Browser(0) = "Chrome"  
  Browser(1) = "IE"  
  Browser(2) = "Safari"  
  Browser(3) = "Firefox"  
  BrowserVer(0) = "(Chrome)\/([^\s\;]+)"  
  BrowserVer(1) = "(Firefox)\/([^\s\;]+)"  
  BrowserVer(2) = "(IE)[\/\s]([^\s\;]+)"  
  BrowserVer(3) = "(Trident)\/([^\s\;]+)"  
  BrowserVer(4) = "(Safari)\/([^\s\;]+)"  
  'iPad; CPU OS 7_1_1 like Mac OS X'  
  'iPhone; CPU iPhone OS 7_1_1 like Mac OS X'  
  'iPhone; U; CPU iPhone OS 2_1 like Mac OS X; en-us'  
  OSVer(0) = "(ip[honead]+)\; .+ ([^\s\;]+) like Mac OS X"  
  'Android 4.0.2'  
  OSVer(1) = "(android) ([^\s\;]+)"  
  'User Agent in BlackBerry 10'  
  'Mozilla/5.0 (BB10; <Device Type>) AppleWebKit/537.10+ (KHTML, like Gecko) Version/<BB Version #> Mobile Safari/537.10+'  
  OSVer(2) = "(BB10|BlackBerry).+ BB Version[/\b\s]([^\s\;]+)"  
  OSVer(3) = "(windows) (NT [^\s\;]+)"  
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.13+ (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2'  
  'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; sv-se) AppleWebKit/85.7 (KHTML, like Gecko) Safari/85.5'  
  OSVer(4) = "(Macintosh).+ (MAC OS [^\s\;]+)"  
  'Mozilla/5.0 (Linux; Android 4.1.2; GT-N7000 Build/JZO54K) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.123 Mobile Safari/537.22 OPR/14.0.1025.52315'  
  'Android.+ (G[^\s\;]+-[^\s\;]+)'  
  'Mozilla/5.0 (Linux; U; Android 4.0.2; en-us; Galaxy Nexus Build/ICL53F) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 :'  
  'Android.+ (Galaxy [^\s\;]+)'  
  DEVICEVer(0) = "Android.+ ((Galaxy |GT-|GN-)([^\s\;]+))"  
  WEBKITVer(0) = "(([^\s]+webkit)\/([^\s\;]+))"  
  Set objRegExp_1 = CreateObject("vbscript.regexp")  
  objRegExp_1.Global = True  
  objRegExp_1.IgnoreCase = True  
   'Regex Test Start'  
   'objRegExp_1.Pattern = "(Chrome)\/([^\s]+)" '  
   'objRegExp_1.Pattern = "Android.+ ((Galaxy |GT-|GN-)([^\s\;]+))"  
   'strToSearch = "Mozilla/5.0 (Linux; Android 4.1.2; GT-N7000 Build/JZO54K) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.123 Mobile Safari/537.22 OPR/14.0.1025.52315"  
   'strToSearch = "Mozilla/5.0 (Linux; U; Android 4.0.2; en-us; Galaxy Nexus Build/ICL53F) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.123 Mobile Safari/537.22 OPR/14.0.1025.52315"  
   'Set regExp_Matches = objRegExp_1.Execute(strToSearch)  
   'For Each objMatch In regExp_Matches  
   'DEVICE = objMatch.SubMatches(0)  
   'DEVICEVERSION = objMatch.SubMatches(1)  
   'MsgBox ("This string is a valid email address." & DEVICE & " " & DEVICEVERSION)  
   'Next  
   'Regex Test end'  
  Do While True  
  Count = Count + 1  
   If IsEmpty(Cells(Count, UACol).Value) Then  
    Cells(Count, UACol).Value = "end"  
    Exit Do  
   End If  
  'Browser Check'  
  For Each element In BrowserVer  
   objRegExp_1.Pattern = element  
   Set regExp_Matches = objRegExp_1.Execute(Cells(Count, UACol).Value)  
   For Each objMatch In regExp_Matches  
   Cells(Count, BrowserCol).Value = objMatch.SubMatches(0)  
   Cells(Count, BrowserVerCol).Value = objMatch.SubMatches(1)  
   GoTo OSCheck  
   Next objMatch  
  Next element  
 OSCheck:  
 'OS Check'  
  For Each element1 In OSVer  
   objRegExp_1.Pattern = element1  
   Set regExp_Matches = objRegExp_1.Execute(Cells(Count, UACol).Value)  
   For Each objMatch In regExp_Matches  
   Cells(Count, OSCol).Value = objMatch.SubMatches(0)  
   Cells(Count, OSVerCol).Value = objMatch.SubMatches(1)  
   GoTo DEVICECheck  
   Next objMatch  
  Next element1  
 DEVICECheck:  
 'Device Check'  
  For Each element In DEVICEVer  
   objRegExp_1.Pattern = element  
   Set regExp_Matches = objRegExp_1.Execute(Cells(Count, UACol).Value)  
   For Each objMatch In regExp_Matches  
   Cells(Count, DEVICEVerCol).Value = objMatch.SubMatches(0)  
   GoTo WEBKITCheck  
   Next objMatch  
  Next element  
 WEBKITCheck:  
 'WEBKIT Check'  
  For Each element In WEBKITVer  
   objRegExp_1.Pattern = element  
   Set regExp_Matches = objRegExp_1.Execute(Cells(Count, UACol).Value)  
   For Each objMatch In regExp_Matches  
   Cells(Count, WEBKITVerCol).Value = objMatch.SubMatches(0)  
   GoTo NextLoop  
   Next objMatch  
  Next element  
 NextLoop:  
  Loop  
  End Sub  

References required to produce this:

iOS Useragent info
http://www.webapps-online.com/online-tools/user-agent-strings/dv/operatingsystem51849/ios
http://www.enterpriseios.com/wiki/Complete_List_of_iOS_User_Agent_Strings

A more detailed Javascript UA parser
https://github.com/faisalman/ua-parser-js/blob/master/src/ua-parser.js
http://faisalman.github.io/ua-parser-js/

VBA Regular Expression
https://www.udemy.com/blog/vba-regex/
https://www.udemy.com/blog/vba-regex/

2 comments:

  1. Lovely piece of work, thank you.
    I had to tweak it for IE11 as we don't generally know it as "Trident" and I added a small table to my spreadsheet to translate "NT x.y" to something understandable:
    NT 10.0 10
    NT 6.3 8.1
    NT 6.2 8
    NT 6.1 7
    NT 6.0 Vista
    NT 5.1 XP

    I'm happy to share my changes if the OP is interested.
    (Produced with Office 2013)

    ReplyDelete
    Replies
    1. I would love to have your additions for IE 11/Trident and NT translations!

      Delete