
Introduction

It’s been ten days since I published an article on my initial solution for the IronScripter challenge "Building a PowerShell Command Inventory." That solution relied on regular expressions, commonly called regex.

The article included a primer on regex mechanics and how to use regex in PowerShell. However, the regex for each code type that I wanted to discover produced some false positives.

I asked myself, How could I get better, more accurate, code discovery?

I then remembered that, over the last few years, I have seen a smidgen of articles on PowerShell’s abstract syntax tree (AST), most notably Mike F. Robbins’ series of articles on Learning about the PowerShell Abstract Syntax Tree (AST).

Most general PowerShell scripters have probably not encountered the AST in the wild. At least, not knowingly.

This short article will not go deep into the PowerShell AST; please see Mike’s articles for a deep dive. However, I will explain how I use it in my code.

Intermediate Challenge Revisited

My original solution for the intermediate challenge was a function called Measure-PSCodeLine. It iterated through each line and matched on regex \S.

This approach was a bit slow.

Jeff Hicks suggested I take a look at Measure-Object, which has a parameter set for line, word, and character count. Using this, I updated my function and renamed it since it really isn’t tied to PowerShell files specifically.
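
To make the two counting approaches concrete, here is a minimal sketch that runs both against a small throwaway file. It is not the code from either function; the file name and contents are made up for illustration, and the only cmdlets used are Get-Content, Where-Object, and Measure-Object.

```powershell
# Build a small throwaway file (hypothetical name and contents):
# two code lines, one empty line, and one whitespace-only line.
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'Measure-Demo.txt'
Set-Content -Path $Path -Value 'Get-Date', '', '   ', 'Get-Process'

# Regex idea: count the lines that contain at least one non-whitespace character.
$RegexCount = @(Get-Content -Path $Path | Where-Object { $_ -match '\S' }).Count

# Measure-Object idea: let the cmdlet count lines, words, and characters in one pass.
$Measure = Get-Content -Path $Path | Measure-Object -Line -Word -Character

# Show both results side by side.
[pscustomobject]@{
    RegexCodeLines = $RegexCount
    Lines          = $Measure.Lines
    Words          = $Measure.Words
    Characters     = $Measure.Characters
}

Remove-Item -Path $Path
```

Running both against a file that mixes empty and whitespace-only lines is a quick way to see whether the two techniques agree on what counts as a line.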

My new function, Measure-FileLine, is much faster; just check out this improvement.

Comparison of Measure-PSCodeLine and Measure-FileLine

The properties in the output from both are a bit different. The main thing to focus on is TotalCodeLines in Measure-PSCodeLine and TotalLines in Measure-FileLine. They should be identical, and since they are not, I will err on the side of the one using Measure-Object.

Updated Intermediate Challenge Solution

Advanced Challenge Revisited

My original solution for the advanced challenge attempted to detect commands and declarations for functions, classes, variables, and more using regex.

The new-and-improved solution uses the PowerShell .NET class [System.Management.Automation.Language.Parser] and its static methods ParseFile() and ParseInput(). The former reads and parses a file, while the latter parses script text supplied as a string, such as the contents of a scriptblock.
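
As a rough illustration of ParseInput(), the sketch below parses a one-line string rather than a file. The sample script text is made up; the two [ref] variables receive the tokens and any parse errors, and the return value is the AST for the parsed text.

```powershell
# Variables that the parser fills in through [ref].
$Tokens = $ParseErrors = $null

# Hypothetical sample input; any script text works here.
$ScriptText = 'Get-ChildItem -Path . | Sort-Object -Property Name'

# ParseInput() takes a string; to parse a scriptblock, pass its text, e.g. $ScriptBlock.ToString().
$Ast = [System.Management.Automation.Language.Parser]::ParseInput($ScriptText, [ref]$Tokens, [ref]$ParseErrors)

# Surface any syntax problems before trusting the tokens.
if ($ParseErrors.Count -gt 0) {
    $ParseErrors | ForEach-Object { Write-Warning $_.Message }
}

# List what the tokenizer found.
$Tokens | Select-Object -Property Kind, TokenFlags, Text
```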

Jeff’s expanded solution also uses this class and should definitely be reviewed to see how he built the module, handled cross-platform execution, used Write-Information, and created a PowerShell class.

Parser Class

The Parser class requires two [ref] variables that store its output: an array of tokens and an array of parse errors. Checking out the documentation for the Token class, we see that the class includes a TokenFlags property. This eventually leads us to the TokenFlags enum documentation, where we can see that one of the fields in the bitwise enum is CommandName. Using that, we can find all commands, including cmdlets, aliases, and executables.

$Tokens = $ParseErrors = $null
$null = [System.Management.Automation.Language.Parser]::ParseFile($File.FullName,[ref]$Tokens,[ref]$ParseErrors)

CommandName TokenFlag

Let’s take a look at sample Token[] output where the TokenFlags contains CommandName.

PS> $Tokens.Where{$_.TokenFlags -eq 'CommandName'} | ft -AutoSize
Value                     Text                       TokenFlags       Kind HasError Extent
-----                     ----                       ----------       ---- -------- ------
Get-Content               Get-Content               CommandName    Generic    False Get-Content
Robocopy.exe              Robocopy.exe              CommandName    Generic    False Robocopy.exe
                          mkdir                     CommandName Identifier    False mkdir
Test-Path                 Test-Path                 CommandName    Generic    False Test-Path
Get-ChildItem             Get-ChildItem             CommandName    Generic    False Get-ChildItem
                          GetPowerShellCode         CommandName Identifier    False GetPowerShellCode
Get-CommandsFromAstTokens Get-CommandsFromAstTokens CommandName    Generic    False Get-CommandsFromAstTokens
                          Group                     CommandName Identifier    False Group
                          Select                    CommandName Identifier    False Select
Sort-Object               Sort-Object               CommandName    Generic    False Sort-Object
Get-ElapsedTimeText       Get-ElapsedTimeText       CommandName    Generic    False Get-ElapsedTimeText
Write-Information         Write-Information         CommandName    Generic    False Write-Information

We see that Kind can be Generic or Identifier. We also see that aliases could be of Identifier kind and that executables could be Generic.

I wanted to be able to include the command type, such as cmdlet, alias, filter, function, or executable. I also wanted to include the file and location where the command appeared.
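
The file and location are available on each token's Extent property. The sketch below is one way to pull them out, not necessarily how Measure-PSCommand does it; the demo file name and its contents are made up. It also tests the flag with -band instead of -eq, on the assumption that a token may carry other flags alongside CommandName.

```powershell
# Write a tiny two-line script to a throwaway file (hypothetical name and contents).
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'Token-Demo.ps1'
Set-Content -Path $Path -Value 'Get-ChildItem |', '    Sort-Object -Property Name'

$Tokens = $ParseErrors = $null
$null = [System.Management.Automation.Language.Parser]::ParseFile($Path, [ref]$Tokens, [ref]$ParseErrors)

# TokenFlags is a bitwise enum, so test for the flag rather than exact equality.
$CommandNameFlag = [System.Management.Automation.Language.TokenFlags]::CommandName

$Found = foreach ($Token in $Tokens) {
    if ($Token.TokenFlags -band $CommandNameFlag) {
        [pscustomobject]@{
            Name   = $Token.Text
            File   = $Token.Extent.File
            Line   = $Token.Extent.StartLineNumber
            Column = $Token.Extent.StartColumnNumber
        }
    }
}

$Found
Remove-Item -Path $Path
```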

Check Each Command

Each command would need to pass certain criteria.

Does it appear in the Verb-Noun format?
Then it should be considered a cmdlet.
Is the command a single question mark (?) or a double question mark (??)?
If a single question mark, then it’s an alias for Where-Object; if double, then it’s not a command.

If it fails these, then I use Get-Command to retrieve the CommandType.

For aliases, I decided to use the DisplayName property which shows the definition of the alias. This makes it easier to know where you need to look to replace the aliases in your code.

I opted to use a [hashtable] to collect all processed commands so I wouldn’t waste time going through them again.
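
Put together, the checks and the cache could look something like the sketch below. The function name Resolve-CommandKind, the Verb-Noun pattern, and the 'NotACommand' and 'Unknown' labels are all assumptions made for this illustration; the real function may differ.

```powershell
# Cache of names that have already been resolved (hypothetical variable name).
$CommandCache = @{}

function Resolve-CommandKind {
    param(
        [Parameter(Mandatory)]
        [string]$Name
    )

    # Skip the work if this name has been seen before.
    if ($CommandCache.ContainsKey($Name)) {
        return $CommandCache[$Name]
    }

    if ($Name -eq '??') {
        # A double question mark is not a command.
        $Kind = 'NotACommand'
    }
    elseif ($Name -eq '?') {
        # A single question mark is the alias for Where-Object.
        $Kind = 'Alias'
    }
    elseif ($Name -match '^[A-Za-z]+-[A-Za-z]+$') {
        # Simplified Verb-Noun test (an assumption); treated as a cmdlet.
        $Kind = 'Cmdlet'
    }
    else {
        # Fall back to Get-Command for everything else.
        $Command = Get-Command -Name $Name -ErrorAction SilentlyContinue | Select-Object -First 1
        if (-not $Command) {
            $Kind = 'Unknown'
        }
        elseif ($Command.CommandType -eq 'Alias') {
            # DisplayName shows the alias together with its definition.
            $Kind = "Alias: $($Command.DisplayName)"
        }
        else {
            $Kind = [string]$Command.CommandType
        }
    }

    $CommandCache[$Name] = $Kind
    $Kind
}

# Example calls; the second call for the same name is answered from the cache.
Resolve-CommandKind -Name 'Get-ChildItem'
Resolve-CommandKind -Name 'gci'
Resolve-CommandKind -Name 'Get-ChildItem'
```

A hashtable keeps the lookup cheap, which matters when the same handful of commands appears hundreds of times across a code base.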

Parameters

Measure-PSCommand includes Raw, First, and Last parameters.

Raw
Returns all commands without grouping
First
Used by an internal Select-Object to return the first n commands
Last
Used by an internal Select-Object to return the last n commands
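
For illustration only, here is one way parameters like these might be wired to Group-Object and Select-Object. Select-CommandSummary is a hypothetical stand-in, not the actual Measure-PSCommand code, and it takes command names directly instead of parsing files.

```powershell
function Select-CommandSummary {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string[]]$CommandName,

        [switch]$Raw,

        [int]$First,

        [int]$Last
    )

    # Raw returns every occurrence; otherwise group by name and sort by frequency.
    $Result = if ($Raw) {
        $CommandName
    }
    else {
        $CommandName | Group-Object -NoElement | Sort-Object -Property Count -Descending
    }

    # Pass First and Last through to Select-Object only when the caller supplied them.
    $SelectParams = @{}
    if ($PSBoundParameters.ContainsKey('First')) { $SelectParams.First = $First }
    if ($PSBoundParameters.ContainsKey('Last'))  { $SelectParams.Last  = $Last }

    if ($SelectParams.Count -gt 0) {
        $Result | Select-Object @SelectParams
    }
    else {
        $Result
    }
}

# Example: the most frequently used command in a made-up list.
Select-CommandSummary -CommandName 'Get-Date', 'Get-Item', 'Get-Date' -First 1
```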

Updated Advanced Challenge Solution

Summary

As with most things in life, there is almost always more than one way to do something in PowerShell. Even when Jeff and I used the same base code, the Parser class in this case, we still took our code in different directions.

And for my two line-counting solutions, regex and Measure-Object, I believe the latter is the better way to go. Regardless, either way would help to bolster your skill in PowerShell.

And that is the point of the IronScripter challenges: practice, think, research, and more practice. Sacrificing just a few hours out of the month could really ramp up your PowerShell knowledge.

I encourage anyone reading this to go through the IronScripter challenges. Invest the time in your most valuable asset: you!

If you have any general questions on PowerShell, feel free to leave them in the comments or ask me on Twitter.

Thanks for reading!
