Category Archives: Scripting

A Quick and Dirty Algorithm for fuzzy matching

The VBA program below is an easy-to-use Excel function that can be used to quickly check two strings through a fuzzy logic.  I created this program to locate some phonetic equivalents in Tamil.

Input 1 Input 2 Fuzzy 1 Fuzzy 2 Result
Madrass Madras mrs mrs Match
Rangoon Yangooon rnkn nkn Not a Match
Thiruvizhaa Thiruvila rvl rvl Match
Aan Azhagan Aanazhagan anlkn anlkn Match

Feel free to apply your language equivalents that you think are relevant.

Option Explicit

'FuzzyLogic
Function Cleanup(Text As String) As String
    Dim firstChar As String
    firstChar = Mid(Text, 1, 1)
    Text = Mid(Text, 2) 'keep the first char
    Text = LCase(Text) 'change the case
    Text = RemoveSplChars(Text) 'remove spl characters
    Text = RemoveVowels(Text) 'remove vowels
    Text = ReplaceLangEquivalents(Text) 'replace lang equivalents
    Text = RemoveRepeatingChars(Text) ' remove repeating  (or duplicate) chars
    Text = LCase(firstChar & Text) 'include the first char

    Text = ReplaceLangEquivalents(Text) 'replace lang equivalents once more

    Cleanup = Text
End Function

Private Function RemoveSplChars(Text As String) As String
    Text = Replace(Text, ".", "")
    Text = Replace(Text, " ", "")
    RemoveSplChars = Text
End Function

Private Function RemoveVowels(Text As String) As String
    Text = Replace(Text, "a", "", , , vbTextCompare)
    Text = Replace(Text, "e", "", , , vbTextCompare)
    Text = Replace(Text, "i", "", , , vbTextCompare)
    Text = Replace(Text, "o", "", , , vbTextCompare)
    Text = Replace(Text, "u", "", , , vbTextCompare)
    RemoveVowels = Text
End Function

Private Function RemoveRepeatingChars(Text As String) As String
    Dim i As Integer
x:
    'remove 2 chars
    For i = 1 To Len(Text) - 2
        If Mid(Text, i, 2) = Mid(Text, i + 2, 2) Then
            Text = Replace(Text, Mid(Text, i, 2) & Mid(Text, i + 2, 2), Mid(Text, i, 2), , , vbTextCompare)
            GoTo x:
        End If
    Next

y:
    'remove 1 chars
    For i = 1 To Len(Text) - 1
        If Mid(Text, i, 1) = Mid(Text, i + 1, 1) Then
            Text = Replace(Text, Mid(Text, i, 1) & Mid(Text, i + 1, 1), Mid(Text, i, 1), , , vbTextCompare)
            GoTo y:
        End If
    Next
    RemoveRepeatingChars = Text
End Function

Private Function ReplaceLangEquivalents(Text As String, Optional Lang As String = "ta") As String
    Select Case (Lang)
        Case "ta":

            Text = Replace(Text, "gh", "g", , , vbTextCompare)
            Text = Replace(Text, "th", "d", , , vbTextCompare)
            Text = Replace(Text, "dh", "d", , , vbTextCompare)
            Text = Replace(Text, "zh", "l", , , vbTextCompare)
            Text = Replace(Text, "sh", "s", , , vbTextCompare)
            Text = Replace(Text, "dr", "r", , , vbTextCompare)

            Text = Replace(Text, "g", "k", , , vbTextCompare) 'sakodhara, sagodhara
            Text = Replace(Text, "y", "", , , vbTextCompare) 'koyil, kovil, koil
    End Select
    ReplaceLangEquivalents = Text
End Function

Comparing Excel Worksheets using VBA

My wife asked me if I could help her with a macro that will compare two Excel Worksheets to generate a variance or a diff report. Here it is;

Sub CompareSheets(Sheet1 As Worksheet, Sheet2 As Worksheet, RangeToCompare As String, SkipFirstRow As Boolean)
    Dim i As Integer, j As Integer
    Dim val1, val2
    Dim ResultSheet As Worksheet
    Set ResultSheet = Application.Sheets.Add()
    ResultSheet.Name = "ResultSheet" & Format(DateTime.Date, "yyyymmdd") & "-" & Format(DateTime.Time, "hhmmss")
    For i = 1 To Sheet1.UsedRange.Rows.Count
        For j = 1 To Sheet1.UsedRange.Columns.Count
            If j = 1 Or (SkipFirstRow = True And i = 1) Then
                ResultSheet.Cells(i, j) = Sheet1.Cells(i, j)
            Else
                val1 = Sheet1.Cells(i, j)
                val2 = Application.WorksheetFunction.VLookup(Sheet1.Cells(i, 1), Sheet2.Range(RangeToCompare), j, False)
                ResultSheet.Cells(i, j) = (val1 = val2)
            End If

        Next j
    Next i
End Sub

This Sub-routine can be simply called like this:

Sub Test()
    CompareSheets Application.Sheets(2), Application.Sheets(3), "A:E", True
End Sub

Levenshtein and Damerau–Levenshtein edit distance

I have put the functions that can be embedded as excel functions to calculate the distance between two strings using the algorithms Levenshtein and Damerau–Levenshtein. The four functions available are:

  • Levenshtein – returns the edit distance between two words
  • Damerau – returns the edit distance between two words (a variant of Levenshtein with Transposition)
  • NormalizedLevenshtein – returns Levenshtein score between 0 and 1
  • NormalizedDamerau – returns Damerau-Levenshtein score between 0 and 1
Function Levenshtein(ByVal String1 As String, ByVal String2 As String) As Long

    Dim i As Long, j As Long
    Dim lngString1 As Long
    Dim lngString2 As Long
    Dim Distance() As Long

    lngString1 = Len(String1)
    lngString2 = Len(String2)
    ReDim Distance(lngString1, lngString2)

    For i = 0 To lngString1
        Distance(i, 0) = i
    Next

    For j = 0 To lngString2
        Distance(0, j) = j
    Next

    For i = 1 To lngString1
        For j = 1 To lngString2
            If Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j, 1)) Then
                Distance(i, j) = Distance(i - 1, j - 1)
            Else
                Distance(i, j) = Application.WorksheetFunction.Min _
                (Distance(i - 1, j) + 1, _
                 Distance(i, j - 1) + 1, _
                 Distance(i - 1, j - 1) + 1)
            End If

        Next
    Next

    Levenshtein = Distance(lngString1, lngString2)

End Function

Function Damerau(String1 As String, String2 As String) As Long
    Dim i As Long, j As Long
    Dim lngString1 As Long
    Dim lngString2 As Long
    Dim Distance() As Long
    Dim cost As Integer

    lngString1 = Len(String1)
    lngString2 = Len(String2)
    ReDim Distance(lngString1 + 1, lngString2 + 1)

    For i = 0 To lngString1
        Distance(i, 0) = i
    Next

    For j = 0 To lngString2
        Distance(0, j) = j
    Next

    For i = 1 To lngString1
        For j = 1 To lngString2
            If Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j, 1)) Then
                cost = 0
            Else
                cost = 1
            End If

            Distance(i, j) = Application.WorksheetFunction.Min _
            (Distance(i - 1, j) + 1, _
             Distance(i, j - 1) + 1, _
             Distance(i - 1, j - 1) + cost)

            If (i > 1 And j > 1) Then
                If (Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j - 1, 1)) And Asc(Mid$(String1, i - 1, 1)) = Asc(Mid$(String2, j, 1))) Then
                Distance(i, j) = Application.WorksheetFunction.Min _
                    (Distance(i, j), _
                     Distance(i - 2, j - 2) + cost)
                End If
            End If

        Next
    Next

    Damerau = Distance(lngString1, lngString2)

End Function

Function NormalizedDamerau(ByVal String1 As String, ByVal String2 As String) As Double
    Dim lngDamerau
    lngDamerau = Damerau(String1, String2)
    NormalizedDamerau = 1 - lngDamerau / IIf((Len(String1) + 1 > Len(String2) + 1), Len(String1) + 1, Len(String2) + 1)
End Function

Function NormalizedLevenshtein(ByVal String1 As String, ByVal String2 As String) As Double
    Dim lngLevenshtein
    lngLevenshtein = Levenshtein(String1, String2)
    NormalizedLevenshtein = 1 - lngLevenshtein / IIf((Len(String1) + 1 > Len(String2) + 1), Len(String1) + 1, Len(String2) + 1)
End Function

Splitting large files using VBScript

Ever thought that you wanted to split a large multi-gigabit file into multiple chunks. Here is a simple VB script file that will do the trick for you. It may take a longer time for larger files, but it will serve its purpose.

''This simple VBScript spilts large text files into multiple files

Dim  Counter
Const InputFile = "C:\input.txt"
Const OutputFile = "C:\output"
Const RecordSize = 200000
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile (InputFile, ForReading)
Counter = 0
FileCounter = 0
Set objOutTextFile = Nothing

Do Until objTextFile.AtEndOfStream
	if Counter = 0 Or Counter = RecordSize Then
		Counter = 0
		FileCounter = FileCounter + 1
		if Not objOutTextFile is Nothing then objOutTextFile.Close
		Set objOutTextFile = objFSO.OpenTextFile( OutputFile & "_" & FileCounter & ".csv", ForWriting, True)
	end if
	strNextLine = objTextFile.Readline
	objOutTextFile.WriteLine(strNextLine)
	Counter = Counter + 1
Loop
objTextFile.Close
objOutTextFile.Close
Msgbox "Split process complete"

Constants that may require changes are InputFile-the input file name, OutputFile-the output file path along with the output file prefix and Recordsize-number of line in each output file. Multiple files beginning with [OutputFile] will get created.

VBScript to read an Excel file

Here is a small program that will just use VBScript to read an excel file:

Const ForWriting = 2
Folder = "C:\Users\Prabhuram\Desktop\"
Set objExcel = CreateObject("Excel.Application")
Set objWorkbook = objExcel.Workbooks.Open( Folder + "Applications.xls")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile (Folder + "Out-File.txt", ForWriting, True)
For Each mySheet In objWorkbook.Worksheets
  Set myRng = mySheet.UsedRange
  For Each myCell In myRng.Cells
    objTextFile.WriteLine(cstr(myCell.Value))
  Next
Next
objWorkbook.close
objTextFile.Close

Using VBA to show an excel file as HTML

Here is the HTML code to show an excel file created using MS EXCEL 2003 as a web page. Any excel file can be used and the path has to the changed in the following code, though there is a small drawback which is the absolute path as shown below.

<html>
<head>
    <style>
        body
        {
            font-family: "fixedsys";
        }
    </style>
    <head>
        <body>
        <script type="text/vbscript">
            Set objExcel = CreateObject("Excel.Application")
            Set objWorkbook = objExcel.Workbooks.Open _
                ("E:\MulMatchLookup.xls")
            intRow = 1
            Do Until objExcel.Cells(intRow,1).Value = ""
                intColumn = 1
                Do Until objExcel.Cells(intRow,intColumn).Value = ""
                    document.write(objExcel.Cells(intRow, intColumn).Value + "|")
                    intColumn = intColumn + 1
                Loop
                document.write("<br/>")
                intRow = intRow + 1
            Loop
            objExcel.Quit
        </script>
</body>
</html>

XML to CSV using XSL and VBScript

We have seen XML Transformation using XSL: A sample before to transform an XML to a HTML using XSLT. This time we will see how to convert an XML to a CSV file.

The output will be a simple tab separated list with carriage return as the delimiter like this:

Empire Burlesque        Bob Dylan       USA
Roja    AR Rehman       India
Eagles  Eagles  USA

We will use the same catalog.xml which we used in our earlier example. And our XSL is going to generate a text instead of html which can done by this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="text"/>
	<xsl:strip-space elements="*"/>
	<xsl:param name="separator" select="'&#9;'"/>
	<xsl:param name="line-separator" select="'&#13;&#10;'"/>

	<xsl:template match="/">
		<xsl:for-each select="catalog/cd">
			<xsl:value-of select="title"/>
			<xsl:value-of select="$separator"/>
			<xsl:value-of select="artist"/>
			<xsl:value-of select="$separator"/>
			<xsl:value-of select="country"/>
			<xsl:value-of select="$separator"/>
			<xsl:value-of select="$line-separator"/>
		</xsl:for-each>
	</xsl:template>
</xsl:stylesheet>

A simple vbscript will do the trick to transform the xml to a file.

 

The complete code is available here: catalog-csv.zip (1.05 kb)

Changing Delimiters in CSV files using VBScript

Here is a VBScript file to transform CSV files from one delimiter to another. The sample here will convert a Tilde(~) delimited file to Tab delimited. Feel free to take the code and change it. Four identifier values in VBS file should be changed before executing it. The identifiers are:

  1. Input folder (reads all files from this folder)
  2. Output folder
  3. Find Delimeter
  4. Replace Delimiter

Always run this file from command prompt or from a batch file, else change the "stdout" in the code. After execution the program gives a summary about the file and record information. To run the script simply execute the following commands:

C:\>wscript /h:cscript /B
C:\>ChangeDelimiter.vbs

Download source here: ChangeDelimiter.vbs (1.43 kb)

Creating Excel Charts on the fly with Macros

To generate a pie chart on the fly, all I have is an excel template with a macro and on a need basis I simply copy data from other sources to the template (with is basically an excel file with macros enabled) and with the click of a button I generate a chart that gets copied to clipboard and simply I paste it wherever necessary.

Download macro here: CopyAsChart.vb (1.03 kb)

Hiding/Masking Credentials (User name and Password) in Batch Files

I had few ISQL and BCP jobs that are scheduled to run from SQL Servers. Hmm, yes there is a problem here when you have to use the login credentials like this:

"C:\Program Files\Microsoft SQL Server\90\Tools\Binn\osql\osql.exe" ^
    -U%username% -P%password% -S%server%

I was thinking of a way to encrypt the user name and password. The fact is that there is no easy way to do this. But there is a work around.

The solution is:

  • Create a configuration file in a remote folder to which only a particular user account has access to. Eventually since the job will be executed as a scheduled job, the user account that runs the original batch task should have access to this remove folder and not all the user.
  • List the user crendentials that needs to be securely kept. Say for example, create an INI file Credentials.INI (53.00 bytes) with required settings:
    • server=mySQLServer
    • username=sa
    • password=P@@ssw0rd
  • Read the parameters from your batch file ReadINI.bat (163.00 bytes) or any file that wishes to use the credentials. Note that the Crendentials.INI can refer to a
    • for /f "tokens=1,2 delims==" %%a in (Credentials.INI) do (
      set %%a=%%b
      )

Now you can simple access the parameters in your INI as %username%, %server%, etc.