Category Archives: Programming

Retrieving Oracle table results from Python

A simple two liner to retrieve data results from Oracle.

# 1. Download and unzip 
#      Instant Client Package - Basic (64 bit or 32 bit same as your Python client) 
#      from http://www.oracle.com/technetwork/topics/winx64soft-089540.html
# 2. Add ..\instantclient_12_2 to environment variable "Path"
# 3. Install module cx_Oracle: 
#      py!pip install C:/cx_Oracle-6.0.2-cp36-cp36m-win_amd64.whl

import cx_Oracle as ora
import pandas as pd
conn = ora.connect("uname", "pwd", "host:port/sid")
dset = pd.read_sql("SELECT * FROM TABLE", conn)

Combining multiple files to one Excel file

This python code below allows you to choose multiple files (.csv, .xls, .xlsx) and create a multi-worksheet Excel file.


# -*- coding: utf-8 -*-

from tkinter import filedialog
from tkinter import *
from pathlib import Path
import pandas as pd

root = Tk()
infiles = filedialog.askopenfilenames(
                title = "Select the file you wish to convert", 
                filetypes = (("Excel files", "*.xls;*.xlsx"),("all files","*.*")))
counter = 1
if (len(infiles) > 0):
    p = Path(infiles[0])
    outfile = str(p.parent) + "\\newfile.xlsx"
    print(outfile)
    writer = pd.ExcelWriter(outfile, engine='xlsxwriter')

for infile in infiles:
    p = Path(infile)
    if (p.suffix == ".csv"):
        inp=pd.read_csv(infile, sep = ",")
    elif (p.suffix in [".xlsx", ".xls"]):
        inp=pd.read_excel(infile, dype = str)

    counter = counter + 1
    sheetname = p.name
    sheetname = sheetname.replace(p.suffix, "")
    inp.to_excel(writer, index = False, sheet_name = sheetname[0:30])

try:
    writer.save()
except NameError:
    print("No file selected")

Converting CSV to Table rows in SQL Server and Oracle

Another useful block of code that will help in converting CSV to a table row.

with temp as
(
    select 108 Id, 'australia, australia, united states of america, iran' str
    UNION ALL
    select 109, 'india, iran, iraq'
    UNION ALL
    SELECT 109, 'israel, australia'
), xmlData as
(
     SELECT Id,  
         CAST ('<M>' + REPLACE(str, ',', '</M><M>') + '</M>' AS XML) AS Data  
     FROM  temp
)
SELECT xmlData.Id,  
     ltrim(rtrim(Split.a.value('.', 'VARCHAR(100)'))) AS Data  
FROM  xmlData CROSS APPLY Data.nodes ('/M') AS Split(a)

and in Oracle,

with temp as
(
    select 108 Id, 'australia, australia, united states of america, iran' str  from dual
    UNION ALL
    select 109, 'india, iran, iraq' from dual
    UNION ALL
    SELECT 109, 'israel, australia' FROM dual
)
SELECT DISTINCT ID, ltrim(regexp_substr(str, '[^' || ',' || ']+', 1, LEVEL)) str
FROM temp T
CONNECT BY regexp_substr(str, '[^' || ',' || ']+', 1, LEVEL) IS NOT NULL
ORDER BY ID;

Preparing NULL counts for SQL and Oracle data tables

I am posting this one after a very long time. Just thought this information will be useful for someone.

Here is the T-SQL for SQL Server that will fetch the columns in a table and provide you the NULL, ZERO, NEGATIVE and UNIQUE count of each table columns.

USE <database>
GO

BEGIN
	SET NOCOUNT ON
	DECLARE @TAB NVARCHAR(100) = <table_name>
	DECLARE @SQL NVARCHAR(500)
	DECLARE @RECORDS INT, @TOTRECORDS INT
	DECLARE @OBJECT_ID INT,
		@TOTAL INT
	DECLARE @COL NVARCHAR(100),
		@COLTYPE NVARCHAR(100),
		@TS NVARCHAR(100)
	SELECT @OBJECT_ID = OBJECT_ID FROM SYS.TABLES WHERE NAME = @TAB

	DECLARE col_cursor CURSOR FOR 
	SELECT	COLUMNS.NAME COLNAME, SYSTYPES.NAME COLTYPE
	FROM SYS.COLUMNS 
		INNER JOIN SYS.systypes ON SYSTEM_TYPE_ID = XUSERTYPE
		WHERE object_id = @OBJECT_ID;

	OPEN col_cursor
	FETCH NEXT FROM col_cursor INTO @COL, @COLTYPE

	set @sql = N'SELECT @ptotrecords = count(1) FROM ' + @TAB 
	EXECUTE sp_executeSQL
		@sql,
        N'@ptotrecords INT OUTPUT', @ptotrecords=@totrecords OUTPUT

    WHILE @@FETCH_STATUS = 0
    BEGIN
		set @sql = N'SELECT @precords = count(1) FROM ' + @TAB + ' WHERE ' + @COL + ' IS NULL'
		EXECUTE sp_executeSQL
			@sql,
            N'@precords INT OUTPUT', @precords=@records OUTPUT
		PRINT(convert(varchar, CURRENT_TIMESTAMP) + CHAR(9)+ @TAB + '.' + @COL + '[' + @COLTYPE + ']' + CHAR(9) + 'NULL' + CHAR(9) + convert(varchar, @records) + CHAR(9) + convert(varchar, @totrecords)+ CHAR(9) + convert(varchar, @records*1.00/@totrecords*100))

		set @sql = N'SELECT @precords = count(distinct ' + @COL + ') FROM ' + @TAB  + ' WHERE ' + @COL + ' IS NOT NULL'
		EXECUTE sp_executeSQL
			@sql,
            N'@precords INT OUTPUT', @precords=@records OUTPUT
		PRINT(convert(varchar, CURRENT_TIMESTAMP) + CHAR(9)+ @TAB + '.' + @COL + '[' + @COLTYPE + ']' + CHAR(9) + 'UNIQUE' + CHAR(9) + convert(varchar, @records) + CHAR(9) + convert(varchar, @totrecords)+ CHAR(9) + convert(varchar, @records*1.00/@totrecords*100))

		IF @COLTYPE IN ('MONEY', 'NUMERIC', 'INT') 
		BEGIN
			set @sql = N'SELECT @precords = count(1) FROM ' + @TAB + ' WHERE ' + @COL + ' = 0'
			EXECUTE sp_executeSQL
				@sql,
				N'@precords INT OUTPUT', @precords=@records OUTPUT
			PRINT(convert(varchar, CURRENT_TIMESTAMP) + CHAR(9)+ @TAB + '.' + @COL + '[' + @COLTYPE + ']' + CHAR(9) + 'ZERO' + CHAR(9) + convert(varchar, @records) + CHAR(9) + convert(varchar, @totrecords)+ CHAR(9) + convert(varchar, @records*1.00/@totrecords*100))
			set @sql = N'SELECT @precords = count(1) FROM ' + @TAB + ' WHERE ' + @COL + ' < 0'
			EXECUTE sp_executeSQL
				@sql,
				N'@precords INT OUTPUT', @precords=@records OUTPUT
			PRINT(convert(varchar, CURRENT_TIMESTAMP) + CHAR(9)+ @TAB + '.' + @COL + '[' + @COLTYPE + ']' + CHAR(9) + 'NEGATIVE' + CHAR(9) + convert(varchar, @records) + CHAR(9) + convert(varchar, @totrecords)+ CHAR(9) + convert(varchar, @records*1.00/@totrecords*100))
		END
		FETCH NEXT FROM col_cursor INTO @COL, @COLTYPE
    END
	CLOSE col_cursor;
	DEALLOCATE col_cursor;
END

PL/SQL code below does a similar job in Oracle.

SET SERVEROUTPUT ON SIZE 1000000
DECLARE
   L_ROW_COUNT                        PLS_INTEGER;
   L_TOTAL_COUNT                        PLS_INTEGER;
   COL  VARCHAR(200);
BEGIN
  FOR Y IN (SELECT * FROM ALL_TABLES WHERE TABLESPACE_NAME = '<table space name>' AND TABLE_NAME LIKE '<table name>%'  ORDER BY OWNER, TABLE_NAME ) LOOP
    EXECUTE IMMEDIATE    'SELECT COUNT(*)FROM ' || Y.OWNER || '.' || Y.TABLE_NAME INTO L_TOTAL_COUNT;
     FOR x IN (SELECT * FROM all_tab_columns WHERE table_name = Y.TABLE_NAME and owner = Y.OWNER ORDER BY  column_id ) LOOP
        COL := X.TABLE_NAME || '.' || X.COLUMN_NAME || '[' || X.DATA_TYPE || ']' || CHR(9) ;
  
        EXECUTE IMMEDIATE    'SELECT COUNT(*) FROM ' || X.OWNER || '.' || X.TABLE_NAME || ' WHERE ' || X.TABLE_NAME || '.' || X.COLUMN_NAME || ' IS NULL' INTO L_ROW_COUNT;
        DBMS_OUTPUT.PUT_LINE (COL || 'NULL' || CHR(9) || TO_CHAR (l_row_count, 'fm999,999,999,990') || CHR(9) || TO_CHAR (L_TOTAL_COUNT, 'fm999,999,999,990') || CHR(9) || TO_CHAR (L_ROW_COUNT*1.0/L_TOTAL_COUNT*100.0, 'fm999,999,999,990'));
  
        EXECUTE IMMEDIATE    'SELECT COUNT(DISTINCT ' || X.TABLE_NAME || '.' || X.COLUMN_NAME || ') FROM ' || X.OWNER || '.' || X.TABLE_NAME INTO L_ROW_COUNT;
        DBMS_OUTPUT.PUT_LINE (COL || 'UNIQUE' || CHR(9) || TO_CHAR (L_ROW_COUNT, 'fm999,999,999,990')|| CHR(9) || TO_CHAR (L_TOTAL_COUNT, 'fm999,999,999,990') || CHR(9) || TO_CHAR (L_ROW_COUNT*1.0/L_TOTAL_COUNT*100.0, 'fm999,999,999,990'));
        IF X.DATA_TYPE IN ( 'NUMBER') THEN
          EXECUTE IMMEDIATE    'SELECT COUNT(*) FROM ' || X.OWNER || '.' || X.TABLE_NAME || ' WHERE ' || X.TABLE_NAME || '.' || X.COLUMN_NAME || ' = 0' INTO L_ROW_COUNT;
          DBMS_OUTPUT.PUT_LINE (COL || 'ZERO' || CHR(9) || TO_CHAR (L_ROW_COUNT, 'fm999,999,999,990')|| CHR(9) || TO_CHAR (L_TOTAL_COUNT, 'fm999,999,999,990') || CHR(9) || TO_CHAR (L_ROW_COUNT*1.0/L_TOTAL_COUNT*100.0, 'fm999,999,999,990'));
          
          EXECUTE IMMEDIATE    'SELECT COUNT(*) FROM ' || X.OWNER || '.' || X.TABLE_NAME || ' WHERE ' || X.TABLE_NAME || '.' || X.COLUMN_NAME || ' < 0' INTO L_ROW_COUNT;
          DBMS_OUTPUT.PUT_LINE (COL || 'NEGATIVE' || CHR(9) || TO_CHAR (L_ROW_COUNT, 'fm999,999,999,990')|| CHR(9) || TO_CHAR (L_TOTAL_COUNT, 'fm999,999,999,990') || CHR(9) || TO_CHAR (L_ROW_COUNT*1.0/L_TOTAL_COUNT*100.0, 'fm999,999,999,990'));
        END IF;
     END LOOP;
  END LOOP;
END;

Extracting XML Element and Attribute values in Oracle

Let’s assume that you have a XML (xmltype) table column.

<?xml version="1.0" encoding="UTF-8"?>
  <form>
     <data k="a1">
        <value>row 1</value>
     </data>
     <data k="a2">
       <value>row 2</value>
     </data>
</form>

One way to easily extract data is given below

Select Xmltype(t.Col).Extract('//form/data[@k="a1"]/value/text()').Getstringval()  a1
From Mytable t

and reading an attribute value can be like this:

Select Xmltype(t.Col).Extract('//form/data[1]/@k').Getstringval()  k
From Mytable t

A Quick and Dirty Algorithm for fuzzy matching

The VBA program below is an easy-to-use Excel function that can be used to quickly check two strings through a fuzzy logic.  I created this program to locate some phonetic equivalents in Tamil.

Input 1 Input 2 Fuzzy 1 Fuzzy 2 Result
Madrass Madras mrs mrs Match
Rangoon Yangooon rnkn nkn Not a Match
Thiruvizhaa Thiruvila rvl rvl Match
Aan Azhagan Aanazhagan anlkn anlkn Match

Feel free to apply your language equivalents that you think are relevant.

Option Explicit

'FuzzyLogic
Function Cleanup(Text As String) As String
    Dim firstChar As String
    firstChar = Mid(Text, 1, 1)
    Text = Mid(Text, 2) 'keep the first char
    Text = LCase(Text) 'change the case
    Text = RemoveSplChars(Text) 'remove spl characters
    Text = RemoveVowels(Text) 'remove vowels
    Text = ReplaceLangEquivalents(Text) 'replace lang equivalents
    Text = RemoveRepeatingChars(Text) ' remove repeating  (or duplicate) chars
    Text = LCase(firstChar & Text) 'include the first char

    Text = ReplaceLangEquivalents(Text) 'replace lang equivalents once more

    Cleanup = Text
End Function

Private Function RemoveSplChars(Text As String) As String
    Text = Replace(Text, ".", "")
    Text = Replace(Text, " ", "")
    RemoveSplChars = Text
End Function

Private Function RemoveVowels(Text As String) As String
    Text = Replace(Text, "a", "", , , vbTextCompare)
    Text = Replace(Text, "e", "", , , vbTextCompare)
    Text = Replace(Text, "i", "", , , vbTextCompare)
    Text = Replace(Text, "o", "", , , vbTextCompare)
    Text = Replace(Text, "u", "", , , vbTextCompare)
    RemoveVowels = Text
End Function

Private Function RemoveRepeatingChars(Text As String) As String
    Dim i As Integer
x:
    'remove 2 chars
    For i = 1 To Len(Text) - 2
        If Mid(Text, i, 2) = Mid(Text, i + 2, 2) Then
            Text = Replace(Text, Mid(Text, i, 2) & Mid(Text, i + 2, 2), Mid(Text, i, 2), , , vbTextCompare)
            GoTo x:
        End If
    Next

y:
    'remove 1 chars
    For i = 1 To Len(Text) - 1
        If Mid(Text, i, 1) = Mid(Text, i + 1, 1) Then
            Text = Replace(Text, Mid(Text, i, 1) & Mid(Text, i + 1, 1), Mid(Text, i, 1), , , vbTextCompare)
            GoTo y:
        End If
    Next
    RemoveRepeatingChars = Text
End Function

Private Function ReplaceLangEquivalents(Text As String, Optional Lang As String = "ta") As String
    Select Case (Lang)
        Case "ta":

            Text = Replace(Text, "gh", "g", , , vbTextCompare)
            Text = Replace(Text, "th", "d", , , vbTextCompare)
            Text = Replace(Text, "dh", "d", , , vbTextCompare)
            Text = Replace(Text, "zh", "l", , , vbTextCompare)
            Text = Replace(Text, "sh", "s", , , vbTextCompare)
            Text = Replace(Text, "dr", "r", , , vbTextCompare)

            Text = Replace(Text, "g", "k", , , vbTextCompare) 'sakodhara, sagodhara
            Text = Replace(Text, "y", "", , , vbTextCompare) 'koyil, kovil, koil
    End Select
    ReplaceLangEquivalents = Text
End Function

Comparing Excel Worksheets using VBA

My wife asked me if I could help her with a macro that will compare two Excel Worksheets to generate a variance or a diff report. Here it is;

Sub CompareSheets(Sheet1 As Worksheet, Sheet2 As Worksheet, RangeToCompare As String, SkipFirstRow As Boolean)
    Dim i As Integer, j As Integer
    Dim val1, val2
    Dim ResultSheet As Worksheet
    Set ResultSheet = Application.Sheets.Add()
    ResultSheet.Name = "ResultSheet" & Format(DateTime.Date, "yyyymmdd") & "-" & Format(DateTime.Time, "hhmmss")
    For i = 1 To Sheet1.UsedRange.Rows.Count
        For j = 1 To Sheet1.UsedRange.Columns.Count
            If j = 1 Or (SkipFirstRow = True And i = 1) Then
                ResultSheet.Cells(i, j) = Sheet1.Cells(i, j)
            Else
                val1 = Sheet1.Cells(i, j)
                val2 = Application.WorksheetFunction.VLookup(Sheet1.Cells(i, 1), Sheet2.Range(RangeToCompare), j, False)
                ResultSheet.Cells(i, j) = (val1 = val2)
            End If

        Next j
    Next i
End Sub

This Sub-routine can be simply called like this:

Sub Test()
    CompareSheets Application.Sheets(2), Application.Sheets(3), "A:E", True
End Sub

Using R

I have recently started using R instead of Excel for performing statistical analysis. Let me start with a simple example so that you can compare it with other tools that you may be already using.

mydata = read.table("input_file.txt", TRUE) #imports an input file to a variable

In this example I am reading a very large fixed-format file (with column names as the first line), so to simply display the few records, you can run

head(mydata)

Say if you simply want to transform this fixed-format file to a CSV file all you need to do is something like this:

write.table(mydata, file.path(getwd(), "test.csv"), row.names = TRUE, sep = ",", quote = TRUE)

The tool is lightning fast, and the console is simple. Moreover it supports multiple statistical functions to generate graphical bars and pie charts. To learn more about R, go to http://www.r-project.org/

Levenshtein and Damerau–Levenshtein edit distance

I have put the functions that can be embedded as excel functions to calculate the distance between two strings using the algorithms Levenshtein and Damerau–Levenshtein. The four functions available are:

  • Levenshtein – returns the edit distance between two words
  • Damerau – returns the edit distance between two words (a variant of Levenshtein with Transposition)
  • NormalizedLevenshtein – returns Levenshtein score between 0 and 1
  • NormalizedDamerau – returns Damerau-Levenshtein score between 0 and 1
Function Levenshtein(ByVal String1 As String, ByVal String2 As String) As Long

    Dim i As Long, j As Long
    Dim lngString1 As Long
    Dim lngString2 As Long
    Dim Distance() As Long

    lngString1 = Len(String1)
    lngString2 = Len(String2)
    ReDim Distance(lngString1, lngString2)

    For i = 0 To lngString1
        Distance(i, 0) = i
    Next

    For j = 0 To lngString2
        Distance(0, j) = j
    Next

    For i = 1 To lngString1
        For j = 1 To lngString2
            If Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j, 1)) Then
                Distance(i, j) = Distance(i - 1, j - 1)
            Else
                Distance(i, j) = Application.WorksheetFunction.Min _
                (Distance(i - 1, j) + 1, _
                 Distance(i, j - 1) + 1, _
                 Distance(i - 1, j - 1) + 1)
            End If

        Next
    Next

    Levenshtein = Distance(lngString1, lngString2)

End Function

Function Damerau(String1 As String, String2 As String) As Long
    Dim i As Long, j As Long
    Dim lngString1 As Long
    Dim lngString2 As Long
    Dim Distance() As Long
    Dim cost As Integer

    lngString1 = Len(String1)
    lngString2 = Len(String2)
    ReDim Distance(lngString1 + 1, lngString2 + 1)

    For i = 0 To lngString1
        Distance(i, 0) = i
    Next

    For j = 0 To lngString2
        Distance(0, j) = j
    Next

    For i = 1 To lngString1
        For j = 1 To lngString2
            If Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j, 1)) Then
                cost = 0
            Else
                cost = 1
            End If

            Distance(i, j) = Application.WorksheetFunction.Min _
            (Distance(i - 1, j) + 1, _
             Distance(i, j - 1) + 1, _
             Distance(i - 1, j - 1) + cost)

            If (i > 1 And j > 1) Then
                If (Asc(Mid$(String1, i, 1)) = Asc(Mid$(String2, j - 1, 1)) And Asc(Mid$(String1, i - 1, 1)) = Asc(Mid$(String2, j, 1))) Then
                Distance(i, j) = Application.WorksheetFunction.Min _
                    (Distance(i, j), _
                     Distance(i - 2, j - 2) + cost)
                End If
            End If

        Next
    Next

    Damerau = Distance(lngString1, lngString2)

End Function

Function NormalizedDamerau(ByVal String1 As String, ByVal String2 As String) As Double
    Dim lngDamerau
    lngDamerau = Damerau(String1, String2)
    NormalizedDamerau = 1 - lngDamerau / IIf((Len(String1) + 1 > Len(String2) + 1), Len(String1) + 1, Len(String2) + 1)
End Function

Function NormalizedLevenshtein(ByVal String1 As String, ByVal String2 As String) As Double
    Dim lngLevenshtein
    lngLevenshtein = Levenshtein(String1, String2)
    NormalizedLevenshtein = 1 - lngLevenshtein / IIf((Len(String1) + 1 > Len(String2) + 1), Len(String1) + 1, Len(String2) + 1)
End Function

Splitting large files using VBScript

Ever thought that you wanted to split a large multi-gigabit file into multiple chunks. Here is a simple VB script file that will do the trick for you. It may take a longer time for larger files, but it will serve its purpose.

''This simple VBScript spilts large text files into multiple files

Dim  Counter
Const InputFile = "C:\input.txt"
Const OutputFile = "C:\output"
Const RecordSize = 200000
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile (InputFile, ForReading)
Counter = 0
FileCounter = 0
Set objOutTextFile = Nothing

Do Until objTextFile.AtEndOfStream
	if Counter = 0 Or Counter = RecordSize Then
		Counter = 0
		FileCounter = FileCounter + 1
		if Not objOutTextFile is Nothing then objOutTextFile.Close
		Set objOutTextFile = objFSO.OpenTextFile( OutputFile & "_" & FileCounter & ".csv", ForWriting, True)
	end if
	strNextLine = objTextFile.Readline
	objOutTextFile.WriteLine(strNextLine)
	Counter = Counter + 1
Loop
objTextFile.Close
objOutTextFile.Close
Msgbox "Split process complete"

Constants that may require changes are InputFile-the input file name, OutputFile-the output file path along with the output file prefix and Recordsize-number of line in each output file. Multiple files beginning with [OutputFile] will get created.