ramlicious Blogs by Tina & Prabhu

March 4, 2018

Mass Shooting in the USA

Filed under: News,R — Prabhuram @ 8:52 pm

January 20, 2018

Creating a Table of Word Frequencies in R

Filed under: R — Prabhuram @ 1:35 pm

The function word_freq_table() takes a text and returns a table listing each word in the text together with its number of occurrences. Several optional parameters can be used to clean up the list of words:

  • to_upper: logical; if TRUE (the default), the text is converted to upper case.
  • remove_punct: a regular expression matching the punctuation/special characters to remove (default "[[:punct:]]"; pass "" to keep them).
  • remove_numbers: logical; if TRUE, the digits 0 through 9 are removed.
  • replace_CR: the string that replaces each carriage-return character (the default "\r" leaves them unchanged).
  • replace_LF: the string that replaces each line-feed character (the default "\n" leaves them unchanged).
  • remove_repetitive_space: logical; if TRUE, runs of blank spaces are collapsed into a single blank.
  • wordlength_atleast: the minimum word length to include in the resulting table (default 1).
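To see what these options do, here is a minimal sketch that applies the same cleansing steps, in the same order and with the default settings, to a short made-up string. It swaps in base R's toupper(), gsub() and strsplit() for the stringr calls, and clean_words() is a hypothetical helper, not part of the function listed in this post. It also shows why the example call passes replace_LF = " ": with the default "\n" the line feed stays in the text, so the words on either side of a line break end up glued into one token when the text is split on blanks.

```r
# Hypothetical helper mirroring word_freq_table()'s cleansing steps in base R.
# The carriage-return step is left out because the sample text has none.
clean_words <- function(txt, replace_LF = "\n") {
  txt <- toupper(txt)                               # to_upper = TRUE
  txt <- gsub("[[:punct:]]", "", txt)               # remove_punct = "[[:punct:]]"
  txt <- gsub("[0-9]", "", txt)                     # remove_numbers = TRUE
  txt <- gsub("\n", replace_LF, txt, fixed = TRUE)  # replace_LF
  txt <- gsub(" {2,}", " ", txt)                    # remove_repetitive_space = TRUE
  unlist(strsplit(txt, " ", fixed = TRUE))          # split into words on blanks
}

# Made-up two-line input
txt <- "The cat sat.\nThe dog, too: 2 dogs!"

clean_words(txt)                    # "THE" "CAT" "SAT\nTHE" "DOG" "TOO" "DOGS"
clean_words(txt, replace_LF = " ")  # "THE" "CAT" "SAT" "THE" "DOG" "TOO" "DOGS"
```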

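Here is an example call, where txt holds the text to analyse. The %>% pipe comes from magrittr (re-exported by dplyr), so one of those packages must be loaded.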

> word_freq_table(txt, replace_LF = " ", wordlength_atleast = 5)   %>%
+   head()
# Words Total: 228, Unique: 202
# String Length Minimum: 5, Maximum: 14
          Word Freq
1 ACCELERATION    1
2    ACCORDING    1
3        AFTER    2
4      AGAINST    1
5    AGREEMENT    1
6    APPOINTED    2
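The rows come back in alphabetical order because table() sorts its levels. To list the most frequent words first, the data frame can be reordered by Freq. A minimal base-R sketch, using four rows copied from the output above as stand-in data for the full result:

```r
# Four rows copied from the output above, standing in for the full result
freq <- data.frame(Word = c("ACCORDING", "AFTER", "AGAINST", "APPOINTED"),
                   Freq = c(1L, 2L, 1L, 2L),
                   stringsAsFactors = FALSE)

# Most frequent first; ties are broken alphabetically by Word
top <- freq[order(-freq$Freq, freq$Word), ]
rownames(top) <- NULL
top
```

With dplyr loaded, dplyr::arrange(freq, dplyr::desc(Freq), Word) should give the same order.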

The function is listed below.

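word_freq_table <-
  function(txt,
           to_upper = TRUE,
           remove_punct = "[[:punct:]]",
           remove_numbers = TRUE,
           replace_CR = "\r",
           replace_LF = "\n",
           remove_repetitive_space = TRUE,
           wordlength_atleast = 1) {
    # Normalise case so that "The" and "the" are counted as one word
    if (to_upper)
      txt <- stringr::str_to_upper(txt)
    # Drop the characters matched by the remove_punct regular expression
    if (remove_punct != "")
      txt <- stringr::str_replace_all(txt, remove_punct, "")
    # Drop the digits 0-9
    if (remove_numbers)
      txt <- stringr::str_replace_all(txt, "[0-9]", "")
    # Replace carriage returns and line feeds (the defaults leave them as they are)
    txt <- stringr::str_replace_all(txt, "[\r]", replace_CR)
    txt <- stringr::str_replace_all(txt, "[\n]", replace_LF)
    # Collapse runs of blanks into a single blank
    if (remove_repetitive_space)
      txt <- stringr::str_replace_all(txt, "[ ]{2,}", " ")

    # Split on blanks; unlist() also copes with a txt vector of several strings
    Word <- unlist(stringr::str_split(txt, " "))
    # Count each word; keep Word as character rather than factor
    freq <- as.data.frame(table(Word), stringsAsFactors = FALSE)

    # Keep words of at least the requested length (the default of 1 also drops "")
    freq <- dplyr::filter(freq, stringr::str_length(Word) >= wordlength_atleast)
    cat(sprintf("# Words Total: %s, Unique: %s\n", sum(freq$Freq), nrow(freq)))
    cat(sprintf(
      "# String Length Minimum: %s, Maximum: %s\n",
      min(stringr::str_length(freq$Word)),
      max(stringr::str_length(freq$Word))
    ))
    freq
  }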

December 20, 2017

Graphs with GrViz through R-DiagrammeR

Filed under: R — Prabhuram @ 5:11 pm

Drawing flowcharts and graphs with DiagrammeR is very easy. Below is a quick snippet to encourage you to explore grViz and DiagrammeR further.


library(DiagrammeR)

grViz("
digraph nicegraph {
  graph [rankdir = TB]

  node [fontname = Helvetica,
        shape = rectangle, fixedsize = true, width = 2]

  // green nodes
  node [fillcolor = 'YellowGreen', style = filled]
  A [label = 'ABC']
  D [label = 'XYZ']

  // orange nodes
  node [fillcolor = 'Orange', style = filled]
  B [label = 'ABC']
  C [label = 'ABD']
  E [label = 'DEF']
  F [label = 'MNO']

  edge [arrowhead = vee]
  A -> B; A -> C; D -> E; D -> F
}
")
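When the edges already live in an R data frame, the DOT string passed to grViz() can be assembled with base R instead of being typed by hand. A minimal sketch, assuming a hypothetical two-column edge list that reproduces the edges of the graph above; DiagrammeR is only needed for the final rendering step:

```r
# Hypothetical edge list reproducing the edges of the example graph
edges <- data.frame(from = c("A", "A", "D", "D"),
                    to   = c("B", "C", "E", "F"),
                    stringsAsFactors = FALSE)

# One "from->to" statement per row, wrapped in a digraph
dot <- paste0(
  "digraph nicegraph {\n",
  "graph [rankdir = TB]\n",
  "edge [arrowhead = vee]\n",
  paste0(edges$from, "->", edges$to, collapse = "\n"),
  "\n}"
)
cat(dot)

# Build the widget only if DiagrammeR is installed;
# print(graph_widget) displays it in the viewer or browser
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  graph_widget <- DiagrammeR::grViz(dot)
}
```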

May 28, 2013

Using R

Filed under: R — Prabhuram @ 4:27 pm

I have recently started using R instead of Excel for statistical analysis. Let me start with a simple example so that you can compare it with other tools you may already be using.

mydata <- read.table("input_file.txt", header = TRUE)  # import the file into a data frame

In this example I am reading a very large fixed-format file (columns separated by blanks, column names in the first line). To display just the first few records, run:

head(mydata)
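read.table() splits columns on white space, so it handles a fixed-format file only when its fields are separated by blanks. If the fields run into each other, base R's read.fwf() reads by column widths instead. A minimal sketch with made-up widths and records written to a temporary file; for a file whose first line holds column names, skip = 1 together with explicit col.names is one way to handle the header, since read.fwf's own header option expects the names to be delimited by sep.

```r
# Made-up records: 3-char ID, 5-char name, 2-digit age, no separators
tmp <- tempfile(fileext = ".txt")
writeLines(c("001Alice34", "002Bob  41"), tmp)

fwf_data <- read.fwf(tmp,
                     widths = c(3, 5, 2),
                     col.names = c("id", "name", "age"),
                     # passed on to read.table(): keep leading zeros in id
                     colClasses = c("character", "character", "integer"),
                     strip.white = TRUE)  # drop the padding in "Bob  "
head(fwf_data)
```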

If you simply want to convert this file to CSV, all you need is something like this:

write.table(mydata, file.path(getwd(), "test.csv"), row.names = FALSE, sep = ",", quote = TRUE)  # row.names = FALSE keeps the header aligned with the data
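Base R also provides write.csv(), a wrapper around write.table() with sep = "," preset. Passing row.names = FALSE keeps R's row numbers out of the file. A minimal sketch with a made-up data frame (standing in for mydata) that writes a CSV to a temporary directory and reads it back to check the round trip:

```r
# Made-up data standing in for mydata
mydata <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

out <- file.path(tempdir(), "test.csv")
write.csv(mydata, out, row.names = FALSE)

readLines(out)                   # first line is "id","value"
back <- read.csv(out)
isTRUE(all.equal(back, mydata))  # TRUE
```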

R is very fast and its console is simple. It also offers a wide range of statistical functions and can draw bar and pie charts. To learn more about R, go to http://www.r-project.org/
