Category Archives: R

Creating a Table of Word Frequencies in R

Function word_freq_table() takes a text and returns a table with the words within the text and the number of occurrences of that word. A list of optional attributes are also available to cleanse the resulting list of words:

  • to_upper: a logical parameter to inform if the text should be converted to upper case.
  • remove_punct: informs the punctuation/special characters to be removed.
  • remove_numbers: a logical parameter to inform if the numbers/letters 0 through 9 must be removed.
  • replace_CR: informs if a “carriage return” character must be replaced with something else.
  • replace_LF: informs if a “line feed” characters must be replaced with something else.
  • remove_repetitive_space: a logical parameter to inform if repetitive blank spaces must be replaced with a single blank space.
  • wordlength_atleast: informs the minimum length of words to be listed in the resulting table.

Here is the code block to call the function.

> word_freq_table(txt, replace_LF = " ", wordlength_atleast = 5)   %>%
+   head()
# Words Total: 228, Unique: 202
# String Length Minimum: 5, Maximum: 14
          Word Freq
2    ACCORDING    1
3        AFTER    2
4      AGAINST    1
5    AGREEMENT    1
6    APPOINTED    2

The function is listed below.

word_freq_table <-
           to_upper = TRUE,
           remove_punct = "[[:punct:]]",
           remove_numbers = TRUE,
           replace_CR = "\r",
           replace_LF = "\n",
           remove_repetitive_space = TRUE,
           wordlength_atleast = 1) {
    if (to_upper)
      txt <- stringr::str_to_upper(txt)
    if (remove_punct != "")
      txt <- stringr::str_replace_all(txt, remove_punct, "")
    if (remove_numbers)
      txt <- stringr::str_replace_all(txt, "[0-9]", "")
    txt <- stringr::str_replace_all(txt, "[\r]", replace_CR)
    txt <- stringr::str_replace_all(txt, "[\n]", replace_LF)
    if (remove_repetitive_space)
      txt <- stringr::str_replace_all(txt, "[ ]{2,}", " ")
    Word <- stringr::str_split(txt, " ")
    freq <- table(Word)
    freq <-
    freq <- dplyr::filter(freq, stringr::str_length(Word) >= wordlength_atleast)
    cat(sprintf("# Words Total: %s, Unique: %s\r\n", sum(freq$Freq), nrow(freq)))
      "# String Length Minimum: %s, Maximum: %s\r\n",

Graphs with GrViz through R-DiagrammeR

It is very easy to draw flowcharts or graphs through DiagrammeR. Below is a quick snippet to encourage you to look for more about GrViz and DiagrammeR.


digraph nicegraph {
graph[rankdir = TB]
node [fontname = Helvetica,
shape = rectangle, fixedsize = true, width = 2]
node [fillcolor = 'YellowGreen', style = filled]
A[label = 'ABC']
D[label = 'XYZ']

node [fillcolor = 'Orange', style = filled]
B[label = 'ABC']
C[label = 'ABD']
E[label = 'DEF']
F[label = 'MNO']

edge [arrowhead = vee]

Using R

I have recently started using R instead of Excel for performing statistical analysis. Let me start with a simple example so that you can compare it with other tools that you may be already using.

mydata = read.table("input_file.txt", TRUE) #imports an input file to a variable

In this example I am reading a very large fixed-format file (with column names as the first line), so to simply display the few records, you can run


Say if you simply want to transform this fixed-format file to a CSV file all you need to do is something like this:

write.table(mydata, file.path(getwd(), "test.csv"), row.names = TRUE, sep = ",", quote = TRUE)

The tool is lightning fast, and the console is simple. Moreover it supports multiple statistical functions to generate graphical bars and pie charts. To learn more about R, go to