A Corpus-Based Research Study of the Play The Tragedy of Hamlet, Prince of Denmark

Corpora have been put to many different uses in fields as varied as natural language processing, critical discourse analysis and applied linguistics, to mention just a few. Corpus linguistics is an area of study based on the simultaneous analysis of many texts. Its aim is to search for linguistic patterns or language units such as keywords, lexemes, phraseological patterns and grammatical associations. The most prominent way of studying phrases in the corpus linguistic approach is collocation, the statistical tendency of words to co-occur. For example, the words big, good and great are collocates of deal as a noun, as in a big deal, a good deal or a great deal.

In contrast, software that presents concordance lines simply identifies the target item (usually a word or phrase) each time it occurs in the corpus and presents each instance, or as many as are required, to the corpus user. Usually this is done with the target item in the center of the screen and a few words to the left and right of that item. This ‘key word in context’ presentation, as it is known, has a number of uses. Even the small amount of context is usually enough to show what the word or phrase means, what phrases it often occurs in, and/or the discourse function that it has. Quantitative information about word meaning and function that is not available automatically can therefore be calculated.

The character of a corpus is determined by the type of texts that constitute it. Whereas Dr. Johnson’s corpus consisted largely of works by Shakespeare, Milton, Dryden and other literary figures, a modern general corpus will contain both written and transcribed spoken material from a wide range of media, such as books, magazines, newspapers, emails, television, radio and conversations. In this study, I apply corpus methods to the analysis of Shakespeare’s Hamlet. The research questions are:

  • What lexical patterns are characteristic of Shakespeare’s play?
  • What textual meaning do those patterns suggest?
  • What are the strengths and limitations of the corpus approach in the study of literary texts?

In the following sections I first give a brief introduction, followed by an account of the methodology adopted in this study, including explanations of the corpus descriptive tools and data preparation. The results of the study are then reported, followed by a discussion of the strengths and limitations of the corpus-driven approach to literary texts, as observed from a corpus of Shakespeare’s play Hamlet.


The study begins with the observation that corpus linguistic techniques have been adopted infrequently in the stylistic analysis of literary texts. A corpus-driven approach was applied to an analysis of Shakespeare’s play Hamlet in order to see how well this method works with literary texts. The limitations of automatic analysis of texts should always be recognized; after all, computers will still only do what humans tell them to do.


  • Descriptive Tools:

Three descriptive corpus linguistic tools are used in this study to explore stylistic features and their textual functions in Shakespeare’s play The Tragedy of Hamlet, Prince of Denmark. The tools are: “keyness”, “collocations” and “concordances”.

  • Keyness:

To answer the research questions stated above, the concept of “keyword” in corpus linguistics is a starting point in the analytical procedures. According to Scott and Tribble (2006), keywords are lexical items of significance to a text in question, because of their “unusual frequency in comparison with a reference corpus of some suitable kind”. The “unusual frequency” here refers to both “unusually high” and “unusually low” frequency. For the purpose of the present study, only items with “unusually high” frequency are considered.

Keywords are not necessarily the most frequent words found in a text. Keywords are important to the text because they are used “unusually often” when compared with other texts. To illustrate, if we consider a corpus of Shakespeare’s Hamlet alone, or even all of Shakespeare’s works, the definite article “the” is found to be the most frequent word. But we cannot say that it marks Shakespeare’s writing style: because “the” is an article, it is likely to be used very often in any piece of writing, not just in Shakespeare’s plays. Therefore, we need to compare Shakespeare’s works with those of other authors so that we can see whether he really used “the” significantly more than others. When compared with other authors’ writing, the article does not turn up as a keyword in Shakespeare’s works because it is also used frequently by other authors. On the other hand, the word “blood”, whose frequency is lower than that of “the”, appears in the keyword list of Shakespeare’s Hamlet. This means that Shakespeare used the word “blood” significantly more often than other writers. Therefore, as Baker (2006) puts it, a keyword list gives a measure of saliency, not just frequency, of the lexical items in a text and hence can suggest further examination of their textual functions. This is the reason why keywords are fundamental to the corpus-stylistic analysis of Shakespeare’s play Hamlet in the present study: through the Keyword function in Rayson’s (2007) Wmatrix Tools, lexical items that are characteristic of Hamlet are extracted, some of which will be further investigated in detail.

According to Scott and Tribble (2006), three kinds of words usually come out of a comparison as keywords: proper nouns, words that “human beings would recognize” as key, which tend to indicate a text’s “aboutness”, and words that are not usually identified consciously by readers as key but nonetheless occur in significantly high frequencies and so can be indicators of the style of a text, rather than of its content.
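The comparison against a reference corpus described above can be sketched in code. The example below is a minimal illustration in Python, assuming log-likelihood as the keyness statistic (a common choice in corpus linguistics; the study relies on Wmatrix and does not spell out the formula). The function names and the toy token lists are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one word in a study corpus vs. a reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

def keywords(study_tokens, ref_tokens, top=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        freq_ref = ref.get(word, 0)
        # Keep only "unusually high" items, as the present study does.
        if freq / n_study > freq_ref / n_ref:
            scored.append((word, log_likelihood(freq, n_study, freq_ref, n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented toy data, for illustration only.
study_tokens = "the blood of the king the blood".split()
ref_tokens = "the cat and the dog and the bird".split()
for word, score in keywords(study_tokens, ref_tokens):
    print(f"{word}\t{score:.3f}")
```

In this toy data “the” is the most frequent word in both lists, so it scores close to zero, while “blood” ranks first because it is absent from the reference list. This mirrors the distinction between frequency and saliency drawn above.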

  • Collocations:

After keywords are extracted, their significance in the play needs to be explained. To this end, the concept of collocation, defined by Hoey (1991: 67) as “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context”, is drawn upon. This definition emphasizes that collocation is not just a random co-occurrence of words, e.g. “she + is”; rather, the co-occurrence takes place in a text for some reason, as seen from the phrase “with greater than random probability in its (textual) context”. For example, as shown by Stubbs (2001: 28), common collocates of the word “seek” include “help”, “advice” and “support”. An examination of the collocational patterns of a word in a text can therefore allow us to see the relationship between lexical items, which in turn enables us to see the way words are used to create meanings in a text. To find out which words occur with the keywords “with greater than random probability” in Hamlet, a computer-assisted extraction of collocates, through the statistical measure Mutual Information (MI), is adopted.
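As a rough illustration of how MI-based collocate extraction can work, the sketch below scores each word found within a fixed window around a node word. It assumes one common formulation, MI = log2(observed / expected), where the expected co-occurrence count is f(node) × f(collocate) × window size / corpus size. The source does not state the exact formula, span or thresholds used, so the `span` and `min_cooc` parameters, the function name and the toy sentence are all assumptions.

```python
import math
from collections import Counter

def mi_collocates(tokens, node, span=4, min_cooc=2):
    """Score words co-occurring with `node` within `span` tokens on either side."""
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    scores = {}
    for word, observed in cooc.items():
        if observed < min_cooc:
            continue  # very rare pairs give unstable MI values
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Invented toy data, for illustration only.
tokens = "my lord the king my lord the queen my lord hamlet".split()
for word, score in mi_collocates(tokens, "lord", span=1):
    print(f"{word}\t{score:.3f}")
```

A positive score means the pair occurs together more often than chance would predict. MI is known to favour low-frequency words, which is why a minimum co-occurrence threshold is applied here.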


Table 1: Key Semantic Fields in Shakespeare’s Hamlet

Key Semantic Field             | Sample Words in the Semantic Field
-------------------------------|-----------------------------------------------
Degree boosters                | so, more, most, very, much
Kin                            | father, mother, brother
Tragedy                        | dead, blood, murder, sword, poison
Royalty                        | my lord, majesty, court
                               | love, heaven, God, ghost, madness, hell, alas
Strong obligation or necessity | should, must, have to


Table 2: Key Grammatical Categories in Hamlet

Key Grammatical Category                         | Sample Words in the Grammatical Category
-------------------------------------------------|-----------------------------------------
Degree adverb                                    | so, more, most, very, much
Be – infinitive                                  | be
Modal auxiliary                                  | should, would, could, might
Third person singular objective personal pronoun | he, him, her
Have – infinitive                                | have


  • Concordance:

A concordance is an alphabetical list of all the words used in a book or set of books, with information about where they can be found and usually about how they are used. In other words, a concordance shows the different words of a literary work in all the contexts in which they are used. To find concordances in Shakespeare’s play Hamlet, we searched through our data for all the instances of a word, together with the surrounding context for each instance that was found. The result is called a key-word-in-context concordance. We typed in a certain word (the key word), and in a few seconds we had a list of neatly lined-up examples of that word as found in the data. Then we sorted the list in a variety of ways in order to get a clearer picture of how the word relates to its context. No matter how we sorted the words, their context remained available.
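The search-and-sort procedure described above can be sketched as follows. This is a minimal key-word-in-context routine written for illustration, not the tool used in the study; the function names, the `width` parameter and the one-line sample are assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, hit, right) tuples for every whole-word match of keyword."""
    text = " ".join(text.split())  # collapse line breaks so contexts stay on one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

def format_kwic(hits, width=30):
    """Right-align the left context so the key word lines up in one column."""
    return [f"{left:>{width}}  {hit}  {right}" for left, hit, right in hits]

sample = "To be, or not to be, that is the question."
hits = kwic(sample, "be")
# Sorting by the right-hand context is one of several possible orderings.
for line in format_kwic(sorted(hits, key=lambda h: h[2].lower())):
    print(line)
```

Because each hit keeps its left and right context as separate fields, the list can be re-sorted by either side without losing the context, which is the property noted at the end of the paragraph above.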

Table 3: Concordance in Shakespeare’s play Hamlet

[Table contents not recoverable; only the column header “No. of hits” survives.]

Interpretations of Overall Findings

A comparison of the tables of key semantic fields and key grammatical categories shows that a number of fields overlap, both within the same category and across different linguistic groups. The keywords “so”, “more” and “most” belong to the key grammatical category “degree adverb” as well as to the key semantic field “degree boosters”. This suggests that these overlapping items are especially characteristic of Shakespeare’s play “Hamlet”.

When these overlapping categories or items are grouped together, the following groups of key linguistic features emerge, which mark the style of Shakespeare’s play.

(1) Words related to tragedy, comprising the keywords “dead”, “sword”, “poison”, “blood” and “murder”, which lie in the key semantic field “tragedy”.

(2) Words showing family relationships, comprising the keywords “father”, “mother” and “brother”, which lie in the key semantic field “kin”.

(3) Words showing royalty or the noble class, comprising “my lord”, “majesty” and “court”, which are part of the key semantic field “royalty”.

(4) Words related to high degree, comprising the keywords “so”, “more”, “most”, “very” and “much”, which lie in two fields, i.e. the key semantic field “degree booster” and the key grammatical category “degree adverb”.

(5) Modal words, comprising the keywords “should”, “would”, “could” and “might”, which are part of the key grammatical category “modal auxiliary”.

On the basis of Scott and Tribble’s categorization of keywords, the above groups of linguistic features can be divided into two main groups. The first group consists of lexical items that can be identified through observation and that suggest the content of the text. This group comprises Groups (1), (2) and (3) above. The lexical items of these groups match what has been discussed in literary criticism of Shakespeare’s play “Hamlet”. For instance, the play deals with royal life, which is represented by the keyness of “my lord”, “majesty” and “court” and by the semantic field “royalty”. The play is also tragic, as indicated by keywords like “dead”, “sword”, “poison”, “blood” and “murder” in the semantic field “tragedy”.

The other group consists of lexical items which, as Scott and Tribble state, are not consciously identified by readers as important but nonetheless occur in significantly high frequencies, and so can be taken as indicators of the style of the text rather than of its content.

Words Related To High Degree:

Among the key linguistic features, words related to high degree are the most characteristic of Shakespeare’s play “Hamlet”. This is shown by their occurrence in more than one linguistic category:

  • Key words: “so”, “more”, “most”
  • Key semantic field: “degree booster”
  • Key grammatical category: “degree adverb”

It is not only the density of words related to high degree but also the degree of their keyness that marks their greater significance to “Hamlet”.

An examination of the concordance lines of the words in this group shows that words denoting a high degree are used in close proximity to one another.

A high density of high degree words at some points in the play constitutes an exaggerated discourse in “Hamlet”. This exaggeration then encourages readers to feel that the part of the text they are reading cannot be interpreted at face value.
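One simple way to check for this kind of clustering is to locate every high degree word and flag neighbouring occurrences that fall within a few tokens of each other. The sketch below is only an illustration of that idea; the function name, the `max_gap` threshold and the sample sentence are assumptions, and the study itself reports reading concordance lines rather than running such a procedure.

```python
def close_pairs(tokens, targets, max_gap=5):
    """Return index pairs of consecutive target words at most `max_gap` tokens apart."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in targets]
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a <= max_gap]

DEGREE_WORDS = {"so", "more", "most", "very", "much"}

# Invented sentence, for illustration only.
tokens = "it is so very much more than i can say and so it ends".split()
for start, end in close_pairs(tokens, DEGREE_WORDS):
    print(" ".join(tokens[start:end + 1]))
```

Passages that produce many such pairs would be candidates for the exaggerated discourse described above and could then be read in full.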

Auxiliary “Be” And “Have”:

Auxiliaries “be” and “have” are helping verbs which generally do not express the content of a text. In this play, however, their role seems greater than that of the other groups. “Be” occurs 222 times and “have” occurs 183 times.

To find out in what sort of textual environment these verbs are used, we examined word clusters, listed below with their frequencies in parentheses.

Word clusters of “be”

To be (34)

Be , - - be (22)

be- - (24)

Word clusters of “have”

have you (15)

you have (15)

have  ? (18)
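Cluster counts of this kind can be approximated by counting every n-word sequence that contains the target word. The following sketch assumes simple whitespace tokenisation and two-word clusters; the function name, its parameters and the sample tokens are hypothetical, and the figures it prints are not the study's results.

```python
from collections import Counter

def clusters(tokens, target, n=2, min_freq=1):
    """Count n-word sequences that contain `target`, most frequent first."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if target in gram:
            counts[gram] += 1
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c >= min_freq]

# Illustrative tokens only.
tokens = "to be or not to be that is".split()
for cluster, freq in clusters(tokens, "be"):
    print(f"{cluster} ({freq})")
```

Raising `n` gives longer clusters, and `min_freq` filters out sequences that occur too rarely to be of interest.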

Thus the corpus-based approach has directed attention to this group of lexical items, which are used strategically to create and hint at meaning between the lines in this play.

Author: Center for Recent Innovations - European Union
Published date: 05-09-16
Springer with the Collaboration of LearnRnd
Source URL: http://www.learnrnd.com/detail.php?id=A_Corpus_based_Research_Study_Of_Novel_The_Tragedy_of_Hamlet,_Prince_of_Denmark

