As Gauker (1994) puts it, there are two ways that a person can talk to themselves: out loud and in silence. Psychologists use the terms egocentric speech and private speech to refer to speech that a person says aloud to themselves. It can be distinguished from normal speech because egocentric speech is usually meant to be heard only by the speaker, as a set of instructions or a way to keep focus. Talking in silence, on the other hand, is not so easy to define. Because inner speech is such an abstract process, there is considerable disagreement about what can be classified as inner speech.
One way to define inner speech is as the articulation of words without sound, almost as if somebody had taken a TV remote and muted your voice. In a study on the articulation of silent speech, Oppenheim and Dell (2010) wrote, “...we have demonstrated that articulation changes inner speech, and this demonstration implies that inner speech cannot be independent of the movements that a person would use to express it” (p. 1158). This is not entirely false. Words can be articulated silently, but silent articulation is a result of inner speech and should not be considered inner speech itself. This silent articulation of words is what we perceive as the vivid voice we “hear” in our heads. This vivid voice, or inner voice, is a form of verbal imagery (Gauker, 1994). Verbal imagery is the way we observe our own inner speech. Another belief is that inner speech is the same as thought. However, thought and inner speech are not one and the same. Inner speech is more like the carrier of thought (Solokov, 1995), just as the silent articulation of words can sometimes be the carrier of inner speech.
Brown's model of the levels of speech (Figure 1) helps to clarify the distinction between silent articulation and inner speech. Brown (2010) says that inner speech involves incomplete perception and incomplete articulation. This fits with our previous definition of inner speech as a carrier of thought for two reasons. First, a carrier of thought would not need to be articulated. Second, because the speech is produced in the mind, it would activate perceptual areas less than speech from another speaker would.










