Nifwl Seirff

Honours: Reading

The links following each entry are to my summary and comments on each of the papers I researched in preparation for my honours project in Computer Science at Monash University. A link that doesn't work means I haven't typed up my summary yet.

By surname of first author:


Bonar, J. and Soloway, E., Preprogramming Knowledge: A Major Source of Misconception in Novice Programmers. 1985, In Human Computer Interaction Volume 1 Number 2 pp 133-161

Natural language (NL) is often used to 'fill the gaps' in programming language (PL) knowledge, as a novice's NL knowledge is relatively complete while their PL knowledge is not. There are many common errors in translating loops from NL to PL (for example 'and so on', or assuming that conditions are tested continuously). Novices fall back on NL so often because there are many similarities between PL and NL: both can express step-by-step instructions, loops, conditions, etc., and they share common words even though those words are used differently. A common novice error is assuming that the program knows an increment should be done automatically, as it is in maths.


Brooks, R. Towards a theory of the cognitive processes in computer programming in International Journal of Man-Machine Studies, 1977 volume 9 pages 737-751

Categorises method finding behaviour into 3 types:

It was observed that an experienced programmer spent 21.2% of the time solving a problem in method-finding activities, 6.9% understanding the problem, and 71.9% coding.


Bruckman, A. and Edwards, E., Should We Leverage Natural-Language Knowledge? An Analysis of User Errors in a Natural-Language-Style Programming Language. 1999, CHI Papers, pp 207-214

This paper analysed MOOSE Crossing, an online programmable MUD used mostly by children. The MUD was programmable in a language called MOOSE, specifically designed to be close to English: the aim was to make the language as close to natural language as possible while maintaining a regular syntax [1]. (pg 208 para 2)
Children as young as 7 were able to program in this language, and they found it easy to read each other's programs. (pg 210 para 1)
The paper analysed the errors made by a random selection of the students, sorting them into the following categories: syntax, guessing a command, literal interpretation of a metaphor, and operator precedence/combination. (abstract)
314 of 2970 total errors were natural-language related, but 41.1% were recovered from easily and 20.7% with some difficulty. (pg 214 para 4)
The study concluded that utilising pre-existing natural language knowledge is useful in programming language design for novice programmers. (pg 214 para 4)
[1] The MOOSE language was designed by Amy Bruckman with guidance from Pavel Curtis, Mitchel Resnick and Brian Silverman, and assistance from MIT students Austina DeBonte, Albert Lin and Trevor Stricker.


Brusilovsky, P. Towards an Intelligent Environment for Learning Introductory Programming. Editors: Lemut, E., Du Boulay, B. and Dettori, G. In book: Cognitive Models and Intelligent Environments for Learning Programming, Publisher: Springer-Verlag, 1993, pp 114-124

Mentions that teaching support should contain the following elements:

They developed an AI tutoring system that integrates a visual tool for stepping through programs, an interactive environment for program development, and an intelligent programming environment. The author comments that this approach can be used to develop an intelligent programming environment for C, or any other language.


Brusilovsky, P. and Kouchnirenko, A. and Miller, P. and Tomek, I. Teaching programming to novices: a review of approaches and tools. Proceedings of ED-MEDIA'94 - World Conference on Educational Multimedia and Hypermedia. Vancouver, Canada, Jun 25-30, Ottman, T. and Tomek, I. (eds), pp 103-110

Defines methodologies to help teach programming to novices:

Notes that execution of a program should visually reveal the semantics of the language constructs, and that visual cues enable novices to understand semantics, which helps prevent the development of misconceptions.

KyMir was developed to reduce the time wasted by teachers and students on trivial technical problems. It uses marginal notes so that beginners can write and debug programs without I/O commands.


Conway, D. Criteria and Consideration in the Selection of a First Programming Language Technical Report no 93/192, Monash University, December 1993

For a language to represent a paradigm, its syntax should be concise and its semantics straightforward. For communicating ideas, the jargon should be consistent with a natural expression of ideas, and transparent enough not to bog students down in the implementation of those ideas. When a language and environment are used for practising and experimenting with concepts, they should be powerful enough to support and encourage experimentation, flexible enough to allow several different ways of solving a problem, robust (bug-free, or graceful in failure), intelligent enough to identify and assist good programming style, and simple enough for a novice to gain proficiency easily.
Compilers or editors with serious bugs complicate the learning task, as do compilers that do not implement the language completely, or that overlap two languages (C and C++). Interactive programming environments are an easier introduction to programming for complete novices, but are typically not used in more advanced programming or in 'real life'.
The choice of an introductory language relies on the importance of:


Du Boulay, B. and Matthew, I. Fatal error in pass zero: how not to confuse the novices. Journal of Behaviour and Information Technology, 1984, Vol 3 No 2, pp 109-118

References a couple of efforts to detect and address semantic errors in PASCAL programs, including their own. Prototype Error checker:


Du Boulay, B. Some Difficulties of Learning to Program. In book Studying the Novice Programmer, Editors: Soloway, E. and Spohrer, J.C. Publisher: Lawrence Erlbaum Associates, 1989, pp 283-299

Propose that there are 3 main kinds of errors:

It is also commented that students often think the program/computer will do what they mean, rather than what they say. They suggest that using ordinary English words in a language, and indeed when referring to the program while teaching a language, can mislead students, encouraging them to think that the computer has the human ability to infer what is meant from what is said.

They note that the use of compiled languages causes other problems for the novice: there are usually few debugging aids to help diagnose errors, and the run-time system has little or no access to the source code from which to construct useful error messages.

Assignment is difficult. One misconception is that A = B means that B is linked to A, so that any future changes to A also affect B. They also mention that students don't assume that a variable will 'remember' its previous value unless it is explicitly changed or the contents of the memory are erased. They note that a common mistake is not initialising counters to 0 (or some other value), the analogy being that a 'box' is empty until you put something in it, so it must be 0 to begin with. Some students think that the assignment A = 8 + 3 means that A actually holds the unevaluated expression, rather than the result.
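
The assignment misconceptions above can be made concrete with a small sketch. Python is used purely for illustration (it is not the language discussed in the paper), and all the variable names are hypothetical.

```python
# Illustration only: Python stands in for the languages the paper discusses.

b = 5
a = b          # a takes b's current value (5); no lasting link between the names
a = a + 1      # changing a afterwards leaves b untouched
               # (the 'linked variables' misconception would predict b == 6)

c = 8 + 3      # the right-hand side is evaluated first, so c holds 11,
               # not the unevaluated expression "8 + 3"

count = 0      # a counter has to be given a starting value explicitly
for _ in range(3):
    count = count + 1   # count keeps its previous value between passes

print(a, b, c, count)
```

Deleting the `count = 0` line makes the loop fail with a NameError in Python, which is one way of showing that the 'box' does not start out holding 0 by itself.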

Students find arrays difficult, confusing rows with cells and indices with values. Loops cause beginners a lot of trouble, particularly FOR loops, as the counter is incremented 'behind the scenes' at the end of each cycle. It is commented that a WHILE loop is often seen as generating an interrupt when its condition becomes false, exiting the loop no matter where inside it control is.
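
A minimal sketch of the loop behaviour described above, again in Python as a stand-in language with hypothetical names. It shows that a WHILE condition is tested only at the top of each pass, so the body always runs to completion, and that a FOR loop advances its counter without any explicit statement.

```python
# Illustration only: the condition of a while loop is checked once per pass,
# at the top, not continuously while the body is running.

trace = []
n = 0
while n < 3:
    n = n + 2          # on the second pass n becomes 4, so n < 3 is now false ...
    trace.append(n)    # ... but the rest of the body still executes

# The 'interrupt' misconception would predict that 4 is never recorded.
print(trace)

visited = []
for i in range(3):     # the counter i is advanced 'behind the scenes'
    visited.append(i)  # no explicit i = i + 1 appears in the body
print(visited)
```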

I/O also causes problems because, in most languages, the fact that some kind of assignment takes place is hidden. Students often do not understand that a READ instruction will halt execution until it receives something typed by the user.


Dyck, J. and Mayer, R. BASIC versus Natural Language: Is There One Underlying Comprehension Process? CHI'85 Proceedings, Conference on Human Factors in Computing Systems, San Francisco, April, 1985

The study found evidence that the same cognitive processes are used to comprehend English and BASIC procedural statements. Many of the errors and misconceptions therefore seem to be due to unfamiliarity with the language.


Eisenstadt, M. and Lewis, M. Errors in an Interactive Programming Environment: Causes and Cures. in Novice Programming Environments: Explorations in Human-Computer Interaction and Artificial Intelligence, Chapter 5, 1992, Editors: Eisenstadt, M. and Keane, M. and Rajan, T., Publisher: Lawrence Erlbaum Associates.

This paper discussed the types and relative frequencies of errors that novices made using a language called SOLO, which was specifically designed to be accessible to beginners. They compared the resulting errors with the analysis of LOGO errors done by du Boulay (1979) and produced a table:

Type of Error                       LOGO   SOLO
Wrong number of arguments passed     18     18
No line number                       12      9
Call to undefined procedure          12      9

It was also proposed that pre-empting the above problems would reduce the errors by 43.3%, leaving a greater proportion of error-free code: 1069 of 2468 problematic lines of code contained the above-mentioned errors. Looked at another way, 6960 of 9428 lines of code (73.8%) were trouble-free, which would increase to 8029 of 9428 (85.2%) if the problems above were pre-empted.
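
The percentages quoted above follow directly from the line counts. The short check below recomputes them; the variable names are my own, and only the four counts come from the summary.

```python
# Recompute the percentages from the line counts quoted in the summary.
total_lines = 9428
trouble_free = 6960
problem_lines = total_lines - trouble_free      # 2468 problematic lines
preventable = 1069                              # lines with one of the three errors

share_preventable = preventable / problem_lines             # reduction in errors
before = trouble_free / total_lines                         # trouble-free share now
after = (trouble_free + preventable) / total_lines          # share if pre-empted

print(f"{share_preventable:.1%} {before:.1%} {after:.1%}")
```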


Endres, A. An Analysis of Errors and Their Causes in System Programs in IEEE Transactions on Software Engineering Vol SE-1 No 2, June, 1975

Concludes that syntax errors comprise less than 15% of all errors in systems programs (as opposed to application programs). The study was conducted on DOS/VS Release 28, released to customers in mid-1973. It analysed the types of errors discovered during internal tests of the components before release. Obviously the system was written by experienced programmers. Errors were classified as follows:

It was found that 46% of errors are in group A and 38% in group B. It is proposed that better programming methods and more comprehensive testing tools could avoid about half of the errors, with the other half attacked by better methods of program definition and a better understanding of basic system concepts.


Gugerty, L. and Olson, G. Debugging by Skilled and Novice Programmers. in proceedings - CHI'86, April, 1986, Publishers : ACM pages 171-174

This paper analyses a study in which the debugging behaviour of expert and novice programmers was compared. The programs to be debugged were all syntactically correct and produced output when run. For the simple programs, both novices and experts found most of the errors (experts 89% of the time, novices 72%); however, experts found the bugs in 7 minutes on average, whereas novices took 18.2 minutes. Experts found a bug on their first test 56% of the time, whereas novices did so only 21% of the time. Novices also introduced more errors into the programs when making changes to test for bugs; only 1 expert added a bug to the simple programs in the process of fixing them.

The more difficult Pascal program test had a similar outcome: 9/10 experts found the error, in 14.2 minutes on average, and 5/10 novices found the error, in 33.1 minutes on average (a maximum of 40 minutes was allowed for this test). No expert added a bug, while 3/5 of the novices who couldn't find the error added a bug of their own.

It was proposed that experts found errors more reliably and more quickly than novices because they found it easier to understand the program and what it is supposed to do.


Hsi, S. and Soloway, E. Learner Centered Design in SIGCHI Bulletin Volume 30, Number 4 October 1998, pp 53-55

States that learners need more guidance at the beginning, but that these supports need to change as learners build competence and become more independent in their learning. Many issues were raised during the workshop:
Some Do's of Designing for Learning:


Joni, S.A. and Soloway, E. But My Program Runs! Discourse Rules for Novice Programmers in Journal of Educational Computing Research, Volume 2 Number 1, 1986, pages 95-128

This paper studies working programs that are badly designed and coded, which is not all that applicable to the AntiCompiler project. However, a couple of comments are useful. Novice programmers were seen to initialise variables incorrectly, either by not doing so at all or by doing so more than once (initialising a variable on every pass through a loop when it was meant to be done once outside the loop is a semantic error). It is also noted that novice programmers 'merge goals', using a single integrated plan to achieve more than one goal, which often leads to programs with more bugs.
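
The repeated-initialisation error mentioned above is easy to show in a few lines. This is a hypothetical example of my own in Python, not one taken from the paper. Both loops run without complaint, which is exactly why a compiler gives the novice no help here.

```python
# Hypothetical data; the two loops differ only in where the accumulator is set up.
values = [4, 8, 15]

# Semantic error: the accumulator is reset on every pass,
# so only the last value survives.
for v in values:
    total_bug = 0
    total_bug = total_bug + v

# Intended version: initialise once, before the loop starts.
total_ok = 0
for v in values:
    total_ok = total_ok + v

print(total_bug, total_ok)
```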


Levy, S.P. Computer Language Usage in CS1: Survey Results SIGCSE Bulletin, volume 27 Number 3, pp 21-26 1995

Many schools have turned to C/Scheme/Ada/C++ as a first programming language, as they are more 'real world' than languages designed for teaching (Pascal, Smalltalk). In 1994 C accounted for 31.8% of the languages taught in CS1, the most popular of all the languages looked at (Ada 28.1%, C++ 12.3%, Scheme 10.5%). 16% of instructors believed that the language was not important; it was simply a tool. Many courses are focussed on learning the language, rather than learning the concepts. Instructors using C and C++ feel it is important to use a language that is used in industry, Ada instructors feel it is most important to teach good software engineering concepts, and Scheme instructors prefer students to write interesting programs sooner.
A comment by one respondent to the survey summed up what many others had said: "It doesn't matter much in CS1 (which language), since students are still mastering the basics found in most languages. In CS1, the emphasis should be on problem solving, algorithm development and logical thinking; not on the intricacies or advanced features of a particular language."


Mayer, R. The Psychology of How Novices Learn Computer Programming. 1986, Studying the Novice Programmer, Baywood Publishing Co. Inc., pp 129-159

Meaningful learning, as opposed to rote learning, is where the learner connects new material with existing knowledge in memory - a process called assimilation.

      new material                            response
      --------------> Short Term Memory ------------->
                          |        ^
                          |        |
                          |        | existing knowledge
                          |        |          |
                          |        |          |
                          |        |          |
                          |        |          | must be in here for learning to occur
                         \/        |          | or memorisation by rote will happen
                      Long Term Memory <------

Elaboration in the learner's own words can help connect new material to what already exists in long-term memory.


Mayer, R. Cognitive Aspects of Learning and Using a Programming Language in book Interfacing Thought, Editor: Carroll, J. MIT Press, London, 1987 pp 61-79

Learning a new language requires syntactic knowledge and semantic knowledge, and mappings between these two kinds of knowledge. (note - see logbook for paper ref) It requires building semantic knowledge and syntactic knowledge simultaneously, consequently students may 'map' it onto their existing natural language knowledge and representations of constructs (pre-conceptions).

The tests used in this paper consist of presenting statements in BASIC and natural language. The first test concluded that success with BASIC representation was highly correlated to underlying conceptual knowledge. The second test showed that the time needed for comprehension of a BASIC program was strongly related to the complexity (number and difficulty of transactions, and number of statements in the program). The third test indicated novices that learned BASIC under standard instructional conditions (book) seemed to gain conceptions that were wrong or incomplete. Finally it was shown that instruction of the transactions (concepts) improved the conceptual knowledge of the poor students, improving their ability to use BASIC to solve problems, and showed a strong relationship between knowing the correct transactions for BASIC statements, and solving programming problems. This shows that a novice's wrong conceptions must be replaced with more useful ones.


McIver, L. The Effect of Programming Language on Error Rates of Novice Programmers. Proceedings, Twelfth Annual Meeting of the Psychology of Programming Interest Group. Corigliano Calabro, Italy, April, 2000.

The study found that a simpler, more intuitive language may reduce semantic/logic errors. Errors were classified only as syntactic or semantic.


McIver, L. and Conway, D. Seven Deadly Sins of Introductory Programming Language Design. 1996, Proc. Software Engineering: Education and Practice (SE:E&P'96), University of Otago, Dunedin, NZ, pp.309-316, IEEE Computer Society

Syntax vs semantics complicates the teaching of a programming language (C), and therefore the learning. Proposes 2 things for better language design: start where the novices are (use a language they know), and provide better error diagnosis (current compilers can't find semantic errors; such a tool would improve teachability and learnability).


Mody, R. P. C in Education and Software Engineering SIGCSE Bulletin Volume 24 No 3 September 1993, pp 45-56

This article discusses the negative aspects of C as a language for learning programming, citing many problems such as inconsistencies, pointers and unnatural representations. He proposes that parallel programming is more natural than sequential programming. C is described as a medium-level language that does not support modularity, abstraction, data hiding or boolean operations, that is weakly typed and unverifiable, and in which it is difficult to do any complex mathematical work.


Moylan, P. J. The Case Against C Technical Report EE9240, Centre for Industrial Control Science, Department of Electrical and Computer Engineering, The University of Newcastle, NSW 2308 Australia, July 1992

Modularity is about data encapsulation and information hiding. C supports only primitive modularity (separate files), using header files to link them together. Modularity is achievable only if rigid rules are followed, but most programmers do not stick to them as the compiler does not enforce them.
The author notes that even he, an experienced programmer, takes twice as long to get a C program working as an equivalent in a better-designed language, because twice as much time is spent debugging C programs.


Murnane, J.S., The Psychology of Computer Languages For Introductory Programming Courses. 1993, New Ideas in Psychology Vol. 11 No. 2, pp 213-228

This paper discussed the history of the design of languages, and proposed 2 different models of programming language - one linguistically easy to comprehend, and another specifically designed for the field (lambda calculus).

"From late 1950's ... almost all language development has been driven by technical considerations - theories originating in Computer Science or in advances made in the application area itself." (pg 213 para 1)
C was designed to enhance the utility of the language, Pascal to reduce the possibility of errors, and Prolog to pass the responsibility for solving the problem to the language. (pg 214 para 1)
Many studies discuss natural language, as ideas or as a source of ambiguity and misunderstanding, but they don't generally consider theories of language acquisition. (pg 215 para 2)
"The readability of programs is far more important than the writability . . . It is vital that the syntax of a programming language should reflect human thought patterns, rather than the more 'elegant' but much more obscure patterns of, for example, the lambda calculus." [2] (pg 216 para 3)
"... there is no reason to accept a teaching tool (computer language) into which no particular Educational Psychology (learning theory) has gone." (pg 217 para 2)
It is proposed that while all children can naturally and easily acquire language, only some learn to think scientifically. As children don't have to 're-learn' the grammar of their natural language, it follows that language design may benefit from being as close to their natural language as possible. (pp 218-219)
"Many of the problems in current programming languages (where the syntax is highly restricted) are in fact semantic in origin despite the avowedly context-free grammar." (pg 220 para 2)
Students using HyperTalk [3] find it very 'natural', and insoluble problems are nowhere near as frequently encountered as in Logo or Logowriter. They have problems, but manage to find a way around them. (pg 221 para 1)
[2] Tremblay, G.P. and Sorenson, P.G. The Theory and Practice of Compiler Writing. 1985, New York: McGraw-Hill
[3] HyperTalk closely resembles English, but was developed over a number of versions, and was not developed with a strong linguistic base.


Pane, J. F. and Myers, B. A., The Influence of the Psychology of Programming on a Language Design: Project Status Report. Proceedings, Twelfth Annual Meeting of the Psychology of Programming Interest Group. Corigliano Calabro, Italy, April, 2000. pp 193-205

2 studies: one looking at how novices/non-programmers devise solutions to programming problems, the other at how boolean problems are solved and interpreted. Proposed that programming languages force programmers to present solutions unnaturally. Stated that technical demands/innovations drive language design, which is not focussed on ease of learning/usability.
HCI principles:

are generally not applied in language design. Reiterated that loops are not represented in languages in a way that matches a user's mental model. Found that common boolean names used in languages cause errors, as they are ambiguous in natural language.
Studies found novice programmers preferred:


Pane, J. F. and Ratanamahatana, C. and Myers, B. A., Studying the language and structure in non-programmers' solutions to programming problems. Int. J. Human-Computer Studies (2001) 54, pp 237-264

"'conventional' programming languages require the programmer to make tremendous transformations from the intended tasks to the code design" p 239
Proposes that a language is easier to learn if it better matches the beginner's existing problem-solving language. The mismatch between how a programming language expresses a solution (e.g. summing) and the natural way people think about the solution makes it difficult for novices to learn a language, and even makes programming tasks more difficult for experts. Found that a mix of styles was used to solve problems (event-based, constraint-oriented, OO, etc.), so it might be good not to limit a language to a single style.


Pea, Roy D., Language-Independent Conceptual "Bugs" in Novice Programming, Journal Educational Computing Research, Volume 2(1), 1986

Identifies classes of bugs:

Considers language-independent 'bugs', and shows the difficulty of giving instructions to the computer. Novices work intuitively, as if conversing with humans. Proposes that the prior 'bug' categories occur due to a 'super-bug': the belief that the computer has a mind that can reason, look ahead, and extrapolate (probably due to how interaction in NL works). Proposes that a repeated statement that is evaluated once is odd in NL: "if you want to, I'll take you" is available all the time, not just once.


McQuire, A. and Eastman, C. The Ambiguity of Negation in Natural Language Queries to Information Retrieval Systems in Journal of the American Society for Information Science, Vol 49 Number 8, 1998, pp 686-692

It was found that the higher the number of disjunctions, conjunctions and prepositions used in a statement with a negation, the higher the ambiguity of the statement, e.g. "Books on NLP but not semantics and parsing since 1989". It was noted that negation is difficult, due to the ambiguity of which components are negated and which aren't.
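
To see why the example query is ambiguous, it helps to write out some of the boolean readings it could have. The sketch below is my own illustration in Python: the catalogue records are invented, "since 1989" is assumed to include 1989, and the three readings are not claimed to be the ones enumerated in the paper.

```python
# Hypothetical catalogue records: (title, topics, year).
books = [
    ("A", {"nlp"}, 1992),
    ("B", {"nlp", "semantics"}, 1995),
    ("C", {"nlp", "parsing"}, 1991),
    ("D", {"nlp", "semantics", "parsing"}, 1990),
]

def select(predicate):
    """Return the titles of the records that satisfy one reading of the query."""
    return [title for title, topics, year in books if predicate(topics, year)]

# Reading 1: NLP and not (semantics and parsing)
r1 = select(lambda t, y: "nlp" in t and y >= 1989
            and not ("semantics" in t and "parsing" in t))

# Reading 2: NLP and (not semantics) and (not parsing)
r2 = select(lambda t, y: "nlp" in t and y >= 1989
            and "semantics" not in t and "parsing" not in t)

# Reading 3: NLP and (not semantics), and also parsing
r3 = select(lambda t, y: "nlp" in t and y >= 1989
            and "semantics" not in t and "parsing" in t)

print(r1, r2, r3)
```

The same English sentence retrieves three different sets of books, depending on how far the "not" is taken to reach.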


Pennington, N. Comprehension Strategies in Programming in book - Empirical Studies of Programmers: Second Workshop, Publisher: Ablex Publishing Corporation, New Jersey, 1987, pp 100-112

The study analysed comprehension of 5 aspects of a program:

It was found that the most errors were on questions about state and function aspects, fewer on data flow, and the least on control flow and operations. This study was done on experienced programmers.


Perkins, D.N., Hancock, C., Hobbs, R., Martin, F. and Simmons, R. Conditions of Learning in Novice Programmers. 1986, Studying the Novice Programmer, Baywood Publishing Co. Inc., pp 261-279

This paper presented two types of learner and some causes of problems in learning programming.
Two types of learners -

One common error is that students project their intentions into the code, not noticing that the code doesn't actually do what they think it does. (pg 271 para 2)
It was proposed that students lacking a clear mental model of the language (primitives/syntax/use etc) were unable to break a problem down. This problem was often made worse because they tried to work out how to implement the solution in the language during coding, instead of working it out in advance. (pg 275 para 3)


Soloway, E. A Cognitively-Based Methodology for Designing Languages/Environments/Methodologies, In Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium of Practical Software Environments, ACM Press 1984 pp 193-196

Suggests that expert programmers have the following knowledge that novices don't:

Also analyses how designers design, noting that experts tend to have better information management skills, being able to remember constraints and assumptions better than novices. Proposes that a language/environment/method (l/e/m) should be designed to give a better cognitive fit between the user and the l/e/m.


Soloway, E. and Bonar, J. and Ehrlich, K. Cognitive Strategies and Looping Constructs: An Empirical Study in book Studying the Novice Programmer, Editors: Soloway, E. and Spohrer, J.C. Publisher: Lawrence Erlbaum Associates, 1989, pp 191 - 207

The study was based on a problem in which a program must read in numbers until it reads the integer 99999, and then print out the correct average (not counting the final 99999). Novice, intermediate and advanced students were all tested. When writing program plans, all three groups preferred the READ/PROCESS method (82%, 91%, 67% in order) over the PROCESS/READ method. Implementing the programs showed a similar preference for READ/PROCESS (86%, 72%, 60%). All groups were tested with two representations of the language. With the one that supported READ/PROCESS (loop ... leave ... again), a higher percentage got the programs correct (24%, 61%, 96%; average 52%) than with the other (for, repeat, while) (14%, 36%, 69%; average 33%).

Pascal and C both support the second type, PROCESS/READ, as in a while loop.
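
The two strategies can be sketched side by side for the averaging problem. This is my own rendering in Python, not code from the study: the function names are hypothetical, a list stands in for keyboard input, and the input is assumed to always end with the sentinel 99999. `while True` with `break` plays the role of loop ... leave ... again.

```python
SENTINEL = 99999

def average_read_process(numbers):
    """READ/PROCESS: read a value, leave if it is the sentinel, otherwise process it."""
    source = iter(numbers)
    total = count = 0
    while True:                  # 'loop'
        x = next(source)         # READ
        if x == SENTINEL:
            break                # 'leave'
        total += x               # PROCESS
        count += 1
    return total / count if count else None

def average_process_read(numbers):
    """PROCESS/READ: read once before the loop, then process and read the next value."""
    source = iter(numbers)
    total = count = 0
    x = next(source)             # priming READ before the loop
    while x != SENTINEL:
        total += x               # PROCESS the value read on the previous step
        count += 1
        x = next(source)         # READ the next value
    return total / count if count else None

print(average_read_process([10, 20, 30, 99999]))
print(average_process_read([10, 20, 30, 99999]))
```

Both versions give the same answer, but the second needs the read to appear twice, once before the loop and once at the bottom of it, which is the structure a plain while loop pushes the programmer towards.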


Spohrer, J. C. and Soloway, E. and Pope, E., Where the Bugs Are. Proceedings, 1985, CHI, pp 47-53

Proposes that, in 200 examples of novice programs, only 'collaborated' efforts are the same. Produced a program, PROUST, which identifies semantic bugs 75% of the time for moderately complex programs. Explains that research in this area is important for 2 reasons: (1) looking at bugs provides a window into misconceptions, and (2) it is useful in developing a detailed model of how students learn programming. Most novice programmers try to 'merge goals', as in real life, which leads to a solution that is much harder to implement and therefore has many more bugs. This is especially seen in procedures, ordering and looping. Classified errors as: