Input and the scanf function

Although output is essential to useful programs, since it lets users know what the program has done, output alone is not enough to make programs really useful. For example, if you think about the "hello world" program, you may wonder what the point is. After all, if you want to see "Hello world" on the screen, you might as well just type it using a word processor instead of going to the trouble of writing a program to display it. Programs are really only meaningful and useful when it is possible to run them multiple times and have them produce something new each time. What allows a program to do this is the ability to give the program different input each time it runs. Consider a very simple program that performs an arithmetic operation on a pair of operands. One possible program that could do this would initialize the two operands to some arbitrary values, add them, and print the result.

#include <stdio.h>
int main (void)
{
  int operand1 = 2, operand2 = 5, result;
  result = operand1 + operand2;
  printf ("%d + %d = %d\n", operand1, operand2, result);
  return 0;
}

The problem with this program is that it is extremely boring! Every time we run it, we get exactly the same result. If we could instead provide the values for the operands each time we run the program, it would become a little more interesting, because we would get a different result each time. All we need to make the program more useful is to have a way to allow the user to provide input to the program.

The scanf function is the input counterpart to the printf function: it allows input to come from a user. Using scanf should seem fairly familiar to you, since the arguments to scanf are almost exactly like those we pass to printf. The action is almost the same; it is just the direction of information flow that differs. The printf function takes information from the program and presents it to the outside world, whereas the scanf function takes information from the outside world and presents it to the program. Just as for printf, the first argument for scanf is a format string. The major difference is that it is rare (and even a little dangerous) for the format string passed to scanf to include anything besides format specifiers and blank spaces, although it is legal to have other text in the string.

As you learned earlier, each format specifier in the format string you pass to printf requires an additional argument. If those arguments were not available, printf would print garbage. If the additional arguments are not present with the scanf function, the results are more serious. The scanf function takes those additional arguments as the addresses of the memory locations into which it should write the user's input. If the programmer fails to provide enough arguments, or even if the arguments are in the wrong order, scanf takes garbage values as addresses. It is possible for those garbage values to be the addresses of other variables in your program, or even of the instructions that make up the program itself. If scanf writes to the address of some other variable, that variable will have the wrong value. In addition, the variable that should have the value that the user types will have the wrong value too. If scanf writes over an instruction for the program, or to an address the program is not allowed to use, the program may crash or behave unpredictably. This type of error is extremely difficult to identify and correct, so it is essential that you take every precaution to ensure that you have enough arguments in the correct order.

In the preceding paragraph, we said that scanf takes the additional arguments as the addresses of the locations where it should store the input values. Ordinarily, when we use a variable's name in a program, the name produces the variable's value, not its address. For instance, when we gave a variable name as an additional argument to the printf function, it printed the value of the variable, not its address. Because the job of scanf is to write a value to a variable, the current value is not important; scanf will erase it when it writes the new value. This is why scanf needs the address of the variable. It also means that when we pass additional arguments to scanf, we must do something special to tell the compiler that it should pass the address of a variable, rather than its value. To specify an address, we place an ampersand (&) in front of the variable's name. When we read a program and we see an ampersand, we read it as "the address of." Thus, if we have a variable named number, we read "&number" as "the address of number."

At this point, we can alter the program we initially wrote so that it allows the user to get the sum of any pair of numbers by providing the operands to the program. Notice that we begin the program by requesting input from the user. It is important to print such a prompt so the user knows what to do. Otherwise, he or she would see nothing more than a blank screen with a flashing cursor.

#include <stdio.h>
int main (void)
{
  int operand1, operand2, result;
  printf ("Enter two integers to sum, separated by a blank space\n");
  scanf ("%d %d", &operand1, &operand2);
  result = operand1 + operand2;
  printf ("%d + %d = %d\n", operand1, operand2, result);
  return 0;
}

You may have observed that the two format specifiers in the call to scanf have a blank between them. The blank is not really necessary. We could just as well have written the call as:
scanf ("%d%d", &operand1, &operand2);
Most programmers prefer to leave a blank, simply because it is somewhat more readable. In addition, it looks more like the format that we are requesting of the user, that is, two numbers separated by a blank. If you type this program, compile it and run it, you will discover another interesting fact. If you type one number, then press the Enter key, the program will wait for you to type the second number and press the Enter key again. This happens because scanf skips over white space when it reads numeric data such as the integers we are requesting here. White space characters are those characters that affect the spacing and format of characters on the screen, without printing anything visible. The white space characters you can type as a user are the blank space (spacebar), the tab character (Tab key), and the newline (Enter key).

If you try running the program again, but place a comma between the two numbers, you will see that something different happens. The first number is read correctly, but the second number will be garbage. This is because scanf will only skip white space and the comma is not white space. What actually happens when you ask scanf to read numeric data is that it first skips any white space it finds and then it reads characters until the character it reads cannot form part of a number (that is, it is not a digit or a leading plus or minus sign). In this case, when scanf encounters the comma while looking for the second number, it stops reading. Since it has not read any digits, there is no number for it to store, so it simply leaves operand2 with whatever value it already had. Because the program never gave operand2 a value, that value is garbage. You can prove this to yourself by initializing operand2 to some value (for instance, 10) at declaration time. Now, if you type a number, then a comma, and then the second number, you will see that the printf statement will print the initialization value, rather than the second number you typed. If you used %f format specifiers to read floating point numbers, you would see that scanf behaves in the same way.

When you ask scanf to read character data, it behaves differently. To understand why, you first have to realize that anything you type at the keyboard is a character, even the digits, punctuation marks, blanks, tabs, and the newline you get from pressing the Enter key. When you use a format specifier such as %d or %f to read numeric values, scanf actually reads the characters the user types and converts them to their numeric equivalents. When you ask scanf to read a character, it will read whatever is next as the value of the character. To see how this works, try typing and compiling the following program. When you run it and the program requests three characters, type the letter 'a', a comma, a blank space, and then the digit '1' before pressing the Enter key.

#include <stdio.h>
int main (void)
{
  char ch1, ch2, ch3;
  printf ("Enter three characters\n");
  scanf ("%c%c%c", &ch1, &ch2, &ch3);
  printf ("ch1 is %c, ch2 is %c, and ch3 is %c\n", ch1, ch2, ch3);
  printf ("Enter two more characters\n");
  scanf ("%c%c", &ch1, &ch2);
  printf ("ch1 is %c, ch2 is %c\n", ch1, ch2);
  return 0;
}

In the following "snapshot" of the screen, the only line the user types is the second one ("a, 1"); the program prints everything else:

Enter three characters
a, 1
ch1 is a, ch2 is ,, and ch3 is 
Enter two more characters
ch1 is 1, ch2 is 

|

You will notice several things when you run the program with this input. First, the output you see may not be exactly what you expected. When printf prints the first three characters, the value of ch1 is 'a', which is perfectly normal. The value of ch2 is ',' and that is not too surprising either. You might expect the value of ch3 to be '1', since that is the next visible character on the input line, but scanf does not skip white space when it is reading character data, as it does when it reads numeric values. Instead, the value of ch3 is the blank space. Because the space is invisible, it appears that something is wrong with the output: it looks like the value of ch3 did not print. The blank is there, but it is impossible to see it. The next surprising thing is that when the program prints the second prompt, it does not wait for you to type two more characters. Instead, it immediately prints the last line. The reason is that when the program calls scanf the second time, two unread characters remain: the digit '1' and the newline produced when you pressed the Enter key. The scanf function reads these two remaining characters as the new values of ch1 and ch2. Again, it is a little difficult to see the value of ch2, but if you look closely, you can see that printf has written two newlines to the screen. The first is the value of ch2, which sends the cursor to the beginning of the next line. The second newline appears as a result of the escape sequence \n at the end of the format string.

Quite often, this behavior is not what you will want from your program. Instead, you will want scanf to read only the visible characters, skipping white space (including the newline) as it does for numeric data. The secret to getting scanf to perform this way is to put a blank in the format string before the %c format specifier. The blank tells scanf to skip white space and it will actually skip any number of white space characters before reading and storing a character. If we change the two scanf statements in the program in this way and run it again, using exactly the same input, you will see what a difference these spaces in the format string will make.

#include <stdio.h>
int main (void)
{
  char ch1, ch2, ch3;
  printf ("Enter three characters\n");
  scanf (" %c %c %c", &ch1, &ch2, &ch3);
  printf ("ch1 is %c, ch2 is %c, and ch3 is %c\n", ch1, ch2, ch3);
  printf ("Enter two more characters\n");
  scanf (" %c %c", &ch1, &ch2);
  printf ("ch1 is %c, ch2 is %c\n", ch1, ch2);
  return 0;
}

In the snapshot below, the user has typed several spaces between the comma and the '1' to demonstrate that a single space in the format string will cause scanf to skip any number of white space characters in the input. After the second prompt, the user needs to type two more characters before the program will continue. Notice that the user has not put any white space between the 'x' and 'y'. This shows that scanf does not insist that there be white space in the input, even though there is a blank in the format string.

Enter three characters
a,       1
ch1 is a, ch2 is ,, and ch3 is 1
Enter two more characters
xy
ch1 is x, ch2 is y
|

Using blank spaces is necessary to get scanf to read only visible characters. It is also an example of how we can use text other than format specifiers in the format string. In a sense, the format string is a "picture" of what scanf expects the user to type. To see this more clearly, we will make one more small change in the program--we will place a comma after the first %c format specifier in the first call to scanf. If you run the program again, typing the same input, you will see that something new happens.

#include <stdio.h>
int main (void)
{
  char ch1, ch2, ch3;
  printf ("Enter three characters\n");
  /* comma inserted after the first %c */
  scanf (" %c, %c %c", &ch1, &ch2, &ch3);
  printf ("ch1 is %c, ch2 is %c, and ch3 is %c\n", ch1, ch2, ch3);
  printf ("Enter two more characters\n");
  scanf (" %c %c", &ch1, &ch2);
  printf ("ch1 is %c, ch2 is %c\n", ch1, ch2);
  return 0;
}

When you type the first line of input, you will notice that the program stops and waits. You will need to type another character to make it go on.

Enter three characters
a, 1
,
ch1 is a, ch2 is 1, and ch3 is ,
Enter two more characters
xy
ch1 is x, ch2 is y
|

When you add text other than white space in a format string that you pass as an argument to scanf, the user must type input exactly as it appears in the format string. Run this last program again, but this time, instead of typing the comma after the 'a', just type the 'a', a blank and the '1'. You will see that the values of the variables are not correct. Although it is possible to include characters other than white space in the format string, it is generally best not to. If you do, you are essentially requiring the user to type input in a very particular way. The more specific the requirements for input, the greater the chance for errors. If you do need the user to type things in a specific fashion, at the very least, you should make sure that the prompts you print make the formatting requirements of the input quite clear.

You need to be aware of one final point about the scanf function. When you use scanf to read numeric data, the format specifier determines how scanf will read the input, not what the user types. You have already seen that if the user types a character that cannot be part of a number, scanf stops reading. Of course, the characters that can form part of a number differ according to the kind of number that scanf is expecting. If you use a %f format specifier, a decimal point is a permissible character (but only the first decimal point). If you use a %d format specifier, a decimal point will stop the reading. Furthermore, if you use a %f format specifier and the user types an integer, scanf will convert the integer to a floating point value. Thus, if the user types 15, scanf will store 15.0.