Learning Assembly — Part 2

Get to Grips with Binary Numbers and Other Data Representations!

How can we represent data in our code?

Andrew Blance
codeburst
Published in
13 min readJul 31, 2020

--

A cute picture of a computer
Photo by Delia Giandeini on Unsplash

In my first post, we got to see a very brief look at how assembly, and specifically the 6502 variant, works. We built up an idea of what assembly is from the most basic binary patterns that a processor will want to accept. We then touched on the idea that a computer program will be made up of three parts: the instructions, data, and memory locations. To get a good idea of how to effectively program with the 6502 assembly language we will need to get a better understanding of all of them.

One of the many cool things about the 6502 is the numerous methods with which it can access locations in memory. Some of the methods though are slightly obtuse and can be hard to understand when beginning to learn. For this reason, we will wait until we have explored more of the architecture before delving into this part. The next option is to go straight into learning the instruction set. However, without knowing how the input and output data behave we are not able to fully appreciate what is happening. The last (but by no means least) option then is to look at how data will be represented in our programs.

This week, we will take a long look at binary numbers and the maths surrounding them. Here, we will aim to build up our understanding of binary, looking at where problems in the system lay and how different types of binary get around them. Then, we will look at ASCII and hexadecimal to round out different ways we can interact with the 6502. This won’t be a complete look at all representations that the processor can understand but it will give us a knowledge of the most useful (in my opinion).

As we start to venture into the language and hardware we sadly cannot avoid our discussion getting more technical and dense. Whether you follow along precisely or not, the main takeaway here will be that our ability to do arithmetic in the system will be hugely influenced by the hardware itself. I go into detail about binary, which I admit can be nuanced. However, the main way we will interact with data in Assembly will end up being hexadecimal, so don’t be too alarmed.

Binary

Binary is something that is not necessary to know when doing modern, high-level programming. The 6502 can do many operations (add, subtract, moving things in memory) for which you will not need to understand how binary works. However, there are operations we might like to do, such as multiplication and divide, which are not things the 6502 can do natively. We need to create these functions ourselves. This will require some understanding of the intricacies of binary calculation. Not least because internally, the 6502 will represent everything in the form of binary. We saw this when we discussed how the processor will try to “translate” everything down to binary patterns. At a very minimum, being familiar with reading binary numbers and with terms like bit, byte, and nibble will be useful to understand generally how the 6502 operates. Luckily though for us, binary is intuitive and fun.

As of right now, the only practical way to store data in electronics is in a form of two-state logic. Something can either be “On” or “Off”, “True” or “False”. In a computer a singular binary digit, the most basic way to store information is called a bit. A bit can either be a 1 or a 0. Data in a computer is stored in groups of bits. In many cases, such as the 6502, the preferred way is in a group of eight (something like 0001 0101). This is called a byte. We can also have a smaller grouping of four bits called a nibble (lol).

To adequately represent numeric data a few cases need to be looked at. Firstly, we will begin with unsigned binary. The eight bits ( b ) of one byte will be represented as:

b₇b₆b₅b₄b₃b₂b₁b₀

We count the bits from right to left, indexing from 0. Each bit can be seen as being related to a binary number. The rightmost bit represents 2⁰, the next one to the left is 2¹, then 2², and so on. It will, therefore, be useful to remember the powers of 2, which are:

2⁷ = 128
2⁶ = 64
2⁵ = 32
2⁴ = 16
2³ = 8
2² = 4
2¹ = 2
2⁰ = 1

To translate from binary to something we recognize we will need to sum all the appropriate powers of 2. If a bit is set to 1 we will include the corresponding power in our sum, if not we will discard it. This may already seem complicated, but looking at an example will hopefully help clear it up. If we take the binary number 0010 0011 we can see the only bits set are 0, 1, and 5. We can then "sum" them as such:

  b₅ + b₁ + b₀
= 2⁵ + 2¹ + 2⁰
= 32 + 2 + 1
= 35

You can also try to go the other way, from decimal to binary. We do this by dividing the number by two multiple times and tracking the remainder. The remainder bit (which will always be a 1 or a 0) will map to the bits in a byte. For example, if you divide 35 by 2 you get 17 remainder 1. Therefore, bit 0 will be a 1. Again, this calculation is less strange than it sounds when you see it performed. Let's try 35 (or 0010 0011) again:

35 ÷ 2 = 17r1→ 1
17 ÷ 2 = 8r1 → 1
8 ÷ 2 = 4r0 → 0
4 ÷ 2 = 2r0 → 0
2 ÷ 2 = 1r0 → 0
1 ÷ 2 = 0r1 → 1
0 ÷ 2 = 0r0 → 0
0 ÷ 2 = 0r0 → 0​

This method gives us the result0010 0011, which is exactly what we started with!

Addition

Now that we are comfortable reading and writing binary numbers we are going to want to try to do some math with them. Addition works “normally”. When you add two normal numbers (let's try 9 and 2) you “carry” everything that goes above ten to the next digit. So, 9 + 2 will give you 11. In binary, you perform a similar process. Single bits follow these rules:

​0 + 0 =   0
1 + 0 = 1
0 + 1 = 1
1 + 1 =(1)0​

​The (1) represents a carry. It's the same as us saying 9 + 2 = (1)1. Using what we have learned before we can see that 10 in binary is the equivalent of 2 in decimal, exactly what we would expect from 1 + 1!

If we try to do 3 (0011) + 1 (0001) we can see another example of how carrying in binary works:

  0011
+ 0̲0̲0̲1̲
0100

We can see here something has carried from the 1st bit to the 2nd. What if we add a larger number? Let’s try 128 and 129:

   10000001
+̲1̲0̲0̲0̲0̲0̲0̲0̲
(1)00000001​

We have now encountered our first problem. We know the answer should be 257, however, looking at what is in the first 8 bits it looks like the answer is 1. Only with the extra carry bit does the true answer become apparent. If we are going to be saving our binary numbers in 8-bits we will need a way to store and track this type of carry. The 6502 can do exactly that, phew! The 6502 has a set of flags built within it, one of which is called the Carry Flag. This can be set as 1 or 0. Whenever an addition happens for binary numbers that result in the answer being greater than 256 the Carry Flag is set to 1. This allows us to keep track of our maths and gives us a way to extend our arithmetic past 8-bit numbers.

Let's take a minute though to think about what we have just said because I think it has a profound implication. It is not software that will keep track of the carry in our maths, it is hardware. The carry flag is a register inside the 6502, an “actual place”. If I had a big microscope I could zoom in and be able to say “hey, that's the place where the carry bit is stored!”. This is what I think is really exciting about learning 6502 assembly — you are in a space where there is a huge overlap between the hardware you have and the code you are able to write. This can really shape the way you think about coding in this system. When I write Python on my Mac I have very little sense of how it connects to it — I write my script, some magic happens, then I get some numbers back. When you discuss assembly you cannot avoid the idea that what you are doing has a strong connection to your machine, I find it much less abstract.

So, we can also represent our positive integers correctly and if we include a carry bit our binary numbers can add correctly. However, there are two problems:

  • We can only count up to 255 (if we only use 8 bits).
  • We can only represent positive numbers.

We need to be able to solve the latter issue. To do this we will introduce Signed Binary.

Signed Binary

For unsigned binary, all 8-bits were used to store the value. This allowed us to have a range between 0 and 256. In signed binary, we will use bit 7 as a way to keep track of the sign of the number. However, this will come at the expense of a reduced maximum value our binary numbers can take.

In this new system, we will say that 1 is: 0000 0001 and -1 is: 1000 0001. Immediately, we have reduced the magnitude from 255 to 127 (oops) but we do have a way to represent negative numbers. Let us now go back and try doing a calculation with this new method. This is 7 + −5:

 00000111
+̲1̲0̲0̲0̲0̲1̲0̲1̲
10001100​

Again, we have a problem, 1000 1100 is −12. This is wrong. To find a solution to this we will need to get to two's complement. To get to this point we will first have to introduce one's complement.

One and Two’s Complement

In One’s Complement the positive number stays the same, ie, 1 is still0000 0001. However, to get the negative we "swap" all the bits - all 0's become 1's and all 1's become 0's. This means -1 is: 1111 1110. As with signed binary, all positive numbers start with 0 and all negatives start with 1.

Let's try adding -4 and +6:

 11111011
+̲0̲0̲0̲0̲0̲1̲1̲0̲​
00000001​

We have got a result of 1, instead of 2. Wrong, but we are getting there. We will have to expand one’s complement before we get the correct addition.

To arrive at Two’s Complement you transform your value to One’s Complement, then, for the negative number, add 1. So, 3 is 0000 0011 while -3 is 1111 1101.

Let's try adding -4 and 6 again:

 11111100
+̲​0̲​0̲​0̲​0̲​0̲​1̲​1̲​0̲​​
00000010​

Which gives us 2. So now, finally, everything works! Getting here was slightly convoluted but two’s complement is a popular and widely used way to represent binary numbers. Below is a table of two’s complements with some 16-bit values shown as well.

table of two’s complements
Binary Table — Source

The Overflow

We saw that when adding unsigned numbers there can be a problem where we need to keep track of the carry. This was handled by the 6502’s built-in Carry Flag. A similar problem can happen with signed binary numbers too. Here is 64 + 65:

 01000000
+̲0̲1̲0̲0̲0̲0̲0̲1̲
​ 10000001​

This has given us an answer of -127. When the result of signed binary addition should be either >127 or <-127 we get an Overflow. This looks to us like two positive numbers summing to a negative, or two negatives summing to a positive. Again, the 6502 has a way to deal with this. The Overflow flag will be set when what is being carried into bit 7 does not match what is being carried out. Signed numbers can also set the Carry Flag too, whenever the result of the sum will need to be stored in more than 8-bits.

So, what should you do if you set the Overflow flag — what is the solution? Well, an unexpected overflow in practice is an exception and should not happen too often. While the flag is there to tell you something has gone wrong it is not going to help you fix it. You may need to go back, check your inputs, and decide on the correct answer yourself.

Something neat is that when adding, the 6502 cannot tell the difference between signed and unsigned binary — it makes no distinction. During calculation, the flags are set automatically. We are able then to use either signed or unsigned numbers, whatever we prefer. It is then also up to us, as the programmers, to be able to tell if the Carry and Overflow Flags being set are relevant to our problem. It also worth remembering that carrying and overflowing are entirely independent. We have just discussed addition so far but it should be said that all this is (mostly) true for subtraction as well. There is a lot to say about this topic but hopefully, this provides a brief introduction to it. It can take a while to convince yourself how Carry and the Overflow works, it did for me at least. I have left a few really useful links at the bottom to help get a better understanding of the problem.

Hexadecimal

Now we understand more about flags and how the 6502 will attempt calculations we can try to find a nicer way to represent the data. Binary is nice, but it can be mentally taxing to have to read and write it. The way we will improve this is by making a switch to hexadecimal. Here, the decimal digits 0 to 15 are represented by the symbols 0 to 9, then A to F (decimal 0 = hex 0, …, decimal 15 = hex F). This range will allow us to make the most of a nibble (4-bits, remember?!), using each combination of bits to represent a symbol. For example, 9 would be1001, A would become 1010 and F would be 1111. Therefore, we can use a byte to hold 2 digits. We can convert from a hexadecimal digit (like FA) to decimal easily. You multiply the first digit by 16¹, the second by 16⁰, and then sum! For example:

FA = 1111 1010
= 15×16¹ + 10×16⁰
= 250​

Now, as we write our code for the 6502 we will aim to use hexadecimal to represent numbers. The same issues we discussed may still exist but trying to understand the data will be less difficult than if it were all just ones and zeroes. I have included a decimal-to-binary-to-hexadecimal table to give a better idea of how the representations relate to each other.

decimal — binary — hex convertor
Hex Table — Source

Alphanumeric Data

We have discussed how binary and hexadecimal is used to represent numbers but we also require an encoding for other symbols too. For this, we will use ASCII. We will want to encode the 26 letters in both upper and lower case, plus 10 numbers, and then around 20 special symbols. This can all be accomplished in 7 bits (which allows for 128 unique codes). The eighth bit could then (if you wanted..) be used to verify that the contents of the byte have not been accidentally changed. This bit is called the parity bit. For reference, here is a table with all the ASCII encodings that will be of use for us:

Ascii encodings
ASCII Table — Source

Conclusion

I have tried to give a brief look at how data can be represented in the 6502. It has been a longer piece than I originally expected but it is still by no means an exhaustive list. Other forms exist in the form of BCD, EBCDIC, and octadecimal. I decided to not cover these as binary, hexadecimal, and ASCII are all we will need for the moment. For some things (NES dev, for example) BCD compatibility is removed.

We have discussed a lot in this post though. There is a lot of slightly different types of binary that look very similar, so it can take some time to convince yourself that all of them work. Furthermore, when we write assembly we could use hexadecimal for everything. Regardless, if you take anything away from here it should be:

  • The 6502 will be able to accept multiple types of data as input (binary, hex, ASCII…). This will allow us to handle text and numbers.
  • The way we have constructed our arithmetic means it will not be dependant on how we chose to represent our data. I.e., the maths does not see any difference between signed and unsigned binary.
  • There is a strong link between our ability to do binary arithmetic correctly and the physical hardware of the 6502.

My original goal was to combine this with a discussion on 6502 hardware. As I started writing this though and realized the length I decided to save it for another time. How we represent data is important here and by starting to learn about the internal flags in the processor we have a good motivation to learn more about the architecture next time.

The end.

This is the second part of my “learning assembly” series:

Thank you for all the great responses about part 1! I really did not think people would be so interested! I even got some Reddit Gold, wow.

Some helpful links:

This article has been adapted from my personal blog. Most of the content I talk about will come from two main sources: “6502 Assembly Language Programming” by Lance A.Leventhal and “Programming the 6502” by Rodney Zaks.

--

--