SPO600.asm: April 2015

Wednesday, 22 April 2015

Project Wrap up

Project Switch

Concluding the final project for SPO600 at Seneca College

Sljit

My last post gave an outline of what I was going to try to accomplish, but unfortunately I did not get very far. Many of the things in this area would, I believe, require a greater knowledge of the sljit compiler as a whole. Here is what I have done up to today. It's not much, but it's something.

The defines

Defining the number of float registers for ARM and x86 should be pretty simple and similar to what the maintainer has currently. It could be implemented by someone who, unlike me, has a better understanding of saved float registers. Here is what it should look like:

#if (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64)
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 32
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX

#elif (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64)
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 8
#if (defined _WIN64)
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 1
#else
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX
#endif

#else
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 6
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX
#endif

(XX should be replaced with the correct number.)
I found the number of floating point registers in the ARM documentation. It states there are 32 registers for single precision floating point, which are also used as 16 registers for double precision floating point, so I am not sure whether it should be 16 or 32. For x86 I was confused about the number of floating point registers. For example, one site has a picture showing 8 floating point registers, but the title of that page is "legacy registers", which may mean they are no longer used. The Intel documentation, on page 183 (section 8.1.2), states there are 8 in the x87 FPU stack. This made me wonder whether these are the legacy registers the other site mentions, or whether they add on to the legacy registers to make 16 in total. x86 also has vector floating point registers, and I am not sure how many there are (possibly 16) or whether they even come into play in sljit.

Function entry and exit points

Here is the current code in sljit for the entry:

SLJIT_API_FUNC_ATTRIBUTE sljit_si sljit_emit_enter(struct sljit_compiler *compiler,
 sljit_si options, sljit_si args, sljit_si scratches, sljit_si saveds,
 sljit_si fscratches, sljit_si fsaveds, sljit_si local_size)
{
 sljit_si i, tmp, offs, prev, saved_regs_size;

 CHECK_ERROR();
 CHECK(check_sljit_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size));
 set_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size);

 saved_regs_size = GET_SAVED_REGISTERS_SIZE(scratches, saveds, 0);
 local_size += saved_regs_size + SLJIT_LOCALS_OFFSET;
 local_size = (local_size + 15) & ~0xf;
 compiler->local_size = local_size;

 if (local_size <= (63 * sizeof(sljit_sw))) {
  FAIL_IF(push_inst(compiler, STP_PRE | 29 | RT2(TMP_LR)
   | RN(TMP_SP) | ((-(local_size >> 3) & 0x7f) << 15)));
  FAIL_IF(push_inst(compiler, ADDI | RD(SLJIT_SP) | RN(TMP_SP) | (0 << 10)));
  offs = (local_size - saved_regs_size) << (15 - 3);
 } else {
  compiler->local_size += 2 * sizeof(sljit_sw);
  local_size -= saved_regs_size;
  saved_regs_size += 2 * sizeof(sljit_sw);
  FAIL_IF(push_inst(compiler, STP_PRE | 29 | RT2(TMP_LR)
   | RN(TMP_SP) | ((-(saved_regs_size >> 3) & 0x7f) << 15)));
  offs = 2 << 15;
 }

 tmp = saveds < SLJIT_NUMBER_OF_SAVED_REGISTERS ? (SLJIT_S0 + 1 - saveds) : SLJIT_FIRST_SAVED_REG;
 prev = -1;
 for (i = SLJIT_S0; i >= tmp; i--) {
  if (prev == -1) {
   prev = i;
   continue;
  }
  FAIL_IF(push_inst(compiler, STP | RT(prev) | RT2(i) | RN(TMP_SP) | offs));
  offs += 2 << 15;
  prev = -1;
 }

 for (i = scratches; i >= SLJIT_FIRST_SAVED_REG; i--) {
  if (prev == -1) {
   prev = i;
   continue;
  }
  FAIL_IF(push_inst(compiler, STP | RT(prev) | RT2(i) | RN(TMP_SP) | offs));
  offs += 2 << 15;
  prev = -1;
 }

 if (prev != -1)
  FAIL_IF(push_inst(compiler, STRI | RT(prev) | RN(TMP_SP) | (offs >> 5)));

 if (compiler->local_size > (63 * sizeof(sljit_sw))) {
  /* The local_size is already adjusted by the saved registers. */
  if (local_size > 0xfff) {
   FAIL_IF(push_inst(compiler, SUBI | RD(TMP_SP) | RN(TMP_SP) | ((local_size >> 12) << 10) | (1 << 22)));
   local_size &= 0xfff;
  }
  if (local_size)
   FAIL_IF(push_inst(compiler, SUBI | RD(TMP_SP) | RN(TMP_SP) | (local_size << 10)));
  FAIL_IF(push_inst(compiler, ADDI | RD(SLJIT_SP) | RN(TMP_SP) | (0 << 10)));
 }

 if (args >= 1)
  FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S0) | RN(TMP_ZERO) | RM(SLJIT_R0)));
 if (args >= 2)
  FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S1) | RN(TMP_ZERO) | RM(SLJIT_R1)));
 if (args >= 3)
  FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S2) | RN(TMP_ZERO) | RM(SLJIT_R2)));

 return SLJIT_SUCCESS;
}

I will be honest: I have no idea how this does what it does (saving registers). I could not work out how to extend it to save floating point registers. I can only theorize that it has something to do with the fscratches and fsaveds parameters, which are currently only passed to the check and set helpers and are otherwise unused within the function. Possibly the answer is to create code similar to what is already there. For example, this line

saved_regs_size = GET_SAVED_REGISTERS_SIZE(scratches, saveds, 0);

could be replicated to work with floating point registers like this:

fsaved_regs_size = GET_SAVED_FLOAT_REGISTERS_SIZE(fscratches, fsaveds, 0);

But that is all theory. I am afraid I do not know how to do these things in practice, or even how to find out.

Register mapping

I did not really get to this part. It seems simple, but I have trouble understanding how the maintainer chooses the map for the registers. For the integer registers of ARM64, this is the register map:

static SLJIT_CONST sljit_ub reg_map[SLJIT_NUMBER_OF_REGISTERS + 8] = {
  31, 0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 8, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 29, 9, 10, 11, 30, 31
};

I just don't know how he came up with these numbers, or why 8 is added to the size of the register map.

Conclusions

One phrase definitely came up a lot in this post: "I don't know". After finally finding something to work on, I think it was simply out of my range to complete, even though I am sure it would be an easy task for someone experienced in this kind of environment. I should have kept my options open and perhaps chosen an easier project to work on. I am disappointed in myself for not producing anything worthy of a patch, but I am out of time to contribute anything more. In the summer I will try to complete it for fun and will post results if I get anywhere. Regardless of my results, I have learned a lot through this project and through this whole course, and I think it is always worth challenging yourself to expand your knowledge of computers, since there is so much to learn.

Thanks for reading, have a good summer.

Tuesday, 14 April 2015

More project updates

Project Switch

Project Progress: Stack-less Just in time compiler

This is an update about my project progress in SPO600: problems with the new project.

Sljit

After my previous post I found out what I have to do in sljit. Basically, there are three areas that need changing in order to increase the number of floating point registers. First are the defines, where I need to add architecture specific sections for the number of floating point registers in an architecture. Second are the function entry and exit points, where I need to add code to save and restore floating point registers, because currently they only save and restore integer registers. Finally, I would have to change the defines for the temporary registers and map the registers in order to get the real register index. This is a great area and I really want it to work, but I am struggling with it and am not too confident that I can complete it on time.

The defines

There is a file called sljitConfigInternal.h which has many defines for integer registers that look something like this:

#elif (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64)
#define SLJIT_NUMBER_OF_REGISTERS 25
#define SLJIT_NUMBER_OF_SAVED_REGISTERS 10
#define SLJIT_LOCALS_OFFSET_BASE (2 * sizeof(sljit_sw))

but when it comes to floating point registers, all that is there is this:

#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 6
#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) && (defined _WIN64)
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 1
#else
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 0
#endif

This pretty much just assigns 6 float registers to every architecture no matter what. As you can imagine, this is not ideal, because many processors have more than 6. My task is to expand this so that it uses more on x86 or ARM systems.

Function entry and exit points

In each architecture specific file (for ARM64 it would be sljitNativeARM_64) there are functions that deal with the function entry and exit points. These are called sljit_emit_enter() and sljit_emit_return(). What these functions currently do is save and restore the integer registers, but if we have more floating point registers they would have to be changed to save and restore those as well.

Register mapping

Right now in the ARM64 arch specific file, two temporary floating point registers are used with no mapping at all. We can compare this with what is done for the integer registers to see what it means to have register mapping:

#define TMP_ZERO  (0)

#define TMP_REG1  (SLJIT_NUMBER_OF_REGISTERS + 2)
#define TMP_REG2  (SLJIT_NUMBER_OF_REGISTERS + 3)
#define TMP_REG3  (SLJIT_NUMBER_OF_REGISTERS + 4)
#define TMP_LR    (SLJIT_NUMBER_OF_REGISTERS + 5)
#define TMP_SP    (SLJIT_NUMBER_OF_REGISTERS + 6)

#define TMP_FREG1 (0)
#define TMP_FREG2 (SLJIT_NUMBER_OF_FLOAT_REGISTERS + 1)

static SLJIT_CONST sljit_ub reg_map[SLJIT_NUMBER_OF_REGISTERS + 8] = {
  31, 0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 8, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 29, 9, 10, 11, 30, 31
};

#define W_OP (1 << 31)
#define RD(rd) (reg_map[rd])
#define RT(rt) (reg_map[rt])
#define RN(rn) (reg_map[rn] << 5)
#define RT2(rt2) (reg_map[rt2] << 10)
#define RM(rm) (reg_map[rm] << 16)
#define VD(vd) (vd)
#define VT(vt) (vt)
#define VN(vn) ((vn) << 5)
#define VM(vm) ((vm) << 16)

As we can see, there are only two lines for the floating point registers (the TMP_FREG lines), while the integer registers are much more complex. The reg_map is used in the macros at the bottom in order to provide the correct machine register index for that register. I would have to do something similar for the floating point registers.

Problems

There are a few problems that have stopped me from completing these changes. First, I am unsure how many floating point registers there are in each architecture. When looking at the integer registers the numbers are quite specific: for example, ARM64 is defined as having 25 registers and MIPS as having 22. I have found that ARM is supposed to have 32 floating point registers. It seems strange that it would be such an even number, but I will try it regardless. Then there is the other line, SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS, and I am having trouble finding out what its value should be. Chris Tyler, my professor, directed me to the procedure call standard for ARM, but I was unsuccessful in finding anything there. This problem has me stuck and confused about what to do.

It would be easy if I could ask the maintainer where and how he determined the register numbers, but he has stopped responding to my emails. For about 5 days we were talking: I would send one email, he would send one back in the morning, I would respond, and so forth. Then I sent him an email one day and he just stopped responding, so he either got really busy or something happened to him. Let's hope he is just busy. For now I will just try to get it to work by using 32 floating point registers for ARM and do some trial and error to find the number of saved registers allowed.

Conclusions

Despite the problems, this project seems really interesting. I think if I had gotten an earlier start I would have been able to complete a patch, but switching projects slowed me down quite a lot. I think I will continue trying to complete this, or at least make some progress, even after SPO600 is done. Perhaps the maintainer will get some free time to help me with it. I will try to have something to show for next week, but I don't know if it will be much.

Thanks for reading

Friday, 3 April 2015

Project update

Project Switch

Project Progress: Stack-less Just in time compiler

This is an update about my project progress in SPO600: an area to work on has been found!

Sljit

After my previous post I began talking to one of the developers of sljit about contributing to the project. He was very helpful, and after some talk about what would be a good area for me we settled on one. This area involves offering more floating point registers on ARM or x86. Currently the sljit compiler only has 6 registers available for floating point operations on all CPUs, but it could offer more if the CPU has more available. This will require me to save and restore floating point registers and also to modify which registers get used as temporary floating point registers through register mapping.

Moving forward

For now I will be looking at all this in more detail and will provide a more detailed update in the future. I am really glad that I finally have a solid direction to go in, and I look forward to contributing to this project.

Thanks for reading