tag:blogger.com,1999:blog-12922907401436293652024-03-05T10:42:01.040-08:00SPO600.asmJames Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-1292290740143629365.post-44748045584442701792015-04-22T20:05:00.001-07:002015-04-22T20:19:29.200-07:00Project Wrap up<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 100%;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 1.5%;
margin-bottom: 2.5%;
padding-top: 0%;
color: yellow;
overflow-x: auto;}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Switch</title>
</head>
<body>
<div class = title>
Project Wrap up
</div>
<div class = intro>
Concluding the final project for SPO600 at Seneca College
</div>
<div class = section>
<h3>Sljit</h3>
My last <a href="http://spo-asm.blogspot.ca/2015/04/more-project-updates.html">post</a> gave an outline of what I was going to try to accomplish, but unfortunately I did not get very far. Many parts of this work would require a deeper knowledge of the sljit compiler as a whole than I have. Here is what I have done up to today. It's not much, but it's something.
</div>
<div class = section>
<h3>The defines</h3>
Defining the number of float registers for ARM and x86 should be fairly simple and similar to what the maintainer currently has for the integer registers. Someone with a better understanding of saved float registers than I have could fill in the missing values. Here is what it should look like:
<div class = codeblock>
<pre>#if (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64)
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 32
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX
#elif (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64)
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 8
#if (defined _WIN64)
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 1
#else
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX
#endif
#else
/* All other architectures keep the current default of 6. */
#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 6
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS XX
#endif</pre>
</div>
(XX should be replaced with the correct number.)<br/>
I found the number of floating point registers in the ARM documentation <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/Chdidbba.html"> here</a>. It states that there are 32 registers for single precision floating point, which are also used as 16 registers for double precision floating point, so I am not sure whether the value should be 16 or 32. For x86 I was confused about the number of floating point registers. For example, this <a href="http://www.sandpile.org/x86/fp_old.htm">site </a> has a picture showing 8 floating point registers, but the page is titled legacy registers, which may mean they are no longer used. The <a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf">Intel documentation</a> states on page 183 (section 8.1.2) that there are 8 registers in the x87 FPU stack. This made me wonder whether these are the legacy registers the other site mentions or whether they are in addition to the legacy registers, making 16 in total. x86 also has vector floating point registers, and from this <a href="http://www.sandpile.org/x86/fp_new.htm">page</a> I am not sure how many there are (possibly 16) or whether they even come into play in sljit.
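As a rough reference for the XX placeholders, the sketch below records the floating point register counts that the hardware and the common calling conventions provide: AArch64 has 32 SIMD/FP registers (v0 to v31), of which v8 to v15 are callee-saved (low 64 bits only) under the AAPCS64; x86-64 has 16 SSE registers (xmm0 to xmm15), none of them callee-saved under the System V ABI and xmm6 to xmm15 callee-saved on Win64. These are hardware and ABI counts, not values taken from sljit, which also needs some registers as temporaries and so could expose fewer. The names fp_reg_info and fp_regs_for and the target strings are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper type, not part of sljit. */
struct fp_reg_info {
    int total;        /* architectural FP/SIMD registers */
    int callee_saved; /* registers a callee must preserve per the ABI */
};

/* Returns hardware/ABI register counts for a named target.
 * The target strings are invented for this sketch.
 * Unknown targets yield {0, 0}. */
struct fp_reg_info fp_regs_for(const char *target)
{
    struct fp_reg_info info = {0, 0};

    assert(target != NULL);

    if (strcmp(target, "aarch64") == 0) {
        info.total = 32;        /* v0..v31 */
        info.callee_saved = 8;  /* v8..v15, low 64 bits */
    } else if (strcmp(target, "x86_64-sysv") == 0) {
        info.total = 16;        /* xmm0..xmm15 */
        info.callee_saved = 0;  /* all caller-saved */
    } else if (strcmp(target, "x86_64-win64") == 0) {
        info.total = 16;        /* xmm0..xmm15 */
        info.callee_saved = 10; /* xmm6..xmm15 */
    }
    return info;
}
```

Calling fp_regs_for("aarch64") gives the pair 32 and 8. The values that would actually go into sljitConfigInternal.h still depend on how many registers sljit keeps for itself.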
</div>
<div class = section>
<h3> Function entry and exit points</h3>
Here is the current code in sljit for the entry:
<div class = codeblock>
<pre>SLJIT_API_FUNC_ATTRIBUTE sljit_si sljit_emit_enter(struct sljit_compiler *compiler,
sljit_si options, sljit_si args, sljit_si scratches, sljit_si saveds,
sljit_si fscratches, sljit_si fsaveds, sljit_si local_size)
{
sljit_si i, tmp, offs, prev, saved_regs_size;
CHECK_ERROR();
CHECK(check_sljit_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size));
set_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size);
saved_regs_size = GET_SAVED_REGISTERS_SIZE(scratches, saveds, 0);
local_size += saved_regs_size + SLJIT_LOCALS_OFFSET;
local_size = (local_size + 15) & ~0xf;
compiler->local_size = local_size;
if (local_size <= (63 * sizeof(sljit_sw))) {
FAIL_IF(push_inst(compiler, STP_PRE | 29 | RT2(TMP_LR)
| RN(TMP_SP) | ((-(local_size >> 3) & 0x7f) << 15)));
FAIL_IF(push_inst(compiler, ADDI | RD(SLJIT_SP) | RN(TMP_SP) | (0 << 10)));
offs = (local_size - saved_regs_size) << (15 - 3);
} else {
compiler->local_size += 2 * sizeof(sljit_sw);
local_size -= saved_regs_size;
saved_regs_size += 2 * sizeof(sljit_sw);
FAIL_IF(push_inst(compiler, STP_PRE | 29 | RT2(TMP_LR)
| RN(TMP_SP) | ((-(saved_regs_size >> 3) & 0x7f) << 15)));
offs = 2 << 15;
}
tmp = saveds < SLJIT_NUMBER_OF_SAVED_REGISTERS ? (SLJIT_S0 + 1 - saveds) : SLJIT_FIRST_SAVED_REG;
prev = -1;
for (i = SLJIT_S0; i >= tmp; i--) {
if (prev == -1) {
prev = i;
continue;
}
FAIL_IF(push_inst(compiler, STP | RT(prev) | RT2(i) | RN(TMP_SP) | offs));
offs += 2 << 15;
prev = -1;
}
for (i = scratches; i >= SLJIT_FIRST_SAVED_REG; i--) {
if (prev == -1) {
prev = i;
continue;
}
FAIL_IF(push_inst(compiler, STP | RT(prev) | RT2(i) | RN(TMP_SP) | offs));
offs += 2 << 15;
prev = -1;
}
if (prev != -1)
FAIL_IF(push_inst(compiler, STRI | RT(prev) | RN(TMP_SP) | (offs >> 5)));
if (compiler->local_size > (63 * sizeof(sljit_sw))) {
/* The local_size is already adjusted by the saved registers. */
if (local_size > 0xfff) {
FAIL_IF(push_inst(compiler, SUBI | RD(TMP_SP) | RN(TMP_SP) | ((local_size >> 12) << 10) | (1 << 22)));
local_size &= 0xfff;
}
if (local_size)
FAIL_IF(push_inst(compiler, SUBI | RD(TMP_SP) | RN(TMP_SP) | (local_size << 10)));
FAIL_IF(push_inst(compiler, ADDI | RD(SLJIT_SP) | RN(TMP_SP) | (0 << 10)));
}
if (args >= 1)
FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S0) | RN(TMP_ZERO) | RM(SLJIT_R0)));
if (args >= 2)
FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S1) | RN(TMP_ZERO) | RM(SLJIT_R1)));
if (args >= 3)
FAIL_IF(push_inst(compiler, ORR | RD(SLJIT_S2) | RN(TMP_ZERO) | RM(SLJIT_R2)));
return SLJIT_SUCCESS;
}</pre></div>
I will be honest: I have no idea how this code does what it does (saving registers). I could not work out how to extend it to save floating point registers. I can only theorize that it has something to do with the fscratches and fsaveds parameters, which are currently passed to check_sljit_emit_enter() and set_emit_enter() but are otherwise unused within the function. One possibility is to create code similar to what is already there. For example, this line
<div class = codeblock>
<pre>saved_regs_size = GET_SAVED_REGISTERS_SIZE(scratches, saveds, 0);</pre>
</div>
could be replicated for floating point registers like this:
<div class = codeblock>
<pre>fsaved_regs_size = GET_SAVED_FLOAT_REGISTERS_SIZE(fscratches, fsaveds, 0);</pre>
</div>
But that is all theory. I am afraid I do not know, or even know how to find out, how to do these things in practice.
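Two pieces of sljit_emit_enter() can at least be read in isolation. The expression (local_size + 15) & ~0xf rounds the frame size up to a multiple of 16, and the two for loops walk the registers and store them two at a time (STP is the AArch64 store-pair instruction), with a single store (STRI in the code) for an odd register left over. The sketch below mimics that logic in plain C. It only counts the stores that would be emitted and is not sljit code; align16, store_counts and count_pair_stores are made-up names.

```c
#include <stddef.h>

/* Round size up to the next multiple of 16, the same arithmetic as
 * local_size = (local_size + 15) & ~0xf in sljit_emit_enter(). */
size_t align16(size_t size)
{
    return (size + 15) & ~(size_t)0xf;
}

/* Hypothetical result type for this illustration. */
struct store_counts {
    int pairs;   /* stores that write two registers at once */
    int singles; /* stores that write one leftover register */
};

/* Mimics the prev/continue pattern of the loops: remember the first
 * register of a pair, emit a pair store when the second one arrives,
 * and fall back to a single store if one register is left over. */
struct store_counts count_pair_stores(int nregs)
{
    struct store_counts c = {0, 0};
    int prev = -1;

    for (int i = nregs; i >= 1; i--) {
        if (prev == -1) {
            prev = i;      /* first half of a pair */
            continue;
        }
        c.pairs++;         /* would be: STP prev, i */
        prev = -1;
    }
    if (prev != -1)
        c.singles++;       /* would be: STR prev */
    return c;
}
```

For five registers this gives two pair stores and one single store. Saving floating point registers would presumably need the same pattern with the floating point forms of the store instructions, but that part is untested guesswork.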
</div>
<div class = section>
<h3>Register mapping</h3>
I did not really get to this part. It seems simple, but I have trouble understanding how the maintainer chose the map for the registers. For the integer registers on ARM64, this is his register map:
<div class = codeblock>
<pre>static SLJIT_CONST sljit_ub reg_map[SLJIT_NUMBER_OF_REGISTERS + 8] = {
31, 0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 8, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 29, 9, 10, 11, 30, 31
};</pre>
</div>
I just don't know how he came up with these numbers and also why 8 is added to the size of the register map.
</div>
<div class = section>
<h3>Conclusions </h3>
One phrase definitely came up a lot in this post: "I don't know".
I think that, after finally finding something to work on, it was simply beyond my ability to complete, even though I am sure it would be an easy task for someone who is experienced in this kind of environment. I think I should have kept my options open and perhaps chosen an easier project to work on.
I am disappointed in myself for not producing anything worthy of a patch, but I am out of time to contribute anything more. In the summer I will try to complete it for fun and will post results if I get anywhere.
Regardless of my results, I have learned a lot through this project and through this whole course, and I think it is always worth challenging yourself to expand your knowledge of computers since there is so much to learn.
</div>
<div class = end>
Thanks for reading, have a good summer.
</div>
</body>
</html>James Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.com0tag:blogger.com,1999:blog-1292290740143629365.post-30926706476864566972015-04-14T12:47:00.000-07:002015-04-15T16:26:52.353-07:00More project updates<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 100%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 1.5%;
margin-bottom: 2.5%;
padding-top: 0%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Switch</title>
</head>
<body>
<div class = title>
Project Progress: Stack-less Just in time compiler
</div>
<div class = intro>
This is an update on my project progress in SPO600 and the problems with the new project.
</div>
<div class = section>
<h3>Sljit</h3>
After my previous <a href="http://spo-asm.blogspot.ca/2015/04/project-update.html">post</a> I found out what I have to do in sljit.
Basically there are three areas that need changing in order to increase the number of floating point registers. First are the defines, which require me to add architecture-specific sections for the number of floating point registers in an architecture. Second are the function entry and exit points, which require me to add code to save and restore floating point registers, because currently they only save and restore integer registers. Finally, I would have to change the defines for the temporary registers and map the registers in order to get the real register index. This is a great area and I really want it to work, but I am struggling with it and am not too confident that I can complete it on time.
</div>
<div class = section>
<h3>The defines</h3>
There is a file called sljitConfigInternal.h which has many defines for integer registers that look something like this:
<div class = codeblock>
<pre>#elif (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64)
#define SLJIT_NUMBER_OF_REGISTERS 25
#define SLJIT_NUMBER_OF_SAVED_REGISTERS 10
#define SLJIT_LOCALS_OFFSET_BASE (2 * sizeof(sljit_sw))</pre>
</div>
but when it comes to floating point registers, all that is there is this:
<div class = codeblock>
<pre>#define SLJIT_NUMBER_OF_FLOAT_REGISTERS 6
#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) && (defined _WIN64)
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 1
#else
#define SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS 0
#endif</pre>
</div>
This assigns 6 float registers to every architecture no matter what. As you can imagine, this is not ideal because many processors have more than 6. My task was to expand this so that it would use more on an x86 or ARM system.
</div>
<div class = section>
<h3>Function entry and exit points</h3>
In each architecture-specific file (for ARM64 it is sljitNativeARM_64.c) there are functions that deal with the function entry and exit points, called sljit_emit_enter() and sljit_emit_return(). These functions currently save and restore the integer registers, but if we offer more floating point registers they would have to be changed to save and restore those as well.
</div>
<div class = section>
<h3>Register mapping</h3>
Right now the ARM64 architecture-specific file uses two temporary floating point registers with no mapping at all.
We can look at what is done for the integer registers to see what it means to have register mapping:
<div class = codeblock>
<pre>#define TMP_ZERO (0)
#define TMP_REG1 (SLJIT_NUMBER_OF_REGISTERS + 2)
#define TMP_REG2 (SLJIT_NUMBER_OF_REGISTERS + 3)
#define TMP_REG3 (SLJIT_NUMBER_OF_REGISTERS + 4)
#define TMP_LR (SLJIT_NUMBER_OF_REGISTERS + 5)
#define TMP_SP (SLJIT_NUMBER_OF_REGISTERS + 6)
#define TMP_FREG1 (0)
#define TMP_FREG2 (SLJIT_NUMBER_OF_FLOAT_REGISTERS + 1)
static SLJIT_CONST sljit_ub reg_map[SLJIT_NUMBER_OF_REGISTERS + 8] = {
31, 0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 8, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 29, 9, 10, 11, 30, 31
};
#define W_OP (1 << 31)
#define RD(rd) (reg_map[rd])
#define RT(rt) (reg_map[rt])
#define RN(rn) (reg_map[rn] << 5)
#define RT2(rt2) (reg_map[rt2] << 10)
#define RM(rm) (reg_map[rm] << 16)
#define VD(vd) (vd)
#define VT(vt) (vt)
#define VN(vn) ((vn) << 5)
#define VM(vm) ((vm) << 16)</pre>
</div>
As we can see, there are only two lines for the floating point registers (the TMP_FREG lines), while the integer registers are handled in a much more complex way. The reg_map array is used in the macros at the bottom to provide the correct machine register index for each register. I would have to do something similar for the floating point registers.
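To make the lookup concrete, the sketch below copies reg_map and the RD, RN and RM macros from the listing above and shows what they produce. It takes SLJIT_NUMBER_OF_REGISTERS as 25 (the ARM64 value from sljitConfigInternal.h) and uses unsigned char in place of sljit_ub. The helpers machine_reg and encode_reg_fields are made up for this illustration, and the claim that index 1 is the first scratch register is an assumption about sljitLir.h that is not shown in this post.

```c
#include <stdint.h>

#define SLJIT_NUMBER_OF_REGISTERS 25
#define TMP_ZERO (0)
#define TMP_REG1 (SLJIT_NUMBER_OF_REGISTERS + 2)
#define TMP_SP   (SLJIT_NUMBER_OF_REGISTERS + 6)

/* Copied from the listing above; unsigned char stands in for sljit_ub.
 * The index is sljit's own register number, the value is the ARM64
 * machine register number. */
static const unsigned char reg_map[SLJIT_NUMBER_OF_REGISTERS + 8] = {
    31, 0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 8,
    28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 29, 9, 10, 11, 30, 31
};

/* Same field positions as the macros in the listing. */
#define RD(rd) ((uint32_t)reg_map[rd])
#define RN(rn) ((uint32_t)reg_map[rn] << 5)
#define RM(rm) ((uint32_t)reg_map[rm] << 16)

/* Hypothetical helper: machine register behind an sljit register index. */
int machine_reg(int vreg)
{
    return reg_map[vreg];
}

/* Hypothetical helper: only the register fields of an instruction word
 * (no opcode bits) for destination rd and sources rn and rm. */
uint32_t encode_reg_fields(int rd, int rn, int rm)
{
    return RD(rd) | RN(rn) | RM(rm);
}
```

With this table, index 1 maps to machine register 0, TMP_REG1 maps to register 9, and both TMP_ZERO and TMP_SP map to 31, which on ARM64 encodes either the zero register or the stack pointer depending on the instruction. A floating point version would presumably need its own table and VD/VN/VM macros that index into it instead of using the number directly.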
</div>
<div class = section>
<h3>Problems</h3>
A few problems have stopped me from completing these changes. First, I am unsure how many floating point registers there are in each architecture. For the integer registers the numbers are quite specific: for example, ARM64 is defined as having 25 registers and MIPS as having 22. I have found that ARM is supposed to have 32 floating point registers; it seems strange that it would be such a round number, but I will try it regardless. Then there is the other define, SLJIT_NUMBER_OF_SAVED_FLOAT_REGISTERS, and I am having trouble finding out what its value should be. Chris Tyler, my professor, directed me to the procedure call standard for ARM, but I was unsuccessful in finding anything there. This problem has me stuck and confused about what to do. It would be easy if I could ask the maintainer where or how he determined the register numbers, but he has stopped responding to my emails. For about 5 days we were talking: I would send an email, he would reply in the morning, I would respond, and so forth. Then I sent him an email one day and he just stopped responding, so he either got really busy or something happened to him; let's hope he is just busy. For now I will try to get it to work using 32 floating point registers for ARM and do some trial and error to find the number of saved registers allowed.
</div>
<div class = section>
<h3>Conclusions</h3>
Despite the problems, this project seems really interesting. I think that if I had gotten an earlier start I would have been able to complete a patch, but switching projects slowed me down quite a lot. I think I will continue trying to complete this, or at least make some progress, even after SPO600 is done; perhaps the maintainer will get some free time to help me with it. I will try to have something to show for next week, but I don't know if it will be much.
</div>
<div class = end>
Thanks for reading
</div>
</body>
</html>James Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.com0tag:blogger.com,1999:blog-1292290740143629365.post-72760248436993747762015-04-03T10:00:00.000-07:002015-04-03T10:00:28.379-07:00Project update<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 100%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 1.5%;
margin-bottom: 2.5%;
padding-top: 0%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Switch</title>
</head>
<body>
<div class = title>
Project Progress: Stack-less Just in time compiler
</div>
<div class = intro>
This is an update on my project progress in SPO600: an area to work on has been found!
</div>
<div class = section>
<h3>Sljit</h3>
After my previous <a href="http://spo-asm.blogspot.ca/2015/03/switching-projects.html">post</a> I began talking to one of the developers of sljit about contributing to the project. He was very helpful, and after some discussion of what would be a good area for me we settled on one. This area involves offering more floating point registers on ARM or x86. Currently the sljit compiler only has 6 registers available for floating point operations on all CPUs, but it could offer more if the CPU has more available. This will require me to save and restore floating point registers and also to modify which registers get used as temporary floating point registers through register mapping.
</div>
<div class = section>
<h3>Moving forward</h3>
For now I will be looking at all this in more detail and will provide a more detailed update in the future. I am really glad that I finally have a solid direction to go in, and I look forward to contributing to this project.
</div>
<div class = end>
Thanks for reading
</div>
</body>
</html>James Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.com0tag:blogger.com,1999:blog-1292290740143629365.post-47726747200359570112015-03-31T00:02:00.000-07:002015-03-31T00:02:42.477-07:00Switching projects<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 100%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 1.5%;
margin-bottom: 2.5%;
padding-top: 0%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Switch</title>
</head>
<body>
<div class = title>
Project Progress: <strike>PERL5</strike> PCRE / Stack-less Just in time compiler
</div>
<div class = intro>
This is an update on my project progress in SPO600: I have switched projects and made some steps forward.
</div>
<div class = section>
<h3>Perl5</h3>
In my previous <a href="http://spo-asm.blogspot.ca/2015/03/project-updates.html">post</a> I ruled out some areas in Perl5 and was heading towards the regular expression engine as an area to optimize. Well, I got a response from the mailing list saying the regular expression engine should be fine and does not really need any simple optimizations. I was kind of stubborn in thinking I could find something in Perl, and I kept looking when nothing obvious was there. In hindsight I should have stopped and tried out different packages sooner.
</div>
<div class = section>
<h3>PCRE and sljit</h3>
After a suggestion from my professor I have looked at <a href="http://www.pcre.org/">PCRE (Perl Compatible Regular Expressions)</a>. This small library deals with parsing regular expressions using semantics similar to Perl's. I downloaded the source via subversion
<div class = codeblock>
<pre>svn co svn://vcs.exim.org/pcre2/code/trunk pcre</pre>
</div>
and found a folder in the source called <a href="http://www.exim.org/viewvc/pcre2/code/trunk/src/sljit/">sljit</a>. There are many files in that directory with architecture-specific code. They have some for x86, some for ARM, some for SPARC and more. This piqued my interest, so I looked into it. I found that sljit is a stack-less just in time compiler which is CPU independent (<a href="http://sljit.sourceforge.net/">more info here</a>). Specifically, in PCRE it is part of a <a href="http://sljit.sourceforge.net/pcre.html">pcre performance project</a> which uses sljit to improve the pattern matching speed of PCRE.
</div>
<div class = section>
<h3>Moving forward</h3>
I have just found all this recently, so for now I will be looking into any functions that can be ported over to another architecture or any areas that look promising for optimization. I hope to soon zero in on one particular location, and actually get something done.
</div>
<div class = end>
Thanks for reading
</div>
</body>
</html>James Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.com0tag:blogger.com,1999:blog-1292290740143629365.post-76857051482459837612015-03-24T14:56:00.000-07:002015-03-24T14:56:06.086-07:00Project updates<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 140%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 2.5%;
margin-bottom: 2.5%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Progress</title>
</head>
<body>
<div class = title>
Project Progress: PERL5
</div>
<div class = intro>
This is an update about my project progress in SPO600
</div>
<div class = section>
<h3>Ruled out</h3>
In my previous <a href="http://spo-asm.blogspot.ca/2015/03/project-progress.html">post</a> I mentioned three areas: tail call optimization, the regex super-linear cache, and inline assembly.
After I contacted the Perl5 community on IRC, it seems that the tail call optimization item should not be on the list, so it has been ruled out. I emailed the mailing list asking for suggestions regarding the inline assembly portion and for any other ideas they had, but I have not received word back as of now. I have ruled out the assembly code for now unless further information comes up to suggest it has potential.
</div>
<div class = section>
<h3>Moving on</h3>
I have not narrowed down or progressed with this project as much as I would have liked to by this time; I am just spinning my wheels. For now I will work at understanding how I could improve the regex engine while also exploring other packages to see if I can find some more straightforward things to accomplish. Ideally I would like to find places where I could use inline assembly optimizations or compiler intrinsics, both of which we have been looking at recently in the SPO600 course.
</div>
</body>
</html>James Boyerhttp://www.blogger.com/profile/04119013221000296064noreply@blogger.com0tag:blogger.com,1999:blog-1292290740143629365.post-77051545397997863782015-03-15T19:25:00.000-07:002015-03-16T15:54:22.079-07:00Project Progress<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 140%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 2.5%;
margin-bottom: 2.5%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Project Progress</title>
</head>
<body>
<div class = title>
Project Progress: PERL5
</div>
<div class = intro>
This post is about a project I am starting in my SPO600 class that requires me to optimize a portion of the LAMP stack.
</div>
<div class = section>
<h3>Areas to optimize</h3>
I have chosen Perl5 as the package that I will be working on and have found a few areas that may be good areas to optimize and make changes.<br/>
The first two areas I found using the <a href="">perl todo list</a>. <br/> The first was tail call optimization:<br/>
Seen at <a href="http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod#l1130">line 1130</a>.<br/>
This would essentially have me find areas where tail call optimization is possible and rewrite them to implement it.
Here is a link that explains what tail call optimization is: <a href="http://c2.com/cgi/wiki?TailCallOptimization"> TCO</a><br/>
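As a small illustration of the idea (written in C and not taken from the Perl source), a function whose last action is a call to itself can be rewritten as a loop that reuses the same stack frame. The names sum_to_rec and sum_to_loop are made up for this sketch.

```c
/* Tail-recursive form: the recursive call is the last thing done,
 * so nothing is left to do in this frame once it returns. */
unsigned long sum_to_rec(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to_rec(n - 1, acc + n);   /* tail call */
}

/* The same computation with the tail call removed by hand:
 * the call becomes an update of the arguments plus a jump back
 * to the top, so the stack does not grow with n. */
unsigned long sum_to_loop(unsigned long n, unsigned long acc)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Both functions return the same result for the same arguments. Many C compilers can perform this rewrite themselves at higher optimization levels, but that depends on the compiler and its flags.
<br/>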
<br/>
The second area I found concerns Perl's regular expression engine.<br/>
Seen at <a href="http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod#l1154">line 1154</a> <br/>
In their engine certain regular expressions end up taking exponential time. They have a workaround for this called super-linear cache but they say the code has not been well maintained and could use improvement. I found the location of this problem in the source by grepping for the keyword 'super-linear', found at <a href="http://perl5.git.perl.org/perl.git/blob/HEAD:/regexec.c#l6737"> regexec.c</a>.<br/>
It seems like this could be an area for optimization although I am not very confident about how I would attempt this or what I would change to improve it because I do not have a strong knowledge of how a regular expression engine works. <br/>
<br/>
I found the final area by looking through the Perl 5 git repository (<a href="http://perldoc.perl.org/perlhack.html#Read-access-via-Git">instructions here</a>). Grepping for the keyword 'asm' with 'grep -r asm ./*' turned up some sections of inline assembly in a file called os2.c: <br/>
<a href="http://perl5.git.perl.org/perl.git/blob/HEAD:/os2/os2.c#l4587">my_emx_init()</a> <br/>
<a href="http://perl5.git.perl.org/perl.git/blob/HEAD:/os2/os2.c#l4622">my_os_version</a><br/>
These functions could potentially be ported to aarch64 assembly.
I am a bit uncertain about this area because I am not sure what this code does or whether it is important.
</div>
<div class = section>
<h3>Why Perl?</h3>
I chose Perl for my project because the community seems really clear and organized. They have a todo list with various tasks, which is very useful and, as you can see above, helped me a lot in finding areas to work on. They also have a very active community. Their mailing list archive shows daily messages, which makes me confident that if I need help or need to ask a question I won't be waiting for extended periods of time.
</div>
<div class = section>
<h3>Proceeding</h3>
Looking at my 3 options, I believe the tail call optimizations might have a large impact, depending on how many areas I can find. I would like to implement some code involving the aarch64 platform because that would relate to the SPO600 course the most, but I am uncertain about the inline assembly code that I have found so far. The regular expression area seems really interesting, but I am afraid it would not be feasible given the time I have. It is something I will definitely consider if my project doesn't go as planned.
<br/>
Proceeding with this project I plan on starting to work out how to apply the tail call optimizations while I engage with the upstream community about which direction is the best for them and for me. I also plan on benchmarking Perl on x86 and aarch64 to see if I can find any further areas or functions that may let me perform a platform specific optimization.
</div>
<div class = section>
<h3>Perl Upstream</h3>
Perl has a relatively straightforward guide on their website <a href="http://perldoc.perl.org/perlhack.html#PATCHING-PERL"> here</a>. <br/>To summarize, if you have a patch, either use <a href="https://rt.perl.org/Public/">perlbug</a> or send it to perlbug@perl.org. Once the patch has been processed it will be posted on the mailing list for discussion, and you are encouraged to join the discussion and promote your patch. They recommend using git. You can get the source with 'git clone git://perl5.git.perl.org/perl.git perl'. Once you make changes you can use git diff (for example, against the main branch) to produce the patch.
</div>
<div class = section>
<h3>Conclusions</h3>
This project has made me more nervous than any project I have had so far. It is filled with uncertainties. A couple of weeks ago I was unsure I would even find anything to work on, but eventually I did. Now I am unsure which direction to go and whether my contributions will be accepted. Regardless of what happens, it is a great learning experience, and I now appreciate the complexity of large projects like Perl and the other packages in the LAMP stack.
</div>
</body>
</html>
<!-- Post: Device Access | Published: 2015-03-02T10:16:00.000-08:00 | Updated: 2015-03-02T10:31:27.037-08:00 -->
<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 95%;
line-height: 140%;
white-space: pre;
white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -o-pre-wrap;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 2.5%;
margin-bottom: 2.5%;
color: yellow}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Device Access</title>
</head>
<body>
<div class = title>
Device Access
</div>
<div class = intro>
This post is regarding a presentation that I did in SPO600 where I talked about device access in low level languages.
</div>
<div class = section>
<h3> Access using I/O Ports </h3>
I/O ports: To the processor these look just like memory cells, but any data written to
or read from them goes to or comes from a device that is connected to the computer.
<div class = codeblock>
<pre> MOV DX, 0487 ; Port number.
MOV AL, 1 ; Number to write to the port.
OUT DX, AL ; Write to the port.
IN AL, DX ; Read from the port.</pre>
</div>
In this case DX holds the port number and AL holds the number you are writing to the port. OUT writes the value stored in AL to the port named by DX, and IN reads from that port and stores the result in AL.<br>
<span style="font-weight: bold">Danger: </span><br/>
Accessing devices in this way is dangerous. Not every port can be accessed in the same
way: some ports are read-only, some are write-only, and some are both. If you don’t
know exactly what the port you are using does, you risk damaging any device attached to
that port.
</div>
<div class = section>
<h3> Access using Interrupts </h3>
An alternative method is to use software interrupts to gain access to a device. This is often done for more complex devices such as the mouse in order to simplify access.
<div class = codeblock>
<pre> MOV AX, 0 ; Access subfunction 0 of int 33h
INT 33h ; Make the interrupt call</pre>
</div>
In the example we use interrupt 33h, which provides access to the mouse. Moving a
number into AX selects a specific mouse function (function 0
returns a value which indicates whether a mouse has been detected/installed). There are many more sub-functions listed <a href="http://www.ctyme.com/intr/int-33.htm"> here</a> that give you access to other parts of the mouse.
</div>
<div class = section>
<h3> Platform / architecture Issues </h3>
I/O and device instructions are very processor dependent. Because of the details of how a
processor moves data in and out, most of the source code is written specifically for one
platform. For example, the Linux kernel keeps many files, including io.h, in folders specific to each architecture.<br/>
Arm: <a href="http://lxr.free-electrons.com/source/arch/arm/include/asm/io.h"> io.h</a><br/>
x86: <a href="http://lxr.free-electrons.com/source/arch/x86/include/asm/io.h"> io.h</a><br/>
</div>
<div class = section>
<h3> Conclusions </h3>
I was unable to find too much information about this topic. Chris Tyler, my professor for SPO600, pointed out to me that this is because all device access SHOULD be handled by the operating system unless you are writing device drivers or doing embedded programming.
</div>
<div class = section>
<h3> Resources </h3>
<ul>
<li>
<a href="https://courses.engr.illinois.edu/ece390/books/artofasm/CH03/CH03-6.html">1</a> A portion of a book explaining IO ports and interrupts
</li>
<li>
<a href="http://www.petesqbsite.com/sections/tutorials/tuts/petter_new_asm/ASMTUT5.TXT">2</a> A tutorial explaining IO ports and software interrupts
</li>
<li>
<a href="http://lwn.net/images/pdf/LDD3/ch09.pdf">3</a> One chapter of a book about device driver programming detailing IO ports and how they are used with devices
</li>
</ul>
</div>
<div class = end>
Thanks for reading
</div>
</body>
</html>
<!-- Post: Compiled C lab | Published: 2015-02-28T22:25:00.003-08:00 | Updated: 2015-03-02T20:31:53.106-08:00 -->
<!-- Creator : groff version 1.21 -->
<!-- CreationDate: Sun Mar 1 00:43:27 2015 -->
<html>
<head>
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre {
font-family: "Courier 10 Pitch", Courier, monospace;
font-size: 82%;
line-height: 140%;
}
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
.codeblock { border: solid 2px grey;
background-color: black;
margin-left: 2.5%;
margin-top: 2.5%;
margin-bottom: 2.5%;
color: yellow;
overflow-x: auto;}
.title {
font-family: "Verdana", Times, Serif;
font-size: 150%;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
}
.intro {
font-family: "Verdana", Times, Serif;
font-size: 95%;
font-style: italic;
font-weight: bold;
border-bottom: solid 2px grey;
text-align: center;
padding-bottom: 2.5%;
}
.section {
margin-left: 2.5%;
font-family: "Verdana", Times, Serif;
font-size: 100%;
border-bottom: solid 2px grey;
margin-bottom: 2.5%;
padding-bottom: 2.5%;
margin-top: 2.5%;
}
.section h3{
text-decoration: underline;
margin-bottom: 1.5%;
margin-top: 1.5%;
}
.end{
font-style: italic;
font-size: 95%;
margin-top: 2.5%;
margin-left: 5%;
font-family: "Verdana", Times, Serif;
}
</style>
<title>Compiled C Lab</title>
</head>
<body>
<div class = title>
Compiled C Lab
</div>
<div class = intro>
This post explores some of the gcc compiler options and how they affect the assembly code that gcc creates.
</div>
<div class = section>
<h3>Files Used</h3>
Later in this post I will be referencing some simple files. Here is the original C source:
<div class = codeblock>
<pre>//hello.c
#include <stdio.h>
int main(){
printf("Hello Everybody\n");
}</pre>
</div>
<div class = codeblock>
<pre>//hello2.c
#include <stdio.h>
int main(){
printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
}</pre>
</div>
<div class = codeblock>
<pre>//hello3.c
#include <stdio.h>
void output(){
printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
}
int main(){
output();
}</pre>
</div>
</div>
<div class = section>
<h3>Objdump</h3>
I will be using these three commands with objdump in order to analyze the gcc output file.
<ul>
<li>
objdump -f [filename]
<br/> This shows you header information, such as the architecture it was compiled for, file name and more.
</li>
<li>
objdump -s [filename]
<br/> This shows you the full contents of every section.
</li>
<li>
objdump --source [filename]
<br/> This disassembles the compiled code and shows your source code alongside the assembly instructions generated.
</li>
</ul>
</div>
<div class = "section">
<h3>Compiling and analyzing</h3>
We will be compiling the source code with a number of different options and analyzing it with the objdump options shown above.
<div class = "section">
<h3>First compile: gcc hello.c -g -O0 -fno-builtin -o origin</h3>
Using <span style="font-weight:bold"> objdump -f</span> gives us header information. This information changes very little between builds on
one computer, but as soon as you move to different computers and different architectures it will change a lot. This is the output:
<div class = codeblock>
<pre> origin: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000400400</pre>
</div>
Using <span style="font-weight:bold">objdump -s</span> gives us the section contents. For example, the "Hello Everybody" string is stored in the .rodata section.
<div class = codeblock>
<pre> Contents of section .rodata:
4005d8 01000200 48656c6c 6f204576 65727962 ....Hello Everyb
4005e8 6f64790a 00 ody..</pre>
</div>
Using <span style="font-weight:bold">objdump --source</span> gives us our source code alongside the assembly instructions. In this example we can see that the code compiled from our source is stored in the .text section, inside the main function.
<div class = codeblock>
<pre> 000000000040050c <main>:
#include <stdio.h>
int main(){
40050c: 55 push %rbp
40050d: 48 89 e5 mov %rsp,%rbp
printf("Hello Everybody\n");
400510: bf dc 05 40 00 mov $0x4005dc,%edi
400515: b8 00 00 00 00 mov $0x0,%eax
40051a: e8 c1 fe ff ff callq 4003e0 <printf@plt>
}
40051f: 5d pop %rbp
400520: c3 retq
400521: 90 nop
400522: 90 nop
400523: 90 nop
400524: 90 nop</pre>
</div>
</div>
<div class = section>
<h3>Second compile: gcc hello.c -g -O0 -fno-builtin -static -o step1 </h3>
Adding -static.<br/>
Overall the section contents and the disassembly are much larger than the original because -static copies all the library code the program links against into the executable.<br/>
As you can see below, after writing the objdump results to files, the --source output is about 750 times larger and the section output is about 250 times larger than the original.
<div class = codeblock>
<pre>
9399 Feb 25 12:11 origin-src.txt
10865 Feb 25 12:11 origin-s.txt
142 Feb 25 12:14 step1-f.txt
7103611 Feb 25 12:15 step1-src.txt
2723639 Feb 25 12:14 step1-s.txt
</pre>
</div>
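The ratios can be worked out directly from the byte counts in the listing above. A minimal sketch in C (the function name size_ratio is made up for this example):

```c
/* Ratio of two file sizes in bytes; `smaller` must be non-zero. */
double size_ratio(long larger, long smaller)
{
    return (double)larger / (double)smaller;
}
```

Here size_ratio(7103611, 9399) compares the two --source files and size_ratio(2723639, 10865) compares the two -s files, which comes to roughly 756 and 251.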
</div>
<div class = section>
<h3>Third compile: gcc hello.c -g -O0 -o step2 </h3>
Removing -fno-builtin.
<div class = codeblock>
<pre> 000000000040050c <main>:
#include <stdio.h>
int main(){
40050c: 55 push %rbp
40050d: 48 89 e5 mov %rsp,%rbp
printf("Hello Everybody\n");
400510: bf cc 05 40 00 mov $0x4005cc,%edi
400515: e8 c6 fe ff ff callq 4003e0 <puts@plt>
}
40051a: 5d pop %rbp
40051b: c3 retq</pre>
</div>
This changes the function call because gcc is now allowed to use its builtin function optimizations: for example, instead of calling 'printf' it calls 'puts'.
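<br/>
A rough sketch of when this swap is possible, written as plain C: puts prints its argument followed by a newline, so a printf call with no extra arguments can be replaced when the string ends in a newline and contains no % conversions. The helper name can_use_puts is made up for illustration; this is a simplified check, not gcc's actual implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not gcc's code): returns true when
   printf(fmt), called with no further arguments, prints the same
   bytes as puts() would for fmt without its trailing newline. */
bool can_use_puts(const char *fmt)
{
    size_t len = strlen(fmt);

    if (len == 0 || fmt[len - 1] != '\n')
        return false;              /* puts always appends '\n' */
    if (strchr(fmt, '%') != NULL)
        return false;              /* a conversion needs printf */
    return true;
}
```

For the string in hello.c, "Hello Everybody\n", the check succeeds, which fits the puts call in the listing above. The string in hello2.c contains %d conversions, so it would not qualify.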
</div>
<div class = section>
<h3>Fourth compile: gcc hello.c -O0 -fno-builtin -o step3 </h3>
Removing -g
<div class = codeblock>
<pre>Contents of section .got.plt: Contents of section .got.plt:
6008c8 e0066000 00000000 00000000 00000000 ..`............. 6008c8 e0066000 00000000 00000000 00000000 ..`.............
6008d8 00000000 00000000 e6034000 00000000 ..........@..... 6008d8 00000000 00000000 e6034000 00000000 ..........@.....
6008e8 f6034000 00000000 ..@..... 6008e8 f6034000 00000000 ..@.....
Contents of section .data: Contents of section .data:
6008f0 00000000 00000000 00000000 00000000 ................ 6008f0 00000000 00000000 00000000 00000000 ........ ........
Contents of section .comment: Contents of section .comment:
0000 4743433a 20284465 6269616e 20342e37 GCC: (Debian 4.7 0000 4743433a 20284465 6269616e 20342e37 GCC: (Debian 4.7
0010 2e322d35 2920342e 372e3200 4743433a .2-5) 4.7.2.GCC: 0010 2e322d35 2920342e 372e3200 4743433a .2-5) 4.7.2.GCC:
0020 20284465 6269616e 20342e34 2e372d33 (Debian 4.4.7-3 0020 20284465 6269616e 20342e34 2e372d33 (Debian 4.4.7-3
0030 2920342e 342e3700 ) 4.4.7. 0030 2920342e 342e3700 ) 4.4.7.
Contents of section .debug_aranges: >
0000 2c000000 02000000 00000800 00000000 ,............... >
0010 0c054000 00000000 15000000 00000000 ..@............. >
0020 00000000 00000000 00000000 00000000 ................ >
Contents of section .debug_info: >
0000 91000000 02000000 00000801 53000000 ............S... >
0010 01690000 00120000 000c0540 00000000 .i.........@.... >
0020 00210540 00000000 00000000 00020807 .!.@............ >
0030 00000000 02010871 00000002 02074000 .......q......@. >
0040 00000204 07050000 00020106 73000000 ............s... >
0050 0202055f 00000003 0405696e 74000208 ..._......int... >
0060 057f0000 00020807 88000000 0201067a ...............z >
0070 00000004 01910000 00010357 0000000c ...........W.... >
0080 05400000 00000021 05400000 00000000 .@.....!.@...... >
0090 00000001 00 ..... >
Contents of section .debug_abbrev: >
0000 01110125 0e130b03 0e1b0e11 01120110 ...%............ >
0010 06000002 24000b0b 3e0b030e 00000324 ....$...>......$ ></pre>
</div>
Using the Linux command diff -y [filename1] [filename2] we can compare the two section dumps side by side. The original, compiled with -g, contains many sections prefixed with .debug because the -g option adds debugging information. When we remove -g, all of the debugging information is gone.
</div>
<div class = section>
<h3> Fifth compile: gcc hello2.c -g -O0 -fno-builtin -o step4 </h3>
Changing C file to hello2.c which has additional arguments in printf.
<div class = codeblock>
<pre>000000000040050c <main>:
#include <stdio.h>
int main(){
40050c: 55 push %rbp
40050d: 48 89 e5 mov %rsp,%rbp
400510: 48 83 ec 30 sub $0x30,%rsp
printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
400514: c7 44 24 20 0a 00 00 movl $0xa,0x20(%rsp)
40051b: 00
40051c: c7 44 24 18 09 00 00 movl $0x9,0x18(%rsp)
400523: 00
400524: c7 44 24 10 08 00 00 movl $0x8,0x10(%rsp)
40052b: 00
40052c: c7 44 24 08 07 00 00 movl $0x7,0x8(%rsp)
400533: 00
400534: c7 04 24 06 00 00 00 movl $0x6,(%rsp)
40053b: 41 b9 05 00 00 00 mov $0x5,%r9d
400541: 41 b8 04 00 00 00 mov $0x4,%r8d
400547: b9 03 00 00 00 mov $0x3,%ecx
40054c: ba 02 00 00 00 mov $0x2,%edx
400551: be 01 00 00 00 mov $0x1,%esi
400556: bf 20 06 40 00 mov $0x400620,%edi
40055b: b8 00 00 00 00 mov $0x0,%eax
400560: e8 7b fe ff ff callq 4003e0 <printf@plt>
}
400565: c9 leaveq
400566: c3 retq
400567: 90 nop
</pre>
</div>
As you can see, the numbers 1 to 5 are placed in the registers esi, edx, ecx, r8d and r9d (edi holds the address of the format string), but the rest are stored relative to %rsp in an interesting way.
Consider this line:
<div class = codeblock>
<pre>movl $0x7,0x8(%rsp)</pre>
</div>
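%rsp is the stack pointer, and wrapping it in parentheses dereferences it: the operand refers to the memory at the address held in the register rather than to the register itself. So this instruction stores the value 0x7 in the memory location 8 bytes above the address in the stack pointer.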
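<br/>
The pattern in the listing can be summarised in a few lines of C. This is only a sketch of what the disassembly shows for this one printf call, where argument 0 is the format string and arguments 1 to 10 are the integers. The names arg_regs, arg_register and arg_stack_offset are made up for illustration.

```c
#include <stddef.h>

/* Registers in the order the listing uses them: the format string
   first, then the integers 1 to 5. */
static const char *const arg_regs[] = {
    "%edi", "%esi", "%edx", "%ecx", "%r8d", "%r9d"
};

/* Hypothetical helper: register holding argument `index`
   (0 = format string), or NULL if it is passed on the stack. */
const char *arg_register(int index)
{
    return (index >= 0 && index < 6) ? arg_regs[index] : NULL;
}

/* Hypothetical helper: offset from %rsp of argument `index`,
   or -1 if it is passed in a register. Slots are 8 bytes apart. */
int arg_stack_offset(int index)
{
    return (index >= 6) ? (index - 6) * 8 : -1;
}
```

For example, arg_stack_offset(7) gives 8, matching movl $0x7,0x8(%rsp), and arg_stack_offset(10) gives 0x20, matching movl $0xa,0x20(%rsp).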
</div>
<div class = section>
<h3> Sixth compile: gcc hello3.c -g -O0 -fno-builtin -o step5 </h3>
We are now using hello3.c, which moves the printf call to another function named output.
The code is mostly the same, but as expected all the instructions related to printf move out of main into the new function.
<div class = codeblock>
<pre>000000000040050c <output>:
#include <stdio.h>
void output(){
40050c: 55 push %rbp
40050d: 48 89 e5 mov %rsp,%rbp
400510: 48 83 ec 30 sub $0x30,%rsp
printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
400514: c7 44 24 20 0a 00 00 movl $0xa,0x20(%rsp)
40051b: 00
40051c: c7 44 24 18 09 00 00 movl $0x9,0x18(%rsp)
400523: 00
400524: c7 44 24 10 08 00 00 movl $0x8,0x10(%rsp)
40052b: 00
40052c: c7 44 24 08 07 00 00 movl $0x7,0x8(%rsp)
400533: 00
400534: c7 04 24 06 00 00 00 movl $0x6,(%rsp)
40053b: 41 b9 05 00 00 00 mov $0x5,%r9d
400541: 41 b8 04 00 00 00 mov $0x4,%r8d
400547: b9 03 00 00 00 mov $0x3,%ecx
40054c: ba 02 00 00 00 mov $0x2,%edx
400551: be 01 00 00 00 mov $0x1,%esi
400556: bf 30 06 40 00 mov $0x400630,%edi
40055b: b8 00 00 00 00 mov $0x0,%eax
400560: e8 7b fe ff ff callq 4003e0 <printf@plt>
}
400565: c9 leaveq
400566: c3 retq
0000000000400567 <main>:
int main(){
400567: 55 push %rbp
400568: 48 89 e5 mov %rsp,%rbp
output();
40056b: b8 00 00 00 00 mov $0x0,%eax
400570: e8 97 ff ff ff callq 40050c <output>
}
400575: 5d pop %rbp
400576: c3 retq
400577: 90 nop
</pre>
</div>
</div>
<div class = section>
<h3> Seventh compile: gcc hello3.c -g -O3 -fno-builtin -o step6 </h3>
We are changing the optimization level from 0 to 3. This changes our code quite a lot by applying optimizations.
<div class = codeblock>
<pre>0000000000400520 <output>:
#include <stdio.h>
void output(){
400520: 48 83 ec 38 sub $0x38,%rsp
printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
400524: 41 b9 05 00 00 00 mov $0x5,%r9d
40052a: 41 b8 04 00 00 00 mov $0x4,%r8d
400530: c7 44 24 20 0a 00 00 movl $0xa,0x20(%rsp)
400537: 00
400538: c7 44 24 18 09 00 00 movl $0x9,0x18(%rsp)
40053f: 00
400540: b9 03 00 00 00 mov $0x3,%ecx
400545: c7 44 24 10 08 00 00 movl $0x8,0x10(%rsp)
40054c: 00
40054d: c7 44 24 08 07 00 00 movl $0x7,0x8(%rsp)
400554: 00
400555: ba 02 00 00 00 mov $0x2,%edx
40055a: c7 04 24 06 00 00 00 movl $0x6,(%rsp)
400561: be 01 00 00 00 mov $0x1,%esi
400566: bf 30 06 40 00 mov $0x400630,%edi
40056b: 31 c0 xor %eax,%eax
40056d: e8 6e fe ff ff callq 4003e0 <printf@plt>
}
400572: 48 83 c4 38 add $0x38,%rsp
400576: c3 retq </pre>
</div>
As you can see, our program has changed a bit: the frame pointer setup (push %rbp and mov %rsp,%rbp) is gone, the order of the mov instructions has changed, and different instructions are being used.
For example, consider these 2 lines:
<div class = codeblock>
<pre> xor %eax,%eax ;O3
mov $0x0,%eax ; O0</pre>
</div>
Both of these instructions do the same thing: they put the value 0 into eax.
When optimizations are turned on gcc uses xor on the register to get 0. In the listings above this instruction takes 2 bytes (31 c0) instead of 5 (b8 00 00 00 00), because it does not need to carry the constant 0 as data inside the instruction.<br/>
Note: when you xor any value with itself the result is 0.
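<br/>
The note about xor is easy to check in C, where ^ is the xor operator. A minimal sketch (the function name xor_with_self is made up for this example):

```c
#include <stdint.h>

/* XOR sets a bit only where the two inputs differ, so XOR-ing a
   value with itself clears every bit. */
uint32_t xor_with_self(uint32_t value)
{
    return value ^ value;
}
```

Whatever value is passed in, the result is 0, which is why xor %eax,%eax can stand in for mov $0x0,%eax.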
</div>
</div>
<div class = section>
<h3> Conclusions </h3>
The number of compiler options is massive, and this post only scratches the surface of what is available.
It gives some insight into how much assembly code is generated for even the most basic C programs, how complex that assembly would get if the C code were larger, and a glimpse behind the scenes of C.
</div>
<div class = end>
Thanks for reading.
</div>
</body>
</html>
<!-- Post: Lab 3 | Published: 2015-02-01T15:13:00.002-08:00 | Updated: 2015-02-02T17:27:03.736-08:00 -->
<span style="font-size: large;">Lab 3, Python Profiling</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><span style="font-size: small;">We decided not to continue using MySQL and chose to download Python instead. We thought it would be a good idea to test it on both servers, australia (64-bit x86) and red (64-bit ARM).</span></span><br />
<span style="font-size: large;"><span style="font-size: small;">We used Python 3.4, downloaded onto the server with wget from this <a href="https://www.python.org/ftp/python/3.4.2/Python-3.4.2.tgz">link</a>. After we un-tarred it (tar -xvf [filename]) we ran ./configure, which gave us a makefile, and then we made a few changes. </span></span><br />
<span style="font-size: large;"><span style="font-size: small;">We changed line 72 in the makefile which was<br />BASECFLAGS= -Wno-unused-result</span></span><br />
<span style="font-size: large;"><span style="font-size: small;">to</span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">BASECFLAGS= -Wno-unused-result -pg</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><br /></span></span></span></span>
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">This makes Python write a gmon.out file when it runs. This file contains all the profiling information.</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">We now needed to run something with Python so we would have a valid gmon.out file.</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">We found this resource, <a href="https://docs.python.org/devguide/runtests.html">tests</a>, which showed us how to run the Python tests using the command</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">./python -m test</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><br /></span></span></span></span>
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">On Australia:</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">This went through 389 tests, and when it completed we had a gmon.out file.</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">To make the profile easier to read, our professor suggested using gprof2dot to generate a PNG image of the profile. The command is:</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">gprof ./python | gprof2dot | dot -Tpng > Profile.png </span></span></span></span><span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><a href="http://i.imgur.com/3hZ9KyT.jpg">(full image)</a></span></span></span></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><a href="http://imgur.com/3hZ9KyT" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMlhXBOJ1nTGeN6lcZKgnaPJ9VuUyqTWwf2CEJ37_MX919u7tuy32lyjPIE_6aQ6G6QCn0hOkkITvP6u-NUN066uqA3CwA0aPbs0Z9Ax-dCedVKJoNqc3oVKqVew39vpfboSAeEBklLbE/s1600/pythongprof.png" height="400" width="263" /></a></span></span></span></span></div>
<br />
<br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"> </span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">On Red: </span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">This went through most of the tests but stalled at the last one with a message:</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">_mcleanup: gmon.out: No such file or directory </span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">(I later realized that message popped up a lot during the tests which suggests that it is not relevant to the stalling)</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">After waiting a long time I used Control-C to get out of the test, and I still had a gmon.out covering 388 tests. Again, I used gprof2dot to get the image of the profile<a href="http://i.imgur.com/qi2CpZg.jpg"> (full image)</a></span></span></span></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUIHSmJTFZekmJo3nmDia97F14a2gwjy_JgXKCTY6QkI_phqovTbZm3qX5J3jrwMJIRqiiCpgwIAAwLTvFBFIaRLDgut0cE5GW9bJkZsVeIfKVfph2AgNsf3wBlN2t93znNY-JG-8dM3U/s1600/python-arm64.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUIHSmJTFZekmJo3nmDia97F14a2gwjy_JgXKCTY6QkI_phqovTbZm3qX5J3jrwMJIRqiiCpgwIAAwLTvFBFIaRLDgut0cE5GW9bJkZsVeIfKVfph2AgNsf3wBlN2t93znNY-JG-8dM3U/s1600/python-arm64.png" height="320" width="312" /></a></div>
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">Although both pictures appear very different, I looked at the function calls and they are the same; it is simply the layout of the picture that changes drastically.</span></span></span></span><br />
<br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">One function that stands out on both systems is PyEval_EvalFrameEx, which takes up 12% on x86 and 14% on ARM. I looked up the source code <a href="https://github.com/python/cpython/blob/master/Python/ceval.c#L795">here</a> and the function itself is around 2500 lines long, so it is safe to say I'm not sure what it is doing. It does have an interesting comment section at <a href="https://github.com/python/cpython/blob/master/Python/ceval.c#L827">line 827</a> about optimizations that this function uses. They are essentially trying to avoid leading the CPU into a mispredicted branch, which relates to what Chris Tyler was talking about in our class (on 1/29/2015) when he mentioned CPUs guessing the correct path to follow.</span></span></span></span><br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><br /></span></span></span></span>
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;"><br /></span></span></span></span>
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">In the end I think red's profile is hard to compare directly to australia's because of the different layout of the two images, but PyEval_EvalFrameEx takes up the most time on both systems. It seems the developers are aware of this, given how extensively it has been optimized.</span></span></span></span><br />
<br />
<span style="font-size: large;"><span style="font-size: small;"><span style="font-size: large;"><span style="font-size: small;">Thanks for reading</span></span></span></span>
<!-- Post: Lab 2 | Published: 2015-02-01T15:13:00.001-08:00 | Updated: 2015-02-02T17:26:52.732-08:00 -->
<span style="font-size: large;">Lab 2, Benchmarking MySQL</span><br />
<br />
<span style="font-size: large;"><span style="font-size: small;">For lab 2 we had to benchmark a software package. We chose MySQL and found the download page <a href="http://dev.mysql.com/downloads/mysql/">here</a>; under the platform selection just choose Source Code. We were using a 64-bit x86 machine running Fedora 21 and there was no version for Fedora, so we chose the generic Linux option, which is at the very bottom. We copied that <a href="http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.22.tar.gz">link</a> and used the wget command to download it from the command line. Once we un-tarred it (tar -xvf [filename]) we started looking for the configure script, but it did not have one because it uses cmake instead of configure. So we installed cmake on the server and ran it, and it gave us a makefile. We ran make without the useful -j option, which lets it take advantage of multiple cores, so we ended up waiting for a long time. Once it was built we needed to benchmark it, and eventually we found out how to use the benchmark suite through this <a href="http://dev.mysql.com/doc/refman/5.0/en/mysql-benchmarks.html">tutorial</a>. We were running low on time in class so we stopped it after the first test, but it gave us this results file:</span></span><br />
<span style="font-size: large;"><span style="font-size: small;"></span></span><br />
<pre> per operation:
Operation               seconds    usr    sys    cpu   tests
alter_table_add          100.00   0.02   0.00   0.02     100
alter_table_drop         101.00   0.01   0.00   0.01      91
create_index               2.00   0.00   0.00   0.00       8
create_table              11.00   0.00   0.00   0.00      28
drop_index                 2.00   0.00   0.00   0.00       8
drop_table                14.00   0.00   0.00   0.00      28
insert                   340.00   0.37   0.20   0.57    9768
select_distinct            2.00   0.19   0.00   0.19     800
select_group               1.00   0.24   0.02   0.26    2800
select_join                0.00   0.06   0.00   0.06     100
select_key_prefix_join     1.00   0.34   0.00   0.34     100
select_simple_join         1.00   0.08   0.00   0.08     500
TOTALS                   575.00   1.31   0.22   1.53   14331</pre>
<br />
<span style="font-size: large;"><span style="font-size: small;"><br /></span></span>
<span style="font-size: large;"><span style="font-size: small;">I'm not quite sure I understand this table, particularly why there are so many seconds when so little time was taken on the usr or sys side. My theory is that the usr, sys and cpu columns are the percentage of CPU it was using at the time, but I could be wrong.</span></span><br />
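One thing the numbers themselves do show is that in every row the cpu column equals usr plus sys, which would also fit all three being times in seconds rather than percentages. That is only a reading of the table, not something taken from the MySQL documentation. A small C sketch of the check follows, with the values stored in hundredths of a second so the arithmetic is exact (the struct and function names are made up for this example):

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of the results table; times are in hundredths of a second. */
struct bench_row {
    const char *operation;
    int usr, sys, cpu;
};

/* Values copied from the results file above (rows with non-zero
   CPU time, plus the TOTALS row). */
static const struct bench_row rows[] = {
    { "alter_table_add",          2,  0,   2 },
    { "alter_table_drop",         1,  0,   1 },
    { "insert",                  37, 20,  57 },
    { "select_distinct",         19,  0,  19 },
    { "select_group",            24,  2,  26 },
    { "select_join",              6,  0,   6 },
    { "select_key_prefix_join",  34,  0,  34 },
    { "select_simple_join",       8,  0,   8 },
    { "TOTALS",                 131, 22, 153 },
};

/* Returns true when cpu == usr + sys holds for every row. */
bool cpu_is_usr_plus_sys(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        if (rows[i].usr + rows[i].sys != rows[i].cpu)
            return false;
    return true;
}
```

The check holds for all of these rows, including TOTALS.<br />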
<br />
I think this lab was quite useful. We learned how to use cmake and how to speed up the make process with -j, and we did get results. Although I might not understand them yet, they're still results!<br />
<br />
<span style="font-size: large;"><span style="font-size: small;">Thanks for reading.</span></span><br />
<span style="font-size: large;"><br /></span>
<!-- Post: Contributing to open source projects | Published: 2015-01-26T20:32:00.000-08:00 | Updated: 2015-01-27T22:52:09.672-08:00 -->
<b><span style="font-size: large;">Lab 1, Contributing to open source projects</span></b><br />
I have chosen three projects: wmii, Go, and Plan 9. The lab suggested two projects but I did three because wmii is very small and I couldn't really get much content about it, but I still wanted to advertise it because I think it's a ton of fun.<br />
<b><a href="https://code.google.com/p/wmii/">Wmii</a> </b>is a small, simple dynamic tiling window manager that borrows ideas and aesthetics from the Plan 9 operating system, specifically the acme text editor.<br />
<a href="http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs"><b>Plan9</b></a> is a very interesting operating system that was developed after UNIX by the same team (Pike, Thompson, Ritchie). They were essentially given free rein to create this OS and they came up with something that is very unique and full of interesting ideas. It has been open-source since 2002.<br />
<a href="http://en.wikipedia.org/wiki/Go_%28programming_language%29"><b>Go</b></a> is a new programming language being developed at Google by prominent software engineers, namely Ken Thompson (UNIX, B, UTF-8) and Rob Pike (UTF-8, Plan 9), with help from the open source community.<br />
<br />
<span style="font-size: large;"><b>Go </b></span><br />
<br />
Go is distributed under a BSD-style license with a separate patent grant. <a href="http://golang.org/LICENSE">License</a> | <a href="https://code.google.com/p/go/source/browse/PATENTS"> Patent grant</a><br />
To start contributing to the Go language, you don't start with coding. Their site suggests you first discuss your idea with other members of the open source community via their <a href="https://groups.google.com/forum/#!forum/golang-nuts">mailing list</a>. This helps you make sure nobody else is already working on it and confirm that it is a good idea, so you don't waste your time. Go uses GitHub extensively for issue tracking and source code. As you can see <a href="https://github.com/golang">here</a>, besides the language itself you can contribute to various aspects of Go, such as networking libraries, the compiler, mobile libraries, cryptography libraries and much more. There is plenty to do! All code must be reviewed. They use a custom git command called git <a href="https://godoc.org/golang.org/x/review/git-codereview">codereview</a>, which provides easy commands for working with git and the <a href="http://en.wikipedia.org/wiki/Gerrit_%28software%29">Gerrit code review system</a> that Go uses. Once you have made your change, you mail it for review using the 'git codereview mail' command. You may receive comments from the reviewer; you can then modify your code accordingly and use the mail command again to resubmit it. This continues until you receive a comment saying "Looks good to me" or LGTM. When your code has been approved, you can sync (git codereview sync) and then submit your code to the master branch (git codereview submit).<br />
More info: <a href="https://golang.org/doc/contribute.html#tmp_4"> Contribute to Go</a><br />
<br />
<span style="font-size: large;"><b>WMII</b></span><br />
<br />
Wmii is distributed under the <a href="http://opensource.org/licenses/mit-license.php">MIT license</a>.<br />
Wmii uses Google Code to host its source and to track and submit issues. It also uses the Mercurial version control system for its repositories. Since it is a relatively small project, simply cloning the Mercurial repository, making edits and committing should allow one of the project members (there are only 4) to see your code and either accept or deny the patch. You can keep track of issues and keep in touch with the project members <a href="https://code.google.com/p/wmii/issues/list">here</a>. I apologize, but I just could not find much information about submitting patches. I suggest you give wmii a try: it's fun and pretty simple once you go through the <a href="https://wmii.googlecode.com/hg/doc/wmii.pdf">user guide</a>, and if you start using it you may find some bugs that need fixing.<br />
<br />
<b><span style="font-size: large;">PLAN9</span></b><br />
<br />
Plan 9 is distributed under a dual license: <a href="http://en.wikipedia.org/wiki/GNU_General_Public_License">GNU GPLv2</a> | <a href="http://en.wikipedia.org/wiki/Lucent_Public_License">Lucent license</a><br />
I thought Plan 9 would be an interesting deviation from the usual projects that upstream through git, Mercurial or other repositories. Since it is an operating system, you will have to be running it. Image files can be found <a href="http://plan9.bell-labs.com/wiki/plan9/download/">here</a> and info about installing it can be found <a href="http://plan9.bell-labs.com/wiki/plan9/installation_instructions/">here</a>. Plan 9 has a file server called sources that hosts their <a href="http://plan9.bell-labs.com/wiki/plan9/Sources_repository/index.html">sources</a> repository. You can browse through it on the <a href="http://plan9.bell-labs.com/sources/">web</a>, but that seemed to be a bit buggy. In Plan 9 you can simply mount that server using '9fs sources' and browse through it as if it were local, under the path, which should be /n/sources/. As with Go (and I think this applies to all open source projects), discuss your idea with other people, or if you don't have an idea, ask for suggestions (<a href="http://plan9.bell-labs.com/wiki/plan9/mailing_lists/">mailing list info</a>). Once you have some code to post, you use the Plan 9 command simply called <a href="http://plan9.bell-labs.com/magic/man2html/1/patch">patch</a>. They set guidelines which basically say that you should explain your patch, bug fix or update clearly, follow the <a href="http://plan9.bell-labs.com/magic/man2html/6/style">style guidelines</a> and update man pages when necessary. Once you submit with the patch command you can receive one of two messages: 'Sorry' or 'Applied'. If you receive 'Sorry', they will tell you why and what you can change to fix it; if you receive 'Applied', you've done well and the patch has been accepted.<br />
More info: <a href="http://plan9.bell-labs.com/wiki/plan9/how_to_contribute/index.html">how to contribute</a><br />
<br />
In conclusion, I think with some small projects you might have to e-mail the project members directly or chat on IRC to find a clear path to contributing. Bigger projects mostly want contributors, so they post clear explanations to guide you through the process. That means I suppose I'm just a stenographer, for now.<br />
<br />
Thanks for reading.<br />
<br />
<br />
<br />