Thursday, 23 March 2017

GLIBC: Learning to Build It

One of the first things we need to do to get our work on GLIBC accepted into the mainstream library is to test it. To test it, we need to be able to build it, make changes, and see the results of those changes.

In this post, I will document my efforts to build glibc on my local machine.
Since glibc changes from time to time, it's probably better to get the source files from Git and then pull new changes as they come in. The website suggests doing:


git clone git://sourceware.org/git/glibc.git
cd glibc
git checkout --track -b local_glibc-2.25 origin/release/2.25/master

I ran these commands exactly, and now I have the cloned repository on my local machine, with a local branch tracking the 2.25 release. From there I made another branch called 'mktime-optimize' so that I can do my own work without messing up the release branch.

For building instructions, a link is given: https://sourceware.org/glibc/wiki/Testing/Builds

I think I want to build without installing, and hopefully the build will be enough to run tests on. The website gives a basic prototype:


$ mkdir $HOME/src
$ cd $HOME/src
$ git clone git://sourceware.org/git/glibc.git
$ mkdir -p $HOME/build/glibc
$ cd $HOME/build/glibc
$ $HOME/src/glibc/configure --prefix=/usr
$ make

Now I will apply the same logic to my branch and see what happens. Since I was in my home directory when I cloned the repository, I checked for a src/ directory inside my glibc/ directory. There isn't one, which confused me a bit at first. I think $HOME/src is just a directory the instructions have you create to keep source checkouts separate from build output; it is not part of the glibc repository. As long as I remember that $HOME/glibc is my "source" directory, I should be fine. I'm going to try running mkdir -p /build/glibc from my home directory (not inside glibc/) and see what happens.

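That didn't work: a path that starts with / is absolute, so it refers to the top of the filesystem rather than my home directory, and a normal user usually can't create directories there. Using mkdir -p $HOME/build/glibc worked, and now there is a build/ directory in my home directory. Next, for the configure step, I ran $HOME/glibc/configure --prefix=/usr (making sure to leave out the src/ part of the path). I got the following:

[Screenshot: output of configure]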
It seems I have a problem: glibc is the C library for GNU/Linux systems, and I am trying to build it on Windows. I'm going to try to get gcc, or something close enough to it, to work with:

http://stackoverflow.com/questions/6394755/how-to-install-gcc-on-windows-7-machine
https://sourceforge.net/projects/mingw-w64/files/

I started the download and selected the following options:

[Screenshot: MinGW-w64 installer options]
That still didn't work, so out of a mixture of frustration and curiosity, I did all the build steps inside my account on Xerxes (because it's Linux-based, and I can reach it from my Matrix account). After several minutes of text scrolling past, the build finished, and glibc appears to be built on XERXES!! Hopefully I didn't (and won't soon) break the server!!

Thursday, 9 March 2017

Project: Optimizing GNU's GLIBC Code

For our individual projects, we are combining the skills we have learned so far and attempting to make some improvements to the C libraries we all use every day. The C source file I will be trying to analyze and optimize is:

mktime.c


The purpose of mktime.c is to "Convert a 'struct tm' to a time_t value".

In other words, a time_t value counts the seconds that have elapsed since January 1, 1970 (00:00:00 UTC, the Unix epoch). That is a very large number, so most people cannot easily convert a given date into a time_t in their heads. Instead, we write something like March 10, 2017, stored in a struct tm (separate fields for year, month, day, hour, and so on), and mktime converts it into the number of seconds between the epoch and that date. Note that mktime treats the struct tm as local time.
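To make this concrete, here is a small sketch that calls mktime on March 10, 2017. Because mktime interprets a struct tm as local time, the sketch assumes a POSIX system and forces the time zone to UTC with setenv("TZ", "UTC0", 1) and tzset(). The helper name date_to_time_t_utc is hypothetical; it is not part of glibc.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() and tzset() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: convert a calendar date, taken as midnight UTC,
   into a time_t using mktime(). mktime() reads struct tm as local
   time, so the process time zone is switched to UTC first. */
static time_t date_to_time_t_utc(int year, int month, int day)
{
    struct tm tm = {0};

    assert(month >= 1 && month <= 12);
    setenv("TZ", "UTC0", 1); /* POSIX TZ string: zone "UTC", offset 0 */
    tzset();

    tm.tm_year = year - 1900; /* struct tm counts years from 1900 */
    tm.tm_mon = month - 1;    /* months are 0-based */
    tm.tm_mday = day;         /* days of the month are 1-based */
    tm.tm_isdst = 0;          /* no daylight saving time in UTC */
    return mktime(&tm);       /* also normalizes tm and fills tm_wday/tm_yday */
}
```

Called as date_to_time_t_utc(2017, 3, 10), it should return 1489104000: 17235 days after the epoch times 86400 seconds per day. Changing TZ affects the whole process, so a real program would save and restore it.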

I decided to look into this function because I use time-related functions in most of my own projects. I also wanted to take a crack at a function that programmers everywhere use quite often, so that I could make a real contribution. mktime.c is fairly long, so I thought I would probably find SOMETHING I could improve in all that code.

Below is the entire mktime.c file, with the lines I think I could possibly optimize marked with my notes in capital letters.

 ----------------------------------------------

/* Convert a 'struct tm' to a time_t value.
   Copyright (C) 1993-2017 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Paul Eggert <eggert@twinsun.com>.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

/* Define this to have a standalone program to test this implementation of
   mktime.  */
/* #define DEBUG_MKTIME 1 */

#ifndef _LIBC
# include <config.h>
#endif

/* Assume that leap seconds are possible, unless told otherwise.
   If the host has a 'zic' command with a '-L leapsecondfilename' option,
   then it supports leap seconds; otherwise it probably doesn't.  */
#ifndef LEAP_SECONDS_POSSIBLE
# define LEAP_SECONDS_POSSIBLE 1
#endif

#include <time.h>

#include <limits.h>

#include <string.h> /* For the real memcpy prototype.  */

#if defined DEBUG_MKTIME && DEBUG_MKTIME
# include <stdio.h>
# include <stdlib.h>
/* Make it work even if the system's libc has its own mktime routine.  */
# undef mktime
# define mktime my_mktime
#endif /* DEBUG_MKTIME */

/* Some of the code in this file assumes that signed integer overflow
   silently wraps around.  This assumption can't easily be programmed
   around, nor can it be checked for portably at compile-time or
   easily eliminated at run-time.

   Define WRAPV to 1 if the assumption is valid and if
     #pragma GCC optimize ("wrapv")
   does not trigger GCC bug 51793
   <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51793>.
   Otherwise, define it to 0; this forces the use of slower code that,
   while not guaranteed by the C Standard, works on all production
   platforms that we know about.  */
#ifndef WRAPV
# if (((__GNUC__ == 4 && 4 <= __GNUC_MINOR__) || 4 < __GNUC__) \
      && defined __GLIBC__)
#  pragma GCC optimize ("wrapv")
#  define WRAPV 1
# else
#  define WRAPV 0
# endif
#endif

/* Verify a requirement at compile-time (unlike assert, which is runtime).  */
#define verify(name, assertion) struct name { char a[(assertion) ? 1 : -1]; }

/* A signed type that is at least one bit wider than int.  */
#if INT_MAX <= LONG_MAX / 2
typedef long int long_int;
#else
typedef long long int long_int;
#endif
verify (long_int_is_wide_enough, INT_MAX == INT_MAX * (long_int) 2 / 2);

/* Shift A right by B bits portably, by dividing A by 2**B and
   truncating towards minus infinity.  A and B should be free of side
   effects, and B should be in the range 0 <= B <= INT_BITS - 2, where
   INT_BITS is the number of useful bits in an int.  GNU code can
   assume that INT_BITS is at least 32.

   ISO C99 says that A >> B is implementation-defined if A < 0.  Some
   implementations (e.g., UNICOS 9.0 on a Cray Y-MP EL) don't shift
   right in the usual way when A < 0, so SHR falls back on division if
   ordinary A >> B doesn't seem to be the usual signed shift.  */
#define SHR(a, b)                                               \
  ((-1 >> 1 == -1                                               \
    && (long_int) -1 >> 1 == -1                                 \
    && ((time_t) -1 >> 1 == -1 || ! TYPE_SIGNED (time_t)))  /* NOTE: CODE USED OFTEN!! */ \
   ? (a) >> (b)                                                 \
   : (a) / (1 << (b)) - ((a) % (1 << (b)) < 0))

/* The extra casts in the following macros work around compiler bugs,
   e.g., in Cray C 5.0.3.0.  */

/* True if the arithmetic type T is an integer type.  bool counts as
   an integer.  */
#define TYPE_IS_INTEGER(t) ((t) 1.5 == 1)

/* True if negative values of the signed integer type T use two's
   complement, or if T is an unsigned integer type.  */
#define TYPE_TWOS_COMPLEMENT(t) ((t) ~ (t) 0 == (t) -1)

/* True if the arithmetic type T is signed.  */
#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))

/* The maximum and minimum values for the integer type T.  These
   macros have undefined behavior if T is signed and has padding bits.
   If this is a problem for you, please let us know how to fix it for
   your host.  */
#define TYPE_MINIMUM(t) \
  ((t) (! TYPE_SIGNED (t) \
        ? (t) 0 \
        : ~ TYPE_MAXIMUM (t)))
#define TYPE_MAXIMUM(t) \
  ((t) (! TYPE_SIGNED (t) \
        ? (t) -1 \
        : ((((t) 1 << (sizeof (t) * CHAR_BIT - 2)) - 1) * 2 + 1)))  /* NOTE: STRENGTH REDUCTION? */

#ifndef TIME_T_MIN
# define TIME_T_MIN TYPE_MINIMUM (time_t)
#endif
#ifndef TIME_T_MAX
# define TIME_T_MAX TYPE_MAXIMUM (time_t)
#endif
#define TIME_T_MIDPOINT (SHR (TIME_T_MIN + TIME_T_MAX, 1) + 1)

verify (time_t_is_integer, TYPE_IS_INTEGER (time_t));
verify (twos_complement_arithmetic,
(TYPE_TWOS_COMPLEMENT (int)
&& TYPE_TWOS_COMPLEMENT (long_int)
&& TYPE_TWOS_COMPLEMENT (time_t)));

#define EPOCH_YEAR 1970
#define TM_YEAR_BASE 1900
verify (base_year_is_a_multiple_of_100, TM_YEAR_BASE % 100 == 0);

/* Return 1 if YEAR + TM_YEAR_BASE is a leap year.  */
static int
leapyear (long_int year)
{
  /* Don't add YEAR to TM_YEAR_BASE, as that might overflow.
     Also, work even if YEAR is negative.  */
  return
    ((year & 3) == 0
     && (year % 100 != 0
         || ((year / 100) & 3) == (- (TM_YEAR_BASE / 100) & 3)));
}

/* How many days come before each month (0-12).  */
#ifndef _LIBC
static
#endif
const unsigned short int __mon_yday[2][13] =  /* NOTE: COMBINE INTO SINGLE ARRAY? */
  {
    /* Normal years.  */
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 },
    /* Leap years.  */
    { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 }
  };


#ifndef _LIBC
/* Portable standalone applications should supply a <time.h> that
   declares a POSIX-compliant localtime_r, for the benefit of older
   implementations that lack localtime_r or have a nonstandard one.
   See the gnulib time_r module for one way to implement this.  */
# undef __localtime_r
# define __localtime_r localtime_r
# define __mktime_internal mktime_internal
# include "mktime-internal.h"
#endif

/* Return 1 if the values A and B differ according to the rules for
   tm_isdst: A and B differ if one is zero and the other positive.  */
static int
isdst_differ (int a, int b)
{
  return (!a != !b) && (0 <= a) && (0 <= b);
}

/* Return an integer value measuring (YEAR1-YDAY1 HOUR1:MIN1:SEC1) -
   (YEAR0-YDAY0 HOUR0:MIN0:SEC0) in seconds, assuming that the clocks
   were not adjusted between the time stamps.

   The YEAR values uses the same numbering as TP->tm_year.  Values
   need not be in the usual range.  However, YEAR1 must not be less
   than 2 * INT_MIN or greater than 2 * INT_MAX.

   The result may overflow.  It is the caller's responsibility to
   detect overflow.  */

static time_t
ydhms_diff (long_int year1, long_int yday1, int hour1, int min1, int sec1,
   int year0, int yday0, int hour0, int min0, int sec0)
{
  verify (C99_integer_division, -1 / 2 == 0);  /* NOTE: PRECALCULATE TO -0.5 ? */

  /* Compute intervening leap days correctly even if year is negative.
     Take care to avoid integer overflow here.  */
  int a4 = SHR (year1, 2) + SHR (TM_YEAR_BASE, 2) - ! (year1 & 3);  /* NOTE: REPETITIOUS, USE CONSTANTS? */
  int b4 = SHR (year0, 2) + SHR (TM_YEAR_BASE, 2) - ! (year0 & 3);
  int a100 = a4 / 25 - (a4 % 25 < 0);
  int b100 = b4 / 25 - (b4 % 25 < 0);  /* NOTE: STRENGTH REDUCTIONS? */
  int a400 = SHR (a100, 2);
  int b400 = SHR (b100, 2);
  int intervening_leap_days = (a4 - b4) - (a100 - b100) + (a400 - b400);

  /* Compute the desired time in time_t precision.  Overflow might
     occur here.  */
  time_t tyear1 = year1;
  time_t years = tyear1 - year0;
  time_t days = 365 * years + yday1 - yday0 + intervening_leap_days;
  time_t hours = 24 * days + hour1 - hour0;
  time_t minutes = 60 * hours + min1 - min0;
  time_t seconds = 60 * minutes + sec1 - sec0;
  return seconds;
}

/* Return the average of A and B, even if A + B would overflow.  */
static time_t
time_t_avg (time_t a, time_t b)
{
  return SHR (a, 1) + SHR (b, 1) + (a & b & 1);
}

/* Return 1 if A + B does not overflow.  If time_t is unsigned and if
   B's top bit is set, assume that the sum represents A - -B, and
   return 1 if the subtraction does not wrap around.  */
static int
time_t_add_ok (time_t a, time_t b)
{
  if (! TYPE_SIGNED (time_t))
    {
      time_t sum = a + b;
      return (sum < a) == (TIME_T_MIDPOINT <= b);
    }
  else if (WRAPV)
    {
      time_t sum = a + b;
      return (sum < a) == (b < 0);
    }
  else
    {
      time_t avg = time_t_avg (a, b);  /* NOTE: INLINING? */
      return TIME_T_MIN / 2 <= avg && avg <= TIME_T_MAX / 2;  /* NOTE: MAX * 0.5 ? */
    }
}

/* Return 1 if A + B does not overflow.  */
static int
time_t_int_add_ok (time_t a, int b)
{
  verify (int_no_wider_than_time_t, INT_MAX <= TIME_T_MAX);
  if (WRAPV)
    {
      time_t sum = a + b;
      return (sum < a) == (b < 0);
    }
  else
    {
      int a_odd = a & 1;
      time_t avg = SHR (a, 1) + (SHR (b, 1) + (a_odd & b));
      return TIME_T_MIN / 2 <= avg && avg <= TIME_T_MAX / 2;
    }
}

/* Return a time_t value corresponding to (YEAR-YDAY HOUR:MIN:SEC),
   assuming that *T corresponds to *TP and that no clock adjustments
   occurred between *TP and the desired time.
   If TP is null, return a value not equal to *T; this avoids false matches.
   If overflow occurs, yield the minimal or maximal value, except do not
   yield a value equal to *T.  */
static time_t
guess_time_tm (long_int year, long_int yday, int hour, int min, int sec,
      const time_t *t, const struct tm *tp)
{
  if (tp)
    {
      time_t d = ydhms_diff (year, yday, hour, min, sec,
                             tp->tm_year, tp->tm_yday,
                             tp->tm_hour, tp->tm_min, tp->tm_sec);
      if (time_t_add_ok (*t, d))
        return *t + d;
    }

  /* Overflow occurred one way or another.  Return the nearest result
     that is actually in range, except don't report a zero difference
     if the actual difference is nonzero, as that would cause a false
     match; and don't oscillate between two values, as that would
     confuse the spring-forward gap detector.  */
  return (*t < TIME_T_MIDPOINT
 ? (*t <= TIME_T_MIN + 1 ? *t + 1 : TIME_T_MIN)
 : (TIME_T_MAX - 1 <= *t ? *t - 1 : TIME_T_MAX));
}

/* Use CONVERT to convert *T to a broken down time in *TP.
   If *T is out of range for conversion, adjust it so that
   it is the nearest in-range value and then convert that.  */
static struct tm *
ranged_convert (struct tm *(*convert) (const time_t *, struct tm *),
                time_t *t, struct tm *tp)
{
  struct tm *r = convert (t, tp);  /* NOTE: INLINING? */

  if (!r && *t)  /* NOTE: SHORT-CIRCUIT EVALUATION (&&) */
    {
      time_t bad = *t;
      time_t ok = 0;

      /* BAD is a known unconvertible time_t, and OK is a known good one.
         Use binary search to narrow the range between BAD and OK until
         they differ by 1.  */
      while (bad != ok + (bad < 0 ? -1 : 1))
        {
          time_t mid = *t = time_t_avg (ok, bad);
          r = convert (t, tp);
          if (r)
            ok = mid;
          else
            bad = mid;
        }

      if (!r && ok)
        {
          /* The last conversion attempt failed;
             revert to the most recent successful attempt.  */
          *t = ok;
          r = convert (t, tp);
        }
    }

  return r;
}


/* Convert *TP to a time_t value, inverting
   the monotonic and mostly-unit-linear conversion function CONVERT.
   Use *OFFSET to keep track of a guess at the offset of the result,
   compared to what the result would be for UTC without leap seconds.
   If *OFFSET's guess is correct, only one CONVERT call is needed.
   This function is external because it is used also by timegm.c.  */
time_t
__mktime_internal (struct tm *tp,
  struct tm *(*convert) (const time_t *, struct tm *),
  time_t *offset)
{
  time_t t, gt, t0, t1, t2;
  struct tm tm;

  /* The maximum number of probes (calls to CONVERT) should be enough
     to handle any combinations of time zone rule changes, solar time,
     leap seconds, and oscillations around a spring-forward gap.
     POSIX.1 prohibits leap seconds, but some hosts have them anyway.  */
  int remaining_probes = 6;

  /* Time requested.  Copy it in case CONVERT modifies *TP; this can
     occur if TP is localtime's returned value and CONVERT is localtime.  */
  int sec = tp->tm_sec;
  int min = tp->tm_min;
  int hour = tp->tm_hour;
  int mday = tp->tm_mday;
  int mon = tp->tm_mon;
  int year_requested = tp->tm_year;
  int isdst = tp->tm_isdst;

  /* 1 if the previous probe was DST.  */
  int dst2;

  /* Ensure that mon is in range, and set year accordingly.  */
  int mon_remainder = mon % 12;
  int negative_mon_remainder = mon_remainder < 0;
  int mon_years = mon / 12 - negative_mon_remainder;
  long_int lyear_requested = year_requested;
  long_int year = lyear_requested + mon_years;

  /* The other values need not be in range:
     the remaining code handles minor overflows correctly,
     assuming int and time_t arithmetic wraps around.
     Major overflows are caught at the end.  */

  /* Calculate day of year from year, month, and day of month.
     The result need not be in range.  */
  int mon_yday = ((__mon_yday[leapyear (year)]
  [mon_remainder + 12 * negative_mon_remainder])
 - 1);
  long_int lmday = mday;
  long_int yday = mon_yday + lmday;

  time_t guessed_offset = *offset;

  int sec_requested = sec;

  if (LEAP_SECONDS_POSSIBLE)
    {
      /* Handle out-of-range seconds specially,
         since ydhms_tm_diff assumes every minute has 60 seconds.  */
      if (sec < 0)
        sec = 0;
      if (59 < sec)
        sec = 59;
    }

  /* Invert CONVERT by probing.  First assume the same offset as last
     time.  */

  t0 = ydhms_diff (year, yday, hour, min, sec,
  EPOCH_YEAR - TM_YEAR_BASE, 0, 0, 0, - guessed_offset);

  if (TIME_T_MAX / INT_MAX / 366 / 24 / 60 / 60 < 3)  /* NOTE: TOO MANY DIVISIONS, SIMPLIFY? */
    {
      /* time_t isn't large enough to rule out overflows, so check
         for major overflows.  A gross check suffices, since if t0
         has overflowed, it is off by a multiple of TIME_T_MAX -
         TIME_T_MIN + 1.  So ignore any component of the difference
         that is bounded by a small value.  */

      /* Approximate log base 2 of the number of time units per
         biennium.  A biennium is 2 years; use this unit instead of
         years to avoid integer overflow.  For example, 2 average
         Gregorian years are 2 * 365.2425 * 24 * 60 * 60 seconds,
         which is 63113904 seconds, and rint (log2 (63113904)) is
         26.  */
      int ALOG2_SECONDS_PER_BIENNIUM = 26;
      int ALOG2_MINUTES_PER_BIENNIUM = 20;
      int ALOG2_HOURS_PER_BIENNIUM = 14;
      int ALOG2_DAYS_PER_BIENNIUM = 10;
      int LOG2_YEARS_PER_BIENNIUM = 1;

      int approx_requested_biennia =
        (SHR (year_requested, LOG2_YEARS_PER_BIENNIUM)
         - SHR (EPOCH_YEAR - TM_YEAR_BASE, LOG2_YEARS_PER_BIENNIUM)
         + SHR (mday, ALOG2_DAYS_PER_BIENNIUM)
         + SHR (hour, ALOG2_HOURS_PER_BIENNIUM)
         + SHR (min, ALOG2_MINUTES_PER_BIENNIUM)
         + (LEAP_SECONDS_POSSIBLE
            ? 0
            : SHR (sec, ALOG2_SECONDS_PER_BIENNIUM)));

      int approx_biennia = SHR (t0, ALOG2_SECONDS_PER_BIENNIUM);
      int diff = approx_biennia - approx_requested_biennia;
      int approx_abs_diff = diff < 0 ? -1 - diff : diff;

      /* IRIX 4.0.5 cc miscalculates TIME_T_MIN / 3: it erroneously
         gives a positive value of 715827882.  Setting a variable
         first then doing math on it seems to work.
         (ghazi@caip.rutgers.edu) */
      time_t time_t_max = TIME_T_MAX;
      time_t time_t_min = TIME_T_MIN;
      time_t overflow_threshold =
        (time_t_max / 3 - time_t_min / 3) >> ALOG2_SECONDS_PER_BIENNIUM;
      /* NOTE: WHY NOT (MAX - MIN) / 3 ?? ... DIVIDING TWICE IS REDUNDANT */
      if (overflow_threshold < approx_abs_diff)
        {
          /* Overflow occurred.  Try repairing it; this might work if
             the time zone offset is enough to undo the overflow.  */
          time_t repaired_t0 = -1 - t0;
          approx_biennia = SHR (repaired_t0, ALOG2_SECONDS_PER_BIENNIUM);
          diff = approx_biennia - approx_requested_biennia;
          approx_abs_diff = diff < 0 ? -1 - diff : diff;
          if (overflow_threshold < approx_abs_diff)
            return -1;
          guessed_offset += repaired_t0 - t0;
          t0 = repaired_t0;
        }
    }

  /* Repeatedly use the error to improve the guess.  */

  for (t = t1 = t2 = t0, dst2 = 0;
       (gt = guess_time_tm (year, yday, hour, min, sec, &t,
                            ranged_convert (convert, &t, &tm)),
        t != gt);
       t1 = t2, t2 = t, t = gt, dst2 = tm.tm_isdst != 0)
    if (t == t1 && t != t2
        && (tm.tm_isdst < 0  /* NOTE: SHORT-CIRCUIT EVALUATIONS */
            || (isdst < 0
                ? dst2 <= (tm.tm_isdst != 0)
                : (isdst != 0) != (tm.tm_isdst != 0))))
      /* We can't possibly find a match, as we are oscillating
         between two values.  The requested time probably falls
         within a spring-forward gap of size GT - T.  Follow the common
         practice in this case, which is to return a time that is GT - T
         away from the requested time, preferring a time whose
         tm_isdst differs from the requested value.  (If no tm_isdst
         was requested and only one of the two values has a nonzero
         tm_isdst, prefer that value.)  In practice, this is more
         useful than returning -1.  */
      goto offset_found;
    else if (--remaining_probes == 0)
      return -1;

  /* We have a match.  Check whether tm.tm_isdst has the requested
     value, if any.  */
  if (isdst_differ (isdst, tm.tm_isdst))
    {
      /* tm.tm_isdst has the wrong value.  Look for a neighboring
         time with the right value, and use its UTC offset.

         Heuristic: probe the adjacent timestamps in both directions,
         looking for the desired isdst.  This should work for all real
         time zone histories in the tz database.  */

      /* Distance between probes when looking for a DST boundary.  In
         tzdata2003a, the shortest period of DST is 601200 seconds
         (e.g., America/Recife starting 2000-10-08 01:00), and the
         shortest period of non-DST surrounded by DST is 694800
         seconds (Africa/Tunis starting 1943-04-17 01:00).  Use the
         minimum of these two values, so we don't miss these short
         periods when probing.  */
      int stride = 601200;

      /* The longest period of DST in tzdata2003a is 536454000 seconds
         (e.g., America/Jujuy starting 1946-10-01 01:00).  The longest
         period of non-DST is much longer, but it makes no real sense
         to search for more than a year of non-DST, so use the DST
         max.  */
      int duration_max = 536454000;

      /* Search in both directions, so the maximum distance is half
         the duration; add the stride to avoid off-by-1 problems.  */
      int delta_bound = duration_max / 2 + stride;  /* NOTE: ALREADY HAVE CONSTANT! USE IT */

      int delta, direction;

      for (delta = stride; delta < delta_bound; delta += stride)
        for (direction = -1; direction <= 1; direction += 2)
          if (time_t_int_add_ok (t, delta * direction))
            {
              time_t ot = t + delta * direction;  /* NOTE: DIRECTION IS ONLY EVER -1 AND 1 ... */
              struct tm otm;
              ranged_convert (convert, &ot, &otm);
              if (! isdst_differ (isdst, otm.tm_isdst))
                {
                  /* We found the desired tm_isdst.
                     Extrapolate back to the desired time.  */
                  t = guess_time_tm (year, yday, hour, min, sec, &ot, &otm);
                  ranged_convert (convert, &t, &tm);
                  goto offset_found;
                }
            }
    }

 offset_found:
  *offset = guessed_offset + t - t0;

  if (LEAP_SECONDS_POSSIBLE && sec_requested != tm.tm_sec)
    {
      /* Adjust time to reflect the tm_sec requested, not the normalized value.
         Also, repair any damage from a false match due to a leap second.  */
      int sec_adjustment = (sec == 0 && tm.tm_sec == 60) - sec;
      if (! time_t_int_add_ok (t, sec_requested))
        return -1;
      t1 = t + sec_requested;
      if (! time_t_int_add_ok (t1, sec_adjustment))  /* NOTE: REPETITIVE-LOOKING... SIMPLIFY?? */
        return -1;
      t2 = t1 + sec_adjustment;
      if (! convert (&t2, &tm))
        return -1;
      t = t2;
    }

  *tp = tm;
  return t;
}


/* FIXME: This should use a signed type wide enough to hold any UTC
   offset in seconds.  'int' should be good enough for GNU code.  We
   can't fix this unilaterally though, as other modules invoke
   __mktime_internal.  */
static time_t localtime_offset;

/* Convert *TP to a time_t value.  */
time_t
mktime (struct tm *tp)
{
#ifdef _LIBC
  /* POSIX.1 8.1.1 requires that whenever mktime() is called, the
     time zone names contained in the external variable 'tzname' shall
     be set as if the tzset() function had been called.  */
  __tzset ();
#endif

  return __mktime_internal (tp, __localtime_r, &localtime_offset);
}

#ifdef weak_alias
weak_alias (mktime, timelocal)
#endif

#ifdef _LIBC
libc_hidden_def (mktime)
libc_hidden_weak (timelocal)
#endif

#if defined DEBUG_MKTIME && DEBUG_MKTIME

static int
not_equal_tm (const struct tm *a, const struct tm *b)
{
  return ((a->tm_sec ^ b->tm_sec)
          | (a->tm_min ^ b->tm_min)
          | (a->tm_hour ^ b->tm_hour)
          | (a->tm_mday ^ b->tm_mday)
          | (a->tm_mon ^ b->tm_mon)
          | (a->tm_year ^ b->tm_year)
          | (a->tm_yday ^ b->tm_yday)
          | isdst_differ (a->tm_isdst, b->tm_isdst));
}

static void
print_tm (const struct tm *tp)
{
  if (tp)
    printf ("%04d-%02d-%02d %02d:%02d:%02d yday %03d wday %d isdst %d",
            tp->tm_year + TM_YEAR_BASE, tp->tm_mon + 1, tp->tm_mday,
            tp->tm_hour, tp->tm_min, tp->tm_sec,
            tp->tm_yday, tp->tm_wday, tp->tm_isdst);
  else
    printf ("0");
}

static int
check_result (time_t tk, struct tm tmk, time_t tl, const struct tm *lt)
{
  if (tk != tl || !lt || not_equal_tm (&tmk, lt))  /* NOTE: SHORT-CIRCUIT EVALUATIONS ( || ) */
    {
      printf ("mktime (");
      print_tm (lt);
      printf (")\nyields (");
      print_tm (&tmk);
      printf (") == %ld, should be %ld\n", (long int) tk, (long int) tl);
      return 1;
    }

  return 0;
}

int
main (int argc, char **argv)
{
  int status = 0;
  struct tm tm, tmk, tml;
  struct tm *lt;
  time_t tk, tl, tl1;
  char trailer;

  if ((argc == 3 || argc == 4)  /* NOTE: (ARGC > 2 && ARGC < 5) POSSIBLY CHEAPER? */
      && (sscanf (argv[1], "%d-%d-%d%c",
                  &tm.tm_year, &tm.tm_mon, &tm.tm_mday, &trailer)
          == 3)
      && (sscanf (argv[2], "%d:%d:%d%c",
                  &tm.tm_hour, &tm.tm_min, &tm.tm_sec, &trailer)
          == 3))
    {
      tm.tm_year -= TM_YEAR_BASE;
      tm.tm_mon--;
      tm.tm_isdst = argc == 3 ? -1 : atoi (argv[3]);
      tmk = tm;
      tl = mktime (&tmk);
      lt = localtime (&tl);
      if (lt)
        {
          tml = *lt;
          lt = &tml;
        }
      printf ("mktime returns %ld == ", (long int) tl);
      print_tm (&tmk);
      printf ("\n");
      status = check_result (tl, tmk, tl, lt);
    }
  else if (argc == 4 || (argc == 5 && strcmp (argv[4], "-") == 0))  /* NOTE: SHORT-CIRCUIT EVALUATION */
    {
      time_t from = atol (argv[1]);
      time_t by = atol (argv[2]);
      time_t to = atol (argv[3]);

      if (argc == 4)
        for (tl = from; by < 0 ? to <= tl : tl <= to; tl = tl1)
          {
            lt = localtime (&tl);  /* NOTE: INLINING?? */
            if (lt)
              {
                tmk = tml = *lt;
                tk = mktime (&tmk);
                status |= check_result (tk, tmk, tl, &tml);
              }
            else
              {
                printf ("localtime (%ld) yields 0\n", (long int) tl);
                status = 1;
              }
            tl1 = tl + by;
            if ((tl1 < tl) != (by < 0))
              break;
          }
      else
        for (tl = from; by < 0 ? to <= tl : tl <= to; tl = tl1)
          {
            /* Null benchmark.  */
            lt = localtime (&tl);
            if (lt)
              {
                tmk = tml = *lt;
                tk = tl;
                status |= check_result (tk, tmk, tl, &tml);
              }
            else
              {
                printf ("localtime (%ld) yields 0\n", (long int) tl);
                status = 1;
              }
            tl1 = tl + by;
            if ((tl1 < tl) != (by < 0))
              break;
          }
    }
  else
    printf ("Usage:\
\t%s YYYY-MM-DD HH:MM:SS [ISDST] # Test given time.\n\
\t%s FROM BY TO # Test values FROM, FROM+BY, ..., TO.\n\
\t%s FROM BY TO - # Do not test those values (for benchmark).\n",
   argv[0], argv[0], argv[0]);

  return status;
}

#endif /* DEBUG_MKTIME */

/*
Local Variables:
compile-command: "gcc -DDEBUG_MKTIME -I. -Wall -W -O2 -g mktime.c -o mktime"
End:
*/
-----------------------------------------
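Two of my notes deserve a caveat. First, TIME_T_MAX / INT_MAX / 366 / 24 / 60 / 60 contains only constants, so the compiler evaluates it at compile time and those divisions cost nothing at run time. Second, glibc divides TIME_T_MAX and TIME_T_MIN by 3 separately because TIME_T_MAX - TIME_T_MIN does not fit in a signed time_t: the true difference is about twice the largest representable value, so computing it first would overflow. The sketch below shows this with int64_t standing in for a 64-bit time_t (an assumption; the width of time_t varies by platform), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: int64_t plays the role of a 64-bit
   signed time_t. */

/* Mirrors mktime.c's (time_t_max / 3 - time_t_min / 3): dividing each
   bound first keeps every intermediate value inside int64_t. */
static int64_t third_of_range(void)
{
    return INT64_MAX / 3 - INT64_MIN / 3;
}

/* Returns 1 if a - b would overflow int64_t, without performing the
   (possibly undefined) subtraction itself. */
static int sub_would_overflow(int64_t a, int64_t b)
{
    if (b < 0)
        return a > INT64_MAX + b; /* result would exceed INT64_MAX */
    return a < INT64_MIN + b;     /* result would go below INT64_MIN */
}
```

third_of_range() yields 6148914691236517204, which fits in int64_t, whereas sub_would_overflow(INT64_MAX, INT64_MIN) reports that the direct subtraction would overflow.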

So that's my initial take on some possible things I can look into. More detailed information on changes and testing will follow. Stay tuned!!
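As an example of the kind of change I might try, here is a rough sketch of the "COMBINE INTO SINGLE ARRAY?" idea: keep only the normal-year row of __mon_yday and add one day for months after February in leap years. The names days_before_month and mon_yday_single are hypothetical. It trades one row of 13 unsigned shorts for an extra comparison on every call, so whether it is actually faster would have to be measured. Also, the listing shows __mon_yday is not static inside glibc (when _LIBC is defined), so other parts of the library may depend on the two-row table.

```c
#include <assert.h>

/* Hypothetical single-row replacement for __mon_yday: days before
   each month (index 0-12) in a normal year. */
static const unsigned short days_before_month[13] =
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 };

/* MON is 0-based (0 = January) and may be 12, like the original
   table's index. LEAP is nonzero for leap years. */
static int mon_yday_single(int mon, int leap)
{
    assert(mon >= 0 && mon <= 12);
    /* February 29 only shifts the months that start after February. */
    return days_before_month[mon] + (leap && mon >= 2);
}
```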

During my mini-presentation, the professor pointed out that I need to consider whether the compiler is already doing many of those optimizations.

https://msdn.microsoft.com/en-us/library/ms973852.aspx
https://www.functions-online.com/mktime.html
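The professor's point applies to several of my notes. Compilers already replace division by a constant with cheaper instructions, but for signed integers x / 2 is not the same as x >> 1: C division truncates toward zero, while an arithmetic right shift rounds toward minus infinity, so the compiler must add a small correction for negative values. This is also why glibc has its own SHR macro. "MAX * 0.5" would not be an improvement either, since multiplying by 0.5 converts the value to double, which cannot represent every 64-bit integer exactly. Likewise, verify (C99_integer_division, -1 / 2 == 0) is checked at compile time and only confirms that integer division truncates -1 / 2 to 0, so there is nothing to precompute there. The function names below are hypothetical.

```c
#include <assert.h>

/* Truncating division, as the C standard defines it for int:
   rounds toward zero. */
static int div2_trunc(int x)
{
    return x / 2;
}

/* Floor division by 2 written portably, the same idea as SHR (a, 1)
   in mktime.c: divide, then subtract 1 when the remainder is negative.
   This is the result an arithmetic right shift would give. */
static int div2_floor(int x)
{
    return x / 2 - (x % 2 < 0);
}
```

div2_trunc(-3) is -1 while div2_floor(-3) is -2; for non-negative inputs the two agree, which is why a plain shift is only safe when the value is known to be non-negative.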

Thursday, 2 March 2017

Lab 7: Inline Assembler

Part A - Class Lab

Using the same code as in my Lab 5 post, we are creating another version of the algorithm that uses inline assembly, specifically the AArch64 SQDMULH instruction (Signed saturating Doubling Multiply returning High half).

Here is the code our group created:

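#include <stdint.h>

/* Scale MAX 16-bit samples in place by FACTOR (0.0 to 1.0), eight
   samples per iteration. Assumes MAX is a multiple of 8. */
void adjustVolume8(int16_t* original,float factor,int max){
        register int16_t intFactor asm("w16");   /* keep the factor in w16 */
        intFactor=(int)(factor*32767);           /* factor in Q15 fixed point */

        int16_t *x;
        int16_t *loop_max = original+max;
        for(x=original ;x<loop_max ;x+=8)
        {
                __asm__ volatile(
                        "LD1 {v0.8h},[%[p]]\n"          /* load 8 samples */
                        "DUP v1.8h, w16\n"              /* copy factor into all 8 lanes */
                        "SQDMULH v0.8h,v0.8h,v1.8h\n"   /* multiply every lane at once */
                        "ST1 {v0.8h},[%[p]]"            /* store 8 samples back */
                        :
                        :[p]"r"(x),"r"(intFactor)
                        :"v0","v1","memory"             /* registers and memory we modify */
                        );
        }
}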


Here are the timing results, measured the same way as in Lab 5:

5000000 size sound file, new array, simple:               92.00
5000000 size sound file, original array, simple:          91.00   *** NAIVE VERSION ***
5000000 size sound file, new array, table:                56.00
5000000 size sound file, original array, table:           56.00   *** TABLE LOOKUP ***
5000000 size sound file, new array, int hack:             41.00
5000000 size sound file, original array, int hack:        40.00
5000000 size sound file, new array, inline assembly:       0.00
5000000 size sound file, original array, inline assembly:  4.00   *** INLINE ASSEMBLY ***


As we can see, the inline assembly version was HUGELY more efficient than the other versions we had created. This code essentially does the following things:

Reserve a register (w16) for our intFactor value.
Loop through the 'original' array, grabbing eight elements at a time.
Use vector registers (128 bits each, enough for eight 16-bit samples), so we move through the array faster.
Multiply the entire vector register by the same value, so several values change simultaneously.
Store those new values back into the original array.
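To check the inline assembly's output, it helps to have a plain C model of what SQDMULH computes in each 16-bit lane: double the product, keep the high 16 bits, and saturate. The sketch below is my own scalar reference, not our lab code; the names sqdmulh_lane and adjust_volume_ref are hypothetical, and it assumes the factor is converted to Q15 fixed point the same way as in adjustVolume8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 16-bit SQDMULH lane: saturate((2 * a * b) >> 16),
   i.e. the high half of the doubled product. Only -32768 * -32768
   overflows, and it saturates to 32767. */
static int16_t sqdmulh_lane(int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t) a * b;
    /* Floor division by 65536 (taking the high half), written without
       relying on >> for negative numbers. */
    int64_t high = prod / 65536 - (prod % 65536 < 0);
    if (high > INT16_MAX)
        return INT16_MAX;
    return (int16_t) high;
}

/* Hypothetical plain-C reference for adjustVolume8: scale every sample
   by FACTOR (0.0 to 1.0) in Q15, one element at a time. */
static void adjust_volume_ref(int16_t *samples, size_t n, float factor)
{
    assert(factor >= 0.0f && factor <= 1.0f);
    int16_t q15 = (int16_t) (factor * 32767); /* same conversion as the lab */
    for (size_t i = 0; i < n; i++)
        samples[i] = sqdmulh_lane(samples[i], q15);
}
```

For example, scaling 16384 by a factor of 0.5 (Q15 value 16383) gives 8191, slightly below half, because 32767 rather than 32768 stands for 1.0.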

Just by grabbing several array elements at a time and applying the same multiplication to all of them at once, we went from 56.00 with the table lookup version to 4.00 with inline assembly: 14x faster (a 1300% increase in speed)!!

From this lab I learned that it is possible to grab several elements of an array at once, and that doing so is far more efficient. There are also instructions that process several array elements simultaneously (SIMD: single instruction, multiple data). Before this, I didn't know that I didn't have to go through arrays one element at a time, changing each value before moving on to the next. Understanding these kinds of processing techniques can definitely improve the performance of programs I write in the future, if I can find a way to apply this knowledge.



Part B - Individual Task 

After Memory Architecture
Do this part of the lab after the class on Memory Architecture.

I believe I will have to wait a bit before doing this part...

Lab 3: Working with x86_64 and AArch64 Registers

For this lab, we are beginning to dive into the code we write as it appears in machine language.

Take your typical 'Hello World' basic C program [ hello.c ]:

/* Hello World in traditional C using printf() */

#include <stdio.h>

int main() {
        printf("Hello World!\n");
}

After compiling the code into an executable file, I can look at its assembly code with these commands:

gcc -o hello hello.c          -> compile hello.c into the executable file hello
./hello                       -> run the executable from the command line
objdump -d hello | less       -> disassemble the machine code in hello and page through it


objdump produces a whole bunch of output for other sections and functions, but the part that corresponds to our main() function looks like this:

00000000004004f6 <main>:
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       e8 ec fe ff ff          callq  4003f0 <puts@plt>
  400504:       b8 00 00 00 00          mov    $0x0,%eax
  400509:       5d                      pop    %rbp
  40050a:       c3                      retq
  40050b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

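This is essentially how you disassemble a compiled C program. Notice that GCC replaced our printf() call with a call to puts(): the format string has no conversion specifiers and ends with a newline, so the two calls do the same thing.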

For our lab, we are going to disassemble similar programs written directly in x86_64 and AArch64 assembly language. For example, here is the hello-gas.s file (the source code, not yet an executable file):

/* 
   This is a 'hello world' program in x86_64 assembler using the 
   GNU assembler (gas) syntax. Note that this program runs in 64-bit
   mode.

   CTyler, Seneca College, 2014-01-20
   Licensed under GNU GPL v2+
*/

.text
.globl  _start

_start:
        movq    $len,%rdx                       /* message length */
        movq    $msg,%rsi                       /* message location */
        movq    $1,%rdi                         /* file descriptor stdout */
        movq    $1,%rax                         /* syscall sys_write */
        syscall

        movq    $0,%rdi                         /* exit status */
        movq    $60,%rax                        /* syscall sys_exit */
        syscall

.section   .rodata

msg:    .ascii      "Hello, world!\n"
            len = . - msg

To build this into an executable file, we use the following commands, which I took from the Makefile that describes how to build these programs:

as -o hello-gas.o hello-gas.s        -> assemble the source into an intermediate object file
ld -o hello-gas hello-gas.o          -> link the object file into an executable file

Then we can disassemble the executable file to see its machine code:

objdump -d hello-gas

This gives the following result:

hello-gas:     file format elf64-x86-64

Disassembly of section .text:

0000000000400078 <_start>:
  400078:       48 c7 c2 0e 00 00 00    mov    $0xe,%rdx
  40007f:       48 c7 c6 a6 00 40 00    mov    $0x4000a6,%rsi
  400086:       48 c7 c7 01 00 00 00    mov    $0x1,%rdi
  40008d:       48 c7 c0 01 00 00 00    mov    $0x1,%rax
  400094:       0f 05                   syscall
  400096:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
  40009d:       48 c7 c0 3c 00 00 00    mov    $0x3c,%rax
  4000a4:       0f 05                   syscall

I repeated the same steps to build all the x86_64 files and view their assembly code, then moved on to the AArch64 files. Before moving on, it is interesting to note that the x86_64 assembler program contained only the instructions needed to perform its task. The C programs that I disassembled earlier had a LOT more assembly code that had nothing to do with what the program actually does (which is simply to print "Hello, world!"). That extra code comes from the C runtime: gcc links in startup code (including a _start routine that sets things up and then calls main) and the machinery for dynamically linking to the C library, whereas our assembly programs are linked directly with ld and contain only what we wrote.
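The registers in the x86_64 listing line up with the arguments of the write system call: rax holds the syscall number (1 is sys_write on x86_64), rdi the file descriptor (1, stdout), rsi the address of the message, and rdx its length (0xe, i.e. 14 bytes for "Hello, world!\n"). As a rough C equivalent, here is a minimal sketch that makes the same call through glibc's syscall() wrapper; the function name hello_via_syscall is hypothetical. SYS_write comes from <sys/syscall.h>, so the right number is picked for each architecture (on AArch64 it is 64, as the comment in the AArch64 version of the program says).

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <unistd.h>
#include <sys/syscall.h>

static const char msg[] = "Hello, world!\n";

/* Hypothetical helper: issue the same write(2) system call as the
   assembly program. Returns the number of bytes written, or -1. */
long hello_via_syscall(void) {
    /* On x86_64: SYS_write -> rax, 1 -> rdi, msg -> rsi, length -> rdx.
       sizeof msg - 1 leaves out the terminating '\0', giving 14. */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}
```

The program's second system call (sys_exit, number 60 on x86_64) corresponds to returning from main in a C program, or to calling _exit(0).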

For AArch64, the hello.s file looks like this:

.text
.globl _start
_start:
        mov     x0, 1           /* file descriptor: 1 is stdout */
        adr     x1, msg         /* message location (memory address) */
        mov     x2, len         /* message length (bytes) */

        mov     x8, 64          /* write is syscall #64 */
        svc     0               /* invoke syscall */
        mov     x0, 0           /* status -> 0 */
        mov     x8, 93          /* exit is syscall #93 */
        svc     0               /* invoke syscall */
.data
msg:    .ascii      "Hello, world!\n"
len=    . - msg

I followed a similar process:      as -o hello.o hello.s

However, right away I got this response:

hello.s: Assembler messages:
hello.s:5: Error: too many memory references for `mov'
hello.s:6: Error: no such instruction: `adr x1,msg'
hello.s:7: Error: too many memory references for `mov'
hello.s:9: Error: too many memory references for `mov'
hello.s:10: Error: no such instruction: `svc 0'
hello.s:12: Error: too many memory references for `mov'
hello.s:13: Error: too many memory references for `mov'
hello.s:14: Error: no such instruction: `svc 0'

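I'm not sure why I got that at first glance, but these are the kinds of errors the x86_64 version of the GNU assembler reports when it is given AArch64 instructions, which suggests this file has to be assembled on an AArch64 machine such as Betty. For now I will continue with the rest of the lab until I figure it out... The next part of the lab uses a loop, with some variations. The original code looks like: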

.text
.globl    _start

start = 0                       /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                        /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    mov     $start,%r15         /* loop index */

loop:
    /* ... body of the loop ... do something useful here ... */

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we're done */
    jne     loop                /* loop if we're not */

    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall sys_exit */
    syscall

By itself, the code does nothing visible (it just counts from 0 up to 10 and exits), so I need to adjust it so that it prints something like:

Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop

So I'm going to look for where I can make a change that produces that result.
Looking at our previous examples, I will need a message like: msg:    .ascii      "Hello, world!\n" (and probably the write syscall from those examples inside the loop body).
So I'll change it to: msg:    .ascii      "Loop\n"
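Before finishing the assembly version, it can help to write the intended logic in C. This is a minimal sketch of my own, not the lab's solution (print_loop, LOOP_START and LOOP_MAX are hypothetical names mirroring the start and max symbols). Like the assembly, it checks the index at the bottom of the loop (inc, cmp, jne), so it behaves like a do-while, and it prints with the same write system call the hello programs use. In the assembly version the index can stay in r15 across the syscall, because on x86_64 Linux the syscall instruction only overwrites rax (the return value), rcx and r11.

```c
#include <unistd.h>

#define LOOP_START 0   /* mirrors the assembler symbol start = 0 */
#define LOOP_MAX   10  /* mirrors max = 10 (loop while index != max) */

static const char loop_msg[] = "Loop\n";

/* Hypothetical helper: write "Loop\n" to fd once per iteration.
   Returns the total number of bytes written, or -1 on error. */
long print_loop(int fd) {
    long total = 0;
    int index = LOOP_START;                     /* mov $start,%r15 */
    do {
        /* loop body: write(fd, msg, len), as in the hello examples */
        ssize_t n = write(fd, loop_msg, sizeof loop_msg - 1);
        if (n < 0) {
            return -1;
        }
        total += n;
        index++;                                /* inc %r15 */
    } while (index != LOOP_MAX);                /* cmp $max,%r15 ; jne loop */
    return total;
}
```

Calling print_loop(1) prints "Loop" ten times to stdout, which is the output the assembly version should produce.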

[ NEED TO FINISH THE LAB ]

Lab 5: Algorithm Selection Lab

For this lab, we are trying a few different ways of adjusting the volume of a sequence of sound samples, using different algorithms. The purpose is to find out which approaches are faster by comparing the time elapsed from the beginning to the end of the process.

During group work, we came up with several different algorithms that had varying results - so the code used for this blog is not my own original code, but the code that was worked on by the group.

Since sound samples are commonly stored as 16-bit signed integers, we use the type int16_t, which holds values from -32768 to 32767.

The first step is creating a naive version and timing it with gettimeofday():

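#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

/* Prototypes: main() calls these before their definitions appear. */
int16_t* generateSound(int max);
void adjustVolume(int16_t* original, float factor, int max);

int main(int argc, char* argv[]) {

    float factor = 0.5;
    int max = 5000000;

    if (argc >= 2) {
        factor = atof(argv[1]);
    }
    if (argc >= 3) {
        max = atoi(argv[2]);
    }

    int16_t* sound = generateSound(max);

    struct timeval t1, t2;
    double elapsed;

    gettimeofday(&t1, NULL);
    adjustVolume(sound, factor, max);
    gettimeofday(&t2, NULL);
    /* Elapsed time in milliseconds; 1000.0 keeps the fractional part. */
    elapsed = (t2.tv_sec - t1.tv_sec) * 1000.0 + (t2.tv_usec - t1.tv_usec) / 1000.0;
    printf("%d size sound file, original array, simple: %.2lf\n", max, elapsed);

    free(sound);
    return 0;
} // end of main()

int16_t* generateSound(int max) {

    /* Allocate a length that is a multiple of 8 (always larger than max),
       leaving room for code that works on 8 samples at a time. */
    int leftover = max % 8;
    int len = (max + (8 - leftover));
    int16_t* rc = (int16_t*)malloc(len * sizeof(int16_t));
    int i;
    for (i = 0; i < max; i++) {
        /* Random sample in the int16_t range -32768..32767. */
        rc[i] = (int16_t)((rand() % 65536) - 32768);
    }
    return rc;
}


void adjustVolume(int16_t* original, float factor, int max) {

    int i;
    for (i = 0; i < max; i++) {
        /* int16_t -> float, multiply, then truncate back to int16_t. */
        original[i] = original[i] * factor;
    }
}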

-------------------------------------------------------------------------------------------------

The code above gives the simple naive version. It creates an array of random int16_t values ranging from -32768 to +32767, and then our adjustVolume algorithm simply goes through the entire array and multiplies each value by the factor of 0.5.
This is probably a fairly slow process: the program loops through 5 million elements one at a time, converting each sample to a float, multiplying it, and converting it back. The running time grows in proportion to the number of samples, and the array itself takes 2 bytes per sample (about 10 MB for 5 million samples), so both time and memory grow as that number gets bigger.
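main() repeats the same elapsed-time formula after every test, so a small helper could factor it out. This is a sketch only (elapsed_ms is a hypothetical name, not part of the group's code); it works in floating point so sub-millisecond differences are kept, and it stays correct when tv_usec wraps around into the next second, because the seconds and microseconds differences are combined before anything is truncated.

```c
#include <sys/time.h>

/* Hypothetical helper: milliseconds between two gettimeofday() readings.
   Dividing by 1000.0 (not 1000) keeps the fractional milliseconds. */
double elapsed_ms(const struct timeval *start, const struct timeval *end) {
    return (end->tv_sec - start->tv_sec) * 1000.0
         + (end->tv_usec - start->tv_usec) / 1000.0;
}
```

It would be used as elapsed = elapsed_ms(&t1, &t2); right after the second gettimeofday() call.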

The second step was to create a second algorithm so that we could compare the timings.
One version we used was a table lookup. In our naive version, every sample has to be converted to a float, multiplied by the factor, and converted back, which takes time and processing power. Since an int16_t can only hold 65,536 different values, the lookup table does that calculation once for every possible value, ahead of time. Then, for each sample, we simply use its value as an index into the table to fetch the already-calculated answer. This saves us from calculating the same values over and over. The differences to our code look like this:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

/* Prototypes: main() calls these before their definitions appear. */
int16_t* generateSound(int max);
int16_t* copySound(int16_t* original, int max);
void adjustVolume(int16_t* original, float factor, int max);
void adjustVolume3(int16_t* original, float factor, int max);

int main(int argc, char* argv[]) {

    float factor = 0.5;
    int max = 5000000;

    if (argc >= 2) {
        factor = atof(argv[1]);
    }
    if (argc >= 3) {
        max = atoi(argv[2]);
    }

    /* Two identical copies, so both algorithms work on the same data. */
    int16_t* sound = generateSound(max);
    int16_t* sound2 = copySound(sound, max);

    struct timeval t1, t2;
    double elapsed;

    gettimeofday(&t1, NULL);
    adjustVolume(sound, factor, max);
    gettimeofday(&t2, NULL);
    elapsed = (t2.tv_sec - t1.tv_sec) * 1000.0 + (t2.tv_usec - t1.tv_usec) / 1000.0;
    printf("%d size sound file, original array, simple: %.2lf\n", max, elapsed);

    gettimeofday(&t1, NULL);
    adjustVolume3(sound2, factor, max);
    gettimeofday(&t2, NULL);
    elapsed = (t2.tv_sec - t1.tv_sec) * 1000.0 + (t2.tv_usec - t1.tv_usec) / 1000.0;
    printf("%d size sound file, original array, table: %.2lf\n", max, elapsed);

    free(sound);
    free(sound2);
    return 0;
} // end of main()

int16_t* generateSound(int max) {

    int leftover = max % 8;
    int len = (max + (8 - leftover));
    int16_t* rc = (int16_t*)malloc(len * sizeof(int16_t));
    int i;
    for (i = 0; i < max; i++) {
        rc[i] = (int16_t)((rand() % 65536) - 32768);
    }
    return rc;
}


int16_t* copySound(int16_t* original, int max) {

    int leftover = max % 8;
    int len = (max + (8 - leftover));
    int16_t* duplicate = (int16_t*)malloc(len * sizeof(int16_t));
    int i;
    for (i = 0; i < max; i++) {
        duplicate[i] = original[i];
    }
    return duplicate;
}


void adjustVolume(int16_t* original, float factor, int max) {

    int i;
    for (i = 0; i < max; i++) {
        original[i] = original[i] * factor;
    }
}


void adjustVolume3(int16_t* original, float factor, int max) {

    /* One float per possible 16-bit sample value: 65536 * sizeof(float) bytes. */
    float* table = malloc(65536 * sizeof(float));
    uint16_t i;
    /* (int16_t)i turns indices 32768..65535 into -32768..-1 (gcc wraps). */
    for (i = 0; i < 65535; i++) {
        table[i] = (int16_t)i * factor;
    }
    /* Filled separately: i < 65536 would never be false for a uint16_t. */
    table[65535] = (int16_t)65535 * factor;
    int j;
    for (j = 0; j < max; j++) {
        /* (uint16_t) maps each sample to its slot, e.g. -1 -> 65535. */
        original[j] = table[(uint16_t)original[j]];
    }
    free(table);
}


----------------------------------------------------------------------------------------
 
This code makes a duplicate of the array generated for the naive version (sound2), so that both algorithms start from the same data. Then, in adjustVolume3, we make all our calculations ahead of time and store them in a separate lookup table. Finally, we change each value in the array by looking up its pre-calculated result in that table.
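The indexing in adjustVolume3 works because converting an int16_t to uint16_t is well defined in C: -32768..-1 map to 32768..65535 and 0..32767 stay the same, so every possible sample has its own slot. This sketch (build_volume_table and table_index are hypothetical names, not the group's code) builds the table by looping over the signed values directly, which avoids both the special case for entry 65535 and the implementation-defined conversion from uint16_t back to int16_t. It also sizes the allocation with sizeof *table, so the byte count always matches the element type.

```c
#include <stdint.h>
#include <stdlib.h>

/* Map a signed sample to its table slot: -32768..-1 -> 32768..65535,
   0..32767 -> 0..32767. Conversion to an unsigned type wraps modulo 65536. */
static size_t table_index(int16_t sample) {
    return (uint16_t)sample;
}

/* Hypothetical helper: build a 65,536-entry table holding the scaled value
   of every possible sample. Assumes 0 <= factor <= 1 so results fit in
   int16_t. Returns NULL if malloc fails; the caller frees the table. */
int16_t *build_volume_table(float factor) {
    int16_t *table = malloc(65536 * sizeof *table);
    if (table == NULL) {
        return NULL;
    }
    for (int32_t value = -32768; value <= 32767; value++) {
        /* float -> int16_t conversion truncates toward zero */
        table[table_index((int16_t)value)] = (int16_t)(value * factor);
    }
    return table;
}
```

With the table built, each sample is adjusted with samples[j] = table[table_index(samples[j])].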

The next step is to compare the naive version with the table lookup version, first on an x86_64 system and then separately on an AArch64 system. Note that we are not comparing times between the systems, but comparing the algorithms within each system. That means we'll need to run these tests on the Xerxes server (x86_64) and then on the Betty server (AArch64) and compare the timings each run prints.

https://wiki.cdot.senecacollege.ca/wiki/SPO600_Servers

We had several algorithms, so I'll point out which results come from the naive version and the table lookup version:

On Betty (times in milliseconds):

5000000 size sound file, new array, simple: 92.00
5000000 size sound file, original array, simple: 91.00                  *** NAIVE VERSION ***
5000000 size sound file, new array, table: 56.00
5000000 size sound file, original array, table: 56.00                     *** TABLE LOOKUP ***
5000000 size sound file, new array, int hack: 41.00
5000000 size sound file, original array, int hack: 40.00
5000000 size sound file, new array, inline assembly: 0.00
5000000 size sound file, original array, inline assembly: 4.00
 

As we can see on the AArch64 system, the table lookup version took 56 ms compared to 91 ms for the naive version. That is about 38% less time, or roughly 1.6 times as fast; put the other way, the naive version takes about 62% longer than the table lookup version.
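Percentages like these can be read two ways, so it helps to spell out the arithmetic. A speedup ratio is the old time divided by the new time, and the time saved is the difference expressed as a share of the old time. A minimal sketch (the function names are hypothetical):

```c
/* How many times as fast the improved version is: old time / new time. */
double speedup(double baseline_ms, double improved_ms) {
    return baseline_ms / improved_ms;
}

/* Percentage of the baseline time that the improved version saves. */
double percent_time_saved(double baseline_ms, double improved_ms) {
    return (baseline_ms - improved_ms) / baseline_ms * 100.0;
}
```

For the Betty results, speedup(91.0, 56.0) is about 1.63 and percent_time_saved(91.0, 56.0) is about 38.5; the inline assembly timings give speedup(56.0, 4.0) = 14.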

On Xerxes:

I tried copying the code over exactly the same way, but I got segmentation faults (core dumped) in adjustVolume3 and adjustVolume4. Everything else appeared to work fine except for the inline assembly in adjustVolume8, because that code is meant for AArch64 only. I studied the code for a while, then asked for some advice and was pointed to this GitHub guide for debugging with gdb:

https://github.com/SENECA-DSA555/dsa555-w17/wiki/gdb-guide

I ran the code through the debugger and got this:

(gdb) run
Starting program: /home/wawilliams/lab7
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.24-4.fc25.x86_64
5000000 size sound file, new array, simple: 30.00
5000000 size sound file, original array, simple: 23.00

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400f3d in adjustVolume4 (original=0x7ffff66fd010, newSound=0x7ffff4a5f010, factor=0.5, max=5000000) at lab7_Final.c:148
warning: Source file is more recent than executable.
148                     table[i] = (int16_t)i * factor;