Panofi Blog

Sunday, 9 December 2012

Eloquent JavaScript: A Modern Introduction to Programming by Marijn Haverbeke

A wonderful philosophical approach to programming by Marijn Haverbeke.

http://eloquentjavascript.net/chapter6.html


% The Book of Programming

%% The Two Aspects
Below the surface of the machine, the program moves. Without effort,
it expands and contracts. In great harmony, electrons scatter and
regroup. The forms on the monitor are but ripples on the water. The
essence stays invisibly below.

When the creators built the machine, they put in the processor and the
memory. From these arise the two aspects of the program.

The aspect of the processor is the active substance. It is called
Control. The aspect of the memory is the passive substance. It is
called Data.

Data is made of merely bits, yet it takes complex forms. Control
consists only of simple instructions, yet it performs difficult
tasks. From the small and trivial, the large and complex arise.

The program source is Data. Control arises from it. The Control
proceeds to create new Data. The one is born from the other, the
other is useless without the one. This is the harmonious cycle of
Data and Control.

Of themselves, Data and Control are without structure. The programmers
of old moulded their programs out of this raw substance. Over time,
the amorphous Data has crystallised into data types, and the chaotic
Control was restricted into control structures and functions.

%% Short Sayings

When a student asked Fu-Tzu about the nature of the cycle of Data and
Control, Fu-Tzu replied 'Think of a compiler, compiling itself.'

A student asked 'The programmers of old used only simple machines and
no programming languages, yet they made beautiful programs. Why do we
use complicated machines and programming languages?'. Fu-Tzu replied
'The builders of old used only sticks and clay, yet they made
beautiful huts.'

A hermit spent ten years writing a program. 'My program can compute
the motion of the stars on a 286-computer running MS DOS', he proudly
announced. 'Nobody owns a 286-computer or uses MS DOS anymore.',
Fu-Tzu responded.

Fu-Tzu had written a small program that was full of global state and
dubious shortcuts. Reading it, a student asked 'You warned us against
these techniques, yet I find them in your program. How can this be?'
Fu-Tzu said 'There is no need to fetch a water hose when the house is
not on fire.'{This is not to be read as an encouragement of sloppy
programming, but rather as a warning against neurotic adherence to
rules of thumb.}

%% Wisdom

A student was complaining about digital numbers. 'When I take the root
of two and then square it again, the result is already inaccurate!'.
Overhearing him, Fu-Tzu laughed. 'Here is a sheet of paper. Write down
the precise value of the square root of two for me.'

Fu-Tzu said 'When you cut against the grain of the wood, much strength
is needed. When you program against the grain of a problem, much code
is needed.'

Tzu-li and Tzu-ssu were boasting about the size of their latest
programs. 'Two-hundred thousand lines', said Tzu-li, 'not counting
comments!'. 'Psah', said Tzu-ssu, 'mine is almost a *million* lines
already.' Fu-Tzu said 'My best program has five hundred lines.'
Hearing this, Tzu-li and Tzu-ssu were enlightened.

A student had been sitting motionless behind his computer for hours,
frowning darkly. He was trying to write a beautiful solution to a
difficult problem, but could not find the right approach. Fu-Tzu hit
him on the back of his head and shouted '*Type something!*' The student
started writing an ugly solution. After he had finished, he suddenly
understood the beautiful solution.

%% Progression

A beginning programmer writes his programs like an ant builds her
hill, one piece at a time, without thought for the bigger structure.
His programs will be like loose sand. They may stand for a while, but
growing too big they fall apart{Referring to the danger of internal
inconsistency and duplicated structure in unorganised code.}.

Realising this problem, the programmer will start to spend a lot of
time thinking about structure. His programs will be rigidly
structured, like rock sculptures. They are solid, but when they must
change, violence must be done to them{Referring to the fact that
structure tends to put restrictions on the evolution of a program.}.

The master programmer knows when to apply structure and when to leave
things in their simple form. His programs are like clay, solid yet
malleable.

%% Language

When a programming language is created, it is given syntax and
semantics. The syntax describes the form of the program, the semantics
describe the function. When the syntax is beautiful and the semantics
are clear, the program will be like a stately tree. When the syntax is
clumsy and the semantics confusing, the program will be like a bramble
bush.

Tzu-ssu was asked to write a program in the language called Java,
which takes a very primitive approach to functions. Every morning, as
he sat down in front of his computer, he started complaining. All day
he cursed, blaming the language for all that went wrong. Fu-Tzu
listened for a while, and then reproached him, saying 'Every language
has its own way. Follow its form, do not try to program as if you
were using another language.'

Saturday, 24 November 2012

City University London

An empirical study of the performance benefits of spatial clustering analysis in the application layer of an insurance risk management platform.

Panos Bairaktaris

January 2012

Submitted in partial fulfillment of the requirements for the degree of MSc in Geographic Information Systems

Supervisor Dr Jo Wood

Abstract
Decision-making within the insurance sector is strongly influenced by data containing a spatial dimension. When understanding and managing the risks arising from natural catastrophes, spatial analysis often forms a core part of the overall analysis of both the insured asset (exposure) and the natural hazards. The design problem facing such spatial decision-making tools is how to perform very fast spatial analysis across very large and complex datasets over wide spatial extents. In the case of exposure data, a large global insurer might have a portfolio containing over a million policies spread across the globe. This project researches ways to visualise and interrogate these large datasets in a web mapping, corporate decision-making analytical platform, using modern GIS tools and the latest Microsoft .NET framework. The project will investigate the performance benefits of a spatial clustering process in the application layer rather than in the presentation layer. By identifying the underlying technical challenges, the proposed solution incorporates the latest technological advances and APIs in an orchestrated and well-defined manner. Aside from increasing the analytical capabilities of the hosting insurance platform, the proposed solution also provides key extensibility points on which further development can build, with a view to providing a spatial analysis clustering service layer that performs well under demanding usage scenarios, as well as being flexible for future extensions.

The document:

https://docs.google.com/open?id=0B38ndwEg0mv7SzNjcGpjRGhyWXc

Friday, 28 May 2010

Solution to Joe Celko's SQL puzzles & answers - PUZZLE 9 AVAILABLE SEATS, Page 34

The Problem: You have a restaurant with 1,000 seats. Whenever a waiter puts someone at a seat, he logs it in a table of seats. Likewise, when a guest finishes a meal, you remove the guest’s seat number. You want to write a query to produce a list of the available seats in the restaurant, set up in blocks by their starting and ending seat numbers. Oh yes, the gimmick is that the database resides on a personal digital assistant and not a mainframe computer. As part of the exercise, you must do this with the smallest amount of storage possible. Assume each seat number is an integer. (Extract from Joe Celko's SQL puzzles & answers - PUZZLE 9 AVAILABLE SEATS, Page 34.)

My set-based solution (with no procedural constructs like while loops or cursors) goes like this:

/* Start by creating a new blank database, will name it [Puzzle] */
USE [Puzzle]

/* creating the table, will occupy 1000 char X 1 byte each = 1000 bytes */
   CREATE
     TABLE dbo.Seats
                  ( Seats char(1000) )

/* then insert all seats as empty */
INSERT INTO dbo.Seats VALUES ( REPLICATE('0',1000))

GO

/* Then we need a way of setting each seat as free/occupied */
CREATE PROCEDURE [dbo].[Set_Seat]
   @seatNum smallint,
   @occupied bit
AS
BEGIN
SET NOCOUNT ON;

/* This statement splits the char string in two parts, sets the character between them to the occupied value, and concatenates them all together again. Seat numbers outside 1..1000 leave the string unchanged (seat 0 would otherwise pass a negative length to LEFT). */
UPDATE dbo.Seats
     SET Seats = CASE
                            WHEN @seatNum BETWEEN 1 AND 1000
                                THEN LEFT(Seats,@seatNum-1)
                                  + CAST(@occupied as char(1))
                                  + RIGHT(Seats,(LEN(Seats)
                                  - @seatNum))
                            ELSE Seats
                        END
END
GO

/* Then use this stored procedure to update the table, i.e. the following statements will mark seats no 1, 5 and 10 as occupied. The GO above ends the procedure's batch, so these calls are not swallowed into the procedure body. */
EXECUTE [Puzzle].[dbo].[Set_Seat] @seatNum = 1, @occupied = 1
EXECUTE [Puzzle].[dbo].[Set_Seat] @seatNum = 5, @occupied = 1
EXECUTE [Puzzle].[dbo].[Set_Seat] @seatNum = 10, @occupied = 1
GO

/* after executing this the table will look like this */

Seats

10001000010000000....

/* Then we need a way to see which seats are free and which are occupied.
To achieve this we need a helper number table, which is obtained from the sys.objects table of the [Puzzle] db. Because the db doesn't contain many user-created objects, we cross join sys.objects to itself to obtain a Cartesian product. This will give us at least 1000 rows containing system information. By applying the function ROW_NUMBER() we can get the number of numbered rows we want. */

/* the free seats */
CREATE VIEW [dbo].[vFreeSeats]
AS
SELECT a.Seat
  FROM ( SELECT n as Seat
                   FROM ( SELECT row_number() over (order by n.object_id) as ID
                                    FROM sys.objects n
                                       CROSS JOIN sys.objects v
                                   ) D ( n )
               WHERE n <= 1000
               ) a
WHERE SUBSTRING((SELECT Seats FROM dbo.Seats),a.Seat,1) = '0'
GO

/* and the occupied */
CREATE VIEW [dbo].[vOccupiedSeats]
AS
SELECT a.Seat
  FROM ( SELECT n as Seat
                   FROM ( SELECT row_number() over (order by n.object_id ) as ID
                                    FROM sys.objects n
                                        CROSS JOIN sys.objects v
                                 ) D ( n )
                WHERE n <= 1000
               ) a
WHERE SUBSTRING((SELECT Seats FROM dbo.Seats),a.Seat,1) = '1'
GO

/* Obtaining a plan of the restaurant using the same pattern with row_number() to get a numbering from 1 to 1000, and then LEFT JOIN to dbo.vOccupiedSeats and dbo.vFreeSeats */
CREATE VIEW [dbo].[vAllSeats]
AS
SELECT s.Seat,
                o.Seat as Occupied,
                f.Seat as Free
       FROM ( SELECT n as Seat
                        FROM
                         ( SELECT row_number() over (order by n.object_id) as ID
                             FROM sys.objects n
                                CROSS JOIN sys.objects v
                          ) D ( n )
      WHERE n <= 1000 ) s

       LEFT JOIN dbo.vOccupiedSeats o
          ON o.Seat = s.Seat
           LEFT JOIN dbo.vFreeSeats f
               ON f.Seat = s.Seat
GO

/* Showing the start and end number of continuous available blocks of seats, taking care of special cases as well. */
CREATE VIEW [dbo].[vFirstLastSeat]
AS
SELECT f.[First] as FirstInBlock,
                l.[Last] as LastInBlock
  FROM (  /* get the first seat number of
                    each available continuous block of seats;
                    numbering by seat number keeps the pairing
                    with the last seats deterministic */
               SELECT row_number() over (order by [First]) as ID,
                             [First]
                   FROM ( /* use this in case the first free seat is no 1 */
                                 SELECT 1 as [First]
                                   FROM dbo.vFreeSeats f
                                 WHERE f.Seat = 1
                                  UNION ALL
                                 /* this will return the first seat of each
                                    continuous block, except when
                                     this is seat no 1 */
                                 SELECT f.Seat as [First]
                                   FROM dbo.vFreeSeats f
                                     JOIN dbo.vOccupiedSeats o
                                        ON f.Seat = o.Seat + 1 ) f
                      ) f

     /* get the last seat number of each available continuous block of seats */
     JOIN ( SELECT row_number() over ( order by l.Seat ) as ID,
                                l.Seat as [Last]
                 FROM
                   ( /* this will return the last seat of each continuous
                         block, except when this is seat no 1000 */
                      SELECT f.Seat
                        FROM dbo.vFreeSeats f
                            JOIN dbo.vOccupiedSeats o
                               ON f.Seat = o.Seat - 1
                       UNION ALL
                       /* use this in case the last free seat is no 1000 */
                       SELECT 1000 as [Last]
                          FROM dbo.vFreeSeats
                        WHERE Seat = 1000 ) l
                  ) l

           ON f.ID = l.ID

/* The output of the previous statement */

FirstInBlock	LastInBlock
2	4
6	9
11	1000
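
As a cross-check of the idea outside the database, here is a minimal Python sketch (not part of the original T-SQL solution). It keeps the same one-character-per-seat string and the same seat numbering from 1. The names set_seat and free_blocks are my own, and finding the blocks with a regular expression is a swapped-in technique that stands in for the joins between the free and occupied views.

```python
import re

SEAT_COUNT = 1000  # restaurant size from the puzzle statement


def set_seat(seats: str, seat_num: int, occupied: bool) -> str:
    """Return a copy of the seat string with one seat marked, like Set_Seat."""
    if not 1 <= seat_num <= len(seats):
        return seats  # out-of-range seat numbers leave the string unchanged
    flag = "1" if occupied else "0"
    # Left part + new flag + right part, the same split as in the UPDATE.
    return seats[:seat_num - 1] + flag + seats[seat_num:]


def free_blocks(seats: str) -> list[tuple[int, int]]:
    """Return (first, last) seat numbers for every run of free ('0') seats."""
    # match.start() is 0-based, so +1 gives the first seat number;
    # match.end() is exclusive and 0-based, so it already equals the last seat.
    return [(m.start() + 1, m.end()) for m in re.finditer(r"0+", seats)]


seats = "0" * SEAT_COUNT
for n in (1, 5, 10):
    seats = set_seat(seats, n, True)

print(free_blocks(seats))  # [(2, 4), (6, 9), (11, 1000)]
```

With seats 1, 5 and 10 occupied, the sketch reports the same three blocks as the view output above.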

/* Finally, if we didn't want to use the sys.objects table, we could have created a @tempTable with 10 null rows of bit datatype, and cross joined it with itself three times (10 x 10 x 10) to give a helper number table of 1000 rows. */
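
The arithmetic behind that alternative helper table can be sketched in a few lines of Python. This is only an illustration of the cross join idea, not T-SQL: the range of ten values stands in for the hypothetical 10-row table, and enumerate plays the role of ROW_NUMBER().

```python
from itertools import product

ten_rows = range(10)  # stands in for the hypothetical 10-row helper table

# Cross joining the 10 rows with themselves three ways gives 10 * 10 * 10 rows.
rows = list(product(ten_rows, ten_rows, ten_rows))

# Numbering the combined rows from 1 yields the seat numbers 1..1000.
seat_numbers = [number for number, _ in enumerate(rows, start=1)]

print(len(seat_numbers), seat_numbers[0], seat_numbers[-1])  # 1000 1 1000
```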

Thursday, 20 May 2010

Parallel Environments within Divine and Agnostic Algorithms of Creation

Extract from:
Sentient Future Competition European Workshop on Sensor Networks 2006 (EWSN '06) - Highly commended entries :
http://www.embedded-wisents.org/competition/pdf/bairaktaris.pdf

"Model and create a new environment using the Earth libraries we provide - minus the historical and contemporary ethical and philosophical classes. With the use of evolution algorithms, and by fast forwarding them for time and space complexity aspects, simulate genetical transformations, aiming the creation of intelligence within the system. You are expected to experiment with the parameters until the creation of viable and self contained entities (who will have the power to interact intellectually between themselves and their environment) are created. From the moment of creation and afterwards you should observe the behaviour of those entities but you are not allowed to interfere.

Your objective is to be God, and you’ll achieve this if the entities created develop analytical and philosophical qualities in the amount that they will start wondering from where and for what they’re created for. Full marks will be granted if a form of technological advance is achieved by the entities. If this happens, as a bonus you are allowed to include in your project the full version of Embedded WiSeNts libraries – such this will ease the way of your entities to upgrade themselves the best way is possible in their future..."

Sunday, 16 May 2010

SQL POETRY : to @exist or not @exist - that is the query

start:
BEGIN TRAN
DECLARE @var INT, @exist VARCHAR
SET @var = 0

INSERT
    INTO Who
SELECT [to], [thine], [own], [self], [be], [true]
FROM Life
      JOIN Earth
        ON [lifetime] BETWEEN [birth] AND [death]
WHERE [time] = 'Now'
     AND [name] = 'You'

/* then */
IF (SELECT COUNT(*) FROM Who) > @var
BEGIN
     SET @exist = dbo.YourOwnFunction(@@IDENTITY)
END
ELSE
BEGIN
    SET @exist = null
    ROLLBACK TRANSACTION
    GOTO start
END

COMMIT

Thursday, 6 May 2010

Practical C - Mobile GIS

C.1 Limitations of this tool
ArcMap is able to perform analytical tasks on large volumes of data, but compared to the latest developments in the mobile GI field, using it to create a mobile directory seems like overkill.

Mobility is restricted, as it operates on static datasets whilst requiring a client with significant processing capacity and an external server offering interaction and presentation GI services.

Using reverse geocoding, the tool is able to perform spatial queries based on post-code information. However, with each post-code identifying between 1 and 100 unique addresses, the accuracy of the spatial query depends on the type of location clicked on the map (street numbers on larger roads usually have longer distances between them). In addition, the search tolerance setting on the ArcMap document, used within the VB script(1), will heavily influence the post-code returned.

C.2 Is Google Local thorough in identifying available services? How about searching for information other than 'services'? What types of limitations exist in querying spatial information in general?

Google Local is a location-based service, accessible via a thin-client architecture. It allows querying by specifying various spatial search criteria and is supported by services(2) responsible for converting the textual descriptions to spatial coordinates. However, with queries containing evaluative criteria such as ‘which is the best restaurants in Leytonstone’, the results are spread all across London; the user needs to pan the map or filter by distance to focus on the preferred area. When the term ‘best’ was replaced with ‘cheapest’, irrelevant results regarding property maintenance services were returned. More ambiguous qualitative queries like ‘the most expensive street in London’ did not return relevant results. Extracting semantically correct meaning from verbose queries is a complex process that spans many different research fields, including linguistics and AI, so such inconsistencies in the results should be within the user’s expectations. But it is apparent that the work invested in such a tool has yielded an extremely efficient service with lots of options, which can only be further improved.

C.3 Would a search of all Internet resources provide further information? What alternatives are there to using a postcode for spatial queries?

Extending the search using a search engine yields many results with potentially exact and relevant information; however, there are fundamental differences between the algorithms behind search engines and those behind GI directory services, which suffer from the lack of a universally agreed geography type. An attempt to tackle this problem is provided by the GeoXwalk(3) project: geometric computations introduce a mechanism to resolve spatially complex queries(4) describing relationships between features with complex geographic boundaries. Further development has been done to enable advanced querying(5) by “explicitly georeference implicitly georeferenced material” (geoXwalk Phase III - Final Report, 2004).

C.4 Availability in other regions and countries

According to Google(6), there is an ongoing effort to make this service available everywhere in the world, but this is not the case yet. A workaround exists, though: creating a public My Map that lists business information, which can then be searched using Google Maps(7).

C.5 References

1. ' get the postcode
Dim postcode As String
postcode = getPostcode(pMxDoc.SearchTolerance, pPoint, pMxDoc.FocusMap)
2. Such as gateway location services, directory services, geocode services, route determination services etc.
3. Reid, James & Medyckyj-Scott, David, 2004. GeoXwalk Phase III – Final Report. Available at: http://edina.ac.uk/projects/geoxwalk/documents/geoXwalk%20Phase%20III%20Final%20report.doc [Accessed April 12, 2010].
4. i.e. ‘which rivers are near Banbury? ’ (geoXwalk Phase III – Final Report, 2004)
5. i.e. ‘find me all documents about Gaelic songs that do not reference the Western Isles or, find me images of towns along the river Tweed’ (geoXwalk Phase III – Final Report, 2004)
6. Maps features available in your country – Maps Help. Available at: http://maps.google.com/support/bin/answer.py?hl=en&answer=16634 [Accessed April 12, 2010].
7. http://www.google.com/support/forum/p/maps/thread?tid=7f1a2dcf8c70f97c&hl=en

Practical B - Internet GIS

B.1 Functionality, strengths and weaknesses

A core element of Web 2.0, mashups enable information sharing and interoperability by combining data from different external sources. With the adoption of web standards like RSS, online tools now allow the user to create and publish mashups with relative ease. Two such tools are Mapbuilder (a tool for overlaying pin-points and tags over a map) and Yahoo Pipes, which offers more sophisticated integration of various data sources. Both use cartographic and geographic data to produce a map; the user can then choose to add information related to location.

The examples produced are aimed at consumers. The Mapbuilder example shows the locations of local sites on the island of Ikaria, Greece. The audience of the Yahoo Pipes mashup could be someone interested in world news with the option of accessing location-related photos; this is achieved by overlaying the Reuters News RSS feed on a map using each story's location information, and combining it with the Flickr Geo RSS service to produce a link to a photo relevant to the location. There is a strong chance that the latest geo-tagged photo with the same location as the story will have relevant content, though there is no guarantee of this.

B.2 Yahoo pipes usability and mashup implementation(4)

Yahoo Pipes is a JavaScript application with a visual interface that includes various tools referred to as modules. Each module can accept a data source as input, manipulate it, and pipe its output to a different module. The process follows the imperative programming paradigm: inputs and outputs can be processed using modules like the Loop operator (iterating over the result feed of its input), String functions, URLBuilder etc. The final feed can be georeferenced by using the Location Extractor module; given a string that contains a place name, it attaches GeoRSS standard spatial coordinates to it. Inherent ambiguity limits its accuracy, since different places with the same name do exist. The output stream of data can be exported as a JSON or RSS feed. Of particular interest is the YQL module: using SQL-like syntax, the user can query and filter the content of an RSS feed. This feature is used in the example pipe. Finally, the RSS-Item Builder restructures the modified data sources as an RSS feed, which can then be piped to the output.
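
The module-to-module flow described above can be imitated with a few chained Python generators. This is only a rough sketch under stated assumptions and not how Yahoo Pipes is implemented: the feed items, the tiny gazetteer and the function names (fetch, filter_located, add_coordinates) are all hypothetical, and the coordinates are approximate.

```python
# Hypothetical feed items; a real pipe would fetch these from an RSS source.
FEED = [
    {"title": "Floods hit Athens", "place": "Athens"},
    {"title": "Markets rally", "place": ""},
    {"title": "Storm nears London", "place": "London"},
]

# Hypothetical gazetteer standing in for a place-name lookup. It maps one name
# to one location, which is exactly where same-name ambiguity would creep in.
GAZETTEER = {"Athens": (37.98, 23.73), "London": (51.51, -0.13)}


def fetch(items):
    """Source module: emit the feed items one by one."""
    yield from items


def filter_located(items):
    """Filter module: keep only items whose place name can be resolved."""
    return (item for item in items if item["place"] in GAZETTEER)


def add_coordinates(items):
    """Georeferencing module: attach latitude and longitude to each item."""
    for item in items:
        lat, lon = GAZETTEER[item["place"]]
        yield {**item, "lat": lat, "lon": lon}


# Wiring the modules together: each one consumes the previous one's output.
result = list(add_coordinates(filter_located(fetch(FEED))))

for item in result:
    print(item["title"], item["lat"], item["lon"])
```

Because each stage only sees the stream handed to it, stages can be reordered or swapped without touching the others, which is the appeal of the pipe metaphor.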

B.3 Usability, flexibility and extensibility

Both engines are easy to use for simple mashups, though the extended capabilities of Yahoo Pipes can challenge users with little or no programming experience. Additionally, the Yahoo Pipes user interface is computationally demanding; integrating data from different web services on the client machine proved a difficult task without a fast and reliable internet connection, with errors related to connection refusal. According to Yahoo, the data flow engine is being redesigned around the YQL functionality(3). Finally, the Yahoo Pipes Badge allows the map output to be embedded in any web page by providing a widget with a GUID; an excellent example of how GI mashups can add value to a website free of cost.

B.4 References

1. Map Builder url: http://www.mapbuilder.net/users/panoramix95/83622
2. Yahoo Pipes url: http://pipes.yahoo.com/gita_abhp626/reuter_news_and_flicker_photos
3. Pipes Blog » Blog Archive » Connection refused and other Pipes issues. Available at: http://blog.pipes.yahoo.net/2009/12/10/connection-refused-and-other-pipes-issues/
[Accessed April 7, 2010]
4. Pipe implementation screen shot

Practical A - Geodesy and positioning

A.1 The shape and the route

Regent’s Park’s Inner Circle made a good starting point for exploring routes with a recognisable pattern, suitable for track-logging. The route was planned using the Google Maps service; its vector-based maps are simple, without excessive visual variables creating clutter, which facilitates pattern recognition. The resulting trail, when projected on the map, can be seen as a stickman with arms extending left and right in a throwing motion and legs positioned in a manner that suggests a throw-assisting movement. With a relative quantum leap of imagination there is a scarf tied around the neck as well. In hindsight, producing this pattern was an ambitious aim; the track was traced on foot, and took a considerable amount of time. Unanticipated closed paths were encountered around the ‘bucket’ area and are responsible for the ‘scarf’ detail. In a perfect world, the long neck would have been much shorter and the circle-head could have contained additional detail.

A.2 Accuracy, error and appropriateness of the map

The GPS track was collected uninterrupted during one afternoon; during data collection the sky was clear with good visibility, and the GPS was triangulating consistently using 4 satellites. The only time the signal was lost was due to the narrowness of the street and big trees obscuring the sky (Park Crescent Mews str. with Marylebone Rd). In this case the GPS lost visibility of all satellites, but the track path was not affected, as the distance covered under such conditions was very short.

It is fairly obvious that the overall match between the GPS data and the backdrop map is quite good, which shows that the signal was strong enough to track the route with high accuracy. There aren’t any particular mismatches; where a double path is plotted, it is due to the paths followed to draw the stickman’s hands and legs, which were backtracked and thus track-logged twice. Occasionally, people and traffic lights necessitated a diversion from a straight route.

The GPS track was collected in the WGS84(1) geodetic system. To project it onto a 2D coordinate system, the track was transferred to ArcView via the GPSi interface and transformed using the OS British Grid(2) by selecting the OSGB_1936_To_WGS_1984_Petroleum(3) transformation, which allows for a deviation(4) of up to 4 metres; there is therefore certainly some degree of error, since no exact transformation between two geodetic coordinate systems exists(5). During the transformation process the message ‘The output resolution is larger than the input feature class resolution’ was produced, because the output of the geoprocessing tools in ArcGIS 9.2 is 53-bit(7).
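
To make the Helmert transformation mentioned in the references more concrete, here is a small Python sketch of the usual seven-parameter, small-angle form applied to Cartesian (X, Y, Z) coordinates. It is not what ArcGIS runs internally. The parameter values are an assumption on my part: they are the WGS84 to OSGB36 figures as I recall them from the Ordnance Survey guide, so check them against that guide before relying on them. The sample point is hypothetical, and the conversions between latitude/longitude and Cartesian coordinates, as well as the grid projection, are left out.

```python
import math


def helmert(x, y, z, tx, ty, tz, s_ppm, rx_sec, ry_sec, rz_sec):
    """Seven-parameter Helmert transformation, small-angle approximation.

    Translations are in metres, scale in parts per million and rotations
    in arc-seconds; inputs and outputs are Cartesian coordinates in metres.
    """
    s = s_ppm * 1e-6
    rx, ry, rz = (math.radians(a / 3600.0) for a in (rx_sec, ry_sec, rz_sec))
    x2 = tx + (1 + s) * x - rz * y + ry * z
    y2 = ty + rz * x + (1 + s) * y - rx * z
    z2 = tz - ry * x + rx * y + (1 + s) * z
    return x2, y2, z2


# Assumed WGS84 -> OSGB36 parameters (recalled from the OS guide, unverified).
WGS84_TO_OSGB36 = dict(
    tx=-446.448, ty=125.157, tz=-542.060,
    s_ppm=20.4894,
    rx_sec=-0.1502, ry_sec=-0.2470, rz_sec=-0.8421,
)

# Hypothetical Cartesian point, for illustration only.
point = (3980000.0, -10000.0, 4970000.0)
shifted = helmert(*point, **WGS84_TO_OSGB36)

# The size of the shift shows why the datum cannot simply be ignored.
print(round(math.dist(point, shifted), 1), "metres")
```

A single set of seven parameters cannot absorb the local distortions of a national datum, which is consistent with the deviation of a few metres quoted above.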

Finally, to overlay the track, two backdrop maps were used, one raster and one vector-based. The scale of the raster map is 1:25,000, allowing the streets to be distinctly visible; both maps were downloaded from Edina. The track log and the two maps were overlaid using ArcMap. Note that it was not necessary to create a new Geodatabase; all layers were created by simply opening the Mastermap gz file.

A.3 References

1. A 3D geodetic coordinate system, that comprises a standard coordinate frame for the earth, with a spheroid surface. (http://en.wikipedia.org/wiki/World_Geodetic_System)
2. A system of geographic grid references commonly used in Great Britain.
http://en.wikipedia.org/wiki/British_national_grid_reference_system
3. This transformation uses the parameters recommended by Ordnance Survey for a Helmert(6) transformation; http://geometrybag.wordpress.com/2006/05/03/osgb-transformations-inside-arcgis/
4. A statistic value that represents the amount of disagreement among points with known longitude and latitude in a coordinate system, when compared with a different coordinate system. http://www.ordnancesurvey.co.uk/oswebsite/gps/information/coordinatesystemsinfo/guidecontents/guide6.html
5. Welcome to GPS Network. Ordnance Survey – Great Britain's national mapping agency. www.ordnancesurvey.co.uk/oswebsite/gps/information/coordinatesystemsinfo/guidecontents/guide6.html
[Accessed March 31, 2010].
6. Helmert transformation: The rotation and translation of a network of points relative to the Cartesian axes, while leaving the shape of the figure unchanged;
7. Technical Articles - ESRI Support. Available at: http://support.esri.com/index.

Monday, 28 September 2009

3.11 References and Resources

3.10 Information Architectures

As technology advances and Web technologies become more and more ubiquitous, there is an ever-growing need to deal with the vast amounts of data collected. In order to efficiently collect, store and extract information from these data, Information Architecture disciplines have developed and introduced several techniques that facilitate this. One of the most widely used and supported technologies is the Relational Database Management System, or RDBMS.

RDBMS implement special data structures represented as tables; the relationships between those tables and the data they contain are enforced using special indexes commonly known as primary and foreign keys. The database schema supporting an IT system is one of its core subsystems; a good DB design helps towards efficient response times to user requests. Modern RDBMS incorporate subsystems and mechanisms that help towards availability, reliability and scalability (Connolly & Begg, Addison Wesley, 2005).

In contrast with Object Oriented architectures, which use custom objects and data types, RDBMS are based on user-created entities as tables with attributes as columns, and the data contained is represented with built-in data types such as integers, decimals and varchar. Such data types can be very flexible in the storage they demand (SQL Server 2008 Books Online).

From my experience, the choice of data types when designing a DB schema is crucial. It is tempting to always opt for a bigger data type (e.g. int instead of smallint or tinyint) even when you know that the column will never store values needing more than 1 or 2 bytes. This has pros and cons: it eases future scaling of the system if the requirements change, but it will not raise an error if the client application tries to store values that are inconsistent with the behaviour of the entity the table represents. Data integrity is then violated without informing the user, which makes erroneous results more likely.

3.9. Client side programming

Working as professional software developer includes a lot of programming; one could assume that I would be bored of developing such a basic client application. But my work mainly involves middle-tier and back-end database programming so I took the time to develop something that uses some of the techniques described on this blog with a view to enhance my understanding of the DOM model and other front-end technologies.

I was indented to use the meta information of BBC pages with regards of which urls to display, but I’ve stumbled across the same-origin-policy which prevents JavaScript to access pages on different sites. So the choice was to hold the urls within the application, which was developed in two faces.

- Initially the user selection was filtered by using pre-populated select html elements, with onchange() events attached to display the appropriate url on the page. This is still happening, but with a few modifications and additions described below.

- When Session 08 described the way modern search engines work, regarding inverted files etc, I’ve consider it a better challenge for me to try and emulate some of those. I certainly didn’t expect to face so many peculiar challenges imposed by current modern browsers such as:

Using an XML data source for the drop-down lists, and for a basic emulation of the IR techniques, meant dealing with the different ways in which browsers load an XML document. Mozilla and recent versions of IE use the XMLHttpRequest object, whereas older IE versions use the ActiveXObject("Microsoft.XMLHTTP") approach (w3schools).
Using XPath to extract XML element and attribute values greatly simplified traversing the XML documents; however, there were cross-browser issues related to how XPath expressions are evaluated. I therefore implemented cross-browser functionality, based on recommendations by w3schools.
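
The browser difference in loading XML can be sketched as follows. This is a minimal illustration of the feature-detection idea, not the code of the application itself; the function name createRequest and its env parameter are assumptions, added so that the check can also be exercised outside a browser. It is written in older JavaScript syntax on purpose, since it targets older browsers.

```javascript
// Returns a request object for the given environment, or null when
// neither mechanism is available. In a browser, `env` is the window.
function createRequest(env) {
  env = env || window;
  if (typeof env.XMLHttpRequest !== "undefined") {
    // Mozilla, and recent versions of IE.
    return new env.XMLHttpRequest();
  }
  if (typeof env.ActiveXObject !== "undefined") {
    // Older versions of IE.
    return new env.ActiveXObject("Microsoft.XMLHTTP");
  }
  return null;
}

// In a browser (the file name "urls.xml" is hypothetical):
//   var request = createRequest(window);
//   request.open("GET", "urls.xml", true);
//   request.send(null);
```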

More information about the application can be found in the commented code. This is the result:
http://www.student.city.ac.uk/~abhp626/BBCinfo.html

3.8 Information Retrieval

Surely everyone who has used Google’s search engine has at some point been left bewildered by the amount of information even the most obscure query returns. Many times I’ve deliberately searched using made-up words and got back at least some loosely related results, leaving me with the warm feeling that there are other like-minded people out there who wonder about the art of Information Retrieval (IR).

IR is the process of retrieving information related to a user’s requirements. The differences between querying for IR and querying an RDBMS stem from the way the information is organised: in the DB environment, information is structured and related to the underlying business model, and the process is deterministic, so the same query run by different users retrieves the same results. In contrast, because the information is unstructured and exists in many different formats and media, and because relevance is subjective to each user’s perspective, an IR query can return different results and is highly probabilistic.

Techniques used in IR include removing stop words, stemming, and identifying synonyms in order to create document indexes. A widely used type of index is an inverted file, which is a list of terms, each pointing to a list of relevant documents. Additionally, more complex queries can be constructed using Boolean algebra operators like OR and AND (Macfarlane, A., Raper, J. & Dykes, J., Lecture 08: Information Retrieval).
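
An inverted file with Boolean queries can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions: the stop-word list, the three documents and all function names are made up, and stemming and synonym handling are left out.

```javascript
// A tiny, illustrative stop-word list; real systems use far longer ones.
const STOP_WORDS = new Set(["a", "an", "and", "in", "of", "on", "the", "to"]);

// Lower-case the text, split it into words and drop the stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word !== "" && !STOP_WORDS.has(word));
}

// Build the inverted file: a Map from each term to the Set of ids of
// the documents that contain it.
function buildInvertedIndex(documents) {
  const index = new Map();
  for (const [id, text] of Object.entries(documents)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) {
        index.set(term, new Set());
      }
      index.get(term).add(id);
    }
  }
  return index;
}

// Boolean AND: documents that contain both terms.
function searchAnd(index, termA, termB) {
  const docsA = index.get(termA) || new Set();
  const docsB = index.get(termB) || new Set();
  return [...docsA].filter((id) => docsB.has(id)).sort();
}

// Boolean OR: documents that contain at least one of the terms.
function searchOr(index, termA, termB) {
  const docsA = index.get(termA) || new Set();
  const docsB = index.get(termB) || new Set();
  return [...new Set([...docsA, ...docsB])].sort();
}

// Made-up documents, for illustration only.
const documents = {
  doc1: "The weather in London",
  doc2: "London news and sport",
  doc3: "Weather forecast for the weekend",
};
const index = buildInvertedIndex(documents);

console.log(searchAnd(index, "london", "weather")); // [ 'doc1' ]
console.log(searchOr(index, "london", "weather"));  // [ 'doc1', 'doc2', 'doc3' ]
```

Because each term points directly at its documents, a query only touches the lists of the terms it mentions instead of scanning every document.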

The ranking algorithms that modern search engines use to retrieve information efficiently are closely guarded secrets, such as the current form of Google’s PageRank-based ranking. With the penetration of the web into modern life and the need for brand recognition, the importance of search engine rankings has grown stronger and stronger (source: SEMPO Survey). As a consequence a new IT field has arisen, namely Search Engine Optimization (SEO), which aims to implement various ranking improvement methodologies with a view to drawing more visitors to a client’s site.

3.7 Databases

Database Management Systems (DBMS) are software programs that can be considered the foundation on which any organisation using modern computer systems builds. Compared with the file approach used in the early days of computers, they offer considerable advantages in managing information, by allowing simultaneous access to centrally stored and systematically organised data. Additional benefits of using a DBMS include data independence, reduced data redundancy, quick data recovery and the enforcement of security policies (Connolly & Begg, 2005).

Relational databases are the most widely used type of DBMS. In them, information is stored in data structures that can be visualised as two-dimensional tables, each table representing a distinct entity of a model of a system. A good database design which maintains referential and relational integrity among the stored data is fundamental for a successful implementation of an organisation’s system, and helps towards efficient data retrieval and scaling-up. Using Structured Query Language (SQL), queries can be constructed which return data based on filtering criteria, possibly by joining one or more related tables (source: Butterworth, R., Lecture 07: Structuring and querying information stored in databases).

Modern databases (like MS SQL Server 2008) are now spatially enabled and can be used in the field of GIS. Along with the native SQL data types, new spatial types allow geo-coded information to be stored and indexed. Spatial operations can then be performed on it and the results visualised on a map, allowing useful information hidden within the data to be extracted (Longley, P., Goodchild, M., Maguire, D. & Rhind, D., 2005).

As an example, using data describing property locations and property prices, and using post-codes to get longitude-latitude coordinates, a query can be constructed which will display on a map the properties which have been sold for more than a given amount:

SELECT p.PropertyID, p.FullAddress, p.PostCode, MAX(s.SalePrice) AS MaxPrice, c.Latitude, c.Longitude
FROM dbo.Property p
   JOIN dbo.PropertySale s
     ON p.PropertyID = s.PropertyID
   JOIN dbo.Coordinates c
     ON p.PropertyID = c.PropertyID
GROUP BY p.PropertyID, p.FullAddress, p.PostCode, c.Latitude, c.Longitude
HAVING MAX(s.SalePrice) > 220000

PropertyID	FullAddress	PostCode	MaxPrice	Latitude	Longitude
1012	22 Mornington Road	E11 3BE	285000	51.5696	0.0143
1013	42 Abbot's Park Road	E10 6HX	222000	51.5725	-0.0039

3.6 CSS

The success of the Internet led to more visually and semantically complicated web documents. As a consequence the HTML code describing a web page was being cluttered by styling information, and this had an adverse effect on the quality of the code. Additionally, the need to maintain common aesthetics for related web pages led the W3C to create the DOM (Document Object Model) and the CSS (Cascading Style Sheets) standards to encourage better and more efficient web design.

The DOM is a concept that takes advantage of the XML structure of XHTML markup and provides ways to traverse, access, extract and apply properties of the elements included in an HTML page. It treats every HTML document as a tree of hierarchically arranged elements, its nodes, which in turn can have more nodes as children or can contain text (REF). Different web technologies (like JavaScript) implement the DOM in different ways, but the premises are the same.
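
The tree idea can be sketched without a browser. The following is a simplified stand-in built from plain JavaScript objects, not the real DOM API (which exposes nodes through properties such as childNodes and nodeType); the object shape and the function names are assumptions made for illustration.

```javascript
// A hand-built stand-in for the tree a browser would create for
//   <body><h1>Title</h1><p>Some <b>bold</b> text</p></body>
// Element nodes have a tag name and children; text nodes hold text.
const page = {
  tag: "body",
  children: [
    { tag: "h1", children: [{ text: "Title" }] },
    {
      tag: "p",
      children: [
        { text: "Some " },
        { tag: "b", children: [{ text: "bold" }] },
        { text: " text" },
      ],
    },
  ],
};

// Depth-first walk that gathers the text of a node and all its descendants.
function collectText(node) {
  if (node.text !== undefined) {
    return node.text;
  }
  return node.children.map(collectText).join("");
}

// Depth-first walk that lists the tag names in document order.
function listTags(node) {
  if (node.tag === undefined) {
    return [];
  }
  return [node.tag].concat(...node.children.map(listTags));
}

console.log(collectText(page)); // TitleSome bold text
console.log(listTags(page));    // [ 'body', 'h1', 'p', 'b' ]
```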

CSS uses a special syntax to select elements inside an HTML page and apply visual styles to them, instructing the web browser on how to display those elements. It helps separate style from content by allowing a single CSS file to be used by multiple web pages on a site, so that the look of all of them can be changed at the same time. This additionally benefits network traffic, as the browser caches the files and doesn’t need to download them every time. (REF)

Not all web browsers interpret CSS rules the same way. Older versions of Internet Explorer have very limited support, which results in scrambled page styling. Web developers have found and implemented workarounds to bypass browser-specific limitations; however, this sometimes leads to cryptic and unintelligible CSS code, violating the principle of separating style from content.

On my personal web space I’ve used CSS to provide a consistent look and feel, and there are examples of CSS inheritance and override via the class and id attributes.

3.5 XML

XML... is so simple and elegant as a concept that one now wonders how it is possible that it was not invented years and years ago. In fact, it is very hard to think of any computer data that cannot be described by some XML structure. The W3C created XML to allow web documents to be easily interpreted by humans and computers alike, and it is now widely used for the representation of arbitrary data structures, e.g. in web services, as well as being the underlying model of several data formats for desktop applications (for example the docx format of Microsoft Office 2007, or the formats of Sun's Open Office).

XML is not a programming language per se. It is a descendant of SGML, with user-created mark-up and tags which give documents the semantics they need to be successfully interpreted by the applications that use them, thus greatly enhancing the interoperability of different computer systems. To this end, Document Type Definitions (DTD) and XSD schemas define the structures on which XML documents are based and against which they are validated, with the latter providing additional support for data definition and datatypes.

On the web front, XHTML, a reformulation of HTML in XML, added well-formedness and case-sensitivity restrictions. As a consequence, XHTML documents can be processed using standard XML tools like XPath (used to traverse the logical structure of an XML document in order to query and extract the encapsulated data), as well as XSLT, the XSL transformation language, which allows the transformation of an XML document into another.

Finally, modern databases (like MS SQL Server 2005 or Oracle) have incorporated XML technology by supporting native XML datatypes and integrated versions of tools like XQuery, which allows access to and navigation of XML documents based on XPath 2.0.

3.4 Images and Graphics

Ever since the home computers of the early 80s, there has been one decisive factor when computers were compared: which one has the best graphics. Even today, the need for ever better representation and manipulation of graphics on our computer screens is one of the driving forces behind computer technology (England, N., Graphics hardware, Computer Graphics and Applications).

In computer applications graphics are represented digitally using vector or raster formats. The vector format uses points in space which may be connected with lines. We use vectors to represent graphics which are scalable and have well-defined boundaries. Graphic and 3D design artists use software that handles complex vectors to design their artefacts, and GIS applications use vectors to represent discrete objects like rivers, or to create different layers of information overlaid on raster maps.

On the other hand, the raster format can represent heavily detailed images. Imagine a grid, where each cell (some may even call it a pixel) contains a binary value with information regarding the colour of the cell. A raster file is a series of such values, with an initial header specifying the grid's dimensions.
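
The grid-plus-header idea can be sketched as follows. This is a minimal in-memory illustration, not a real file format: the function names are assumptions, and each cell holds a single byte.

```javascript
// A raster held the way the text describes it: a header with the grid's
// dimensions, followed by one value per cell, stored row by row.
function createRaster(width, height) {
  return { width, height, cells: new Uint8Array(width * height) };
}

// Cells are stored row by row, so cell (x, y) lives at y * width + x.
function cellIndex(raster, x, y) {
  if (x < 0 || x >= raster.width || y < 0 || y >= raster.height) {
    throw new RangeError("Cell (" + x + ", " + y + ") is outside the grid");
  }
  return y * raster.width + x;
}

function setCell(raster, x, y, value) {
  raster.cells[cellIndex(raster, x, y)] = value;
}

function getCell(raster, x, y) {
  return raster.cells[cellIndex(raster, x, y)];
}

const raster = createRaster(4, 3); // 4 columns, 3 rows, all cells 0
setCell(raster, 2, 1, 255);
console.log(getCell(raster, 2, 1));     // 255
console.log(raster.cells.indexOf(255)); // 6, i.e. 1 * 4 + 2
```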

Compression techniques have evolved to help the distribution of large raster files over networks. Lossless formats like the 8-bit GIF use a colour index to hold pixel information and to recreate the image on the screen. Web designers can create complex backgrounds using small GIFs by programmatically repeating them on the web page. The lossy 24-bit JPEG format produces smaller file sizes by eliminating bits of information based on sophisticated algorithms, and is widely used to store complex imagery such as photographs.
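
The indexing idea behind 8-bit formats can be sketched like this: each pixel stores a small index into a colour table instead of a full 24-bit colour. The palette and pixel values below are made up, and the further compression that GIF applies to the stream of indexes is not shown.

```javascript
// The colour table: each entry is a 24-bit colour as [red, green, blue].
const palette = [
  [255, 255, 255], // 0: white
  [0, 0, 0],       // 1: black
  [255, 0, 0],     // 2: red
];

// A 3 x 2 image stored as one small index per pixel rather than
// three colour bytes per pixel.
const indexedPixels = [0, 1, 2, 2, 1, 0];

// Recreate the full-colour pixels by looking each index up in the table.
function expandPixels(pixels, colourTable) {
  return pixels.map((colourIndex) => colourTable[colourIndex]);
}

const fullColour = expandPixels(indexedPixels, palette);
console.log(fullColour[2]); // [ 255, 0, 0 ]
```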

The 24-bit PNG format tackles both the problem of large files and that of data loss: it compresses the image losslessly, as GIF does, but without being restricted to an 8-bit colour index. It is very useful for web galleries, where JPEG is used to create thumbnails but the user downloads a high-quality PNG.

Source: Butterworth, R. & Dykes, J., Lecture 04: Graphical information.

3.3 Internet/WWW

With one breath: a world-wide network of networks of networks, built on a client-server architecture, which uses existing lines of communication to exchange digital messages split into 'packets' of data and encoded by network controller hardware using special protocols like HTTP, FTP, Telnet and IPv4, enabling multiple types of computers (ranging from powerful, to less powerful, up to totally useless) to connect and communicate with each other, mostly sharing hyperlinked text and multimedia documents, or increasing computing power by combining processors from a number of machines 'in the cloud'. The DNS service translates human-readable domain names into the IP addresses of the servers on which web applications are hosted, offering various services ranging from the ultra-professional enterprise scale down to the most basic HTML page with blinking text, and allowing software clients such as Firefox or Internet Explorer to access those through the aforementioned communication channels using URLs (Internet - Wikipedia, the free encyclopedia).

Working as a software developer for a company that builds web applications, I've added my bit of HTML to some of the zillions of pages existing on the web right now, although most of the time deployment is an automated process and I don't think too much about it. For this part of my MSc course, though, to set up a simple web site, I’ve used all of the above protocols and technologies in this order:

I created three basic HTML pages with links to other websites; in one page I used Google Earth's API and the iframe tag to show off my favourite place in the world, which made sense considering I'm doing an MSc in Geographic Information Science. I then used FTP (via SSH) to upload these files to my City file space, and published the pages using Telnet and the UNIX console: http://www.student.city.ac.uk/~abhp626/index.html
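
The address above can be taken apart to show what a URL names: the protocol, the host name that DNS resolves to an IP address, and the path of the document on that server. This is a minimal sketch using the standard URL class that browsers and Node.js provide; the variable name pageUrl is arbitrary.

```javascript
const pageUrl = new URL("http://www.student.city.ac.uk/~abhp626/index.html");

console.log(pageUrl.protocol); // http:
console.log(pageUrl.hostname); // www.student.city.ac.uk
console.log(pageUrl.pathname); // /~abhp626/index.html
```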

3.2 Text/HTML

All information used in computer systems is transformed into a series of bits, 0 and 1, each represented by an electromagnetic state and stored on digital media. A group of bits, called a byte, is the smallest addressable element for a given computer architecture and can represent anything we want it to, be it an alphanumeric character, a pixel in a bitmap image, etc. (Butterworth, R. & Dykes, J).
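
As a small illustration of a character being nothing more than a byte of bits, the following sketch converts a character to its bits and back. The function names are made up, and it assumes characters whose code fits in 8 bits.

```javascript
// Returns the bits of a character's code as a string of 0s and 1s,
// padded to a whole byte. Assumes a character whose code fits in 8 bits.
function toBits(character) {
  return character.charCodeAt(0).toString(2).padStart(8, "0");
}

// Reverses the process: reads the bits back as a number, then as a character.
function fromBits(bits) {
  return String.fromCharCode(parseInt(bits, 2));
}

console.log(toBits("A"));          // 01000001
console.log(fromBits("01000001")); // A
```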

Collections of digital information, referred to as files, contain series of bytes, and computers interpret them according to rules known as file formats. The operating system is aware of the files' physical locations on the disk, enabling users to access them via an interface. Additional data, called metadata, exists within the files and provides information about their contents. Search engines categorise web pages using metadata stored in HTML meta elements (Butterworth, R. & Dykes, J).

But information is not just text; it can also be presented in the form of graphics. Different technologies provide mechanisms that enable graphical information to be presented in electronic documents. Embedding, or the file-centred view, facilitates document distribution across different environments and has all graphics and external data included in the binary file. Linking, or the document-centred view, is used in a local environment and has any external data linked, with changes to the linked data reflected in the container document (Butterworth, R. & Dykes, J).

On the Internet, the usual approach is document-centred: web pages contain links to the server's filesystem, and the browser then requests the image from there over HTTP. Other options use a file-centred view via data URLs: instead of linking to an image stored on the server, the image is provided within the URL itself as a base64-encoded string. A drawback of this method is that the browser does not cache the image and downloads it every time it is used, but this can be minimized with the use of CSS (http://www.sveinbjorn.org/dataurls_css).
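
How a data URL is put together can be sketched as follows. This is a minimal illustration: the function names are made up, and the six bytes used are only the signature at the start of a GIF file, not a displayable image.

```javascript
// Encode raw bytes as base64. btoa exists in browsers and recent
// Node.js versions; Buffer is the fallback for older Node.js.
function toBase64(bytes) {
  const binary = String.fromCharCode(...bytes);
  return typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
}

// Build a data URL of the form data:<mime type>;base64,<data>.
function toDataUrl(mimeType, bytes) {
  return "data:" + mimeType + ";base64," + toBase64(bytes);
}

// The first six bytes of a GIF file spell "GIF89a" (or "GIF87a").
const gifSignature = [0x47, 0x49, 0x46, 0x38, 0x39, 0x61];
console.log(toDataUrl("image/gif", gifSignature));
// data:image/gif;base64,R0lGODlh
```

With the bytes of a complete image, the resulting string can be used wherever an image address is expected, such as the src attribute of an img element.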

3.1 Introduction

Once upon a time, the Greek philosopher Aristotle, in his work Organon, suggested a system of logic based on only two types of propositions: true and false. More than two thousand years later, George Boole gave symbolic form to Aristotle's system of logic by codifying relationships of mathematical quantities limited to one of two possible values: 1 or 0. Claude Shannon of MIT recognized how Boolean algebra could be applied to on-and-off circuits, where all signals are characterized as either "high" (1) or "low" (0). The modern binary system, documented by Gottfried Leibniz in the 17th century, uses 0 and 1 to handle any problem decimal arithmetic can deal with. Computers were born.

Since then a lot of water has run under the bridge, and now we are living in the Information Age. The Internet and the technologies behind it have transformed the way people access information, and this has had a tremendous effect on every aspect of human life, from personal to professional and more (Stanford Inst. for the quantitative study of society). Blogging is one of the technologies that have contributed to that (Du, H.S. & Wagner, C., 2006. Weblog success: Exploring the role of technology).

This blog was created as part of the coursework for the DITA module of my MSc course on Geographic Information Systems. I've chosen the http://www.blogger.com/ service to host it, because it is one of the few blog services I know that many people use, and because I find it easy to use in general. It is quite fast and gives you powerful tools to create, edit, administer and modify the contents of your blog on many levels: a novice user can alter the blog's contents to achieve a more professional or personal look and feel, and a web developer can add client scripts that enhance the overall functionality and impression of the blog. I'll let you explore it, in the hope that my entries will do it justice.