Jane's Technical Stuff

Monday, July 28, 2008

Full Text Indexing - the impact of index time and query time language choice


Following on from my More on SQL Server 2005 Full Text Index Service post the other day, I thought I'd give an example of how it works

Setup


I created a table LanguageData which consisted of 2 fields liID and sValue

CREATE TABLE [dbo].[LanguageData]
(
  [ID] [int] IDENTITY(1,1) NOT NULL,
  [Value] [nvarchar](50) NOT NULL,
  CONSTRAINT [PK_LanguageData] PRIMARY KEY CLUSTERED
  (
  [ID] ASC
  ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


I entered some sample data as follows

INSERT INTO [LanguageData](Value)
SELECT 'the' UNION
SELECT 'przed' UNION
SELECT 'jakby'


where 'the' is featured in the English and Neutral language noise word files, 'przed' and 'jakby' are in the Polish language noise files. Note: You'll need to have installed the Polish full text index to make this work.

Next enable the full text indexing on the database

sp_fulltext_database 'enable'

and then create a full text catalog and an index for the table LanguageData

CREATE FULLTEXT CATALOG LanguageData AS DEFAULT
CREATE FULLTEXT INDEX ON LanguageData ([Value] LANGUAGE 1045 )
KEY INDEX [PK_LanguageData]


where 1045 indicates the language Polish - retrieved from
SELECT alias, lcid FROM Sys.syslanguages
WHERE alias = 'Polish'


Scenarios


Now, time to run some tests,

1) Check that all is initially correct, get everything

SELECT * FROM LanguageData

which returns 3 rows, as expected

2) Get everything which matches the noise word 'jakby'

SELECT * FROM LanguageData
WHERE CONTAINS(*,'jakby')


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)."

3) Get everything which matches the noise word 'jakby' specifying Polish (1045) in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1045 )


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)."

4) Get everything which matches the word 'jakby' specifying US English (1033) in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1033 )


returns no rows as the word 'jakby' was stripped out at index time. No warning message is displayed though as 'jakby' is not a noise word for US English

5) Get everything which matches the word 'the'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')


returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time

6) Get everything which matches the word 'the' specifying Polish in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )


returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time

7) Get everything which matches the word 'the' specifying US English in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )


returns no rows as 'the' is a noise word in US English and therefore is excluded at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

Now to make it more interesting, lets add some data which combines noise words with normal words

INSERT INTO [LanguageData] (Value)
VALUES
('jakby przed the test')

which includes 2 polish noise words, one english noise word and one remaining word

8) Get everything which matches the word 'jakby'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby')


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)." is displayed

9) Get everything which matches the word 'the'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')


returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' rows. No message is displayed.

10) Get everything which matches the word 'the' using an explicit query language of Polish

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045)


returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' rows. No message is displayed.

11) Get everything which matches the word 'the' using an explicit query language of English

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )


returns no rows as the word 'the' was stripped at query time according to the noise words for 1033. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

And then to make it even more interesting, lets add a new word 'jane' to the LanguageData dataset, and to the noisewords file for the Neutral language (LCID 0) which (on my machine at least) is at C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseNEU.txt

To get the full text indexing service to pick up the changes to the noise files, you need to restart the service via the Control Panel -> Administrative Tools -> Service dialog

INSERT INTO LanguageData (Value)
VALUES ('jane')

12) Get everything which matches the word 'jane' using the implicit query language (Polish)

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane')


which returns 1 row, as 'jane' isn't a polish noise word and wasn't stripped out at either index or query time

13) Get everything which matches the word 'jane' using the explicit query language English

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 1033 )


which returns 1 row, as 'jane' isn't a polish noise word and so wasn't stripped out an index time, neither is it an english noise word so isn't stripped out at query time either

14) Get everything which matches the word 'jane' using the explicit query language Neutral

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 0 )


which returns 0 rows as 'jane' is a neutral noise word and so is stripped out at index time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

Summary


What this shows, is that when you choose a language to set your full text index up as, this impacts the words which will be stripped out of the index as anything defined as noise will be removed. This has an impact on the choice of language when different language content is being indexed as we need to be clear that what is one languages noise word, isn't another ones non-noise word.
  • When querying a full text index, it is possible to specify that the query you are running is for a particular language, but if you do and if the language is different to that you set the index up as, then you'll remove 2 sets of noise words from your search - both those that were set up when the index was defined, but also those based on the language specified in the query
  • The noise files are defined on an instance by instance basis and so any alterations to the noise file will affect all full text indexes on an instance.
  • To pick up changes to the noise files, the service needs to be restarted.
  • SQL Server 2008 seems to change this and so more research will be required - it relies on STOPLISTs instead.

Labels: , , ,

// posted by Jane @ 4:30 PM   save to del.icio.us

Comments:

Thursday, July 24, 2008

More on SQL Server 2005 Full Text Index Service


In my previous post about How to work out which are valid full text languages on a SQL Server 2005 instance I referred to sys.syslanguages and sys.fulltext_languages in my queries, but didn't really say much more about them, so here goes

sys.syslanguages


In the definition on MSDN it states
"Contains one row for each language present in the instance of SQL Server 2005. Although U.S. English is not in syslanguages, it is always available to SQL Server."

And one thing on the choice of U.S. English vs UK English. The SQL Server Full Text Search: Language Features says
"In actual fact UK English does not refer to the Queen's English or the English used in the United Kingdom, but International English; the English that is used in all other English speaking countries other than US English."

As an English person, living in England and speaking English I find this a somewhat grating use of the phrase UK English. Bah!

sys.fulltext_languages


In the definition on MSDN it states
"This catalog view contains one row per language available for full-text indexing/querying operations. Each row provides an unambiguous representation of the available full-text linguistic resources that are registered with Microsoft SQL Server. The name or lcid can be specified in the full-text queries and full-text index DDL."

The list in this table, doesn't match those in sys.syslanguages. These are purely the full-text-indexable languages. As I mentioned in my previous post 6 languages can be added by following these instructions. The line
"The name or lcid can be specified in the full-text queries and full-text index DDL."
refers to the ability to issue the following SQL:
SELECT *
FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )

which indicates that the locale used for querying should be 1045, which equates to Polish. I have some sample SQL to post in the next few days which demonstrates the difference between indexing and querying language choices.

In General


I've been doing quite a bit of work with trying to understand how the SQL Server 2005 full text index works, and how the language choice impacts it. My knowledge of full text indexing as a whole to this stage hasn't been great, so I've done quite a lot of background reading. Amongst the best resources I've found are: both by Hillary Cotter which provide a really simple, but yet pretty comprehensive introduction to the various features of indexing and querying using the Full Text Index service.

Labels: , , ,

// posted by Jane @ 8:02 PM   save to del.icio.us

Comments:

How to work out which are valid full text languages on a SQL Server 2005 instance


Despite SQL Server 2005 supporting 33 languages (found by issuing SELECT * FROM sys.syslanguages), not all of these are available for the full text index service. To find out which ones are run the query:
SELECT *
FROM sys.fulltext_languages


On my machine, this returns the following languages:
  • British English
  • Chinese (Hong Kong SAR, PRC)
  • Chinese (Macau SAR)
  • Chinese (Singapore)
  • Simplified Chinese
  • Traditional Chinese
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Neutral
  • Spanish
  • Swedish
  • Thai

An additional 6 languages are supported and available for a separate install. These are :
  • Danish
  • Polish
  • Português (Brasil)
  • Portuguese
  • Russian
  • Turkish

To install these, follow the instructions here.
The following languages are not supported for full text searching at all within SQL Server 2005:
  • Arabic
  • Bulgarian
  • Croatian
  • Czech
  • Estonian
  • Finnish
  • Greek
  • Hungarian
  • Latvian
  • Lithuanian
  • Norwegian
  • Romanian
  • Slovak
  • Slovenian

SQL Server 2008 offers more full text language support bringing the total of available languages to 50. It would appear that Danish, Polish and Turkish remain installable additions.

Labels: , , ,

// posted by Jane @ 7:36 PM   save to del.icio.us

Comments:

Wednesday, July 23, 2008

Localisation, Javascript and extended character sets in Visual Studio 2005


I'm currently doing some work on looking into producing a localised version of a Madgex job board (not dis-similar to the work I did this time last year)and am mainly looking at the SQL Server and javascript areas whilst a colleague looks at the .NET side. Glenn gave me a tip off that when he'd been doing something similar, he'd had problems with Visual Studio 2005 not saving his javascript files as UTF.

So, within VS2005 I created a javascript file and put 2 lines into it. They were simply:
alert ('hello world');
alert('Zarys gramatyki por¢wnawczej jezyk¢w slowianskich');


I then linked this into a (very) basic HTML page

<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
  <title>i18n</title>
  <script type="text/javascript" src="js/i18n.js" language="javascript"></script>
</head>
<body>
</body>
</html>


so that on pageload 2 alert boxes are displayed, one saying 'Hello World' and the one saying 'Zarys gramatyki por¢wnawczej jezyk¢w slowianskich'.

Unfortunately what is displayed instead is:

which isn't exactly what I had in mind.

I opened the file in Notepad++ (my text editor of choice) to take a look at the file type and it is, as I'd expected, saved as ANSI, not UTF-8 or UTF-16


I used Notepad++'s menu item Format -> Convert to UTF-8 to convert this file from ANSI into UTF-8, and then re-ran my test and all works correctly as expected. Hurrah!

I then repeated this using VS2008 and found that this is one of the fixes over VS2005.

So, the alert now correctly displays:

and when opened in Notepad++ the file is now, correctly, UTF-8.

Labels: , , ,

// posted by Jane @ 3:43 PM   save to del.icio.us

Comments:

Monday, July 21, 2008

Post-Implementation/Post-Project Reviews


In my almost 15 years of development, I've found little more beneficial than a well run Post-Implementation Review meeting. I find them a great to way to learn, improve and ensure that the next project goes more smoothly than the one before.

What is a Post-Implementation/Post-Project Review?


It is a meeting held at the end of a project at which people who have contributed to the project as a whole get an opportunity to discuss the highs and lows of the project.

My preparation normally involves thinking back over the course of the project and thinking about:
  • what went well?
  • what didn't go so well?
  • what we could do better next time and what lessons we can learn
  • How well the project was analysed and specified
  • how well the project was managed
  • how well the testing phase went - bugs found in testing vs UAT vs post-live
  • how well the handover to support went
  • how well the original time estimates reflected reality
  • how well specified the infrastructure was - were the original estimates on page impressions etc valid

The most recent one I attended was run in the order of the project, so feedback was made first against the Sales process, then the Analysis process etc. This worked pretty well, but did mean that the last few stages of the project were rushed to ensure that the meeting finished on time. Alternatives include asking the "What went well?", "What didn't go well?" questions of every person in the room. This ensures that everyone gets their say but does involve preparation on behalf of every attendee (no bad thing).

Who should be there?


For me, the ideal meeting should include everyone who has been involved with the project, from start to finish - in some cases this could be a lot of people but every function should be represented - so definitely Sales, Analysts, Project Managers, Developers, Support and Systems. Every person should have an equal opportunity to speak.

When should it happen?


Usually, after the project has gone live and been handed over into a support phase. In some cases a project can last too long, and if the project is scheduled to take more than 6 months, its probably worth having 6 monthly review meetings to ensure that key learnings aren't forgotten, or that subsequent projects can learn and improve quickly. These shouldn't replace the Post Implementation Review but should supplement it.

What should happen afterwards?


The final part of the meeting should be a quick review of the "What went well?" items, and of the "Key learnings". Someone should be tasked with producing a document which should then be circulated outlining the key learnings from the project, and also the highs - the lows should be kept within the team and learnt from but not circulated - it shouldn't be a shaming exercise but should be a great motivator. Any individuals charged with process review, or implementing changes to current/ongoing projects should be informed of the key learnings to ensure these learnings are escalated and implemented as quickly as possible.

Labels: ,

// posted by Jane @ 7:31 PM   save to del.icio.us

Comments:

Friday, July 18, 2008

TSQL: Enumations and constants


Bruce sent me a link the other day to an article T4 template for generating SQL view from C# enumeration which I found interesting from a modelling constants/enumerations in SQL viewpoint.

The example used was modelling an enumeration of ContactType which has valid items of Individual and Organisation.

The article used a view to model this, as per
CREATE VIEW enumContactType
AS
  SELECT
    0 AS Individual,
    1 AS Organization


and then using it within a SELECT as

SELECT *
FROM Contact
WHERE Type = (SELECT Organization FROM enumContactType)

(Note: in the original article Oleg used a schema called enum, but I'm just ignoring this at the moment and have thus changed the name from enum.ContactType to enumContactType)

An alternative


In my previous company, we used Scalar-Valued Functions to mimic constants, and I guess this could be extended to enumerations. I thought I'd re-create the above example and give it a try to see how it looks and compares.

So, to model the enumeration ContactType, I've created two functions as follows:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
AS
BEGIN
  RETURN 0
END
GO

CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
AS
BEGIN
  RETURN 1
END
GO


And then to reproduce the SELECT query I wrote:
SELECT *
FROM Contact
WHERE Type = dbo.enumContactTypeOrganisation()


The resulting data matches that used in the VIEW model and provides an alternative. I'm sure that a template could be written to produce those functions as an output as per the end part of Oleg's article.

Performance and timings


I was interested in the relative performance of these two methods, so armed with my timing code from last week I checked them out. I amended the SELECT to bring back the COUNT(*) FROM Contact into a local integer variable, and ran it 1000000 times.

The results are as follows:
CodeDescription TimeInMS
EnumViewUsing the view13010
EnumUDFUsing the UDF21450

showing that the view method is more performant.

I then changed the function to make use of SCHEMABINDING. The new functions look like:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
  RETURN 0
END
GO

CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
  RETURN 1
END
GO


And the timings change to be:
CodeDescription TimeInMS
EnumViewUsing the view13010
EnumUDFUsing the UDF20280

which do reduce the time taken for the UDF but still means that the view is faster.

For interests sake I then ran a comparison timing against the code using the literal as:
SELECT *
FROM Contact
WHERE Type = 1

which resulted in
CodeDescription TimeInMS
EnumLiteralUsing the literal12043

showing it is faster, but not by much, than the view.

Summary


So, what has this shown?
  • Using a view is quite efficient and effective for modelling enumerations
  • Using a UDF is an alternative, but is slower
  • Schema binding makes UDF usage quicker
  • The difference between using a VIEW and using the hard-coded literal isn't a lot in perfomance terms

Labels: ,

// posted by Jane @ 1:24 PM   save to del.icio.us

Comments:
Great article. Thanks Jane! I would not have expected a UDF to be slower than a VIEW. Any ideas on why?
 

Tuesday, July 08, 2008

INFORMATION_SCHEMA views


As I alluded to the other day, I'm gradually weaning myself off my dependency on (the fairly ugly) sys.objects, sys.columns etc as a way to query the meta data about my database. Instead I'm using the SQL-92 compliant INFORMATION_SCHEMA views.

Information schema views provide an internal, system table-independent view of the SQL Server metadata. Information schema views enable applications to work correctly although significant changes have been made to the underlying system tables. The information schema views included in SQL Server 2005 comply with the SQL-92 standard definition for the INFORMATION_SCHEMA.


So now, when I'm writing database upgrade scripts and attempting to write defensive SQL (which is my usual position these days, regardless of whether I think the script will be run more than once - lets just say I've learnt from making such assumptions) I usually wrap
ALTER TABLE
statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name = 'MyTable' AND Column_Name = 'MyNewColumn'),
CREATE TABLE
statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE Table_Name = 'MyTable') etc

The main area that I have to revert to the sys views for is indexes, and finding out what columns are included in which index, which is the uglier, but no-less-effective
SELECT
  OBJECT_NAME (i.object_id) AS Tablename,
  i.name AS IndexName,
  c.name AS ColumnName,
  CASE ic.is_descending_key
  WHEN 1 THEN 'DESC'
  ELSE 'ASC'
  END as ColumnSort
FROM sys.indexes i
INNER JOIN sys.index_columns ic
ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
INNER JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
INNER JOIN sys.objects o
on c.object_id = o.object_id
WHERE o.type = 'U'
ORDER BY TableName, indexName, ic.key_ordinal


MSDN has an interesting article Querying the SQL Server System Catalog FAQ which has examples for finding out (using the various object catalog views) many different areas of meta data across a SQL Server 2005 database and is worth using as a starting point.

Labels: ,

// posted by Jane @ 2:20 PM   save to del.icio.us

Comments:
Cool, I've just INFORMATION_SCHEMA that to save me scrolling through the enormous MSSMS treeview. Love it!
 

Monday, July 07, 2008

TSQL - Timings


A while ago I blogged about how to get the date element of a datetime column in TSQL. In that post I said
I would probably have done it via a CONVERT/CAST operation, converting to a VARCHAR and then back to a DATETIME, but this is a much more efficient method.

but I didn't prove it at the time. I gave it some more thought and wanted to know what the differences were, so I wrote some timings code.

This script creates one table Timings with columns of Code, Description, ActionTime and IsComplete. It has a combined primary key of Code and IsComplete. Code must be unique - and can be a string of up to 10 characters long to uniquely identify the action being timed. IsComplete is used to differentiate between the start time and end time of the process being monitored.

The script also creates 3 stored procedures:
  • up_RecordStart which takes 2 parameters - the unique code and optional description. This is used to record the start of the activity being monitored.
  • up_RecordEnd which takes just 1 parameter - the code - should match the code used in up_RecordStart. This is used to record the end of the activity being monitored.
  • up_GetTimings which again takes just 1 parameter - the code to return the timings from. It then returns the Code, Description and the length of time the action took in ms.

I wrote some script to then use these objects to test the assertion I made that FLOOR and combinations of converting DATETIME to FLOAT etc would be more efficient than using either CAST or CONVERT to VARCHAR(12) and back again to a DATETIME.

------------------------------
-- Clean up before we start --
------------------------------
DELETE FROM Timings
WHERE Code IN ('FLR','CONVERT','CAST')
GO

---------------------
-- Try using Floor --
---------------------
EXEC up_RecordStart @Code='FLR', @Description='SELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0

WHILE @i < 1000000 -- try the next statement for 1000000 times - this should be enough to see some differences
BEGIN

  SET @floorDate = CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='FLR'
GO

-----------------------
-- Try using convert --
-----------------------
EXEC up_RecordStart @Code='CONVERT', @Description='SELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN

  SET @floorDate = CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='CONVERT'
GO

--------------------
-- Try using Cast --
--------------------
EXEC up_RecordStart @Code='CAST', @Description='SELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN

  SET @floorDate = CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='CAST'
GO

-------------------------
-- Now get the timings --
-------------------------
EXEC up_GetTimings 'FLR'
GO

EXEC up_GetTimings 'CONVERT'
GO

EXEC up_GetTimings 'CAST'
GO


This results in the following data being returned:
CodeDescription TimeInMS
FLRSELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))1313
CONVERTSELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))3236
CASTSELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)3203

which shows that the method using FLOOR is more efficient, and that there isn't a lot to chose between CONVERT and CAST

Labels: ,

// posted by Jane @ 7:11 PM   save to del.icio.us

Comments:

Friday, July 04, 2008

VBUG Brighton: Understanding LINQ with Mike Taulty


Last night Madgex hosted an excellent VBUG Brighton session by Mike Taulty on LINQ.

Despite the sunny, warm evening we managed to pack 25 or so Microsoft technologies developers into our boardroom and listened intently whilst Mike talked and demo'd his way around LINQ, explaining some of the newer C#/VB9 language features as he went. Whilst not being the exact same slide deck, after a rummage around Mike's site I found a post about a similar sounding talk complete with presentation in PDF format.

I remain slightly dissapointed by the syntax for Linq to XML
var query = from c in data.DescendantsAndSelf("customer")
select (string)c.Attribute("id");

which as Mike said, involves a bit too much of hoping and praying (relying on no underlying changes, no strong typing etc).

However, I'm really encouraged by the idea of Linq to XSD which seems like a much better idea, tying the query to a schema rather than a document.

Fabrice has some sample code based on Linq to XML and Linq to XSD as follows, which goes to show the improvement using the XSD version

Here is a LINQ to XML query:
from item in purchaseOrder.Elements("Item")
select (double)item.Element("Price") * (int)item.Element("Quantity")


Here is the same query as above, but written using LINQ to XSD:
from item in purchaseOrder.Item
select item.Price * item.Quantity

which I think looks much more elegant and less clunky.

Labels: , ,

// posted by Jane @ 7:13 PM   save to del.icio.us

Comments:

Monty Hall problem - TSQL


Following on from this morning's post about the Monty Hall problem, and proving it in PHP I figured I'd prove it in TSQL as well.

So here is my SQL version.

To maintain consistency with my PHP version, I've made it output similar text, so the results are along the lines of:
Monty Hall Problem
This is a simple TSQL query to prove the Monty Hall problem [http://en.wikipedia.org/wiki/Monty_hall_problem]

------------------------
The Results are in:
------------------------
Out of 10000 games, the contestant was right to swap 66.94% of the time and wrong 33.06% of the time


The TSQL version is a bit more elegant with regards to working out which door to open for the contestant, as it is a simple statement of
SELECT TOP 1 @Opened = DoorNumber
FROM @Doors
WHERE DoorNumber NOT IN (@Prize, @Picked)
ORDER BY NEWID()

making the most of set theory to enable the exclusion of the @Prize door and the @Picked door as opposed to the same thing in my PHP code
$remaining = array();
/* the gameshow host opens a door which has nothing behind it, so the gameshow host knows where the prize is
but can't choose to open the door the contestant has chosen, so remove both picked and prize from the options,
this leaves either one of two doors that can be opened, so pick one randomly */
for ($i=0; $i<3; $i ++)
{
  switch($doors[$i])
  {
    case $prize:
      break;
    case $picked:
      break;
    default:
      array_push($remaining,$doors[$i]);
  }
}
$opened = $remaining[array_rand($remaining)];

which is all a bit more procedural and, at least to my mind, less elegant - but then again I like the syntax of SQL which either makes me a freak or a masochist (according to at least one colleague)

Labels: ,

// posted by Jane @ 6:08 PM   save to del.icio.us

Comments:

Monty Hall problem - PHP


This evening, after returning from an excellent VBUG: Brighton (of which more in a later post), Richard was talking about the Monty Hall problem after listening to a discussion about it on the BBC podcast In Our Time.

For those who don't know the Monty Hall problem it is this:
  • A gameshow set has 3 doors.
  • Behind one of the doors is a prize.
  • Behind the other 2 doors is nothing.
  • The contestant choses a door.
  • The gameshow hosts, knowing where the prize is, and which door the contestant has chosen, opens a door which he knows hasn't got the prize behind it and the contestant hasn't chosen.
  • The contestant is then offered the opportunity to trade their door for the one remaining door.
Should the contestant switch? The answer is yes 2/3rds of the time. See here for the reasons.

Richard and I both set about proving it in the tools we had available, I chose PHP, Richard chose Scala. And we both can successfully demonstrate that by always switching doors the contestant is more likely to win.

My PHP version is available for demo here and downloadable via a zip file (right click, save as) here.

Labels: ,

// posted by Jane @ 12:25 AM   save to del.icio.us

Comments:

Brighton Bloggers   This page is powered by Blogger. Isn't yours?   rss Sussex Digital - focusing on the Sussex digital community