Monday, July 28, 2008
Full Text Indexing - the impact of index time and query time language choice
Following on from my More on SQL Server 2005 Full Text Index Service post the other day, I thought I'd give an example of how the index-time and query-time language choices interact.
Setup
I created a table, LanguageData, consisting of 2 columns, ID and Value:
CREATE TABLE [dbo].[LanguageData]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_LanguageData] PRIMARY KEY CLUSTERED
(
[ID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I entered some sample data as follows:
INSERT INTO [LanguageData](Value)
SELECT 'the' UNION
SELECT 'przed' UNION
SELECT 'jakby'
where 'the' is featured in the English and Neutral language noise word files, and 'przed' and 'jakby' are in the Polish language noise word file. Note: you'll need to have installed Polish full text index support to make this work.
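A quick way to confirm that Polish is registered before going any further is to look it up in sys.fulltext_languages. This is only a sketch; it assumes the LCID of 1045 used throughout this post.

```sql
-- Returns one row if Polish (LCID 1045) is registered for full-text
-- indexing on this instance, and no rows if it still needs installing.
SELECT lcid, name
FROM sys.fulltext_languages
WHERE lcid = 1045
```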
Next, enable full text indexing on the database
sp_fulltext_database 'enable'
and then create a full text catalog and an index for the table LanguageData
CREATE FULLTEXT CATALOG LanguageData AS DEFAULT
CREATE FULLTEXT INDEX ON LanguageData ([Value] LANGUAGE 1045 )
KEY INDEX [PK_LanguageData]
where 1045 indicates the language Polish, retrieved from
SELECT alias, lcid FROM sys.syslanguages
WHERE alias = 'Polish'
Scenarios
Now, time to run some tests.
1) Check that all is initially correct, get everything
SELECT * FROM LanguageData
which returns 3 rows, as expected.
2) Get everything which matches the noise word 'jakby'
SELECT * FROM LanguageData
WHERE CONTAINS(*,'jakby')
returns no rows, as the word 'jakby' was stripped out at index time and is also stripped out at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
3) Get everything which matches the noise word 'jakby' specifying Polish (1045) in the CONTAINS clause
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1045 )
returns no rows, as the word 'jakby' was stripped out at index time and is also stripped out at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
4) Get everything which matches the word 'jakby' specifying US English (1033) in the CONTAINS clause
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1033 )
returns no rows, as the word 'jakby' was stripped out at index time. No warning message is displayed though, as 'jakby' is not a noise word for US English.
5) Get everything which matches the word 'the'
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')
returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time.
6) Get everything which matches the word 'the' specifying Polish in the CONTAINS clause
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )
returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time.
7) Get everything which matches the word 'the' specifying US English in the CONTAINS clause
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )
returns no rows, as 'the' is a noise word in US English and therefore is excluded at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
Now to make it more interesting, let's add some data which combines noise words with normal words
INSERT INTO [LanguageData] (Value)
VALUES
('jakby przed the test')
which includes 2 Polish noise words, one English noise word and one remaining word.
8) Get everything which matches the word 'jakby'
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby')
returns no rows, as the word 'jakby' was stripped out at index time and is also stripped out at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
9) Get everything which matches the word 'the'
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')
returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' row. No message is displayed.
10) Get everything which matches the word 'the' using an explicit query language of Polish
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045)
returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' row. No message is displayed.
11) Get everything which matches the word 'the' using an explicit query language of English
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )
returns no rows, as the word 'the' was stripped at query time according to the noise words for 1033. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
And then to make it even more interesting, let's add a new word 'jane' to the LanguageData dataset, and to the noise words file for the Neutral language (LCID 0) which (on my machine at least) is at C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseNEU.txt
To get the full text indexing service to pick up the changes to the noise files, you need to restart the service via the Control Panel -> Administrative Tools -> Services dialog
INSERT INTO LanguageData (Value)
VALUES ('jane')
12) Get everything which matches the word 'jane' using the implicit query language (Polish)
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane')
which returns 1 row, as 'jane' isn't a Polish noise word and wasn't stripped out at either index or query time.
13) Get everything which matches the word 'jane' using the explicit query language English
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 1033 )
which returns 1 row, as 'jane' isn't a Polish noise word and so wasn't stripped out at index time, and neither is it an English noise word, so it isn't stripped out at query time either.
14) Get everything which matches the word 'jane' using the explicit query language Neutral
SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 0 )
which returns 0 rows, as 'jane' is a Neutral noise word and so is stripped out at query time (it was not stripped at index time, because the index uses the Polish noise words). A warning message "Informational: The full-text search condition contained noise word(s)." is displayed.
Summary
What this shows:
- The language you choose when setting up a full text index determines which words are stripped out of the index, as anything defined as noise for that language is removed. This matters when content in different languages is being indexed: one language's noise word may be another language's meaningful word.
- When querying a full text index, it is possible to specify that the query is for a particular language, but if that language differs from the one the index was set up with, you'll remove 2 sets of noise words from your search: those stripped when the data was indexed, and those for the language specified in the query.
- The noise files are defined on an instance by instance basis, so any alterations to a noise file will affect all full text indexes on that instance.
- To pick up changes to the noise files, the service needs to be restarted.
- SQL Server 2008 seems to change this, relying on STOPLISTs instead, so more research will be required.
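As a rough pointer for that research, the STOPLIST approach in SQL Server 2008 might look something like the sketch below. None of this was run as part of the tests above; MyStoplist is a made-up name, and the database is assumed to be at a compatibility level that supports stoplists.

```sql
-- SQL Server 2008 or later only. MyStoplist is a hypothetical name.
-- Start from a copy of the system stoplist.
CREATE FULLTEXT STOPLIST MyStoplist FROM SYSTEM STOPLIST;
GO
-- Add 'jane' as a stop word for Polish (LCID 1045), the index language.
ALTER FULLTEXT STOPLIST MyStoplist ADD 'jane' LANGUAGE 1045;
GO
-- Attach the stoplist to the full text index on LanguageData.
ALTER FULLTEXT INDEX ON LanguageData SET STOPLIST MyStoplist;
GO
```

Because a stoplist is a database object attached to an individual index, this would appear to remove the instance-wide side effect of editing the noise files.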
Labels: full text index, localisation, localization, SQLServer2005
// posted by Jane @ 4:30 PM
Comments:
Thursday, July 24, 2008
More on SQL Server 2005 Full Text Index Service
In my previous post about How to work out which are valid full text languages on a SQL Server 2005 instance I referred to sys.syslanguages and sys.fulltext_languages in my queries, but didn't really say much more about them, so here goes.
sys.syslanguages
In the definition on MSDN it states
"Contains one row for each language present in the instance of SQL Server 2005. Although U.S. English is not in syslanguages, it is always available to SQL Server."
One thing on the choice of U.S. English vs UK English: the SQL Server Full Text Search: Language Features article says
"In actual fact UK English does not refer to the Queen's English or the English used in the United Kingdom, but International English; the English that is used in all other English speaking countries other than US English."
As an English person, living in England and speaking English I find this a somewhat grating use of the phrase UK English. Bah!
sys.fulltext_languages
In the definition on MSDN it states
"This catalog view contains one row per language available for full-text indexing/querying operations. Each row provides an unambiguous representation of the available full-text linguistic resources that are registered with Microsoft SQL Server. The name or lcid can be specified in the full-text queries and full-text index DDL."
The list in this view doesn't match the one in sys.syslanguages; these are purely the full-text-indexable languages. As I mentioned in my previous post, 6 languages can be added by following these instructions. The line
"The name or lcid can be specified in the full-text queries and full-text index DDL."
refers to the ability to issue the following SQL:
SELECT *
FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )
which indicates that the locale used for querying should be 1045, which equates to Polish. I have some sample SQL to post in the next few days which demonstrates the difference between indexing and querying language choices.
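Since the MSDN line says the name can be used as well as the lcid, the same query can presumably be written with the language name as a quoted string. This sketch assumes 'Polish' is the alias shown in sys.syslanguages, as it is on my machine.

```sql
-- Same query as above, but naming the language instead of giving its LCID.
SELECT *
FROM LanguageData
WHERE CONTAINS(*, 'the', LANGUAGE 'Polish')
```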
In General
I've been doing quite a bit of work trying to understand how the SQL Server 2005 full text index works, and how the language choice impacts it. My knowledge of full text indexing as a whole hasn't been great up to this point, so I've done quite a lot of background reading. Amongst the best resources I've found are two by Hilary Cotter, which provide a really simple yet pretty comprehensive introduction to the various features of indexing and querying using the Full Text Index service.
Labels: full text index, localisation, localization, SQLServer2005
// posted by Jane @ 8:02 PM
How to work out which are valid full text languages on a SQL Server 2005 instance
Despite SQL Server 2005 supporting 33 languages (found by issuing SELECT * FROM sys.syslanguages), not all of these are available for the full text index service. To find out which ones are, run the query:
SELECT *
FROM sys.fulltext_languages
On my machine, this returns the following languages:
- British English
- Chinese (Hong Kong SAR, PRC)
- Chinese (Macau SAR)
- Chinese (Singapore)
- Simplified Chinese
- Traditional Chinese
- Dutch
- English
- French
- German
- Italian
- Japanese
- Korean
- Neutral
- Spanish
- Swedish
- Thai
An additional 6 languages are supported and available as a separate install. These are:
- Danish
- Polish
- Português (Brasil)
- Portuguese
- Russian
- Turkish
To install these, follow the instructions here.
The following languages are not supported for full text searching at all within SQL Server 2005:
- Arabic
- Bulgarian
- Croatian
- Czech
- Estonian
- Finnish
- Greek
- Hungarian
- Latvian
- Lithuanian
- Norwegian
- Romanian
- Slovak
- Slovenian
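The gap between the two lists can also be found with a single query rather than by comparing them by eye. This is a sketch only: it assumes the two views can be matched on lcid, and it can't tell the installable-but-not-yet-installed languages apart from the unsupported ones.

```sql
-- Languages known to the instance that have no full-text language registered.
SELECT s.alias, s.lcid
FROM sys.syslanguages AS s
LEFT JOIN sys.fulltext_languages AS f
    ON f.lcid = s.lcid
WHERE f.lcid IS NULL
ORDER BY s.alias
```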
SQL Server 2008 offers more full text language support, bringing the total of available languages to 50. It would appear that Danish, Polish and Turkish remain installable additions.
Labels: full text index, localisation, localization, SQLServer2005
// posted by Jane @ 7:36 PM
Wednesday, July 23, 2008
Localisation, Javascript and extended character sets in Visual Studio 2005
I'm currently looking into producing a localised version of a Madgex job board (not dissimilar to the work I did this time last year) and am mainly looking at the SQL Server and javascript areas whilst a colleague looks at the .NET side.
Glenn gave me a tip off that when he'd been doing something similar, he'd had problems with Visual Studio 2005 not saving his javascript files as UTF.
So, within VS2005 I created a javascript file and put 2 lines into it. They were simply:
alert ('hello world');
alert('Zarys gramatyki porównawczej języków słowiańskich');
I then linked this into a (very) basic HTML page
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
<title>i18n</title>
<script type="text/javascript" src="js/i18n.js" language="javascript"></script>
</head>
<body>
</body>
</html>
so that on page load 2 alert boxes are displayed, one saying 'hello world' and one saying 'Zarys gramatyki porównawczej języków słowiańskich'.
Unfortunately what is displayed instead is:
[Screenshot: alert box showing the Polish text with garbled characters]
which isn't exactly what I had in mind.
I opened the file in Notepad++ (my text editor of choice) to take a look at the file type and it is, as I'd expected, saved as ANSI, not UTF-8 or UTF-16.
[Screenshot: Notepad++ showing the file encoded as ANSI]
I used Notepad++'s menu item Format -> Convert to UTF-8 to convert this file from ANSI into UTF-8, and then re-ran my test and all works correctly as expected. Hurrah!
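I then repeated this using VS2008 and found that this is one of the fixes over VS2005.
So, the alert now correctly displays:
[Screenshot: alert box showing the Polish text correctly]
and when opened in Notepad++ the file is now, correctly, UTF-8.
[Screenshot: Notepad++ showing the file encoded as UTF-8]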
Labels: javascript, localisation, localization, Visual Studio
// posted by Jane @ 3:43 PM
Monday, July 21, 2008
Post-Implementation/Post-Project Reviews
In my almost 15 years of development, I've found little more beneficial than a well run Post-Implementation Review meeting. I find them a great way to learn, improve and ensure that the next project goes more smoothly than the one before.
What is a Post-Implementation/Post-Project Review?
It is a meeting held at the end of a project at which people who have contributed to the project as a whole get an opportunity to discuss the highs and lows of the project.
My preparation normally involves thinking back over the course of the project and thinking about:
- what went well?
- what didn't go so well?
- what we could do better next time and what lessons we can learn
- how well the project was analysed and specified
- how well the project was managed
- how well the testing phase went - bugs found in testing vs UAT vs post-live
- how well the handover to support went
- how well the original time estimates reflected reality
- how well specified the infrastructure was - were the original estimates on page impressions etc valid
The most recent one I attended was run in the order of the project, so feedback was made first against the Sales process, then the Analysis process etc. This worked pretty well, but did mean that the last few stages of the project were rushed to ensure that the meeting finished on time. Alternatives include asking the "What went well?", "What didn't go well?" questions of every person in the room. This ensures that everyone gets their say but does involve preparation on behalf of every attendee (no bad thing).
Who should be there?
For me, the ideal meeting should include everyone who has been involved with the project, from start to finish - in some cases this could be a lot of people but every function should be represented - so definitely Sales, Analysts, Project Managers, Developers, Support and Systems. Every person should have an equal opportunity to speak.
When should it happen?
Usually, after the project has gone live and been handed over into a support phase. In some cases a project can last too long, and if the project is scheduled to take more than 6 months, it's probably worth having 6 monthly review meetings to ensure that key learnings aren't forgotten, and that subsequent projects can learn and improve quickly. These shouldn't replace the Post Implementation Review but should supplement it.
What should happen afterwards?
The final part of the meeting should be a quick review of the "What went well?" items, and of the "Key learnings". Someone should be tasked with producing a document which should then be circulated outlining the key learnings from the project, and also the highs - the lows should be kept within the team and learnt from but not circulated - it shouldn't be a shaming exercise but should be a great motivator. Any individuals charged with process review, or implementing changes to current/ongoing projects should be informed of the key learnings to ensure these learnings are escalated and implemented as quickly as possible.
Labels: article, project
// posted by Jane @ 7:31 PM
Friday, July 18, 2008
TSQL: Enumerations and constants
Bruce sent me a link the other day to an article, T4 template for generating SQL view from C# enumeration, which I found interesting from the viewpoint of modelling constants/enumerations in SQL.
The example used was modelling an enumeration of ContactType which has valid items of Individual and Organisation.
The article used a view to model this, as per
CREATE VIEW enumContactType
AS
SELECT
0 AS Individual,
1 AS Organization
and then using it within a SELECT as
SELECT *
FROM Contact
WHERE Type = (SELECT Organization FROM enumContactType)
(Note: in the original article Oleg used a schema called enum, but I'm just ignoring this at the moment and have thus changed the name from enum.ContactType to enumContactType)
An alternative
In my previous company, we used Scalar-Valued Functions to mimic constants, and I guess this could be extended to enumerations. I thought I'd re-create the above example and give it a try to see how it looks and compares.
So, to model the enumeration ContactType, I've created two functions as follows:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
AS
BEGIN
RETURN 0
END
GO
CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
AS
BEGIN
RETURN 1
END
GO
And then to reproduce the SELECT query I wrote:
SELECT *
FROM Contact
WHERE Type = dbo.enumContactTypeOrganisation()
The resulting data matches that returned using the VIEW model and provides an alternative. I'm sure that a template could be written to produce those functions as an output, as per the end part of Oleg's article.
Performance and timings
I was interested in the relative performance of these two methods, so armed with my timing code from last week I checked them out. I amended the SELECT to bring back the COUNT(*) FROM Contact into a local integer variable, and ran it 1000000 times.
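For the view version, the timing run described above would look roughly like this. It is a sketch pieced together from the description rather than the exact script; it assumes the up_RecordStart, up_RecordEnd and up_GetTimings procedures from the timings post, and the UDF version swaps in dbo.enumContactTypeOrganisation() with a code of 'EnumUDF'.

```sql
EXEC up_RecordStart @Code='EnumView', @Description='Using the view'
GO
DECLARE @i AS INTEGER
DECLARE @count AS INTEGER
SET @i = 0
WHILE @i < 1000000
BEGIN
    -- Assign the count to a variable so no result set is sent to the client
    SELECT @count = COUNT(*)
    FROM Contact
    WHERE Type = (SELECT Organization FROM enumContactType)
    SET @i = @i + 1
END
GO
EXEC up_RecordEnd @Code='EnumView'
GO
EXEC up_GetTimings 'EnumView'
GO
```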
The results are as follows:
| Code | Description | TimeInMS |
| EnumView | Using the view | 13010 |
| EnumUDF | Using the UDF | 21450 |
showing that the view method is more performant.
I then changed the function to make use of SCHEMABINDING. The new functions look like:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN 0
END
GO
CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN 1
END
GO
And the timings change to be:
| Code | Description | TimeInMS |
| EnumView | Using the view | 13010 |
| EnumUDF | Using the UDF | 20280 |
which does reduce the time taken for the UDF, but the view is still faster.
For interest's sake I then ran a comparison timing against the code using the literal as:
SELECT *
FROM Contact
WHERE Type = 1
which resulted in
| Code | Description | TimeInMS |
| EnumLiteral | Using the literal | 12043 |
showing that the literal is faster than the view, but not by much.
Summary
So, what has this shown?
- Using a view is quite efficient and effective for modelling enumerations
- Using a UDF is an alternative, but is slower
- Schema binding makes UDF usage quicker
- The difference between using a VIEW and using the hard-coded literal isn't a lot in performance terms
Labels: SQL, SQLServer2005
// posted by Jane @ 1:24 PM
Tuesday, July 08, 2008
INFORMATION_SCHEMA views
As I alluded to the other day, I'm gradually weaning myself off my dependency on (the fairly ugly) sys.objects, sys.columns etc as a way to query the meta data about my database. Instead I'm using the SQL-92 compliant INFORMATION_SCHEMA views.
Information schema views provide an internal, system table-independent view of the SQL Server metadata. Information schema views enable applications to work correctly although significant changes have been made to the underlying system tables. The information schema views included in SQL Server 2005 comply with the SQL-92 standard definition for the INFORMATION_SCHEMA.
So now, when I'm writing database upgrade scripts and attempting to write defensive SQL (which is my usual position these days, regardless of whether I think the script will be run more than once; let's just say I've learnt from making such assumptions), I usually wrap ALTER TABLE statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name = 'MyTable' AND Column_Name = 'MyNewColumn'),
CREATE TABLE statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE Table_Name = 'MyTable') etc
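Put together, a re-runnable column addition might look like the sketch below. MyTable and MyNewColumn are the placeholder names from above; the dbo schema and the INT NULL definition are assumptions for illustration only.

```sql
-- Only add the column if an earlier run of the script hasn't already done so
IF NOT EXISTS (SELECT *
               FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_SCHEMA = 'dbo'
                 AND TABLE_NAME = 'MyTable'
                 AND COLUMN_NAME = 'MyNewColumn')
BEGIN
    ALTER TABLE dbo.MyTable ADD MyNewColumn INT NULL
END
GO
```

Filtering on TABLE_SCHEMA as well avoids a false match when a table of the same name exists in another schema.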
The main area that I have to revert to the sys views for is indexes, and finding out what columns are included in which index, which is the uglier, but no-less-effective
SELECT
OBJECT_NAME (i.object_id) AS Tablename,
i.name AS IndexName,
c.name AS ColumnName,
CASE ic.is_descending_key
WHEN 1 THEN 'DESC'
ELSE 'ASC'
END as ColumnSort
FROM sys.indexes i
INNER JOIN sys.index_columns ic
ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
INNER JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
INNER JOIN sys.objects o
on c.object_id = o.object_id
WHERE o.type = 'U'
ORDER BY TableName, IndexName, ic.key_ordinal
MSDN has an interesting article, Querying the SQL Server System Catalog FAQ, which has examples for finding out (using the various object catalog views) many different areas of meta data across a SQL Server 2005 database and is worth using as a starting point.
Labels: SQL, SQLServer2005
// posted by Jane @ 2:20 PM
Monday, July 07, 2008
TSQL - Timings
A while ago I blogged about how to get the date element of a datetime column in TSQL. In that post I said
"I would probably have done it via a CONVERT/CAST operation, converting to a VARCHAR and then back to a DATETIME, but this is a much more efficient method."
but I didn't prove it at the time. I gave it some more thought and wanted to know what the differences were, so I wrote some timings code.
This script creates one table, Timings, with columns Code, Description, ActionTime and IsComplete. It has a combined primary key of Code and IsComplete. Code is a string of up to 10 characters that uniquely identifies the action being timed. IsComplete is used to differentiate between the start time and end time of the process being monitored.
The script also creates 3 stored procedures:
- up_RecordStart which takes 2 parameters - the unique code and optional description. This is used to record the start of the activity being monitored.
- up_RecordEnd which takes just 1 parameter - the code - should match the code used in up_RecordStart. This is used to record the end of the activity being monitored.
- up_GetTimings which again takes just 1 parameter - the code to return the timings from. It then returns the Code, Description and the length of time the action took in ms.
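For anyone who doesn't want to download the script, objects matching that description could be sketched as follows. This is a reconstruction from the description above, not the script itself: the column types and the Description length are assumptions.

```sql
-- One row per start (IsComplete = 0) and one per end (IsComplete = 1)
CREATE TABLE dbo.Timings
(
    Code        VARCHAR(10)  NOT NULL,
    Description VARCHAR(255) NULL,
    ActionTime  DATETIME     NOT NULL,
    IsComplete  BIT          NOT NULL,
    CONSTRAINT PK_Timings PRIMARY KEY (Code, IsComplete)
)
GO
CREATE PROCEDURE dbo.up_RecordStart
    @Code VARCHAR(10),
    @Description VARCHAR(255) = NULL
AS
    INSERT INTO dbo.Timings (Code, Description, ActionTime, IsComplete)
    VALUES (@Code, @Description, GETDATE(), 0)
GO
CREATE PROCEDURE dbo.up_RecordEnd
    @Code VARCHAR(10)
AS
    INSERT INTO dbo.Timings (Code, Description, ActionTime, IsComplete)
    VALUES (@Code, NULL, GETDATE(), 1)
GO
CREATE PROCEDURE dbo.up_GetTimings
    @Code VARCHAR(10)
AS
    -- Pair the start row with the end row and report the elapsed milliseconds
    SELECT s.Code, s.Description,
           DATEDIFF(ms, s.ActionTime, e.ActionTime) AS TimeInMS
    FROM dbo.Timings AS s
    INNER JOIN dbo.Timings AS e
        ON e.Code = s.Code AND e.IsComplete = 1
    WHERE s.Code = @Code AND s.IsComplete = 0
GO
```

The combined primary key means a code can only be started and ended once, which is why the test script deletes any old rows before it begins.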
I wrote some script to then use these objects to test the assertion I made that FLOOR and combinations of converting DATETIME to FLOAT etc would be more efficient than using either CAST or CONVERT to VARCHAR(12) and back again to a DATETIME.
------------------------------
-- Clean up before we start --
------------------------------
DELETE FROM Timings
WHERE Code IN ('FLR','CONVERT','CAST')
GO
---------------------
-- Try using Floor --
---------------------
EXEC up_RecordStart @Code='FLR', @Description='SELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))'
GO
DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000 -- try the next statement for 1000000 times - this should be enough to see some differences
BEGIN
SET @floorDate = CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))
SET @i = @i + 1
END
PRINT @floorDate
GO
EXEC up_RecordEnd @Code='FLR'
GO
-----------------------
-- Try using convert --
-----------------------
EXEC up_RecordStart @Code='CONVERT', @Description='SELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))'
GO
DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN
SET @floorDate = CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))
SET @i = @i + 1
END
PRINT @floorDate
GO
EXEC up_RecordEnd @Code='CONVERT'
GO
--------------------
-- Try using Cast --
--------------------
EXEC up_RecordStart @Code='CAST', @Description='SELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)'
GO
DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN
SET @floorDate = CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)
SET @i = @i + 1
END
PRINT @floorDate
GO
EXEC up_RecordEnd @Code='CAST'
GO
-------------------------
-- Now get the timings --
-------------------------
EXEC up_GetTimings 'FLR'
GO
EXEC up_GetTimings 'CONVERT'
GO
EXEC up_GetTimings 'CAST'
GO
This results in the following data being returned:
| Code | Description | TimeInMS |
| FLR | SELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE()))) | 1313 |
| CONVERT | SELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE())) | 3236 |
| CAST | SELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME) | 3203 |
which shows that the method using FLOOR is more efficient, and that there isn't a lot to choose between CONVERT and CAST.
Labels: SQL, SQLServer2005
// posted by Jane @ 7:11 PM
Friday, July 04, 2008
VBUG Brighton: Understanding LINQ with Mike Taulty
Last night Madgex hosted an excellent VBUG Brighton session by Mike Taulty on LINQ.
Despite the sunny, warm evening we managed to pack 25 or so Microsoft technologies developers into our boardroom and listened intently whilst Mike talked and demo'd his way around LINQ, explaining some of the newer C#/VB9 language features as he went. Whilst not being the exact same slide deck, after a rummage around Mike's site I found a post about a similar sounding talk complete with presentation in PDF format.
I remain slightly disappointed by the syntax for Linq to XML
var query = from c in data.DescendantsAndSelf("customer")
select (string)c.Attribute("id");
which, as Mike said, involves a bit too much hoping and praying (relying on no underlying changes, no strong typing etc).
However, I'm really encouraged by the idea of Linq to XSD, which seems like a much better idea, tying the query to a schema rather than a document.
Fabrice has some sample code based on Linq to XML and Linq to XSD as follows, which goes to show the improvement using the XSD version.
Here is a LINQ to XML query:
from item in purchaseOrder.Elements("Item")
select (double)item.Element("Price") * (int)item.Element("Quantity")
Here is the same query as above, but written using LINQ to XSD:
from item in purchaseOrder.Item
select item.Price * item.Quantity
which I think looks much more elegant and less clunky.
Labels: Brighton, event, vbug
// posted by Jane @ 7:13 PM
Monty Hall problem - TSQL
Following on from this morning's post about the Monty Hall problem, and proving it in PHP, I figured I'd prove it in TSQL as well.
So here is my SQL version.
To maintain consistency with my PHP version, I've made it output similar text, so the results are along the lines of:
Monty Hall Problem
This is a simple TSQL query to prove the Monty Hall problem [http://en.wikipedia.org/wiki/Monty_hall_problem]
------------------------
The Results are in:
------------------------
Out of 10000 games, the contestant was right to swap 66.94% of the time and wrong 33.06% of the time
The TSQL version is a bit more elegant with regards to working out which door to open for the contestant, as it is a simple statement of
SELECT TOP 1 @Opened = DoorNumber
FROM @Doors
WHERE DoorNumber NOT IN (@Prize, @Picked)
ORDER BY NEWID()
making the most of set theory to enable the exclusion of the @Prize door and the @Picked door, as opposed to the same thing in my PHP code
$remaining = array();
/* the gameshow host opens a door which has nothing behind it, so the gameshow host knows where the prize is
but can't choose to open the door the contestant has chosen, so remove both picked and prize from the options,
this leaves either one of two doors that can be opened, so pick one randomly */
for ($i=0; $i<3; $i ++)
{
switch($doors[$i])
{
case $prize:
break;
case $picked:
break;
default:
array_push($remaining,$doors[$i]);
}
}
$opened = $remaining[array_rand($remaining)];
which is all a bit more procedural and, at least to my mind, less elegant - but then again I like the syntax of SQL, which either makes me a freak or a masochist (according to at least one colleague)
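For context, the door-opening statement sits inside a simulation loop along these lines. This is a cut-down sketch rather than the downloadable script: everything other than @Doors, @Prize, @Picked and @Opened is named for illustration, and the output text is shortened.

```sql
SET NOCOUNT ON
DECLARE @Doors TABLE (DoorNumber INT PRIMARY KEY)
INSERT INTO @Doors (DoorNumber)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3

DECLARE @Games INT, @i INT, @SwapWins INT
DECLARE @Prize INT, @Picked INT, @Opened INT, @Swapped INT
SET @Games = 10000
SET @i = 0
SET @SwapWins = 0

WHILE @i < @Games
BEGIN
    -- Hide the prize and let the contestant pick, both at random
    SELECT TOP 1 @Prize = DoorNumber FROM @Doors ORDER BY NEWID()
    SELECT TOP 1 @Picked = DoorNumber FROM @Doors ORDER BY NEWID()
    -- The host opens a door that is neither the prize nor the pick
    SELECT TOP 1 @Opened = DoorNumber
    FROM @Doors
    WHERE DoorNumber NOT IN (@Prize, @Picked)
    ORDER BY NEWID()
    -- Swapping means taking the one door that is neither picked nor opened
    SELECT @Swapped = DoorNumber
    FROM @Doors
    WHERE DoorNumber NOT IN (@Picked, @Opened)
    IF @Swapped = @Prize SET @SwapWins = @SwapWins + 1
    SET @i = @i + 1
END

PRINT 'Out of ' + CAST(@Games AS VARCHAR(10))
    + ' games, the contestant was right to swap '
    + CAST(CAST(100.0 * @SwapWins / @Games AS DECIMAL(5,2)) AS VARCHAR(10))
    + '% of the time'
```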
Labels: Monty Hall problem, SQL
// posted by Jane @ 6:08 PM
Monty Hall problem - PHP
This evening, after returning from an excellent VBUG: Brighton (of which more in a later post), Richard was talking about the Monty Hall problem after listening to a discussion about it on the BBC podcast In Our Time.
For those who don't know the Monty Hall problem it is this:
- A gameshow set has 3 doors.
- Behind one of the doors is a prize.
- Behind the other 2 doors is nothing.
- The contestant chooses a door.
- The gameshow host, knowing where the prize is and which door the contestant has chosen, opens a door which he knows hasn't got the prize behind it and which the contestant hasn't chosen.
- The contestant is then offered the opportunity to trade their door for the one remaining door.
Should the contestant switch? The answer is yes: switching wins 2/3rds of the time. See here for the reasons.
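The 2/3rds figure can also be checked without any randomness by listing every combination of prize door and picked door. Because the host always opens an empty, unpicked door, switching wins exactly when the first pick was wrong. A small sketch in TSQL (the prize and picked aliases are just for illustration):

```sql
-- 3 prize positions x 3 picks = 9 equally likely combinations;
-- switching wins whenever the original pick missed the prize.
SELECT COUNT(*) AS Combinations,
       SUM(CASE WHEN prize.Door <> picked.Door THEN 1 ELSE 0 END) AS SwitchWins
FROM (SELECT 1 AS Door UNION ALL SELECT 2 UNION ALL SELECT 3) AS prize
CROSS JOIN (SELECT 1 AS Door UNION ALL SELECT 2 UNION ALL SELECT 3) AS picked
```

This returns 9 combinations, 6 of which are wins for switching.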
Richard and I both set about proving it in the tools we had available, I chose PHP, Richard chose Scala. And we both can successfully demonstrate that by always switching doors the contestant is more likely to win.
My PHP version is available for demo here and downloadable via a zip file (right click, save as) here.
Labels: Monty Hall problem, php
// posted by Jane @ 12:25 AM