Jane's Technical Stuff

Thursday, January 24, 2008

Finding some text within a SQL Server 2005 Database


I was asked what sounded like a fairly simple question today - is there a way to do a full text search of a database, without having to set up full text searching. As this was to answer an immediate query - where was some text that was being displayed coming from - and not within the context of a query to form part of an application I thought about it a while, and decided that some dynamic SQL was what was needed. I started off with the fact that I wanted to pull together a statement like:
SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%SearchTerm%'
and then run that across all columns which are text, ntext, varchar, nvarchar, char or nchar.

Getting the list of table and column names is reasonably straightforward
SELECT TABLE_NAME,COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('varchar', 'nvarchar', 'text', 'ntext', 'char', 'nchar')


I wanted to loop through all of these and get the counts, so I used a local table variable to hold the results
DECLARE @tabSearchableColumns TABLE (TableName VARCHAR(100), ColumnName VARCHAR(100), Matches int) and simply inserted the results of the above query into it (along with a NULL for the Matches column). I then set up a loop, looping based on the COUNT(*) FROM @tabSearchableColumns WHERE Matches IS NULL. Within that loop I take the top item from @tabSearchableColumns, and use the TableName and ColumnName to populate the SQL SELECT above.

Now came the interesting part, getting the variable @intDataCount from the following statement:
EXEC (SELECT @intDataCount = COUNT(*) FROM TableName WHERE ColumnName LIKE '%SearchTerm%)
as the variable @intDataCount has the scope of the execute statement, and doesn't get propogated to any outside variable with the same name.

To get around this involves the use of sp_executesql specifying the parameter that we expect to get out, i.e. EXEC sp_executesql @strSQL, N'@intDataCount INT OUTPUT', @intDataCount OUTPUT

An important point here is that @strSQL, or the string being sent in as the SQL string MUST be cast be an NVARCHAR, either by DECLARE @strSQL NVARCHAR(100) or N'SELECT ColumnName FROM TableName'. Otherwise, don't be surprised to encounter the following error:
Msg 214, Level 16, State 2, Procedure sp_executesql, Line 1
Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'.


Finally, having got the count out into my @intDataCount variable, I can update the appropriate row in tabSearchableColumns, redo the COUNT(*) FROM @tabSearchableColumns WHERE Matches IS NULL and continue looping. The final stage is just to select the table and column data which has Matches > 0.

The script for FindTableColumnDataMatches is available for download, and as with sp_generateinsert is tested against SQL Server 2005, leave me comments if you've found this useful, or have suggestions for improvements.

Labels: , ,

// posted by Jane @ 7:59 PM   save to del.icio.us

Comments:

Wednesday, January 23, 2008

NTS: Use of grep to count instances of a certain phrase in a file


I got an email from Geni (more here) today asking how many people I had in my GEDCOM file to import. A GEDCOM file is a formatted text file with a format similar to the following (taken from the wikipedia example):
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@


To get the number of people I would want to import, I wanted to count the number of instances of "1 NAME" entries. I knew that Richard would be able to provide a nice easy answer that wouldn't involve lots of scripting. And of course he pointed me in the direction of grep and wc.

The finished product is grep '1 NAME' /Users/jane/Documents/file.GED | wc -l which searches for '1 NAME' amongst the file /Users/jane/Documents/file.GED and then pipes the output into the wc command specifying -l to get the line count.

Labels: ,

// posted by Jane @ 7:16 PM   save to del.icio.us

Comments:

Update to spu_generateinsert - use of UNION SELECT


Jon Green from 4R Systems Inc left a comment earlier proposing a suggested improvement to spu_generateinsert.

He suggested that it would be easier to scan the produced SQL if it made use of SELECT and UNION SELECT rather than producing a whole host of INSERT statements. This changes the output from:
INSERT INTO [TableName] ([ColumnA], [ColumnB]) VALUES ('A','B')
INSERT INTO [TableName] ([ColumnA], [ColumnB]) VALUES ('C','D')

and replaces it with
INSERT INTO [TableName] ([ColumnA], [ColumnB])
SELECT 'A','B'
UNION SELECT 'C','D'


This is a great suggestion and I have implemented it by the use of an additional parameter @producesingleinsert. By default this is set to 0 which produces the separate INSERT statements as it has done all along. If it is specified as 1, however, it now produces a single statement making use of SELECT and UNION SELECT. I have implemented this as an optional parameter so as not to break any reliance on this functionality.

I also made another change today which is to exclude calculated columns. This has involved joining INFORMATION_SCHEMA.COLUMNS to syscolumns via sysobjects to make use of syscolumns.IsComputed.

The final change was the addition of another optional parameter @ debug. Again this defaults to 0, but if set to 1 prints out the SELECT clause which will be used to obtain the data. This can help to debug problems with the procedure, both in terms of the internal logic, but also with regards to checking that the criteria specified is being handled correctly.

As ever this is work in progress, so please continue sending in your suggestions and I'll attempt to implement them.

Download the script from here.

Labels: , ,

// posted by Jane @ 6:52 PM   save to del.icio.us

Comments:

Tuesday, January 22, 2008

Double counting document downloads


One of the projects I'm working on counts document downloads. Yesterday I was fixing a bug relating to it double counting, but only in IE. The documents can only be pdf or docs. We have .doc, .docx and .pdf files being handled by a DocumentHandler class as set up in web.config
<httpHandlers>
<add verb="*" path="*.pdf" type="DocumentHandler"/>
<add verb="*" path="*.doc" type="DocumentHandler"/>
<add verb="*" path="*.docx" type="DocumentHandler"/>
</httpHandlers>


DocumentHandler has a ProcessRequest method. Within this method we increment our count.

When opening a .pdf by either Firefox or IE this method gets called once.
When opening a .doc or .docx with Firefox this method gets called once.
When opening a .doc or .docx with IE this method gets called twice. Firstly when the browser window is created, and then when the Open dialog is displayed.

I spent some time looking into this, especially concentrating on the differences within the context.Request object. The second time this method is hit, the UserAgent is Microsoft Office Existence Discovery. The other, more usable, difference is that Context.Request.UrlReferrer is null. This is the condition I have chosen to use to prevent the double counting. If someone stores the link to the document as a bookmark, then this won't count, but within the context of this application this is unlikely.

Labels:

// posted by Jane @ 10:33 AM   save to del.icio.us

Comments:
IE/MSOffice's behaviour here is so unfriendly. Why do you need to check the existence of the document? You just requested it! It's a good job we're not charging per-download ;p
 

Tuesday, January 15, 2008

iPod Touch Update - Continued


Ok, making progress after my previous post. I've updated my iPod Touch to be 1.1.3. I've followed the link for the January Software Upgrade, and attempted to buy it. Apparently I needed to update my version of iTunes to 7.6, so I've upgraded that. Now I'm attempting to buy the January Upgrade, but get this error displayed.

Sigh.

Labels: , ,

// posted by Jane @ 10:20 PM   save to del.icio.us

Comments:

iPod Touch update


After the keynote at MacWorld today I thought I'd see if there was an update available for my iPod Touch yet.

No idea, just an error telling me that the update server could not be contacted. Hmm, either a genuine error or too many people trying the same thing?

I'm also planning on paying the £12.99 ($20) for the January Software Upgrade to make the most of the Mail application but I want to get the upgrade to 1.1.3 first.

I've been using my iPod Touch for 3 or so weeks now and really like it. The interface is great, the screen is lovely (shame I managed to scratch it (don't believe any of the "it's unscratchable" rumours) - at least I've got some nice screen protection now and a great rubberised case). One of the applications I'm really missing is an interface to Last fm so I'm hoping that when the developer kit gets released, an authorised application will be written.

Labels: , ,

// posted by Jane @ 9:04 PM   save to del.icio.us

Comments:

More on spu_generateinsert


Dave posted a comment on the original post the other day seeking for help with resolving an error he was having running the script. The problem transpired to be with dealing with columns of data type image. On investigation, I discovered that there were quite a few column types which wouldn't work as they hadn't been catered for.

I've updated the script to work with all the data types I can get it to work with. For images, binary and varbinary columns, it attempts to insert NULL and puts a warning at the bottom of the generated script saying:
-- ** WARNING: There is an image column in your table which has not been migrated - this has been replaced with NULL. You will need to do this by hand. Images are not supported by this script at this time. or
-- ** WARNING: There is a binary or varbinary column in your table which has not been migrated - this has been replaced with NULL. You will need to do this by hand. Binary and VarBinary are not supported by this script at this time. as appropriate
This isn't great, but is at least informative and explains what has happened.

I will try and think about a way of scripting out data from columns of type image, binary and varbinary, but at least, in the meantime the script no longer errors (at least to my knowledge - if you know otherwise, then please let me know and I'll make more fixes).

Download updated spu_generateinsert.

Labels: , , ,

// posted by Jane @ 6:32 PM   save to del.icio.us

Comments:
What's the limitation on image/binary columns? Is it how to represent them in a plain-text SQL command? If so, how about representing them with e.g. base64? Presumably, it's not that simple though...!
 

Sunday, January 13, 2008

Nokia SportsTracker, 6110 and GPS anomalies


I downloaded a new version of Nokia Sports Tracker the other day and discovered that it now has an online community aspect as well. This is a feature that we originally used on SportsDo, our previous gps tracking tool. The main benefit over SportsDo is that Sports Tracker allows you to upload via your mobile phone OR by importing a GPX file directly into the application. Sports Tracker also still allows exports directly from the application in CSV, XML, GPX or Google Earth (KML) formats to either phone or SD card- a real advantage.

On today's cycle ride, it tells me I got a speed of 42.1mph (which does not match what my cycle computer says - 21.7mph). I've started hunting for tools to spot and remove anomalies from GPX or KML files, but in the meantime have come up with a workable alternative.

  1. Export the file in CSV format
  2. Also export the format in GPX format
  3. Open the CSV file and sort by the Speed column
  4. Look for any readings that look extraordinary - in my case I had lots around the 20, 21, 22 mph mark, and then 3 at 37, 38 and 41 mph - obviously incorrect. Make a note of those numbers, they'll be used in a minute
  5. Open up the GPX file in an editor
  6. Search the file for "Speed " and use the numbers recorded above
  7. For every anomaly you spot, remove the whole trkpt node (see example below)
  8. Save the GPX file
  9. Head over to Sports Tracker and upload the new file
  10. Check that the file looks about right


After working through these stages, I now have an uploaded trail that looks more accurate.

Sample trkpt
<trkpt lat="50.824980" lon="-0.148063">
<ele>91.0</ele>
<speed>0.07</speed>
<course>193.8</course>
<desc>Speed 0.2 mph Distance 0.00 mi</desc>
<time>2008-01-13T13:50:34.95</time>
<name>2</name>
</trkpt>

Labels: , ,

// posted by Jane @ 2:55 PM   save to del.icio.us

Comments:
Thanks for your posts, Jane. Very helpful.
 

Saturday, January 12, 2008

iusethis


I belong to a blogroll of Geek Girl blogs and subscribe to the RSS feed. One post that caught my eye was Véro's post about her Blog Topic Challenge. I left her a comment suggesting "How about a post about your favourite tools for helping you to get stuff done? Or your favourite gadgets? Or your regular RSS reads?".

She has responded with Favourite tools for getting things done sharing her mac apps via the site iusethis. I hadn't spotted this site before, so I've been and set up a profile so that everyone can see what tools I use as well.

Labels:

// posted by Jane @ 10:48 AM   save to del.icio.us

Comments:
iusethis is great for mac apps -- I've used it quite a bit to find good Mac apps. I hear there's a windows version in the offing, but with so many of my most-used apps being online apps now, a webapps version would be very welcome.

My profile is at http://osx.iusethis.com/user/bruceboughton :)
 

More on Route 66 - buying the North West USA this time


Yesterday I bought the North West USA map from Route 66. It took me a while to play border identification to determine which states were going to be covered but I was pretty confident that it had what I needed. For some reason, all you get when you look at a map within Route 66 is a map showing the shape of the areas to be covered, and the longitude and latitude of the top left hand corner and bottom right hand corner.



So, I went ahead and bought the map, and can confirm that for anyone else trying to find out what states are covered, that these are the ones:
  • Alaska
  • California
  • Colorado
  • Hawaii
  • Idaho
  • Iowa
  • Minnesota
  • Montana
  • Nebraska
  • Nevada
  • North Dakota
  • Oregon
  • South Dakota
  • Utah
  • Washington
  • Wyoming

Labels: ,

// posted by Jane @ 10:04 AM   save to del.icio.us

Comments:

Monday, January 07, 2008

Another update - Generate Insert statement from table for SQL Server


Bruce used my spu_generateinsert script today and identified a further issue, it didn't handle GUIDs properly. I fixed that and did a spot of further testing on one of my test databases and discovered a collation issue, so I fixed that as well whilst I was at it. The result is that it runs slower now, but specifies COLLATE database_default on each column to ensure that there are no "Cannot resolve collation conflict for column 1 in SELECT statement" errors raised.

As before, download it here and feel free to feed back any issues/improvements etc.

Labels: , ,

// posted by Jane @ 6:35 PM   save to del.icio.us

Comments:

Thursday, January 03, 2008

Update - Generate Insert statement from table for SQL Server


I had a bit of spare time today, so I've updated spu_generateInsert.sql to fix a couple of annoyances:
  • it now handles text and ntext appropriately - wrapping the data in single quotes, and escaping any single quotes as appropriate - it also displays a warning indicating that text data will be converted into varchar(8000)
  • it now handles NULL values appropriately, and so the step in my previous post on this procedure about "replace of 'NULL' with NULL" can now be safely ignored.
I've put comments into the stored procedure to indicate the changes.

I've tested this on SQL Server 2005 but I'd recommend testing it and checking it yourself before relying on it.

Download it here

Labels: , , ,

// posted by Jane @ 6:32 PM   save to del.icio.us

Comments:
Great stuff!
 
This post was linked from Reflective Perspective - Chris Alcock - Morning Brew #4
 
Thank u so much Jane!
 

Brighton Bloggers   This page is powered by Blogger. Isn't yours?   rss Sussex Digital - focusing on the Sussex digital community