
Archive for the 'Computers' Category

Sendmail Wrapper

Friday, December 16th, 2005

We had some spam problems last week, one of them caused by a form whose input wasn’t properly escaped. While that problem was fixed, the real problem was that it was hard to figure out which script had the issue.

To solve this, I wrote a sendmail wrapper for use by PHP (though really it could be used by anything) that logs the message along with the date, a message id (also inserted in the headers) and the current directory (which gives the location of the original script).
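To make the idea concrete, here is a rough sketch of such a wrapper in Python. It is not the script linked below (which is PHP), just an illustration of the same steps. The header name, the log path and the location of the real sendmail binary are all assumptions you would need to adjust.

```python
import os
import subprocess
import sys
import time
import uuid

REAL_SENDMAIL = "/usr/sbin/sendmail"       # assumed path to the real binary
LOG_FILE = "/var/log/sendmail_logged.log"  # assumed log location
ID_HEADER = "X-Logged-Message-Id"          # hypothetical header name


def tag_message(raw: str, msg_id: str) -> str:
    """Put the tracking header in front of the existing headers."""
    return f"{ID_HEADER}: {msg_id}\n{raw}"


def log_entry(msg_id: str, cwd: str, message: str, now: float) -> str:
    """Build one log record: date, message id, calling directory, message."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"[{stamp}] id={msg_id} cwd={cwd}\n{message}\n"


def main() -> int:
    """Read a message on stdin, log it, then hand it to the real sendmail."""
    raw = sys.stdin.read()
    msg_id = uuid.uuid4().hex
    tagged = tag_message(raw, msg_id)
    with open(LOG_FILE, "a") as log:
        log.write(log_entry(msg_id, os.getcwd(), tagged, time.time()))
    # Pass the caller's arguments straight through to the real binary.
    result = subprocess.run([REAL_SENDMAIL, *sys.argv[1:]], input=tagged, text=True)
    return result.returncode
```

A real script would finish with `sys.exit(main())`. The working directory is the useful part: PHP runs the mail command from the directory of the script that called it, so the log points straight at the culprit.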

It also extracts the domain name from the current directory, but this is server-specific, so you’ll need to change the pattern to match your file system.
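As an illustration of what that pattern might look like, here is a small Python sketch. The directory layout (`/home/<user>/domains/<domain>/...`) is purely an assumption for the example; your server almost certainly uses something different.

```python
import re
from typing import Optional

# Assumed layout: /home/<user>/domains/<domain>/public_html/...
# Change this pattern to match how your own server lays out its sites.
DOMAIN_PATTERN = re.compile(r"^/home/[^/]+/domains/([^/]+)(?:/|$)")


def domain_from_cwd(cwd: str) -> Optional[str]:
    """Return the domain encoded in the directory, or None if it does not match."""
    match = DOMAIN_PATTERN.match(cwd)
    return match.group(1) if match else None
```

Returning `None` for directories that don't match lets the wrapper still log the message, just without a domain attached.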

Eventually I’d like to include support to check for a maximum number of recipients, and maybe some other heuristics to check for spam.

You can get the script at:
http://gregmaclellan.com/php/sendmail.phps 

You should save this file as /usr/local/sbin/sendmail_logged.

The reporting script is at:
http://gregmaclellan.com/php/mailreporting.sh

There are instructions in this file for how to add it to cron.

Let me know if you have any comments or suggestions, make any improvements, or find any bugs.

 

Search: Time vs Relevancy

Monday, December 5th, 2005

Something that seems to be missing from searches is time. Search engines base their results on relevancy, which makes finding newer methods of doing something difficult.

For example, I will search for how to do something in Linux, like configuring a RAID array. There is a ton of information on this, but the most relevant hits you get are about configuring raidtools. Mdadm has replaced raidtools as the tool of choice, but since raidtools has been around so long, and there are so many old pages that link to it, it scores the highest. I’m sure there are millions of other examples of this on other topics too.

Google has an advanced search where you can specify pages modified in the last x months, but it doesn’t really help much. One of the pages returned when I limit the search to the last 3 months has a revision history typed out at the top of it, and it shows the last update in 2003. MSN has a "Search builder" function, where (among other options) you can specify how important it is for a page to be recently updated, popular, and a relevant match. This still doesn’t bring up really relevant results. Yahoo is the only one of the three that actually does return an mdadm-related result as #1 when you search within the last 3 months. (I should point out that both Google and Yahoo return this same page as #5 and #6, respectively, but my point here is that someone who knows nothing about it is probably going to pick #1 or #2, and implement RAID with the older raidtools method.)

MSN’s search-tuning functions

All three have a news search engine that returns date-based results for recent news items, but this is pretty limited in that it’s only searching news sites. Linux software RAID developments aren’t exactly breaking news on CNN, so the news search isn’t exactly the place to find this stuff.

I think one problem with the date-based results as they are now is the way they are likely determining the date of the page. If they are using the Last-Modified header (part of the HTTP specification), then that would explain a lot of the problems. It’s quite possible for the Last-Modified header to change because content is dynamically created, because content was moved to another server with FTP or copied without preserving the date/time, or even because of a misconfigured webserver. What they should be doing is comparing the contents of the page to the contents the last time they indexed it. It wouldn’t be totally accurate (depending on how often they index the page), but it would at least give a real representation of when the contents were changed. They would have to ignore dynamic things like ads and current date displays (via pattern matching), but it wouldn’t be that complicated.
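Here is a minimal Python sketch of that comparison, just to show the shape of the idea. I have no idea how any search engine actually does this. The ad markers and date/time patterns are made-up examples of "dynamic things" to ignore, and `IndexRecord` is a hypothetical stand-in for whatever the index stores per page.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for content that changes on every request.
DYNAMIC_PATTERNS = [
    re.compile(r"<!--\s*ad-start\s*-->.*?<!--\s*ad-end\s*-->", re.DOTALL),  # assumed ad markers
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),         # dates such as 2005-12-05
    re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b"),  # clock times such as 14:05
]


def fingerprint(html: str) -> str:
    """Hash the page after stripping dynamic parts and collapsing whitespace."""
    for pattern in DYNAMIC_PATTERNS:
        html = pattern.sub("", html)
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class IndexRecord:
    digest: str
    last_changed: str  # date of the crawl that first saw this content


def update_record(previous: Optional[IndexRecord], html: str, crawl_date: str) -> IndexRecord:
    """Keep the old date if the real content is unchanged, else record the new one."""
    digest = fingerprint(html)
    if previous is not None and previous.digest == digest:
        return previous
    return IndexRecord(digest, crawl_date)
```

The date this produces is only as precise as the crawl interval, but unlike Last-Modified it can't be bumped by a file copy or a page that prints today's date.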

Hopefully it’s just a matter of time…

On the topic of search engines, I came across a few new Google features while researching for this entry that I didn’t know about:

SOAP: Gives it a REST

Thursday, November 17th, 2005

I’ve noticed in the last little while that there seems to be a trend happening with web services: people think SOAP is too complex. I keep coming across articles and comments talking about how web services are just over-engineered. I have to say, I totally agree.

Here are some excerpts from a c|net article:

A debate is raging over whether the number of specifications based on Extensible Markup Language (XML), defining everything from how to add security to where to send data, has mushroomed out of control.

Tim Bray, co-inventor of XML and director of Web technologies at Sun Microsystems, said recently that Web services standards have become “bloated, opaque and insanely complex.”

This isn’t something that’s new. An onlamp article from 2003 talks about how people use REST over SOAP:

While SOAP gets all the press, there are signs REST is the Web service that people actually use. Since Amazon.com has both SOAP and REST APIs, they’re a great way to measure usage trends. Sure enough, at OSCon, Jeff Barr, Amazon.com’s Web Services Evangelist, revealed that Amazon handles more REST than SOAP requests.

Personally, I’ve always gone with so-called REST interfaces if I have a choice. I’ve in fact been using REST for many years, without realizing it was called REST (the term, which stands for Representational State Transfer, was coined by Roy Fielding in his doctoral dissertation in 2000).

Put simply, SOAP just requires so much setup and overhead to do what should be a simple task. As a programmer, I like to actually make working code, and I hate writing tons of ‘helper’ code that essentially doesn’t do anything. That’s what I feel like I’m doing when working with SOAP and writing WSDL schemas and all the extra junk. There are APIs to make things easier, but in the end it’s still an over-engineered protocol.

REST keeps it simple, with really no formal definition. It’s more a method than anything else. Go to a URL, get a bunch of data back. Talk about simple.

Advocates of REST argue that you should only ever use normal HTTP GET requests for retrieving data (as opposed to POSTing a complex XML query like SOAP does). This makes sense, as one of the ideas behind the web to begin with is that any piece of information can be obtained with a URI. This makes REST ridiculously simple, and for this reason some people hate it and others love it. It definitely makes things easy to debug, as you can test responses in your browser.
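To show how little is involved, here is a small Python sketch that builds such a URI. The host, path and `symbol` parameter are hypothetical, chosen to match the price lookup example further down.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, matching the price lookup example in this post.
BASE_URL = "http://www.ecommercesite.com/REST/getprice"


def price_url(symbol: str) -> str:
    """Build the GET URI for a product; urlencode escapes unsafe characters."""
    return f"{BASE_URL}?{urlencode({'symbol': symbol})}"
```

The whole query is just a string, so you can print it, paste it into a browser, and see exactly what the service returns.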

Most REST services return a response in XML, but they don’t have to. If all you’re trying to get is one piece of information, it’s just as easy to return that information in the body, with no tags at all. The querying application doesn’t even have to parse anything. Obviously this has its downsides (like not being able to easily return error conditions or multiple values), which is probably the reason responses are usually XML. Of course it helps that there are XML parsers available for virtually every language, including JavaScript. In fact, REST is basically what drives most AJAX applications.

Of course, SOAP is XML and therefore human-readable as well, but let’s look at the differences with a small example.
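The trade-off can be sketched in a few lines of Python. This assumes a service that returns either a bare number or a single XML element; the `<error>` element is a hypothetical convention for this example, not part of any standard.

```python
import xml.etree.ElementTree as ET


def parse_plain(body: str) -> float:
    """Bare-body response: nothing to parse, but no room to signal an error."""
    return float(body.strip())


def parse_xml(body: str) -> float:
    """XML response: the root tag says whether we got a value or an error."""
    root = ET.fromstring(body)
    if root.tag == "error":  # hypothetical error element
        raise ValueError(root.text or "unknown error")
    return float((root.text or "").strip())
```

With the bare body, an error message from the server would simply fail to convert to a number. With XML, the client can tell a failed lookup apart from a malformed response.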

SOAP example

This is an example SOAP request for getting the price of a product.

Request:

POST /SOAP HTTP/1.1
Host: www.ecommercesite.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetMarketPrice xmlns:m="Some-URI">
      <symbol>PART304285</symbol>
    </m:GetMarketPrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Response:

HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetMarketPriceResponse xmlns:m="Some-URI">
      <Price>50.25</Price>
    </m:GetMarketPriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
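For a sense of what the client side looks like without a SOAP toolkit, here is a Python sketch that builds that request body and digs the price out of the response using only the standard library. `Some-URI` is the placeholder namespace from the example above, and the sketch leaves out the `encodingStyle` attribute for brevity.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "Some-URI"  # placeholder namespace taken from the example
NS = {"soap": SOAP_ENV, "m": SERVICE_NS}

# Ask ElementTree to serialize with the same prefixes as the example.
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", SERVICE_NS)


def build_request(symbol: str) -> str:
    """Wrap the single parameter in an Envelope, a Body and a method element."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetMarketPrice")
    ET.SubElement(call, "symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text: str) -> float:
    """Walk back down through the same layers to reach the one value we want."""
    root = ET.fromstring(xml_text)
    price = root.find("soap:Body/m:GetMarketPriceResponse/Price", NS)
    if price is None or price.text is None:
        raise ValueError("no Price element in SOAP response")
    return float(price.text)
```

Two namespaces and three levels of nesting, all to send one string and get one number back.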

REST example

Just a simple GET query:


GET /REST/getprice?symbol=PART304285 HTTP/1.1
Host: www.ecommercesite.com


and the response:


HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<price>50.25</price>

Considering they both do the same thing, which one looks simpler?

Of course, for a REST service to be useful, just like any other API or tool, it needs to be well documented. This means documenting the request parameters, all the things it can do, as well as all the output formats. SOAP has WSDL that I guess provides this information (to a point, and assuming you know enough to make sense of it all), but I don’t think that feature alone is worth all the other baggage SOAP carries.

To be honest, I also only have limited experience using SOAP (for the reasons that I’ve outlined in this entire post), so I’m quite willing to hear arguments to convince me why SOAP could be more beneficial than REST. At this point however, I can only conclude that SOAP is just over-engineered. Why bother coding all that extra junk when you can just REST? :)
