Friday, July 1, 2011

Unintended Stateful Junction Changes

Hey Dale, I created a Stateful Junction by commandline on our WebSEAL instances, but the UUID on the junction for one instance is not what I set it to.  What's up? -- Joe

a) Your WebSEAL has been hacked;
b) The WOPR has figured out your launch code;
c) IBM fixed the code as it was being shipped;
d) None of the above; or
e) All of the above.

As much fun as e) would be, I'm betting on c).  I've seen an application make it through Dev and QA and into Pstage before someone from another project commented that the login page wasn't formatted properly.  Really?  Does anyone test without automation (which would not see the missing 'pretty' elements)?

Late changes tend to be documented in the release notes or left out.  This time it was left out. 

There is an undocumented feature of TAMeb stateful junctions.  If a stateful junction is amended through the Web Portal Manager (WPM), the UUID will change - without warning or an advisory notice - and no longer be synchronized with the matching junctions on other WebSEAL instances. 

All junctions have a UUID (example: b06fb684-971f-11e0-a7e8-c0003c008803).  Stateful junctions are created so the UUID is synchronized across the duplicate junctions on several WebSEAL instances.  With stateful junctions, a user will be sent to the same backend server regardless of which WebSEAL instance the user is routed through.

A workaround is to paste the correct UUID into the ‘Stateful UUID’ field and click the ‘Apply’ button. 

The ‘Stateful UUID’ field is also not documented.  Its function is further blurred, as if being undocumented were not enough, by placing the field immediately below the Server UUID field and making it a third shorter.  This really looks like a rushed fix to a last-minute discovery of a problem.
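A quick way to spot a junction whose UUID has drifted is to compare the value recorded for each instance against the one you intended to set. The sketch below is only an illustration: the function name and the instance names are hypothetical, the UUIDs are assumed to have been copied by hand from each instance, and the second UUID is made up to show a mismatch.

```python
def find_uuid_mismatches(expected_uuid, uuids_by_instance):
    """Return the instances whose junction UUID differs from the expected one."""
    # Normalize case and whitespace so cosmetic differences are not reported.
    expected = expected_uuid.strip().lower()
    return {
        instance: uuid
        for instance, uuid in uuids_by_instance.items()
        if uuid.strip().lower() != expected
    }


# Hypothetical instance names. The first UUID is the example from this post;
# the last one is invented to demonstrate a junction that has drifted.
observed = {
    "default-webseald-web1": "b06fb684-971f-11e0-a7e8-c0003c008803",
    "default-webseald-web2": "B06FB684-971F-11E0-A7E8-C0003C008803",
    "default-webseald-web3": "11111111-2222-3333-4444-555555555555",
}
mismatches = find_uuid_mismatches("b06fb684-971f-11e0-a7e8-c0003c008803", observed)
print(mismatches)
```

Any instance reported here is a candidate for the 'Stateful UUID' workaround described above.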

Sunday, March 13, 2011

Validation of WebSEAL instance restarts

Never, never, never rely on the 'pdweb status' command to confirm that a WebSEAL instance has restarted correctly.  A status of 'Yes' does not mean the instance is functional.  Even a successful authentication and access test can be followed 15 min. later by customer calls to your help desk.
After having had problems and poring over logs, I have found the following procedure to be reliable.
After a WebSEAL instance restart, validate performance of the instance as follows:
1.  On the WebSEAL server and logged in as or sudo'd to ivmgr or an equivalent account, run ‘pdweb status’ and confirm that the WebSEAL instances are started.

2.  Tail the instance logs to confirm that traffic is flowing through the WebSEAL instance.  A sample command is

tail -f /var/ibm/tivoli/common/DPW/logs/<instance>/log/combined.log

Check the log for each instance that was restarted.
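If you want something scriptable alongside the tail, one rough substitute (not the same thing as watching the traffic) is to check that the log file has been written to recently. This is a minimal sketch under that assumption; the function name and the 60-second threshold are my own choices, and a temporary file stands in for combined.log.

```python
import os
import tempfile
import time


def log_is_active(path, max_age_seconds=60.0, now=None):
    """Return True if `path` was modified within the last `max_age_seconds`."""
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) <= max_age_seconds


# Demonstration with a temporary file standing in for combined.log.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"sample log line\n")
    demo_path = handle.name

fresh = log_is_active(demo_path)                         # file was just written
stale = log_is_active(demo_path, now=time.time() + 900)  # pretend 15 minutes passed
os.remove(demo_path)
print(fresh, stale)
```

A recent modification time only shows that something is being logged; it does not replace reading the entries.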

3.  Confirm that the instance has registered properly with SMS.  (This is the 15 min. part mentioned above.  IBM was not able to explain the delay.)  Review the following log on the SMS servers for events indicating  that a WebSEAL instance is or is not properly registered with SMS or that there is an ObjectGrid problem.

SMS server sample location:  /var/logs/<path>/SMSServer<1 or 2>/SystemOut.log

Sample registration with SMS:

001489cb DSess         I ClientStore addClientReplicaSet() CTGSM0313I   The previous instance of client, <instance>-webseald-<servername>, has been replaced. The previous instance ID was 2d009f74-3435-11e0-8c64-001125c5fec9, and the new instance ID is 844c4f58-2378-11e0-ac3e-001125c5fec9.

Sample error that I have seen start showing up 12 minutes after the restart:

0014922b DSess         E ClientStore storeNewClient() CTGSM0301E   The new instance, 2d009f74-3435-11e0-8c64-001125c5fec9, of the client, <instance>-webseald-<servername>, could not be stored.

Note that the new instance ID is the same as the old instance ID in the first event.
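The pattern in the two sample events can be checked mechanically. The sketch below assumes the log lines look like the samples above; the regular expressions, the function name, and the client name used in the demonstration are mine, not anything IBM provides.

```python
import re

UUID = r"[0-9a-f-]{36}"
REPLACED = re.compile(
    r"CTGSM0313I.*?previous instance of client, (?P<client>.+?), has been replaced\. "
    r"The previous instance ID was (?P<old>" + UUID + r"), "
    r"and the new instance ID is (?P<new>" + UUID + r")"
)
FAILED = re.compile(
    r"CTGSM0301E.*?The new instance, (?P<id>" + UUID + r"), "
    r"of the client, (?P<client>.+?), could not be stored"
)


def check_sms_registration(lines):
    """Return (client, instance_id, reuses_previous_id) for each CTGSM0301E event."""
    previous_ids = {}  # client -> instance IDs that SMS reported as replaced
    problems = []
    for line in lines:
        match = REPLACED.search(line)
        if match:
            previous_ids.setdefault(match["client"], set()).add(match["old"])
            continue
        match = FAILED.search(line)
        if match:
            reused = match["id"] in previous_ids.get(match["client"], set())
            problems.append((match["client"], match["id"], reused))
    return problems


# The two sample events from this post, with a hypothetical client name filled in.
sample = [
    "001489cb DSess         I ClientStore addClientReplicaSet() CTGSM0313I   "
    "The previous instance of client, default-webseald-web1, has been replaced. "
    "The previous instance ID was 2d009f74-3435-11e0-8c64-001125c5fec9, "
    "and the new instance ID is 844c4f58-2378-11e0-ac3e-001125c5fec9.",
    "0014922b DSess         E ClientStore storeNewClient() CTGSM0301E   "
    "The new instance, 2d009f74-3435-11e0-8c64-001125c5fec9, of the client, "
    "default-webseald-web1, could not be stored.",
]
problems = check_sms_registration(sample)
print(problems)
```

The third element of each result is True when the instance that could not be stored carries an ID that SMS had already reported as replaced, which is the situation noted above.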

If ObjectGrid is having a problem, ObjectGrid errors will be in the log and the WebSEAL instances will not start.

4.  Confirm that the WebSEAL instance agrees that it is registered with SMS by reviewing the instance msg log on the WebSEAL server
/var/ibm/tivoli/common/DPW/logs/msg__webseald-<instance>.log

Sample error:
0x38A0A135 webseald ERROR wds client AMWSMSSOAPCall.cpp 104 0x00019697
DPWDS0309E   An error was returned from the SOAP server in cluster dsess when calling the getSession interface: CTGSI0302W   The client is not registered with the session management server. (pd / wsi) (code: 0x38c5812e).

Note that IBM has previously confirmed that the following is an expected and harmless error:
0x38CF0131 webseald WARNING wwa server WsTcpListener.cpp 397 0x00004647
DPWWA0305E   The 'pd_tcp_write' routine failed for 'WsTcpConnector::write', errno = -1
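When the msg log is long, it can help to count message IDs and drop the one IBM has called harmless. This is a sketch only: the ID pattern (five letters, four digits, a severity letter) is inferred from the samples in this post, and the function name is hypothetical.

```python
import re
from collections import Counter

# Message IDs in the samples look like DPWDS0309E or CTGSI0302W.
MESSAGE_ID = re.compile(r"\b[A-Z]{5}\d{4}[EWI]\b")

# IBM confirmed this one as expected and harmless (see above).
KNOWN_HARMLESS = {"DPWWA0305E"}


def count_messages(lines):
    """Count message IDs in the log lines, skipping the known harmless ones."""
    counts = Counter()
    for line in lines:
        for message_id in MESSAGE_ID.findall(line):
            if message_id not in KNOWN_HARMLESS:
                counts[message_id] += 1
    return counts


sample = [
    "0x38A0A135 webseald ERROR wds client AMWSMSSOAPCall.cpp 104 0x00019697",
    "DPWDS0309E   An error was returned from the SOAP server in cluster dsess "
    "when calling the getSession interface: CTGSI0302W   The client is not "
    "registered with the session management server. (pd / wsi) (code: 0x38c5812e).",
    "0x38CF0131 webseald WARNING wwa server WsTcpListener.cpp 397 0x00004647",
    "DPWWA0305E   The 'pd_tcp_write' routine failed for 'WsTcpConnector::write', errno = -1",
]
counts = count_messages(sample)
print(counts)
```

Any count for CTGSI0302W means the instance does not believe it is registered with SMS.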

5.    Log into the Policy Server and observe CPU usage.   It might spike to high levels if the ssl session cache is full.  (IBM does not have a command or monitor for the portion of the ssl session cache in use.  Therefore, it is advised by IBM that the ssl session timeout and ssl session cache size be tuned for your environment.)

6.  On the Policy Server, review the msg__pdmgrd_utf8.log.  A few of the following events are expected in normal operation.  A stream of them indicates the ssl session cache may be full.
Sample event:
0x106520EB pdmgrd NOTICE bas mts e:\am610\src\mts\mtsserver.cpp 1886 0x000008dc HPDBA0235I   The server lost the client's authentication, probably because of session expiration.

The ssl session cache can be cleared by restarting the pdmgrd process or, on Windows, the Access Manager Policy Server service.    If the ssl session cache is full, new connections must wait for connections to time out.  The default timeout is 7200 seconds (2 hours).  IBM support recommends tuning this parameter beginning with a value of 1800 seconds (30 min.).  The SSL Session Cache size can also be increased from the default value of 1024 to as high as 4095 (larger numbers are not recognized and 1024 will then be used).
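The cache size rule is easy to get wrong, so here it is as a tiny worked example. It models only what is stated above (values over 4095 fall back to 1024); the constant and function names are hypothetical, and behavior for other out-of-range values is not covered.

```python
DEFAULT_CACHE_SIZE = 1024  # default stated above
MAX_CACHE_SIZE = 4095      # largest value that is recognized

DEFAULT_TIMEOUT_SECONDS = 7200      # 2 hours
SUGGESTED_TIMEOUT_SECONDS = 1800    # IBM support's suggested starting point


def effective_cache_size(requested):
    """Return the size actually used: too-large requests fall back to the default."""
    return requested if requested <= MAX_CACHE_SIZE else DEFAULT_CACHE_SIZE


print(effective_cache_size(4095))  # accepted as requested
print(effective_cache_size(8192))  # not recognized, so the default applies
print(DEFAULT_TIMEOUT_SECONDS // 60, SUGGESTED_TIMEOUT_SECONDS // 60)  # minutes
```

In other words, asking for more than 4095 leaves you with a smaller cache than asking for exactly 4095.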

Recommended: Read


I don't provide a reading list. If I did, it would be long and there is a large chance your eyes would glaze over. The most valuable reading lesson I received was from Mr. Baxter, my senior high school English teacher --> Always carry a book in your back pocket and read it when you have to wait.

Following a version of that rule when he was 12, my son passed both the hardware and operating system A+ exams just after his 13th birthday. The people at the New Horizons testing facility were awed by his success at that age. That is another lesson: don't limit another person, not even a child. What a person can achieve is only limited by the barriers we build.

I read -- a lot -- and keep the best books.

The photo does not include the .pdf files I have copied to my Kindle. At a fraction of the cost of an iPad, a Kindle is a great, easy-to-carry tool for reading when I have to wait. (That last sentence sounds like an ad even though I toned it down. You should have seen the original. And no, Amazon is not paying me.)

Sunday, March 6, 2011

WebSEAL Traces

Customers have only one complaint when they click 'submit' on the login page and the application request fails:  they can't log in.  That is their perception, but is it reality?

One of the troubleshooting options available to a WebSEAL administrator is the trace command.  Specifically discussed in this blog entry are debug and snoop traces.

Detailed information about what WebSEAL sees from both the browser and the backend servers can be obtained by running trace commands.  A debug trace contains header information; a snoop trace includes full information, but often is not necessary and is more difficult to review.  I usually run them simultaneously so I will have already obtained the more detailed snoop trace, if it is needed.
Traces can be started in pdadmin with the following sample commands.  Note the path location for the .txt file produced:
server task default-webseald-<servername>.company.com trace set pdweb.debug 9 file path=/var/pdweb/www-default/traces/pdweb.debug_LoginProblem.txt,rollover_size=100000000
server task default-webseald-<servername>.company.com trace set pdweb.snoop 9 file path=/var/pdweb/www-default/traces/pdweb.snoop_LoginProblem.txt,rollover_size=100000000
Turn off tracing with the following sample commands:
server task default-webseald-<servername>.company.com trace set pdweb.snoop 0
server task default-webseald-<servername>.company.com trace set pdweb.debug 0
9 is the highest level of tracing; 0 turns tracing off.
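Because the start and stop commands differ only in the component, level, and file path, they are easy to generate. The helper names below are hypothetical and the server name is a made-up example; the command text follows the samples above exactly.

```python
def trace_on(server, component, path, level=9, rollover_size=100000000):
    """Build the pdadmin command that starts a trace at the given level."""
    return (
        f"server task {server} trace set {component} {level} "
        f"file path={path},rollover_size={rollover_size}"
    )


def trace_off(server, component):
    """Build the pdadmin command that turns the trace off (level 0)."""
    return f"server task {server} trace set {component} 0"


server = "default-webseald-web1.company.com"  # hypothetical server name
for component in ("pdweb.debug", "pdweb.snoop"):
    path = f"/var/pdweb/www-default/traces/{component}_LoginProblem.txt"
    print(trace_on(server, component, path))
for component in ("pdweb.snoop", "pdweb.debug"):
    print(trace_off(server, component))
```

Generating both sets together makes it harder to forget to turn a level 9 trace off again.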

As an example, a debug trace may show you that the customer has, in fact, logged in successfully and received the usual 302 redirect with a set-cookie command for the WebSEAL cookie, which the browser then includes in the next GET request.  If the backend server is down, WebSEAL will immediately send a 500 error to the browser, but a 500 error could also originate with a backend server.  You will be able to see this in the trace.  Of course, many other possibilities for the 'can't log in' issue may exist, such as a backend server operation that takes longer than the WebSEAL inactivity timeout setting.

As a WebSEAL administrator, the trace command is definitely your friend.

Junction Configuration Basics

The following discussion is a quick and dirty introduction to TAMeb junction configuration.  There are several chapters covering standard, transparent path, and virtualhost junctions in the WebSEAL Administration Guide.  In addition, there are a variety of switches not in the samples below, such as -s and -u for stateful junctions, and behaviors controlled via configuration files, such as inactivity and connection timeouts.  I recommend a thorough reading of the product documentation if you will have serious involvement with TAMeb.  Coming in at over 1100 pages for the WebSEAL Administration Guide alone, acceptance of my recommendation is not for the faint-of-heart.
Sample commands to create and manipulate server junctions follow.
server task <instance>-webseald-<servername>.company.com create -t tcp -A -F /opt/pdweb/etc/<name>.ltpa -Z <password> -x -h <host servername>.company.com  -p 80 -c iv-creds,iv-groups /sampleapp/secure -f

‘<instance>’ is the WebSEAL instance.  Unless changed, the first instance created is 'default'.
‘<servername>’ is the WebSEAL server.
'-t' is the type of junction (tcp, ssl)
‘-A’ enables LTPA; ‘-F’ and ‘-Z’ supply the LTPA key file and its password.
‘-x’ indicates this is a transparent path junction.
‘-h’ is for the host / target server.
‘-p’ is for the port.
‘-c iv-creds,iv-groups’ sets the TAMeb headers to be sent to the backend servers for fine-grained access decisions handled by the applications.
‘/sampleapp/secure’ is the junction, which for a transparent path junction must match the application server context.
‘-f’ is to force creation of the junction even if it already exists.
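To show how the switches above fit together, here is a small sketch that assembles a create command from its parts. The function and its parameter names are hypothetical, the host names are made up, and the LTPA switches are left out to keep a password off the example; the switch order follows the sample command above.

```python
def build_junction_create(server, junction, host, port=80, junction_type="tcp",
                          transparent_path=False,
                          headers=("iv-creds", "iv-groups"), force=False):
    """Assemble a 'server task ... create' command using the switches listed above."""
    parts = ["server task", server, "create", "-t", junction_type]
    if transparent_path:
        parts.append("-x")           # transparent path junction
    parts += ["-h", host, "-p", str(port)]
    if headers:
        parts += ["-c", ",".join(headers)]  # TAMeb headers sent to the backend
    parts.append(junction)
    if force:
        parts.append("-f")           # placed last, as in the sample command
    return " ".join(parts)


command = build_junction_create(
    "default-webseald-web1.company.com",  # hypothetical WebSEAL server
    "/sampleapp/secure",
    "app1.company.com",                   # hypothetical backend host
    transparent_path=True,
    force=True,
)
print(command)
```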

To add additional servers to a junction, follow the pattern of this command:

server task <instance>-webseald-<servername>.company.com add -h <host servername>.company.com  /sampleapp/secure

Creation of a virtualhost junction is similar:
server task <instance>-webseald-<servername>.bcbsnc.com virtualhost create -t tcp -h <host servername>.company.com -p 80 -z default -c iv-creds,iv-groups -v support.ibm.com vhost-ibm -f
Also similar is adding additional servers to the virtualhost junction:
server task <instance>-webseald-<servername>.company.com virtualhost add -h <host servername>.company.com vhost-ibm
Junction information can be viewed with ‘server task … show …’ and ‘object show …’ commands.
server task <instance>-webseald-<servername>.company.com show /sampleapp

object show /WebSEAL/<servername>.company.com-default/sampleapp

Saturday, March 5, 2011

Authentication and Authorization Process Flow

The process followed for authentication and authorization using Tivoli Access Manager for e-business is scattered throughout the product documentation.  Pulling this together and plugging the pieces into the correct location was no minor task.  IBM seems to have a preference for providing a six-step overview process.  Having the complete flow available greatly speeds resolution of issues and, much more often, lets you rule out TAMeb before looking for the issue in more likely locations such as the webserver or application server.

How to Configure Access Control Lists (ACLs)

Sample commands to create and manipulate an Access Control List (ACL) follow.  Note that ACLs are inherited, but not cumulatively.  An ACL placed on the root of a context will be overridden by an ACL placed at a lower level.
acl create acl_name
Example:
pdadmin sec_master> acl create <application_name>_Secure
acl attach object_name acl_name
Examples:
pdadmin sec_master> acl attach /WebSEAL/<servername>.bcbsnc.com-default/<context>/secure <application_name>_Secure
pdadmin sec_master> acl detach /WebSEAL/<servername>.bcbsnc.com-default/<application_name>/secure
acl find acl_name
Example:
pdadmin sec_master> acl find <application_name>_Secure
/WebSEAL/<servername>.bcbsnc.com-default/<application_name>/secure
pdadmin sec_master> acl list
default-webseal
default-root
<application_name>_Secure
default-replica
default-management
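The "inherited, but not cumulatively" rule mentioned at the top of this section means the nearest attached ACL wins and nothing above it is merged in. The sketch below illustrates that lookup. The function name and server name are hypothetical, and the attachment points shown for default-root and default-webseal are assumptions for the sake of the example.

```python
def effective_acl(object_path, attached):
    """Return the ACL that applies: the one attached nearest to the object."""
    path = object_path.rstrip("/")
    while path:
        if path in attached:
            return attached[path]
        path = path.rsplit("/", 1)[0]  # step up one level
    return attached.get("/")           # fall back to the root, if any


# Assumed attachment points, for illustration only.
attached = {
    "/": "default-root",
    "/WebSEAL": "default-webseal",
    "/WebSEAL/web1.company.com-default/sampleapp/secure": "sampleapp_Secure",
}

below_secure = effective_acl(
    "/WebSEAL/web1.company.com-default/sampleapp/secure/page.html", attached)
outside_secure = effective_acl(
    "/WebSEAL/web1.company.com-default/sampleapp/public", attached)
print(below_secure, outside_secure)
```

Objects under the secure path get only the application ACL; permissions granted by the ACLs higher up do not carry over.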
acl modify acl_name delete attribute attribute_name [attribute_value]
acl modify acl_name description description
acl modify acl_name remove any-other
acl modify acl_name remove group group_name
acl modify acl_name remove unauthenticated
acl modify acl_name remove user user_name
For unauthenticated access (public), any-other and unauthenticated should have Trx permissions.  For group-secured access, any-other and unauthenticated should have the T permission and the group should have Trx permissions.  For access that requires authentication but not membership in a particular group, any-other should have Trx permissions and unauthenticated the T permission.
acl modify acl_name set any-other [permissions]
acl modify acl_name set attribute attribute_name attribute_value
acl modify acl_name set description description
Groups are usually associated to an ACL to provide access to group members.  Usual permissions are Trx (transit, read, execute).  ‘m’ allows Put.
acl modify acl_name set group group_name [permissions]
For unauthenticated access (public), unauthenticated should have Trx permissions; for secured access unauthenticated should have the T permission.
acl modify acl_name set unauthenticated [permissions]
Users can be added to the ACL directly, but this is discouraged.  Best practice is to use a group.
acl modify acl_name set user user_name [permissions]
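The three access patterns described in this section (public, authenticated, group-secured) can be written down as a small table and turned into the matching commands. This is a sketch only: the pattern labels, function name, ACL name, and group name are hypothetical, while the permissions and command forms come from the text above.

```python
# Permissions for any-other and unauthenticated under each access pattern.
PATTERNS = {
    "public":        {"any-other": "Trx", "unauthenticated": "Trx"},
    "authenticated": {"any-other": "Trx", "unauthenticated": "T"},
    "group":         {"any-other": "T",   "unauthenticated": "T"},
}


def acl_commands(acl_name, pattern, group=None):
    """Return the pdadmin commands that create an ACL for the given pattern."""
    entries = PATTERNS[pattern]
    commands = [
        f"acl create {acl_name}",
        f"acl modify {acl_name} set any-other {entries['any-other']}",
        f"acl modify {acl_name} set unauthenticated {entries['unauthenticated']}",
    ]
    if pattern == "group":
        if not group:
            raise ValueError("a group name is required for group-secured access")
        commands.append(f"acl modify {acl_name} set group {group} Trx")
    return commands


commands = acl_commands("sampleapp_Secure", "group", group="sampleapp_users")
for command in commands:
    print(command)
```

Keeping the patterns in one table makes it easier to see that the only difference between them is who holds Trx.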