In working with a customer who is adding a new queue sharing group to support increased availability requirements there has been a great deal of discussion on how to size the structures. There is guidance available on the CFSizer tool (please see here: https://www-947.ibm.com/systems/support/z/cfsizer/mqseries/mqapplhelp.html ), but that information is dated – and may lead to incorrect conclusions at times. Continue reading “Sizing Structures for Queue Sharing Groups”
I think the ‘cloud’ might be dead. At least as far as IT goes. A new term will be found for the same thing, as the cloud is for the most part outsourcing.
You see I saw a local copy and print company advertising “hosting cloud services ” today. So when it gets to that level, when my local grocery store or pharmacy offers these services, it is no longer a fashionable term for the elites. Like a dress in last years color. You will still hear about it for a while of course, just like you see last years colors for a while.
And a new fashionable term that each photogenic self-promoting twitter nerd must be found to induce the masses to change course.
Not that I am old and cynical or anything.
Can a private queue be in two places?
Simply no, but it can look like it. In reviewing one customer’s data for performance problems I found the following (names are changed) :
As you can see ELKINSC.QUEUE1 shows up as being in both pageset 0/ bufferpool 0 and pageset 4/ bufferpool 3. When I asked the development lab some years ago about this, the explanation given is that this is because when the transaction started the no access to the bufferpool and pageset for that queue had been done. The physical location of the queue was not yet established, so these two fields were in the initial state. This can happen on first access or if a queue is consistently able to take advantage of put to a waiting getter. As the latch type (‘19’) (see Figure 2) indicates a buffer wait, and other transactions show non-zero values for the pageset and bufferpool, then the transaction reporting the zeros was very likely the first access to the queue. There was some discussion on fixing this in the SMF buffer, but as no customer has complained, it is not a priority for either level 3 support or development.
Another interesting thing about this data, one of the transactions that showed the long latch crossed SMF boundaries. The longest latch time and type are the same across many SMF accounting records (see Figure 1), and all these have the same Correlation – or task identifier information. So where it may look like there are several instances of long latches for transaction using ELKINS.QUEUE1, in reality there are only two instances.
After checking with the customer, just to be sure that they did not move the queue, the queue is physically located on pageset 4 bufferpool 3 and has not moved. As nearly all the queues used in the environment are using that same pageset and bufferpool combination, they have probably used the typical DEFAULT storage class or DEFINE LIKE to copy an existing queue definition. Because they are expecting growth in the use of these transactions and queues, I am giving two recommendations:
1) Add pages to bufferpool 3, which from the MQ statistics is under a bit of stress
2) Move some of the more active queues to other storage classes to reduce the contention for the underlying resources.
Contention for pages in the bufferpool, with all the transactions in the environment using queues in the same storage class, can slow down work even though they may not be showing the traditional signs of buffer stress.
This table is the result of a query against data processed by MQSMFCSV and loaded into DB2 as shown:
SELECT Q.QMgr, Correlation, Base_Name, Pageset_ID, BufferPool_ID, Get_Pageset_Count, Get_Messages_Skipped_Count, Get_Messages_Expired_Count, Put_Pageset_Access_Count, Put1_Pageset_Access_Count, Correl, Longest_Latch, Max_Latch_Wait_Time_us, Max_Latch_Wait_ID, Start_Time_Date, Start_Time_Time
FROM MQSMF.WQ Q , MQSMF.WTAS WTAS
(Q.QMgr = ”QML1” AND Correlation = Correl AND
Longest_Latch > ”00000000000000000” AND
Max_Latch_Wait_Time_us > 25000 )
Note that the latch wait time used is 25,000 microseconds. This value is arbitrary, I have chosen it because it seems a reasonable cut off point for workload that has semi-formal service level agreements of a response in a second. This value may be too high for some applications and may over-report issues for others. Knowing the application expectations and workload is key to setting this value appropriately for your environment.
When an MQ z/OS queue manager does a log switch a lot of activity can take place. Everything from the expected internal checkpoint, to shunting active units of recovery forward, to flushing messages that have been in bufferpools for 3 checkpoints to the pagesets – incurring real I/O. Log switches that take place during critical periods cannot always be avoided, the combination of message volume and size may mean that logs have to switch, but there are times when the frequency can be influenced even if it cannot be completely controlled.
As an example, I was looking at some customer SMF data and JES logs to help with a performance problem that is plaguing them. This customer has a very predictable daily peak, and they are experiencing some significant slowdowns on a regular basis. From the JES log it was easy to see that during this peak period they are doing a number of log switches, and in their case some of the switches can be avoided.
A quick search for the CSQJ002I message (‘END OF ACTIVE LOG DATA SET’) showed that the full active log that was in use for one of the time periods we were examining had a start RBA of 3DD6A998F000 and an end RBA of 3DD6A998EFFF. Calculating the number of log records that can be written is very simple the formula is (EndRBA+1)-Start RBA/1000), and I typically convert it to decimal. What I found for this customer was that the log file would only hold 144,000 4K records – which is not very much. I checked other logs and found the same thing. The first suggestion I made to help with performance was to increase the log file size to something more reasonable, like 3Gb which will hold around 6 times more records than their current allocation.
The second suggestion was to issue an ‘ARCHIVE LOG’ command about 15-30 minutes prior to the peak. At their current volume and log utilization, the combination of increasing the log file size and issuing this command shortly before the peak time should help them avoid a log switch during the time they need maximum speed and throughput.
While reviewing a customer’s MQ for z/OS SMF statistics data, I noticed that the LOG_CI count did not always match the individual log copy I/O counts. When I asked level 3 about this discrepancy, it turns out it is a difference in timing. The LOG_IO (QJSTCIWR) is updated before the write is started, the individual I/O counts are updated after the I/O has completed. So when the system is very active with persistent messages the counts may be off a little – typically just 1-2 depending on whether single or dual logging is used.
I do a fair number of MQ for z/OS health checks, usually as part of a parallel sysplex health check, and often get a lot of push back due to the perceived cost of turning on the MQ SMF data. What we typically as for is:
- One week or more of the MQ Statistics records (SMF 115, all classes).
- The MQ task accounting and channel accounting (SMF 116, all classes) for 2 peak SMF intervals (on-line and batch).
- The JES logs for the MSTR and CHIN address spaces.
- Some display of data, if the JES logs do not include the queue manager start up.
The complaint is often around the perceived cost of gathering this data. As I have had to address these objections a couple of times in the past week, I am gathering my thoughts here around some of the issues brought up. I hope this helps with the general understanding of why this data is important, the costs may not be as high as you think, and some mitigation that may be possible – especially when this is a stand alone MQ for z/OS Health check.
- Objection # 1 – We do not want to change the zPARMS for MQ as that requires an outage.
To collect the MQ SMF data there is no requirement for a change in zParms. All the collection can be turned on dynamically. The +cpf START TRACE commands work for both statistics and accounting data.
- Objection # 2 – “The cost of collecting that MQ Statistics is too high.”The MQ Statistics are very lightweight, cutting between 2-4 records per SMF interval per queue manager. If they are not on all the time, they should be. If they are not being reviewed at least monthly, then learning how to gather and review the data should be a high priority at your site.
MQ does not report into the z/OS health checker – which means that there are two primary sources of information about the overall health of the queue manager the JES logs and the statistical data (SMF115). Often by the time there is a message in the JES log, the problem is quite critical and could have been prevented.
The combination of the statistics and the JES log information gives a picture of the general health of the queue manager – neither by itself is complete. The cost of producing the statistical data is negligible, as the information is always collected – the only cost is to write out the 2-4 records per interval.
- Objection #3 – Why do you ask for a week of statistics, when the other subsystems are just asking for two one hour time slots?
Because of the asynchronous nature of MQ itself, having data from a longer period of time can help avoid problems. Depending on the processing model used by the applications the MQ peak processing time may be slightly different from batch jobs or other subsystems peak periods. We have experienced situations where when the MQ data was limited to an hour or so the recommendations made actually negatively impacted the queue managers because the MQ peak time was different.
- Objection # 4 – The cost of collecting the MQ Task Accounting data is too high, we’ve heard it is 10% overhead.Yes this can be costly, because the records are big and may be cut frequently. However, the task accounting data is invaluable to seeing how MQ is actually used, which queues are in use, and for finding performance problems in the application code itself. While I never recommend they be captured all the time, I do recommenced that they be gathered regularly and examined. This is to help spot issues, do proper capacity planning, and evaluate queue placement. Finally, when you do have a problem, being familiar with these records will make it easier to spot problems – it is hard to figure out what is abnormal, when you do not know what normal looks like.
The 10% figure is a myth, it was a ‘gut feel’ early on in MQ V6, never verified with testing. Reports from numerous customers show that collecting the MQ task accounting data (aka the class 3 data) does have a 3-7% overhead (depending on the method used to collect the data, how many long running transactions there are, and how busy the queue managers are).
In addition, if this is an independent MQ for z/OS health check you can control the timing of the record production a bit to limit the impact. What I have often suggested is:
A) Use the queue manager STATIME attribute to reduce the MQ SMF interval to 2 minutes, well prior to the typical peak time (if your SMF interval is 30 minutes, issue the +cpt SET SYSTEM STATIME(2) command no later than 35 minutes prior to the collection period.
B) Just prior to the peak time, use the START TRACE command, +cpf START TRACE(A) CLASS(*)
C) Allow that to run for 6 minutes.
D) Issue the STOP TRACE command, +cpf STOP TRACE(A) CLASS(*)
E) Issue the SET SYSTEM STATIME(??) command to restore the MQ interval to what is was prior to the data collection.
Based on a true story, the names have been omitted to protect the innocent, guilty, and the vaguely interested.
Anyone who has talked to me is aware of my interest in workload skewing that can be a natural part a shared queue environment. I have been involved in a number of situations, recommending mitigation techniques to a number of customers. But as with anything else, there is always more to learn about the causes, effects and interaction between the CF and MQ.
As a bit of background, some customers have observed what the development lab have termed “over notification” when using shared queues. Simply there are multiple application instances that have an MQGET with a wait active on a particular queue, when the Coupling Facility code detects a transition from empty to non-empty – a message has been put to a previously empty queue – and notifies MQ of the transition, MQ in turn notifies all the application instances that are waiting. All of the applications attempt to get messages, which if there is only one message means that one instance gets the message but there is CPU associated with all those other instances trying to get a message.
Over notification works to the advantage of some workloads, where the arrival of one message is typically the start of a deluge of traffic. This is particularly true when the queue is not likely to return to a zero depth for a long period of time, if the CF transition is never re-driven those ‘extra’ instances will never be notified that there are messages to be processed.
Reducing over notification can be an advantage when there are CPU constraints, there is less CPU used when there are fewer notifications going out and there is less contention between instances that can consume a limited number of messages. The MQ change team has developed a tuning option that helps control over notification.
It gets interesting when conditions change. In this particular instance, the customer had used the tuning option to control over notification in their production environment because one application was consuming a significant amount of CPU when the primary shared queue was basically being trickle fed most of the time.
However, when starting to test a different application, they did not implement the over notification tuning option. They initially could not drive a lot of workload, were running into CPU waits, and when examining the SMF data we found that they had a significant level of latching on type 24 latches – which is the primary symptom of churn on the ‘get wait’ chain often seen with the over notification. The tuning option was turned on, and greater throughput was achieved. We saw a significant reduction in the type 24 latching.
Additional capacity was added to the test environment to mimic their production environment and the situation changed. As more load was added, the LPAR that had the faster service times from the coupling facility was consuming 90% of the work, and we were not seeing activity on the second LPAR no matter how high the workload was driven, In addition, the inbound queue depths were building up, and messages were remaining on the queue for longer periods. At the same time there were a lot of application instances that did not appear to be active.
The development labs in Poughkeepsie and Hursley has a conversation about this and found that because of the suppression of notification on the part of MQ and the lack of the coupling facility transition (the queue never went back to 0 under the constant load) notification was causing the bottleneck. Turning off the tuning option reduce the skewing to a more expected 60/40 split based on the difference in service times, at the cost of some CPU because the reintroduced over notification was creating some of the original get-waiting churn and contention for the messages.
We anticipate that the over notification ‘churn’ can be controlled in the test environment when a disciplined study of the number of active application instances can be done. Perhaps that will help them reach equilibrium between CPU consumption and message throughput.
In the past few months I have heard of a few incidents caused by the sudden over use of a bufferpool and pageset that resulted in some significant slowdowns, application outages, or queue manager shutdowns. The incidents are similar in that there was an unexpected surge in messages arriving on a queue, which ultimately caused the pageset to go into expansion an eventually to fill. The problem was that the bufferpool and pageset were also used for some system operational queues, like the SYSTEM.COMMAND.INPUT queue.
The SYSTEM.COMMAND.INPUT queue defaults to using the SYSVOLAT storage class, pageset 3. This pageset is also used by the REMOTE storage class, which is the default STGCLASS value for the SYSTEM.CLUSTER.TRANSMIT.QUEUE and for other transmission queues. While we have long recommended that the SYSTEM.CLUSTER.TRANSMIT.QUEUE be placed in a dedicated storage class, using a devoted bufferpool and pageset, that recommendation is often much easier to make than to implement. Especially before MQ V8.
The problem arises when the pageset fills and commands to the queue manager cannot be put on the command input queue. And it can get really awful from that point on.
We’ve also recently seen this when a customer used the REMOTE storage class for their dead letter queue. The queue manager had not had a problem for the 15 years of its production life, until there was a surge of messages placed on the DLQ because of an application problem. Again, the underlying pageset filled up and no commands could be issued.
The recommendation is simple, isolate the SYSTEM.COMMAND.INPUT and a couple of other control queues to a separate storage class that points to an unique pageset and bufferpool to avoid the potential for a problem with an application, a channel, or the cluster to impact the ability to control the queue managers.
Recommended queues for the new storage class:
First, we have two sessions with seats available in 2016.
Chicago – November 29 thru December 1
San Francisco – December 13 thru 15
Please note that while the workshop number is still IMQ08 so far, the labs are running against V 9 code and are gradually updating labs, etc. as we have discovered or uncovered some interesting changes in some places!
If you would like to register, please contact your IBM representative or Chris Griego (firstname.lastname@example.org), Shalawn King (email@example.com), my teammate Mitch (firstname.lastname@example.org) or me (email@example.com).
We are starting to plan our 2017 adventures, I will post as we begin scheduling.
For those of you in the Baltimore/Washington DC area, there is an IBM Technical University event being held at the Hilton. This covers many z-centric topics, including 3 sessions on MQ.
The site for the event is:
This is NOT a marketing event, but a technical event, you can see the complete agenda here:
There is a fee for the event, but if your company is sending more than one person discounts are available. Please contact me (firstname.lastname@example.org) if your company is sending more than one person and you need the correct discount code. Or contact your local IBM account representative for assistance.