Rerun jobs on same server in DQM Load Balancing configuration

Anh-Thai Nguyen (Team Automic)
edited February 27 in Dollar Universe - English

Scenario

We currently use DUAS in a DQM load balancing configuration: one node for scheduling, with logical queues that point to multiple remote physical queues residing on execution nodes.
If a job runs on a logical queue that points to two servers, the job can end up running on either of them (round robin):
LogicalQueue 
-> PhysicalQueue1 (ExecutionNode1)
-> PhysicalQueue2 (ExecutionNode2)

When the job reaches its launch window, it can start running on either of the two servers.
When we rerun the job, the rerun can likewise land on either of the two servers.

Is there a way to force the job to rerun on the same server?
Rerunning the job on a different server does not make sense. For example, if the job is rerun from a given step, steps 1 to x would have run on server 1 and steps x to z would then run on server 2.


Answer

Unfortunately it's not possible to assign a job to a specific node.

Jobs submitted to a logical queue will be dynamically distributed across its associated physical queues in order to balance machine workload.

So a job submitted to a logical queue can run on any of the physical queues associated with that logical queue, and consequently on any node where one of those physical queues resides. Note, however, that by default DQM explores local physical queues first.
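
To make the behaviour concrete, here is a minimal sketch in Python. It is not DUAS or DQM code: the LogicalQueue class and its submit method are hypothetical names, and strict round robin is a simplification of the workload-based distribution described above. It only illustrates why a rerun is treated like any other submission and can land on a different node.

```python
from itertools import cycle


class LogicalQueue:
    """Toy model of a logical queue (hypothetical, not a DUAS API)."""

    def __init__(self, physical_queues):
        # Assumption: strict round robin over the associated physical queues.
        self._next_queue = cycle(physical_queues)

    def submit(self, job_name):
        # Every submission, including a rerun, takes the next queue in turn;
        # the model keeps no memory of where the job ran before.
        return next(self._next_queue)


queue = LogicalQueue([
    "PhysicalQueue1 (ExecutionNode1)",
    "PhysicalQueue2 (ExecutionNode2)",
])

first_run = queue.submit("JOB_A")
rerun = queue.submit("JOB_A")
print(first_run)  # PhysicalQueue1 (ExecutionNode1)
print(rerun)      # PhysicalQueue2 (ExecutionNode2)
```

In this model the queue selection depends only on the state of the logical queue, never on the job's history, which matches the answer that a job cannot be pinned to a specific node.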


Comments

  • Adm10 Groupe (Emerainville, FR)

    Hello,

    In your post, I read that you are using DQM for load balancing.
    I would like to set up this solution for a client, but it does not work.
    Automic support has not been able to answer me yet.
    Could you tell me how you configured your DUAS?
    Are there any prerequisites for this setup?

    Thank you for your help.

    Regards,

    Pierre-Eric
    DUAS 6.7.41
    UVMS  6.7.41
    UWC 6.7.41
    Windows 2012 Server R2 Enterprise
  • Hoai Truong-Thi-Thu (Team Automic)
    @Adm10 Groupe

    Could you please give some more detail on what does not work?
  • Adm10 Groupe (Emerainville, FR)
    Hello,

    We have "node1", which is used for scheduling and also as an execution node, and "node2", which serves as an execution node.

    When launching executions, there is no problem on node1, but on node2 the uproc aborts with the following in the job log:

    Can not find launch: owls_api_fla_read returns 600
    Can not update launch: o_upd_sap_launch returns 600

    abort

    I do not know how to interpret this error message.

    Thank you for your help.

    Regards,
    Pierre-Eric
  • Adrian Fresno menendez (Team Automic)
    Hello, the o_upd_sap_launch error makes me think that you are trying to launch SAP jobs via a logical queue where the SAP manager is possibly configured on only one node.

    Can you check that?
  • Adrian Fresno menendez (Team Automic)
    We have been able to reproduce this issue in our environment and have escalated it to R&D.
    There is indeed a problem when launching uprocs to a logical queue of type SAP (either SAP_XBP2 or CL_INT launching the command uxstr sap api ...).

    When the job is about to be launched on the remote physical queue, the following errors appear in the job log:
    Can not find launch: owls_api_fla_read returns 600
    Can not update launch: o_upd_sap_launch returns 600
     
    The reference for this problem is pb113132; you will be notified as soon as a fix is available.
  • Adm10 Groupe (Emerainville, FR)
    Hello,

    Thank you for your answer.

    Regards.

    Pierre-Eric