4 Robustness

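Several things are wrong with the messenger example in A Larger Example. For example, if a node where a user is logged on goes down without doing a logoff, the user remains in the server's User_List, but the client disappears. This makes it impossible for the user to log on again, because the server thinks the user is already logged on.

Or what happens if the server goes down in the middle of sending a message, leaving the sending client hanging forever in the await_result function?

4.1 Time-outs

Before improving the messenger program, let us look at some general principles, using the ping pong program as an example. Recall that when "ping" finishes, it tells "pong" that it has done so by sending the atom finished as a message to "pong", so that "pong" can also finish. Another way to let "pong" finish is to make "pong" exit if it does not receive a message from "ping" within a certain time. This can be done by adding a time-out to pong, as shown in the following example: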

    -module(tut19).

    -export([start_ping/1, start_pong/0, ping/2, pong/0]).

    ping(0, _Pong_Node) ->
        io:format("ping finished~n", []);

    ping(N, Pong_Node) ->
        {pong, Pong_Node} ! {ping, self()},
        receive
            pong ->
                io:format("Ping received pong~n", [])
        end,
        ping(N - 1, Pong_Node).

    pong() ->
        receive
            {ping, Ping_PID} ->
                io:format("Pong received ping~n", []),
                Ping_PID ! pong,
                pong()
        after 5000 ->
                io:format("Pong timed out~n", [])
        end.

    start_pong() ->
        register(pong, spawn(tut19, pong, [])).

    start_ping(Pong_Node) ->
        spawn(tut19, ping, [3, Pong_Node]).

After this is compiled and the file tut19.beam is copied to the necessary directories, the following is seen on (pong@kosken):

    (pong@kosken)1> tut19:start_pong().
    true
    Pong received ping
    Pong received ping
    Pong received ping
    Pong timed out

And the following is seen on (ping@gollum):

    (ping@gollum)1> tut19:start_ping(pong@kosken).
    <0.36.0>
    Ping received pong
    Ping received pong
    Ping received pong
    ping finished

The time-out is set in:

    pong() ->
        receive
            {ping, Ping_PID} ->
                io:format("Pong received ping~n", []),
                Ping_PID ! pong,
                pong()
        after 5000 ->
                io:format("Pong timed out~n", [])
        end.

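The time-out (after 5000) is started when receive is entered. The time-out is canceled if {ping,Ping_PID} is received. If {ping,Ping_PID} is not received, the actions following the time-out are done after 5000 milliseconds. after must be last in the receive, that is, preceded by all other message reception specifications in the receive. It is also possible to call a function that returns an integer for the time-out: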

after pong_timeout() ->
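As a minimal sketch of a computed time-out, the module below waits for a single ping and gives up after the number of milliseconds returned by a helper function. The module name timeout_demo, the function wait_for_ping/0, and the 200 ms value are assumptions made for illustration; the text above only names pong_timeout() without defining it.

```erlang
-module(timeout_demo).
-export([wait_for_ping/0, pong_timeout/0]).

%% Hypothetical helper: returns the time-out in milliseconds.
%% The value 200 is an arbitrary choice for this sketch.
pong_timeout() ->
    200.

%% Waits for one {ping, Pid} message. If none arrives within
%% pong_timeout() milliseconds, the after clause is evaluated instead.
wait_for_ping() ->
    receive
        {ping, Ping_PID} ->
            Ping_PID ! pong,
            got_ping
    after pong_timeout() ->
            timed_out
    end.
```

Calling timeout_demo:wait_for_ping() with an empty mailbox returns timed_out after about 200 ms; if a {ping, Pid} message is already queued, it returns got_ping immediately.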

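In general, there are better ways than time-outs to supervise parts of a distributed Erlang system. Time-outs are usually appropriate for supervising external events, for example, when you expect a message from some external system within a specified time. For example, a time-out can be used to log a user out of the messenger system if they have not accessed it for, say, ten minutes.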
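The idle log-out idea can be sketched as follows. This is not part of the messenger program; the module idle_demo, its functions, and the touch and idle_logout messages are hypothetical names chosen for illustration. The sketch relies only on the fact that the time-out starts afresh each time receive is entered.

```erlang
-module(idle_demo).
-export([start/1, touch/1, session/2]).

%% Starts a session process that notifies the caller when it goes idle.
%% IdleMs is the allowed idle time in milliseconds.
start(IdleMs) ->
    spawn(idle_demo, session, [self(), IdleMs]).

%% Records activity on the session.
touch(Session) ->
    Session ! touch,
    ok.

session(Owner, IdleMs) ->
    receive
        touch ->
            %% Re-entering receive starts a fresh time-out.
            session(Owner, IdleMs)
    after IdleMs ->
            %% No activity for IdleMs milliseconds: report and terminate.
            Owner ! {idle_logout, self()}
    end.
```

For the ten minutes mentioned above, the session would be started with idle_demo:start(10 * 60 * 1000).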

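4.2 Error Handling

Before going into the details of supervision and error handling in an Erlang system, let us see how Erlang processes terminate, or in Erlang terminology, exit.

A process which executes exit(normal) or simply runs out of things to do has a normal exit.

A process which encounters a runtime error (for example, divide by zero, bad match, trying to call a function that does not exist, and so on) exits with an error, that is, has an abnormal exit. A process which executes exit(Reason), where Reason is any Erlang term except the atom normal, also has an abnormal exit.

An Erlang process can set up links to other Erlang processes. If a process calls link(Other_Pid), it sets up a bidirectional link between itself and the process Other_Pid. When a process terminates, it sends something called a signal to all the processes it has links to.

The signal carries information about the pid it was sent from and the exit reason.

The default behaviour of a process that receives a normal exit is to ignore the signal.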

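In the two other cases above (that is, abnormal exit), the default behaviour is to:

  • Bypass all messages to the receiving process.

  • Kill the receiving process.

  • Propagate the same error signal to the links of the killed process.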

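In this way you can connect all processes in a transaction together using links. If one of the processes exits abnormally, all the processes in the transaction are killed. Because you often want to create a process and link to it at the same time, there is a special BIF, spawn_link, which does the same as spawn but also creates a link to the spawned process.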
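The default behaviour described above can be observed with a small sketch. The module link_demo and the function survives/1 are hypothetical names; the 200 ms sleep is a crude assumption that the exit signal has been delivered by then, which is adequate for a local experiment but is not a synchronization technique.

```erlang
-module(link_demo).
-export([survives/1]).

%% Returns true if a process that does not trap exits is still alive
%% after a process linked to it has exited with Reason.
survives(Reason) ->
    Parent = spawn(fun() ->
                       %% Create a child and link to it in one step.
                       spawn_link(fun() -> exit(Reason) end),
                       receive stop -> ok end
                   end),
    timer:sleep(200),                  % crude: let the exit signal arrive
    Alive = is_process_alive(Parent),
    Parent ! stop,                     % clean up if it is still running
    Alive.
```

Under these assumptions, link_demo:survives(normal) returns true, because a normal exit signal is ignored, while link_demo:survives(boom) returns false, because the abnormal exit signal kills the linked process.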

Now an example of the ping pong program using links to terminate "pong":

    -module(tut20).

    -export([start/1, ping/2, pong/0]).

    ping(N, Pong_Pid) ->
        link(Pong_Pid),
        ping1(N, Pong_Pid).

    ping1(0, _) ->
        exit(ping);

    ping1(N, Pong_Pid) ->
        Pong_Pid ! {ping, self()},
        receive
            pong ->
                io:format("Ping received pong~n", [])
        end,
        ping1(N - 1, Pong_Pid).

    pong() ->
        receive
            {ping, Ping_PID} ->
                io:format("Pong received ping~n", []),
                Ping_PID ! pong,
                pong()
        end.

    start(Ping_Node) ->
        PongPID = spawn(tut20, pong, []),
        spawn(Ping_Node, tut20, ping, [3, PongPID]).

    (s1@bill)3> tut20:start(s2@kosken).
    Pong received ping
    <3820.41.0>
    Ping received pong
    Pong received ping
    Ping received pong
    Pong received ping
    Ping received pong

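This is a slight modification of the ping pong program where both processes are spawned from the same start/1 function, and the "ping" process can be spawned on a separate node. Notice the use of the link BIF. "Ping" calls exit(ping) when it finishes, and this causes an exit signal to be sent to "pong", which also terminates.

It is possible to modify the default behaviour of a process so that it does not get killed when it receives abnormal exit signals. Instead, all signals are turned into normal messages of the format {'EXIT',FromPID,Reason} and added to the end of the receiving process's message queue. This behaviour is set by: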

process_flag(trap_exit, true)

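There are several other process flags, see erlang(3). Changing the default behaviour of a process in this way is usually not done in standard user programs, but is left to the supervisory programs in OTP. However, the ping pong program is modified here to illustrate exit trapping.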

    -module(tut21).

    -export([start/1, ping/2, pong/0]).

    ping(N, Pong_Pid) ->
        link(Pong_Pid),
        ping1(N, Pong_Pid).

    ping1(0, _) ->
        exit(ping);

    ping1(N, Pong_Pid) ->
        Pong_Pid ! {ping, self()},
        receive
            pong ->
                io:format("Ping received pong~n", [])
        end,
        ping1(N - 1, Pong_Pid).

    pong() ->
        process_flag(trap_exit, true),
        pong1().

    pong1() ->
        receive
            {ping, Ping_PID} ->
                io:format("Pong received ping~n", []),
                Ping_PID ! pong,
                pong1();
            {'EXIT', From, Reason} ->
                io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
        end.

    start(Ping_Node) ->
        PongPID = spawn(tut21, pong, []),
        spawn(Ping_Node, tut21, ping, [3, PongPID]).

    (s1@bill)1> tut21:start(s2@gollum).
    <3820.39.0>
    Pong received ping
    Ping received pong
    Pong received ping
    Ping received pong
    Pong received ping
    Ping received pong
    pong exiting, got {'EXIT',<3820.39.0>,ping}
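Because a trapping process turns all exit signals into messages, it also sees normal exits, which a non-trapping process would silently ignore. The following single-node sketch illustrates this; the module trap_demo and the function exit_reason/1 are hypothetical names, and the one-second fallback is an arbitrary safety net for the sketch.

```erlang
-module(trap_demo).
-export([exit_reason/1]).

%% Spawns a linked child that exits with Reason and returns the reason
%% carried by the resulting {'EXIT', Pid, Reason} message.
exit_reason(Reason) ->
    Old = process_flag(trap_exit, true),   % returns the previous setting
    Child = spawn_link(fun() -> exit(Reason) end),
    Result = receive
                 {'EXIT', Child, Why} -> Why
             after 1000 ->
                 no_exit_message
             end,
    process_flag(trap_exit, Old),          % restore the caller's setting
    Result.
```

Here trap_demo:exit_reason(ping) returns ping, matching the tut21 output above, and trap_demo:exit_reason(normal) returns normal.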

4.3 The Larger Example with Robustness Added

Let us return to the messenger program and add changes to make it more robust:

    %%% Message passing utility.
    %%% User interface:
    %%% logon(Name)
    %%%     One user at a time can log in from each Erlang node in the
    %%%     system messenger: and choose a suitable Name. If the Name
    %%%     is already logged in at another node or if someone else is
    %%%     already logged in at the same node, login will be rejected
    %%%     with a suitable error message.
    %%% logoff()
    %%%     Logs off anybody at that node
    %%% message(ToName, Message)
    %%%     sends Message to ToName. Error messages if the user of this
    %%%     function is not logged on or if ToName is not logged on at
    %%%     any node.
    %%%
    %%% One node in the network of Erlang nodes runs a server which maintains
    %%% data about the logged on users. The server is registered as "messenger"
    %%% Each node where there is a user logged on runs a client process registered
    %%% as "mess_client"
    %%%
    %%% Protocol between the client processes and the server
    %%% ----------------------------------------------------
    %%%
    %%% To server: {ClientPid, logon, UserName}
    %%% Reply {messenger, stop, user_exists_at_other_node} stops the client
    %%% Reply {messenger, logged_on} logon was successful
    %%%
    %%% When the client terminates for some reason
    %%% To server: {'EXIT', ClientPid, Reason}
    %%%
    %%% To server: {ClientPid, message_to, ToName, Message} send a message
    %%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
    %%% Reply: {messenger, receiver_not_found} no user with this name logged on
    %%% Reply: {messenger, sent} Message has been sent (but no guarantee)
    %%%
    %%% To client: {message_from, Name, Message},
    %%%
    %%% Protocol between the "commands" and the client
    %%% ----------------------------------------------
    %%%
    %%% Started: messenger:client(Server_Node, Name)
    %%% To client: logoff
    %%% To client: {message_to, ToName, Message}
    %%%
    %%% Configuration: change the server_node() function to return the
    %%% name of the node where the messenger server runs

    -module(messenger).
    -export([start_server/0, server/0,
             logon/1, logoff/0, message/2, client/2]).

    %%% Change the function below to return the name of the node where the
    %%% messenger server runs
    server_node() ->
        messenger@super.

    %%% This is the server process for the "messenger"
    %%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
    server() ->
        process_flag(trap_exit, true),
        server([]).

    server(User_List) ->
        receive
            {From, logon, Name} ->
                New_User_List = server_logon(From, Name, User_List),
                server(New_User_List);
            {'EXIT', From, _} ->
                New_User_List = server_logoff(From, User_List),
                server(New_User_List);
            {From, message_to, To, Message} ->
                server_transfer(From, To, Message, User_List),
                io:format("list is now: ~p~n", [User_List]),
                server(User_List)
        end.

    %%% Start the server
    start_server() ->
        register(messenger, spawn(messenger, server, [])).

    %%% Server adds a new user to the user list
    server_logon(From, Name, User_List) ->
        %% check if logged on anywhere else
        case lists:keymember(Name, 2, User_List) of
            true ->
                From ! {messenger, stop, user_exists_at_other_node},  %reject logon
                User_List;
            false ->
                From ! {messenger, logged_on},
                link(From),
                [{From, Name} | User_List]        %add user to the list
        end.

    %%% Server deletes a user from the user list
    server_logoff(From, User_List) ->
        lists:keydelete(From, 1, User_List).

    %%% Server transfers a message between user
    server_transfer(From, To, Message, User_List) ->
        %% check that the user is logged on and who he is
        case lists:keysearch(From, 1, User_List) of
            false ->
                From ! {messenger, stop, you_are_not_logged_on};
            {value, {_, Name}} ->
                server_transfer(From, Name, To, Message, User_List)
        end.

    %%% If the user exists, send the message
    server_transfer(From, Name, To, Message, User_List) ->
        %% Find the receiver and send the message
        case lists:keysearch(To, 2, User_List) of
            false ->
                From ! {messenger, receiver_not_found};
            {value, {ToPid, To}} ->
                ToPid ! {message_from, Name, Message},
                From ! {messenger, sent}
        end.

    %%% User Commands
    logon(Name) ->
        case whereis(mess_client) of
            undefined ->
                register(mess_client,
                         spawn(messenger, client, [server_node(), Name]));
            _ -> already_logged_on
        end.

    logoff() ->
        mess_client ! logoff.

    message(ToName, Message) ->
        case whereis(mess_client) of % Test if the client is running
            undefined ->
                not_logged_on;
            _ -> mess_client ! {message_to, ToName, Message},
                 ok
        end.

    %%% The client process which runs on each user node
    client(Server_Node, Name) ->
        {messenger, Server_Node} ! {self(), logon, Name},
        await_result(),
        client(Server_Node).

    client(Server_Node) ->
        receive
            logoff ->
                exit(normal);
            {message_to, ToName, Message} ->
                {messenger, Server_Node} ! {self(), message_to, ToName, Message},
                await_result();
            {message_from, FromName, Message} ->
                io:format("Message from ~p: ~p~n", [FromName, Message])
        end,
        client(Server_Node).

    %%% wait for a response from the server
    await_result() ->
        receive
            {messenger, stop, Why} -> % Stop the client
                io:format("~p~n", [Why]),
                exit(normal);
            {messenger, What} ->  % Normal response
                io:format("~p~n", [What])
        after 5000 ->
                io:format("No response from server~n", []),
                exit(timeout)
        end.

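The following changes are added:

The messenger server traps exits. If it receives an exit signal, {'EXIT',From,Reason}, this means that a client process has terminated or is unreachable for one of the following reasons:

  • The user has logged off (the "logoff" message to the server has been removed; the client now simply exits).

  • The network connection to the client is broken.

  • The node on which the client process resides has gone down.

  • The client process has done some illegal operation.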

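If an exit signal is received as above, the tuple {From,Name} is deleted from the server's User_List using the server_logoff function. If the node on which the server runs goes down, an exit signal (automatically generated by the system) is sent to all of the client processes: {'EXIT',MessengerPID,noconnection}, causing all the client processes to terminate.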
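The clean-up step can be tried in isolation. The sketch below mirrors what server_logoff/2 does when an exit message arrives; the module user_list_demo and the function handle_exit/2 are hypothetical names, and in the usage example atoms stand in for real client pids.

```erlang
-module(user_list_demo).
-export([handle_exit/2]).

%% Same idea as server_logoff/2 in the messenger: remove the entry whose
%% first element is the pid carried by the exit message. If no entry
%% matches, lists:keydelete/3 returns the list unchanged.
handle_exit({'EXIT', From, _Reason}, User_List) ->
    lists:keydelete(From, 1, User_List).
```

For example, user_list_demo:handle_exit({'EXIT', p1, noconnection}, [{p1, alice}, {p2, bob}]) returns [{p2,bob}], whatever the exit reason was.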

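Also, a time-out of five seconds has been introduced in the await_result function. That is, if the server does not reply within five seconds (5000 ms), the client terminates. This is only needed in the logon sequence before the client and the server are linked.

An interesting case is if the client terminates before the server links to it. This is taken care of because linking to a non-existent process causes an exit signal, {'EXIT',From,noproc}, to be automatically generated. This is as if the process terminated immediately after the link operation.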
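The noproc case can be reproduced on a single node with a process that traps exits, as the messenger server does. The module noproc_demo and the function run/0 are hypothetical names, and the 100 ms sleep is a crude assumption that the short-lived process has already terminated before link/1 is called.

```erlang
-module(noproc_demo).
-export([run/0]).

%% Links to a process that has already terminated while trapping exits,
%% and returns the reason of the exit message that results.
run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn(fun() -> ok end),      % terminates almost immediately
    timer:sleep(100),                  % crude: make sure it is gone
    link(Pid),
    Result = receive
                 {'EXIT', Pid, Reason} -> Reason
             after 1000 ->
                 no_signal
             end,
    process_flag(trap_exit, Old),      % restore the caller's setting
    Result.
```

Under these assumptions noproc_demo:run() returns noproc, which is the reason the server would see in its {'EXIT', From, _} clause before removing the client from User_List.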